AWS
01-05-2004, 07:09 AM
This is a general question concerning the creation of an iSilo document...
I know that specifying a link depth greater than 2 or 3 can create a large document, but I have had some success in capturing complete web sites by specifying that offsite links not be followed. However, I have been working on one particular site with a link depth of 6 and no offsite links included. Now, after over 12 hours of "Retrieving Resources", I am wondering how much longer this is going to take and how large the file will become.
My general question is this:
When creating an iSilo document from a site with the following structure...
index.htm
|
-products.htm
|
-support.htm
|
-contact.htm
|
-images folder
If there is a common image (such as a logo) used by all four pages and the image file resides in the images folder, does iSilo import a copy of the image 4 times (once for each page captured) or is it intelligent enough to copy the image once and reference it in each of the 4 pages?
More importantly, if iSilo is set for a link depth of 2, and each page in the above structure has a sidebar menu that links to all pages in the site, does iSilo import duplicate copies of each page for each link in all 2nd level sidebars?
Another twist on this scenario is how iSilo handles the parsing of pages with "bread crumbs": the line of links at the top of many sites' pages which shows the path taken to the current page, using a list of links such as:
iSilo forum > Support > iSiloX and iSiloXC
In such a case where there are 10 pages under "iSiloX and iSiloXC" and each has a "Support" link on it, does iSiloX link to the "Support" page and import it 10 times or only once?
I ask these questions because in watching iSiloX take over 12 hours "Retrieving Resources" for my current project (which uses such a design) I am sure I have seen the same URL being retrieved many times.
Does iSiloX retrieve the multiple references and then sort and clean up the repetitive pages? Or will the resulting file grow exponentially because there are dozens, if not hundreds, of identical pages compressed into the final file?
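To make the concern concrete, here is a minimal Python sketch of the two behaviours I am asking about. It does not describe how iSiloX actually works (that is exactly what I do not know); the `SITE` link graph, the `crawl` function and its `dedupe` flag are all made up for illustration, and it assumes every page's sidebar links to all four pages.

```python
from collections import deque

# Hypothetical link graph for the four-page example site: every page's
# sidebar menu links to every page in the site (including itself).
PAGES = ["index.htm", "products.htm", "support.htm", "contact.htm"]
SITE = {page: list(PAGES) for page in PAGES}


def crawl(start, max_depth, dedupe):
    """Breadth-first crawl; returns the list of URLs fetched, in order."""
    fetched = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        fetched.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the link depth
        for link in SITE[url]:
            if dedupe:
                if link in seen:
                    continue  # already queued or fetched once
                seen.add(link)
            queue.append((link, depth + 1))
    return fetched


with_dedupe = crawl("index.htm", 2, dedupe=True)
without_dedupe = crawl("index.htm", 2, dedupe=False)
print(len(with_dedupe), len(without_dedupe))
```

Under these assumptions, a crawler that remembers visited URLs fetches each page once however deep it goes, while one that does not fetches 1 + 4 + 16 = 21 pages at a link depth of 2, and the count keeps multiplying by four with every extra level.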
AWS