mr_Ray
05-23-2005, 03:12 AM
Can anyone help with this?
I've made myself a local dump of a significant fraction of Wikipedia's content (500,000+ HTML files, ~3GB) and am having a problem making an iSilo document from it.
What happens is that after running for an hour or two it just stops, showing '0K - error'. No file is produced and no error message is displayed. I tried turning on logging to see if I could track down the problem, but the log just shows a long stream of missing links (not all files are present) and nothing else. From the log, it appears to be going through the files in alphabetical order, and it seems to be failing at around the G* mark.
Setup: I'm not asking it to do any extra conversion/formatting tasks, and it is working from an alphabetical index file which in turn links to other indexes, one for each letter. It's set to follow links 2 deep, so it should be picking everything up. I originally tried with iSiloX v4.2, then tried upgrading to 4.26, with no improvement.
I don't think it's having a problem with one specific file, since I've tried it a few times and it stops in different places. What is worth mentioning is that at the point it fails, Task Manager shows it using 903MB RAM, which stays in use even after the conversion fails, until I restart iSiloX. I'm running on a pretty high-spec PC with 2GB RAM and tens of GB of free disc space on both the temp and source/destination drives, so I doubt that it's a lack of resources.
So... any idea where the problem could be? Yes, I'm aware that converting the majority of the content of the world's largest encyclopedia is asking a lot of it, but I remain hopeful.
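In case it helps anyone wanting to rule the missing links in or out as a factor, here is a rough sketch of how the missing link targets in one index file could be listed outside of iSiloX. It is only an illustration, not anything iSiloX does itself: the names `LinkCollector` and `find_missing_links` are made up for this example, and it assumes the index pages use plain relative `href` links to files in the same directory tree.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in one HTML file."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_missing_links(html_path):
    """Return the relative hrefs in html_path whose target file is absent on disk."""
    collector = LinkCollector()
    with open(html_path, encoding="utf-8", errors="replace") as f:
        collector.feed(f.read())

    base_dir = os.path.dirname(os.path.abspath(html_path))
    missing = []
    for href in collector.hrefs:
        target, _fragment = urldefrag(href)  # drop any "#section" part
        parsed = urlparse(target)
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue  # skip external links and pure "#anchor" links
        # Assumption: links are relative to the directory of the index file.
        local_path = os.path.join(base_dir, unquote(parsed.path))
        if not os.path.exists(local_path):
            missing.append(href)
    return missing
```

Running `find_missing_links` on each per-letter index would give a count of dead links per letter, which could then be compared against the point where the conversion stops.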