Persistent <CR>'s


MSL
04-11-2005, 04:39 PM
I've tried all of the text options that make any sense, and I'm STILL not getting proper formatting from what appears to be a very SIMPLE and straightforward set of HTML pages.

Here's the link:

http://www.ccel.org/d/dostoevsky/karamozov/htm/

Here's the problem:

ALEXEY Fyodorovitch
Karamazov was the third son of
Fyodor

Pavlovitch Karamazov, a
landowner well known in our
district in his

own day, and still remembered
among us owing to his gloomy
and

tragic death, which happened
thirteen years ago, and which
I shall

describe in its proper place.

As I hope the example above shows, sentences are broken by two (2) line breaks or <CR>'s. This continues for the entire length of the book, effectively rendering it unreadable. The width is fine, and no horizontal scrolling is required.

iSilo
04-11-2005, 05:44 PM
Unfortunately, there is no option in iSiloX for reformatting that text. You would potentially need an option that removes all double line breaks, except where a space follows the double line break, in which case the double line break is replaced with a single line break.
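
Since iSiloX has no such option, the rule described above could be applied to the text before conversion. Here is a minimal Python sketch of that rule, not an iSiloX feature. The function name `fix_double_breaks` is made up for illustration, and it assumes that a space or tab after a double line break marks a real paragraph start. One deliberate deviation: a removed double break is replaced by a single space so that the words on either side do not run together.

```python
import re


def fix_double_breaks(text: str) -> str:
    """Collapse stray double line breaks (hypothetical helper, not part of iSiloX)."""
    # Normalize DOS (CR LF) and old Mac (CR) line endings to LF first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    def repl(match: re.Match) -> str:
        following = match.group(1)
        if following:
            # Double break followed by a space or tab: treat it as a real
            # paragraph start and keep a single line break plus the indent.
            return "\n" + following
        # Otherwise the break fell mid-sentence: join with one space
        # (assumption: a plain removal would glue two words together).
        return " "

    return re.sub(r"\n\n([ \t]?)", repl, text)


if __name__ == "__main__":
    sample = (
        "Karamazov was the third son of\nFyodor\n\n"
        "Pavlovitch Karamazov, a\nlandowner\n\n  He was married twice."
    )
    print(fix_double_breaks(sample))
```

Single line breaks inside a paragraph are left alone here; whether those also need joining depends on how the viewer wraps text.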

jeremielariviere
04-12-2005, 10:19 AM
What Device, Resolution, and font size are you using?
Jeremie

hank
05-06-2005, 01:28 AM
Yeah, this has been a problem since line feeds were invented -- mainframe to DOS, DOS to Mac, it's always a nasty mess to clean up.

Doesn't make a difference what platform or font size you're using, unless your text just happens to wrap correctly for the screen you have, and if so it's not portable.

I download the HTML to the Mac and clean the text up with AppleWorks or Rixedit or TextSoap, sometimes all three, then use PorDiBle to move it to the Palm OS format. There are still problems with characters that won't render.

Cleaning up the HTML in TextSoap (fixing all the curly quotes, em and en dashes, and other grubbage, then removing high ASCII) and saving as DOS Latin tends to leave a clean enough text file that PorDiBle won't complain about non-convertible characters in the output file.
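
For anyone without TextSoap, roughly the same character cleanup can be sketched in Python. This is only an approximation of the steps above, not what TextSoap actually does. The function name `to_plain_ascii` and the replacement table are assumptions for illustration, and the output is plain 7-bit ASCII (which any DOS Latin code page can hold) rather than a real DOS Latin export.

```python
import unicodedata

# Assumed replacement table: typographic characters mapped to ASCII stand-ins.
REPLACEMENTS = {
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote / apostrophe
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
}


def to_plain_ascii(text: str) -> str:
    """Replace typographic characters, then drop anything still above ASCII."""
    text = text.translate(str.maketrans(REPLACEMENTS))
    # Split accented letters into base letter + combining mark, so that
    # only the mark is lost when non-ASCII characters are discarded.
    text = unicodedata.normalize("NFKD", text)
    return text.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_plain_ascii("\u201cIt\u2019s fine\u201d \u2014 caf\u00e9"))
```

Anything with no ASCII equivalent is silently dropped, so it's worth spot-checking the result before converting.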

Tee Dee Ous.

A better way would be nice.

The promise of true regular-expression cleanup in the next TextSoap version is encouraging. If they do grep even as well as Microsoft does with their Word wildcards, it'll become much easier to clean up raggedy ASCII text.

Now back to iSilo -- you can download the HTML, clean up the HTML file, then point iSiloX at the file on your hard drive. It's even more tedious (wry grin) than the above, but it does work fine if you want the iSilo output file.