Stripping ISO-8859-2 characters problem


Cezex
01-07-2004, 07:02 AM
Hi !

Could you tell me why iSiloXC strips ISO-8859-2 characters (e.g. ĘÓĄŚŁŻŹĆŃęóąśłżźćń) when I set <charset>none</charset> in the document section, or when I simply delete these tags?

Is there any way to generate a document with ISO-8859-2 characters in it?

Best regards,
Cezex

iSilo
01-07-2004, 09:43 AM
You may need to specify the source character encoding.


<Source>
  ...
  <CharSet>ISO-8859-2</CharSet>
</Source>
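To illustrate why the source encoding matters, here is a minimal Python sketch (Python is used only for illustration; this is not how iSiloX works internally). It stores Polish text as ISO-8859-2 bytes, then decodes those bytes once with the correct encoding and once as Windows-1250 (Python's codec name `cp1250`).

```python
# Minimal sketch: the same bytes decoded under two different source encodings.
polish = "ĘÓĄŚŁŻŹĆŃęóąśłżźćń"

# The bytes an ISO-8859-2 page would actually contain.
raw = polish.encode("iso-8859-2")

right = raw.decode("iso-8859-2")  # correct source encoding specified
wrong = raw.decode("cp1250")      # same bytes misread as Windows-1250

print(right)  # ĘÓĄŚŁŻŹĆŃęóąśłżźćń
print(wrong)  # ĘÓˇ¦ŁŻ¬ĆŃęó±¶łżĽćń
```

When the wrong encoding is assumed, ą, ś and ź (and their capitals) come out as symbols such as ±, ¶ and Ľ, while most other Polish letters survive unchanged.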

Cezex
01-09-2004, 05:59 AM
> You may need to specify the source character encoding.
I did that in the first place, and nothing changed :( The characters are either stripped or replaced with their Windows-1250 counterparts (I don't remember exactly which right now).
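A confusion between ISO-8859-2 and Windows-1250 only garbles some of the Polish letters, because most of them share the same byte value in both encodings. The following Python sketch uses a hypothetical helper, `differing_letters`, to list the letters whose byte codes differ between the two.

```python
# Sketch: which Polish letters have different byte codes in ISO-8859-2
# and Windows-1250 (Python codec name "cp1250").
POLISH_LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

def differing_letters(letters):
    """Return (letter, iso_byte, win_byte) for letters encoded differently."""
    out = []
    for ch in letters:
        iso = ch.encode("iso-8859-2")[0]  # single-byte code in ISO-8859-2
        win = ch.encode("cp1250")[0]      # single-byte code in Windows-1250
        if iso != win:
            out.append((ch, iso, win))
    return out

for ch, iso, win in differing_letters(POLISH_LETTERS):
    print(f"{ch}: ISO-8859-2 0x{iso:02X}, Windows-1250 0x{win:02X}")
```

Only six letters are reported: Ą, Ś, Ź and their lowercase forms. The other twelve share byte values, which is why text read in the wrong one of these two encodings usually looks only partly broken.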

iSilo
01-09-2004, 10:01 AM
Could you post one of the source files you are trying to convert?

Cezex
01-09-2004, 11:04 AM
Here it is:
- ISO http://www.redfish.org.pl/redfish/palmnews/portal.php?0;0;3;0;999
- WIN http://www.redfish.org.pl/redfish/palmnews/portal.php?1;0;3;0;999
- without Polish letters http://www.redfish.org.pl/redfish/palmnews/portal.php?2;0;3;0;999

iSilo
01-09-2004, 03:17 PM
The content at http://www.redfish.org.pl/redfish/palmnews/portal.php?0;0;3;0;999 already declares internally that the text is encoded in ISO-8859-2, so there is no need to specify a default source encoding unless some page there does not declare its own. As long as you set the output document encoding to none, it should work.
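The rule described above (use the page's own declaration, and fall back to a default source encoding only when a page has none) can be sketched in Python. The helpers `declared_charset` and `decode_page` are hypothetical illustrations, not part of iSiloX; they look for a `charset=` parameter in the HTTP Content-Type header first, then in the start of the HTML body (e.g. a `<meta>` tag).

```python
import re

# Hypothetical helpers illustrating "declared encoding first, default second".
# This is a sketch, not iSiloX's actual implementation.
_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-:.]+)', re.IGNORECASE)

def declared_charset(content_type, body):
    """Return the charset declared in the header or an early <meta> tag, else None."""
    header = (content_type or "").encode("ascii", "ignore")
    for source in (header, body[:2048]):
        m = _CHARSET_RE.search(source)
        if m:
            return m.group(1).decode("ascii").lower()
    return None

def decode_page(content_type, body, default_charset):
    """Decode with the page's own declaration; use the default only if none exists."""
    charset = declared_charset(content_type, body) or default_charset
    return body.decode(charset)

page = ('<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">'
        '<p>Zażółć</p>').encode("iso-8859-2")
print(declared_charset("text/html", page))  # iso-8859-2
```

Because the sample page declares ISO-8859-2, the `default_charset` argument is ignored for it; the default only matters for a page without any declaration.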

Testing with an .ixl file containing the following works just fine:

<?xml version="1.0"?>
<iSiloXDocumentList>
  <iSiloXDocument>
    <Source>
      <Sources>
        <Path>http://www.redfish.org.pl/redfish/palmnews/portal.php?0;0;3;0;999</Path>
      </Sources>
    </Source>
    <Destination>
      <Title>ISO-8859-2</Title>
      <Files>
        <Path>iso88592.pdb</Path>
      </Files>
    </Destination>
    <DocumentOptions>
      <CharSet>none</CharSet>
    </DocumentOptions>
  </iSiloXDocument>
</iSiloXDocumentList>

Cezex
01-11-2004, 03:20 AM
OK, now it works. Thanks a lot !