I spent yesterday trying to guess the restrictions of PocketBook's dictionary converter.exe* in order to get the whole of the Oxford Dictionary 2nd Edition into dic format. The Oxford dictionary has entries of up to 115k characters, so it is not odd that converter.exe crashes, just irritating. Duden (de-de) and Oxford Learners Dictionary 8th Ed. (en-en) work with a little tweaking of the xdxf files.**
I wish I had a clue about that format so I could skip converter.exe altogether: the Perl script already runs to 250 lines!
Does anyone have, or know a link to, the source code of converter.exe? Does anyone know PocketBook's dic format, so I can generate it straight from xdxf or csv?
The known restrictions of converter.exe are:
- A line must not exceed 4096 bytes. The converter cuts the line at that length and then complains that the XML is missing closing tags.
- If '&' or '<' is found in the XML content outside of tags, it quits and complains about malformed XML.
- If a dictionary entry (a block enclosed by <ar> and </ar> tags) exceeds roughly 57k characters, it crashes without any message.***
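To see in advance which parts of an xdxf file will trip these limits, something like the following Python sketch could be used. It is only a rough check under my own assumptions: the names find_problems, MAX_LINE_BYTES and MAX_ENTRY_CHARS are made up for illustration, the 57,000 figure is just the "roughly 57k" observed above, and the file is assumed to be UTF-8.

```python
import re

MAX_LINE_BYTES = 4096      # line limit observed with converter.exe
MAX_ENTRY_CHARS = 57_000   # assumption: stands in for the "roughly 57k" entry limit

# One dictionary entry: everything from <ar ...> up to the matching </ar>.
AR_BLOCK = re.compile(r"<ar\b[^>]*>.*?</ar>", re.DOTALL)


def find_problems(xdxf_text):
    """Return (kind, position, size) tuples for lines and entries over the limits.

    For "long line" the position is the 1-based line number and the size is in bytes;
    for "long entry" the position is the 1-based entry index and the size is in characters.
    """
    problems = []
    for number, line in enumerate(xdxf_text.splitlines(), start=1):
        size = len(line.encode("utf-8"))
        if size > MAX_LINE_BYTES:
            problems.append(("long line", number, size))
    for index, match in enumerate(AR_BLOCK.finditer(xdxf_text), start=1):
        size = len(match.group(0))
        if size > MAX_ENTRY_CHARS:
            problems.append(("long entry", index, size))
    return problems
```

Reading the file with open(path, encoding="utf-8").read() and passing the text to find_problems gives a list that can be checked before converter.exe is run at all.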
Possible resolutions are:
- Break the long lines of a dictionary entry at the tags, or run the file through something like a prettifier or auto-indenter.
- '&' and '<' should be replaced with '&amp;' and '&lt;'.
- I went to bed before I could resolve this. Hopefully, I can resolve it by splitting one entry into multiple entries with identical or slightly different lemmas.
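The first two resolutions could be sketched in Python roughly as follows. This is not the Perl script mentioned above, only an illustration with made-up names (escape_stray, wrap_line_at_tags). The '<' test is a heuristic: a '<' directly followed by a letter, '/', '!' or '?' is taken to be the start of a tag and left alone. It is also an assumption that converter.exe tolerates the extra line breaks between tags.

```python
import re

MAX_LINE_BYTES = 4096  # line limit observed with converter.exe

# '&' that does not start an entity such as &amp; &#38; or &#x26;
STRAY_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")
# '<' that does not look like the start of a tag, comment or declaration (heuristic)
STRAY_LT = re.compile(r"<(?![/A-Za-z!?])")
# A whole tag, a run of text, or a lone '<' as fallback so that nothing is dropped
TOKEN = re.compile(r"<[^>]*>|[^<]+|<")


def escape_stray(text):
    """Replace stray '&' and '<' in the content with '&amp;' and '&lt;'."""
    text = STRAY_AMP.sub("&amp;", text)
    return STRAY_LT.sub("&lt;", text)


def wrap_line_at_tags(line, limit=MAX_LINE_BYTES):
    """Split one line into pieces of at most `limit` bytes, never breaking inside a tag.

    A single tag or text run that is longer than `limit` on its own is kept whole,
    so such a piece can still be too long.
    """
    if len(line.encode("utf-8")) <= limit:
        return [line]
    pieces, current, current_size = [], "", 0
    for token in TOKEN.findall(line):
        size = len(token.encode("utf-8"))
        if current and current_size + size > limit:
            pieces.append(current)
            current, current_size = "", 0
        current += token
        current_size += size
    if current:
        pieces.append(current)
    return pieces
```

Escaping should run before wrapping, so that the wrapper does not mistake a stray '<' for the start of a tag. The pieces returned by wrap_line_at_tags can then be written out one per line.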
If someone has tinkered with this before and has pointers for me, I would be much obliged.
____________________________________
* I used DictionaryConverter-neu 171109. Search this forum or look here for more info.
** For the conversion of dictionaries to xdxf-format I used linguae. Search this forum or look here for more info.
*** This is different from @Rkomar's post, which states that he converted a dictionary with 33283 lines. The limit seems to apply to a single dictionary entry, not to the dictionary as a whole.