

Any insights based on personal experience? I'm currently working on thick editable PDF files with large tables, embedded images, 15-20 fonts (including non-alphabetical ones), the works. The underlying InDesign files are nowhere to be found. Trados (2009) got me a bilingual file, but it was fairly useless (tags, segmenting...). Omnipage Pro required too much work manually drawing countless area types (text/images/tables/etc.) to get anywhere near usable in a CAT tool. Since the deadline for these PDFs is very generous, I have ample time to try and find a more elegant approach than brute force, i.e. overwriting the frigging PDFs and sod the automated repeats and similarities.

So I've been toying with Infix for the past day or two, with the aim of using MemoQ to translate. Thanks people (José, Tomas...) for mentioning it in these forums. In these specific files, I find there is still a bit of work upstream to properly reformat "stories" in tables: two or more adjacent cells in a row/column are often in the same story, font replacements can seriously disrupt the table layout, and line breaks go missing.

But then I am clueless about DTP, so I am learning on the job the hard way.

My feeling is that it looks more promising than a raw Omnipage OCR, but raw Infix output is not usable either without faffing about with the source PDF files. So the approach does have benefits, but it's still not as "clean" a workflow as I hoped with these particular files.

Fonts (...) are a problem for me, and I wish only Arial, Courier New and my own handwriting existed. However, I think that with "standard" editable PDFs, the Infix route may be less time-consuming than the OCR/Word process.

The question was about the nature of PDF content. [...] by using the PDF/A option), you can view & OCR the file correctly, provided that your OCR "knows" that language and its letters.

Font embedding is often an issue, especially if you are translating from a language that seldom uses diacritics (English) into one that uses them. Of course you will run into problems if you replace fonts later on. One reason PDF files are smaller is that fonts are only partially embedded. The issue was already important in the days of the 256-glyph (character) TrueType fonts. Even at that time, if set up to do so, Acrobat would embed only the characters actually used, to save space. Now, with OpenType fonts holding up to thousands of glyphs, this is extremely important. Why waste file space storing Cyrillic, Greek, Hebrew, Arabic, Symbol and other chars you don't use, for each and every font your PDF contains? If your PDF contains, say, "republican" (EN) and nothing else in a specific font, it will embed only those 10 chars from this font into your PDF. Translating it into PT or ES means adding an "O" at the end. Translating it to IT implies doing that PLUS doubling the "B".
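To make the partial-font point with "republican" concrete, here is a rough Python sketch of which characters a translation would need that a subset font built from the source word does not contain. It is only an illustration: the function names are made up for this post, it treats one character as one glyph (real fonts can differ, e.g. ligatures), and the French word is my own added example of a diacritic, not something from the files discussed here.

```python
def glyphs_needed(text: str) -> set[str]:
    """Distinct non-space characters a subset font must hold to render `text`.

    Assumption: one character == one glyph, which is a simplification.
    """
    return {ch for ch in text if not ch.isspace()}


def missing_glyphs(source: str, translation: str) -> set[str]:
    """Characters the translation needs that the source-only subset lacks."""
    return glyphs_needed(translation) - glyphs_needed(source)


source = "republican"
print(len(glyphs_needed(source)), "distinct chars embedded for", source)

# Hypothetical target words, used only to illustrate the idea.
targets = [
    ("PT/ES", "republicano"),
    ("IT", "repubblicano"),
    ("FR", "r\u00e9publicain"),  # contains e-acute, a diacritic absent in EN
]
for lang, word in targets:
    print(lang, word, "-> missing:", sorted(missing_glyphs(source, word)))
```

Doubling the "B" costs nothing extra because that glyph is already in the subset; it is the added "O" (or an accented letter) that the embedded font cannot supply, which is where the trouble starts when you edit the PDF text directly.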
