Win2PDF 10 build 172: Additional Language Support for Win2PDF OCR Add-on

Our last post detailed the new Excel conversion enhancements in Win2PDF 10 build 172. Now, let’s look at the remaining changes in this release.

Most significant is the additional language support in the Win2PDF OCR Add-on setup program. Optical Character Recognition (OCR) converts scanned documents, images, previously archived documents, and other types of files into PDF files with searchable text.

Win2PDF OCR Add-on

Users can now dynamically download and install training data for additional OCR languages. The following languages are now supported:

English, Deutsch (German), Français (French), Italiano (Italian), Nederlands (Dutch), Ελληνικά (Greek), Português (Portuguese), Español (Spanish), العربية (Arabic), Հայերեն (Armenian), Български (Bulgarian), বাংলা (Bengali), Català (Catalan), Corsu (Corsican), Hrvatski (Croatian), Čeština (Czech), 简体中文 (Chinese Simplified), Dansk (Danish), Suomi (Finnish), עברית (Hebrew), हिंदी (Hindi), Magyar (Hungarian), Íslenska (Icelandic), 日本語 (Japanese), 한국어 (Korean), Latviešu (Latvian), Lietuvių (Lithuanian), Norsk (Norwegian), Polski (Polish), Română (Romanian), Русский (Russian), Српски (Serbian), Slovenčina (Slovak), Slovenščina (Slovenian), Svenska (Swedish), ภาษาไทย (Thai), Türkçe (Turkish), Українська (Ukrainian), Tiếng Việt (Vietnamese)

For example, if you wanted to use French training data in addition to English (and you’re using an English operating system), you’d select Français (French) in the language list during the setup process.

Once the additional language training data is installed, you’ll be able to take any document or image file (with non-searchable French or English text) and convert it to a searchable PDF using Win2PDF and the “Searchable (OCR PDF)” output option. [Note: You’ll also be able to select this option using Batch Convert (Pro only), the MAKESEARCHABLE command line feature, or the Win2PDF Desktop Export features.]
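For readers who script the MAKESEARCHABLE command line feature, here is a minimal Python sketch of how such a call might be assembled. Several details are assumptions rather than documented facts: the executable path, the argument order (command name, source file, destination file), and the helper names build_makesearchable_command and make_searchable, which are hypothetical. Check the Win2PDF command line documentation for the exact syntax before relying on it.

```python
import subprocess

# Assumption: a typical install location for the Win2PDF command line
# executable. Verify the path on your own system.
WIN2PDF_EXE = r"C:\Windows\System32\spool\drivers\x64\3\win2pdfd.exe"


def build_makesearchable_command(source, destination, exe=WIN2PDF_EXE):
    """Return the argument list for a MAKESEARCHABLE call.

    Assumed argument order: command name, source file, destination file.
    """
    return [exe, "makesearchable", str(source), str(destination)]


def make_searchable(source, destination, exe=WIN2PDF_EXE):
    """Run the command and return its exit code.

    Requires Win2PDF and the OCR Add-on to be installed. The meaning of
    the exit code is not covered by this sketch.
    """
    command = build_makesearchable_command(source, destination, exe)
    return subprocess.run(command, check=False).returncode
```

Building the argument list separately from running it makes it easy to print and inspect the command before anything is executed.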

In our example, since we installed the French training data, we’ll be able to OCR any document that contains French words and characters.

You can install any combination of language training data, but the training data is large and takes additional processing time, so it’s best to install only the languages that you regularly use in documents.

And finally, here are the remaining version 10 build 172 enhancements:

  • Updated the PDF2DOCX and extracttext commands to automatically make the PDF searchable if it isn’t already searchable.
  • Added the ispdfsearchable command line option, which can be used to check the searchability of an existing PDF file.
  • Added the addprinter command line option to add a Win2PDF printer.
  • Updated all command line options to create the output folder if it doesn’t exist.
  • Added Windows Explorer “Open With” file associations for the following file types:
    • .PDF, .BMP, .DIB, .TIF, .TIFF, .JPG, .JPE, .JPEG, .JFIF, .PNG, .GIF, .HTML, .HTM, .MHTML, .XPS, .OXPS
    • For Win2PDF Pro, also added: .DOC, .RTF, .TXT, .DOCX, .ODT, .XLS, .XLSX, .XLSB, .CSV, .ODS
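If your scripts also need to run against earlier builds, which may not create a missing output folder for you, the same effect can be had by creating the folder before calling any command. The Python sketch below is only an illustration of that idea, not Win2PDF’s own implementation, and the helper name ensure_output_folder is hypothetical.

```python
from pathlib import Path


def ensure_output_folder(output_file):
    """Create the parent folder of output_file if it is missing.

    Returns the folder as a Path. The output file itself is not created.
    """
    folder = Path(output_file).resolve().parent
    # parents=True creates intermediate folders; exist_ok=True makes the
    # call safe to repeat when the folder is already there.
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Calling it right before each conversion keeps the script behaving the same way regardless of which build is installed.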

All of these features are available as a free update for users of Win2PDF version 7 and later.

Create PDF Documents With Searchable Text from Google Chrome and Microsoft Edge

Win2PDF now has a feature that allows you to print documents that would normally contain non-searchable text to PDF files with searchable text.

Why this feature? When would you use it? Well, there is one area in particular where this is useful, and that’s printing PDF files from Google’s Chrome web browser, Microsoft’s newest Edge browser, or other Google apps like Docs. Due to the way Google and Microsoft have developed their browsers and apps, printing from these programs creates PDF files that are image-based and not searchable (or selectable) as actual text. When documents or web pages are printed to a paper printer, this isn’t noticeable or an issue. However, it is a problem if you are using Win2PDF or another PDF printer, since the files will be larger, non-searchable, and non-selectable.

We’ve solved this problem by adding a new save format called “Portable Document Format – Searchable (OCR PDF)”. When you use this save option when printing from Chrome, Edge, or Google Docs, the resulting PDF file will contain searchable text. It applies Optical Character Recognition (OCR) to the file and converts the image-based text into searchable text automatically.

Users have frequently reported this problem to our Win2PDF help desk, and prior to this feature we had to explain a multi-step process to get the desired results. Now, it’s just a single save, like it would be from any other application.

This feature is still in our pre-release testing phase, but we want users to try this and give us some feedback. To try this feature, please do the following:

    1. Download and install Win2PDF 10.0.78 (or higher). This version can be downloaded from the Win2PDF 10 Update section of our knowledgebase.
    2. Download and install the Win2PDF Desktop with OCR Download.
    3. After you install the separate Win2PDF Desktop with OCR package, Win2PDF displays an extra save as type labeled “Portable Document Format – Searchable (OCR PDF)”.

While this is useful when you are creating new PDFs from Chrome or Edge, what about existing files that were previously saved as image-only, or that you received by email? Is there a way to “fix” those so that they are searchable?

Yes. Just open the original PDF in the Win2PDF Desktop App and select Export -> PDF – Searchable (OCR) from the File menu.


The “Searchable (OCR PDF)” option is only available in our pre-release software, and we’re working on improvements. Give it a try, and if you have any feedback or issues, let us know by sending an email to [email protected] or opening a ticket at our Helpdesk page.