You are what you translate: OCR: Optical Character Recognition

Hi!

The OCR process converts documents that are images (for example, PDF, JPG or PNG files) into text that we can use in a word processor. That is very useful, because you can then work on these documents, and translators can see how many words a document has. You will know approximately how long it will take you to translate it and, if you have to give a quote or an invoice, you can tell your client how much it will cost. I began to investigate OCR because I had to prepare an invoice for a PDF document.
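Once the text is out of the image, the word count and the quote are simple arithmetic. Here is a minimal sketch in Python of how that could look. The function names, the rate per word and the words-per-day figure are only assumptions for illustration, so use your own numbers.

```python
import math
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain at least one letter or digit."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


def estimate_quote(word_count: int, rate_per_word: float, words_per_day: int):
    """Return (cost, working_days) for a translation job.

    rate_per_word and words_per_day are hypothetical values chosen by the
    translator; they do not come from any OCR program.
    """
    cost = round(word_count * rate_per_word, 2)
    days = math.ceil(word_count / words_per_day)
    return cost, days


if __name__ == "__main__":
    # Paste the text copied from the OCR results here.
    ocr_text = "The OCR process converts images into editable text."
    words = count_words(ocr_text)
    cost, days = estimate_quote(words, rate_per_word=0.08, words_per_day=2500)
    print(f"{words} words, cost {cost}, about {days} working day(s)")
```

A word processor's own word count may differ slightly from this one, because each program decides in its own way what counts as a word.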

Once you have installed an OCR program (I use gscan2pdf), you have to open a new document or, if you only have the document on paper, scan it first. In this case I will show you the example with a PNG document.

Then you have to open the document in the program. Under “tools”, select the option “OCR”. A window will appear, and there you must select the OCR engine and the language of the source document, and then click “start OCR”. All the words of the document will appear in the program, in the window “OCR Results”. At this point you will be able to click on the results, copy them and paste them into a word processor. After that, you will be able to work on the document.

Besides OCR programs, there are websites where you can convert your PDF, PNG or JPG documents into text that you can use in a word processor (and if you are a translator, you will be able to translate the text more easily).