Scanning an image and converting it to text is relatively straightforward in Linux, provided you have the right software installed. I plumped for Tesseract, as it was reputedly the best command-line OCR program, but I also wanted a graphical user interface, so I used gImageReader as a front-end to Tesseract.

Here's how to install both of them.

Firstly, install tesseract (and the associated language files if needed):

sudo apt-get install tesseract-ocr

Install a language file (e.g. -eng, -deu, -fra, -ita, -nld, -por, -spa, …):

sudo apt-get install tesseract-ocr-eng
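To check which language files Tesseract can actually see after installing them, something along these lines should work. This is only a sketch: list_ocr_langs is a made-up helper name, and it assumes a Tesseract version that supports the --list-langs option.

```shell
# list_ocr_langs is a hypothetical helper name, not part of Tesseract.
# It prints the language packs Tesseract can find, or a hint if
# Tesseract is not on the PATH yet.
list_ocr_langs() {
    if command -v tesseract >/dev/null 2>&1; then
        tesseract --list-langs
    else
        echo "tesseract is not installed yet"
    fi
}

list_ocr_langs
```

Each installed language file should show up by its three-letter code (for example eng).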

Next, install gImageReader as a front-end to Tesseract.

Add the application repository:

sudo add-apt-repository ppa:sandromani/gimagereader

Update the repository sources:

sudo apt-get update

Install the application:

sudo apt-get install gimagereader

Now you should be ready to go. gImageReader can be launched from your Graphics menu. Happy Character Recognising!
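If you would rather skip the GUI for a quick job, Tesseract can also be run straight from the terminal. The snippet below is a minimal sketch rather than a polished script: ocr_image is a hypothetical wrapper, scan.png is a placeholder file name, and it assumes the matching language file (eng by default) is installed.

```shell
# ocr_image is a hypothetical wrapper around the tesseract command.
# Usage: ocr_image IMAGE [LANG]   (LANG defaults to eng)
ocr_image() {
    image="$1"
    lang="${2:-eng}"

    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract is not installed" >&2
        return 1
    fi
    if [ ! -f "$image" ]; then
        echo "no such image: $image" >&2
        return 1
    fi

    # Tesseract takes an output base name and adds .txt itself,
    # so scan.png ends up as scan.txt in the same directory.
    tesseract "$image" "${image%.*}" -l "$lang"
}

# Example call (scan.png is a placeholder):
# ocr_image scan.png eng
```

The recognised text lands in a .txt file next to the image, which you can then open in any text editor.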

