<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>uncategorized Archives | Eric Thern :: thern dot org ::</title>
	<atom:link href="http://thern.org/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://thern.org</link>
	<description>Recently I had to convert a large stack of paperwork into text. Instead of typing in all the content by hand, I decided to test some of the OCR software available for Linux.
There are a few packages out there, but the most popular by far is gocr (http://jocr.sourceforge.net/), and perhaps tesseract (http://code.google.com/p/tesseract-ocr/).
Scanning the documents
I used xsane to scan the documents, since it seemed like the easiest option. It has an option to save each file automatically upon scanning and an option to increment the file number, so you can start with page 1 and it will save and number each page automatically. I used tiff</description>
	<lastBuildDate>Thu, 04 Aug 2011 18:28:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>uncategorized Archives | Eric Thern :: thern dot org ::</title>
		<link>http://thern.org/uncategorized/ocr-character-recognition-translating-images-to-text/</link>
		<comments>http://thern.org/uncategorized/ocr-character-recognition-translating-images-to-text/#comments</comments>
		<pubDate>Sun, 07 Oct 2007 22:18:33 +0000</pubDate>
		<dc:creator>Eric Thern</dc:creator>
		<category><![CDATA[uncategorized]]></category>
		<category><![CDATA[djpeg]]></category>
		<category><![CDATA[gocr]]></category>
		<category><![CDATA[image text recognition]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[optical character recognition]]></category>
		<category><![CDATA[tesseract]]></category>

		<guid isPermaLink="false">http://thern.org/eric/?p=56</guid>
		<description><![CDATA[<p>Recently I had to convert a large stack of paperwork into text. Instead of typing in all the content by hand, I decided to test some of the OCR software available for Linux.</p> <p>There are a few packages out there, but the most popular by far is gocr (<a title="http://jocr.sourceforge.net/" href="http://jocr.sourceforge.net/">http://jocr.sourceforge.net/</a>) and perhaps tesseract [...]]]></description>
		<wfw:commentRss>http://thern.org/uncategorized/ocr-character-recognition-translating-images-to-text/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

