C# - Unable to read the text from an image using tessnet2 and Tesseract-OCR

January 15, 2011

I have written the .NET code below to read the text from an image:

The platform used to write the code: Windows 10, Visual Studio 2015, tesseract-ocr-setup-4.00.00dev, tessnet2.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using tessnet2;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                var image = new Bitmap(@"D:\python\download.jpg");
                var ocr = new Tesseract();
                ocr.Init(@"C:\Program Files (x86)\Tesseract-OCR\tessdata", "eng", false);
                var result = ocr.DoOCR(image, Rectangle.Empty);
                foreach (tessnet2.Word word in result)
                {
                    Console.WriteLine(word.Text);
                    File.AppendAllText(@"D:\python\writefile.txt", word.Text);
                }
                Console.ReadLine();
            }
        }
    }

I have tried both "Any CPU" and x86 as the platform target. I have also tried changing the target framework version in the project properties.

However, I'm getting the error below:

An unhandled exception of type 'System.IO.FileLoadException' occurred in mscorlib.dll. Additional information: Mixed mode assembly is built against version 'v2.0.50727' of the runtime and cannot be loaded in the 4.0 runtime without additional configuration information.

EDIT: To remove this error, I added the following to app.config. It now looks like this:

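    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <!-- Allows the mixed-mode tessnet2 assembly (built for CLR v2.0) to load in the 4.0 runtime -->
      <startup useLegacyV2RuntimeActivationPolicy="true">
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5"/>
      </startup>
    </configuration>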

I installed the NuGet package from here: https://www.nuget.org/packages/nuget.tessnet2/

I'm still not able to read the image. It is an image I downloaded from Google Images, and it has text in it.

Here is the message I'm getting:

When I checked the path C:\Program Files (x86)\Tesseract-OCR\tessdata, this is what it looked like:

What am I doing wrong? How can I fix this?

The issue was resolved by downloading the language packages from here: https://github.com/tesseract-ocr/langdata

These packages were missing previously. The important thing is that tessnet2 only works once the language packages are present. Get the packages for the languages you want from the same link (https://github.com/tesseract-ocr/langdata). This sample uses English.

Download the language package and extract it into the "..\Tesseract-OCR\tessdata" folder.

Note: the default language package does not appear to be installed into tessdata during installation.
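The failure here came down to missing language files. A quick check before calling `Init` can make that problem obvious. The sketch below uses a hypothetical helper, `TessdataCheck.HasLanguageData`, which is not part of tessnet2. It assumes that a language's files in tessdata are named with the language code as a prefix (for example `eng.*`). That naming is an assumption about the folder layout, not something stated above.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of tessnet2): reports whether the tessdata
// folder appears to contain any files for the given language code.
// Assumption: language files are named "<lang>.<suffix>", e.g. "eng.unicharset".
public static class TessdataCheck
{
    public static bool HasLanguageData(string tessdataDir, string lang)
    {
        // A missing folder or an empty language code can never have data.
        if (string.IsNullOrEmpty(lang) || !Directory.Exists(tessdataDir))
        {
            return false;
        }

        // True if at least one file name starts with "<lang>."
        return Directory.EnumerateFiles(tessdataDir, lang + ".*").Any();
    }
}
```

For example, you could call `TessdataCheck.HasLanguageData(@"C:\Program Files (x86)\Tesseract-OCR\tessdata", "eng")` before `ocr.Init(...)` and print a message when it returns false. That would point straight at the missing files. The check only confirms that matching files exist. It does not verify that they are in the format tessnet2 expects.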

Here is the modified version of the code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using tessnet2;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                var image = new Bitmap(@"D:\python\download.jpg");
                tessnet2.Tesseract ocr = new tessnet2.Tesseract();
                // The tessdata folder must contain the "eng" language files
                ocr.Init(@"C:\Program Files (x86)\Tesseract-OCR\tessdata", "eng", false);
                // Rectangle.Empty means OCR the whole image
                List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
                foreach (tessnet2.Word word in result)
                {
                    // Print each recognized word with its confidence value
                    Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
                }
                Console.Read();
            }
        }
    }

Cheers!
