C# - Unable to read the text from an image using tessnet2 and Tesseract-OCR


I have written the .NET code below to read text from an image.

Platform used to write the code: Windows 10, Visual Studio 2015, tesseract-ocr-setup-4.00.00dev, tessnet2.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using tessnet2;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                var image = new Bitmap(@"d:\python\download.jpg");
                var ocr = new Tesseract();
                ocr.Init(@"c:\program files (x86)\tesseract-ocr\tessdata", "eng", false);
                var result = ocr.DoOCR(image, Rectangle.Empty);
                foreach (tessnet2.Word word in result)
                {
                    Console.WriteLine(word.Text);
                    File.AppendAllText(@"d:\python\writefile.txt", word.Text);
                }
                Console.ReadLine();
            }
        }
    }

I have tried both the "Any CPU" and x86 platform targets, and I have also tried changing the target framework version in the project properties.

However, I'm getting the error below:

    An unhandled exception of type 'System.IO.FileLoadException' occurred in mscorlib.dll
    Additional information: Mixed mode assembly is built against version 'v2.0.50727' of the runtime and cannot be loaded in the 4.0 runtime without additional configuration information.

Edit: I added the following to app.config to remove the error. It looks like this:

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <startup useLegacyV2RuntimeActivationPolicy="true">
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5"/>
      </startup>
    </configuration>
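For anyone hitting the same exception, a small sketch of how one might confirm which runtime an assembly was built against. It assumes the .NET Framework (reflection-only loading is not supported on .NET Core), and the class and method names are made up for illustration. Pass the path of the assembly you want to inspect as the first argument; with no argument it inspects itself.

```csharp
using System;
using System.Reflection;

static class RuntimeVersionCheck
{
    // Hypothetical helper: returns the CLR version string stored in the
    // assembly's metadata, e.g. "v2.0.50727" or "v4.0.30319".
    public static string BuiltAgainst(string assemblyPath)
    {
        // Reflection-only loading reads metadata without executing any code.
        Assembly asm = Assembly.ReflectionOnlyLoadFrom(assemblyPath);
        return asm.ImageRuntimeVersion;
    }

    static void Main(string[] args)
    {
        // With no argument, fall back to inspecting this program itself.
        string path = args.Length > 0
            ? args[0]
            : Assembly.GetExecutingAssembly().Location;

        Console.WriteLine("{0} was built against {1}", path, BuiltAgainst(path));
        Console.WriteLine("Current runtime: {0}", Environment.Version);
    }
}
```

If the first line reports a v2.0 runtime while the process runs on 4.0, that matches the situation the exception message describes.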

I installed tessnet2 through NuGet, referring to this: https://www.nuget.org/packages/nuget.tessnet2/

I'm still not able to read the image. The image is one I downloaded from Google Images, and it has text in it.

Here is the message I'm getting:

[screenshot of the message; image not available]

And when I checked the path c:\program files (x86)\tesseract-ocr\tessdata, this is how it looks:

[screenshot of the tessdata folder; image not available]

What am I doing wrong? How do I fix this?

The issue is resolved. The fix was to download the language packages, which were missing previously, from here: https://github.com/tesseract-ocr/langdata

The important thing is that tessnet2 needs language packages to work. Get the languages you want from that repository (https://github.com/tesseract-ocr/langdata). For this sample, I use the English language.

Download the language and extract it to the "..\tesseract-ocr\tessdata" folder.

Note: It looks like the default language package does not come in tessdata during installation.

Here is the modified version of the code:
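Since the root cause was an empty tessdata folder, it may help to check for language files before calling Init. The following is only a sketch: the helper name is made up, and it assumes that the files for a language start with the language code followed by a dot (for example "eng."), which you should verify against what you actually extracted.

```csharp
using System;
using System.IO;
using System.Linq;

static class TessdataCheck
{
    // Hypothetical helper: returns the names of files in tessdataDir that
    // start with "<lang>.". Returns an empty array if the folder is missing.
    public static string[] FindLanguageFiles(string tessdataDir, string lang)
    {
        if (!Directory.Exists(tessdataDir))
            return new string[0];

        return Directory.GetFiles(tessdataDir, lang + ".*")
                        .Select(f => Path.GetFileName(f))
                        .ToArray();
    }

    static void Main(string[] args)
    {
        // Path taken from the question; adjust it to your installation.
        string tessdata = @"c:\program files (x86)\tesseract-ocr\tessdata";
        string[] files = FindLanguageFiles(tessdata, "eng");

        if (files.Length == 0)
            Console.WriteLine("No 'eng' language files found in " + tessdata);
        else
            Console.WriteLine("Found: " + string.Join(", ", files));
    }
}
```

Running a check like this before `ocr.Init(...)` turns a silent "no text recognized" result into an explicit message about the missing language data.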

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using tessnet2;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                var image = new Bitmap(@"d:\python\download.jpg");
                tessnet2.Tesseract ocr = new tessnet2.Tesseract();
                ocr.Init(@"c:\program files (x86)\tesseract-ocr\tessdata", "eng", false);
                List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
                foreach (tessnet2.Word word in result)
                {
                    Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
                }
                Console.Read();
            }
        }
    }

Cheers!!!

