C# - Unable to read the text from an image using tessnet2 and Tesseract-OCR
I have written the .NET code below to read text from an image.
The platform used to write the code: Windows 10, Visual Studio 2015, tesseract-ocr-setup-4.00.00dev, tessnet2.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using tessnet2;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Load the image and initialise the OCR engine with the English data.
                var image = new Bitmap(@"d:\python\download.jpg");
                var ocr = new Tesseract();
                ocr.Init(@"c:\program files (x86)\tesseract-ocr\tessdata", "eng", false);

                // Recognise the whole image and print/save each word found.
                var result = ocr.DoOCR(image, Rectangle.Empty);
                foreach (tessnet2.Word word in result)
                {
                    Console.WriteLine(word.Text);
                    File.AppendAllText(@"d:\python\writefile.txt", word.Text);
                }
                Console.ReadLine();
            }
        }
    }

I have tried both the "Any CPU" and x86 platform targets. I have also tried changing the target framework version in the project properties.
However, I'm getting the error below:

    An unhandled exception of type 'System.IO.FileLoadException' occurred in mscorlib.dll
    Additional information: Mixed mode assembly is built against version 'v2.0.50727' of the runtime and cannot be loaded in the 4.0 runtime without additional configuration information.

Edit: I have added the following to app.config to remove the error. It looks like this:
    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <startup useLegacyV2RuntimeActivationPolicy="true">
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5"/>
      </startup>
    </configuration>

I installed the NuGet package by referring to this: https://www.nuget.org/packages/nuget.tessnet2/
I'm still not able to read the image. The image I downloaded is one from Google Images that has text in it.
Here is the message I'm getting:
And when I checked the path c:\program files (x86)\tesseract-ocr\tessdata,
this is how it looks:
What am I doing wrong? How do I fix this?
The issue was resolved by downloading the language packages from here: https://github.com/tesseract-ocr/langdata
These were missing previously. The important thing is that tessnet2 needs language packages to work; get the languages you want from here (https://github.com/tesseract-ocr/langdata). For this sample, I use the English language.
Download the language and extract it to the "..\tesseract-ocr\tessdata" folder.
Note: It looks like the default language package does not come in tessdata during installation.
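As a quick sanity check before calling Init, something like the sketch below could confirm that the tessdata folder actually contains files for the requested language. The helper name FindLanguageFiles is hypothetical, and the sketch assumes the language files are named with the language code as a prefix (for example eng.*). It only checks that such files exist, not that they are in the format tessnet2 expects.

```csharp
using System;
using System.IO;

static class TessdataCheck
{
    // Hypothetical helper: returns the files in tessdataPath whose names start
    // with "<lang>." (assumed naming convention), or an empty array if the
    // folder does not exist.
    public static string[] FindLanguageFiles(string tessdataPath, string lang)
    {
        if (!Directory.Exists(tessdataPath))
            return new string[0];
        return Directory.GetFiles(tessdataPath, lang + ".*");
    }

    static void Main()
    {
        // Same folder and language as passed to ocr.Init in the question.
        string tessdataPath = @"c:\program files (x86)\tesseract-ocr\tessdata";
        string[] files = FindLanguageFiles(tessdataPath, "eng");

        if (files.Length == 0)
        {
            Console.WriteLine("No 'eng' language files found in " + tessdataPath);
        }
        else
        {
            foreach (string file in files)
                Console.WriteLine(Path.GetFileName(file));
        }
    }
}
```

If the check prints no files, the folder is probably empty or missing the English package, which matches the symptom described above.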
Here is the modified version of the code:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using tessnet2;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                var image = new Bitmap(@"d:\python\download.jpg");
                tessnet2.Tesseract ocr = new tessnet2.Tesseract();
                ocr.Init(@"c:\program files (x86)\tesseract-ocr\tessdata", "eng", false);

                // Recognise the whole image and print each word with its confidence value.
                List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
                foreach (tessnet2.Word word in result)
                {
                    Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
                }

                Console.Read();
            }
        }
    }

Cheers!!!

