C# - Unable to read the text from an image using tessnet2 and Tesseract-OCR
I have written the .NET code below to read text from an image.
Platform used to write the code: Windows 10, Visual Studio 2015, tesseract-ocr-setup-4.00.00dev, tessnet2.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using tessnet2;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Load the image to recognize.
                var image = new Bitmap(@"d:\python\download.jpg");

                // Initialize the engine with the tessdata folder and the language.
                var ocr = new Tesseract();
                ocr.Init(@"c:\program files (x86)\tesseract-ocr\tessdata", "eng", false);

                // Run OCR on the whole image.
                var result = ocr.DoOCR(image, Rectangle.Empty);
                foreach (tessnet2.Word word in result)
                {
                    Console.WriteLine(word.Text);
                    File.AppendAllText(@"d:\python\writefile.txt", word.Text);
                }
                Console.ReadLine();
            }
        }
    }
I have tried both the "Any CPU" and x86 platform targets, and I have tried changing the target framework version in the project properties.
However, I'm getting the error below:
An unhandled exception of type 'System.IO.FileLoadException' occurred in mscorlib.dll. Additional information: Mixed mode assembly is built against version 'v2.0.50727' of the runtime and cannot be loaded in the 4.0 runtime without additional configuration information.
Edit: I added the following to app.config to remove the error. It looks like this:
    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <startup useLegacyV2RuntimeActivationPolicy="true">
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5"/>
      </startup>
    </configuration>
I installed the NuGet package by referring to this: https://www.nuget.org/packages/nuget.tessnet2/
I'm still not able to read the image. The image is one I downloaded from Google Images, and it has text in it.
Here is the message I'm getting:
And when I checked the path c:\program files (x86)\tesseract-ocr\tessdata,
this is how it looks:
What am I doing wrong? How do I fix this?
The issue was resolved by downloading the language packages, which were missing previously, from here: https://github.com/tesseract-ocr/langdata
The important thing is that tessnet2 needs the language packages to work. Get the languages you want from the link above (https://github.com/tesseract-ocr/langdata). For this sample, I use the English language.
Download the language and extract it to the "..\tesseract-ocr\tessdata" folder.
Note: It looks like the default language package does not come in tessdata during installation.
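Since the failure came from an empty tessdata folder, a quick check before calling Init can make the problem obvious. The following is only a minimal sketch: the class name TessdataCheck and the helper FindLanguageFiles are hypothetical and not part of tessnet2, and it assumes the language files are named with the language code as a prefix (for example "eng."). The path is the one from the question and may differ on your machine.

```csharp
using System;
using System.IO;
using System.Linq;

static class TessdataCheck
{
    // Hypothetical helper (not part of tessnet2): returns the names of the
    // files in tessdataDir that start with "<lang>.".
    // An empty result suggests the language data was never installed.
    public static string[] FindLanguageFiles(string tessdataDir, string lang)
    {
        if (!Directory.Exists(tessdataDir))
            return new string[0];

        return Directory.GetFiles(tessdataDir, lang + ".*")
                        .Select(f => Path.GetFileName(f))
                        .ToArray();
    }

    static void Main()
    {
        // Path taken from the question; adjust it for your installation.
        string tessdata = @"c:\program files (x86)\tesseract-ocr\tessdata";
        string[] files = FindLanguageFiles(tessdata, "eng");

        if (files.Length == 0)
            Console.WriteLine("No 'eng' language files found in " + tessdata);
        else
            Console.WriteLine("Found: " + string.Join(", ", files));
    }
}
```

If the check reports no files, the language data still has to be extracted into the tessdata folder before OCR can work.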
Here is the modified version of the code:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using tessnet2;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Load the image to recognize.
                var image = new Bitmap(@"d:\python\download.jpg");

                // Initialize the engine with the tessdata folder and the language.
                tessnet2.Tesseract ocr = new tessnet2.Tesseract();
                ocr.Init(@"c:\program files (x86)\tesseract-ocr\tessdata", "eng", false);

                // Run OCR on the whole image and print each word with its confidence.
                List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
                foreach (tessnet2.Word word in result)
                {
                    Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
                }
                Console.Read();
            }
        }
    }
cheers!!!