Core code (install the Tesseract package from NuGet first)
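using System;
using System.Drawing;
using System.IO;
using Tesseract;

public string TesseractOCR(Bitmap image)
{
    // "chi_sim" is the Simplified Chinese trained data package; it must sit in the tessdata folder next to the executable.
    string tessdataPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "tessdata");
    // The engine, the Pix and the Page all hold native resources, so dispose them when done.
    using (var engine = new TesseractEngine(tessdataPath, "chi_sim", EngineMode.Default))
    using (var pix = PixConverter.ToPix(image))
    using (var page = engine.Process(pix))
    {
        // Release the program's hold on the image (the caller must not reuse it afterwards).
        image.Dispose();
        // Print the mean recognition confidence as a percentage.
        Console.WriteLine(String.Format("{0:P}", page.GetMeanConfidence()));
        // Strip line breaks ("\n") and spaces from the recognized text, then print it.
        string s = page.GetText().Replace("\n", "").Replace(" ", "");
        Console.WriteLine(s);
        return s;
    }
}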
Source code download: https://pan.baidu.com/s/1gwKFF-4ujgljdbI7VeS0eQ?pwd=9d8y (extraction code: 9d8y)
Trained data packages for other languages can be downloaded from https://github.com/tesseract-ocr/tessdata
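A common stumbling block is that the downloaded trained data file is not where the engine looks for it. The following is a minimal sketch of a pre-flight check, assuming the file is named `<language>.traineddata` (as in the tessdata repository, e.g. `chi_sim.traineddata`) and is placed in a `tessdata` folder under the application's base directory. The class name `TessdataCheck` and the method `RequireTrainedData` are hypothetical helpers introduced for illustration; they are not part of the Tesseract package.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Tesseract NuGet package.
public static class TessdataCheck
{
    // Returns the tessdata folder path if "<language>.traineddata" exists there;
    // otherwise throws, so the problem surfaces before the OCR engine is created.
    public static string RequireTrainedData(string baseDirectory, string language)
    {
        // Assumption: trained data lives in "<baseDirectory>/tessdata".
        string tessdataPath = Path.Combine(baseDirectory, "tessdata");
        string trainedDataFile = Path.Combine(tessdataPath, language + ".traineddata");

        if (!File.Exists(trainedDataFile))
        {
            throw new FileNotFoundException(
                "Trained data for '" + language + "' was not found. " +
                "Download it and copy it into the tessdata folder.",
                trainedDataFile);
        }

        return tessdataPath;
    }
}
```

The returned path could then be passed as the first argument when constructing the engine, for example `TessdataCheck.RequireTrainedData(AppDomain.CurrentDomain.BaseDirectory, "chi_sim")`.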