{"id":1391,"date":"2020-07-28T03:02:18","date_gmt":"2020-07-28T03:02:18","guid":{"rendered":"https:\/\/muthu.co\/?p=1391"},"modified":"2021-05-24T02:31:44","modified_gmt":"2021-05-24T02:31:44","slug":"all-tesseract-ocr-options","status":"publish","type":"post","link":"http:\/\/write.muthu.co\/all-tesseract-ocr-options\/","title":{"rendered":"All Tesseract OCR options"},"content":{"rendered":"\n
I keep this page as a personal reference for Tesseract's command-line options; it may come in handy for others too.<\/p>\n\n\n\n
$ tesseract --help-extra<\/code><\/pre>\n\n\n\nUsage:\n  tesseract --help | --help-extra | --help-psm | --help-oem | --version\n  tesseract --list-langs [--tessdata-dir PATH]\n  tesseract --print-parameters [options...] [configfile...]\n  tesseract imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]\nOCR options:\n  --tessdata-dir PATH   Specify the location of tessdata path.\n  --user-words PATH     Specify the location of user words file.\n  --user-patterns PATH  Specify the location of user patterns file.\n  -l LANG[+LANG]        Specify language(s) used for OCR.\n  -c VAR=VALUE          Set value for config variables.\n                        Multiple -c arguments are allowed.\n  --psm NUM             Specify page segmentation mode.\n  --oem NUM             Specify OCR Engine mode.\nNOTE: These options must occur before any configfile.\nPage segmentation modes:\n  0    Orientation and script detection (OSD) only.\n  1    Automatic page segmentation with OSD.\n  2    Automatic page segmentation, but no OSD, or OCR.\n  3    Fully automatic page segmentation, but no OSD. (Default)\n  4    Assume a single column of text of variable sizes.\n  5    Assume a single uniform block of vertically aligned text.\n  6    Assume a single uniform block of text.\n  7    Treat the image as a single text line.\n  8    Treat the image as a single word.\n  9    Treat the image as a single word in a circle.\n 10    Treat the image as a single character.\n 11    Sparse text. Find as much text as possible in no particular order.\n 12    Sparse text with OSD.\n 13    Raw line. Treat the image as a single text line,\n       bypassing hacks that are Tesseract-specific.\nOCR Engine modes: (see https:\/\/github.com\/tesseract-ocr\/tesseract\/wiki#linux)\n  0    Legacy engine only.\n  1    Neural nets LSTM engine only.\n  2    Legacy + LSTM engines.\n  3    Default, based on what is available.\nSingle options:\n  -h, --help            Show minimal help message.\n  --help-extra          Show extra help for advanced users.\n  --help-psm            Show page segmentation modes.\n  --help-oem            Show OCR Engine modes.\n  -v, --version         Show version information.\n  --list-langs          List available languages for tesseract engine.\n  --print-parameters    Print tesseract parameters.<\/code><\/pre>\n\n\n\n
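To make the numeric ranges in the help text a little more concrete, here is a minimal sketch of a shell helper that checks the --psm and --oem values against those ranges and prints the resulting command line instead of running it. The function name tess_cmd, its argument order, and the file name receipt.png are my own assumptions for illustration; none of them come from Tesseract itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): print a tesseract command line
# after checking --psm and --oem against the ranges listed in the help text.
# Arguments: image, output base, psm (default 3), oem (default 3), lang (default eng).
tess_cmd() {
    image=$1
    outbase=$2
    psm=${3:-3}
    oem=${4:-3}
    lang=${5:-eng}

    # Page segmentation modes run from 0 to 13.
    case $psm in
        [0-9]|1[0-3]) ;;
        *) echo "invalid --psm: $psm (expected 0-13)" >&2; return 1 ;;
    esac

    # OCR engine modes run from 0 to 3.
    case $oem in
        [0-3]) ;;
        *) echo "invalid --oem: $oem (expected 0-3)" >&2; return 1 ;;
    esac

    # Options follow the image and output base, and precede any config file.
    printf 'tesseract %s %s -l %s --psm %s --oem %s\n' \
        "$image" "$outbase" "$lang" "$psm" "$oem"
}

# Dry run: treat the image as a single uniform block of text, LSTM engine only.
tess_cmd receipt.png stdout 6 1 eng
# prints: tesseract receipt.png stdout -l eng --psm 6 --oem 1
```

The helper only prints the command, so it is safe to try without Tesseract installed. The printed string is not quoted for file names containing spaces, so treat it as a preview rather than something to feed to eval.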
\n\n\n\nCLI Examples<\/h2>\n\n\n\n