Homepage: https://gitlab.com/rstocker/scanner.git
Author: Raffael Stocker
Updated:
Scan documents and images
Scan documents and images using scanimage(1) from the SANE distribution and tesseract(1) for OCR and PDF export.

The scanner package uses two sets of customizations, one for image mode and one for document mode. Image mode is usually configured to use a high resolution and an image file format, like JPEG; document mode is usually configured to use a lower resolution and a document format, like PDF or text. The available file formats are provided by scanimage(1) for image mode and tesseract(1) for document mode.

The scanner package uses tesseract(1) to provide optical character recognition (OCR). You can select the language plugins with ‘scanner-tesseract-languages’. See also the remark about the data directories below.

In document mode, you can scan one or multiple pages that are then written in a customizable output format, e.g. (searchable) PDF or text, or whatever else tesseract provides. You can also customize the resolution, the intermediate image format, and the paper size.

The command ‘scanner-scan-document’ starts a document scan. Without a prefix argument, it scans one page. With a non-numeric prefix argument, it asks the user after each scanned page for confirmation to scan another page. With a numeric prefix argument, it scans that many pages. In the numeric case, it observes a delay between scans that is customizable using ‘scanner-scan-delay’.

The ‘scanner-scan-image’ command performs one scan or multiple scans in image mode. This command tries to guess the file format from the chosen file name and otherwise falls back to the configured default; see ‘scanner-image-format’. The prefix argument works as in document mode.

The scanning commands are also available in the Tools->Scanner menu.

For both images and documents, you can customize the scan mode (e.g. "Color" or "Gray") if your scanning device supports it. You can pass additional options to the backends using the customization variables ‘scanner-scanimage-switches’ and ‘scanner-tesseract-switches’. The variable ‘scanner-scanimage-switches’ is helpful for tuning brightness and contrast, for instance.

Finally, the customization options ‘scanner-tessdata-dir’ and ‘scanner-tessdata-configdir’ must be set. The first must point to tesseract's data directory containing the language definitions (usually something like /usr/share/tessdata/), and the second to tesseract's configs directory containing the output configurations (usually something like /usr/share/tessdata/configs/).
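
As an illustration of the options described above, a minimal init-file sketch might look like the following. The variable names are the ones mentioned in this description; the value types shown (a list of language strings, a delay in seconds, a list of switch strings) and the example values are assumptions, so check each variable with C-h v or M-x customize-group before relying on them. The directory paths are the "usual" locations named above and may differ on your system.

```elisp
;; Hypothetical configuration sketch for the scanner package.
;; All values below are assumptions for illustration only.

;; Required: where tesseract keeps its language data and output configs.
(setq scanner-tessdata-dir "/usr/share/tessdata/")
(setq scanner-tessdata-configdir "/usr/share/tessdata/configs/")

;; OCR language plugins (assumed here to be a list of tesseract
;; language codes).
(setq scanner-tesseract-languages '("eng"))

;; Delay between pages when scanning with a numeric prefix argument
;; (assumed here to be a number of seconds).
(setq scanner-scan-delay 5)

;; Extra switches for scanimage(1), assumed to be a list of strings.
;; Whether a brightness option exists depends on your scanner backend,
;; so this line is left commented out.
;; (setq scanner-scanimage-switches '("--brightness" "10"))

;; Invoking the commands interactively:
;;   M-x scanner-scan-document        scan a single page
;;   C-u M-x scanner-scan-document    ask after each page whether to continue
;;   C-u 3 M-x scanner-scan-document  scan three pages, pausing between them
;;   M-x scanner-scan-image           scan in image mode (same prefix rules)
```

Setting the options through M-x customize-group is an equally valid route and has the advantage of showing the expected value type for each option.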