Gentoo Archives: gentoo-user

From: Michael Mol <mikemol@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] multi-region OCR
Date: Wed, 30 Nov 2016 18:28:28
Message-Id: 148057878.rWyb8N9IZg@serenity
In Reply to: Re: [gentoo-user] multi-region OCR by "J. Roeleveld"
On Wednesday, November 30, 2016 05:34:25 PM J. Roeleveld wrote:
> On November 30, 2016 6:03:36 PM GMT+01:00, Michael Mol <mikemol@×××××.com> wrote:
> >On Wednesday, November 30, 2016 10:43:13 AM J. Roeleveld wrote:
> >> On Tuesday, November 29, 2016 11:18:36 PM karl@××××××××.se wrote:
> >> > Michael Mol:
> >> > ...
> >> >
> >> > > xsane would have let me do it during the scan process if I'd thought of it
> >> > > then, but the scans are done, drives aren't there any more. Something
> >> > ...
> >> >
> >> > If xsane solves your need why don't you just print your scans so xsane
> >> > can do its job ?
> >>
> >> There has to be a way to do this without killing an entire forest...
> >
> >And big chunks of ink cartridges. The scans stretched the contrast so I can
> >clearly read the drive labels through the translucent anti-static bags, which
> >means a huge chunk of the image (what's outside the labels) is pure black.
> >
> >Which I could get around by spending fifteen minutes munging things in the Gimp
> >before printing, but at that point, I may as well just transcribe things
> >manually at that point.
> >
> >Looking for something reasonably simple to improve the general workflow. I'd
> >have hoped something would have already been available on Linux; it'd be easy
> >enough to copy the scans to my phone and feed them through Google Goggles for
> >the desired output, but then I'm deliberately filtering company data through an
> >outside entity.
>
> Did you manage to use that link I sent?

I did. tesseract almost worked, and it even separated the regions cleanly in
its output. Sadly, the 300 dpi scans seem to have been insufficient for a good
read: the text was clearly corrupted in many places, so serial numbers, model
numbers, version numbers (everything you'd care about) would be highly
suspect.

The next tool that looked like it might work, gscan2pdf, wasn't in portage,
and since the semi-garbled output from tesseract suggested the scans were too
poor in quality, I didn't pursue it further.
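
For anyone wanting to repeat the tesseract attempt over a whole directory of
scans, here is a minimal sketch. It is not what I ran: the function names
(build_tesseract_cmd, ocr_scans) and the directory arguments are made up for
illustration, and the `-l` and `--psm` flags are spelled as in Tesseract 4 and
later (3.x releases spell the latter `-psm`). It only batches the OCR runs; it
does nothing about the underlying resolution problem.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image, out_base, psm=3, lang="eng"):
    """Return the argv list for one tesseract run.

    tesseract writes its text to "<out_base>.txt". psm 3 is tesseract's
    default automatic page segmentation; other modes may or may not suit
    scattered drive labels better (an assumption to test, not a finding).
    """
    return ["tesseract", str(image), str(out_base),
            "-l", lang, "--psm", str(psm)]


def ocr_scans(scan_dir, out_dir, psm=3):
    """Run tesseract on every PNG in scan_dir; return the text files written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(Path(scan_dir).glob("*.png")):
        out_base = out_dir / image.stem
        # check=True raises CalledProcessError if tesseract exits non-zero
        subprocess.run(build_tesseract_cmd(image, out_base, psm), check=True)
        written.append(Path(str(out_base) + ".txt"))
    return written
```

Calling ocr_scans("scans", "ocr-out") would leave one text file per scan in
ocr-out, which still has to be checked by eye against the labels, since the
output is only as trustworthy as the scans themselves.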

--
:wq
