Gentoo Archives: gentoo-user

From:	Landis Blackwell <blackwelllandis@×××××.com>
To:	gentoo-user@l.g.o
Subject:	Re: [gentoo-user] multi-region OCR
Date:	Wed, 30 Nov 2016 19:48:35
Message-Id:	`1a2dfbf4-e061-3162-58c2-1da289430568@gmail.com`
In Reply to:	Re: [gentoo-user] multi-region OCR by Michael Mol

Did you train tesseract, by any chance? And could I get some sample images?

Landis


On 11/30/2016 12:28 PM, Michael Mol wrote:
> On Wednesday, November 30, 2016 05:34:25 PM J. Roeleveld wrote:
>> On November 30, 2016 6:03:36 PM GMT+01:00, Michael Mol <mikemol@×××××.com> wrote:
>>> On Wednesday, November 30, 2016 10:43:13 AM J. Roeleveld wrote:
>>>> On Tuesday, November 29, 2016 11:18:36 PM karl@××××××××.se wrote:
>>>>> Michael Mol:
>>>>> ...
>>>>>
>>>>>> xsane would have let me do it during the scan process if I'd thought
>>>>>> of it then, but the scans are done, drives aren't there any more.
>>>>>> Something
>>>>>
>>>>> ...
>>>>>
>>>>> If xsane solves your need why don't you just print your scans so xsane
>>>>> can do its job?
>>>>
>>>> There has to be a way to do this without killing an entire forest...
>>>
>>> And big chunks of ink cartridges. The scans stretched the contrast so I
>>> can clearly read the drive labels through the translucent anti-static
>>> bags, which means a huge chunk of the image (what's outside the labels)
>>> is pure black.
>>>
>>> Which I could get around by spending fifteen minutes munging things in
>>> the Gimp before printing, but at that point, I may as well just
>>> transcribe things manually.
>>>
>>> Looking for something reasonably simple to improve the general workflow.
>>> I'd have hoped something would have already been available on Linux;
>>> it'd be easy enough to copy the scans to my phone and feed them through
>>> Google Goggles for the desired output, but then I'm deliberately
>>> filtering company data through an outside entity.
>>
>> Did you manage to use that link I sent?
>
> I did. tesseract almost worked, even separating the regions cleanly in its
> output, but it seems, sadly, that the 300dpi scans were insufficient to get a
> good read; lots of clear corruption of the text, so things like serial
> numbers, model numbers, version numbers--everything you'd care about--would be
> highly suspect.
>
> The next tool that looked like it might work, gscan2pdf, wasn't in portage,
> and with the semi-garbled output from tesseract suggesting the scans were too
> poor quality, I didn't pursue further.
