Gentoo Archives: gentoo-user

From:	Michael Mol <mikemol@×××××.com>
To:	gentoo-user@l.g.o
Subject:	Re: [gentoo-user] multi-region OCR
Date:	Wed, 30 Nov 2016 18:28:28
Message-Id:	`148057878.rWyb8N9IZg@serenity`
In Reply to:	Re: [gentoo-user] multi-region OCR by "J. Roeleveld"

On Wednesday, November 30, 2016 05:34:25 PM J. Roeleveld wrote:
> On November 30, 2016 6:03:36 PM GMT+01:00, Michael Mol <mikemol@×××××.com> wrote:
> >On Wednesday, November 30, 2016 10:43:13 AM J. Roeleveld wrote:
> >> On Tuesday, November 29, 2016 11:18:36 PM karl@××××××××.se wrote:
> >> > Michael Mol:
> >> > ...
> >> >
> >> > > xsane would have let me do it during the scan process if I'd thought of
> >> > > it then, but the scans are done, drives aren't there any more. Something
> >> > ...
> >> >
> >> > If xsane solves your need why don't you just print your scans so xsane
> >> > can do its job ?
> >>
> >> There has to be a way to do this without killing an entire forest...
> >
> >And big chunks of ink cartridges. The scans stretched the contrast so I can
> >clearly read the drive labels through the translucent anti-static bags, which
> >means a huge chunk of the image (what's outside the labels) is pure black.
> >
> >Which I could get around by spending fifteen minutes munging things in the Gimp
> >before printing, but at that point, I may as well just transcribe things
> >manually at that point.
> >
> >Looking for something reasonably simple to improve the general workflow. I'd
> >have hoped something would have already been available on Linux; it'd be easy
> >enough to copy the scans to my phone and feed them through Google Goggles for
> >the desired output, but then I'm deliberately filtering company data through an
> >outside entity.
>
> Did you manage to use that link I sent?

I did. tesseract almost worked, even separating the regions cleanly in its
output, but it seems, sadly, that the 300 dpi scans were insufficient for a
good read. The text was visibly corrupted in many places, so things like serial
numbers, model numbers and version numbers (everything you'd care about) would
be highly suspect.

The next tool that looked like it might work, gscan2pdf, wasn't in Portage,
and since the semi-garbled output from tesseract suggested the scans were of
too poor quality, I didn't pursue it further.

--
:wq
