Gentoo Archives: gentoo-user

From:	Francisco Ares <frares@×××××.com>
To:	gentoo-user <gentoo-user@l.g.o>
Subject:	Re: [gentoo-user] multi-region OCR
Date:	Wed, 30 Nov 2016 18:42:19
Message-Id:	`CAHH9eM58yai8=VLPmaO2LOij_H5BxkohVbDRz3ndo-OLitr0AQ@mail.gmail.com`
In Reply to:	Re: [gentoo-user] multi-region OCR by Michael Mol

2016-11-30 16:28 GMT-02:00 Michael Mol <mikemol@×××××.com>:

> On Wednesday, November 30, 2016 05:34:25 PM J. Roeleveld wrote:
> > On November 30, 2016 6:03:36 PM GMT+01:00, Michael Mol <mikemol@×××××.com> wrote:
> > >On Wednesday, November 30, 2016 10:43:13 AM J. Roeleveld wrote:
> > >> On Tuesday, November 29, 2016 11:18:36 PM karl@××××××××.se wrote:
> > >> > Michael Mol:
> > >> > ...
> > >> >
> > >> > > xsane would have let me do it during the scan process if I'd thought of
> > >> > > it then, but the scans are done, drives aren't there any more. Something
> > >> > ...
> > >> >
> > >> > If xsane solves your need why don't you just print your scans so xsane
> > >> > can do its job ?
> > >>
> > >> There has to be a way to do this without killing an entire forest...
> > >
> > >And big chunks of ink cartridges. The scans stretched the contrast so I can
> > >clearly read the drive labels through the translucent anti-static bags, which
> > >means a huge chunk of the image (what's outside the labels) is pure black.
> > >
> > >Which I could get around by spending fifteen minutes munging things in the Gimp
> > >before printing, but at that point, I may as well just transcribe things
> > >manually at that point.
> > >
> > >Looking for something reasonably simple to improve the general workflow. I'd
> > >have hoped something would have already been available on Linux; it'd be easy
> > >enough to copy the scans to my phone and feed them through Google Goggles for
> > >the desired output, but then I'm deliberately filtering company data through an
> > >outside entity.
> >
> > Did you manage to use that link I sent?
>
> I did. tesseract almost worked, even separating the regions cleanly in its
> output, but it seems, sadly, that the 300dpi scans were insufficient to get a
> good read; lots of clear corruption of the text, so things like serial
> numbers, model numbers, version numbers--everything you'd care about--would be
> highly suspect.
>
> The next tool that looked like it might work, gscan2pdf, wasn't in portage,
> and with the semi-garbled output from tesseract suggesting the scans were too
> poor quality, I didn't pursue further.
>
> --
> :wq

64	> The next tool that looked like it might work, gscan2pdf, wasn't in portage,
65	> and with the semi-garbled output from tesseract suggesting the scans were
66	> too
67	> poor quality, I didn't pursue further.
68	>
69	> --
70	> :wq
71
72
73	Well, I've had similar issue. I had gimp to resize the image to its double
74	(width and height, of course), filtered it a bit (edge enhancement) and
75	split the image in several ones for the regions of interest.
76
77	Of course, there might be an easier way ;-)
78
79	Francisco
