Did you train tesseract, perchance? And could I get some sample images?

Landis

On 11/30/2016 12:28 PM, Michael Mol wrote:
> On Wednesday, November 30, 2016 05:34:25 PM J. Roeleveld wrote:
>> On November 30, 2016 6:03:36 PM GMT+01:00, Michael Mol <mikemol@×××××.com> wrote:
>>> On Wednesday, November 30, 2016 10:43:13 AM J. Roeleveld wrote:
>>>> On Tuesday, November 29, 2016 11:18:36 PM karl@××××××××.se wrote:
>>>>> Michael Mol:
>>>>> ...
>>>>>
>>>>>> xsane would have let me do it during the scan process if I'd thought of it
>>>>>> then, but the scans are done, drives aren't there any more. Something
>>>>> ...
>>>>>
>>>>> If xsane solves your need why don't you just print your scans so xsane
>>>>> can do its job?
>>>> There has to be a way to do this without killing an entire forest...
>>> And big chunks of ink cartridges. The scans stretched the contrast so I can
>>> clearly read the drive labels through the translucent anti-static bags, which
>>> means a huge chunk of the image (what's outside the labels) is pure black.
>>>
>>> Which I could get around by spending fifteen minutes munging things in the Gimp
>>> before printing, but at that point, I may as well just transcribe things
>>> manually.
>>>
>>> Looking for something reasonably simple to improve the general workflow. I'd
>>> have hoped something would have already been available on Linux; it'd be easy
>>> enough to copy the scans to my phone and feed them through Google Goggles for
>>> the desired output, but then I'm deliberately filtering company data through an
>>> outside entity.
>> Did you manage to use that link I sent?
> I did. tesseract almost worked, even separating the regions cleanly in its
> output, but it seems, sadly, that the 300dpi scans were insufficient to get a
> good read; lots of clear corruption of the text, so things like serial
> numbers, model numbers, version numbers--everything you'd care about--would be
> highly suspect.
>
> The next tool that looked like it might work, gscan2pdf, wasn't in portage,
> and with the semi-garbled output from tesseract suggesting the scans were too
> poor quality, I didn't pursue further.
>