On 07/05/2016 16:31, hw wrote:
> Helmut Jarausch wrote:
>> On 04/08/2016 03:26:53 PM, hw wrote:
>>>
>>> Hi,
>>>
>>> what would be the best approach to extract data
>>> from a screencast?
>>>
>>> The task is to acquire some data from the display of
>>> a GUI program used interactively by a user. There are
>>> a couple 'fields' (as in "designated areas of the display")
>>> in which the relevant data is being displayed while the
>>> program is being used. The acquired data needs to be
>>> entered into a MySQL database, preferably as soon as
>>> possible. (The program needs windoze, and the sources
>>> are unavailable :( )
>>>
>>>
>>> The idea is to make a screen recording and postprocess
>>> the recording with some sort of OCR software. This might
>>> require using ffmpeg (or the like) to create a single
>>> image from each frame of the recording, then running OCR
>>> software on each image to get the interesting data,
>>> which can then be entered into the database.
>>>
>>> Data to extract is mostly numbers. The relevant fields
>>> can be expected to be either filled or empty. The frame rate
>>> of the recording can be kept reasonably low, like 1 FPS,
>>> or perhaps even less, depending on how frequently the relevant
>>> fields change.
>>>
>>> Using tesseract comes to mind, but after reading that
>>>
>>> "Tesseract's output will be very poor quality if the input
>>> images are not preprocessed to suit it: Images (especially
>>> screenshots) must be scaled up such that the text x-height
>>> is at least 20 pixels,[12] any rotation or skew must be
>>> corrected or no text will be recognized, low-frequency
>>> changes in brightness must be high-pass filtered, or
>>> Tesseract's binarization stage will destroy much of the
>>> page, and dark borders must be manually removed, or they
>>> will be misinterpreted as characters."[1]
>>>
>>> I'm even more doubtful that this would produce usable
>>> results with sufficient reliability.
>>>
>>> So what might be the best way to get text/numbers out of
>>> what a program displays?
>>>
>>>
>>> [1]: https://en.wikipedia.org/wiki/Tesseract_(software)
>>>
>>
>> I can't help with Gentoo.
>> Try to find an old (free) version of FineReader which runs under Wine.
>> If you do it only occasionally, transfer the image to an Android phone,
>> where there are good and cheap OCR apps, even FineReader.
>
> It would be too much video to process. Besides, phones are
> ok for making phone calls and entirely incompatible with
> computers, which makes them useless for anything else but
> making phone calls.
|
Huh? da fuck you talkin' 'bout?
|
My trusty collection of Android devices would be very surprised to hear
they now don't have real CPUs, wifi chips, RAM and storage. Or can't run
a web browser, do email, instant chat, play H.264 video with less CPU
load than my 8-core laptop, share over SMB on the network, do Bluetooth,
video calls or any of the other bazillion things computers have always
done with each other.
|
How odd. I really thought my Android phones could do all of that. I must
have imagined it... that means my delusions are worse than I thought
and maybe I need different and more pills from the nice lady who's my GP.
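For the original question (numbers in fixed screen fields, a 1 FPS recording, ffmpeg plus tesseract), here is a minimal Python sketch of the glue. Nobody in this thread posted it. The field names, pixel coordinates, 3x upscale and file names are hypothetical placeholders. It assumes tesseract 4 or later (for `--psm`) and Python 3.8+ (for `shlex.join`). It only builds the ffmpeg and tesseract command lines and post-processes the OCR text. Running the commands and writing rows to MySQL are left out.

```python
"""Sketch of the screencast -> OCR -> database glue from the question.

Assumptions (not from the thread): field names and pixel boxes are
placeholders, tesseract is version 4 or later (for --psm), and the 3x
upscale is only a guess at reaching the ~20 px x-height tesseract wants.
"""
import re
import shlex
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical "fields": name -> (width, height, x, y) in screen pixels.
# Measure the real boxes on one captured frame.
FIELDS: Dict[str, Tuple[int, int, int, int]] = {
    "counter": (120, 24, 40, 300),
    "total": (120, 24, 40, 340),
}


def ffmpeg_crop_cmd(video: str, field: str, fps: float = 1.0) -> List[str]:
    """ffmpeg command: sample `fps` frames per second, crop one field,
    and scale the crop up 3x before writing numbered PNGs."""
    w, h, x, y = FIELDS[field]
    vf = f"fps={fps},crop={w}:{h}:{x}:{y},scale=iw*3:ih*3"
    # The image2 muxer expands %05d to 00001, 00002, ...
    return ["ffmpeg", "-i", video, "-vf", vf, f"{field}_%05d.png"]


def tesseract_cmd(png: str) -> List[str]:
    """tesseract command that prints the recognised text to stdout and
    treats the image as a single line of text (page segmentation mode 7)."""
    return ["tesseract", png, "stdout", "--psm", "7"]


# An optional minus sign, digits, and an optional decimal part.
NUMBER = re.compile(r"-?\d+(?:[.,]\d+)?")


def parse_field(ocr_text: str) -> Optional[str]:
    """First number in the OCR output (a decimal comma becomes a dot),
    or None when the field is empty or nothing numeric was recognised."""
    match = NUMBER.search(ocr_text)
    return match.group(0).replace(",", ".") if match else None


def changes(values: Iterable[Optional[str]]) -> Iterator[Tuple[int, Optional[str]]]:
    """Yield (sample_index, value) only when the value differs from the
    previous sample, so an unchanged display produces a single row."""
    previous: object = object()  # sentinel, unequal to any field value
    for index, value in enumerate(values):
        if value != previous:
            yield index, value
            previous = value


if __name__ == "__main__":
    # Print the commands instead of running them. Each tesseract result
    # would go through parse_field() and changes(), and the surviving rows
    # would be INSERTed with a MySQL client of your choice (not shown).
    print(shlex.join(ffmpeg_crop_cmd("recording.mkv", "counter")))
    print(shlex.join(tesseract_cmd("counter_00001.png")))
```

Cropping each field before OCR keeps borders and unrelated parts of the window away from tesseract, which the quoted Wikipedia text warns about. `changes()` means a frame showing the same numbers as the previous one adds no new row.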
|
--
Alan McKinnon
alan.mckinnon@×××××.com