Helmut Jarausch wrote:
> On 04/08/2016 03:26:53 PM, hw wrote:
>>
>> Hi,
>>
>> what would be the best approach to extract data
>> from a screencast?
>>
>> The task is to acquire some data from the display of
>> a GUI program used interactively by a user. There are
>> a couple of 'fields' (as in "designated areas of the display")
>> in which the relevant data is being displayed while the
>> program is being used. The acquired data needs to be
>> entered into a MySQL database, preferably as soon as
>> possible. (The program needs windoze, and the sources
>> are unavailable :( )
>>
>>
>> The idea is to make a screen recording and postprocess
>> the recording with some sort of OCR software. This might
>> require using ffmpeg (or the like) to create a single
>> image from each frame of the recording; then treat each
>> image with OCR software to get the interesting data
>> which can then be entered into the database.
>>
>> Data to extract is mostly numbers. The relevant fields
>> can be expected to be either filled or empty. The frame rate
>> of the recording can be kept reasonably low, like 1 FPS,
>> or perhaps even less, depending on how frequently the relevant
>> fields change.
>>
>> Using tesseract comes to mind, but after reading that
>>
>> "Tesseract's output will be very poor quality if the input
>> images are not preprocessed to suit it: Images (especially
>> screenshots) must be scaled up such that the text x-height
>> is at least 20 pixels, any rotation or skew must be
>> corrected or no text will be recognized, low-frequency
>> changes in brightness must be high-pass filtered, or
>> Tesseract's binarization stage will destroy much of the
>> page, and dark borders must be manually removed, or they
>> will be misinterpreted as characters."[1]
>>
>> I'm even more doubtful that this would produce usable
>> results with sufficient reliability.
>>
>> So what might be the best way to get text/numbers out of
>> what a program displays?
>>
>>
>> [1]: https://en.wikipedia.org/wiki/Tesseract_(software)
>>
>
> I can't help with Gentoo.
> Try to find an old (free) version of FineReader which runs under wine.
> If you do it only occasionally, transfer the image to an Android phone where there are good and cheap OCR apps, even FineReader.
|
It would be too much video to process. Besides, phones are
ok for making phone calls and entirely incompatible with
computers, which makes them useless for anything else but
making phone calls.
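
For reference, the frame-by-frame idea from the original question
could be sketched roughly as below. This is only a sketch under
assumptions, not a tested setup: the file names, the field geometry
(width, height, x, y), the upscale factor of 3 and the character
whitelist are all made up for illustration, and whether the
whitelist variable is honoured depends on the tesseract version.
It only builds the command lines and filters out frames in which
the field did not change, so that far fewer rows would need to go
into the database.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def ffmpeg_frames_cmd(video: str, out_pattern: str,
                      field: Tuple[int, int, int, int],
                      fps: float = 1.0, upscale: int = 3) -> List[str]:
    """Build an ffmpeg command that samples frames at a low rate,
    crops one field (w, h, x, y) and enlarges it for OCR."""
    w, h, x, y = field
    filters = ",".join([
        f"fps={fps:g}",                                    # e.g. 1 frame per second
        f"crop={w}:{h}:{x}:{y}",                           # keep only the field area
        f"scale=iw*{upscale}:ih*{upscale}:flags=lanczos",  # enlarge the text
    ])
    return ["ffmpeg", "-i", video, "-vf", filters, out_pattern]


def tesseract_cmd(image: str) -> List[str]:
    """Build a tesseract command that treats the image as a single
    text line and prints the result to stdout.
    The whitelist is an assumption for numeric fields."""
    return ["tesseract", image, "stdout", "--psm", "7",
            "-c", "tessedit_char_whitelist=0123456789.,-"]


def changes_only(readings: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, str]]:
    """Yield (frame_no, text) only when the text differs from the
    previous frame; an empty field shows up as an empty string."""
    previous: Optional[str] = None
    for frame_no, text in readings:
        text = text.strip()
        if text != previous:
            yield frame_no, text
            previous = text
```

The two command lists could be passed to subprocess.run(), and the
output of changes_only() is what would be inserted into the
database, one row per change rather than one row per frame.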