On 04/08/2016 03:26:53 PM, hw wrote:
>
> Hi,
>
> what would be the best approach to extract data
> from a screencast?
>
> The task is to acquire some data from the display of
> a GUI program used interactively by a user. There are
> a couple 'fields' (as in "designated areas of the display")
> in which the relevant data is being displayed while the
> program is being used. The acquired data needs to be
> entered into a mysql database, preferably as soon as
> possible. (The program needs windoze, and the sources
> are unavailable :( )
>
>
> The idea is to make a screen recording and postprocess
> the recording with some sort of OCR software. This might
> require using ffmpeg (or the like) to create a single
> image from each frame of the recording; then treat each
> image with an OCR software to get the interesting data
> which can then be entered into the database.
>
> Data to extract is mostly numbers. The relevant fields
> can be expected to be either filled or empty. The FPS rate
> of the recording can be kept reasonably low, like 1 FPS,
> or perhaps even less, depending on how frequently the relevant
> fields change.
>
> Using tesseract comes to mind, but after reading that
>
> "Tesseract's output will be very poor quality if the input
> images are not preprocessed to suit it: Images (especially
> screenshots) must be scaled up such that the text x-height
> is at least 20 pixels,[12] any rotation or skew must be
> corrected or no text will be recognized, low-frequency
> changes in brightness must be high-pass filtered, or
> Tesseract's binarization stage will destroy much of the
> page, and dark borders must be manually removed, or they
> will be misinterpreted as characters."[1]
>
> I'm even more doubtful that this would produce usable
> results with sufficient reliability.
>
> So what might be the best way to get text/numbers out of
> what a program displays?
>
>
> [1]: https://en.wikipedia.org/wiki/Tesseract_(software)
>

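On the frame-splitting step in the quoted message: ffmpeg can sample the
recording, cut out one field, and enlarge it in a single call, which also
covers the "scale up" requirement from the Wikipedia passage. Below is a
minimal sketch in Python that only builds the command line. The file names,
the field geometry, and the 6-pixel x-height are made-up values for
illustration, and the helper names are hypothetical. The fps, crop, and scale
filters are standard ffmpeg filters, but whether this preprocessing is enough
for a given OCR engine is untested here.

```python
import math
import shlex


def min_upscale(x_height_px, target_px=20):
    """Smallest integer factor that lifts the x-height to at least target_px."""
    return max(1, math.ceil(target_px / x_height_px))


def ffmpeg_frame_command(video, out_pattern, field, fps=1, upscale=4):
    """Build an ffmpeg command that samples frames, crops one field, enlarges it.

    field is (width, height, x, y) in pixels of the recording (assumed known).
    """
    w, h, x, y = field
    filters = ",".join([
        f"fps={fps:g}",                                     # frames kept per second
        f"crop={w}:{h}:{x}:{y}",                            # cut out the field
        f"scale=iw*{upscale}:ih*{upscale}:flags=lanczos",   # enlarge for OCR
    ])
    return ["ffmpeg", "-i", video, "-vf", filters, out_pattern]


# Hypothetical values: a 120x24 field at (300, 410), text about 6 px high.
cmd = ffmpeg_frame_command(
    "rec.mkv", "field_%05d.png", (120, 24, 300, 410),
    fps=1, upscale=min_upscale(6),
)
print(shlex.join(cmd))
```

To actually run it, the list could be handed to subprocess.run(cmd, check=True).
Each field would need its own crop rectangle and output pattern.
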
I can't help with Gentoo.
Try to find an old (free) version of FineReader which runs under Wine.
If you do it only occasionally, transfer the image to an Android phone
where there are good and cheap OCR apps, even FineReader.
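
Whichever OCR engine ends up being used, its raw text still has to be turned
into numbers, and at 1 FPS most frames will repeat the previous value. A
rough sketch of that last step, in Python, assuming the OCR output arrives as
(frame number, text) pairs: it keeps only the frames where a field changes
and stores them. sqlite3 is used here purely as a stand-in so the example is
self-contained; the real target in the quoted message is MySQL, where the
driver's placeholder style will likely differ (most Python MySQL drivers use
%s instead of ?). The table layout, the field name, and the treatment of a
comma as a decimal separator are assumptions.

```python
import re
import sqlite3

# First integer or decimal number in the text; a comma is assumed to be a
# decimal separator, not a thousands separator.
NUMBER = re.compile(r"-?\d+(?:[.,]\d+)?")


def parse_field(ocr_text):
    """Return the first number in the OCR text as a float, or None if empty."""
    match = NUMBER.search(ocr_text)
    if match is None:
        return None
    return float(match.group().replace(",", "."))


def changes(readings):
    """Yield (frame, value) only when the value differs from the previous frame."""
    previous = object()  # sentinel that never equals a real reading
    for frame, text in readings:
        value = parse_field(text)
        if value != previous:
            yield frame, value
            previous = value


def store(conn, field_name, rows):
    """Insert the change events; NULL marks an empty field."""
    conn.executemany(
        "INSERT INTO readings (field, frame, value) VALUES (?, ?, ?)",
        [(field_name, frame, value) for frame, value in rows],
    )
    conn.commit()


# Made-up OCR output for one field over six frames.
sample = [(1, ""), (2, "42"), (3, "42"), (4, " 42.5\n"), (5, ""), (6, "")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (field TEXT, frame INTEGER, value REAL)")
store(conn, "field_a", changes(sample))
for row in conn.execute("SELECT frame, value FROM readings ORDER BY frame"):
    print(row)
```

Storing only the changes keeps the table small and makes OCR glitches easier
to spot, since a misread digit shows up as a value that flips for one frame
and then flips back.
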