Gentoo Archives: gentoo-user

From: felix@×××××××.com
To: gentoo-user@l.g.o
Subject: [gentoo-user] Trying to automate HTML ---> pdf
Date: Sun, 27 Jan 2008 17:08:36
Message-Id: 20080127170615.GA30515@crowfix.com
I am trying to automate converting a URL into a PDF file. These web
pages include JavaScript and fancy formatting, so the simple-minded
converters just don't cut the ice. My next plan was to hack up a real
browser so it would take two command-line args, the URL and the print
file, render the page, print it to the PDF file, and exit. From what
I know of some browsers, they would have to be configured in advance,
and invocation would have to be strictly controlled so that only one
instance runs at a time, at least per user. I could probably create
several Firefox user sessions and have each of them running
simultaneously, but multiple real users work for me too.
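
Keeping to one browser instance per user needs some serialization in
whatever wrapper drives the browser. Here is a minimal sketch, assuming
a Unix host (Python's `fcntl` module is Unix-only) and a hypothetical
per-user lock file name. Python is used purely for illustration. A
second caller either blocks until the first one finishes or, with
`blocking=False`, fails immediately with `BlockingIOError`.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


def default_lock_path():
    # Hypothetical per-user lock file name: one browser session per user.
    return os.path.join(tempfile.gettempdir(), f"konq-pdf-{os.getuid()}.lock")


@contextmanager
def single_instance(lock_path, blocking=True):
    """Hold an exclusive flock() on lock_path while the with-block runs.

    With blocking=False, raise BlockingIOError if someone else holds it.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        # Closing the descriptor also drops the lock, so a crashed
        # wrapper cannot leave it held.
        os.close(fd)
```

Usage would be `with single_instance(default_lock_path()):` around the
load-and-print step. flock() was chosen over a PID file because the
kernel releases the lock when the process exits, so no stale-lock
cleanup is needed.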

Firefox doesn't print to PDF, however, but Konqueror does. By using
the DCOP interface, I can even pass it commands to load a URL and
print the page, although I have to settle for the configured print
file name. Since I have to run individual sessions anyway, that's no
big deal. The commands look like this:

    dcop konqueror-6352 'konqueror-mainwindow#1' openURL 'http://slashdot.org'
    dcop konqueror-6352 html-widget2 print true

There's a bit more than that, since widget names change, but a simple
perl program handles it easily (so far!).
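
The widget-name lookup might look roughly like the following sketch.
It assumes two things taken only from the example commands: that
running `dcop konqueror-<pid>` with no object name lists that
application's DCOP objects one per line, and that the HTML view's
object is named like `html-widget<N>`. Which widget to pick when
several match is left to the caller.

```python
import re
import subprocess

# Assumption: the HTML view appears as an object named html-widget<N>.
HTML_WIDGET_RE = re.compile(r"^html-widget\d+$")


def list_dcop_objects(app):
    """Run `dcop <app>` and return its stdout (assumed one object per line)."""
    result = subprocess.run(["dcop", app], check=True,
                            capture_output=True, text=True)
    return result.stdout


def find_html_widgets(listing):
    """Return the names in a dcop object listing that look like HTML widgets."""
    names = (line.strip() for line in listing.splitlines())
    return [name for name in names if HTML_WIDGET_RE.match(name)]
```

For example, `find_html_widgets(list_dcop_objects("konqueror-6352"))`
would yield candidates such as `html-widget2` for the `print` call.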

However, there's a problem. The "openURL" command returns without
waiting for the web page to finish loading, and the "print" command
does not wait for loading to finish either. The "print" command does
wait for printing to finish before returning, which is nice.

This means I have to put in some arbitrary "sleep 30" or so between
"openURL" and "print" to have a good chance of a complete printed
page, and even then there is no guarantee it actually will be
complete. We have to send these PDF files to a bank, and it would not
be good to send them incomplete pages, even if only one out of 100 or
even 1000. There will be at least hundreds of these every day.
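
For reference, the current workaround amounts to something like this
sketch, which issues the two dcop commands shown earlier with a fixed
delay between them. The function names and the `settle_seconds`
parameter are hypothetical; the 30-second default is the guess from
the paragraph above. Nothing here checks that the page actually
finished loading, which is exactly the weakness described.

```python
import subprocess
import time


def dcop_commands(pid, url, widget="html-widget2"):
    """Build the openURL and print invocations for the konqueror-<pid> instance."""
    app = f"konqueror-{pid}"
    return [
        ["dcop", app, "konqueror-mainwindow#1", "openURL", url],
        ["dcop", app, widget, "print", "true"],
    ]


def load_and_print(pid, url, widget="html-widget2", settle_seconds=30):
    """Load url, wait a fixed time, then print to the preconfigured print file."""
    open_cmd, print_cmd = dcop_commands(pid, url, widget)
    subprocess.run(open_cmd, check=True)   # returns before the page finishes loading
    time.sleep(settle_seconds)             # arbitrary delay; no completeness guarantee
    subprocess.run(print_cmd, check=True)  # returns only after printing completes
```

Passing argument lists instead of a shell string avoids quoting the
`#` in `konqueror-mainwindow#1` or special characters in the URL.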

I started to look at the sources, but there is no "konqueror-3.5.8.tar.gz"
or anything similar. No doubt most of the code is handled by Qt
widgets and KDE libs.

Here are my questions:

0. Is there a better place to ask this? I tried a KDE mailing list
   and got no responses; there weren't even many views.

1. Is there either a DCOP command to wait for a URL to be loaded or a
   DCOP command like openURL which waits?

2. Is there a source file for Konqueror that I could hack to take
   command-line parameters without changing libraries or other code
   that would affect the rest of KDE? I don't have any problem with
   a hacked and renamed konqueror command.

3. Is there some other way of converting complicated web pages into
   PDF? If the converters don't understand JavaScript, style sheets,
   and everything else that a real browser does, they are useless to me.

4. Are there other ways to do this that I haven't thought of?

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman & rocket surgeon / felix@×××××××.com
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
--
gentoo-user@l.g.o mailing list

Replies

Subject Author
Re: [gentoo-user] Trying to automate HTML ---> pdf Neil Bothwick <neil@××××××××××.uk>
Re: [gentoo-user] Trying to automate HTML ---> pdf Etaoin Shrdlu <shrdlu@×××××××××××××.org>
[gentoo-user] Re: Trying to automate HTML ---> pdf Grant Edwards <grante@××××.com>