-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm writing a sed script to clean up the *broken* output of man2html.
I say broken because the output isn't W3C compliant (as HTML or as
XHTML). I'd like to modify it so that the final result is XHTML 1.0
compliant. The problem I'm running into is that the output doesn't
close its <p>, <dt>, or <dd> tags, and XHTML requires every element to
be closed. So what I need to do is take note of the starting tag, grab
the paragraph that follows, and then insert the closing tag. What I've
got /sort of/ works, but not quite.

Here's a sample that has already been run through the script, but
without the part that modifies the <p> elements:

<p>

Regular expression support is provided by the PCRE library package,
which is open source software, written by Philip Hazel, and copyright
by the University of Cambridge, England. See <a
href="http://www.pcre.org/">http://www.pcre.org/</a> .

<p>

Nmap can optionally link to the OpenSSL cryptography toolkit, which is
available from <a
href="http://www.openssl.org/">http://www.openssl.org/</a> .

Here's the entire sedscr (sans comments):

/^$/{
N
/^\n$/d
}
/^Content-type: text\/html/c\
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
s%<\(HTML\|P\|HEAD\|TITLE\|BODY\|STRONG\|EM\|H[123456]\|D[DLT]\|T[TDRH]\)>%\L<\1>%g
s%<\/\(HTML\|P\|A\|HEAD\|TITLE\|BODY\|STRONG\|EM\|H[123456]\|D[DLT]\|T[TDRH]\)>%\L</\1>%g
s%<BR>%<br />%g
s%<HR>%<hr />%g
s%<[Dd][Ll] [Cc][Oo][Mm][Pp][Aa][Cc][Tt]>%<dl compact="compact">%
s%<A HREF\(.*\)>%<a href\1>%g
s%<A NAME\(.*\)>%<a name\1>%g
/^<[IB]>.*$/{
N
s%\(<[IB]>\)\(.*\)\(<\/[IB]>\)\n%\L\1\2\L\3%
}
/^<[ib]>.*$/{
N
s%\n%%
}
s%<[IB]>%\L&%
s%<\/[IB]>%\L&%
/<body>/,/<\/body>/{
/<p>/!{
H
d
}
/<p>/{
x
s/$/<\/p>/
G
}
}
/^<p>$/,/<\p>$/{
N
/^\n<p>$/d
}

Here's the funkiness after parsing with the last part
(/<body>/,/<\/body>/{) enabled:

<p>

<p>

Regular expression support is provided by the PCRE library package,
which is open source software, written by Philip Hazel, and copyright
by the University of Cambridge, England. See <a
href="http://www.pcre.org/">http://www.pcre.org/</a> .</p>

<p>

<p>

Nmap can optionally link to the OpenSSL cryptography toolkit, which is
available from <a
href="http://www.openssl.org/">http://www.openssl.org/</a> .</p>

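For comparison, here is a minimal sketch of the "hold the paragraph,
then close it when the next <p> shows up" idea in isolation. It is not
a drop-in replacement for the block in the sedscr: it assumes GNU sed,
it assumes the input consists only of <p> blocks like the sample above
(a trailing </body> line would be swallowed into the last paragraph),
it does nothing for <dt> or <dd>, and the function name
close_paragraphs is made up for illustration. The doubled <p> in the
output above appears to come from G appending the new <p> while that
same <p> also stays in the hold space, so this sketch swaps with x and
never uses G.

```shell
#!/bin/sh
# close_paragraphs: hypothetical helper name, not part of the original sedscr.
# Reads HTML on stdin and appends </p> to every block that starts with a
# line consisting only of <p>. Assumes GNU sed and <p>-only input.
close_paragraphs() {
  sed '
    # Any line other than <p>: append it to the hold space and, unless it
    # is the last line of input, start the next cycle without printing.
    /^<p>$/!{
      H
      $!d
    }
    # Reached on a <p> line or on the last line: fetch the held block.
    # On a <p> line this also parks the new <p> in the hold space.
    x
    # Nothing was held yet (input began with <p>): print nothing.
    /^$/d
    # Text collected before the first <p> starts with a stray newline.
    s/^\n//
    # Close a held paragraph, dropping its trailing blank lines.
    /^<p>/s/\n*$/<\/p>/
  '
}

# Small demonstration on two unclosed paragraphs.
printf '<p>\n\nFirst paragraph.\n\n<p>\n\nSecond paragraph.\n' | close_paragraphs
```

Each <p> is emitted exactly once here, at the head of its own block,
and the blank line between paragraphs is dropped when the block is
closed.
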
(Just in case you were wondering, this IS from the nmap man page. ;-)
Thanks.

- --
gentux
echo "hfouvyAdpy/ofu" | perl -pe 's/(.)/chr(ord($1)-1)/ge'

gentux's gpg fingerprint ==> 34CE 2E97 40C7 EF6E EC40 9795 2D81 924A
6996 0993
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDOMBkLYGSSmmWCZMRAnnrAJwKNqr+/OgBdDD8X8PXX6rpKUfaxQCfU9PW
Bs2oA/76RYFbbc7DWEpfTM8=
=gcc/
-----END PGP SIGNATURE-----

--
gentoo-user@g.o mailing list