jkt 05/07/26 10:46:47

Added: xml/htdocs/doc/en/articles l-sed1.xml l-sed2.xml l-sed3.xml
Log:
#99049, "Common threads: Sed by example", converted by rane

Revision Changes Path
1.1 xml/htdocs/doc/en/articles/l-sed1.xml

file : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed1.xml?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=gentoo
plain: http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed1.xml?rev=1.1&content-type=text/plain&cvsroot=gentoo

Index: l-sed1.xml
===================================================================
<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/articles/l-sed1.xml,v 1.1 2005/07/26 10:46:47 jkt Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide link="/doc/en/articles/l-sed1.xml">
<title>Sed by example, Part 1</title>

<author title="Author">
<mail link="drobbins@g.o">Daniel Robbins</mail>
</author>
<author title="Editor">
<mail link="rane@××××××.pl">Łukasz Damentko</mail>
</author>

<abstract>
In this series of articles, Daniel Robbins will show you how to use the very
powerful (but often forgotten) UNIX stream editor, sed. Sed is an ideal tool for
batch-editing files or for creating shell scripts to modify existing files in
powerful ways.
</abstract>

<!-- The original version of this article was published on IBM developerWorks,
and is property of Westtech Information Services. This document is an updated
version of the original article, and contains various improvements made by the
Gentoo Linux Documentation team -->

<version>1.0</version>
<date>2005-07-15</date>

<chapter>
<title>Get to know the powerful UNIX editor</title>
<section>
<title>Pick an editor</title>
<body>

<note>
The original version of this article was published on IBM developerWorks, and is
property of Westtech Information Services. This document is an updated version
of the original article, and contains various improvements made by the Gentoo
Linux Documentation team.
</note>

<p>
In the UNIX world, we have a lot of options when it comes to editing files.
Think of it -- vi, emacs, and jed come to mind, as well as many others. We all
have our favorite editor (along with our favorite keybindings) that we have come
to know and love. With our trusty editor, we are ready to tackle any number of
UNIX-related administration or programming tasks with ease.
</p>

<p>
While interactive editors are great, they do have limitations. Though their
interactive nature can be a strength, it can also be a weakness. Consider a
situation where you need to perform similar types of changes on a group of
files. You could instinctively fire up your favorite editor and perform a bunch
of mundane, repetitive, and time-consuming edits by hand. But there's a better
way.
</p>

</body>
</section>
<section>
<title>Enter sed</title>
<body>

<p>
It would be nice if we could automate the process of making edits to files, so
that we could "batch" edit files, or even write scripts with the ability to
perform sophisticated changes to existing files. Fortunately for us, there is a
tool built for exactly these situations, and it is called sed.
</p>

<p>
sed is a lightweight stream editor that's included with nearly all UNIX flavors,
including Linux. sed has a lot of nice features. First of all, it's very
small, typically many times smaller than your favorite scripting language.
Secondly, because sed is a stream editor, it can perform edits to data it
receives from stdin, such as from a pipeline. So, you don't need to have the
data to be edited stored in a file on disk. Because data can just as easily be
piped to sed, it's very easy to use sed as part of a long, complex pipeline in a
powerful shell script. Try doing that with your favorite editor.
</p>

</body>
</section>
<section>
<title>GNU sed</title>
<body>

<p>
Fortunately for us Linux users, one of the nicest versions of sed out there
happens to be GNU sed, which is currently at version 3.02. Every Linux
distribution has GNU sed, or at least should. GNU sed is popular not only
because its sources are freely distributable, but because it happens to have a
lot of handy, time-saving extensions to the POSIX sed standard. GNU sed also
doesn't suffer from many of the limitations that earlier and proprietary
versions of sed had, such as a limited line length -- GNU sed handles lines of
any length with ease.
</p>

</body>
</section>
<section>
<title>The newest GNU sed</title>
<body>

<p>
While researching this article, I noticed that several online sed aficionados
made reference to a GNU sed 3.02a. Strangely, I couldn't find sed 3.02a on
<uri>ftp://ftp.gnu.org</uri> (see <uri link="#resources">Resources</uri> for
these links), so I had to go look for it elsewhere. I found it at
<uri>ftp://alpha.gnu.org</uri>, in <path>/pub/sed</path>. I happily downloaded
it, compiled it, and installed it, only to find minutes later that the most
recent version of sed is 3.02.80 -- and you can find its sources right next to
those for 3.02a, at <uri>ftp://alpha.gnu.org</uri>. After getting GNU sed
3.02.80 installed, I was finally ready to go.
</p>

</body>
</section>
<section>
<title>The right sed</title>
<body>

<p>
In this series, we will be using GNU sed 3.02.80. Some (but very few) of the
most advanced examples you'll find in my upcoming, follow-on articles in this
series will not work with GNU sed 3.02 or 3.02a. If you're using a non-GNU sed,
your results may vary. Why not take some time to install GNU sed 3.02.80 now?
Then, not only will you be ready for the rest of the series, but you'll also be
able to use arguably the best sed in existence!
</p>

</body>
</section>
<section>
<title>Sed examples</title>
<body>

<p>
Sed works by performing any number of user-specified editing operations
("commands") on the input data. Sed is line-based, so the commands are performed
on each line in order. Sed writes its results to standard output (stdout); it
doesn't modify any input files.
</p>

<p>
Let's look at some examples. The first several are going to be a bit weird
because I'm using them to illustrate how sed works rather than to perform any
useful task. However, if you're new to sed, it's very important that you
understand them. Here's our first example:
</p>

<pre caption="Example of sed usage">
$ <i>sed -e 'd' /etc/services</i>
</pre>

<p>
If you type this command, you'll get absolutely no output. Now, what happened?
In this example, we called sed with one editing command, <c>d</c>. Sed opened
the <path>/etc/services</path> file, read a line into its pattern buffer, and
performed our editing command ("delete line"). The <c>d</c> command empties the
pattern buffer and moves straight on to the next line, so nothing is printed for
the current one. Sed then repeated these steps for each successive line. This
produced no output, because the <c>d</c> command zapped every single line in the
pattern buffer!
</p>

<p>
There are a couple of things to notice in this example. First,
<path>/etc/services</path> was not modified at all. This is because, again, sed
only reads from the file you specify on the command line, using it as input --
it doesn't try to modify the file. The second thing to notice is that sed is
line-oriented. The <c>d</c> command didn't simply tell sed to delete all incoming
data in one fell swoop. Instead, sed read each line of
<path>/etc/services</path> one by one into its internal buffer, called the
pattern buffer. Once a line was read into the pattern buffer, sed performed the
<c>d</c> command, which discarded the line before anything could be printed.
Later, I'll show you how to use address ranges to control which lines a command
is applied to -- but in the absence of addresses, a command is applied to all
lines.
</p>

<p>
The third thing to notice is the use of single quotes to surround the <c>d</c>
command. It's a good idea to get into the habit of using single quotes to
surround your sed commands, so that shell expansion is disabled.
</p>
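
<p>
The observations above can be checked without touching a real system file. What
follows is only a minimal sketch, assuming a POSIX shell with <c>mktemp</c> and
<c>cksum</c> available; the temporary file and its three lines are made up for
illustration and do not come from the original article.
</p>

```shell
# Work on a throwaway file (hypothetical sample data) instead of /etc/services.
sample=$(mktemp)
printf 'line one\nline two\nline three\n' > "$sample"

# Record a checksum before and after running sed on the file.
before=$(cksum < "$sample")
# With no address, 'd' is applied to every line, so sed prints nothing.
deleted_output=$(sed -e 'd' "$sample")
after=$(cksum < "$sample")

printf 'characters printed by sed: %s\n' "${#deleted_output}"
[ "$before" = "$after" ] && echo "input file unchanged"

rm -f "$sample"
```

<p>
The single quotes around <c>d</c> keep the shell from interpreting anything
inside the sed command, which matters more once the commands contain characters
such as <c>$</c> or <c>*</c>.
</p>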

</body>
</section>
<section>
<title>Another sed example</title>
<body>

<p>
Here's an example of how to use sed to remove the first line of the
<path>/etc/services</path> file from our output stream:
</p>

<pre caption="Another sed example">
$ <i>sed -e '1d' /etc/services | more</i>



1.1 xml/htdocs/doc/en/articles/l-sed2.xml

file : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed2.xml?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=gentoo
plain: http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed2.xml?rev=1.1&content-type=text/plain&cvsroot=gentoo

Index: l-sed2.xml
===================================================================
<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/articles/l-sed2.xml,v 1.1 2005/07/26 10:46:47 jkt Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide link="/doc/en/articles/l-sed2.xml">
<title>Sed by example, Part 2</title>

<author title="Author">
<mail link="drobbins@g.o">Daniel Robbins</mail>
</author>
<author title="Editor">
<mail link="rane@××××××.pl">Łukasz Damentko</mail>
</author>

<abstract>
Sed is a very powerful and compact text stream editor. In this article, the
second in the series, Daniel shows you how to use sed to perform string
substitution; create larger sed scripts; and use sed's append, insert, and
change line commands.
</abstract>

<!-- The original version of this article was published on IBM developerWorks,
and is property of Westtech Information Services. This document is an updated
version of the original article, and contains various improvements made by the
Gentoo Linux Documentation team -->

<version>1.0</version>
<date>2005-07-15</date>

<chapter>
<title>How to further take advantage of the UNIX text editor</title>
<section>
<title>Substitution!</title>
<body>

<note>
The original version of this article was published on IBM developerWorks, and is
property of Westtech Information Services. This document is an updated version
of the original article, and contains various improvements made by the Gentoo
Linux Documentation team.
</note>

<p>
Let's look at one of sed's most useful commands, the substitution command.
Using it, we can replace a particular string or matched regular expression with
another string. Here's an example of the most basic use of this command:
</p>

<pre caption="Most basic use of substitution command">
$ <i>sed -e 's/foo/bar/' myfile.txt</i>
</pre>

<p>
The above command will output the contents of myfile.txt to stdout, with the
first occurrence of 'foo' (if any) on each line replaced with the string 'bar'.
Please note that I said first occurrence on each line, though this is normally
not what you want. Normally, when I do a string replacement, I want to perform
it globally. That is, I want to replace all occurrences on every line, as
follows:
</p>

<pre caption="Replacing all the occurrences on every line">
$ <i>sed -e 's/foo/bar/g' myfile.txt</i>
</pre>

<p>
The additional 'g' option after the last slash tells sed to perform a global
replace.
</p>
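
<p>
To see the difference the 'g' option makes, it can help to run both forms on the
same line. This is a small sketch rather than part of the original article; the
input line is invented, and it is fed through a pipe so that no myfile.txt is
needed.
</p>

```shell
# A made-up input line that contains the search string three times.
line='foo foo foo'

# Without 'g', only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed -e 's/foo/bar/')

# With 'g', every match on the line is replaced.
every_match=$(printf '%s\n' "$line" | sed -e 's/foo/bar/g')

printf '%s\n' "$first_only"    # bar foo foo
printf '%s\n' "$every_match"   # bar bar bar
```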

<p>
Here are a few other things you should know about the <c>s///</c> substitution
command. First, it is a command, and a command only; there are no addresses
specified in any of the above examples. This means that the <c>s///</c> command
can also be used with addresses to control what lines it will be applied to, as
follows:
</p>

<pre caption="Specifying the lines a command will be applied to">
$ <i>sed -e '1,10s/enchantment/entrapment/g' myfile2.txt</i>
</pre>

<p>
The above example will cause all occurrences of the word 'enchantment' to be
replaced with the word 'entrapment', but only on lines one through ten,
inclusive.
</p>

<pre caption="Using regular expressions as addresses">
$ <i>sed -e '/^$/,/^END/s/hills/mountains/g' myfile3.txt</i>
</pre>

<p>
This example will swap 'hills' for 'mountains', but only in blocks of text
beginning with a blank line, and ending with a line beginning with the three
characters 'END', inclusive.
</p>
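
<p>
Both kinds of address can be tried on a few lines of throwaway input. The
following sketch is an illustration only: the sample lines are made up, and the
numeric range is shortened to lines one and two so that the effect is visible on
a three-line input.
</p>

```shell
# Numeric range: only lines 1 and 2 are edited; line 3 is left alone.
numeric_range=$(printf 'enchantment\nenchantment\nenchantment\n' |
    sed -e '1,2s/enchantment/entrapment/g')
printf '%s\n' "$numeric_range"

# Regular expression range: the block opens at a blank line and closes at
# the first following line that starts with END. Lines outside it are untouched.
regex_range=$(printf 'hills outside\n\nhills inside\nEND of block\nhills after\n' |
    sed -e '/^$/,/^END/s/hills/mountains/g')
printf '%s\n' "$regex_range"
```

<p>
In the second command only the 'hills inside' line changes, because it is the
only line containing 'hills' that lies between the blank line and the 'END'
line.
</p>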

<p>
Another nice thing about the <c>s///</c> command is that we have a lot of
options when it comes to those <c>/</c> separators. If we're performing string
substitution and the regular expression or replacement string has a lot of
slashes in it, we can change the separator by specifying a different character
after the 's'. For example, this will replace all occurrences of
<path>/usr/local</path> with <path>/usr</path>:
</p>

<pre caption="Replacing all the occurrences of one string with another one">
$ <i>sed -e 's:/usr/local:/usr:g' mylist.txt</i>
</pre>

<note>
In this example, we're using the colon as a separator. If you ever need to
specify the separator character in the regular expression, put a backslash
before it.
</note>
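
<p>
As a rough illustration of why a different separator is convenient, the sketch
below performs the same edit with a colon and with the default slash, and then
escapes a literal colon inside the regular expression. The input strings are
invented for this example.
</p>

```shell
# Hypothetical input: two paths joined by a colon.
input='/usr/local/bin:/usr/local/lib'

# With a colon as the separator, the slashes in the paths need no escaping.
with_colon=$(printf '%s\n' "$input" | sed -e 's:/usr/local:/usr:g')

# The same edit with the default separator needs a backslash before each slash.
with_slash=$(printf '%s\n' "$input" | sed -e 's/\/usr\/local/\/usr/g')

# When the colon is the separator, a literal colon in the regular
# expression has to be written as \: to be matched.
escaped=$(printf 'PATH=/bin:/usr/bin\n' | sed -e 's:/bin\:/usr:/usr:')

printf '%s\n' "$with_colon" "$with_slash" "$escaped"
```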

</body>
</section>
<section>
<title>Regexp snafus</title>
<body>

<p>
Up until now, we've only performed simple string substitution. While this is
handy, we can also match a regular expression. For example, the following sed
command will match a phrase beginning with '<' and ending with '>', and
containing any number of characters in between. This phrase will be deleted
(replaced with an empty string):
</p>

<pre caption="Deleting specified phrase">
$ <i>sed -e 's/<.*>//g' myfile.html</i>
</pre>

<p>
This is a good first attempt at a sed script that will remove HTML tags from a
file, but it won't work well, due to a regular expression quirk. The reason?
When sed tries to match the regular expression on a line, it finds the longest
match on the line. This wasn't an issue in my previous sed article, because we
were using the <c>d</c> and <c>p</c> commands, which would delete or print the
entire line anyway. But when we use the <c>s///</c> command, it definitely makes
a big difference, because the entire portion that the regular expression matches
will be replaced with the target string, or in this case, deleted. This means
that the above example will turn the following line:
</p>

<pre caption="Sample HTML code">
<b>This</b> is what <b>I</b> meant.
</pre>

<p>
Into this:
</p>

<pre caption="Undesired effect">
meant.
</pre>

<p>
Rather than this, which is what we wanted to do:
</p>

<pre caption="Desired effect">
This is what I meant.
</pre>

<p>
Fortunately, there is an easy way to fix this. Instead of typing in a regular
expression that says "a '<' character followed by any number of characters, and
ending with a '>' character", we just need to type in a regexp that says "a
'<' character followed by any number of non-'>' characters, and ending
with a '>' character". This will have the effect of finding the shortest
possible match, rather than the longest possible one. The new command looks like
this:
</p>

<pre caption="Matching the shortest possible tag">
$ <i>sed -e 's/<[^>]*>//g' myfile.html</i>
</pre>

<p>
In the above example, the '[^>]' specifies a "non-'>'" character, and the '*'
after it completes this expression to mean "zero or more non-'>' characters".
Test this command on a few sample HTML files, pipe the output to more, and
review the results.
</p>
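
<p>
The two patterns can be compared directly on the sample line shown earlier. This
is only a quick sketch: the line is passed in through a pipe instead of being
read from myfile.html, and the variable names are arbitrary.
</p>

```shell
# The sample line from the text above.
html='<b>This</b> is what <b>I</b> meant.'

# Longest match: everything from the first '<' to the last '>' is removed,
# which leaves only the text after the final tag (including its leading space).
greedy=$(printf '%s\n' "$html" | sed -e 's/<.*>//g')

# Each match stops at the first '>', so only the tags themselves are removed.
tags_only=$(printf '%s\n' "$html" | sed -e 's/<[^>]*>//g')

printf '[%s]\n' "$greedy"
printf '[%s]\n' "$tags_only"
```

<p>
The square brackets in the output are there only to make the leading space left
behind by the first command visible.
</p>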

</body>
</section>
<section>
<title>More character matching</title>
<body>

<p>
The '[ ]' regular expression syntax has some additional options. To specify
a range of characters, you can use a '-' as long as it isn't in the first or
last position, as follows:



1.1 xml/htdocs/doc/en/articles/l-sed3.xml

file : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed3.xml?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=gentoo
plain: http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed3.xml?rev=1.1&content-type=text/plain&cvsroot=gentoo

Index: l-sed3.xml
===================================================================
<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/articles/l-sed3.xml,v 1.1 2005/07/26 10:46:47 jkt Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide link="/doc/en/articles/l-sed3.xml">
<title>Sed by example, Part 3</title>

<author title="Author">
<mail link="drobbins@g.o">Daniel Robbins</mail>
</author>
<author title="Editor">
<mail link="rane@××××××.pl">Łukasz Damentko</mail>
</author>

<abstract>
Sed is a very powerful and compact text stream editor. In this article, the
third and final one in the series, Daniel shows you several practical sed
scripts, including text format conversion, line reversal, and the conversion of
a Quicken .QIF file into a nicely formatted text file.
</abstract>

<!-- The original version of this article was published on IBM developerWorks,
and is property of Westtech Information Services. This document is an updated
version of the original article, and contains various improvements made by the
Gentoo Linux Documentation team -->

<version>1.0</version>
<date>2005-07-16</date>

<chapter>
<title>Taking it to the next level: Data crunching, sed style</title>
<section>
<title>Muscular sed</title>
<body>

<note>
The original version of this article was published on IBM developerWorks, and is
property of Westtech Information Services. This document is an updated version
of the original article, and contains various improvements made by the Gentoo
Linux Documentation team.
</note>

<p>
In <uri link="l-sed2.xml">my second sed article</uri>, I offered examples that
demonstrated how sed works, but very few of these examples actually did anything
particularly useful. In this final sed article, it's time to change that pattern
and put sed to good use. I'll show you several excellent examples that not only
demonstrate the power of sed, but also do some really neat (and handy) things.
For example, in the second half of the article, I'll show you how I designed a
sed script that converts a .QIF file from Intuit's Quicken financial program
into a nicely formatted text file. Before doing that, we'll take a look at some
less complicated yet useful sed scripts.
</p>

</body>
</section>
<section>
<title>Text translation</title>
<body>

<p>
Our first practical script converts UNIX-style text to DOS/Windows format. As
you probably know, DOS/Windows-based text files have a CR (carriage return) and
LF (line feed) at the end of each line, while UNIX text has only a line feed.
There may be times when you need to move some UNIX text to a Windows system, and
this script will perform the necessary format conversion for you.
</p>

<pre caption="Format conversion between UNIX and Windows">
$ <i>sed -e 's/$/\r/' myunix.txt > mydos.txt</i>
</pre>

<p>
In this script, the '$' regular expression will match the end of the line, and
the '\r' tells sed to insert a carriage return right before it. Insert a
carriage return before a line feed, and presto, a CR/LF ends each line. Please
note that the '\r' will be replaced with a CR only when using GNU sed 3.02.80 or
later. If you haven't installed GNU sed 3.02.80 yet, see <uri
link="l-sed1.xml">my first sed article</uri> for instructions on
how to do this.
</p>
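
<p>
Because a carriage return is invisible in most terminals, one way to check the
conversion is to count the CR bytes or dump the output with <c>od</c>. This is a
sketch under the same assumption the text makes, namely a GNU sed that turns
'\r' into a carriage return; the two input lines are made up.
</p>

```shell
# Convert two hypothetical UNIX-style lines and count the carriage returns
# in the result. tr -cd '\r' deletes every byte that is not a CR.
converted=$(printf 'one\ntwo\n' | sed -e 's/$/\r/' | tr -cd '\r' | wc -c)
# Arithmetic expansion strips any padding that wc may add around the number.
cr_count=$((converted))
printf 'carriage returns added: %s\n' "$cr_count"

# For a visual check, od -c shows each CR as \r in front of the \n.
printf 'one\ntwo\n' | sed -e 's/$/\r/' | od -c
```

<p>
With a sed that does not understand '\r' in the replacement, the count will
not match the number of input lines, which is a quick way to tell whether your
sed is recent enough.
</p>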

<p>
I can't tell you how many times I've downloaded some example script or C code,
only to find that it's in DOS/Windows format. While many programs don't mind
DOS/Windows format CR/LF text files, several programs definitely do -- the most
notable being bash, which chokes as soon as it encounters a carriage return. The
following sed invocation will convert DOS/Windows format text to trusty UNIX
format:
</p>

<pre caption="Converting C code from Windows to UNIX format">
$ <i>sed -e 's/.$//' mydos.txt > myunix.txt</i>
</pre>

<p>
The way this script works is simple: our substitution regular expression matches
the last character on the line, which happens to be a carriage return. We
replace it with nothing, causing it to be deleted from the output entirely. If
you use this script and notice that the last character of every line of the
output has been deleted, you've specified a text file that's already in UNIX
format. No need for that!
</p>
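
<p>
The pitfall described above can be avoided by matching the carriage return
explicitly instead of "any last character". The variant below is not from the
original article; it is a sketch that assumes a GNU sed in which '\r' is also
understood inside the regular expression, and the sample lines are invented.
</p>

```shell
# A line that is already in UNIX format (no trailing carriage return).
unix_line='already unix'

# The script from the text removes the last character, whatever it is.
blind=$(printf '%s\n' "$unix_line" | sed -e 's/.$//')

# Matching \r explicitly leaves a UNIX-format line untouched...
guarded=$(printf '%s\n' "$unix_line" | sed -e 's/\r$//')

# ...and still strips the carriage return from a DOS-format line.
dos_fixed=$(printf 'from dos\r\n' | sed -e 's/\r$//')

printf '[%s]\n' "$blind" "$guarded" "$dos_fixed"
```

<p>
The guarded form can be run on a file of unknown origin without the risk of
eating the last character of every line.
</p>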

</body>
</section>
<section>
<title>Reversing lines</title>
<body>

<p>
Here's another handy little script. This one will reverse lines in a file,
similar to the "tac" command that's included with most Linux distributions. The
name "tac" may be a bit misleading, because "tac" doesn't reverse the position
of characters on the line (left and right), but rather the position of lines in
the file (up and down). Running "tac" on the following file:
</p>

<pre caption="Sample file">
foo
bar
oni
</pre>

<p>
...produces the following output:
</p>

<pre caption="Output file">
oni
bar
foo
</pre>

<p>
We can do the same thing with the following sed script:
</p>

<pre caption="Doing the same with a sed script">
$ <i>sed -e '1!G;h;$!d' forward.txt > backward.txt</i>
</pre>

<p>
You'll find this sed script useful if you're logged in to a FreeBSD system,
which doesn't happen to have a "tac" command. Handy as the script is, it's also
a good idea to know why it does what it does. Let's dissect it.
</p>

</body>
</section>
<section>
<title>Reversal explained</title>
<body>

<p>
First, this script contains three separate sed commands, separated by
semicolons: '1!G', 'h' and '$!d'. Now, it's time to get a good understanding of
the addresses used for the first and third commands. If the first command were
'1G', the 'G' command would be applied only to the first line. However, there is
an additional '!' character -- this '!' character negates the address, meaning
that the 'G' command will apply to all but the first line. For the '$!d'
command, we have a similar situation. If the command were '$d', it would apply
the 'd' command to only the last line in the file (the '$' address is a simple
way of specifying the last line). However, with the '!', '$!d' will apply the
'd' command to all but the last line. Now, all we need to do is understand what
the commands themselves do.
</p>

<p>
When we execute our line reversal script on the text file above, the first
command that gets executed is 'h'. This command tells sed to copy the contents
of the pattern space (the buffer that holds the current line being worked on) to
the hold space (a temporary buffer). Then, the 'd' command is executed, which
deletes "foo" from the pattern space, so it doesn't get printed after all the
commands are executed for this line.
</p>

<p>
Now, line two. After "bar" is read into the pattern space, the 'G' command is
executed, which appends the contents of the hold space ("foo\n") to the pattern
space ("bar\n"), resulting in "bar\nfoo\n" in our pattern space. The 'h'
command puts this back in the hold space for safekeeping, and 'd' deletes the
line from the pattern space so that it isn't printed.
</p>

<p>
For the last "oni" line, the same steps are repeated, except that the contents
of the pattern space aren't deleted (due to the '$!' before the 'd'), and the
contents of the pattern space (three lines) are printed to stdout.
</p>
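
<p>
The walk-through above can be replayed on the three-line sample without creating
forward.txt. In this sketch the input is piped in, and the comments simply
restate, line by line, what the text says each command does.
</p>

```shell
# Line 1 "foo": 1!G is skipped, h copies "foo" to the hold space,
#               $!d deletes the pattern space, so nothing is printed.
# Line 2 "bar": G appends the hold space, giving "bar" followed by "foo";
#               h saves that, and $!d again suppresses printing.
# Line 3 "oni": G gives "oni", "bar", "foo"; h saves it; $!d does not apply
#               to the last line, so the three lines are printed.
reversed=$(printf 'foo\nbar\noni\n' | sed -e '1!G;h;$!d')
printf '%s\n' "$reversed"
```

<p>
Keep in mind that the whole file accumulates in the hold space before anything
is printed, so this approach is best suited to inputs that comfortably fit in
memory.
</p>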

<p>
Now, it's time to do some powerful data conversion with sed.
</p>

</body>
</section>
<section>
<title>sed QIF magic</title>



--
gentoo-doc-cvs@g.o mailing list