1 |
neysx 05/07/28 08:04:04 |
2 |
|
3 |
Added: xml/htdocs/doc/en/articles l-awk1.xml l-awk2.xml l-awk3.xml |
4 |
Log: |
5 |
#99260 xmlified awk articles |
6 |
|
7 |
Revision Changes Path |
8 |
1.1 xml/htdocs/doc/en/articles/l-awk1.xml |
9 |
|
10 |
file : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-awk1.xml?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=gentoo |
11 |
plain: http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-awk1.xml?rev=1.1&content-type=text/plain&cvsroot=gentoo |
12 |
|
13 |
Index: l-awk1.xml |
14 |
=================================================================== |
15 |
<?xml version='1.0' encoding="UTF-8"?> |
16 |
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/articles/l-awk1.xml,v 1.1 2005/07/28 08:04:04 neysx Exp $ --> |
17 |
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> |
18 |
|
19 |
<guide link="/doc/en/articles/l-awk1.xml"> |
20 |
<title>Awk by example, Part 1</title> |
21 |
|
22 |
<author title="Author"> |
23 |
<mail link="drobbins@g.o">Daniel Robbins</mail> |
24 |
</author> |
25 |
<author title="Editor"> |
26 |
<mail link="rane@××××××.pl">Łukasz Damentko</mail> |
27 |
</author> |
28 |
|
29 |
<abstract> |
30 |
Awk is a very nice language with a very strange name. In this first article of a |
31 |
three-part series, Daniel Robbins will quickly get your awk programming skills |
32 |
up to speed. As the series progresses, more advanced topics will be covered, |
33 |
culminating with an advanced real-world awk application demo. |
34 |
</abstract> |
35 |
|
36 |
<!-- The original version of this article was published on IBM developerWorks, |
37 |
and is property of Westtech Information Services. This document is an updated |
38 |
version of the original article, and contains various improvements made by the |
39 |
Gentoo Linux Documentation team --> |
40 |
|
41 |
<version>1.0</version> |
42 |
<date>2005-07-15</date> |
43 |
|
44 |
<chapter> |
45 |
<title>An intro to the great language with the strange name</title> |
46 |
<section> |
47 |
<title>In defense of awk</title> |
48 |
<body> |
49 |
|
50 |
<note> |
51 |
The original version of this article was published on IBM developerWorks, and is |
52 |
property of Westtech Information Services. This document is an updated version |
53 |
of the original article, and contains various improvements made by the Gentoo |
54 |
Linux Documentation team. |
55 |
</note> |
56 |
|
57 |
<p> |
58 |
In this series of articles, I'm going to turn you into a proficient awk coder. |
59 |
I'll admit, awk doesn't have a very pretty or particularly "hip" name, and the |
60 |
GNU version of awk, called gawk, sounds downright weird. Those unfamiliar with |
61 |
the language may hear "awk" and think of a mess of code so backwards and |
62 |
antiquated that it's capable of driving even the most knowledgeable UNIX guru to |
63 |
the brink of insanity (causing him to repeatedly yelp "kill -9!" as he runs for |
64 |
the coffee machine).
65 |
</p> |
66 |
|
67 |
<p> |
68 |
Sure, awk doesn't have a great name. But it is a great language. Awk is geared |
69 |
toward text processing and report generation, yet offers many well-designed
70 |
features that allow for serious programming. And, unlike some languages, awk's |
71 |
syntax is familiar, and borrows some of the best parts of languages like C, |
72 |
Python, and bash (although, technically, awk was created before both Python and
73 |
bash). Awk is one of those languages that, once learned, will become a key part |
74 |
of your strategic coding arsenal. |
75 |
</p> |
76 |
|
77 |
</body> |
78 |
</section> |
79 |
<section> |
80 |
<title>The first awk</title> |
81 |
<body> |
82 |
|
83 |
<p> |
84 |
You should see the contents of your <path>/etc/passwd</path> file appear before |
85 |
your eyes. Now, for an explanation of what awk did. When we called awk, we |
86 |
specified <path>/etc/passwd</path> as our input file. When we executed awk, it |
87 |
evaluated the print command for each line in <path>/etc/passwd</path>, in |
88 |
order. All output is sent to stdout, and we get a result identical to catting |
89 |
<path>/etc/passwd</path>.
90 |
</p> |
91 |
|
92 |
<p> |
93 |
Now, for an explanation of the { print } code block. In awk, curly braces are |
94 |
used to group blocks of code together, similar to C. Inside our block of code, |
95 |
we have a single print command. In awk, when a print command appears by itself, |
96 |
the full contents of the current line are printed. |
97 |
</p> |
98 |
|
99 |
<pre caption="Printing the current line"> |
100 |
$ <i>awk '{ print $0 }' /etc/passwd</i> |
101 |
$ <i>awk '{ print "" }' /etc/passwd</i> |
102 |
</pre> |
103 |
|
104 |
<p> |
105 |
In awk, the $0 variable represents the entire current line, so print and print |
106 |
$0 do exactly the same thing. |
107 |
</p> |
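<p>
As a quick check of this equivalence, the sketch below pipes two made-up lines
into awk instead of reading <path>/etc/passwd</path>. The sample text is an
assumption chosen only for illustration. Both commands should echo the input
unchanged.
</p>

```shell
# Hypothetical two-line input, piped in so the example does not depend on /etc/passwd
printf 'alpha beta\ngamma delta\n' | awk '{ print }'
# $0 is the whole current line, so this prints the same two lines again
printf 'alpha beta\ngamma delta\n' | awk '{ print $0 }'
```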
108 |
|
109 |
<pre caption="Filling the screen with some text"> |
110 |
$ <i>awk '{ print "hiya" }' /etc/passwd</i> |
111 |
</pre> |
112 |
|
113 |
</body> |
114 |
</section> |
115 |
<section> |
116 |
<title>Multiple fields</title> |
117 |
<body> |
118 |
|
119 |
<pre caption="print $1 $3">
120 |
$ <i>awk -F":" '{ print $1 $3 }' /etc/passwd</i> |
121 |
halt7 |
122 |
operator11 |
123 |
root0 |
124 |
shutdown6 |
125 |
sync5 |
126 |
bin1 |
127 |
<comment>....etc.</comment> |
128 |
</pre> |
129 |
|
130 |
<pre caption="Separating the fields with a space">
131 |
$ <i>awk -F":" '{ print $1 " " $3 }' /etc/passwd</i> |
132 |
</pre> |
133 |
|
134 |
<pre caption="Adding text labels">
135 |
$ <i>awk -F":" '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd</i>
136 |
username: halt uid:7 |
137 |
username: operator uid:11 |
138 |
username: root uid:0 |
139 |
username: shutdown uid:6 |
140 |
username: sync uid:5 |
141 |
username: bin uid:1 |
142 |
<comment>....etc.</comment> |
143 |
</pre> |
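
<p>
To try the field examples without depending on your own passwd file, here is a
small sketch run against two invented passwd-style lines. The sample entries are
assumptions, not taken from a real system.
</p>

```shell
# Two invented passwd-style records; fields are separated by colons
printf 'root:x:0:0:root:/root:/bin/bash\nhalt:x:7:0:halt:/sbin:/sbin/halt\n' |
  awk -F":" '{ print "username: " $1 "\t\tuid:" $3 }'
```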
144 |
|
145 |
</body> |
146 |
</section> |
147 |
<section> |
148 |
<title>External scripts</title> |
149 |
<body> |
150 |
|
151 |
<pre caption="Sample script"> |
152 |
BEGIN { FS=":" } |
153 |
{ print $1 } |
154 |
</pre> |
155 |
|
156 |
<p> |
157 |
The difference between these two methods has to do with how we set the field |
158 |
separator. In this script, the field separator is specified within the code |
159 |
itself (by setting the FS variable), while our previous example set FS by |
160 |
passing the -F":" option to awk on the command line. It's generally best to set |
161 |
the field separator inside the script itself, simply because it means you have |
162 |
one less command line argument to remember to type. We'll cover the FS variable |
163 |
in more detail later in this article. |
164 |
</p> |
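
<p>
The two ways of setting the field separator can be compared side by side. This
is only a sketch on invented colon-delimited input; both commands should print
the same first field for every line.
</p>

```shell
# Field separator passed on the command line
printf 'root:x:0:0\nbin:x:1:1\n' | awk -F":" '{ print $1 }'
# Field separator set inside the program, in a BEGIN block
printf 'root:x:0:0\nbin:x:1:1\n' | awk 'BEGIN { FS=":" } { print $1 }'
```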
165 |
|
166 |
</body> |
167 |
</section> |
168 |
<section> |
169 |
<title>The BEGIN and END blocks</title> |
170 |
<body> |
171 |
|
172 |
<p> |
173 |
Normally, awk executes each block of your script's code once for each input |
174 |
line. However, there are many programming situations where you may need to |
175 |
execute initialization code before awk begins processing the text from the input |
176 |
file. For such situations, awk allows you to define a BEGIN block. We used a |
177 |
BEGIN block in the previous example. Because the BEGIN block is evaluated before |
178 |
awk starts processing the input file, it's an excellent place to initialize the |
179 |
FS (field separator) variable, print a heading, or initialize other global |
180 |
variables that you'll reference later in the program. |
181 |
</p> |
182 |
|
183 |
<p> |
184 |
Awk also provides another special block, called the END block. Awk executes this |
185 |
block after all lines in the input file have been processed. Typically, the END |
186 |
block is used to perform final calculations or print summaries that should |
187 |
appear at the end of the output stream. |
188 |
</p> |
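
<p>
A minimal sketch of the three kinds of block working together, assuming three
invented colon-delimited records: BEGIN sets FS and prints a heading, the main
block runs once per line, and END prints a summary. The variable name count is
made up for this example.
</p>

```shell
printf 'root:x:0\nbin:x:1\nsync:x:5\n' | awk '
BEGIN { FS=":"; print "users:" }     # runs once, before any input is read
      { count++; print "  " $1 }     # runs once for every input line
END   { print count " accounts" }    # runs once, after the last line
'
```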
189 |
|
190 |
</body> |
191 |
</section> |
192 |
<section> |
193 |
<title>Regular expressions and blocks</title> |
194 |
<body> |
195 |
|
196 |
<pre caption="Regular expressions and blocks"> |
197 |
/foo/ { print } |
198 |
/[0-9]+\.[0-9]*/ { print } |
199 |
</pre> |
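
<p>
To see how such patterns select lines, the sketch below runs each one against
four invented input lines. Only lines that match the regular expression reach
the block.
</p>

```shell
# Only the line containing digits followed by a period matches this pattern
printf 'foo bar\nprice 3.50\nplain text\nversion 7\n' |
  awk '/[0-9]+\.[0-9]*/ { print }'
# Only the line containing "foo" matches this pattern
printf 'foo bar\nprice 3.50\nplain text\nversion 7\n' |
  awk '/foo/ { print }'
```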
200 |
|
201 |
</body> |
202 |
</section> |
203 |
<section> |
204 |
<title>Expressions and blocks</title> |
205 |
<body> |
206 |
|
207 |
<pre caption="fredprint"> |
208 |
$1 == "fred" { print $3 } |
209 |
</pre> |
210 |
|
211 |
<pre caption="root"> |
212 |
$5 ~ /root/ { print $3 } |
213 |
</pre> |
214 |
|
215 |
|
216 |
|
217 |
1.1 xml/htdocs/doc/en/articles/l-awk2.xml |
218 |
|
219 |
file : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-awk2.xml?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=gentoo |
220 |
plain: http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-awk2.xml?rev=1.1&content-type=text/plain&cvsroot=gentoo |
221 |
|
222 |
Index: l-awk2.xml |
223 |
=================================================================== |
224 |
<?xml version='1.0' encoding="UTF-8"?> |
225 |
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/articles/l-awk2.xml,v 1.1 2005/07/28 08:04:04 neysx Exp $ --> |
226 |
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> |
227 |
|
228 |
<guide link="/doc/en/articles/l-awk2.xml"> |
229 |
<title>Awk by example, Part 2</title> |
230 |
|
231 |
<author title="Author"> |
232 |
<mail link="drobbins@g.o">Daniel Robbins</mail> |
233 |
</author> |
234 |
<author title="Editor"> |
235 |
<mail link="rane@××××××.pl">Łukasz Damentko</mail> |
236 |
</author> |
237 |
|
238 |
<abstract> |
239 |
In this sequel to his previous intro to awk, Daniel Robbins continues to explore |
240 |
awk, a great language with a strange name. Daniel will show you how to handle |
241 |
multi-line records, use looping constructs, and create and use awk arrays. By |
242 |
the end of this article, you'll be well versed in a wide range of awk features, |
243 |
and you'll be ready to write your own powerful awk scripts. |
244 |
</abstract> |
245 |
|
246 |
<!-- The original version of this article was published on IBM developerWorks, |
247 |
and is property of Westtech Information Services. This document is an updated |
248 |
version of the original article, and contains various improvements made by the |
249 |
Gentoo Linux Documentation team --> |
250 |
|
251 |
<version>1.0</version> |
252 |
<date>2005-07-27</date> |
253 |
|
254 |
<chapter> |
255 |
<title>Records, loops, and arrays</title> |
256 |
<section> |
257 |
<title>Multi-line records</title> |
258 |
<body> |
259 |
|
260 |
<note> |
261 |
The original version of this article was published on IBM developerWorks, and is |
262 |
property of Westtech Information Services. This document is an updated version |
263 |
of the original article, and contains various improvements made by the Gentoo |
264 |
Linux Documentation team. |
265 |
</note> |
266 |
|
267 |
<p> |
268 |
Awk is an excellent tool for reading in and processing structured data, such as |
269 |
the system's <path>/etc/passwd</path> file. <path>/etc/passwd</path> is the UNIX |
270 |
user database, and is a colon-delimited text file, containing a lot of important |
271 |
information, including all existing user accounts and user IDs, among other |
272 |
things. In <uri link="/doc/en/articles/l-awk1.xml">my previous article</uri>, I |
273 |
showed you how awk could easily parse this file. All we had to do was to set the |
274 |
FS (field separator) variable to ":". |
275 |
</p> |
276 |
|
277 |
<p> |
278 |
By setting the FS variable correctly, awk can be configured to parse almost any |
279 |
kind of structured data, as long as there is one record per line. However, just |
280 |
setting FS won't do us any good if we want to parse a record that exists over |
281 |
multiple lines. In these situations, we also need to modify the RS record |
282 |
separator variable. The RS variable tells awk when the current record ends and a |
283 |
new record begins. |
284 |
</p> |
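
<p>
As a small illustration of RS on its own, the sketch below assumes a made-up
input in which records end with a semicolon rather than a newline. Setting RS to
";" lets awk split it into three records, and FS then splits each record into
fields as usual.
</p>

```shell
# Hypothetical input: records end with ";" and fields are split on ":"
printf 'a:1;b:2;c:3' | awk 'BEGIN { FS=":"; RS=";" } { print $1, $2 }'
```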
285 |
|
286 |
<p> |
287 |
As an example, let's look at how we'd handle the task of processing an address |
288 |
list of Federal Witness Protection Program participants: |
289 |
</p> |
290 |
|
291 |
<pre caption="Sample entry from Federal Witness Protection Program participants list"> |
292 |
Jimmy the Weasel
100 Pleasant Drive
San Francisco, CA 12345

Big Tony
200 Incognito Ave.
Suburbia, WA 67890
298 |
</pre> |
299 |
|
300 |
<p> |
301 |
Ideally, we'd like awk to recognize each 3-line address as an individual record, |
302 |
rather than as three separate records. It would make our code a lot simpler if |
303 |
awk would recognize the first line of the address as the first field ($1), the |
304 |
street address as the second field ($2), and the city, state, and zip code as |
305 |
field $3. The following code will do just what we want: |
306 |
</p> |
307 |
|
308 |
<pre caption="Making one field from the address"> |
309 |
BEGIN { |
310 |
FS="\n" |
311 |
RS="" |
312 |
} |
313 |
</pre> |
314 |
|
315 |
<p> |
316 |
Above, setting FS to "\n" tells awk that each field appears on its own line. By |
317 |
setting RS to "", we also tell awk that each address record is separated by a |
318 |
blank line. Once awk knows how the input is formatted, it can do all the parsing |
319 |
work for us, and the rest of the script is simple. Let's look at a complete |
320 |
script that will parse this address list and print out each address record on a |
321 |
single line, separating each field with a comma. |
322 |
</p> |
323 |
|
324 |
<pre caption="Complete script"> |
325 |
BEGIN { |
326 |
FS="\n" |
327 |
RS="" |
328 |
} |
329 |
{ print $1 ", " $2 ", " $3 } |
330 |
</pre> |
331 |
|
332 |
|
333 |
<p> |
334 |
If this script is saved as <path>address.awk</path>, and the address data is |
335 |
stored in a file called <path>address.txt</path>, you can execute this script by |
336 |
typing <c>awk -f address.awk address.txt</c>. This code produces the following |
337 |
output: |
338 |
</p> |
339 |
|
340 |
<pre caption="The script's output"> |
341 |
Jimmy the Weasel, 100 Pleasant Drive, San Francisco, CA 12345 |
342 |
Big Tony, 200 Incognito Ave., Suburbia, WA 67890 |
343 |
</pre> |
344 |
|
345 |
</body> |
346 |
</section> |
347 |
<section> |
348 |
<title>OFS and ORS</title> |
349 |
<body> |
350 |
|
351 |
<p> |
352 |
In address.awk's print statement, you can see that awk concatenates (joins) |
353 |
strings that are placed next to each other on a line. We used this feature to |
354 |
insert a comma and a space (", ") between the three address fields that appeared |
355 |
on the line. While this method works, it's a bit ugly looking. Rather than |
356 |
inserting literal ", " strings between our fields, we can have awk do it for us |
357 |
by setting a special awk variable called OFS. Take a look at this code snippet. |
358 |
</p> |
359 |
|
360 |
<pre caption="Sample code snippet"> |
361 |
print "Hello", "there", "Jim!" |
362 |
</pre> |
363 |
|
364 |
<p> |
365 |
The commas on this line are not part of the actual literal strings. Instead, |
366 |
they tell awk that "Hello", "there", and "Jim!" are separate fields, and that |
367 |
the OFS variable should be printed between each string. By default, awk produces |
368 |
the following output: |
369 |
</p> |
370 |
|
371 |
<pre caption="Output produced by awk"> |
372 |
Hello there Jim! |
373 |
</pre> |
374 |
|
375 |
<p> |
376 |
This shows us that by default, OFS is set to " ", a single space. However, we |
377 |
can easily redefine OFS so that awk will insert our favorite field separator. |
378 |
Here's a revised version of our original <path>address.awk</path> program that |
379 |
uses OFS to output those intermediate ", " strings: |
380 |
</p> |
381 |
|
382 |
<pre caption="Redefining OFS"> |
383 |
BEGIN { |
384 |
FS="\n" |
385 |
RS="" |
386 |
OFS=", " |
387 |
} |
388 |
{ print $1, $2, $3 } |
389 |
</pre> |
390 |
|
391 |
<p> |
392 |
Awk also has a special variable named ORS, the "output record
393 |
separator". By setting ORS, which defaults to a newline ("\n"), we can control |
394 |
the character that's automatically printed at the end of a print statement. The |
395 |
default ORS value causes awk to output each new print statement on a new line. |
396 |
If we wanted to make the output double-spaced, we would set ORS to "\n\n". Or, |
397 |
if we wanted records to be separated by a single space (and no newline), we |
398 |
would set ORS to " ". |
399 |
</p> |
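
<p>
Here is a short sketch of OFS and ORS together, run on two invented input lines.
OFS is inserted wherever a comma separates the arguments to print, and ORS is
appended after each print statement, so this output should come out
comma-separated and double-spaced.
</p>

```shell
# OFS goes between the comma-separated print arguments, ORS after each print
printf 'a b c\nd e f\n' |
  awk 'BEGIN { OFS=", "; ORS="\n\n" } { print $1, $2, $3 }'
```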
400 |
|
401 |
</body> |
402 |
</section> |
403 |
<section> |
404 |
<title>Multi-line to tabbed</title> |
405 |
<body> |
406 |
|
407 |
<p> |
408 |
Let's say that we wrote a script that converted our address list to a |
409 |
single-line per record, tab-delimited format for import into a spreadsheet. |
410 |
After using a slightly modified version of <path>address.awk</path>, it would |
411 |
become clear that our program only works for three-line addresses. If awk |
412 |
encountered the following address, the fourth line would be thrown away and not |
413 |
printed: |
414 |
</p> |
415 |
|
416 |
<pre caption="Sample entry"> |
417 |
Cousin Vinnie |
418 |
Vinnie's Auto Shop |
419 |
300 City Alley |
420 |
Sosueme, OR 76543 |
421 |
</pre> |
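
<p>
The problem can be reproduced with a short sketch. It assumes a tab as the
output separator and uses awk's built-in NF variable, which holds the number of
fields in the current record. Printing only $1, $2 and $3 loses the fourth line,
even though awk reports four fields.
</p>

```shell
# The four-line entry from above, as a single blank-line-delimited record
printf "Cousin Vinnie\nVinnie's Auto Shop\n300 City Alley\nSosueme, OR 76543\n" |
  awk 'BEGIN { FS="\n"; RS=""; OFS="\t" }
       { print $1, $2, $3               # the fourth line is silently dropped
         print "fields in record: " NF  # NF is the number of fields awk found
       }'
```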
422 |
|
423 |
|
424 |
|
425 |
|
426 |
1.1 xml/htdocs/doc/en/articles/l-awk3.xml |
427 |
|
428 |
file : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-awk3.xml?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=gentoo |
429 |
plain: http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-awk3.xml?rev=1.1&content-type=text/plain&cvsroot=gentoo |
430 |
|
431 |
Index: l-awk3.xml |
432 |
=================================================================== |
433 |
<?xml version='1.0' encoding="UTF-8"?> |
434 |
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/articles/l-awk3.xml,v 1.1 2005/07/28 08:04:04 neysx Exp $ --> |
435 |
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> |
436 |
|
437 |
<guide link="/doc/en/articles/l-awk3.xml"> |
438 |
<title>Awk by example, Part 3</title> |
439 |
|
440 |
<author title="Author"> |
441 |
<mail link="drobbins@g.o">Daniel Robbins</mail> |
442 |
</author> |
443 |
<author title="Editor"> |
444 |
<mail link="rane@××××××.pl">Łukasz Damentko</mail> |
445 |
</author> |
446 |
|
447 |
<abstract> |
448 |
In this conclusion to the three-part awk series, Daniel Robbins covers formatted
output with printf() and sprintf() and walks through awk's string functions,
including length(), index(), tolower(), toupper(), substr(), and match().
453 |
</abstract> |
454 |
|
455 |
<!-- The original version of this article was published on IBM developerWorks, |
456 |
and is property of Westtech Information Services. This document is an updated |
457 |
version of the original article, and contains various improvements made by the |
458 |
Gentoo Linux Documentation team --> |
459 |
|
460 |
<version>1.0</version> |
461 |
<date>2005-07-27</date> |
462 |
|
463 |
<chapter> |
464 |
<title>String functions and ... checkbooks?</title> |
465 |
<section> |
466 |
<title>Formatting output</title> |
467 |
<body> |
468 |
|
469 |
<p> |
470 |
While awk's print statement does do the job most of the time, sometimes more is |
471 |
needed. For those times, awk offers two good old friends called printf() and |
472 |
sprintf(). Yes, these functions, like so many other awk parts, closely follow
473 |
their C counterparts. printf() will print a formatted string to stdout, while |
474 |
sprintf() returns a formatted string that can be assigned to a variable. If |
475 |
you're not familiar with printf() and sprintf(), an introductory C text will |
476 |
quickly get you up to speed on these two essential printing functions. You can |
477 |
view the printf() man page by typing "man 3 printf" on your Linux system. |
478 |
</p> |
479 |
|
480 |
<p> |
481 |
Here's some sample awk sprintf() and printf() code. As you can see, everything |
482 |
looks almost identical to C. |
483 |
</p> |
484 |
|
485 |
<pre caption="Sample awk sprintf() and printf() code"> |
486 |
x=1 |
487 |
b="foo" |
488 |
printf("%s got a %d on the last test\n","Jim",83) |
489 |
myout=sprintf("%s-%d",b,x)
490 |
print myout |
491 |
</pre> |
492 |
|
493 |
<p> |
494 |
This code will print: |
495 |
</p> |
496 |
|
497 |
<pre caption="Code output"> |
498 |
Jim got a 83 on the last test |
499 |
foo-1 |
500 |
</pre> |
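
<p>
To run statements like these directly from the shell, one option is to wrap them
in a BEGIN block so that awk needs no input file. The last line is an added
illustration of field widths and is not part of the original sample.
</p>

```shell
awk 'BEGIN {
    x = 1
    b = "foo"
    printf("%s got a %d on the last test\n", "Jim", 83)
    myout = sprintf("%s-%d", b, x)   # sprintf() returns the string instead of printing it
    print myout
    printf("%-10s|%5d|\n", "total", 42)   # left-justify in 10 columns, right-justify in 5
}'
```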
501 |
|
502 |
</body> |
503 |
</section> |
504 |
<section> |
505 |
<title>String functions</title> |
506 |
<body> |
507 |
|
508 |
<p> |
509 |
Awk has a plethora of string functions, and that's a good thing. In awk, you |
510 |
really need string functions, since you can't treat a string as an array of |
511 |
characters as you can in other languages like C, C++, and Python. For example, |
512 |
if you execute the following code: |
513 |
</p> |
514 |
|
515 |
<pre caption="Example code"> |
516 |
mystring="How are you doing today?" |
517 |
print mystring[3] |
518 |
</pre> |
519 |
|
520 |
<p> |
521 |
You'll receive an error that looks something like this: |
522 |
</p> |
523 |
|
524 |
<pre caption="Example code error"> |
525 |
awk: string.gawk:59: fatal: attempt to use scalar as array |
526 |
</pre> |
527 |
|
528 |
<p> |
529 |
Oh, well. While not as convenient as Python's sequence types, awk's string |
530 |
functions get the job done. Let's take a look at them. |
531 |
</p> |
532 |
|
533 |
<p> |
534 |
First, we have the basic length() function, which returns the length of a |
535 |
string. Here's how to use it: |
536 |
</p> |
537 |
|
538 |
<pre caption="length() function example"> |
539 |
print length(mystring) |
540 |
</pre> |
541 |
|
542 |
<p> |
543 |
This code will print the value: |
544 |
</p> |
545 |
|
546 |
<pre caption="Printed value"> |
547 |
24 |
548 |
</pre> |
549 |
|
550 |
<p> |
551 |
OK, let's keep going. The next string function is called index, and will return |
552 |
the position of the first occurrence of a substring in another string, or it will
553 |
return 0 if the string isn't found. Using mystring, we can call it this way: |
554 |
</p> |
555 |
|
556 |
<pre caption="index() function example">
557 |
print index(mystring,"you") |
558 |
</pre> |
559 |
|
560 |
<p> |
561 |
Awk prints: |
562 |
</p> |
563 |
|
564 |
<pre caption="Function output"> |
565 |
9 |
566 |
</pre> |
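
<p>
Both functions can be tried in one self-contained sketch. It assumes the same
mystring value used above and adds a search for "xyz", a substring chosen only
to show the 0 that index() returns when nothing is found.
</p>

```shell
awk 'BEGIN {
    mystring = "How are you doing today?"
    print length(mystring)         # number of characters in the string
    print index(mystring, "you")   # 1-based position of the first match
    print index(mystring, "xyz")   # 0 means the substring was not found
}'
```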
567 |
|
568 |
<p> |
569 |
We move on to two more easy functions, tolower() and toupper(). As you might |
570 |
guess, these functions will return the string with all characters converted to |
571 |
lowercase or uppercase respectively. Notice that tolower() and toupper() return |
572 |
the new string, and don't modify the original. This code: |
573 |
</p> |
574 |
|
575 |
<pre caption="Converting strings to lower or uppercase"> |
576 |
print tolower(mystring) |
577 |
print toupper(mystring) |
578 |
print mystring |
579 |
</pre> |
580 |
|
581 |
<p> |
582 |
....will produce this output: |
583 |
</p> |
584 |
|
585 |
<pre caption="Output"> |
586 |
how are you doing today? |
587 |
HOW ARE YOU DOING TODAY? |
588 |
How are you doing today? |
589 |
</pre> |
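
<p>
One common use for these functions is case-insensitive matching. The sketch
below is an illustration on invented input, not part of the original article: it
lowercases the first field before comparing it, so "Fred" and "FRED" are both
selected.
</p>

```shell
# Compare the lowercased first field; the input line itself is left untouched
printf 'Fred 42\nFRED 7\nwilma 9\n' | awk 'tolower($1) == "fred" { print $2 }'
```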
590 |
|
591 |
<p> |
592 |
So far so good, but how exactly do we select a substring or even a single |
593 |
character from a string? That's where substr() comes in. Here's how to call |
594 |
substr(): |
595 |
</p> |
596 |
|
597 |
<pre caption="substr() function example"> |
598 |
mysub=substr(mystring,startpos,maxlen) |
599 |
</pre> |
600 |
|
601 |
<p> |
602 |
mystring should be either a string variable or a literal string from which you'd |
603 |
like to extract a substring. startpos should be set to the starting character |
604 |
position, and maxlen should contain the maximum length of the string you'd like |
605 |
to extract. Notice that I said maximum length; if length(mystring) is shorter |
606 |
than startpos+maxlen, your result will be truncated. substr() won't modify the |
607 |
original string, but returns the substring instead. Here's an example: |
608 |
</p> |
609 |
|
610 |
<pre caption="Another example"> |
611 |
print substr(mystring,9,3) |
612 |
</pre> |
613 |
|
614 |
<p> |
615 |
Awk will print: |
616 |
</p> |
617 |
|
618 |
<pre caption="What awk prints"> |
619 |
you |
620 |
</pre> |
621 |
|
622 |
<p> |
623 |
If you regularly program in a language that uses array indices to access parts |
624 |
of a string (and who doesn't), make a mental note that substr() is your awk |
625 |
substitute. You'll need to use it to extract single characters and substrings; |
626 |
because awk is a string-based language, you'll be using it often. |
627 |
</p> |
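
<p>
A few substr() calls side by side may make this concrete. This is a sketch using
the same mystring value as before. The positions and lengths are picked for
illustration, and the last call deliberately asks for more characters than
remain.
</p>

```shell
awk 'BEGIN {
    mystring = "How are you doing today?"
    print substr(mystring, 9, 3)      # three characters starting at position 9
    print substr(mystring, 1, 1)      # a single character: the first one
    print substr(mystring, 19, 100)   # maxlen runs past the end, so the result is cut short
}'
```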
628 |
|
629 |
<p> |
630 |
Now, we move on to some meatier functions, the first of which is called match(). |
631 |
match() is a lot like index(), except instead of searching for a substring like |
632 |
|
633 |
|
634 |
|
635 |
-- |
636 |
gentoo-doc-cvs@g.o mailing list |