[gentoo-doc-cvs] cvs commit: l-sed1.xml - gentoo-doc-cvs

From:	Jan Kundrat <jkt@×××××××××××.org>
To:	gentoo-doc-cvs@l.g.o
Subject:	[gentoo-doc-cvs] cvs commit: l-sed1.xml
Date:	Tue, 26 Jul 2005 10:47:06
Message-Id:	`200507261046.j6QAkpVW027955@robin.gentoo.org`

jkt         05/07/26 10:46:47

  Added:       xml/htdocs/doc/en/articles l-sed1.xml l-sed2.xml l-sed3.xml
  Log:
  #99049, "Common threads: Sed by example", converted by rane

Revision  Changes    Path
1.1                  xml/htdocs/doc/en/articles/l-sed1.xml

file : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed1.xml?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=gentoo
plain: http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed1.xml?rev=1.1&content-type=text/plain&cvsroot=gentoo

Index: l-sed1.xml
===================================================================

<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/articles/l-sed1.xml,v 1.1 2005/07/26 10:46:47 jkt Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide link="/doc/en/articles/l-sed1.xml">
<title>Sed by example, Part 1</title>

<author title="Author">
  <mail link="drobbins@g.o">Daniel Robbins</mail>
</author>
<author title="Editor">
  <mail link="rane@××××××.pl">Łukasz Damentko</mail>
</author>

<abstract>
In this series of articles, Daniel Robbins will show you how to use the very
powerful (but often forgotten) UNIX stream editor, sed. Sed is an ideal tool for
batch-editing files or for creating shell scripts to modify existing files in
powerful ways.
</abstract>

<!-- The original version of this article was published on IBM developerWorks,
and is property of Westtech Information Services. This document is an updated
version of the original article, and contains various improvements made by the
Gentoo Linux Documentation team -->

<version>1.0</version>
<date>2005-07-15</date>

<chapter>
<title>Get to know the powerful UNIX editor</title>
<section>
<title>Pick an editor</title>
<body>

<note>
The original version of this article was published on IBM developerWorks, and is
property of Westtech Information Services. This document is an updated version
of the original article, and contains various improvements made by the Gentoo
Linux Documentation team.
</note>

<p>
In the UNIX world, we have a lot of options when it comes to editing files.
Think of it -- vi, emacs, and jed come to mind, as well as many others. We all
have our favorite editor (along with our favorite keybindings) that we have come
to know and love. With our trusty editor, we are ready to tackle any number of
UNIX-related administration or programming tasks with ease.
</p>

<p>
While interactive editors are great, they do have limitations. Though their
interactive nature can be a strength, it can also be a weakness. Consider a
situation where you need to perform similar types of changes on a group of
files. You could instinctively fire up your favorite editor and perform a bunch
of mundane, repetitive, and time-consuming edits by hand. But there's a better
way.
</p>

</body>
</section>

<section>
<title>Enter sed</title>
<body>

<p>
It would be nice if we could automate the process of making edits to files, so
that we could "batch" edit files, or even write scripts with the ability to
perform sophisticated changes to existing files. Fortunately for us, for these
types of situations, there is a better way -- and the better way is called sed.
</p>

<p>
sed is a lightweight stream editor that's included with nearly all UNIX flavors,
including Linux. sed has a lot of nice features. First of all, it's very
lightweight, typically many times smaller than your favorite scripting language.
Secondly, because sed is a stream editor, it can perform edits to data it
receives from stdin, such as from a pipeline. So, you don't need to have the
data to be edited stored in a file on disk. Because data can just as easily be
piped to sed, it's very easy to use sed as part of a long, complex pipeline in a
powerful shell script. Try doing that with your favorite editor.
</p>

</body>
</section>

<section>
<title>GNU sed</title>
<body>

<p>
Fortunately for us Linux users, one of the nicest versions of sed out there
happens to be GNU sed, which is currently at version 3.02. Every Linux
distribution has GNU sed, or at least should. GNU sed is popular not only
because its sources are freely distributable, but because it happens to have a
lot of handy, time-saving extensions to the POSIX sed standard. GNU sed also
doesn't suffer from many of the limitations that earlier and proprietary
versions of sed had, such as a limited line length -- GNU sed handles lines of
any length with ease.
</p>

</body>
</section>

<section>
<title>The newest GNU sed</title>
<body>

<p>
While researching this article, I noticed that several online sed aficionados
made reference to a GNU sed 3.02a. Strangely, I couldn't find sed 3.02a on
<uri>ftp://ftp.gnu.org</uri> (see <uri link="#resources">Resources</uri> for
these links), so I had to go look for it elsewhere. I found it at
<uri>ftp://alpha.gnu.org</uri>, in <path>/pub/sed</path>. I happily downloaded
it, compiled it, and installed it, only to find minutes later that the most
recent version of sed is 3.02.80 -- and you can find its sources right next to
those for 3.02a, at <uri>ftp://alpha.gnu.org</uri>. After getting GNU sed
3.02.80 installed, I was finally ready to go.
</p>

</body>
</section>

<section>
<title>The right sed</title>
<body>

<p>
In this series, we will be using GNU sed 3.02.80. Some (but very few) of the
most advanced examples you'll find in my upcoming, follow-on articles in this
series will not work with GNU sed 3.02 or 3.02a. If you're using a non-GNU sed,
your results may vary. Why not take some time to install GNU sed 3.02.80 now?
Then, not only will you be ready for the rest of the series, but you'll also be
able to use arguably the best sed in existence!
</p>

</body>
</section>

<section>
<title>Sed examples</title>
<body>

<p>
Sed works by performing any number of user-specified editing operations
("commands") on the input data. Sed is line-based, so the commands are performed
on each line in order. And, sed writes its results to standard output (stdout);
it doesn't modify any input files.
</p>

<p>
Let's look at some examples. The first several are going to be a bit weird
because I'm using them to illustrate how sed works rather than to perform any
useful task. However, if you're new to sed, it's very important that you
understand them. Here's our first example:
</p>

<pre caption="Example of sed usage">
$ <i>sed -e 'd' /etc/services</i>
</pre>

<p>
If you type this command, you'll get absolutely no output. Now, what happened?
In this example, we called sed with one editing command, <c>d</c>. Sed opened
the <path>/etc/services</path> file, read a line into its pattern buffer, and
performed our editing command ("delete line"), which emptied the pattern buffer
and left nothing to print. It then repeated these steps for each successive
line. This produced no output, because the <c>d</c> command zapped every single
line in the pattern buffer!
</p>
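
<p>
As a quick check of the behaviour described above, here is a minimal sketch that
feeds sed a hypothetical three-line input on stdin rather than
<path>/etc/services</path>. The variable name <c>out</c> is an assumption made
for illustration only.
</p>

<pre caption="Sketch: the d command deletes every line">
```shell
# Hypothetical three-line input, piped in instead of reading /etc/services.
out=$(printf 'one\ntwo\nthree\n' | sed -e 'd')

# Every line was deleted from the pattern buffer, so nothing was captured.
printf 'output: [%s]\n' "$out"
```
</pre>

<p>
The brackets in the last line simply make the empty result visible.
</p>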

<p>
There are a few things to notice in this example. First,
<path>/etc/services</path> was not modified at all. This is because, again, sed
only reads from the file you specify on the command line, using it as input --
it doesn't try to modify the file. The second thing to notice is that sed is
line-oriented. The <c>d</c> command didn't simply tell sed to delete all incoming
data in one fell swoop. Instead, sed read each line of
<path>/etc/services</path> one by one into its internal buffer, called the
pattern buffer. Once a line was read into the pattern buffer, it performed the
<c>d</c> command, which discarded the line before anything could be printed.
Later, I'll show you how to use address ranges to control which lines a command
is applied to -- but in the absence of addresses, a command is applied to all
lines.
</p>

<p>
The third thing to notice is the use of single quotes to surround the <c>d</c>
command. It's a good idea to get into the habit of using single quotes to
surround your sed commands, so that shell expansion is disabled.
</p>
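
<p>
To see why the quoting habit matters, consider this small sketch. It assumes a
shell variable that happens to be named <c>d</c>, and it borrows the sed script
<c>$d</c> (delete the last line), an address form that is not introduced until
later. All variable names here are hypothetical.
</p>

<pre caption="Sketch: single quotes keep the shell away from the sed script">
```shell
# Hypothetical shell variable that happens to share its name with a sed command.
d='oops'

# Single quotes: sed receives the literal script $d and deletes the last line.
single=$(printf 'one\ntwo\n' | sed -e '$d')

# Double quotes: the shell expands $d first, so this is the text sed would get.
script="$d"

printf 'single-quoted result: %s\n' "$single"
printf 'double-quoted script text: %s\n' "$script"
```
</pre>

<p>
With single quotes the script reaches sed untouched. With double quotes sed
would be handed the variable's contents instead, which is rarely what was
intended.
</p>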

</body>
</section>
<section>
<title>Another sed example</title>
<body>

<p>
Here's an example of how to use sed to remove the first line of the
<path>/etc/services</path> file from our output stream:
</p>

<pre caption="Another sed example">
$ <i>sed -e '1d' /etc/services | more</i>

1.1                  xml/htdocs/doc/en/articles/l-sed2.xml

file : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed2.xml?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=gentoo
plain: http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed2.xml?rev=1.1&content-type=text/plain&cvsroot=gentoo

Index: l-sed2.xml
===================================================================

<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/articles/l-sed2.xml,v 1.1 2005/07/26 10:46:47 jkt Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide link="/doc/en/articles/l-sed2.xml">
<title>Sed by example, Part 2</title>

<author title="Author">
  <mail link="drobbins@g.o">Daniel Robbins</mail>
</author>
<author title="Editor">
  <mail link="rane@××××××.pl">Łukasz Damentko</mail>
</author>

<abstract>
Sed is a very powerful and compact text stream editor. In this article, the
second in the series, Daniel shows you how to use sed to perform string
substitution; create larger sed scripts; and use sed's append, insert, and
change line commands.
</abstract>

<!-- The original version of this article was published on IBM developerWorks,
and is property of Westtech Information Services. This document is an updated
version of the original article, and contains various improvements made by the
Gentoo Linux Documentation team -->

<version>1.0</version>
<date>2005-07-15</date>

<chapter>
<title>How to further take advantage of the UNIX text editor</title>
<section>
<title>Substitution!</title>
<body>

<note>
The original version of this article was published on IBM developerWorks, and is
property of Westtech Information Services. This document is an updated version
of the original article, and contains various improvements made by the Gentoo
Linux Documentation team.
</note>

<p>
Let's look at one of sed's most useful commands, the substitution command.
Using it, we can replace a particular string or matched regular expression with
another string. Here's an example of the most basic use of this command:
</p>

<pre caption="Most basic use of substitution command">
$ <i>sed -e 's/foo/bar/' myfile.txt</i>
</pre>

<p>
The above command will output the contents of myfile.txt to stdout, with the
first occurrence of 'foo' (if any) on each line replaced with the string 'bar'.
Please note that I said first occurrence on each line, though this is normally
not what you want. Normally, when I do a string replacement, I want to perform
it globally. That is, I want to replace all occurrences on every line, as
follows:
</p>

<pre caption="Replacing all the occurrences on every line">
$ <i>sed -e 's/foo/bar/g' myfile.txt</i>
</pre>

<p>
The additional 'g' option after the last slash tells sed to perform a global
replace.
</p>
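
<p>
The difference between the two forms is easiest to see side by side. The sketch
below runs both commands on one hypothetical input line (the sample text and the
variable names are assumptions for illustration).
</p>

<pre caption="Sketch: first occurrence versus global replace">
```shell
# Hypothetical input line containing 'foo' twice.
line='foo and foo again'

# Without 'g', only the first match on the line is replaced.
first=$(printf '%s\n' "$line" | sed -e 's/foo/bar/')

# With 'g', every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed -e 's/foo/bar/g')

printf '%s\n%s\n' "$first" "$all"
```
</pre>

<p>
The first result should read "bar and foo again", the second "bar and bar
again".
</p>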

<p>
Here are a few other things you should know about the <c>s///</c> substitution
command. First, it is a command, and a command only; there are no addresses
specified in any of the above examples. This means that the <c>s///</c> command
can also be used with addresses to control what lines it will be applied to, as
follows:
</p>

<pre caption="Specifying the lines the command will be applied to">
$ <i>sed -e '1,10s/enchantment/entrapment/g' myfile2.txt</i>
</pre>

<p>
The above example will cause all occurrences of the word 'enchantment' to be
replaced with the word 'entrapment', but only on lines one through ten,
inclusive.
</p>
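
<p>
A shorter range makes the effect easy to verify by eye. This sketch assumes a
hypothetical four-line input and applies the substitution to lines two and three
only.
</p>

<pre caption="Sketch: a numeric address range in front of s///">
```shell
# Hypothetical four-line input; each line holds the single character x.
out=$(printf 'x\nx\nx\nx\n' | sed -e '2,3s/x/y/g')

# Lines 2 and 3 are changed; lines 1 and 4 pass through untouched.
printf '%s\n' "$out"
```
</pre>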

<pre caption="Specifying more options">
$ <i>sed -e '/^$/,/^END/s/hills/mountains/g' myfile3.txt</i>
</pre>

<p>
This example will swap 'hills' for 'mountains', but only in blocks of text
beginning with a blank line, and ending with a line beginning with the three
characters 'END', inclusive.
</p>
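
<p>
Here is the same command run against a small hypothetical input, so that the
"inclusive" part can be checked: the line starting with END is still inside the
block, while the lines before the blank line and after END are not.
</p>

<pre caption="Sketch: a regular expression address range">
```shell
# Hypothetical input: one line outside the block, a blank line that opens
# the block, two lines inside it (the END line closes it), one line after.
out=$(printf 'hills\n\nhills\nEND hills\nhills\n' |
  sed -e '/^$/,/^END/s/hills/mountains/g')

printf '%s\n' "$out"
```
</pre>

<p>
Only the third and fourth lines should come out with 'mountains'.
</p>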

<p>
Another nice thing about the <c>s///</c> command is that we have a lot of
options when it comes to those <c>/</c> separators. If we're performing string
substitution and the regular expression or replacement string has a lot of
slashes in it, we can change the separator by specifying a different character
after the 's'. For example, this will replace all occurrences of
<path>/usr/local</path> with <path>/usr</path>:
</p>

<pre caption="Replacing all the occurrences of one string with another one">
$ <i>sed -e 's:/usr/local:/usr:g' mylist.txt</i>
</pre>

<note>
In this example, we're using the colon as a separator. If you ever need to
specify the separator character in the regular expression, put a backslash
before it.
</note>
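
<p>
Both points can be tried out in a couple of lines. The inputs below are
hypothetical; the second command escapes the colon separator with a backslash so
that it matches a literal colon in the text.
</p>

<pre caption="Sketch: an alternative separator, and escaping it">
```shell
# The colon separates the parts of the s command, so the slashes in the
# paths need no escaping. The colon in the input itself is left alone.
paths=$(printf '/usr/local/bin:/usr/local/lib\n' | sed -e 's:/usr/local:/usr:g')

# A backslash in front of the separator makes it a literal colon in the regexp.
escaped=$(printf 'key:value\n' | sed -e 's:key\:value:pair:')

printf '%s\n%s\n' "$paths" "$escaped"
```
</pre>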

</body>
</section>
<section>
<title>Regexp snafus</title>
<body>

<p>
Up until now, we've only performed simple string substitution. While this is
handy, we can also match a regular expression. For example, the following sed
command will match a phrase beginning with '&lt;' and ending with '&gt;', and
containing any number of characters in between. This phrase will be deleted
(replaced with an empty string):
</p>

<pre caption="Deleting specified phrase">
$ <i>sed -e 's/&lt;.*&gt;//g' myfile.html</i>
</pre>

<p>
This is a good first attempt at a sed script that will remove HTML tags from a
file, but it won't work well, due to a regular expression quirk. The reason?
When sed tries to match the regular expression on a line, it finds the longest
match on the line. This wasn't an issue in my previous sed article, because we
were using the <c>d</c> and <c>p</c> commands, which would delete or print the
entire line anyway. But when we use the <c>s///</c> command, it definitely makes
a big difference, because the entire portion that the regular expression matches
will be replaced with the target string, or in this case, deleted. This means
that the above example will turn the following line:
</p>

<pre caption="Sample HTML code">
&lt;b&gt;This&lt;/b&gt; is what &lt;b&gt;I&lt;/b&gt; meant.
</pre>

<p>
Into this:
</p>

<pre caption="Not desired effect">
meant.
</pre>

<p>
Rather than this, which is what we wanted to do:
</p>

<pre caption="Desired effect">
This is what I meant.
</pre>

<p>
Fortunately, there is an easy way to fix this. Instead of typing in a regular
expression that says "a '&lt;' character followed by any number of characters, and
ending with a '&gt;' character", we just need to type in a regexp that says "a
'&lt;' character followed by any number of non-'&gt;' characters, and ending
with a '&gt;' character". This will have the effect of matching the shortest
possible match, rather than the longest possible one. The new command looks like
this:
</p>

<pre caption="Deleting HTML tags one at a time">
$ <i>sed -e 's/&lt;[^&gt;]*&gt;//g' myfile.html</i>
</pre>

<p>
In the above example, the '[^&gt;]' specifies a "non-'&gt;'" character, and the '*'
after it completes this expression to mean "zero or more non-'&gt;' characters".
Test this command on a few sample HTML files, pipe the output to more, and
review the results.
</p>
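
<p>
The longest-match behaviour is not specific to HTML. The sketch below shows the
same pitfall and the same fix on hypothetical square-bracket markup, which
stands in for the angle-bracket tags purely for the sake of this illustration.
Inside a bracket expression, a ']' placed right after the '^' is taken
literally.
</p>

<pre caption="Sketch: longest match versus a negated character class">
```shell
# Hypothetical input using square-bracket markup in place of HTML tags.
line='[b]This[/b] is what [b]I[/b] meant.'

# '.*' grabs everything from the first '[' to the last ']' on the line.
greedy=$(printf '%s\n' "$line" | sed -e 's/\[.*\]//g')

# '[^]]*' stops at the first ']', so each tag is removed on its own.
tight=$(printf '%s\n' "$line" | sed -e 's/\[[^]]*\]//g')

printf '[%s]\n[%s]\n' "$greedy" "$tight"
```
</pre>

<p>
The outer brackets added by the final printf only make leading whitespace in the
first result visible.
</p>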

</body>
</section>
<section>
<title>More character matching</title>
<body>

<p>
The '[ ]' regular expression syntax has some additional options. To specify
a range of characters, you can use a '-' as long as it isn't in the first or
last position, as follows:

1.1                  xml/htdocs/doc/en/articles/l-sed3.xml

file : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed3.xml?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=gentoo
plain: http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed3.xml?rev=1.1&content-type=text/plain&cvsroot=gentoo

Index: l-sed3.xml
===================================================================

<?xml version='1.0' encoding="UTF-8"?>
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/articles/l-sed3.xml,v 1.1 2005/07/26 10:46:47 jkt Exp $ -->
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

<guide link="/doc/en/articles/l-sed3.xml">
<title>Sed by example, Part 3</title>

<author title="Author">
  <mail link="drobbins@g.o">Daniel Robbins</mail>
</author>
<author title="Editor">
  <mail link="rane@××××××.pl">Łukasz Damentko</mail>
</author>

<abstract>
Sed is a very powerful and compact text stream editor. In this article, the
third and final one in the series, Daniel puts sed to practical use: converting
text between UNIX and DOS/Windows formats, reversing the lines of a file, and
turning a Quicken .QIF file into a nicely formatted text file.
</abstract>

<!-- The original version of this article was published on IBM developerWorks,
and is property of Westtech Information Services. This document is an updated
version of the original article, and contains various improvements made by the
Gentoo Linux Documentation team -->

<version>1.0</version>
<date>2005-07-16</date>

<chapter>
<title>Taking it to the next level: Data crunching, sed style</title>
<section>
<title>Muscular sed</title>
<body>

<note>
The original version of this article was published on IBM developerWorks, and is
property of Westtech Information Services. This document is an updated version
of the original article, and contains various improvements made by the Gentoo
Linux Documentation team.
</note>

<p>
In <uri link="l-sed2.xml">my second sed article</uri>, I
offered examples that demonstrated how sed works, but very few of these examples
actually did anything particularly useful. In this final sed article, it's time
to change that pattern and put sed to good use. I'll show you several excellent
examples that not only demonstrate the power of sed, but also do some really
neat (and handy) things. For example, in the second half of the article, I'll
show you how I designed a sed script that converts a .QIF file from Intuit's
Quicken financial program into a nicely formatted text file. Before doing that,
we'll take a look at some less complicated yet useful sed scripts.
</p>

</body>
</section>
<section>
<title>Text translation</title>
<body>

<p>
Our first practical script converts UNIX-style text to DOS/Windows format. As
you probably know, DOS/Windows-based text files have a CR (carriage return) and
LF (line feed) at the end of each line, while UNIX text has only a line feed.
There may be times when you need to move some UNIX text to a Windows system, and
this script will perform the necessary format conversion for you.
</p>

<pre caption="Format conversion between UNIX and Windows">
$ <i>sed -e 's/$/\r/' myunix.txt > mydos.txt</i>
</pre>

<p>
In this script, the '$' regular expression will match the end of the line, and
the '\r' tells sed to insert a carriage return right before it. Insert a
carriage return before a line feed, and presto, a CR/LF ends each line. Please
note that the '\r' will be replaced with a CR only when using GNU sed 3.02.80 or
later. If you haven't installed GNU sed 3.02.80 yet, see <uri
link="l-sed1.xml">my first sed article</uri> for instructions on
how to do this.
</p>
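
<p>
If your sed does not understand '\r', one possible workaround (an assumption on
my part, not something the original script relies on) is to let the shell
produce the carriage return and splice it into the sed script. That requires
double quotes, so the '$' that sed should see has to be protected with a
backslash. The variable names <c>cr</c> and <c>bytes</c> are hypothetical.
</p>

<pre caption="Sketch: adding carriage returns without relying on \r in sed">
```shell
# Let printf create the carriage return; sed then receives the raw character.
cr=$(printf '\r')

# Hypothetical two-line UNIX input (8 bytes). After the conversion, each
# line should end in CR LF, which adds one byte per line.
bytes=$(printf 'one\ntwo\n' | sed -e "s/\$/$cr/" | wc -c | tr -d ' ')

printf 'bytes after conversion: %s\n' "$bytes"
```
</pre>

<p>
Counting bytes is just a convenient way of confirming that one extra character
was added to each line.
</p>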

<p>
I can't tell you how many times I've downloaded some example script or C code,
only to find that it's in DOS/Windows format. While many programs don't mind
DOS/Windows format CR/LF text files, several programs definitely do -- the most
notable being bash, which chokes as soon as it encounters a carriage return. The
following sed invocation will convert DOS/Windows format text to trusty UNIX
format:
</p>

<pre caption="Converting text from DOS/Windows to UNIX format">
$ <i>sed -e 's/.$//' mydos.txt > myunix.txt</i>
</pre>

<p>
The way this script works is simple: our substitution regular expression matches
the last character on the line, which happens to be a carriage return. We
replace it with nothing, causing it to be deleted from the output entirely. If
you use this script and notice that the last character of every line of the
output has been deleted, you've specified a text file that's already in UNIX
format. No need for that!
</p>
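
<p>
A more cautious variant, sketched here as a suggestion rather than as the
article's script, removes the last character only if it really is a carriage
return. Files that are already in UNIX format then pass through unchanged. As
before, the shell supplies the CR character, and the variable names are
hypothetical.
</p>

<pre caption="Sketch: stripping only a trailing carriage return">
```shell
# The carriage return is produced by printf and spliced into the sed script.
cr=$(printf '\r')

# Hypothetical DOS-format input: the trailing CR is removed from each line.
dos=$(printf 'one\r\ntwo\r\n' | sed -e "s/$cr\$//")

# Hypothetical UNIX-format input: nothing matches, so nothing is removed.
unix=$(printf 'one\ntwo\n' | sed -e "s/$cr\$//")

printf '%s\n%s\n' "$dos" "$unix"
```
</pre>

<p>
Both results should be identical: the two lines "one" and "two" with no carriage
returns left.
</p>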

</body>
</section>
<section>
<title>Reversing lines</title>
<body>

<p>
Here's another handy little script. This one will reverse lines in a file,
similar to the "tac" command that's included with most Linux distributions. The
name "tac" may be a bit misleading, because "tac" doesn't reverse the position
of characters on the line (left and right), but rather the position of lines in
the file (up and down). Running "tac" on the following file:
</p>

<pre caption="Sample file">
foo
bar
oni
</pre>

<p>
...produces the following output:
</p>

<pre caption="Output file">
oni
bar
foo
</pre>

<p>
We can do the same thing with the following sed script:
</p>

<pre caption="Doing same with script">
$ <i>sed -e '1!G;h;$!d' forward.txt > backward.txt</i>
</pre>

<p>
You'll find this sed script useful if you're logged in to a FreeBSD system,
which doesn't happen to have a "tac" command. While handy, it's also a good idea
to know why this script does what it does. Let's dissect it.
</p>

</body>
</section>
<section>
<title>Reversal explained</title>
<body>

<p>
First, this script contains three separate sed commands, separated by
semicolons: '1!G', 'h' and '$!d'. Now, it's time to get a good understanding of
the addresses used for the first and third commands. If the first command were
'1G', the 'G' command would be applied only to the first line. However, there is
an additional '!' character -- this '!' character negates the address, meaning
that the 'G' command will apply to all but the first line. For the '$!d'
command, we have a similar situation. If the command were '$d', it would apply
the 'd' command to only the last line in the file (the '$' address is a simple
way of specifying the last line). However, with the '!', '$!d' will apply the
'd' command to all but the last line. Now, all we need to do is understand what
the commands themselves do.
</p>
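
<p>
Negated addresses are easy to try out on their own. In this sketch, which reuses
the three-line sample as hypothetical stdin input, <c>d</c> is applied to
everything except the addressed line, so only that line survives.
</p>

<pre caption="Sketch: negating an address with !">
```shell
# Delete every line except the first.
first=$(printf 'foo\nbar\noni\n' | sed -e '1!d')

# Delete every line except the last.
last=$(printf 'foo\nbar\noni\n' | sed -e '$!d')

printf 'kept by 1!d: %s\n' "$first"
printf 'kept by $!d: %s\n' "$last"
```
</pre>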

<p>
When we execute our line reversal script on the text file above, the first
command that gets executed is 'h', because '1!G' skips the first line. This
command tells sed to copy the contents of the pattern space (the buffer that
holds the current line being worked on) to the hold space (a temporary buffer).
Then, the 'd' command is executed, which deletes "foo" from the pattern space,
so it doesn't get printed after all the commands are executed for this line.
</p>

<p>
Now, line two. After "bar" is read into the pattern space, the 'G' command is
executed, which appends the contents of the hold space ("foo\n") to the pattern
space ("bar\n"), resulting in "bar\nfoo\n" in our pattern space. The 'h'
command puts this back in the hold space for safekeeping, and 'd' deletes the
line from the pattern space so that it isn't printed.
</p>

<p>
For the last "oni" line, the same steps are repeated, except that the contents
of the pattern space aren't deleted (due to the '$!' before the 'd'), and the
contents of the pattern space (three lines) are printed to stdout.
</p>
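
<p>
To follow the walkthrough on a live system, the script can be run on the sample
input directly. This sketch passes the three commands as separate -e options,
which should behave the same as the semicolon-separated form, and supplies the
sample file's contents on stdin. The variable name <c>out</c> is hypothetical.
</p>

<pre caption="Sketch: the reversal script, one command per -e option">
```shell
# 1!G : on every line but the first, append the hold space to the pattern space
# h   : copy the pattern space to the hold space
# $!d : on every line but the last, delete the pattern space without printing
out=$(printf 'foo\nbar\noni\n' | sed -e '1!G' -e 'h' -e '$!d')

printf '%s\n' "$out"
```
</pre>

<p>
Only the final cycle prints anything, and by then the pattern space holds all
three lines in reverse order.
</p>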

<p>
Now, it's time to do some powerful data conversion with sed.
</p>

</body>
</section>
<section>
<title>sed QIF magic</title>

--


gentoo-doc-cvs@g.o mailing list
