Gentoo Archives: gentoo-user

From: Paul Colquhoun <paulcol@×××××××××××××××××.au>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] OT: Extracting year from data, but honour empty lines
Date: Sat, 12 May 2018 02:05:16
Message-Id: 2449557.MjDVvXUIJB@bluering
In Reply to: [gentoo-user] OT: Extracting year from data, but honour empty lines by Daniel Frey
On Saturday, 12 May 2018 9:16:52 AM AEST Daniel Frey wrote:
> Hi all,
>
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
>
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
>
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
>
>
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---
>
> As you can see, there's no standard in the way the date is formatted.
> Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.
>
> I have a basic grep that I tossed together:
>
> grep -o '\([0-9]\{4\}\)'
>
> This does extract the year but yields the following:
>
> 1994
> 1992
> 1994
> 1993
> 1992
> 1995
> 1993
> 1991
> 1991
> 1992
> 1994
> 1992
> 1995
>
> As you can see, the two empty lines are removed but this will cause
> problems with data not lining up later on.
>
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!
>
> Dan


You can add an alternate regular expression that matches the blank lines, but
the '-o' switch will still stop that match from being printed as it is an
'empty' match. The trick is to modify the data on the fly to add a space to the
empty lines. I have also added the '-E' switch to make the regular expression
easier.

sed -e 's/^$/ /' YOUR_DATA_FILE | grep -o -E '([0-9]{4}|^[[:space:]]*$)'
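
Note that the pipeline above prints a single space, not a truly empty line, for each blank input line. If exactly empty output lines are wanted, one possible variant is an awk one-liner wrapped in a small function. This is only a sketch under assumptions: `extract_years` is a hypothetical helper name, it prints just the first four-digit run on each line, and it also turns non-blank lines without a year into empty lines (the grep pipeline would drop those instead).

```shell
# Hypothetical helper (name is an assumption, not from the original post).
# Prints exactly one output line per input line:
#   - the first run of four digits, if the line contains one
#   - an empty line otherwise
# Reads the files given as arguments, or stdin if there are none.
extract_years() {
    awk '{
        # Spelled-out digit classes avoid relying on {4} interval support.
        if (match($0, /[0-9][0-9][0-9][0-9]/))
            print substr($0, RSTART, RLENGTH)
        else
            print ""
    }' "$@"
}

# Small demonstration on inline data; with a file it would be:
#   extract_years YOUR_DATA_FILE
printf 'December 2, 1994\n\n1993\nMarch 1995\n' | extract_years
```

Because every input line yields exactly one output line, the result should stay aligned with any other column of data built from the same file.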


--
Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
Asking for technical help in newsgroups? Read this first:
http://catb.org/~esr/faqs/smart-questions.html#intro
