On Sat, May 12, 2018 at 2:16 AM, Daniel Frey <djqfrey@×××××.com> wrote:
> Hi all,
>
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
>
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
>
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
>
>
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---
>
> As you can see, there's no standard in the way the date is formatted.
> Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.
>
> I have a basic grep that I tossed together:
>
> grep -o '\([0-9]\{4\}\)'
>
> This does extract the year but yields the following:
>
> 1994
> 1992
> 1994
> 1993
> 1992
> 1995
> 1993
> 1991
> 1991
> 1992
> 1994
> 1992
> 1995
>
> As you can see, the two empty lines are removed but this will cause
> problems with data not lining up later on.
>
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!
>
> Dan
>
|
Here are awk and sed scripts for you to try:
cat dates
December 2, 1994
March 27, 1992
June 4, 1994
1993
January 11, 1992
January 3, 1995


March 12, 1993
July 12, 1991
May 17, 1991
August 7, 1992
December 23, 1994
March 27, 1992
March 1995

2018-05-12
05-12-2018
|
awk 'match($0,/[0-9][0-9][0-9][0-9]/){
    print substr($0, RSTART, RLENGTH)
}
/^$/
' dates
|
1994
1992
1994
1993
1992
1995


1993
1991
1991
1992
1994
1992
1995

2018
2018
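
One caveat: the awk script above only echoes lines that are completely empty. A non-empty line without a four-digit run (say "n/a") would be dropped, and the output would drift out of step with the input again. If that can happen in your file, something like the sketch below may be safer. It assumes one date per line, and the function name extract_year is just a placeholder I made up:

```shell
# Hypothetical helper (the name is an assumption, not from the thread):
# print the first run of four digits on each line, or an empty line
# when the line has none, so output line N always matches input line N.
extract_year() {
    awk '{
        if (match($0, /[0-9][0-9][0-9][0-9]/))
            print substr($0, RSTART, RLENGTH)   # year found
        else
            print ""                            # placeholder for no match
    }' "$@"
}

# Small self-contained demo: two dated lines, one empty line and one
# non-empty line without a year.
printf 'March 1995\n\nn/a\n05-12-2018\n' | extract_year
```

For your real data you would call it as "extract_year dates". The input and output then always have the same number of lines.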
|
sed 's/.*\([0-9][0-9][0-9][0-9]\).*/\1/p
/^$/p
d' dates
|
1994
1992
1994
1993
1992
1995


1993
1991
1991
1992
1994
1992
1995

2018
2018
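
The sed script has the same limitation as the awk one: only truly empty lines are passed through, and any other line without a year is deleted. A rough sketch of a variant that prints a blank line for every non-matching line, again assuming one date per line (extract_year_sed is a made-up name): try the substitution, jump to the end of the script with t if it succeeded, and otherwise empty the pattern space before it is printed.

```shell
# Hypothetical helper (the name is an assumption, not from the thread):
# keep only a four-digit run from each line, or print an empty line
# when there is none.
extract_year_sed() {
    # 1st expression: replace the whole line with the captured year.
    # t: if that substitution succeeded, skip the rest of the script.
    # 3rd expression: otherwise blank the line; sed then auto-prints it.
    sed -e 's/.*\([0-9][0-9][0-9][0-9]\).*/\1/' -e t -e 's/.*//' "$@"
}

# Small self-contained demo with one non-empty line that has no year.
printf 'March 1995\n\nn/a\n2018-05-12\n' | extract_year_sed
```

Because the leading .* is greedy, sed keeps the last four-digit run on a line, whereas awk's match() reports the first. Every line in the sample data has only one such run, so both give the same result here.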