From: Mark Knecht <markknecht@×××××.com>
To: gentoo-user@l.g.o
Subject: [gentoo-user] [OT] - command line read *.csv & create new file
Date: Sun, 22 Feb 2009 19:06:33
Hi,

This is very off topic, other than that I'd do this on my Gentoo box
before using R on the same box. Please ignore if not of interest.

I've got a really big data file in what is essentially a *.csv
(comma-delimited) format. I need to scan this file and create a new
output file. I'm wondering if there is a reasonably easy command-line
way of doing this using something like sed or awk, neither of which I
know anything about. Thanks in advance.

The basic idea goes something like this:

1) The input file might look like the following, where some of the
columns are attributes (shown as letters) and others are results
(shown as numbers):

A,B,C,D,1
E,F,G,H,2
I,J,K,L,3
M,N,O,P,4
Q,R,S,T,5
U,V,W,X,6

2) From the above input file I want to take the attributes from a few
consecutive lines (3 in this example) and write them to the output
file as a single line, followed by the result from the last of those
3 lines. The output file might look like this:

A,B,C,D,E,F,G,H,I,J,K,L,3
E,F,G,H,I,J,K,L,M,N,O,P,4
I,J,K,L,M,N,O,P,Q,R,S,T,5
M,N,O,P,Q,R,S,T,U,V,W,X,6

3) This must be done as a read/process/write operation of some sort,
because the input file may be far larger than system memory.
(Currently it isn't, but it likely will be eventually.)

4) In my example above I suggested that there is a single result, but
there may be more than one. (I don't know yet.) I showed 3 lines but
might be doing 10; I don't know. It's important to me to pick a
moderately flexible way of dealing with this, as the order of columns
and the number of results will likely change over time and I'll
certainly need to adjust.

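For illustration only, the transformation described in points 1) to 4) could be sketched roughly as below. This is written in Python rather than sed or awk, and it rests on a few assumptions that are not in the message above: the result columns are always the last ones on each line, every line has the same layout, and the names sliding_rows, convert, window and n_results are made up for this sketch. It reads one line at a time and keeps only the last few lines in memory, so the input can be larger than RAM.

```python
import csv
import io
from collections import deque


def sliding_rows(rows, window=3, n_results=1):
    """Yield one output row for every full window of consecutive input rows.

    Each output row holds the attribute columns of the last `window` rows,
    oldest first, followed by the result columns of the newest row.
    Assumption: the results are the last `n_results` fields of each row.
    """
    recent = deque(maxlen=window)  # attribute parts of the most recent rows only
    for row in rows:
        if not row:
            continue  # ignore blank lines
        split = len(row) - n_results
        attrs, results = row[:split], row[split:]
        recent.append(attrs)  # the oldest entry drops out automatically
        if len(recent) == window:
            combined = [field for a in recent for field in a]
            yield combined + results


def convert(src, dst, window=3, n_results=1):
    """Stream CSV from file object `src` to file object `dst`."""
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerows(sliding_rows(csv.reader(src), window, n_results))


if __name__ == "__main__":
    sample = "A,B,C,D,1\nE,F,G,H,2\nI,J,K,L,3\nM,N,O,P,4\nQ,R,S,T,5\nU,V,W,X,6\n"
    out = io.StringIO()
    convert(io.StringIO(sample), out)
    print(out.getvalue(), end="")  # prints the four lines shown in point 2)
```

For a real file, the same convert() call should work with two files opened via open(name, newline="") in place of the StringIO objects. Changing window to 10 or n_results to 2 covers the variations mentioned in point 4); a different column order would need a different split inside sliding_rows.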
Thanks in advance for any pointers. Happy to buy a good book if
someone knows what I should look for.

Cheers,
Mark


