Gentoo Archives: gentoo-user

From: Tim <root@×××××××××××××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] join two tab-separate-value files without join field
Date: Sat, 24 May 2008 05:06:13
Message-Id: 4837944A.9080509@pneumaticsystem.com
In Reply to: [gentoo-user] join two tab-separate-value files without join field by Zhang Weiwu
Zhang Weiwu wrote:
> Hi.
>
> I got a datasheet from my colleague in MS Excel format and I intend to
> process that file with my awk/sed knowledge. The problem is: he sent me
> two Excel files, each with 2134 records. In fact there should be only one
> Excel file with 2134 rows and 295 columns, but MS Excel can only handle
> 256 data columns, so he split the datasheet vertically so that he could
> send it to me.
>
> I have saved both files in tab-separated-value format; how do I join them?
>
> I could have used join(1), but that requires a join field, an ID of some
> sort. I thought of this:
>
> $ grep -n '' left.tsv | sed 's/:/\t/'> left.forjoin
> $ grep -n '' right.tsv | sed 's/:/\t/'> right.forjoin
> $ join -t " " left.forjoin right.forjoin > result.tsv
> (note that for join's -t parameter I had to somehow get a literal tab
> between the quotes)
>
> Yes, I achieved what I wanted, but that looks complex. Is there a simpler
> way? Thanks in advance.
>
> I know OpenOffice 3.0 can handle up to 1024 data columns. It's difficult
> to convince anyone to switch to OOo because here in China MS Office
> costs only $0. I could also use OOo 3.0 to do the join, but I'd like to
> know the command-line way :)
>
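
If you keep the numbering-plus-join approach from the question, the tab for join's -t option does not have to be typed literally: in a POSIX shell, tab=$(printf '\t') captures one (in bash, $'\t' also works), and the same variable can go into sed's replacement, where the \t escape used above is a GNU sed extension. The following is a minimal sketch under those assumptions; the sample file contents are hypothetical, and cut -f 2- drops the line-number column that join otherwise leaves at the front of each row. One caveat: join expects input sorted on the join field, and line numbers past 9 are not in lexicographic order. This still works because both files carry identical keys in identical order, so every line pairs up (GNU join, for one, only reports ordering problems when it finds unpairable lines).

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"   # scratch directory for the sample files

# Hypothetical stand-ins for the two halves of the spreadsheet.
printf 'a1\ta2\nb1\tb2\n' > left.tsv
printf 'a3\nb3\n' > right.tsv

# Capture a literal tab portably (in bash, $'\t' works too).
tab=$(printf '\t')

# Prefix each line with "N<tab>", where N is its line number.
grep -n '' left.tsv  | sed "s/:/$tab/" > left.forjoin
grep -n '' right.tsv | sed "s/:/$tab/" > right.forjoin

# Join on field 1 (the line number), then drop that field again.
join -t "$tab" left.forjoin right.forjoin | cut -f 2- > result.tsv
```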
Got perl?

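#!/usr/bin/perl
use strict;
use warnings;

if (@ARGV < 2) {
    print STDERR "Arguments: <file1> <file2>\n";
    exit(1);
}

open(my $first_fh,  '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
open(my $second_fh, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!\n";
my @first  = <$first_fh>;
my @second = <$second_fh>;
close($first_fh);
close($second_fh);

# Walk every row; if one file is shorter, its missing lines become empty fields.
my $rows = @first > @second ? scalar(@first) : scalar(@second);
for my $i (0 .. $rows - 1) {
    my $tmp1 = defined $first[$i]  ? $first[$i]  : '';
    my $tmp2 = defined $second[$i] ? $second[$i] : '';
    # Strip the line ending, including the \r of DOS-style (CRLF) files.
    $tmp1 =~ s/\r?\n\z//;
    $tmp2 =~ s/\r?\n\z//;
    print "$tmp1\t$tmp2\n";
}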

This is likely not the best or fastest way to do it, and I don't have a
dataset as large as yours readily available for testing, but it seems to
work.
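
For comparison, the standard paste(1) utility performs exactly this line-by-line merge, which may make a script unnecessary for this task. Below is a minimal sketch with hypothetical sample files; it also shows an awk equivalent, since the question mentions awk. paste keeps going until every input is exhausted (a shorter file contributes empty fields), while the awk version, as written, assumes both files are non-empty and have the same number of lines.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"   # scratch directory for the sample files

# Hypothetical stand-ins for the two halves of the spreadsheet.
printf 'a1\ta2\nb1\tb2\n' > left.tsv
printf 'a3\nb3\n' > right.tsv

# paste(1) writes line N of every file on one output line, separated by
# its default delimiter, a tab; no join field is needed.
paste left.tsv right.tsv > result.tsv

# The same merge in awk: the first pass (NR == FNR) stores left.tsv by line
# number; the second prints the stored line, a tab, then right.tsv's line.
awk 'NR == FNR { left[FNR] = $0; next } { print left[FNR] "\t" $0 }' \
    left.tsv right.tsv > result_awk.tsv
```

With the real files this reduces to: paste left.tsv right.tsv > result.tsv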

-Tim
--
gentoo-user@l.g.o mailing list