Gentoo Archives: gentoo-user

From:	Tim <root@×××××××××××××××.com>
To:	gentoo-user@l.g.o
Subject:	Re: [gentoo-user] join two tab-separate-value files without join field
Date:	Sat, 24 May 2008 05:06:13
Message-Id:	`4837944A.9080509@pneumaticsystem.com`
In Reply to:	[gentoo-user] join two tab-separate-value files without join field by Zhang Weiwu

Zhang Weiwu wrote:
> Hi.
>
> I got a datasheet from my colleague in MS Excel format and I intend to
> process that file with my awk/sed knowledge. The problem is: he sent me
> two Excel files each with 2134 records, in fact there should be only one
> Excel file with 2134 rows and 295 columns, but MS Excel can only handle
> 256 data columns, so he split the datasheet vertically so he could manage
> to send it to me.
>
> Now I saved both files to tab-separated-value format, how do I join them?
>
> I could have used join(1) but that requires a join field, an ID of some
> sort. I thought of this:
>
> $ grep -n '' left.tsv | sed 's/:/\t/' > left.forjoin
> $ grep -n '' right.tsv | sed 's/:/\t/' > right.forjoin
> $ join -t "	" left.forjoin right.forjoin > result.tsv
> (note that for join's -t parameter somehow I need to manage to get a tab
> between the quotes)
>
> Yes I achieved what I want, but that looks complex. Is there a simpler
> way? Thanks in advance.
>
> I know OpenOffice 3.0 can handle up to 1024 data columns. It's difficult
> to convince anyone to switch to OOO because here in China MS Office
> costs only 0$. I also could use OOO3.0 for doing the join but I wish to
> know the commandline way:)
>
Got perl?

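#!/usr/bin/perl
use strict;
use warnings;

# Join two files line by line, separating the two halves with a tab.
if (@ARGV < 2) {
    print "Arguments: <file1> <file2>\n";
    exit(1);
}

open(my $first_fh,  '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
open(my $second_fh, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!\n";
my @first  = <$first_fh>;
my @second = <$second_fh>;
close($first_fh);
close($second_fh);

# Walk every row of the longer file.
my $rows = @first > @second ? scalar(@first) : scalar(@second);
for (my $i = 0; $i < $rows; $i++) {
    # A line missing from the shorter file becomes an empty field.
    my $tmp1 = defined $first[$i]  ? $first[$i]  : '';
    my $tmp2 = defined $second[$i] ? $second[$i] : '';
    # Strip the line ending, including a stray CR from a Windows export.
    $tmp1 =~ s/\r?\n\z//;
    $tmp2 =~ s/\r?\n\z//;

    print $tmp1 . "\t" . $tmp2 . "\n";
}
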
This is likely not the best or fastest way to do it, and I don't have a
dataset as large as yours readily available for testing, but it seems to
work.
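
If Perl is more than you need, paste(1) from coreutils may already cover
this: it joins line N of each input file, with a tab as the default
delimiter, so no join field is required. A minimal sketch follows; the
demo_*.tsv names and their contents are made up for illustration, and it
assumes both files have the same number of rows and Unix line endings.

```shell
# Build two tiny demo files standing in for the real exports
# (hypothetical names and data, not the poster's files).
printf 'id\tname\n1\talice\n2\tbob\n' > demo_left.tsv
printf 'city\nparis\noslo\n' > demo_right.tsv

# paste glues corresponding lines together, tab-separated by default.
paste demo_left.tsv demo_right.tsv > demo_result.tsv
```

With the real data this would be a single line of the form
"paste left.tsv right.tsv > result.tsv". If the TSV files came out of
Excel with CRLF line endings, the carriage returns probably need to be
stripped first, otherwise one ends up in the middle of each joined row.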

-Tim
--
gentoo-user@l.g.o mailing list
