Replies are below...

On Thu, 2003-06-12 at 03:15, Paul de Vrieze wrote:
> On Thursday 12 June 2003 09:53, Evan Powers wrote:
> > On Thursday 12 June 2003 02:56 am, Seemant Kulleen wrote:
> > > Anyway, the current approach keeps it simple in that the md5sum is taken
> > > over the *item(s) that is/are downloaded*. The first reason I can see is
> > > what I stated above. There are other reasons I can see as well. You know
> > > immediately upon fetching the set of source items whether they are bad.
> > > So, no disk i/o or cpu cycles are spent in the unpacking, and no
> > > potentially nasty code is even untarred on the system yet.
> >
> > Well, I can think of a way to address this part of the issue, anyway.
> >
> > * The portage tree still has MD5 digests for the item(s) which is/are
> > downloaded.
> >
> This would make things easy for most people, and would also save lots of cpu
> cycles.
Agreed. As for the cpu cost of decompression, gzip decompressing and
md5ing runs fairly close to i/o speed, at least on my machine, which is
an xp1700 w/ about 40mb/s throughput on my hd. Bzip2 (as jjw pointed
out) is an entirely different beast; decompressing it is not close to
i/o speed. At this stage of the game, recompressed/patched tarballs
would be the minority. Down the line, assuming diffing/patching takes
off, this might be something to think about.
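To check the decompress-plus-md5 throughput claim on another machine, here is a minimal Python sketch. It times gzip and bzip2 decompression followed by MD5 on an in-memory payload and prints MiB/s of uncompressed data, which can then be compared against the disk's sequential read rate. The helper name `decompress_and_md5` and the synthetic 4 MiB payload are assumptions for illustration. Real source tarballs will compress and decompress at different rates.

```python
import bz2
import gzip
import hashlib
import os
import time


def decompress_and_md5(blob, codec):
    """Decompress `blob` with `codec` (the gzip or bz2 module), then MD5 it.

    Returns (hex digest of the uncompressed data, elapsed seconds).
    """
    start = time.perf_counter()
    digest = hashlib.md5(codec.decompress(blob)).hexdigest()
    return digest, time.perf_counter() - start


def main():
    # Hypothetical payload: a 4 KiB random block plus 4 KiB of zeros, repeated
    # 512 times (4 MiB). It compresses far better than typical source code.
    data = (os.urandom(4096) + b"\0" * 4096) * 512
    mib = len(data) / (1024 * 1024)
    expected = hashlib.md5(data).hexdigest()
    for name, codec in (("gzip", gzip), ("bzip2", bz2)):
        blob = codec.compress(data)
        digest, secs = decompress_and_md5(blob, codec)
        assert digest == expected  # round trip must reproduce the original data
        print(f"{name}: {mib / max(secs, 1e-9):.1f} MiB/s uncompressed")


if __name__ == "__main__":
    main()
```

Reading from disk is deliberately left out, so the printed figure is the cpu-side cost only.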
27 |
|
28 |
> > * After downloading, emerge executes a (user specified?) program on the |
29 |
> > newly downloaded file. This program applies some transform to the file; |
30 |
> > maybe it decompresses whatever format the file is in and re-compresses it |
31 |
> > with bzip2, or maybe it only format-shifts files which are over a certain |
32 |
> > size threshold, whatever. |
33 |
> >
> Always fun. Maybe we would also want a list of allowed/blacklisted
> extensions, like RPM files, which are already compressed. (Or openoffice
> files/jar-balls, which lose validity if compressed in a different format
> but could be recompressed in zip format.)
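The transform step, together with an allowed/blacklisted extension list, could look roughly like the Python sketch below. It only handles gzip-to-bzip2 re-packing. The suffix lists, the `min_size` threshold, and the function names are hypothetical, not anything emerge provides.

```python
import bz2
import gzip
import os
import shutil
import sys

# Hypothetical policy lists. Blocked suffixes are checked first so an explicit
# block always wins, even if the allow list is widened later.
BLOCKED_SUFFIXES = (".rpm", ".jar", ".zip", ".sxw")  # already compressed, or invalid if repacked
ALLOWED_SUFFIXES = (".tar.gz", ".tgz")               # gzip tarballs worth re-packing


def should_recompress(filename, size, min_size=0):
    """Decide whether a downloaded file should be re-packed as bzip2."""
    if filename.endswith(BLOCKED_SUFFIXES):
        return False
    return filename.endswith(ALLOWED_SUFFIXES) and size >= min_size


def recompress_gz_to_bz2(path):
    """Re-pack a gzip tarball as bzip2 in the same directory; return the new path."""
    if path.endswith(".tgz"):
        new_path = path[: -len(".tgz")] + ".tar.bz2"
    else:
        new_path = path[: -len(".gz")] + ".bz2"
    # copyfileobj streams in chunks, so large tarballs never sit fully in memory.
    with gzip.open(path, "rb") as src, bz2.open(new_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return new_path


if __name__ == "__main__":
    # Example: only re-pack files of at least 1 MiB (an arbitrary threshold).
    for p in sys.argv[1:]:
        if should_recompress(os.path.basename(p), os.path.getsize(p), min_size=1 << 20):
            print(p, "->", recompress_gz_to_bz2(p))
```

The original .tar.gz is left in place. Whether to delete it, and where to record the new digest, is a separate decision.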
>
> > * Next, emerge adds a new record to a database (text, one record per line,
> > for example) somewhere in /var. This database has the original name of the
> > downloaded file, the original MD5 digest, the new name, and the new MD5
> > digest.
> >
> Maybe extending the server-side digest to also include an unpacked digest
> (where applicable) would be smarter for validating patch-based and oddball
> cases. (This doesn't mean that client-side digesting isn't useful; it is,
> for not having to unpack first.)
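The flat-file database from the third bullet (one record per line: original name, original MD5, new name, new MD5) might be sketched in Python like this. The `Record` type, the space-separated layout, and `resolve()` are assumptions for illustration. The proposal fixes no file format or location beyond "somewhere in /var".

```python
from typing import Dict, NamedTuple


class Record(NamedTuple):
    orig_name: str
    orig_md5: str
    new_name: str
    new_md5: str


def format_record(rec):
    """One record per line, fields separated by single spaces.

    Assumes distfile names never contain whitespace.
    """
    return " ".join(rec) + "\n"


def parse_records(text):
    """Index records by original file name; malformed or blank lines are skipped."""
    records: Dict[str, Record] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:
            rec = Record(*fields)
            records[rec.orig_name] = rec
    return records


def resolve(records, name, portage_md5):
    """Return (file to check, expected MD5) for a distfile portage asks for.

    If the file was transformed and its record matches portage's digest,
    the transformed file is checked against its recorded digest instead.
    """
    rec = records.get(name)
    if rec is not None and rec.orig_md5 == portage_md5:
        return rec.new_name, rec.new_md5
    return name, portage_md5
```

All of the state lives in one local file, and that file is exactly what a shared distfiles dir would not carry along.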
How about this (and mind you, this is just for dealing w/ md5sums):
instead of doing any db-style stuff, just create a file alongside the
distfile (w/in the distfile dir, most likely) that contains the
uncompressed data's md5sum. If you go about creating a db-type setup,
you're going to run into major issues in an environment where the
distfiles dir is shared out to other systems, since you're not going
to be sharing the db. Basically I could see this: a simple script that
a user can use to convert non-gz distfiles to bzip2 tarballs, which
also creates the file (think linux-2.4.19.tar.bz2 and
linux-2.4.19.tar.md5). Distfile diffing/patching would use the same
method: if the reconstructed and recompressed version's md5 matches
what portage has, hoozah, no need to create the file; if not (say,
upgrading openoffice), we create the file. Also, it dawned on me that
md5summing the uncompressed data has the added bonus of being
indifferent to the patching/differencing method. In other words, we
could use the standard unified diffs that are provided for the kernel
versions, for instance.
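Here is a minimal Python sketch of the sidecar idea. It assumes .bz2 or .gz distfiles and uses the naming from the example (linux-2.4.19.tar.bz2 gets linux-2.4.19.tar.md5). The function names and the md5sum-style line inside the .md5 file are assumptions. The point is only that the digest sits next to the distfile, so it travels with a shared distfiles dir.

```python
import bz2
import gzip
import hashlib
import os

CHUNK = 1 << 20  # 1 MiB reads keep memory use flat for kernel-sized tarballs
OPENERS = {".bz2": bz2.open, ".gz": gzip.open}


def md5_stream(fileobj):
    """MD5 of everything readable from a binary file object, read in chunks."""
    h = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(CHUNK), b""):
        h.update(chunk)
    return h.hexdigest()


def sidecar_path(distfile):
    """linux-2.4.19.tar.bz2 -> linux-2.4.19.tar.md5"""
    for suffix in OPENERS:
        if distfile.endswith(suffix):
            return distfile[: -len(suffix)] + ".md5"
    raise ValueError(f"unsupported compression: {distfile}")


def write_sidecar_if_needed(distfile, portage_md5):
    """Write the uncompressed data's MD5 next to `distfile`, unless the
    compressed file already matches portage's digest.

    Returns the sidecar path, or None if no sidecar was needed.
    """
    with open(distfile, "rb") as f:
        if md5_stream(f) == portage_md5:
            return None  # byte-identical to what portage expects
    path = sidecar_path(distfile)  # also rejects unsupported suffixes
    opener = OPENERS[os.path.splitext(distfile)[1]]
    with opener(distfile, "rb") as f:
        digest = md5_stream(f)
    uncompressed_name = os.path.basename(path)[: -len(".md5")]
    with open(path, "w") as out:
        # md5sum-style line: "<hex digest>  <uncompressed file name>"
        out.write(f"{digest}  {uncompressed_name}\n")
    return path
```

Because the sidecar records the digest of the uncompressed data, it stays the same however the tarball was rebuilt (recompressed, or reconstructed from a unified diff).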

As for transforming, would it really be needed? I've spent a bit of
time rooting through the distfile dir and I don't recall seeing
non-versioned names, although as always, I could be wrong.


--
gentoo-dev@g.o mailing list