Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: Fragmentation
Date: Wed, 15 Feb 2006 04:04:47
Message-Id: pan.2006.02.15.04.01.44.793796@cox.net
In Reply to: Re: [gentoo-amd64] Re: Re: Fragmentation (Was: Re: Re: Re: Wow! KDE 3.5.1 & Xorg 7.0 w/ Composite) by Peter Humphrey
1 Peter Humphrey posted <43F20505.3040207@××××××××××.uk>, excerpted below,
2 on Tue, 14 Feb 2006 16:27:49 +0000:
3
4 > Gavin Seddon wrote:
5 >> From reading these posts I have sorted out 'other' issues that are off
6 >> list. Namely [...] creating a partition for /usr/portage to 'aid'
7 >> fragmentation.
8 >
9 > I did that some time ago in a simple-minded fashion, but I've had to
10 > revise my layout somewhat. I had an ext3 partition solely for
11 > /usr/portage, and it was mounted on that node, but every emerge --sync
12 > deleted the /lost+found directory. I don't know how serious that is, but
13 > of course no-one likes to have damaged file systems on their boxes, so I
14 > used tune2fs -c to set the mount count to 1 so that it was repaired
15 > every time the box booted, but that was taking too long and it was only
16 > a palliative anyway. The solution was easy, though of course I didn't
17 > see it at first! I only had to point /usr/portage to /usr-bits/portage
18 > (/usr-bits being the mount point of the partition) instead of mounting
19 > the partition directly in place.
20
21 If you have /usr/portage (or more precisely, your portage tree, wherever
22 it exists on your file system, since it can be moved anywhere and the
23 pointer in make.conf changed accordingly) on its own partition, consider
24 making it reiserfs, even if you don't consider reiserfs stable enough for
25 regular use. Correspondingly, those wanting a "safe" but relatively
26 high-use location for testing reiser4 should find the portage tree a very
27 good choice.
28
29 Here's the logic: The portage tree (without the packages subdir if you use
30 FEATURES=buildpkg, and with or without the distdir package sources, your
31 call) has two very important characteristics that make it a perfect match
32 for either reiser file system. First, it has a very high number of very
33 small files, smaller than a filesystem block. Second, the data in it is
34 ultimately backed up: it's available from multiple sources on the net, so
35 easily redownloadable should anything go wrong. That addresses the
36 distrust issue. Some folks don't consider reiserfs stable enough to store
37 critical data on, and I myself wouldn't consider reiser4 stable enough for
38 non-redundant critical data.
39
40 When small files are stored on a regular filesystem, including ext2/3,
41 they are stored by block, each file taking up a full filesystem block of
42 data regardless of whether it's a single byte or exactly a block.
43 Likewise, a file a block and a byte long will take up two blocks worth of
44 space.
45
46 Reiserfs has tail-packing, altho it can be turned off. It stores the
47 <1-block ends of files (the entire file if it's less than a block, the
48 remainder of the file if it's more than a block but not an exact number of
49 blocks) packed together, requiring far less space. The savings can be
50 greater than 50% if the data is all small files. That is, while a regular
51 filesystem will require more than twice the actual data space to store a
52 set of small files, two gigs to store a gig of file data, reiser will
53 require only the single gig to store that same gig of data (plus the
54 journal space, but that's there with any journaled filesystem, including
55 ext3, and of course the metadata storage, that is, the inodes).
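
As a rough illustration of the block arithmetic above, here is a minimal
Python sketch that compares the bytes actually stored in a directory tree
with the space a whole-block filesystem would allocate for the same files.
The 4096-byte block size and the function names are assumptions made for the
example, and the sketch ignores inode, directory and journal overhead, so it
only approximates what a real filesystem reports.

```python
import os

BLOCK_SIZE = 4096  # assumed filesystem block size in bytes


def blocks_needed(size, block=BLOCK_SIZE):
    """Whole blocks needed to hold `size` bytes (ceiling division)."""
    return -(-size // block)


def allocated_bytes(size, block=BLOCK_SIZE):
    """Space taken when every file is rounded up to whole blocks."""
    return blocks_needed(size, block) * block


def tree_usage(root, block=BLOCK_SIZE):
    """Return (actual_bytes, block_rounded_bytes) for regular files under root."""
    actual = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.lstat(path).st_size
            actual += size
            allocated += allocated_bytes(size, block)
    return actual, allocated
```

Calling tree_usage("/usr/portage") (or wherever your tree lives) returns the
two totals; the gap between them is roughly the space that tail packing
could reclaim under these assumptions.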
56
57 Likewise, reiserfs has been optimized to make working with small files
58 fast (in case you are wondering, the parallel for large files is xfs), and
59 there are a number of reports floating around on the forums of folks that
60 have switched their portage tree to reiserfs and been shocked at just how
61 much faster emerge --pretend and similar operations turned out to be.
62 Being conservative and because I've never had my portage tree on anything
63 else, so I have no first-hand experience on other filesystems to compare
64 to, I'll only say it shouldn't be /slower/, that is, if those operations
65 take longer on reiserfs, something's definitely wrong, but I'm not going
66 to claim any speedup, only that the storage efficiency is higher,
67 space-wise, and that it isn't slower to access.
68
69 As I said, some folks are concerned with reiserfs' reputation for
70 instability. I haven't found that to be the case since the kernel
71 defaulted to journal=ordered for reiserfs, but in any case, as long as
72 it's stable enough to keep temporary data on (and it is certainly that),
73 that shouldn't be an issue for the portage tree, since recovery is only an
74 emerge sync away. With an exception for those with infrequent or very
75 expensive per-byte or very slow (analog dialup) internet connections, who
76 probably aren't going to be using Gentoo in any case, there are therefore no
77 data stability issues with the portage tree on reiserfs, even for those
78 that wouldn't trust it with their regular data.
79
80 That makes reiserfs the best choice for the portage tree, where the
81 portage tree is on its own partition, anyway. Those who don't trust
82 reiserfs for data stability should have no qualms here, because the data
83 is ultimately backed up in any case, and reiserfs /will/ be /far/ more
84 efficient at storing the tree, and /likely/ will be faster, as well, altho
85 I can't personally verify that as I've never run the tree on anything else
86 to compare speed against.
87
88 As I said, you may want to keep the packages dir, /usr/portage/packages by
89 default, on another partition. This is easiest to accomplish simply by
90 pointing it elsewhere in make.conf. The distdir subdir isn't synced with
91 the portage tree, but contains a local cache of source tarballs that
92 portage has downloaded for various merges. As such, it's ultimately
93 backed up to the internet as well, but because those tarballs are fetched
94 by portage one at a time as it needs them, not synced with the tree, and
95 because the files aren't as small in any case, some folks might want to
96 keep this separate from the portage tree as well, tho the urgency isn't as
97 great here as it would be with packages.
98
99 All that dealt with, there's only one possibly valid reason I'm aware of,
100 for those already splitting out the portage tree onto its own partition,
101 why they might /not/ wish to use reiserfs. Those that only have ext2/3
102 configured in their kernel may not wish to bother configuring reiserfs for
103 just the portage tree. I'm actually in that situation with ext2/3 -- I
104 don't have anything on my system using it, so there'd have to be a
105 stronger than usual reason to use it on a particular partition, in order
106 to justify the bother of compiling it into the kernel.
107
108 As for the lost+found dir, as someone else mentioned, that's trivial. The
109 only use for it is when the filesystem finds something during an fsck that
110 isn't properly linked, that might still be needed. fsck creates
111 lost+found in the root dir of the filesystem to place these lost files in,
112 as it finds them. It's recreated if needed, so IMO it's actually better
113 /not/ to have a lost+found by default, as that way, if it exists, you know
114 to look in it and see what fsck might have recovered, and either delete it
115 or move it back into its usual place in the tree. Again, because the
116 entire portage tree is recoverable from the net with an emerge sync
117 anyway, it should be entirely safe to ignore a lost+found and have it
118 deleted in a sync, in any case, because the data will be updated with an
119 emerge sync anyway, and it's less hassle to do that than to manually
120 figure out what any files there might be and where they go in the tree,
121 when even if you did, an emerge sync might simply be deleting the file
122 anyway, as outdated.
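
The "if it exists, look in it" check can be sketched in a few lines of
Python. The function name and the default mount point are assumptions for
illustration only; on a real system lost+found is normally readable by root
alone, so the listing may raise PermissionError for ordinary users.

```python
import os


def recovered_entries(mount_point="/usr/portage"):
    """List whatever fsck left in <mount_point>/lost+found, if anything.

    Returns an empty list when the directory does not exist at all.
    """
    lost_found = os.path.join(mount_point, "lost+found")
    if not os.path.isdir(lost_found):
        return []
    return sorted(os.listdir(lost_found))
```

An empty result means there is nothing to inspect; for the portage tree in
particular, anything that does show up can simply be left for the next sync.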
123
124 > A word of caution for anyone considering adopting Duncan's scheme
125 > without much thought: what he says is certainly good sense, but don't go
126 > copying him if you're just splitting out bits of the file system that
127 > can be made common to different running systems. I spent half of last
128 > night exploring some of the snags! My aim was to separate some large
129 > slabs of files into their own partitions and mount those partitions on
130 > whichever system I was booting. I have four Linux systems multibooting
131 > on this box, and it seems tidy to find common areas and treat them as
132 > such.
133 >
134 > Don't combine systems' /var/log directories - you will end up with
135 > deeply troubled emerge.log and PORT_LOGDIR records.
136
137 "Deeply troubled" is an apt description. <g> In any case, combining
138 /var/log dirs makes no sense, because what's the log info there for if not
139 to be able to examine should it be necessary for troubleshooting or record
140 keeping purposes? Throwing the logs from multiple independent boot
141 systems into the same location, with no way to tell what belongs to what
142 system, can only confuse things, and destroys the entire supposition
143 behind having the logs in the first place. (Note that this is different
144 than the convenience of running a central syslog with multiple machines
145 logging to it, because in that case, the log will have machine
146 identification labels to sort out which logged events correspond to which
147 system. Simply using the same partition for everything jumbles
148 everything up in a big mess, defeating the purpose of logging in the first
149 place!)
150
151 > Don't combine systems' /usr/src directories. It won't do much harm, but
152 > the records of which kernel version is installed in each system will
153 > cause overwriting anyway, thus spoiling the idea, especially if you have
154 > USE=symlink for gentoo-sources.
155
156 I'd rather say "Know what you're doing if you combine /usr/src dirs." It
157 can be done if the appropriate organization is maintained, but as you
158 point out, automating the /usr/src/linux symlink with USE=symlink for the
159 kernel-sources packages is NOT a good idea if you are running a combined
160 /usr/src. Perhaps a more intelligent solution could be borrowed from
161 Mandrake (and I suppose Mandriva continues the idea, but don't know),
162 where an initscript set up a number of symlinks, and this one could be
163 included. The idea being to set the symlink to point to the sources
164 corresponding to the kernel booted, where that is possible, leaving it
165 alone if there are no sources found that correspond to the booting kernel.
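
A minimal sketch of that boot-time idea, in Python rather than as a real
initscript. It is not the Mandrake script, just an illustration; it assumes
the sources live in a directory named linux-<release> under /usr/src, where
<release> is what uname -r reports for the booted kernel. The function names
are hypothetical.

```python
import os


def sources_for_kernel(release, src_root="/usr/src"):
    """Return the source dir matching `release`, or None if there is none."""
    candidate = os.path.join(src_root, "linux-" + release)
    return candidate if os.path.isdir(candidate) else None


def update_linux_symlink(release=None, src_root="/usr/src"):
    """Point src_root/linux at the booted kernel's sources when they exist.

    Returns True if the symlink was (re)pointed, False if it was left alone.
    """
    if release is None:
        release = os.uname().release  # release string of the running kernel
    target = sources_for_kernel(release, src_root)
    if target is None:
        return False  # no matching sources: leave the symlink untouched
    link = os.path.join(src_root, "linux")
    if os.path.lexists(link) and not os.path.islink(link):
        return False  # a real file or directory is in the way: do not touch it
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    # Create the new link under a temporary name, then rename it into place
    # so the old symlink is replaced in a single step.
    os.symlink(os.path.basename(target), tmp)
    os.replace(tmp, link)
    return True
```

Run once at boot on each system sharing the /usr/src partition, something
like this would keep the symlink in step with whichever kernel is running.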
166
167 So... /usr/src can be multi-boot combined, but it's not as trivial as one
168 might expect if one doesn't consider the consequences, so don't do it
169 without some thought, first.
170
171 Actually, that "don't do it without some thought, first" can be applied to
172 a /lot/ of things! =8^)
173
174 > I'm not yet sure of the wisdom of combining /var/tmp from different
175 > systems: I haven't yet sorted out the consequences for the portage work
176 > directories. Watch this space.
177
178 By definition, /tmp and /var/tmp should be multi-boot combinable, and
179 combinable between the two, as well (my /var/tmp is simply a symlink to
180 /tmp, altho on a multi-human-user system, there are security issues one
181 should consider before doing it -- yet another place to "don't do it
182 without some thought, first" <g>), because the data is by definition
183 "temporary", which in this case is defined as "not needing to survive a
184 reboot".
185
186 That "tmp" is defined as "not needing to survive a reboot" is, BTW, the
187 official FHS (Filesystem Hierarchy Standard, part of LSB aka Linux Standard
188 Base) view of /tmp as well, AFAIK, so the practice of certain distributions,
189 deleting everything in /tmp at boot, is allowed. (Note the FHS does ask that
190 /var/tmp be preserved across reboots, so clearing it is a local choice.)
191
192 In any case, there's no damage to portage by combining those dirs, or
193 removing the contents at boot, either. If you have something set up
194 locally that saves data across reboots to either /var/tmp or /tmp,
195 consider changing it, as that's not what those dirs are for, and expecting
196 them to be safe for that could get broken at some point.
197
198 > Apologies if I'm not making much sense today - blame the loss of sleep
199 > and the head cold that probably caused it :-(
200
201 Actually, great sense! Thanks for bringing up the possibility of
202 multi-boot partition combines! It certainly adds to the information
203 available in the discussion!
204
205 --
206 Duncan - List replies preferred. No HTML msgs.
207 "Every nonfree program has a lord, a master --
208 and if you use the program, he is your master." Richard Stallman in
209 http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
210
211
212 --
213 gentoo-amd64@g.o mailing list
