1 |
Alright, patch #2. |
2 |
|
3 |
This one is not 2.1 material. |
4 |
|
5 |
There are bugs in it (sheer size of changes gurantees I typo'd |
6 |
somewhere and haven't yet caught in re-reading the diff). |
7 |
|
8 |
It _does_ introduce a good potential for non typo'd bugs due to |
9 |
migrating config over to being a saner dict equiv. They've been |
10 |
stomped where found, but it's not a typo style bug; it introduces the |
11 |
possibility of bugs in consumer code that uses config objects and |
12 |
weren't doing containment tests on config instances prior to |
13 |
accessing (or using get which provides it) |
14 |
|
15 |
|
16 |
So... changes. Aside from the earlier list from "spring cleaning", |
17 |
following general changes- |
18 |
|
19 |
1) list comprehensions are your friend if you're doing filtering of a |
20 |
list. |
21 |
python -m timeit $'l=range(1000);i=0;\nwhile i < len(l):\n\tif l[i]!="asdf":del l[i]\n\telse:i+=1' |
22 |
100 loops, best of 3: 4.12 msec per loop |
23 |
python -m timeit $'l=range(1000);\nfor i in xrange(len(l)-1,-1,-1):\n\tif l[i]!="asdf":del l[i]' |
24 |
100 loops, best of 3: 3 msec per loop |
25 |
python -m timeit '[x for x in range(1000) if x == "asdf"]' |
26 |
1000 loops, best of 3: 887 usec per loop |
27 |
|
28 |
Lesson there is that inventing your own for loop is fricking slow |
29 |
(portage does this in several key spots). |
30 |
Trying to dodge the cost of removal by popping from the right also is |
31 |
costly. |
32 |
List comp's, on the other hand are cheap in worst case (as |
33 |
demonstrated above), but still win out in best case (1.9ms vs 3.75 for |
34 |
"invent your own loop" and 2.28ms for reversed walking). There still |
35 |
exists code that uses the slower versions, mainly because they're a |
36 |
bit harder to remove. |
37 |
|
38 |
Granted the bit above is kind of a "well duh", but portage code is |
39 |
kinda old/crufty :) |
40 |
|
41 |
2) config.__getitem__ previously would function as the equiv of |
42 |
get(key, ""); this sucks because to see if a key exists, and grab |
43 |
the value, you have to do a containment test _then_ pull the value. |
44 |
|
45 |
There's a reason dicts throw KeyErrors, and the profileration of |
46 |
has_key's usage in portage is proof of it. |
47 |
|
48 |
3) flesh out config to support usual iter*/len/__nonzero__ tests. |
49 |
4) Use itertools were appropriate. |
50 |
5) cacheddir isn't always your friend in listdir calls. If it's not |
51 |
cached, or it's stale and you're not doing files/dir/recursion, a |
52 |
simple os.listdir suffices (and is far faster since it doesn't |
53 |
trigger stat calls on each entry). |
54 |
6) elog_process doesn't need to reinvent the "dynamically grab an |
55 |
attribute from a module", load_mod exists already. |
56 |
7) don't pop from a list if you're just accessing it (stupid and |
57 |
slow). |
58 |
8) Know the return type. If None is returned when an fs accessing |
59 |
function is called for a nonexistant file, calling said func and |
60 |
testing for the file yourself is dumb. |
61 |
9) if blah in dar: |
62 |
for x in dar[blah]: |
63 |
# is dumb. use. |
64 |
for x in dar.get(blah, []): |
65 |
10) if os.path.exists(blah): |
66 |
return open(blah).read() |
67 |
return "" |
68 |
|
69 |
is dumb, try to open the file and catch the IOError instead; one |
70 |
less syscall involved... |
71 |
11) when building a unique list, do _not_ search the list and append |
72 |
if it doesn't exist in it. It's amazing how much portage does |
73 |
this :/ |
74 |
12) collision-protect cleanup. Existing code is pretty ugly, reworked |
75 |
it to load contents sets once (and don't hold them in memory), |
76 |
filtering per contents set instead of iterating over the image |
77 |
file set. Up shot, faster, and it's able to dump all colliding |
78 |
files instead of the first hit. |
79 |
13) added iter funcs for cp_all/cpv_all. Eliminates building a list |
80 |
in memory, and consumer code reacts immediately instead of pausing |
81 |
while the list is built. |
82 |
14) initial dead method removal. |
83 |
|
84 |
So... yeah. Speeds things up (algorithmic improvements), less code, |
85 |
and uses less intermediate memory; downside is the __getitem__ change |
86 |
introduces potential for complaints from consumer code, and the size |
87 |
of the changes make auditing the patch a bit fun. |
88 |
|
89 |
Throwing it y'alls way for anyone who is interested/wants to poke at |
90 |
it- fixes/comments welcome (especially considering this is what I'm |
91 |
running now). |
92 |
|
93 |
~harring |