Gentoo Archives: gentoo-dev

From: Alastair Tse <liquidx@g.o>
To: gentoo-dev@g.o
Subject: Re: [gentoo-dev] python-2.3.2 testing required
Date: Mon, 17 Nov 2003 00:29:46
Message-Id: 1069028971.19556.39.camel@huggins.eng.cam.ac.uk
In Reply to: [gentoo-dev] python-2.3.2 testing required by Alastair Tse
On Wed, 2003-11-12 at 18:46, Alastair Tse wrote:
> The reason why I'm not making this default is because UCS4 python uses
> more memory. An example is supybot (Python IRC bot) that uses 8M for
> UCS2 and 13M for UCS4. But note that this example is not scientific
> because the machines were different in kernel version, compiler and
> compiler optimisations.

I found some spare time this weekend to do a little memory
benchmarking, to prove or disprove my point that UCS4 uses more memory
than UCS2.

I wrote and ran two simple tests that I thought were relevant to
Python on Gentoo:

1. Generating a large number of Python Unicode strings and recording the
memory usage.
2. Running "emerge" with various options and recording the
memory usage.
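
For anyone who wants to try something like the first test, here is a
minimal sketch in Python. It is not the script behind these results
(those are at the URL under Other Details); the helper names
read_proc_status and make_unicode_strings are hypothetical, and using
VmSize/VmRSS from /proc/self/status as stand-ins for the Mem and RSS
columns is an assumption that only works on Linux. Also note that since
Python 3.3 (PEP 393) CPython chooses a storage width per string, so on a
modern interpreter this reproduces the shape of the test, not the
UCS2/UCS4 comparison itself.

```python
def read_proc_status(fields=("VmSize", "VmRSS")):
    """Return the requested /proc/self/status fields in kB (Linux only).

    Returns an empty dict where /proc is unavailable.
    """
    result = {}
    try:
        with open("/proc/self/status") as status:
            for line in status:
                name, _, value = line.partition(":")
                if name in fields:
                    # e.g. "VmRSS:\t    7052 kB" -> 7052
                    result[name] = int(value.split()[0])
    except OSError:
        return {}
    return result


def make_unicode_strings(count, length=256, fill="a"):
    """Build `count` distinct unicode strings of `length` characters in a list.

    Each string is its index left-padded with `fill`, so no two are equal.
    """
    return [str(i).rjust(length, fill) for i in range(count)]


if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 10000):
        # U+4E2D as the fill character gives the multi-byte variant.
        strings = make_unicode_strings(n, fill="\u4e2d")
        print(n, len(strings), read_proc_status())
```

Keeping the list alive while reading /proc is what makes the reading
reflect the strings; the ASCII variant is the same call with the
default fill character.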

The results demonstrate that UCS4 is more memory hungry _only_ if a
script/module/application uses unicode strings. That covers any bindings
that use PyUnicode_* objects (for example, pygtk) and any script that
uses unicode strings. A script/module/application that does not use
unicode objects suffers no noticeable memory impact.
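
Whether a given interpreter is a UCS2 ("narrow") or UCS4 ("wide") build
can be checked from sys.maxunicode, which is 0xFFFF on narrow builds and
0x10FFFF on wide ones. The sketch below wraps that check in a
hypothetical helper, unicode_build(). Since Python 3.3 every CPython
reports 0x10FFFF, so the answer is only meaningful for older
interpreters such as python-2.3.2.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Return "UCS4" for a wide build, "UCS2" for a narrow one.

    Narrow (UCS2) builds report sys.maxunicode == 0xFFFF; wide (UCS4)
    builds report 0x10FFFF. From Python 3.3 on it is always 0x10FFFF.
    """
    return "UCS4" if maxunicode > 0xFFFF else "UCS2"


print(unicode_build())        # the running interpreter
print(unicode_build(0xFFFF))  # → UCS2
```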

The numbers reported are averages from 3 or more runs. In nearly all
cases, the memory usage was the same in every run.

Results:
========
1: Generating Unicode multi-byte strings (1 to 10000 strings)
   (each string is 256 multi-byte chars, stored in a regular Python list)
-------------------------------------------------------------------
        ------ UCS2 (KB) ------   ------ UCS4 (KB) ------
Strings     Mem     RSS  Shared       Mem     RSS  Shared      %+
      1    1839     710    1535      1839     711    1535       0
     10    1871     712    1535      1871     717    1535       0
    100    1904     765    1535      1971     830    1535     3.5
   1000    2465    1336    1535      3102    1960    1535   25.84
  10000    8213    7052    1535     14445   13309    1535   75.80
(%+ is the increase of UCS4 Mem over UCS2 Mem)

2: Generating Unicode ASCII strings (1 to 10000 strings)
   (each string is 256 chars, stored in a regular Python list)
-------------------------------------------------------------------
        ------ UCS2 (KB) ------   ------ UCS4 (KB) ------
Strings     Mem     RSS  Shared       Mem     RSS  Shared      %+
      1    1839     710    1535      1839     711    1535       0
     10    1871     712    1535      1871     717    1535       0
    100    1904     765    1535      1971     830    1535     3.5
   1000    2465    1336    1535      3102    1960    1535   25.84
  10000    8213    7053    1535     14445   13309    1535   75.80

3: Maximum memory usage under "emerge -p kde" (KB)
-------------------------------------------------------------------
          Mem     RSS  Shared
UCS2:    3222    1893    1955
UCS4:    3123    1769    1955

4: Maximum memory usage under "emerge search kde" (KB)
-------------------------------------------------------------------
          Mem     RSS  Shared
UCS2:    3221    1898    1955
UCS4:    3160    1803    1955

Discussion
==========

There are two immediate observations. First, UCS4 does use more
memory than UCS2 when unicode strings are involved. In results 1 and 2,
the VM has an overhead of about 1.8M, and as more strings are created
the difference in memory usage grows steadily, reaching about 75% at
10000 strings.
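
The percentages can be checked with a few lines of arithmetic. The
sketch below copies the Mem figures (assumed to be KB, which matches
the 1.8M overhead) from results 1 and 2, recomputes %+ as
(UCS4 - UCS2) / UCS2, and compares the measured growth with the size
of the character data alone: 256 characters at 2 bytes each for UCS2
versus 4 bytes for UCS4. The helper names are hypothetical. This
reproduces the %+ column to within about 0.1 percentage point (the
10000-string row comes out at about 75.9%).

```python
# Mem column (assumed KB) for UCS2 and UCS4, copied from results 1 and 2.
MEM_KB = {
    1: (1839, 1839),
    10: (1871, 1871),
    100: (1904, 1971),
    1000: (2465, 3102),
    10000: (8213, 14445),
}

STRING_LENGTH = 256  # characters per string in results 1 and 2


def percent_increase(ucs2, ucs4):
    """Growth of the UCS4 figure relative to the UCS2 figure, in percent."""
    return 100.0 * (ucs4 - ucs2) / ucs2


def raw_buffer_kb(count, bytes_per_char, length=STRING_LENGTH):
    """Size of the character data alone, ignoring object and allocator overhead."""
    return count * length * bytes_per_char / 1024.0


for count, (ucs2, ucs4) in sorted(MEM_KB.items()):
    measured = ucs4 - ucs2
    estimated = raw_buffer_kb(count, 4) - raw_buffer_kb(count, 2)
    print("%6d strings: %+6.2f%%, measured +%d KB, character data +%.1f KB"
          % (count, percent_increase(ucs2, ucs4), measured, estimated))
```

At 10000 strings the wider characters alone account for about 5000 KB
of the roughly 6200 KB difference; attributing the rest to object and
allocator overhead is an assumption these tests did not measure.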

The other observation is that if there is no unicode usage in an
application, like "emerge", there is virtually no impact. In fact, in
this case UCS4 uses about 60K to 100K less memory than UCS2. I don't
have an explanation for that behaviour.

Another observation, unrelated to the UCS2/UCS4 comparison itself, is
that it doesn't matter whether you are primarily dealing with ASCII or
multi-byte (e.g. CJK) strings. As soon as they are converted to unicode
objects, they use more memory, regardless of which characters they
contain. Note that the two runs have virtually identical memory usage;
that is not a mistake.
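
Content does not matter because a UCS2 or UCS4 build stores every
character in a fixed 2 or 4 bytes, so the buffer size depends only on
the number of characters. The sketch below uses a hypothetical helper,
fixed_width_bytes, and assumes characters from the Basic Multilingual
Plane (a UCS2 build stores anything above U+FFFF as a two-unit
surrogate pair). It contrasts this with UTF-8, where a CJK character
such as U+4E2D takes 3 bytes and an ASCII character takes 1.

```python
# Bytes per code unit in the two fixed-width CPython build variants.
UNIT_BYTES = {"UCS2": 2, "UCS4": 4}


def fixed_width_bytes(text, build):
    """Size of the character buffer for `text` in a fixed-width build.

    Assumes every character is in the Basic Multilingual Plane (<= U+FFFF);
    a UCS2 build would store anything above that as two code units.
    Object headers and allocator overhead are ignored.
    """
    return len(text) * UNIT_BYTES[build]


ascii_text = "a" * 256
cjk_text = "\u4e2d" * 256  # U+4E2D, a CJK ideograph

for label, text in (("ASCII", ascii_text), ("CJK", cjk_text)):
    print(label,
          "UTF-8:", len(text.encode("utf-8")),
          "UCS2:", fixed_width_bytes(text, "UCS2"),
          "UCS4:", fixed_width_bytes(text, "UCS4"))
# ASCII UTF-8: 256 UCS2: 512 UCS4: 1024
# CJK UTF-8: 768 UCS2: 512 UCS4: 1024
```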

Another is that 'emerge' uses about the same amount of memory
regardless of what is being run. In an informal test, running just
"emerge info" still used approximately the same memory as more
complicated operations like merging packages or searching the package
database.

Other Details
=============
The above results were obtained with dev-lang/python-2.3.2-r1 on:
Kernel 2.6.0-test9-mm1
Glibc-2.3.2-r8 (w/ nptl)
GCC-3.3.2
Portage 2.0.49-r16

The raw logs for the tests and the scripts used can be found at:
http://dev.gentoo.org/~liquidx/python-test/

Remarks
=======

After running these tests, I am still divided about whether UCS4 should
be enabled by default. I don't see benefits from UCS4 that outweigh the
memory usage increase it brings. Yet it also seems like the "right"
thing to do for m17n support.

Cheers,
--
Alastair 'liquidx' Tse
>> Gentoo Developer
>> http://www.liquidx.net/ | http://dev.gentoo.org/~liquidx/
