Gentoo Archives: gentoo-amd64

From: Matt Randolph <mattr@×××××.com>
To: gentoo-amd64@l.g.o
Subject: Re: [gentoo-amd64] x86_64 optimization patches for glibc.
Date: Sat, 23 Jul 2005 22:16:42
Message-Id: 42E2C177.6090001@erols.com
In Reply to: [gentoo-amd64] x86_64 optimization patches for glibc. by Simon Strandman
Simon Strandman wrote:

> Hi!
>
> Some binary distros like Mandrake and suse patches their glibcs with
> x86_64 optimized strings and an x86_64 optimized libm to improve
> performance.
>
> I tried extracting those patches from an mandrake SRPM and add them to
> the glibc 2.3.5 ebuild. The x86_64 optimized strings patch built and
> worked perfectly and gave a large speedup as you can see below. But I
> couldn't get glibc to build with the libm patch because of unresolved
> symbols (and I'm no programmer so I have no idea how to fix that).
>
> I found a small C program on a suse mailing-list to measure glibc
> memory copy performance:
> http://lists.suse.com/archive/suse-amd64/2005-Mar/0220.html
>
> With the glibc 2.3.5 currently in gentoo I get:
> isidor ~ # ./memcpy 2200 1000 1048576
> Memory to memory copy rate = 1291.600098 MBytes / sec. Block size = 1048576.
>
> But with glibc 2.3.5 + amd64 optimized strings I get:
> isidor ~ # ./memcpy 2200 1000 1048576
> Memory to memory copy rate = 2389.321777 MBytes / sec. Block size = 1048576.
>
> That's an improvement of over 1000mb/s! Suse 9.3 also gives about
> 2300mb/s out of the box.
>
> How about adding these patches to gentoo? Perhaps in glibc 2.3.5-r1
> before it leaves package.mask? I'll create a bugreport about it if you
> agree!
>
> This .tar.bz2 contains the glibc directory from my overlay with the
> mandrake patches included in files/mdk, but the libm patches are
> commented out in the ebuild.
> http://snigel.no-ip.com/~nxsty/linux/glibc.tar.bz2
>
There is a bug in the original memcpy.c that will cause a segfault if
you don't pass it any parameters. Here is a fixed version. I've left
everything else alone (except for a spelling correction).

// memcpy.c - Measure how fast we can copy memory

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>

/* timing function */
#define rdtscll(val) do { \
    unsigned int a,d; \
    asm volatile("rdtsc" : "=a" (a), "=d" (d)); \
    (val) = ((unsigned long)a) | (((unsigned long)d)<<32); \
} while(0)

int main(int argc, char *argv[]) {
    int cpu_rate, num_loops, block_size, block_size_lwords, i, j;
    unsigned char *send_block_p, *rcv_block_p;
    unsigned long start_time, end_time;
    float rate;
    unsigned long *s_p, *r_p;

    if (argc != 4) {
        fprintf(stderr, "Usage: %s <cpu clk rate (MHz)> <num. iterations> <copy block size>\n", argv[0] );
        return 1;
    }

    cpu_rate = atoi(argv[1]);
    num_loops = atoi(argv[2]);
    block_size = atoi(argv[3]);
    block_size_lwords = block_size / sizeof(unsigned long);
    block_size = sizeof(unsigned long) * block_size_lwords;

    send_block_p = malloc(block_size);
    rcv_block_p = malloc(block_size);

    if ((send_block_p == NULL) || (rcv_block_p == NULL)) {
        fprintf(stderr, "Malloc failed to allocate block(s) of size %d.\n", block_size);
    }

    // start_time = clock();
    rdtscll(start_time);

    for (i = 0; i < num_loops; i++) {
        memcpy(rcv_block_p, send_block_p, block_size);

        // s_p = (unsigned long *) send_block_p;
        // r_p = (unsigned long *) rcv_block_p;
        //
        // for (j = 0 ; j < block_size_lwords; j++) {
        //     *(r_p++) = *(s_p++);
        // }
    }

    // end_time = clock();
    rdtscll(end_time);

    rate = (float) (block_size) * (float) (num_loops) /
           ((float) (end_time - start_time)) * ((float) cpu_rate) * 1.0E6 / 1.0E6;

    fprintf(stdout, "Memory to memory copy rate = %f MBytes / sec. Block size = %d.\n", rate, block_size);
} /* end main() */

--
"Pluralitas non est ponenda sine necessitate" - W. of O.
--
gentoo-amd64@g.o mailing list

Replies

Subject Author
Re: [gentoo-amd64] x86_64 optimization patches for glibc. Matt Randolph <mattr@×××××.com>