List Archive: gentoo-amd64
Headers:
To: gentoo-amd64@g.o
From: Matt Randolph <mattr@...>
Subject: Re: x86_64 optimization patches for glibc.
Date: Sat, 23 Jul 2005 18:15:19 -0400
Simon Strandman wrote:

> Hi!
>
> Some binary distros like Mandrake and SUSE patch their glibc with
> x86_64-optimized strings and an x86_64-optimized libm to improve
> performance.
>
> I tried extracting those patches from a Mandrake SRPM and adding them to
> the glibc 2.3.5 ebuild. The x86_64-optimized strings patch built and
> worked perfectly and gave a large speedup, as you can see below. But I
> couldn't get glibc to build with the libm patch because of unresolved
> symbols (and I'm no programmer, so I have no idea how to fix that).
>
> I found a small C program on a suse mailing-list to measure glibc 
> memory copy performance:
> http://lists.suse.com/archive/suse-amd64/2005-Mar/0220.html
>
> With the glibc 2.3.5 currently in gentoo I get:
> isidor ~ # ./memcpy 2200 1000 1048576
> Memory to memory copy rate = 1291.600098 MBytes / sec. Block size = 1048576.
>
> But with glibc 2.3.5 + amd64 optimized strings I get:
> isidor ~ # ./memcpy 2200 1000 1048576
> Memory to memory copy rate = 2389.321777 MBytes / sec. Block size = 1048576.
>
> That's an improvement of over 1000 MB/s! Suse 9.3 also gives about
> 2300 MB/s out of the box.
>
> How about adding these patches to Gentoo? Perhaps in glibc 2.3.5-r1
> before it leaves package.mask? I'll create a bug report about it if you
> agree!
>
> This .tar.bz2 contains the glibc directory from my overlay with the
> Mandrake patches included in files/mdk, but the libm patches are
> commented out in the ebuild.
> http://snigel.no-ip.com/~nxsty/linux/glibc.tar.bz2
>
There is a bug in the original memcpy.c that will cause a segfault if 
you don't pass it any parameters.  Here is a fixed version.  I've left 
everything else alone (except for a spelling correction).

// memcpy.c - Measure how fast we can copy memory

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>

/* timing function */
#define rdtscll(val) do { \
     unsigned int a,d; \
     asm volatile("rdtsc" : "=a" (a), "=d" (d)); \
     (val) = ((unsigned long)a) | (((unsigned long)d)<<32); \
} while(0)

int main(int argc, char *argv[]) {
  int cpu_rate, num_loops, block_size, block_size_lwords, i, j;
  unsigned char *send_block_p, *rcv_block_p;
  unsigned long start_time, end_time;
  float rate;
  unsigned long *s_p, *r_p;

  if (argc != 4) {
    fprintf(stderr,
      "Usage: %s <cpu clk rate (MHz)> <num. iterations> <copy block size>\n",
           argv[0] );
    return 1;
  }

  cpu_rate = atoi(argv[1]);
  num_loops = atoi(argv[2]);
  block_size = atoi(argv[3]);

  block_size_lwords = block_size / sizeof(unsigned long);
  block_size = sizeof(unsigned long) * block_size_lwords;

  send_block_p = malloc(block_size);
  rcv_block_p = malloc(block_size);

  if ((send_block_p == NULL) || (rcv_block_p == NULL)) {
    fprintf(stderr, "Malloc failed to allocate block(s) of size %d.\n",
            block_size);
    return 1;
  }

// start_time = clock();
    rdtscll(start_time);

  for (i = 0; i < num_loops; i++) {
    memcpy(rcv_block_p, send_block_p, block_size);

// s_p = (unsigned long *) send_block_p;
// r_p = (unsigned long *) rcv_block_p;
//
// for (j = 0 ; j < block_size_lwords; j++) {
// *(r_p++) = *(s_p++);
// }
  }

// end_time = clock();
    rdtscll(end_time);

  rate = (float) (block_size) * (float) (num_loops) /
         ((float) (end_time - start_time)) *
         ((float) cpu_rate) * 1.0E6 / 1.0E6;

  fprintf(stdout,
    "Memory to memory copy rate = %f MBytes / sec. Block size = %d.\n",
    rate, block_size);

  free(send_block_p);
  free(rcv_block_p);
  return 0;
} /* end main() */
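
For anyone puzzling over the rate calculation at the end of main(): the "* 1.0E6 / 1.0E6" cancels out, so it boils down to bytes per cycle multiplied by the clock rate in MHz (cycles per microsecond), which gives bytes per microsecond, i.e. MBytes/sec with 1 MByte = 1.0E6 bytes. Here is a minimal sketch of just that arithmetic. The function name cycles_to_mbytes_per_sec is made up for illustration and is not part of the original memcpy.c.

```c
#include <assert.h>

/* Hypothetical helper (not in the original program): convert a TSC cycle
 * count into MBytes/sec, with 1 MByte = 1.0E6 bytes as in memcpy.c.
 *
 *   bytes_copied / cycles  -> bytes per cycle
 *   * cpu_mhz              -> bytes per microsecond == MBytes per second
 */
static double cycles_to_mbytes_per_sec(double bytes_copied, double cycles,
                                       double cpu_mhz) {
  assert(cpu_mhz > 0.0);  /* the clock rate must be supplied by the caller */
  if (cycles <= 0.0)
    return 0.0;           /* nothing was measured; avoid dividing by zero */
  return bytes_copied / cycles * cpu_mhz;
}
```

With round numbers: 1.0E9 bytes copied in 2.2E9 cycles on a 2200 MHz CPU, i.e. cycles_to_mbytes_per_sec(1.0e9, 2.2e9, 2200.0), works out to roughly 1000 MBytes/sec.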

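Since the rdtsc approach needs the CPU clock rate passed in by hand (and the result may be off if the clock frequency changes during the run), here is a rough sketch of the same measurement using POSIX clock_gettime() instead. It assumes CLOCK_MONOTONIC is available. The helper names elapsed_seconds and copy_rate_mbytes are hypothetical, and the memset() calls and the volatile function pointer are my own additions, not something from the original memcpy.c.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Seconds between two clock_gettime() samples. */
static double elapsed_seconds(struct timespec start, struct timespec end) {
  return (double)(end.tv_sec - start.tv_sec) +
         (double)(end.tv_nsec - start.tv_nsec) / 1.0E9;
}

/* Hypothetical helper: copy block_size bytes num_loops times and return
 * the rate in MBytes/sec (1 MByte = 1.0E6 bytes), or -1.0 on failure. */
static double copy_rate_mbytes(size_t block_size, int num_loops) {
  /* Call memcpy through a volatile pointer so the compiler cannot
   * drop or merge the repeated copies. */
  void *(*volatile copy)(void *, const void *, size_t) = memcpy;
  struct timespec start, end;
  unsigned char *src, *dst;
  double secs;
  int i;

  if (block_size == 0 || num_loops <= 0)
    return -1.0;

  src = malloc(block_size);
  dst = malloc(block_size);
  if (src == NULL || dst == NULL) {
    free(src);
    free(dst);
    return -1.0;
  }

  /* Write to both blocks first so their pages are mapped before timing. */
  memset(src, 0xA5, block_size);
  memset(dst, 0, block_size);

  clock_gettime(CLOCK_MONOTONIC, &start);
  for (i = 0; i < num_loops; i++)
    copy(dst, src, block_size);
  clock_gettime(CLOCK_MONOTONIC, &end);

  assert(dst[block_size - 1] == 0xA5);  /* the data really was copied */
  secs = elapsed_seconds(start, end);
  free(src);
  free(dst);

  if (secs <= 0.0)
    return -1.0;  /* run too short for the timer to resolve */
  return (double)block_size * (double)num_loops / secs / 1.0E6;
}
```

Calling copy_rate_mbytes(1048576, 1000) corresponds to the "1000 1048576" arguments used above, minus the clock rate. On older glibc versions you may need to link with -lrt to get clock_gettime().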

-- 
"Pluralitas non est ponenda sine necessitate" - W. of O.

-- 
gentoo-amd64@g.o mailing list

