Gentoo Archives: gentoo-commits

From: "Mike Frysinger (vapier)" <vapier@g.o>
To: gentoo-commits@l.g.o
Subject: [gentoo-commits] gentoo commit in src/patchsets/glibc/2.11.2: 6027_all_alpha-fix-memchr.patch 6028_all_alpha-fix-memchr.patch 6029_all_alpha-fix-memchr.patch README.history
Date: Wed, 29 Sep 2010 23:55:02
Message-Id: 20100929235455.36DB620051@flycatcher.gentoo.org
1 vapier 10/09/29 23:54:55
2
3 Modified: 6027_all_alpha-fix-memchr.patch README.history
4 Added: 6028_all_alpha-fix-memchr.patch
5 6029_all_alpha-fix-memchr.patch
6 Log:
7 grab more alpha memchr fixes from upstream
8
9 Revision Changes Path
10 1.2 src/patchsets/glibc/2.11.2/6027_all_alpha-fix-memchr.patch
11
12 file : http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.11.2/6027_all_alpha-fix-memchr.patch?rev=1.2&view=markup
13 plain: http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.11.2/6027_all_alpha-fix-memchr.patch?rev=1.2&content-type=text/plain
14 diff : http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.11.2/6027_all_alpha-fix-memchr.patch?r1=1.1&r2=1.2
15
16 Index: 6027_all_alpha-fix-memchr.patch
17 ===================================================================
18 RCS file: /var/cvsroot/gentoo/src/patchsets/glibc/2.11.2/6027_all_alpha-fix-memchr.patch,v
19 retrieving revision 1.1
20 retrieving revision 1.2
21 diff -u -r1.1 -r1.2
22 --- 6027_all_alpha-fix-memchr.patch 8 Jun 2010 05:00:42 -0000 1.1
23 +++ 6027_all_alpha-fix-memchr.patch 29 Sep 2010 23:54:54 -0000 1.2
24 @@ -1,8 +1,24 @@
25 -2009-07-27 Aurelien Jarno <aurelien@×××××××.net>
26 +From 200b5faee1cfac10d831e9b278ef294ca3119f53 Mon Sep 17 00:00:00 2001
27 +From: Richard Henderson <rth@×××××××.net>
28 +Date: Tue, 4 May 2010 09:06:15 -0700
29 +Subject: [PATCH] alpha: fix memchr to not cause memory faults.
30
31 +http://www.mail-archive.com/debian-alpha@××××××××××××.org/msg25088.html
32 +
33 +2010-05-03 Aurelien Jarno <aurelien@×××××××.net>
34 +
35 * sysdeps/alpha/memchr.S: Use prefetch load.
36 * sysdeps/alpha/alphaev6/memchr.S: Likewise.
37
38 +Signed-off-by: Matt Turner <mattst88@×××××.com>
39 +---
40 + ChangeLog.alpha | 5 +++++
41 + sysdeps/alpha/alphaev6/memchr.S | 26 +++++++++++++-------------
42 + sysdeps/alpha/memchr.S | 22 +++++++++++-----------
43 + 3 files changed, 29 insertions(+), 24 deletions(-)
44 +
45 +diff --git a/sysdeps/alpha/alphaev6/memchr.S b/sysdeps/alpha/alphaev6/memchr.S
46 +index 88e91fa..fe77cd8 100644
47 --- ports/sysdeps/alpha/alphaev6/memchr.S
48 +++ ports/sysdeps/alpha/alphaev6/memchr.S
49 @@ -127,7 +127,7 @@ $first_quad:
50 @@ -65,6 +81,8 @@
51 nop # E :
52 bne $18, $last_quad # U :
53
54 +diff --git a/sysdeps/alpha/memchr.S b/sysdeps/alpha/memchr.S
55 +index 5d713d5..87c7fb1 100644
56 --- ports/sysdeps/alpha/memchr.S
57 +++ ports/sysdeps/alpha/memchr.S
58 @@ -119,7 +119,7 @@ $first_quad:
59 @@ -115,3 +133,5 @@
60 bne a2, $last_quad # e1 :
61
62 $not_found:
63 +--
64 +1.7.3
65
66
67
68 1.10 src/patchsets/glibc/2.11.2/README.history
69
70 file : http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.11.2/README.history?rev=1.10&view=markup
71 plain: http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.11.2/README.history?rev=1.10&content-type=text/plain
72 diff : http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.11.2/README.history?r1=1.9&r2=1.10
73
74 Index: README.history
75 ===================================================================
76 RCS file: /var/cvsroot/gentoo/src/patchsets/glibc/2.11.2/README.history,v
77 retrieving revision 1.9
78 retrieving revision 1.10
79 diff -u -r1.9 -r1.10
80 --- README.history 29 Sep 2010 23:52:50 -0000 1.9
81 +++ README.history 29 Sep 2010 23:54:54 -0000 1.10
82 @@ -2,6 +2,9 @@
83 + 0010_all_glibc-locale-output-quote.patch
84 + 0050_all_glibc-make-3.82-rules.patch
85 + 1509_all_glibc-2.11-hppa-SOCK_CLOEXEC.patch
86 + U 6027_all_alpha-fix-memchr.patch
87 + + 6028_all_alpha-fix-memchr.patch
88 + + 6029_all_alpha-fix-memchr.patch
89 R 6028_all_alpha-fix-SOCK_NONBLOCK.patch -> 6030_all_alpha-fix-SOCK_NONBLOCK.patch
90 + 6031_all_sparc-glibc-2.12-epoll_create1.patch
91 + 6531_all_sparc-glibc-2.12-epoll_create1.patch
92
93
94
95 1.1 src/patchsets/glibc/2.11.2/6028_all_alpha-fix-memchr.patch
96
97 file : http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.11.2/6028_all_alpha-fix-memchr.patch?rev=1.1&view=markup
98 plain: http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.11.2/6028_all_alpha-fix-memchr.patch?rev=1.1&content-type=text/plain
99
100 Index: 6028_all_alpha-fix-memchr.patch
101 ===================================================================
102 From 926cf114f7ca2b19116cac005303040648e17e77 Mon Sep 17 00:00:00 2001
103 From: Richard Henderson <rth@×××××××.net>
104 Date: Wed, 15 Sep 2010 10:41:43 -0700
105 Subject: [PATCH] alpha: rewrite memchr.
106
107 [BZ #12019]
108 The new implementation does not read too much data.
109
110 2010-09-23 Richard Henderson <rth@××××××.com>
111
112 [BZ #12019]
113 * sysdeps/alpha/alphaev6/memchr.S: Remove.
114 * sysdeps/alpha/memchr.S: Remove.
115 * sysdeps/alpha/memchr.c: New.
116
117 ---
118 ChangeLog.alpha | 9 ++-
119 sysdeps/alpha/alphaev6/memchr.S | 193 ---------------------------------------
120 sysdeps/alpha/memchr.S | 176 -----------------------------------
121 sysdeps/alpha/memchr.c | 175 +++++++++++++++++++++++++++++++++++
122 4 files changed, 183 insertions(+), 370 deletions(-)
123 delete mode 100644 sysdeps/alpha/alphaev6/memchr.S
124 delete mode 100644 sysdeps/alpha/memchr.S
125 create mode 100644 sysdeps/alpha/memchr.c
126
127 diff --git a/sysdeps/alpha/alphaev6/memchr.S b/sysdeps/alpha/alphaev6/memchr.S
128 deleted file mode 100644
129 index fe77cd8..0000000
130 --- ports/sysdeps/alpha/alphaev6/memchr.S
131 +++ /dev/null
132 @@ -1,193 +0,0 @@
133 -/* Copyright (C) 2000, 2003 Free Software Foundation, Inc.
134 - This file is part of the GNU C Library.
135 - Contributed by David Mosberger (davidm@××××××××××.edu).
136 - EV6 optimized by Rick Gorton <rick.gorton@×××××××××××××××.com>.
137 -
138 - The GNU C Library is free software; you can redistribute it and/or
139 - modify it under the terms of the GNU Lesser General Public
140 - License as published by the Free Software Foundation; either
141 - version 2.1 of the License, or (at your option) any later version.
142 -
143 - The GNU C Library is distributed in the hope that it will be useful,
144 - but WITHOUT ANY WARRANTY; without even the implied warranty of
145 - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
146 - Lesser General Public License for more details.
147 -
148 - You should have received a copy of the GNU Lesser General Public
149 - License along with the GNU C Library; if not, write to the Free
150 - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
151 - 02111-1307 USA. */
152 -
153 -#include <sysdep.h>
154 -
155 - .arch ev6
156 - .set noreorder
157 - .set noat
158 -
159 -ENTRY(__memchr)
160 -#ifdef PROF
161 - ldgp gp, 0(pv)
162 - lda AT, _mcount
163 - jsr AT, (AT), _mcount
164 - .prologue 1
165 -#else
166 - .prologue 0
167 -#endif
168 -
169 - # Hack -- if someone passes in (size_t)-1, hoping to just
170 - # search til the end of the address space, we will overflow
171 - # below when we find the address of the last byte. Given
172 - # that we will never have a 56-bit address space, cropping
173 - # the length is the easiest way to avoid trouble.
174 - zap $18, 0x80, $5 # U : Bound length
175 - beq $18, $not_found # U :
176 - ldq_u $1, 0($16) # L : load first quadword Latency=3
177 - and $17, 0xff, $17 # E : L L U U : 00000000000000ch
178 -
179 - insbl $17, 1, $2 # U : 000000000000ch00
180 - cmpult $18, 9, $4 # E : small (< 1 quad) string?
181 - or $2, $17, $17 # E : 000000000000chch
182 - lda $3, -1($31) # E : U L L U
183 -
184 - sll $17, 16, $2 # U : 00000000chch0000
185 - addq $16, $5, $5 # E : Max search address
186 - or $2, $17, $17 # E : 00000000chchchch
187 - sll $17, 32, $2 # U : U L L U : chchchch00000000
188 -
189 - or $2, $17, $17 # E : chchchchchchchch
190 - extql $1, $16, $7 # U : $7 is upper bits
191 - beq $4, $first_quad # U :
192 - ldq_u $6, -1($5) # L : L U U L : eight or less bytes to search Latency=3
193 -
194 - extqh $6, $16, $6 # U : 2 cycle stall for $6
195 - mov $16, $0 # E :
196 - nop # E :
197 - or $7, $6, $1 # E : L U L U $1 = quadword starting at $16
198 -
199 - # Deal with the case where at most 8 bytes remain to be searched
200 - # in $1. E.g.:
201 - # $18 = 6
202 - # $1 = ????c6c5c4c3c2c1
203 -$last_quad:
204 - negq $18, $6 # E :
205 - xor $17, $1, $1 # E :
206 - srl $3, $6, $6 # U : $6 = mask of $18 bits set
207 - cmpbge $31, $1, $2 # E : L U L U
208 -
209 - nop
210 - nop
211 - and $2, $6, $2 # E :
212 - beq $2, $not_found # U : U L U L
213 -
214 -$found_it:
215 -#if defined(__alpha_fix__) && defined(__alpha_cix__)
216 - /*
217 - * Since we are guaranteed to have set one of the bits, we don't
218 - * have to worry about coming back with a 0x40 out of cttz...
219 - */
220 - cttz $2, $3 # U0 :
221 - addq $0, $3, $0 # E : All done
222 - nop # E :
223 - ret # L0 : L U L U
224 -#else
225 - /*
226 - * Slow and clunky. It can probably be improved.
227 - * An exercise left for others.
228 - */
229 - negq $2, $3 # E :
230 - and $2, $3, $2 # E :
231 - and $2, 0x0f, $1 # E :
232 - addq $0, 4, $3 # E :
233 -
234 - cmoveq $1, $3, $0 # E : Latency 2, extra map cycle
235 - nop # E : keep with cmov
236 - and $2, 0x33, $1 # E :
237 - addq $0, 2, $3 # E : U L U L : 2 cycle stall on $0
238 -
239 - cmoveq $1, $3, $0 # E : Latency 2, extra map cycle
240 - nop # E : keep with cmov
241 - and $2, 0x55, $1 # E :
242 - addq $0, 1, $3 # E : U L U L : 2 cycle stall on $0
243 -
244 - cmoveq $1, $3, $0 # E : Latency 2, extra map cycle
245 - nop
246 - nop
247 - ret # L0 : L U L U
248 -#endif
249 -
250 - # Deal with the case where $18 > 8 bytes remain to be
251 - # searched. $16 may not be aligned.
252 - .align 4
253 -$first_quad:
254 - andnot $16, 0x7, $0 # E :
255 - insqh $3, $16, $2 # U : $2 = 0000ffffffffffff ($16<0:2> ff)
256 - xor $1, $17, $1 # E :
257 - or $1, $2, $1 # E : U L U L $1 = ====ffffffffffff
258 -
259 - cmpbge $31, $1, $2 # E :
260 - bne $2, $found_it # U :
261 - # At least one byte left to process.
262 - ldq $31, 8($0) # L :
263 - subq $5, 1, $18 # E : U L U L
264 -
265 - addq $0, 8, $0 # E :
266 - # Make $18 point to last quad to be accessed (the
267 - # last quad may or may not be partial).
268 - andnot $18, 0x7, $18 # E :
269 - cmpult $0, $18, $2 # E :
270 - beq $2, $final # U : U L U L
271 -
272 - # At least two quads remain to be accessed.
273 -
274 - subq $18, $0, $4 # E : $4 <- nr quads to be processed
275 - and $4, 8, $4 # E : odd number of quads?
276 - bne $4, $odd_quad_count # U :
277 - # At least three quads remain to be accessed
278 - nop # E : L U L U : move prefetched value to correct reg
279 -
280 - .align 4
281 -$unrolled_loop:
282 - ldq $1, 0($0) # L : load quad
283 - xor $17, $1, $2 # E :
284 - ldq $31, 8($0) # L : prefetch next quad
285 - cmpbge $31, $2, $2 # E : U L U L
286 -
287 - bne $2, $found_it # U :
288 - addq $0, 8, $0 # E :
289 - nop # E :
290 - nop # E :
291 -
292 -$odd_quad_count:
293 - ldq $1, 0($0) # L : load quad
294 - xor $17, $1, $2 # E :
295 - ldq $31, 8($0) # L : prefetch $4
296 - cmpbge $31, $2, $2 # E :
297 -
298 - addq $0, 8, $6 # E :
299 - bne $2, $found_it # U :
300 - cmpult $6, $18, $6 # E :
301 - addq $0, 8, $0 # E :
302 -
303 - bne $6, $unrolled_loop # U :
304 - nop # E :
305 - nop # E :
306 - nop # E :
307 -
308 -$final: ldq $1, 0($0) # L : load last quad
309 - subq $5, $0, $18 # E : $18 <- number of bytes left to do
310 - nop # E :
311 - bne $18, $last_quad # U :
312 -
313 -$not_found:
314 - mov $31, $0 # E :
315 - nop # E :
316 - nop # E :
317 - ret # L0 :
318 -
319 - END(__memchr)
320 -
321 -weak_alias (__memchr, memchr)
322 -#if !__BOUNDED_POINTERS__
323 -weak_alias (__memchr, __ubp_memchr)
324 -#endif
325 -libc_hidden_builtin_def (memchr)
326 diff --git a/sysdeps/alpha/memchr.S b/sysdeps/alpha/memchr.S
327 deleted file mode 100644
328 index 87c7fb1..0000000
329 --- ports/sysdeps/alpha/memchr.S
330 +++ /dev/null
331 @@ -1,176 +0,0 @@
332 -/* Copyright (C) 1996, 2000, 2003 Free Software Foundation, Inc.
333 - This file is part of the GNU C Library.
334 - Contributed by David Mosberger (davidm@××××××××××.edu).
335 -
336 - The GNU C Library is free software; you can redistribute it and/or
337 - modify it under the terms of the GNU Lesser General Public
338 - License as published by the Free Software Foundation; either
339 - version 2.1 of the License, or (at your option) any later version.
340 -
341 - The GNU C Library is distributed in the hope that it will be useful,
342 - but WITHOUT ANY WARRANTY; without even the implied warranty of
343 - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
344 - Lesser General Public License for more details.
345 -
346 - You should have received a copy of the GNU Lesser General Public
347 - License along with the GNU C Library; if not, write to the Free
348 - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
349 - 02111-1307 USA. */
350 -
351 -/* Finds characters in a memory area. Optimized for the Alpha:
352 -
353 - - memory accessed as aligned quadwords only
354 - - uses cmpbge to compare 8 bytes in parallel
355 - - does binary search to find 0 byte in last
356 - quadword (HAKMEM needed 12 instructions to
357 - do this instead of the 9 instructions that
358 - binary search needs).
359 -
360 -For correctness consider that:
361 -
362 - - only minimum number of quadwords may be accessed
363 - - the third argument is an unsigned long
364 -*/
365 -
366 -#include <sysdep.h>
367 -
368 - .set noreorder
369 - .set noat
370 -
371 -ENTRY(__memchr)
372 -#ifdef PROF
373 - ldgp gp, 0(pv)
374 - lda AT, _mcount
375 - jsr AT, (AT), _mcount
376 - .prologue 1
377 -#else
378 - .prologue 0
379 -#endif
380 -
381 - # Hack -- if someone passes in (size_t)-1, hoping to just
382 - # search til the end of the address space, we will overflow
383 - # below when we find the address of the last byte. Given
384 - # that we will never have a 56-bit address space, cropping
385 - # the length is the easiest way to avoid trouble.
386 - zap a2, 0x80, t4 #-e0 :
387 -
388 - beq a2, $not_found # .. e1 :
389 - ldq_u t0, 0(a0) # e1 : load first quadword
390 - insbl a1, 1, t1 # .. e0 : t1 = 000000000000ch00
391 - and a1, 0xff, a1 #-e0 : a1 = 00000000000000ch
392 - cmpult a2, 9, t3 # .. e1 :
393 - or t1, a1, a1 # e0 : a1 = 000000000000chch
394 - lda t2, -1(zero) # .. e1 :
395 - sll a1, 16, t1 #-e0 : t1 = 00000000chch0000
396 - addq a0, t4, t4 # .. e1 :
397 - or t1, a1, a1 # e1 : a1 = 00000000chchchch
398 - unop # :
399 - sll a1, 32, t1 #-e0 : t1 = chchchch00000000
400 - or t1, a1, a1 # e1 : a1 = chchchchchchchch
401 - extql t0, a0, t6 # e0 :
402 - beq t3, $first_quad # .. e1 :
403 -
404 - ldq_u t5, -1(t4) #-e1 : eight or less bytes to search
405 - extqh t5, a0, t5 # .. e0 :
406 - mov a0, v0 # e0 :
407 - or t6, t5, t0 # .. e1 : t0 = quadword starting at a0
408 -
409 - # Deal with the case where at most 8 bytes remain to be searched
410 - # in t0. E.g.:
411 - # a2 = 6
412 - # t0 = ????c6c5c4c3c2c1
413 -$last_quad:
414 - negq a2, t5 #-e0 :
415 - xor a1, t0, t0 # .. e1 :
416 - srl t2, t5, t5 # e0 : t5 = mask of a2 bits set
417 - cmpbge zero, t0, t1 # .. e1 :
418 - and t1, t5, t1 #-e0 :
419 - beq t1, $not_found # .. e1 :
420 -
421 -$found_it:
422 - # Now, determine which byte matched:
423 - negq t1, t2 # e0 :
424 - and t1, t2, t1 # e1 :
425 -
426 - and t1, 0x0f, t0 #-e0 :
427 - addq v0, 4, t2 # .. e1 :
428 - cmoveq t0, t2, v0 # e0 :
429 -
430 - addq v0, 2, t2 # .. e1 :
431 - and t1, 0x33, t0 #-e0 :
432 - cmoveq t0, t2, v0 # .. e1 :
433 -
434 - and t1, 0x55, t0 # e0 :
435 - addq v0, 1, t2 # .. e1 :
436 - cmoveq t0, t2, v0 #-e0 :
437 -
438 -$done: ret # .. e1 :
439 -
440 - # Deal with the case where a2 > 8 bytes remain to be
441 - # searched. a0 may not be aligned.
442 - .align 4
443 -$first_quad:
444 - andnot a0, 0x7, v0 #-e1 :
445 - insqh t2, a0, t1 # .. e0 : t1 = 0000ffffffffffff (a0<0:2> ff)
446 - xor t0, a1, t0 # e0 :
447 - or t0, t1, t0 # e1 : t0 = ====ffffffffffff
448 - cmpbge zero, t0, t1 #-e0 :
449 - bne t1, $found_it # .. e1 :
450 -
451 - # At least one byte left to process.
452 -
453 - ldq zero, 8(v0) # e0 : prefetch next quad
454 - subq t4, 1, a2 # .. e1 :
455 - addq v0, 8, v0 #-e0 :
456 -
457 - # Make a2 point to last quad to be accessed (the
458 - # last quad may or may not be partial).
459 -
460 - andnot a2, 0x7, a2 # .. e1 :
461 - cmpult v0, a2, t1 # e0 :
462 - beq t1, $final # .. e1 :
463 -
464 - # At least two quads remain to be accessed.
465 -
466 - subq a2, v0, t3 #-e0 : t3 <- nr quads to be processed
467 - and t3, 8, t3 # e1 : odd number of quads?
468 - bne t3, $odd_quad_count # e1 :
469 -
470 - # At least three quads remain to be accessed
471 -
472 - .align 4
473 -$unrolled_loop:
474 - ldq t0, 0(v0) # e0 : load quad
475 - xor a1, t0, t1 # .. e1 :
476 - ldq zero, 8(v0) # e0 : prefetch next quad
477 - cmpbge zero, t1, t1 # .. e1:
478 - bne t1, $found_it # e0 :
479 -
480 - addq v0, 8, v0 # e1 :
481 -$odd_quad_count:
482 - ldq t0, 0(v0) # e0 : load quad
483 - xor a1, t0, t1 # .. e1 :
484 - ldq zero, 8(v0) # e0 : prefetch next quad
485 - cmpbge zero, t1, t1 # .. e1 :
486 - addq v0, 8, t5 #-e0 :
487 - bne t1, $found_it # .. e1 :
488 -
489 - cmpult t5, a2, t5 # e0 :
490 - addq v0, 8, v0 # .. e1 :
491 - bne t5, $unrolled_loop #-e1 :
492 -
493 -$final: ldq t0, 0(v0) # e0 : load last quad
494 - subq t4, v0, a2 # .. e1 : a2 <- number of bytes left to do
495 - bne a2, $last_quad # e1 :
496 -
497 -$not_found:
498 - mov zero, v0 #-e0 :
499 - ret # .. e1 :
500 -
501 - END(__memchr)
502 -
503 -weak_alias (__memchr, memchr)
504 -#if !__BOUNDED_POINTERS__
505 -weak_alias (__memchr, __ubp_memchr)
506 -#endif
507 -libc_hidden_builtin_def (memchr)
508 diff --git a/sysdeps/alpha/memchr.c b/sysdeps/alpha/memchr.c
509 new file mode 100644
510 index 0000000..c52841b
511 --- /dev/null
512 +++ ports/sysdeps/alpha/memchr.c
513 @@ -0,0 +1,175 @@
514 +/* Copyright (C) 2010 Free Software Foundation, Inc.
515 + This file is part of the GNU C Library.
516 +
517 + The GNU C Library is free software; you can redistribute it and/or
518 + modify it under the terms of the GNU Lesser General Public
519 + License as published by the Free Software Foundation; either
520 + version 2.1 of the License, or (at your option) any later version.
521 +
522 + The GNU C Library is distributed in the hope that it will be useful,
523 + but WITHOUT ANY WARRANTY; without even the implied warranty of
524 + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
525 + Lesser General Public License for more details.
526 +
527 + You should have received a copy of the GNU Lesser General Public
528 + License along with the GNU C Library; if not, write to the Free
529 + Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
530 + 02111-1307 USA. */
531 +
532 +#include <string.h>
533 +
534 +typedef unsigned long word;
535 +
536 +static inline word
537 +ldq_u(const void *s)
538 +{
539 + return *(const word *)((word)s & -8);
540 +}
541 +
542 +#define unlikely(X) __builtin_expect ((X), 0)
543 +#define prefetch(X) __builtin_prefetch ((void *)(X), 0)
544 +
545 +#define cmpbeq0(X) __builtin_alpha_cmpbge(0, (X))
546 +#define find(X, Y) cmpbeq0 ((X) ^ (Y))
547 +
548 +/* Search no more than N bytes of S for C. */
549 +
550 +void *
551 +__memchr (const void *s, int xc, size_t n)
552 +{
553 + const word *s_align;
554 + word t, current, found, mask, offset;
555 +
556 + if (unlikely (n == 0))
557 + return 0;
558 +
559 + current = ldq_u (s);
560 +
561 + /* Replicate low byte of XC into all bytes of C. */
562 + t = xc & 0xff; /* 0000000c */
563 + t = (t << 8) | t; /* 000000cc */
564 + t = (t << 16) | t; /* 0000cccc */
565 + const word c = (t << 32) | t; /* cccccccc */
566 +
567 + /* Align the source, and decrement the count by the number
568 + of bytes searched in the first word. */
569 + s_align = (const word *)(s & -8);
570 + n += (s & 7);
571 +
572 + /* Deal with misalignment in the first word for the comparison. */
573 + mask = (1ul << (s & 7)) - 1;
574 +
575 + /* If the entire string fits within one word, we may need masking
576 + at both the front and the back of the string. */
577 + if (unlikely (n <= 8))
578 + {
579 + mask |= -1ul << n;
580 + goto last_quad;
581 + }
582 +
583 + found = find (current, c) & ~mask;
584 + if (unlikely (found))
585 + goto found_it;
586 +
587 + s_align++;
588 + n -= 8;
589 +
590 + /* If the block is sufficiently large, align to cacheline and prefetch. */
591 + if (unlikely (n >= 256))
592 + {
593 + /* Prefetch 3 cache lines beyond the one we're working on. */
594 + prefetch (s_align + 8);
595 + prefetch (s_align + 16);
596 + prefetch (s_align + 24);
597 +
598 + while ((word)s_align & 63)
599 + {
600 + current = *s_align;
601 + found = find (current, c);
602 + if (found)
603 + goto found_it;
604 + s_align++;
605 + n -= 8;
606 + }
607 +
608 + /* Within each cacheline, advance the load for the next word
609 + before the test for the previous word is complete. This
610 + allows us to hide the 3 cycle L1 cache load latency. We
611 + only perform this advance load within a cacheline to prevent
612 + reading across page boundary. */
613 +#define CACHELINE_LOOP \
614 + do { \
615 + word i, next = s_align[0]; \
616 + for (i = 0; i < 7; ++i) \
617 + { \
618 + current = next; \
619 + next = s_align[1]; \
620 + found = find (current, c); \
621 + if (unlikely (found)) \
622 + goto found_it; \
623 + s_align++; \
624 + } \
625 + current = next; \
626 + found = find (current, c); \
627 + if (unlikely (found)) \
628 + goto found_it; \
629 + s_align++; \
630 + n -= 64; \
631 + } while (0)
632 +
633 + /* While there's still lots more data to potentially be read,
634 + continue issuing prefetches for the 4th cacheline out. */
635 + while (n >= 256)
636 + {
637 + prefetch (s_align + 24);
638 + CACHELINE_LOOP;
639 + }
640 +
641 + /* Up to 3 cache lines remaining. Continue issuing advanced
642 + loads, but stop prefetching. */
643 + while (n >= 64)
644 + CACHELINE_LOOP;
645 +
646 + /* We may have exhausted the buffer. */
647 + if (n == 0)
648 + return NULL;
649 + }
650 +
651 + /* Quadword aligned loop. */
652 + current = *s_align;
653 + while (n > 8)
654 + {
655 + found = find (current, c);
656 + if (unlikely (found))
657 + goto found_it;
658 + current = *++s_align;
659 + n -= 8;
660 + }
661 +
662 + /* The last word may need masking at the tail of the compare. */
663 + mask = -1ul << n;
664 + last_quad:
665 + found = find (current, c) & ~mask;
666 + if (found == 0)
667 + return NULL;
668 +
669 + found_it:
670 +#ifdef __alpha_cix__
671 + offset = __builtin_alpha_cttz (found);
672 +#else
673 + /* Extract LSB. */
674 + found &= -found;
675 +
676 + /* Binary search for the LSB. */
677 + offset = (found & 0x0f ? 0 : 4);
678 + offset += (found & 0x33 ? 0 : 2);
679 + offset += (found & 0x55 ? 0 : 1);
680 +#endif
681 +
682 + return (void *)((word)s_align + offset);
683 +}
684 +
685 +#ifdef weak_alias
686 +weak_alias (__memchr, BP_SYM (memchr))
687 +#endif
688 +libc_hidden_builtin_def (memchr)
689 --
690 1.7.3
691
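Note on the rewrite above: the core idea of the new memchr.c can be tried
outside glibc with the rough, portable sketch below. It is not the upstream
code. cmpbeq0() here is a plain C stand-in for __builtin_alpha_cmpbge (0, x),
and the names replicate(), first_match() and search_word() are hypothetical
helpers made up for this illustration. It only shows how one 64-bit word is
searched with the head and tail bytes masked off; the prefetching and
cacheline loop of the patch are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in (assumption) for __builtin_alpha_cmpbge (0, x):
   bit i of the result is set when byte i of x, counted from the low
   end, is zero.  */
static unsigned
cmpbeq0 (uint64_t x)
{
  unsigned bits = 0;
  for (unsigned i = 0; i < 8; ++i)
    if (((x >> (8 * i)) & 0xff) == 0)
      bits |= 1u << i;
  return bits;
}

/* Replicate the low byte of XC into all eight bytes of a word.  */
static uint64_t
replicate (int xc)
{
  uint64_t t = (uint64_t) xc & 0xff;
  t = (t << 8) | t;
  t = (t << 16) | t;
  return (t << 32) | t;
}

/* Index of the lowest set bit of an 8-bit match mask, using the same
   binary search as the non-CIX path of the patch.  */
static unsigned
first_match (unsigned found)
{
  assert (found != 0 && found <= 0xff);
  found &= -found;                       /* isolate the lowest set bit */
  unsigned offset = (found & 0x0f) ? 0 : 4;
  offset += (found & 0x33) ? 0 : 2;
  offset += (found & 0x55) ? 0 : 1;
  return offset;
}

/* Search bytes [first, last) of word W for XC and return the byte
   index, or -1.  Bytes outside the window are masked off, which is
   how the patch avoids reporting matches before the start or past
   the end of the buffer.  */
static int
search_word (uint64_t w, int xc, unsigned first, unsigned last)
{
  assert (first < last && last <= 8);
  unsigned mask = (1u << first) - 1;     /* bytes before the start */
  mask |= (0xffu << last) & 0xff;        /* bytes at or past the end */
  unsigned found = cmpbeq0 (w ^ replicate (xc)) & ~mask & 0xff;
  return found ? (int) first_match (found) : -1;
}
```

For example, with w = 0x0000004100004100 the byte 0x41 sits at indexes 1 and
4, so search_word (w, 0x41, 0, 8) yields 1 while search_word (w, 0x41, 2, 8)
yields 4. The sketch works on word values only, so it does not depend on
host byte order and never touches memory outside the caller's buffer.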
692
693
694 1.1 src/patchsets/glibc/2.11.2/6029_all_alpha-fix-memchr.patch
695
696 file : http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.11.2/6029_all_alpha-fix-memchr.patch?rev=1.1&view=markup
697 plain: http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.11.2/6029_all_alpha-fix-memchr.patch?rev=1.1&content-type=text/plain
698
699 Index: 6029_all_alpha-fix-memchr.patch
700 ===================================================================
701 From b54f998dc380ce327a7faf2c40e569fb2cf39bf0 Mon Sep 17 00:00:00 2001
702 From: Michael Cree <mcree@×××××××××.nz>
703 Date: Sun, 26 Sep 2010 21:15:51 +1300
704 Subject: [PATCH] alpha: Fix compile errors in memchr
705
706 Include missing header file and make some casts explicit.
707 ---
708 sysdeps/alpha/memchr.c | 7 ++++---
709 1 files changed, 4 insertions(+), 3 deletions(-)
710
711 diff --git a/sysdeps/alpha/memchr.c b/sysdeps/alpha/memchr.c
712 index c52841b..7e16f8a 100644
713 --- ports/sysdeps/alpha/memchr.c
714 +++ ports/sysdeps/alpha/memchr.c
715 @@ -17,6 +17,7 @@
716 02111-1307 USA. */
717
718 #include <string.h>
719 +#include <bp-sym.h>
720
721 typedef unsigned long word;
722
723 @@ -53,11 +54,11 @@ __memchr (const void *s, int xc, size_t n)
724
725 /* Align the source, and decrement the count by the number
726 of bytes searched in the first word. */
727 - s_align = (const word *)(s & -8);
728 - n += (s & 7);
729 + s_align = (const word *)((word)s & -8);
730 + n += ((word)s & 7);
731
732 /* Deal with misalignment in the first word for the comparison. */
733 - mask = (1ul << (s & 7)) - 1;
734 + mask = (1ul << ((word)s & 7)) - 1;
735
736 /* If the entire string fits within one word, we may need masking
737 at both the front and the back of the string. */
738 --
739 1.7.3
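
Note on the casts above: C does not define '&' for pointer operands, which is
why expressions such as (s & -8) had to gain an integer cast. A minimal
stand-alone sketch of the same three computations follows. It uses uintptr_t
where the patch uses its own 'word' typedef (an assumption made here for
portability), and the function names align_down8(), misalignment8() and
head_mask() are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round S down to an 8-byte boundary.  The pointer is converted to an
   integer first, since '&' cannot be applied to a pointer.  */
static const void *
align_down8 (const void *s)
{
  return (const void *) ((uintptr_t) s & ~(uintptr_t) 7);
}

/* Number of bytes between the aligned address and S (0..7).  */
static unsigned
misalignment8 (const void *s)
{
  return (unsigned) ((uintptr_t) s & 7);
}

/* Mask of the byte lanes that precede S within its aligned word,
   mirroring "mask = (1ul << ((word)s & 7)) - 1" from the patch.  */
static unsigned long
head_mask (const void *s)
{
  unsigned long mask = (1ul << misalignment8 (s)) - 1;
  assert (mask <= 0x7f);
  return mask;
}
```

For an address ending in ...3, align_down8() clears the low three bits,
misalignment8() gives 3, and head_mask() gives 0x7, meaning the first three
byte lanes of the aligned word are ignored in the comparison.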