Gentoo Archives: gentoo-dev

From: Nikos Chantziaras <realnc@×××××.de>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] Re: FYI: Rules for distro-friendly packages
Date: Sun, 27 Jun 2010 14:47:46
Message-Id: i07o8g$ug0$
In Reply to: Re: [gentoo-dev] Re: FYI: Rules for distro-friendly packages by "Harald van Dijk"
On 06/27/2010 03:23 PM, Harald van Dijk wrote:
> On Sun, Jun 27, 2010 at 02:56:33PM +0300, Nikos Chantziaras wrote:
>> On 06/27/2010 01:47 PM, Enrico Weigelt wrote:
>>> * Nikos Chantziaras<realnc@×××××.de> schrieb:
>>>
>>>> Did it actually occur to anyone that warnings are not errors?
>>>> You can have them for correct code. A warning means you might
>>>> want to look at the code to check whether there's some real
>>>> error there. It doesn't mean the code is broken.
>>>
>>> In my personal experience, most times a warning comes it, the
>>> code *is* broken (but *might* work in most situations).
>>
>> That's the key to it: most times. Granted, without -Wall (or any
>> other options that tweaks the default warning level) we can be very
>> sure that the warning is the result of a mistake by the developer.
>> But with -Wall, many warnings are totally not interesting ("unused
>> parameter") and some even try to outsmart the programmer even
>> though he/she knows better ("taking address of variable declared
>> register"). In that last example, fixing it would even be wrong
>> when you consider the optimizer and the fuzzy meaning of "register"
>> which the compiler is totally free to ignore.
>
> The compiler is not totally free to ignore the register keyword.
> Both the C and the C++ standards require that the compiler complain
> when taking the address of a register variable. Other compilers will
> issue a hard error for it. Fixing the code to not declare the
> variable as register would be the correct thing to do.

No, it would not be the correct thing to do, because of the following.
(This is part of a discussion between me and someone much smarter than
me, who explained the issue in detail.)

The basic issue is that the code takes the address of the variable in
question in expressions passed as parameters to certain function calls.
These function calls all happen to be in-linable functions, and it
happens that in each function, the address operator is always canceled
out by a '*' dereference operator - in other words, we have '*&p', which
the compiler can turn into just plain 'p' when the calls are in-lined,
eliminating the need to actually take the address of 'p'.

A compiler is always free to ignore 'register' declarations *anyway*,
even if enregistration is possible. Therefore a warning that it's not
possible to obey 'register' is unnecessary, because it's explicit in the
language definition that 'register' is not binding. It simply is not
possible for an ignored 'register' attribute to cause unexpected
behavior. Warnings really should only be generated for situations where
it is likely that the programmer expects different behavior than the
compiler will deliver; in the case of an ignored 'register' attribute,
the programmer is *required* to expect that the attribute might be
ignored, so a warning to this effect is superfluous.

Now, I understand why they generate the warning - it's because the
compiler believes that the program code itself makes enregistration
impossible, not because the compiler has chosen for optimization
purposes to ignore the 'register' request. However, as we'll see
shortly, the program code doesn't truly make enregistration impossible;
it is merely impossible in some interpretations of the code.
Therefore we really are back to the compiler choosing to ignore the
'register' request due to its own optimization decisions; the 'register'
request is made impossible far downstream of the actual decisions that
the compiler makes (which have to do with in-line vs out-of-line calls),
but it really is compiler decisions that make it impossible, not the
inherent structure of the code.

When a function is in-lined, the compiler is not required to generate
the same code it would generate for the most general case of the same
function call, as long as the meaning is the same. For example, suppose
we have some code that contains a call to a function like so:

    a = myFunc(a + 7, 3);

In the general out-of-line case, the compiler must generate some
machine-code instructions like this:

    push #3
    mov [a], d0
    add #7, d0
    push d0
    call #myFunc
    mov d0, [a]

The compiler doesn't have access to the inner workings of myFunc, so it
must generate the appropriate code for the generic interface to an
external function.

Now, suppose the function is defined like so:

    int myFunc(int a, int b)
    {
        return a - 6;
    }

and further suppose that the compiler decides to in-line this function.
In-lining means the compiler will generate the code that implements the
function directly in the caller; there will be no call to an external
linkage point. This means the compiler can implement the linkage to the
function with a custom one-off interface for this particular
invocation - every in-line invocation can be customized to the exact
context where it appears. So, for example, if we call myFunc right now
and registers d1 and d2 happen to be available, we can put the
parameters in d1 and d2, and the generated function will refer to those
registers for the parameters rather than having to look in the stack.
Later on, if we generate a separate call to the same function, but
registers d3 and d7 are the ones available, we can use those instead.
Each generated copy of the function can fit its exact context.
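
To make the example concrete, here is a minimal C sketch of the call
above next to a hand in-lined version of the same statement. Only myFunc
comes from the discussion; update_by_call, update_inlined and
check_forms are hypothetical names made up for illustration, and a real
compiler performs this substitution internally rather than in source.

```c
#include <assert.h>

/* The function from the text: 'b' is accepted but never used. */
int myFunc(int a, int b)
{
    (void)b;                /* silence the unused-parameter warning */
    return a - 6;
}

/* Out-of-line form: an ordinary call through the generic interface. */
int update_by_call(int a)
{
    a = myFunc(a + 7, 3);
    return a;
}

/* Hand in-lined form (hypothetical helper): the body of myFunc is
 * substituted at the call site, so no arguments are passed at all. */
int update_inlined(int a)
{
    a = (a + 7) - 6;
    return a;
}

/* The substitution is only legal if both forms give the same result. */
void check_forms(int a)
{
    assert(update_by_call(a) == update_inlined(a));
}
```

Calling check_forms with any small value of 'a' should pass, which is
all the compiler needs in order to treat the two forms as
interchangeable.
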
Furthermore, looking at this function and at the arguments passed, we
can see that the formal parameter 'b' has no effect on the function's
results, and the actual parameter '3' passed for 'b' has no side
effects. Therefore, the compiler is free to completely ignore this
parameter - there's no need to generate any code for it at all, since we
have sufficient knowledge to see that it has no effect on the meaning of
the code.

Further still, we can globally optimize the entire function. So, we can
see that myFunc(a+7, 3) is going to turn into the expression (a+7-6). We
can fold constants to arrive at (a+1) as the result of the function. We
can therefore generate the entire code for the function's invocation
like so:

    inc [a]

Okay, now let's look at the &p case. In the specific examples in
vmrun.cpp, we have a bunch of function invocations like this:

    register const char *p;
    int x = myfunc(&p);

In the most general case, we have to generate code like this:

    lea [p], d0    ; load effective address
    push d0
    call #myfunc
    mov d0, [x]

So, in the most general case of a call with external linkage, we need
'p' to have a main memory address so that we can push it on the stack as
the parameter to this call. Registers don't have main memory addresses,
so 'p' can't go in a register.

However, we know what myfunc() looks like:

    char myfunc(const char **p)
    {
        char c = **p;
        *p += 1;
        return c;
    }

If the compiler chooses to in-line this function, it can globally
optimize its linkage and implementation as we saw earlier. So, the
compiler can rewrite the code like so:

    register const char *p;
    int x = **(&p);
    *(&p) += 1;

which can be further rewritten to:

    register const char *p;
    int x = *p;
    p += 1;

Now we can generate the machine code for the final optimized form:

    mov [p], a0          ; get the *value* of p into index register 0
    mov.byte [a0+0], d0  ; get the value index register 0 points to
    mov.byte d0, [x]     ; store it in x
    inc [p]              ; inc the value of p

Nowhere in this sequence do we need a main memory address for p.
This means the compiler can keep p in a register, say d5:

    mov d5, a0
    mov.byte [a0+0], d0
    mov.byte d0, [x]
    inc d5

And this is indeed exactly what the code that comes out of most
compilers looks like (changed from my abstract machine to 32-bit x86, of
course).

So: if the compiler chooses to in-line the functions that are called
with '&p' as a parameter, and the compiler performs the available
optimizations on those calls once they're in-lined, then a memory
address for 'p' is never needed. Thus there is a valid interpretation of
the code where 'register p' can be obeyed. If the compiler doesn't
choose to in-line the functions or make those optimizations, then the
compiler will be unable to satisfy the 'register p' request and will be
forced to put 'p' in addressable main memory. But it really is entirely
up to the compiler whether to obey the 'register p' request; the
program's structure does not make the request impossible to satisfy.
Therefore there is no reason for the compiler to warn about this, any
more than there would be if the compiler chose not to obey the
'register p' simply because it thought it could make more optimal use of
the available registers.

That GCC warns is understandable, in that a superficial reading of the
code would not reveal the optimization opportunity; but the warning is
nonetheless unnecessary, and the 'register' does provide useful
optimization hinting.

OK, long read, but the conclusion is that "fixing the code to not
declare the variable as register would be the correct thing to do" is
*not* the correct thing to do. The correct thing to do is to ignore the
warning, which is not possible if warnings are turned into errors.

You also mentioned that "other compilers will issue a hard error for
it." That sounds rather strange, and I wonder which compilers that might
be; someone should file a bug report against them ;)
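
For anyone who wants to check the '*&p' rewrite by hand, here is a
minimal sketch in plain C. myfunc is the function quoted above;
read_by_call and read_inlined are hypothetical wrappers added only so
the two forms can be compared, and the 'end' out-parameter is likewise
an assumption of this sketch. 'register' is kept only on the hand
in-lined form, where no address of 'p' is taken, so that the file
compiles as C.

```c
#include <assert.h>

/* The function from the text: read one char through *p, then advance *p. */
char myfunc(const char **p)
{
    char c = **p;
    *p += 1;
    return c;
}

/* Call form: passes &p, so 'p' formally needs an address here.
 * 'register' is left off so that this compiles as plain C. */
int read_by_call(const char *s, const char **end)
{
    const char *p = s;
    int x = myfunc(&p);
    *end = p;               /* report where p ended up */
    return x;
}

/* Hand in-lined form: **(&p) collapses to *p and *(&p) += 1 to p += 1,
 * so the address of 'p' is never taken and 'register' is accepted. */
int read_inlined(const char *s, const char **end)
{
    register const char *p = s;
    int x = *p;
    p += 1;
    *end = p;               /* copies the value of p, not its address */
    return x;
}
```

Both wrappers should return the same character and leave 'end' one byte
past the start of the input, which is the equivalence the in-lining
argument relies on.
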

