Gentoo Archives: gentoo-portage-dev

From: Paul de Vrieze <pauldv@g.o>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] DB and binary dependency
Date: Mon, 27 Mar 2006 07:51:45
Message-Id: 200603270951.17133.pauldv@gentoo.org
In Reply to: Re: [gentoo-portage-dev] DB and binary dependency by tvali
1 On Friday 24 March 2006 15:24, tvali wrote:
2 > > Unfortunately, you're wrong. This only makes sure that you have the
3 > > right slots to put your squares, triangles and circles in. It does
4 > > not say that b(int,int) from the first lib actually does the same
5 > > thing as b(int,int) from the second library.
6 >
7 > Sorry, I thought about it for a moment, then left it out later as
8 > broken design cases didn't seem so important. Versioning of distinct
9 > functions would be good, though not automatic.
10
11 In many cases these things are not broken design. My example is, but only
12 to make the point clear. Unfortunately, the moment you do automatic
13 dependencies, you MUST be correct between 99.99999% and 100% of the time.
14 These cases are too common to disregard. Semantic changes are common
15 enough. Sometimes the old semantics were just broken.
16
17 > > While the above example is clearly broken design, this does happen
18 > > enough in actual libraries in way more subtle ways. And that is
19 > > disregarding the fact that the linux/elf ABI does not include
20 > > argument lists in symbol linking. As such b(int,char) is
21 > > indistinguishable from b(int,int). To overcome this C++ uses name
22 > > mangling which creates names based on the signature.
23 >
24 > Ok, assuming broken design it's harder. Anyway, how would binary deps
25 > solve this? This seems to be human work in all cases.
26
27 We don't look at what packages do. We take a default heuristic, and let
28 humans give the correct answer when the default is wrong.
29
30 > Anyway, as .h files have become as good as standard, you can show in
31 > code as well as in the binary which functions are needed.
32
33 Except that the way header files are used is not standard. There is a
34 reason gcc restricts precompiled headers so much (one per compilation,
35 before any other code): a header's meaning depends on everything the
36 preprocessor saw before it. Header files can interact unpredictably.
37 Including A then B can be very different from including B then A. In most
38 cases it isn't, but it is in too many cases.
39
40 > Whether there is any advantage of bin dep over code dep is a question
41 > for me. Some here suggest that there is; I think that everything can be
42 > done as well with code deps.
43
44 Binary deps are not about how you obtain the dependencies. They specify
45 what the dependencies have become after compilation. Binary dependencies
46 basically strengthen the runtime dependencies by taking the concrete
47 dependencies that were actually used into account.
48
49 With binary dependencies comes the need to verify them. This is because
50 too many ebuilds end up not using the expected version of a multislotted
51 library. As such, verifying the dependencies of a binary package gives
52 developers the opportunity to notice and fix this.
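
A rough sketch of what such a verification step could look like, in Python.
Everything named here is an assumption made for illustration: the function
name, the "category/package:slot" strings, and the soname-to-package table
(which a real tool would have to build from the installed package database).
It only shows the comparison itself: the ebuild declares one slot, the
binary ended up linked against another.

```python
def verify_binary_deps(declared, needed, owner_of):
    """Compare declared runtime deps with what a binary really links to.

    declared -- atoms ("category/package:slot") taken from the ebuild
    needed   -- sonames the built binary actually requires
    owner_of -- hypothetical mapping: soname -> atom that installs it
    Returns (undeclared atoms, sonames nobody owns), both sorted.
    """
    used = {owner_of[soname] for soname in needed if soname in owner_of}
    undeclared = sorted(used - set(declared))
    unowned = sorted(soname for soname in needed if soname not in owner_of)
    return undeclared, unowned


if __name__ == "__main__":
    # Illustrative data only: the ebuild expected slot 4.2 of db, but the
    # build picked up the 4.3 library that was also installed.
    owner_of = {
        "libdb-4.3.so": "sys-libs/db:4.3",
        "libc.so.6": "sys-libs/glibc:2.2",
    }
    declared = {"sys-libs/db:4.2", "sys-libs/glibc:2.2"}
    needed = ["libdb-4.3.so", "libc.so.6"]
    print(verify_binary_deps(declared, needed, owner_of))
```

The result is a report for a human to act on, not an automatic fix, which
is the point of verification as opposed to automatic dependencies.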
53
54 >
55 > > > I'm actually sure that this can all be calculated from the
56 > > > source code, and bindep would after that be a check whether the cpu
57 > > > didn't calculate something wrong :P Another question is how
58 > > > difficult it is and whether it is worth the time.
59 > >
60 > > Perhaps you should brush up your knowledge of the C language. After
61 > > you have found that the C language is a mess, try C++, it makes
62 > > things worse. After that's finished take a look at solving this
63 > > problem for ALL languages.
64 >
65 > I actually know c++, but dont know c. Anyway, what i mean is that if
66 > you have .h files, you may make up automatic check if interfaces fit
67 > each other. I have to think about if c++ is a mess :)
68
69 C++ is every bit as much a mess as C, as it is (nearly) a superset. The
70 problem is partly in the C language and partly in the preprocessor. First,
71 the preprocessor is fragile, and can depend on arbitrary command line
72 options (defines). The language problem is that C/C++ has no notion of
73 modules. File a.c can define a function as "int a(char)" and b.c can have
74 its prototype as "int a(wchar)". The compiler will not complain about
75 this, AND with the right defines it IS actually the same.
76
77 The fundamental problem is that the compiler only knows the names of
78 symbols that may be imported from other object files. It does not inspect
79 the actual object for the prototypes of the symbols defined there (nor do
80 the .o files produced by gcc contain this information). Instead, header
81 files are used to share this information. Often they do so in a broken
82 way, and errors similar to the above result.
83
84 >
85 > > Automatic source analysis for dependency calculation is a dead end.
86 > > Even if you manage to find the proper interfaces (oops, the package
87 > > had its own gl.h instead of using the system one), you don't know
88 > > anything about the semantics of those interfaces. Two things with the
89 > > same name may very well have very different behaviours.
90 >
91 > I think it should be solved with:
92 > // system's header
93 > #include <...>
94 > // own header
95 > #include "..."
96
97 It's not your code, you can't change it. And including two headers which
98 define the same symbols can be disastrous.
99
100 > Ok -- i agree that there are several problems, which have to be solved
101 > before taking this topic up again and that they are possibly
102 > unsolvable in many cases and possibly too much work in many others.
103
104 The thing is, with reasonable effort you are probably going to be able to
105 make something that works in 95% of the cases of C and C++ programs. This
106 however is not enough to base automatic dependencies on. You need
107 something a lot more reliable.
108
109
110
111 For verification (instead of automatic deps), however, 95% is acceptable.
112 The problem remains that many of the packages in the tree are not C or
113 C++ based (or worse, are mixes). Analysis of binary files works on all
114 ELF objects, regardless of their language or compiler. Its results are
115 roughly the same as those of source analysis. It is A LOT easier to do,
116 though, as there are tools for it (readelf and objdump, for example), and
117 you don't need to parse any source at all. It's also faster, as the ELF
118 format is highly structured. Additionally, one could do a fast scan of
119 the "#include" lines of all installed header files.
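
To give an idea of how little work the binary side is, here is a minimal
Python sketch that leans on an existing tool instead of parsing ELF itself.
It assumes binutils' readelf is installed and that "readelf -d" prints
NEEDED entries in its usual "Shared library: [name]" form. The function
names are made up for this example.

```python
import re
import subprocess

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def parse_needed(readelf_output):
    """Extract the NEEDED sonames from the text printed by 'readelf -d'."""
    return NEEDED_RE.findall(readelf_output)


def needed_libraries(path):
    """Run 'readelf -d' on an ELF file and return the sonames it needs."""
    result = subprocess.run(
        ["readelf", "-d", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_needed(result.stdout)


if __name__ == "__main__":
    import sys

    # Usage: python needed.py /path/to/binary [more binaries...]
    for elf_path in sys.argv[1:]:
        print(elf_path, needed_libraries(elf_path))
```

Nothing in this depends on the language the object was compiled from.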
120
121 None of these things has been done yet, and all are quite cheap to do
122 (both in running time and in development time). This verification will
123 probably yield better coverage than source analysis will. It is also
124 closer to what we are interested in.
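
The fast "#include" scan mentioned above could be as simple as the
following Python sketch. It is deliberately naive: it looks only at the
text of a header and ignores #if/#ifdef blocks and comments, so it reports
includes that may never be active. That is acceptable for a verification
hint, and it is the same fragility that rules this out for automatic
dependencies. The function name is an assumption for the example.

```python
import re

# '#include <x.h>' or '#include "x.h"', allowing spaces around the '#'.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*([<"])([^>"]+)[>"]', re.MULTILINE)


def scan_includes(text):
    """Return (header name, kind) pairs for every #include line in text.

    kind is "system" for <...> includes and "local" for "..." includes.
    """
    return [
        (name, "system" if opener == "<" else "local")
        for opener, name in INCLUDE_RE.findall(text)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python scan.py /usr/include/some_header.h [more headers...]
    for header_path in sys.argv[1:]:
        with open(header_path, errors="replace") as handle:
            for name, kind in scan_includes(handle.read()):
                print(header_path, kind, name)
```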
125
126 Paul
127
128 --
129 Paul de Vrieze
130 Gentoo Developer
131 Mail: pauldv@g.o
132 Homepage: http://www.devrieze.net