Gentoo Archives: gentoo-commits

From: "David Abbott (dabbott)" <dabbott@g.o>
To: gentoo-commits@l.g.o
Subject: [gentoo-commits] gentoo commit in xml/htdocs/proj/en/pr: 20100401-andrzej-interview.xml
Date: Thu, 01 Apr 2010 19:32:59
Message-Id: E1NxQ8M-00015G-4B@stork.gentoo.org
1 dabbott 10/04/01 19:32:54
2
3 Added: 20100401-andrzej-interview.xml
4 Log:
5 Andrzej Wasylkowski interview
6
7 Revision Changes Path
8 1.1 xml/htdocs/proj/en/pr/20100401-andrzej-interview.xml
9
10 file : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/proj/en/pr/20100401-andrzej-interview.xml?rev=1.1&view=markup
11 plain: http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/proj/en/pr/20100401-andrzej-interview.xml?rev=1.1&content-type=text/plain
12
13 Index: 20100401-andrzej-interview.xml
14 ===================================================================
15 <?xml version='1.0'?>
16
17 <!DOCTYPE news SYSTEM "/dtd/guide.dtd">
18
19 <news gentoo="yes" category="gentoo">
20
21 <!-- Enter your name here -->
22 <poster>dabbott</poster>
23
24 <!-- Date to be displayed -->
25 <date>2010-04-01</date>
26 <title>Interview with Andrzej Wasylkowski from the checkmycode project.</title>
27
28 <body>
29
30 <p>
31 <b>Hi Andrzej, thanks for the interview.</b>
32 </p>
33
34 <p>
35 Hi! I would also like to thank you for the interview. It is a real
36 pleasure to be your virtual guest :)
37 </p>
38
39 <p>
40 <b>1. I see that on the web page at <uri
41 link="http://www.checkmycode.org/">http://www.checkmycode.org/</uri> I
42 enter my C code into a form and it generates a report of anomalies found in
43 my code, with an explanation of why parts of it are considered anomalous
44 and therefore possibly buggy. Can you walk me through what goes on in the
45 background to produce this?</b>
46 </p>
47
48 <p>
49 In a nutshell, we have mined all of the Gentoo Linux distribution for typical
50 usage of component interfaces -- that is, how Linux components are normally
51 used. If you use an interface in an uncommon way, this will be flagged as an
52 anomaly.
53 </p>
54
55 <p>
56 From a high-level point of view, there are three main steps involved. First, the
57 code you submit gets parsed and so-called "sequential constraints" are generated
58 from it. These are two-element sequences of function calls annotated with data
59 flow information, such as "retval of socket() -> 1st arg of listen()". They
60 represent, in an abstract way, how your code uses functions to operate on
61 "objects".
62 </p>
63
64 <p>
65 Second, we look at all possibly relevant C projects from the Gentoo Linux
66 distribution to see how these projects use the functions that your code also
67 uses. Without going into too much detail, if you call "socket()", code from all
68 projects that also calls "socket()" is going to be taken into consideration to
69 find sequential constraints where "socket()" is present.
70 </p>
71
72 <p>
73 Third, we check if your code violates any of the patterns found in the second
74 step (you can find sample violations in the <uri
75 link="http://www.checkmycode.org/index.php?action=tutorial">tutorial</uri>).
76 Any violations found will be reported to you by the website interface.
77 </p>
78
79 <p>
80 In reality there is a lot more going on, but the high-level picture is as
81 described above.
82 </p>
83
84 <p>
85 <b>2. When did the project get started and why?</b>
86 </p>
87
88 <p>
89 Today, we have several highly sophisticated techniques for checking code. What's
90 missing is the specification to check against. So we wanted to learn these
91 specifications from existing bodies of code. The project grew gradually and it
92 is hard to pinpoint an exact starting date. We started with a lightweight parser
93 that was written by my student, <uri
94 link="http://www.st.cs.uni-saarland.de/publications/details/gruska-tr-2010/">Natalie
95 Gruska</uri>, as part of her Bachelor's thesis. The parser was finished in
96 July 2009, but the original intention had nothing to do with analysing lots of
97 source code. We just wanted to create a language-independent front-end for one
98 of my tools, <uri
99 link="http://www.st.cs.uni-saarland.de/models/jadet/">JADET</uri>. It turned
100 out that the parser was very fast, and shortly afterwards <uri
101 link="http://www.st.cs.uni-saarland.de/zeller/">Prof. Andreas Zeller</uri>
102 came up with the idea of analysing lots of source code with its help. The
103 several months that followed, until the web service was created, were a lot of
104 hard work on my part to actually make it all work and scale to the size of a
105 Linux distribution.
106 </p>
107
108 <p>
109 <b>3. Who are some of the other people involved?</b>
110 </p>
111
112 <p>
113 The parser that is used by the website was written by Natalie Gruska, who is
114 currently a student at Queen's University in Canada. The original idea comes
115 from my supervisor, Prof. Andreas Zeller. The web interface and the web
116 programming were done by a colleague of mine, <uri
117 link="http://www.st.cs.uni-saarland.de/~streit/">Kevin Streit</uri>, who is,
118 like me, a PhD student at Saarland University in Germany.
119 </p>
120
121 <p>
122 <b>4. How is Gentoo involved in the project?</b>
123 </p>
124
125 <p>
126 All the source code that the analysis uses to find patterns comes from the
127 Gentoo distribution (i.e., the snippet you submit gets compared to the source
128 code of projects coming from the Gentoo distribution).
129 </p>
130
131 <p>
132 <b>5. Why was it chosen?</b>
133 </p>
134
135 <p>
136 It gives us access to source code for all the projects in the distribution. This
137 in turn allows us to use our lightweight parser to find the way functions are
138 being used by those projects.
139 </p>
140
141 <p>
142 <b>6. What are the hardest parts of using Gentoo?</b>
143 </p>
144
145 <p>
146 There are none :) Using Gentoo was a piece of cake, really, and by far the
147 easiest part of the whole project.
148 </p>
149
150 <p>
151 <b>7. What would you change in Gentoo to make it easier to use for your
152 project?</b>
153 </p>
154
155 <p>
156 One thing that I wish I had access to, but did not have (or maybe simply could
157 not find) was a web interface to the source code trees of all the projects.
158 Whenever a violation is found, the user also gets three examples of where the
159 "correct" code can be found. We provide a web interface for this, but before
160 there was www.checkmycode.org, I had to manually extract projects' files and it
161 was quite tedious.
162 </p>
163
164 <p>
165 <b>8. What do you like about Gentoo?</b>
166 </p>
167
168 <p>
169 I like the fact that Portage is quite easy to use, and that Gentoo uses a
170 rolling release approach. Anyone who has ever used a non-rolling release Linux
171 distribution and bumped into unsolvable version conflicts while trying to use
172 the newest available version of some package knows what I am talking about.
173 </p>
174
175 <p>
176 Also, for obvious reasons, I like the fact that I have access to the source code
177 :) As a matter of fact, the machine that hosts the website runs on Gentoo.
178 </p>
179
180 <p>
181 <b>9. Thanks again for taking the time to discuss your checkmycode tool. Do
182 you have any further remarks?</b>
183 </p>
184
185 <p>
186 Thank you for asking me all those questions; it was a real pleasure! I would
187 just like to point out that <uri
188 link="http://www.checkmycode.org/">www.checkmycode.org</uri> is just a small
189 interface to a tool that is able to handle whole programs of quite large sizes
190 and detect violations in them. Therefore, we put a lot of emphasis on making the
191 tool able to filter out what it thinks are false alarms, and this significantly
192 reduces the number of violations found. Unfortunately, the side effect is that
193 some real errors related to incorrect function usage will not be detected. So,
194 to paraphrase Edsger W. Dijkstra, the tool can only show the presence, not the
195 absence of potential problem locations in your code.
196 </p>
197
198 </body>
199
200 </news>