dabbott 10/04/01 19:32:54

Added: 20100401-andrzej-interview.xml
Log:
Andrzej Wasylkowski interview

Revision Changes Path
1.1 xml/htdocs/proj/en/pr/20100401-andrzej-interview.xml

file : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/proj/en/pr/20100401-andrzej-interview.xml?rev=1.1&view=markup
plain: http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/proj/en/pr/20100401-andrzej-interview.xml?rev=1.1&content-type=text/plain

Index: 20100401-andrzej-interview.xml
===================================================================
<?xml version='1.0'?>

<!DOCTYPE news SYSTEM "/dtd/guide.dtd">

<news gentoo="yes" category="gentoo">

<!-- Enter your name here -->
<poster>dabbott</poster>

<!-- Date to be displayed -->
<date>2010-04-01</date>
<title>Interview with Andrzej Wasylkowski from the checkmycode project.</title>

<body>

<p>
<b>Hi Andrzej, thanks for the interview.</b>
</p>

<p>
Hi! For my part, I would also like to thank you for the interview. It is a real
pleasure to be your virtual guest :)
</p>

<p>
<b>1. I see that on the web page at <uri
link="http://www.checkmycode.org/">http://www.checkmycode.org/</uri> I can
enter my C code into a form, and it generates a report of anomalies found in
my code with an explanation of why parts of it are considered anomalous and
therefore possibly buggy. Can you walk me through what goes on in the
background to produce this?</b>
</p>

<p>
In a nutshell, we have mined all of the Gentoo Linux distribution for typical
usage of component interfaces -- that is, how Linux components are normally
used. If you use an interface in an uncommon way, this will be flagged as an
anomaly.
</p>

<p>
From a high-level point of view, there are three main steps involved. First, the
code you submit gets parsed and so-called "sequential constraints" are generated
from it. These are two-element sequences of function calls annotated with data
flow information, such as "retval of socket() -> 1st arg of listen()". They
represent, in an abstract way, how your code uses functions to operate on
"objects".
</p>
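
<p>
To make the notion concrete, here is a minimal Python sketch; it is not the
checkmycode implementation. It assumes a hypothetical <c>Call</c> record (the
function name, the variables passed as arguments, and the variable receiving
the return value) and derives constraints in the textual form quoted above from
a short, hand-written call trace.
</p>

<pre caption="Sketch: deriving sequential constraints from a call trace (illustrative only)">
```python
from typing import NamedTuple, Optional, Tuple


class Call(NamedTuple):
    """Hypothetical record of one function call in the analysed code."""
    func: str               # name of the called function
    args: Tuple[str, ...]   # variables passed as arguments, in order
    result: Optional[str]   # variable that receives the return value, if any


def ordinal(n):
    # Good enough for the small argument counts used here.
    return {1: "1st", 2: "2nd", 3: "3rd"}.get(n, f"{n}th")


def sequential_constraints(calls):
    """Pair up calls that operate on the same value, in program order."""
    constraints = []
    uses = {}  # variable -> earlier uses, e.g. "retval of socket()"
    for call in calls:
        arg_uses = [(var, f"{ordinal(pos)} arg of {call.func}()")
                    for pos, var in enumerate(call.args, start=1)]
        for var, use in arg_uses:
            for earlier in uses.get(var, []):
                constraints.append(f"{earlier} -> {use}")
        # Record the uses only after matching, so a call never pairs with itself.
        for var, use in arg_uses:
            uses.setdefault(var, []).append(use)
        if call.result is not None:
            # Assigning a variable starts a new value with a fresh history.
            uses[call.result] = [f"retval of {call.func}()"]
    return constraints


# Hand-written trace standing in for parsed C code.
trace = [
    Call("socket", ("domain", "type", "protocol"), "fd"),
    Call("bind", ("fd", "addr", "addrlen"), None),
    Call("listen", ("fd", "backlog"), None),
]

for constraint in sequential_constraints(trace):
    print(constraint)
```
</pre>

<p>
For this trace the sketch yields three constraints, one of which is the
"retval of socket() -> 1st arg of listen()" example mentioned above.
</p>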

<p>
Second, we look at all possibly relevant C projects from the Gentoo Linux
distribution to see how these projects use the functions that your code also
uses. Without going into too much detail, if you call "socket()", code from all
projects that also call "socket()" is going to be taken into consideration to
find sequential constraints where "socket()" is present.
</p>
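
<p>
The selection described above could be sketched roughly as follows. This is an
illustration under stated assumptions, not the actual mining code: the project
names and their constraints are invented, constraints are plain strings, and
"support" is assumed to mean the number of projects in which a constraint
occurs.
</p>

<pre caption="Sketch: counting how many projects share a constraint (illustrative only)">
```python
from collections import Counter


def functions_in(constraint):
    """Return the function names mentioned by a constraint string."""
    return {side.split(" of ")[1].rstrip("()")
            for side in constraint.split(" -> ")}


def mine_support(corpus, submitted_functions):
    """Count in how many projects each relevant constraint occurs."""
    support = Counter()
    for constraints in corpus.values():
        # Keep only constraints that mention a function the submission calls.
        relevant = {c for c in constraints
                    if not functions_in(c).isdisjoint(submitted_functions)}
        support.update(relevant)
    return support


# Hypothetical projects with hand-written constraints, for illustration only.
corpus = {
    "project-a": {
        "retval of socket() -> 1st arg of bind()",
        "retval of socket() -> 1st arg of listen()",
        "retval of fopen() -> 1st arg of fclose()",
    },
    "project-b": {
        "retval of socket() -> 1st arg of bind()",
        "retval of socket() -> 1st arg of listen()",
        "retval of socket() -> 1st arg of close()",
    },
    "project-c": {
        "retval of socket() -> 1st arg of connect()",
        "retval of fopen() -> 1st arg of fclose()",
    },
}

support = mine_support(corpus, {"socket"})
for constraint, count in support.most_common():
    print(count, constraint)
```
</pre>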

<p>
Third, we check if your code violates any of the patterns found in the second
step (you can find sample violations in the <uri
link="http://www.checkmycode.org/index.php?action=tutorial">tutorial</uri>).
Any violations found will be reported to you through the website interface.
</p>
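
<p>
The interview does not spell out what exactly counts as a violation, so the
following Python sketch rests on two assumptions: a pattern is a set of
constraints that commonly occur together, and a violation is code that
satisfies a sizeable part of a pattern but not all of it. The
<c>min_share</c> threshold is a hypothetical parameter chosen for the example.
</p>

<pre caption="Sketch: flagging partially followed patterns (illustrative only)">
```python
def find_violations(code_constraints, patterns, min_share=0.5):
    """Report patterns that the code follows only partially.

    Each result is a pair (constraints present, constraints missing).
    """
    violations = []
    for pattern in patterns:
        present = pattern.intersection(code_constraints)
        missing = pattern - code_constraints
        # Flag only patterns the code mostly follows but does not complete.
        if missing and len(present) / len(pattern) >= min_share:
            violations.append((sorted(present), sorted(missing)))
    return violations


# Hypothetical patterns, as they might come out of a mining step.
patterns = [
    frozenset({
        "retval of socket() -> 1st arg of bind()",
        "retval of socket() -> 1st arg of listen()",
    }),
    frozenset({"retval of fopen() -> 1st arg of fclose()"}),
]

# Constraints of a submission that calls listen() without calling bind().
code_constraints = {"retval of socket() -> 1st arg of listen()"}

violations = find_violations(code_constraints, patterns)
for present, missing in violations:
    print("follows:", present)
    print("missing:", missing)
```
</pre>

<p>
In this example only the socket pattern is reported, because the submission
never touches the fopen()/fclose() pattern at all.
</p>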

<p>
In reality there is a lot more going on, but the high-level picture is as
described above.
</p>

<p>
<b>2. When did the project get started and why?</b>
</p>

<p>
Today, we have several highly sophisticated techniques for checking code. What's
missing is the specification to check against. So we wanted to learn these
specifications from existing bodies of code. The project grew gradually and it
is hard to pinpoint an exact starting date. We started with a lightweight parser
that was written by my student, <uri
link="http://www.st.cs.uni-saarland.de/publications/details/gruska-tr-2010/">Natalie
Gruska</uri>, as part of her Bachelor's thesis. The parser was finished in
July 2009, but the original intention had nothing to do with analysing lots of
source code. We just wanted to create a language-independent front-end for one
of my tools, <uri
link="http://www.st.cs.uni-saarland.de/models/jadet/">JADET</uri>. It turned
out that the parser was very fast, and shortly afterwards <uri
link="http://www.st.cs.uni-saarland.de/zeller/">Prof. Andreas Zeller</uri>
came up with the idea of analysing lots of source code with its help. The
several months that followed, up to the creation of the web service, were a lot
of hard work on my part to actually make it all work and scale to the size of a
Linux distribution.
</p>

<p>
<b>3. Who are some of the other people involved?</b>
</p>

<p>
The parser that is used by the website was written by Natalie Gruska, who is
currently a student at Queen's University in Canada. The original idea comes
from my supervisor, Prof. Andreas Zeller. The web interface and the web
programming were done by a colleague of mine, <uri
link="http://www.st.cs.uni-saarland.de/~streit/">Kevin Streit</uri>, who is,
like me, a PhD student at Saarland University in Germany.
</p>

<p>
<b>4. How is Gentoo involved in the project?</b>
</p>

<p>
All the source code that the analysis uses to find patterns comes from the
Gentoo distribution (i.e., the snippet you submit gets compared to the source
code of projects coming from the Gentoo distribution).
</p>

<p>
<b>5. Why was Gentoo chosen?</b>
</p>

<p>
It gives us access to source code for all the projects in the distribution. This
in turn allows us to use our lightweight parser to find out how functions are
being used by those projects.
</p>

<p>
<b>6. What are the hardest parts of using Gentoo?</b>
</p>

<p>
There are none :) Using Gentoo was a piece of cake, really, and by far the
easiest part of the whole project.
</p>

<p>
<b>7. What would you change in Gentoo to make it easier to use for your
project?</b>
</p>

<p>
One thing that I wish I had access to, but did not have (or maybe simply could
not find), was a web interface to the source code trees of all the projects.
Whenever a violation is found, the user also gets three examples of where the
"correct" code can be found. We provide a web interface for this, but before
there was www.checkmycode.org, I had to manually extract projects' files and it
was quite tedious.
</p>

<p>
<b>8. What do you like about Gentoo?</b>
</p>

<p>
I like the fact that Portage is quite easy to use, and that Gentoo uses a
rolling release approach. Anyone who has ever used a non-rolling release Linux
distribution and bumped into unsolvable version conflicts while trying to use
the newest available version of some package knows what I am talking about.
</p>

<p>
Also, for obvious reasons, I like the fact that I have access to the source code
:) As a matter of fact, the machine that hosts the website runs on Gentoo.
</p>

<p>
<b>9. Thanks again for taking the time to discuss your checkmycode tool. Do
you have any further remarks?</b>
</p>

<p>
Thank you for asking me all those questions; it was a real pleasure! I would
just like to point out that <uri
link="http://www.checkmycode.org/">www.checkmycode.org</uri> is just a small
interface to a tool that is able to handle whole programs of quite large sizes
and detect violations in them. Therefore, we put a lot of emphasis on making the
tool able to filter what it thinks are false alarms, and this significantly
reduces the number of violations found. Unfortunately, the side effect is that
some real errors related to incorrect function usage will not be detected. So,
to paraphrase Edsger W. Dijkstra, the tool can only show the presence, not the
absence, of potential problem locations in your code.
</p>

</body>

</news>