Hello,

This is my weekly progress report.

Last week I worked on implementing dlopen, dlclose, and dlsym. They worked as expected in most cases; however, there seemed to be a complicated bug that only appeared in complex projects. It shows up as memory corruption at random places. The problem is that Rust should, in theory, guarantee that this never happens unless you are writing unsafe code, and, even more puzzling, the place where I observed the bug was safe code.

Specifically, this is the snippet that puzzled me:

    let mut data = Vec::new();
    let mut file = File::open(&path_c, flags).map_err(/*irrelevant*/)?;
    file.read_to_end(&mut data).map_err(/*irrelevant*/)?;
    println!("load_recursive: name: {} \"{}\"", name.as_ptr() as usize, name);
    file.read_to_end(&mut data).map_err(/*irrelevant*/);
    println!("load_recursive: name: {} \"{}\"", name.as_ptr() as usize, name);

and the output looked something like this:

    load_recursive: name: 93869686340272 "/home/oddcoder/sources/gcc/build/./gcc/liblto_plugin.so"
    load_recursive: name: 93869686340272 "�o �o�o"

The code above is part of the code path that a dlopen call goes through.

At this stage I was stuck for a while with absolutely no idea what to do. Then I tried writing my own program, linked against that library at build time instead of loading it with dlopen, so I could observe what really happens.

So I did that with this toy program:

    // x86_64-lfs-linux-relibc-gcc code1.c -o code1 -llto_plugin
    #include <stdio.h>
    #include <unistd.h>

    int main() {
        printf("hello world\n");
        return 0;
    }

It worked without crashing or anything, so at this point I knew that the problem was not with dynamic linking itself.

So I created a second attempt:

    // x86_64-lfs-linux-relibc-gcc crasher.c -o crasher
    #include <dlfcn.h>
    #include <stdio.h>

    int main() {
        void *lib1 =
            dlopen("/home/oddcoder/sources/gcc/build/./gcc/liblto_plugin.so", 0);
        printf("Hello world\n");
        return 0;
    }

This program did segfault, in stdlib::exit, in a way that makes me believe it is memory corruption. The thing is, there is basically no difference between dlopen and the code used by ld.so; in fact, it is the exact same code. The only difference is that dlopen is usually called after libc.so has been loaded.

My mentor (Luca Barbato) suggested using Rust's GlobalAlloc to trace memory allocations and see where the corruption happens. Although I understood the general idea, I still didn't know how to apply it. However, I found that relibc already uses GlobalAlloc, not to trace allocations but to implement its own custom memory allocator. So I thought the issue might be there.

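The tracing idea can be illustrated with a minimal sketch. It assumes an ordinary hosted Rust program that wraps the standard `System` allocator; it is not relibc's allocator. `TracingAlloc`, `alloc_count` and `live_bytes` are hypothetical names. The wrapper forwards every request and only updates atomic counters. It deliberately does not print from inside `alloc`/`dealloc`, because `println!` may allocate and re-enter the allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical tracing allocator: forwards to the system allocator and
/// keeps counters. Nothing is printed in here, since println! may allocate
/// and re-enter this allocator.
struct TracingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TracingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            ALLOCS.fetch_add(1, Ordering::Relaxed);
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) };
    }
    // realloc and alloc_zeroed use the trait's default methods, which call
    // alloc/dealloc above, so they are counted as well.
}

#[global_allocator]
static GLOBAL: TracingAlloc = TracingAlloc;

fn alloc_count() -> usize {
    ALLOCS.load(Ordering::Relaxed)
}

fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}

fn main() {
    let before = alloc_count();
    let v: Vec<u64> = (0..1_000).collect();
    std::hint::black_box(&v); // keep the optimizer from eliding the Vec
    let after = alloc_count();
    // Printing happens outside the allocator, so it is safe here.
    println!("allocations while building the Vec: {}", after - before);
    println!("bytes currently live: {}", live_bytes());
}
```

Counting is only a first step. Catching the corruption itself would probably require also recording each block's address and size, for example into a preallocated buffer, and checking that no live block is handed out twice.
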
Under the hood, on Linux, relibc uses dlmalloc with mmap disabled, which means that all memory allocations are done via the brk() syscall.

That made me suspect this might be the root cause of the issue. I also learned that dlmalloc keeps a global data structure for bookkeeping, which maintains the list of free memory chunks. I assumed that if two instances of dlmalloc were running on the same heap (one in ld.so and one in libc.so), they would probably corrupt it.

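The suspected failure mode can be shown with a toy model. This is not dlmalloc or relibc code. `Heap`, `ToyAlloc` and `overlaps` are hypothetical, and the heap is a `Vec<u8>` whose usable end moves like brk(). Each allocator's whole bookkeeping is reduced to its own cached idea of the heap top. Because both instances start from the same break and neither knows about the other, they hand out the same bytes.

```rust
use std::ops::Range;

/// Toy stand-in for the brk() heap: one arena whose usable end is `cur_brk`.
/// Hypothetical model, not relibc's memory layout.
struct Heap {
    mem: Vec<u8>,
    cur_brk: usize,
}

impl Heap {
    fn new(size: usize) -> Self {
        Heap { mem: vec![0; size], cur_brk: 0 }
    }

    /// Like brk(addr): move the break to `addr`, failing if that would
    /// exceed the arena.
    fn brk(&mut self, addr: usize) -> bool {
        if addr > self.mem.len() {
            return false;
        }
        self.cur_brk = addr;
        true
    }
}

/// Toy allocator whose entire bookkeeping is a private "top of heap".
/// Real dlmalloc keeps far more state (free lists, top chunk, ...).
struct ToyAlloc {
    top: usize,
}

impl ToyAlloc {
    /// Each instance snapshots the current break when it is created.
    fn new(heap: &Heap) -> Self {
        ToyAlloc { top: heap.cur_brk }
    }

    /// Bump-allocate `size` bytes using only this instance's view of the heap.
    fn alloc(&mut self, heap: &mut Heap, size: usize) -> Option<Range<usize>> {
        let start = self.top;
        let end = start.checked_add(size)?;
        if !heap.brk(end) {
            return None;
        }
        self.top = end;
        Some(start..end)
    }
}

fn overlaps(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}

fn main() {
    let mut heap = Heap::new(4096);
    // Two independent instances, standing in for ld.so's and libc.so's copies.
    let mut ldso = ToyAlloc::new(&heap);
    let mut libc = ToyAlloc::new(&heap);

    let a = ldso.alloc(&mut heap, 64).unwrap();
    let b = libc.alloc(&mut heap, 64).unwrap();

    heap.mem[a.clone()].fill(b'A'); // "ld.so" stores its data...
    heap.mem[b.clone()].fill(b'B'); // ...and "libc" silently overwrites it

    // prints: ld.so block 0..64, libc block 0..64, overlap: true
    println!("ld.so block {:?}, libc block {:?}, overlap: {}", a, b, overlaps(&a, &b));
    // prints: first byte of ld.so block: 'B'
    println!("first byte of ld.so block: {:?}", heap.mem[a.start] as char);
}
```
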
So, to verify my assumption, I disabled the normal malloc and used dlmalloc's mspaces instead. An mspace keeps the allocator's bookkeeping in an explicit object, so ld.so and relibc can in principle share the same bookkeeping data structure.

Even though this is only half implemented (ld.so still has its own bookkeeping structure, isolated from relibc's), the crasher no longer crashes, for some reason. A full implementation, however, would share the bookkeeping structure between ld.so and relibc. (Note that the issue was that the memory was shared, but each of them maintained a different list of free/used chunks on that same memory. Ideally, they would share both the memory and the list of used/free chunks.)

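The ideal arrangement, a single set of bookkeeping used by both sides, can be sketched the same way. This is again a hypothetical toy and not dlmalloc's mspace API, which is C (`create_mspace`, `mspace_malloc`, `mspace_free`). `SharedAlloc` owns the only heap top and the only free list, and both "ld.so" and "libc" hold handles to that one instance. A block freed by one side can therefore be reused by the other, and no two live blocks overlap.

```rust
use std::cell::RefCell;
use std::ops::Range;
use std::rc::Rc;

/// Hypothetical allocator with a single set of bookkeeping: one heap top and
/// one free list, standing in for an mspace shared by ld.so and relibc.
struct SharedAlloc {
    capacity: usize,
    top: usize,
    free_list: Vec<Range<usize>>,
}

impl SharedAlloc {
    fn new(capacity: usize) -> Self {
        SharedAlloc { capacity, top: 0, free_list: Vec::new() }
    }

    fn alloc(&mut self, size: usize) -> Option<Range<usize>> {
        // First fit: reuse a freed block that is large enough, splitting it.
        if let Some(i) = self.free_list.iter().position(|r| r.len() >= size) {
            let block = self.free_list.swap_remove(i);
            let used = block.start..block.start + size;
            if used.end < block.end {
                self.free_list.push(used.end..block.end);
            }
            return Some(used);
        }
        // Otherwise grow the heap; there is only one top to move.
        let end = self.top.checked_add(size)?;
        if end > self.capacity {
            return None;
        }
        let start = self.top;
        self.top = end;
        Some(start..end)
    }

    fn free(&mut self, block: Range<usize>) {
        self.free_list.push(block);
    }
}

fn main() {
    // Both sides hold a handle to the *same* allocator instance.
    let shared = Rc::new(RefCell::new(SharedAlloc::new(4096)));
    let ldso = Rc::clone(&shared);
    let libc = Rc::clone(&shared);

    let a = ldso.borrow_mut().alloc(64).unwrap();
    let b = libc.borrow_mut().alloc(64).unwrap();
    // prints: ld.so 0..64, libc 64..128
    println!("ld.so {:?}, libc {:?}", a, b);

    // Memory freed by "ld.so" is visible to "libc"'s next allocation.
    ldso.borrow_mut().free(a);
    let c = libc.borrow_mut().alloc(32).unwrap();
    // prints: libc reused 0..32
    println!("libc reused {:?}", c);
}
```
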
I tried it on the real gcc code base and unfortunately it crashed, but for completely different reasons (an early crash). I did in fact skip multiple details: I was inserting printlns everywhere and hacking through the code just trying to get it working. Some things (for example, mmap) were disabled in dlmalloc but are required for mspaces to work.

Now that I know the basic idea works for the memory-corruption proof of concept, I will git reset --hard and try to implement the fix as cleanly as possible.

What slightly worries me is that I could be totally unlucky: the crash in gcc and the segfault in the POC I created could turn out to be completely different bugs, and I would be back to wondering what is causing the memory corruption.

Thanks.
Ahmed.