Gentoo Archives: gentoo-soc

From: Ahmed Soliman <ahmedsoliman0x666@×××××.com>
To: gentoo-soc@l.g.o
Cc: Luca Barbato <lu_zero@g.o>
Subject: [gentoo-soc] Weekly Progress: Porting relibc
Date: Sun, 12 Jul 2020 22:58:40
Message-Id: CAAGnT3Y3vu1C+_REJ9Y_PEdp7qGjO45qViBSSnOJt75ao_9yOQ@mail.gmail.com
Hello,

This is the weekly progress report.

Last week I was trying to implement dlopen, dlclose, and dlsym. They
worked as I expected in most cases, but there seemed to be a
complicated bug that only appeared in complex projects. It shows up as
memory corruption at seemingly random places. The problem is that Rust
should, in theory, guarantee that this never happens unless you are
writing unsafe code, yet the place where I could observe the bug was
safe code.

Specifically, this was the snippet that puzzled me:

    let mut data = Vec::new();
    let mut file = File::open(&path_c, flags).map_err(/*irrelevant*/)?;
    file.read_to_end(&mut data).map_err(/*irrelevant*/)?;
    println!("load_recursive: name: {} \"{}\"", (name).as_ptr() as usize, name);
    file.read_to_end(&mut data).map_err(/*irrelevant*/);
    println!("load_recursive: name: {} \"{}\"", (name).as_ptr() as usize, name);

and the output looked something like this:

    load_recursive: name: 93869686340272 "/home/oddcoder/sources/gcc/build/./gcc/liblto_plugin.so"
    load_recursive: name: 93869686340272 "�o �o�o"

The code above is part of the dlopen subroutine. Note that the pointer
is the same in both lines; only the bytes it points to have changed.

At this stage I was stumped for a little while, with absolutely no idea
what to do. Then I tried writing my own program that is linked directly
against that library (instead of loading it with dlopen), so I could
observe what really happens.

So I did that with this toy program:

    // x86_64-lfs-linux-relibc-gcc code1.c -o code1 -llto_plugin
    #include <stdio.h>
    #include <unistd.h>

    int main() {
        printf("hello world\n");
    }

It worked without crashing or anything, so at this point I knew that
the problem is not with dynamic linking itself.

So I created attempt 2:

    // x86_64-lfs-linux-relibc-gcc crasher.c -o crasher
    #include <dlfcn.h>
    #include <stdio.h>  /* needed for printf */

    int main() {
        void *lib1 = dlopen("/home/oddcoder/sources/gcc/build/./gcc/liblto_plugin.so", 0);
        printf("Hello world\n");
    }

This program did segfault at stdlib::exit, in a way that makes me
believe it is memory corruption.
The thing is, there is basically no difference between dlopen and the
code used by ld.so; in fact it is the exact same code.
The only difference is that dlopen is usually called after libc.so is
loaded.

My mentor (Luca Barbato) suggested using Rust's GlobalAlloc to trace
memory allocations and see where the corruption is happening. Although
I understood the general idea, I still didn't know how to use it.
However, I found it used in relibc, not to trace memory allocations but
to implement relibc's custom memory allocator. So I thought maybe the
issue is there.

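For illustration, a minimal sketch of what tracing through GlobalAlloc
could look like in an ordinary hosted Rust program. It assumes std and
the System allocator are available, so it is not relibc's code (relibc
supplies its own allocator). The names TracingAlloc, ALLOCS, FREES, and
trace_vec_roundtrip are made up for this example. It counts calls
rather than printing them, because printing from inside the allocator
can itself allocate and re-enter it.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards every request to the system allocator
/// and counts how many allocations and frees pass through it.
pub struct TracingAlloc;

pub static ALLOCS: AtomicUsize = AtomicUsize::new(0);
pub static FREES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TracingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            ALLOCS.fetch_add(1, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        FREES.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) };
    }
}

// Route every heap allocation in the program through the wrapper.
#[global_allocator]
static GLOBAL: TracingAlloc = TracingAlloc;

/// Returns how many (allocations, frees) were counted while a Vec was
/// created and dropped.
pub fn trace_vec_roundtrip() -> (usize, usize) {
    let allocs_before = ALLOCS.load(Ordering::Relaxed);
    let frees_before = FREES.load(Ordering::Relaxed);
    {
        let v: Vec<u8> = Vec::with_capacity(4096);
        std::hint::black_box(&v); // keep the allocation from being optimized out
    }
    (
        ALLOCS.load(Ordering::Relaxed) - allocs_before,
        FREES.load(Ordering::Relaxed) - frees_before,
    )
}
```

Calling trace_vec_roundtrip() from main should report at least one
allocation and one free. A real trace would more likely record
addresses and sizes into a fixed-size buffer and dump them afterwards.
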
Under the hood, on Linux, relibc uses dlmalloc with mmap disabled,
which means that all memory allocations are done via the brk() syscall.

That made me believe that perhaps this is the root cause of the issue.
I also learned that there is a global data structure used for
bookkeeping, which maintains a list of free memory chunks. I assumed
that if two instances of dlmalloc are running on the same heap, they
would probably corrupt it.

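The assumption can be illustrated with a deliberately simplified toy
model. This is not dlmalloc and not how brk() behaves in detail: Bump,
overlaps, and clobber_demo are hypothetical names, and a bump pointer
stands in for the real free-list bookkeeping. It only shows why two
allocators that keep separate bookkeeping over the same memory can hand
out the same bytes twice.

```rust
/// Toy allocator whose entire bookkeeping is `next`, the first offset
/// it believes is still free in an arena of `cap` bytes.
pub struct Bump {
    next: usize,
    cap: usize,
}

impl Bump {
    pub fn new(cap: usize) -> Self {
        Bump { next: 0, cap }
    }

    /// Hands out `len` bytes as a half-open (start, end) offset range.
    pub fn alloc(&mut self, len: usize) -> Option<(usize, usize)> {
        let end = self.next.checked_add(len)?;
        if end > self.cap {
            return None;
        }
        let range = (self.next, end);
        self.next = end;
        Some(range)
    }
}

/// True if two half-open ranges share at least one byte.
pub fn overlaps(a: (usize, usize), b: (usize, usize)) -> bool {
    a.0 < b.1 && b.0 < a.1
}

/// Two allocators, one arena, independent bookkeeping.
/// Returns whether the two chunks overlap and what the first chunk
/// contains after the second one has been written.
pub fn clobber_demo() -> (bool, Vec<u8>) {
    let mut arena = vec![0u8; 64];
    let mut ld_so = Bump::new(arena.len()); // stands in for ld.so's instance
    let mut libc = Bump::new(arena.len()); // stands in for libc's instance

    // The first instance stores a name in the arena.
    let name = ld_so.alloc(8).unwrap();
    arena[name.0..name.1].copy_from_slice(b"liblto.s");

    // The second instance knows nothing about that chunk and reuses it.
    let buf = libc.alloc(8).unwrap();
    arena[buf.0..buf.1].fill(0xEF);

    (overlaps(name, buf), arena[name.0..name.1].to_vec())
}
```

In this model the name keeps its address but its bytes are overwritten,
which is the same kind of symptom as in the load_recursive output
above.
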
So to verify my assumption, I disabled the normal malloc and used
mspaces in dlmalloc, which basically make the bookkeeping data
structure explicit so that it can be shared.

With this half implemented (ld.so has its own bookkeeping structure,
which is isolated from relibc's), the crasher no longer crashes for
some reason. A full implementation, however, would share the
bookkeeping structure between ld.so and relibc.
(Note that the issue was that the memory was shared, but each of them
maintained a different list of free/used chunks on the same memory.
Ideally they would share both the memory and the list of used/free
chunks.)

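A toy sketch of that ideal arrangement, with both users going through
one handle that owns the memory and its bookkeeping together. Space and
shared_demo are hypothetical names, the bump pointer is a stand-in for
real chunk lists, and Rc<RefCell<..>> is only a single-threaded
placeholder for whatever sharing mechanism ld.so and relibc would
actually use.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Toy stand-in for an mspace: a region of memory plus the bookkeeping
/// that describes which part of it is in use.
pub struct Space {
    mem: Vec<u8>,
    next: usize,
}

impl Space {
    pub fn new(size: usize) -> Self {
        Space { mem: vec![0; size], next: 0 }
    }

    /// Reserves `len` bytes and returns their start offset.
    pub fn alloc(&mut self, len: usize) -> Option<usize> {
        let end = self.next.checked_add(len)?;
        if end > self.mem.len() {
            return None;
        }
        let start = self.next;
        self.next = end;
        Some(start)
    }

    pub fn write(&mut self, at: usize, data: &[u8]) {
        self.mem[at..at + data.len()].copy_from_slice(data);
    }

    pub fn read(&self, at: usize, len: usize) -> &[u8] {
        &self.mem[at..at + len]
    }
}

/// Both users allocate through the same Space, so the second request
/// sees the first chunk as taken. Returns both offsets and the bytes
/// of the first chunk after the second one was written.
pub fn shared_demo() -> (usize, usize, Vec<u8>) {
    let space = Rc::new(RefCell::new(Space::new(64)));
    let ld_so = Rc::clone(&space); // stands in for ld.so
    let libc = Rc::clone(&space); // stands in for relibc

    let name = ld_so.borrow_mut().alloc(8).unwrap();
    ld_so.borrow_mut().write(name, b"liblto.s");

    let buf = libc.borrow_mut().alloc(8).unwrap();
    libc.borrow_mut().write(buf, &[0xEF; 8]);

    let survived = space.borrow().read(name, 8).to_vec();
    (name, buf, survived)
}
```

Here the two chunks land at different offsets and the first one keeps
its contents, because there is only one record of what is in use.
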
I tried it on the real gcc code base and unfortunately it crashed, but
for completely different reasons (an early crash). I did in fact skip
multiple details: I was inserting printlns everywhere and hacking
through the code trying to just get it working. Some things that were
disabled in dlmalloc (mmap, for example) turned out to be required for
mspaces to work.

Now that I know that the basic idea worked for the memory corruption
proof of concept, I will just git reset --hard and try to implement
the fix as cleanly as possible.

What slightly worries me is that I could be totally unlucky, and the
crash in gcc and the segfault in the POC I created could end up being
completely different bugs. In that case I will have to go back to
wondering what is causing the memory corruption.

Thanks.
Ahmed.
