From: Ahmed Soliman
Date: Mon, 13 Jul 2020 00:57:45 +0200
Subject: [gentoo-soc] Weekly Progress: Porting relibc
To: gentoo-soc@lists.gentoo.org
Cc: Luca Barbato

Hello,

This is the weekly progress report. Last week I was trying to implement dlopen, dlclose, and dlsym. They worked as I expected for most cases, but there seemed to be a complicated bug that only appeared in complex projects, in the form of memory corruption at random places. The problem here is that Rust should in theory guarantee this can never happen unless you write unsafe code; even more so, the place where I could observe this bug was safe code.
Specifically, this is the snippet that puzzled me:

  let mut data = Vec::new();
  let mut file = File::open(&path_c, flags).map_err(/*irrelevant*/)?;
  file.read_to_end(&mut data).map_err(/*irrelevant*/)?;
  println!("load_recursive: name: {} \"{}\"", (name).as_ptr() as usize, name);
  file.read_to_end(&mut data).map_err(/*irrelevant*/);
  println!("load_recursive: name: {} \"{}\"", (name).as_ptr() as usize, name);

and the output looked something like this:

  load_recursive: name: 93869686340272 "/home/oddcoder/sources/gcc/build/./gcc/liblto_plugin.so"
  load_recursive: name: 93869686340272 "�o �o�o"

The code above is part of a dlopen subroutine invocation.

At this stage I was stuck for a little while with absolutely no idea what to do. Then I tried to write my own program that is linked against that library at build time instead of through dlopen, so I could observe what really happens. So I did that with this toy program:

  // x86_64-lfs-linux-relibc-gcc code1.c -o code1 -llto_plugin
  #include <stdio.h>

  int main() {
      printf("hello world\n");
  }

It worked without crashing or anything, so at this point I knew that the problem is not with dynamic linking at all. So I created attempt 2:

  // x86_64-lfs-linux-relibc-gcc crasher.c -o crasher
  #include <dlfcn.h>
  #include <stdio.h>

  int main() {
      void *lib1 = dlopen("/home/oddcoder/sources/gcc/build/./gcc/liblto_plugin.so", 0);
      printf("Hello world\n");
  }

This program did segfault at stdlib::exit, in a way that makes me believe it is memory corruption. The thing is, the difference between dlopen and the code used by ld.so is basically nothing; in fact it is the exact same code. The only difference is that dlopen is usually called after libc.so is loaded.

My mentor (Luca Barbato) suggested using Rust's GlobalAlloc to trace memory allocations and see where the corruption is happening. Although I understood the general idea, I still didn't know how to use it. However, I found it used in relibc, not to trace memory allocations but to implement relibc's custom memory allocator, so I thought maybe the issue is there. Under the hood on Linux, relibc uses dlmalloc with mmap disabled, which means that all memory allocations are done via the brk() syscall. That made me believe that perhaps this is the root cause of the issue. I also learned that there is a global data structure used to do the bookkeeping and maintain the list of free memory chunks, and I assumed that if two instances of dlmalloc are running on the same heap, they would probably corrupt it.

To verify my assumption, I disabled normal malloc and used mspaces in dlmalloc, which are basically a way to make the bookkeeping data structure explicit so it can be shared. With this only half implemented (ld.so gets its own bookkeeping structure, isolated from relibc's), the crasher no longer crashes, for some reason. Still, a full implementation would share the bookkeeping structure between ld.so and relibc. (Note that the issue was that the memory was shared while each of them maintained a different list of free/used chunks over the same memory; ideally they would share both the memory and the list of used/freed chunks.) I tried it on the real gcc code base and unfortunately it crashed, but for completely different reasons (an early crash). I did in fact skip multiple details: I was inserting printlns everywhere and hacking through the code, trying to just get it working. Some things (mmap, for example) were disabled in dlmalloc but are required for mspaces to work.
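For reference, the tracing idea Luca suggested would look roughly like the sketch below. This is only my own minimal illustration, not relibc's actual allocator: TracingAlloc is a made-up name, and it wraps the system allocator rather than relibc's dlmalloc-backed one. The point is just that a GlobalAlloc wrapper forwards every allocation and logs it, so overlapping or double-managed ranges become visible in the output.

  use std::alloc::{GlobalAlloc, Layout, System};

  // Hypothetical tracing wrapper: forwards to the system allocator and
  // logs every alloc/dealloc so suspicious reuse of the same range can
  // be spotted in the trace.
  struct TracingAlloc;

  unsafe impl GlobalAlloc for TracingAlloc {
      unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
          let ptr = System.alloc(layout);
          // NOTE: printing can itself allocate; inside relibc this would
          // have to be a raw write() to avoid recursing into the allocator.
          eprintln!("alloc   {:p} size={}", ptr, layout.size());
          ptr
      }

      unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
          eprintln!("dealloc {:p} size={}", ptr, layout.size());
          System.dealloc(ptr, layout);
      }
  }

  #[global_allocator]
  static GLOBAL: TracingAlloc = TracingAlloc;

  fn main() {
      let v = vec![1u8, 2, 3]; // shows up as one alloc line in the trace
      drop(v);                 // ... and as a matching dealloc line
  }

In relibc itself the same kind of wrapper would sit in front of the dlmalloc-backed allocator instead of System.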
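As for the mspace experiment, the direction I am aiming for is roughly the sketch below. It assumes the C dlmalloc entry points create_mspace, mspace_malloc and mspace_free are available (i.e. dlmalloc built with MSPACES defined); relibc's vendored dlmalloc may expose this differently, and SHARED_MSPACE is only my placeholder for the handle that ld.so would create and hand over to relibc, so both allocate out of one bookkeeping structure instead of running two malloc states over the same brk heap.

  use core::ffi::c_void;

  // dlmalloc's mspace handle is an opaque pointer.
  type Mspace = *mut c_void;

  extern "C" {
      // Present when dlmalloc is compiled with MSPACES defined.
      fn create_mspace(capacity: usize, locked: i32) -> Mspace;
      fn mspace_malloc(msp: Mspace, bytes: usize) -> *mut c_void;
      fn mspace_free(msp: Mspace, mem: *mut c_void);
  }

  // Placeholder for the handle shared between ld.so and relibc; in the
  // real fix ld.so would create it first and export it to relibc.
  static mut SHARED_MSPACE: Mspace = core::ptr::null_mut();

  unsafe fn shared_malloc(bytes: usize) -> *mut c_void {
      if SHARED_MSPACE.is_null() {
          // capacity 0 asks dlmalloc for its default initial size;
          // locked = 0 means no internal locking.
          SHARED_MSPACE = create_mspace(0, 0);
      }
      mspace_malloc(SHARED_MSPACE, bytes)
  }

  unsafe fn shared_free(mem: *mut c_void) {
      mspace_free(SHARED_MSPACE, mem);
  }

The idea is simply that relibc's malloc/free and ld.so's internal allocations end up going through the same mspace, so there is one list of free chunks over the brk heap instead of two.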
Now that I know the basic idea works for the memory-corruption proof of concept, I will just git reset --hard and try to implement the fix as cleanly as possible. What slightly worries me is that I could be totally unlucky: the crash in gcc and the segfault in the PoC I created might end up being completely different bugs, and I would have to start over wondering what is causing the memory corruption.

Thanks,
Ahmed.