Date: Mon, 20 Jun 2022 21:41:00 +0800
From: wuyy
To: gentoo-soc
Subject: [gentoo-soc] Week 1 Report for Refining ROCm Packages in Gentoo

Hello all,

This is my first week of GSoC at Gentoo, and I found it truly exciting. The first week centered on making dev-util/hip rely on vanilla clang. In https://github.com/littlewu2508/gentoo/tree/blender-rocm, I bumped rocm-device-libs, rocm-comgr, and hip to 5.1.3 and used vanilla llvm/clang as the backend; after that I bumped blender to 3.2.0 and enabled its HIP cycles, and it worked on a Radeon 6700XT (see [1])!
That means I made a good start on replacing llvm-roc with system llvm, which was originally the last item in my GSoC proposal. So I changed the plan a bit, moving the last week's plan forward.

The story began when I heard that blender 3.2.0 was finally released with HIP cycles support on Linux, so I decided to try it out. I also searched Bugzilla and noticed a proposal to use llvm.eclass and a rocm USE flag [1].

After a quick bump of media-gfx/blender and its required dependencies, I enabled HIP cycles in the ebuild and started emerging. The build was surprisingly smooth, since the build commands simply call hipcc, which is already in good shape, without too many arguments. However, blender aborted when I tried to use HIP cycles at runtime -- the error suggested that more than one llvm library was linked in. I realized that some dependencies, like mesa, link against vanilla llvm, while blender itself has to link against llvm-roc because it has components compiled with hipcc. I reported my attempt in [1], and Sebastian Parborg confirmed the cause of the failure, so I opened another bug about llvm-roc at [2]. There I described the situation and gave two possible solutions: use vanilla clang as hip's backend, or make llvm-roc another slot of llvm/clang. That is actually the last-week plan in my GSoC proposal, but at that time I didn't realize the importance of making llvm-roc compatible with system llvm, since I had never encountered a package that uses both llvm and HIP. In the bug report I said that the second solution should be easier, so I preferred it, but in my heart I think the first one is more elegant, so I decided to try it first and fall back to the second solution if I failed. As a result, I started my journey of removing llvm-roc from the ROCm dependency tree.

The first thing was to modify rocm-device-libs.
With the help of Michał Górny (who pointed out in [2] that packages should not assume llvm is built with "BUILD_SHARED_LIBS=ON" and link llvm components, knowledge++), I patched the source so that it relies only on llvm:14 (Fedora developers have also discussed this and would like to upstream their patches). Then came rocm-comgr, where I encountered serious problems. With help from Yuyi Wang, I worked out a patch [3] (though I do not understand why Debian and Fedora don't need it), and I plan to upstream it to the ROCm team in the future. After that only four test failures remained, but they took me a long time to debug, and I found that neither the Debian and Fedora teams nor I have come to a solution yet, so I opened a GitHub issue upstream at https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45. While writing the ebuild I used llvm.eclass to determine the llvm prefix and `clang -print-resource-dir` to locate CLANG_RESOURCE_DIR, which is in `/usr/lib/clang/` rather than at the default relative path in llvm-project -- knowledge+=2.

Then it was all about HIP. I ran into many issues with finding the correct include locations, and fixed them one by one. In the end I arrived at a new hipvars.pm and a patch to hipcc.pl that disables the poisoning of `-isystem` and corrects many paths. Now calling hipcc directly works, and blender rendered successfully using HIP cycles! I was amazed at this result.

Then I continued testing by compiling rocBLAS-5.1.3 with this new ROCm toolchain. Sadly, there are paths in the cmake files that need to be corrected. I've made some fixes, but more are needed before rocBLAS can be configured. Bumping the high-level libs with this new toolchain will be the major task of the coming week. Another job is to finalize and push the low-level runtimes and toolchain into ::gentoo via PRs, starting with https://github.com/gentoo/gentoo/pull/25785.
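As a rough illustration of the resource-directory lookup mentioned above, here is a minimal Python sketch that asks whichever clang is on PATH for its resource directory instead of assuming a path relative to the source tree. The function name `clang_resource_dir` is hypothetical and is not part of any ebuild or eclass; the only thing taken from the report is the `clang -print-resource-dir` command itself. In an actual ebuild the same query would be done in shell.

```python
import shutil
import subprocess
from typing import Optional


def clang_resource_dir(clang: str = "clang") -> Optional[str]:
    """Return the resource directory reported by `clang`, or None.

    Hypothetical helper for illustration only: it runs
    `<clang> -print-resource-dir` and returns the printed path.
    None is returned when the binary is not on PATH or the call fails.
    """
    exe = shutil.which(clang)
    if exe is None:
        # No such compiler on PATH, so there is nothing to query.
        return None
    result = subprocess.run(
        [exe, "-print-resource-dir"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    path = result.stdout.strip()
    return path or None


if __name__ == "__main__":
    resource_dir = clang_resource_dir()
    if resource_dir is None:
        print("clang not found, or it did not report a resource dir")
    else:
        print("CLANG_RESOURCE_DIR candidate:", resource_dir)
```

Querying the compiler this way avoids hard-coding a versioned directory, which is the point of using `clang -print-resource-dir` rather than the relative default.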
I'll also fix existing bugs when I bump the versions of the packages in sci-libs. For https://bugs.gentoo.org/852236 I already have a solution. I still need to investigate the bugs about not respecting CFLAGS/LDFLAGS, and I think they share a cause with https://bugs.gentoo.org/851792. I'll check them one by one.

**So, the plan is changed as follows:**

I am currently about halfway through week 11's task. So the plan for week 11 is merged into week 1, meaning that the tasks in weeks 1-10 are postponed by one week.

Also, since I'm using ROCm-5.1.3 as the testing ground for the new toolchain, I would like to make use of rocm.eclass, if possible. That means the original weeks 5-8 would be moved to after week 2 (between CuPy and TensorFlow).

In conclusion, in the first week I was persuaded by [1] that [2] is an important blocker, so the week 11 task is no longer optional but essential, and has been moved up. The good news is that I'm making nice progress on this issue, and I believe I'm the first Gentoo user to package and use blender-3.2 with HIP cycles. The bad news is that I'm not finished with hacking the cmake modules for HIP.

[1] https://bugs.gentoo.org/693200
[2] https://bugs.gentoo.org/851702
[3] https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45#issuecomment-1155975910

--
Best wishes,
Yiyang Wu