From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits))
	(No client certificate requested)
	by finch.gentoo.org (Postfix) with ESMTPS id 93497158094
	for ; Tue, 23 Aug 2022 14:51:26 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id B794DE07E6;
	Tue, 23 Aug 2022 14:51:25 +0000 (UTC)
Received: from smtp.gentoo.org (woodpecker.gentoo.org [IPv6:2001:470:ea4a:1:5054:ff:fec7:86e4])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(No client certificate requested)
	by pigeon.gentoo.org (Postfix) with ESMTPS id 030FBE07E6
	for ; Tue, 23 Aug 2022 14:51:23 +0000 (UTC)
Received: from [2a0c:b641:69c:e7f1::2] (port=50674 helo=aurora)
	by muon with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.94.2) (envelope-from ) id 1oQVFQ-0007x2-2b
	for gentoo-soc@lists.gentoo.org; Tue, 23 Aug 2022 14:51:20 +0000
From: Benda Xu
To: gentoo-soc
Subject: Re: [gentoo-soc] Week 10 Report for Refining ROCm Packages in Gentoo
References:
Date: Tue, 23 Aug 2022 22:51:18 +0800
In-Reply-To: (wuyy's message of "Tue, 23 Aug 2022 22:03:05 +0800")
Message-ID: <87sfln109l.fsf@gentoo.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-soc@lists.gentoo.org
Reply-to: gentoo-soc@lists.gentoo.org
X-Auto-Response-Suppress: DR, RN, NRN, OOF, AutoReply
MIME-Version: 1.0
Content-Type: text/plain
X-Archives-Salt: 49096819-d738-4b26-839a-58e0e51a45e4
X-Archives-Hash: e0fdfee6596d917ff4f9ba4bf6970cc8

Hi Yiyang,

wuyy writes:

> Sorry for the late report. I was busy last week, so the actual
> progress is slower than expected.
>
> This week I have learnt a lot from Ulrich's comments on rocm.eclass. I
> polished the eclass to v3 and sent it to the gentoo-dev mailing list.
> However, I observed another error introduced in v3, and I'll include a
> fix for it in v4 in the following days.
>
> The other half of my time was spent on testing sci-libs/roc-* packages
> on various platforms, utilizing rocm.eclass. I can say that rocm.eclass
> did its job as expected, so I believe it can be merged after v4.

Very good progress despite being busy.

> With src_test enabled, I have found various test failures.
> rocBLAS-5.1.3 fails 3 tests on Radeon RX 6700XT, slightly exceeding
> tolerance, which does not seem to be a big issue; rocFFT-5.1.3 fails
> 16 suites on Radeon VII [1], which is serious and confirmed by
> upstream, so I suggest masking the amdgpu_targets_gfx906 USE flag for
> rocFFT-5.1.3; just today I observed that MIOpen is failing many tests,
> probably due to vanilla clang. I'll open issues and report those test
> failures to upstream. Running the test suites takes a lot of time, and
> often drains the GPU. It may take more than 15 hours to test rocBLAS,
> even on a performant CPU like the Ryzen 5950X. If I use the GPU to
> render graphics (run a desktop environment) and run tests
> simultaneously, it often results in an amdgpu driver failure. I hope
> one day we can have a testing farm for ROCm packages, but that would
> be expensive because there are a lot of GPU architectures, and the
> compilation takes a lot of time.

Have the problems on Radeon VII been reproduced on MI100?

> I planned to finish the draft of the wiki pages [2,3], but it turns
> out I'm running out of time. I'll catch up in week 11. My mentor was
> also busy in week 10, so my PR about rocm-opencl-runtime is still
> pending review.
> Now we are working on solving the dependency issue of ROCm packages --
> gcc-12 and gcc-11.3.0 incompatibilities.
> Due to two bugs, the current stable gcc, gcc-11.3.0, cannot compile
> some ROCm packages [4], and the current unstable gcc, gcc-12, fails
> to compile nearly all ROCm packages [5].

If we cannot backport the gcc-12 compatibility patch to llvm-14, the
only way left is to live with gcc-11.4 while waiting for llvm-15. Good
documentation on the wiki will explain to users why we are in this
situation.

> I'll continue to do what was postponed in week 10 -- landing
> rocm.eclass and the sci-libs packages, preparing cupy, fixing bugs,
> and writing the wiki pages. I'll investigate MIOpen's situation as
> well.

Cheers,
Benda