commit: 9597d3a8e121c0e0961de5642b1f550700dc8496
Author: Alexey Shvetsov <alexxy <AT> gentoo <DOT> org>
AuthorDate: Tue Mar 5 05:34:54 2013 +0000
Commit: Alexey Shvetsov <alexxy <AT> gentoo <DOT> org>
CommitDate: Tue Mar 5 05:34:54 2013 +0000
URL: http://git.overlays.gentoo.org/gitweb/?p=proj/x11.git;a=commit;h=9597d3a8

Update llvm R600 patch

Package-Manager: portage-2.2.0_alpha166
RepoMan-Options: --force

---
...-Add-R600-backend.patch => R600-Mesa-9.1.patch} | 7809 +++++++++++++-------
sys-devel/llvm/llvm-3.2.ebuild | 57 +-
sys-devel/llvm/metadata.xml | 1 -
3 files changed, 5020 insertions(+), 2847 deletions(-)

diff --git a/sys-devel/llvm/files/0001-Add-R600-backend.patch b/sys-devel/llvm/files/R600-Mesa-9.1.patch
similarity index 81%
rename from sys-devel/llvm/files/0001-Add-R600-backend.patch
rename to sys-devel/llvm/files/R600-Mesa-9.1.patch
index 4ebe499..9b9e1f5 100644
--- a/sys-devel/llvm/files/0001-Add-R600-backend.patch
+++ b/sys-devel/llvm/files/R600-Mesa-9.1.patch
@@ -1,517 +1,46 @@
-From 07d146158af424e4c0aa85a3de49516d97affbb9 Mon Sep 17 00:00:00 2001
-From: Tom Stellard <thomas.stellard@×××.com>
-Date: Tue, 11 Dec 2012 21:25:42 +0000
-Subject: [PATCH] Add R600 backend
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-A new backend supporting AMD GPUs: Radeon HD2XXX - HD7XXX
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169915 91177308-0d34-0410-b5e6-96231b3b80d8
-
-Conflicts:
- lib/Target/LLVMBuild.txt
-
-[CMake] Fixup R600.
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169962 91177308-0d34-0410-b5e6-96231b3b80d8
-
-Avoid setIsInsideBundle in Target/R600.
-
-This function is going to be removed.
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170064 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: remove nonsense setPrefLoopAlignment
-
-The Align parameter is a power of two, so 16 results in 64K
-alignment. Additional to that even 16 byte alignment doesn't
-make any sense, so just remove it.
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@×××.com>
-Tested-by: Michel Dänzer <michel.daenzer@×××.com>
-Signed-off-by: Christian König <deathsimple@××××××××.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170341 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: BB operand support for SI
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@×××.com>
-Tested-by: Michel Dänzer <michel.daenzer@×××.com>
-Signed-off-by: Christian König <deathsimple@××××××××.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170342 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: enable S_*N2_* instructions
-
-They seem to work fine.
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@×××.com>
-Tested-by: Michel Dänzer <michel.daenzer@×××.com>
-Signed-off-by: Christian König <deathsimple@××××××××.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170343 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: New control flow for SI v2
-
-This patch replaces the control flow handling with a new
-pass which structurize the graph before transforming it to
-machine instruction. This has a couple of different advantages
-and currently fixes 20 piglit tests without a single regression.
-
-It is now a general purpose transformation that could be not
-only be used for SI/R6xx, but also for other hardware
-implementations that use a form of structurized control flow.
-
-v2: further cleanup, fixes and documentation
-
-Patch by: Christian König
-
-Signed-off-by: Christian König <deathsimple@××××××××.de>
-Reviewed-by: Tom Stellard <thomas.stellard@×××.com>
-Tested-by: Michel Dänzer <michel.daenzer@×××.com>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170591 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: control flow optimization
-
-Branch if we have enough instructions so that it makes sense.
-Also remove branches if they don't make sense.
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@×××.com>
-Tested-by: Michel Dänzer <michel.daenzer@×××.com>
-Signed-off-by: Christian König <deathsimple@××××××××.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170592 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Remove unecessary VREG alignment.
-
-Unlike SGPRs VGPRs doesn't need to be aligned.
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@×××.com>
-Tested-by: Michel Dänzer <michel.daenzer@×××.com>
-Signed-off-by: Christian König <deathsimple@××××××××.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170593 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Add entry in CODE_OWNERS.TXT
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170594 91177308-0d34-0410-b5e6-96231b3b80d8
-
-Conflicts:
- CODE_OWNERS.TXT
-
-Target/R600: Update MIB according to r170588.
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170620 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Expand vec4 INT <-> FP conversions
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170901 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Add SHADOWCUBE to TEX_SHADOW pattern
-
-Patch by: Vadim Girlin
-
-Reviewed-by: Michel Dänzer <michel.daenzer@×××.com>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170921 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Fix MAX_UINT definition
-
-Patch by: Vadim Girlin
-
-Reviewed-by: Michel Dänzer <michel.daenzer@×××.com>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170922 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Coding style - remove empty spaces from the beginning of functions
-
-No functionality change.
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170923 91177308-0d34-0410-b5e6-96231b3b80d8
----
- CODE_OWNERS.TXT | 14 +
- include/llvm/Intrinsics.td | 1 +
- include/llvm/IntrinsicsR600.td | 36 +
- lib/Target/LLVMBuild.txt | 2 +-
- lib/Target/R600/AMDGPU.h | 49 +
- lib/Target/R600/AMDGPU.td | 40 +
- lib/Target/R600/AMDGPUAsmPrinter.cpp | 138 +
- lib/Target/R600/AMDGPUAsmPrinter.h | 44 +
- lib/Target/R600/AMDGPUCodeEmitter.h | 49 +
- lib/Target/R600/AMDGPUConvertToISA.cpp | 62 +
- lib/Target/R600/AMDGPUISelLowering.cpp | 417 +++
- lib/Target/R600/AMDGPUISelLowering.h | 144 +
- lib/Target/R600/AMDGPUInstrInfo.cpp | 257 ++
- lib/Target/R600/AMDGPUInstrInfo.h | 149 +
- lib/Target/R600/AMDGPUInstrInfo.td | 74 +
- lib/Target/R600/AMDGPUInstructions.td | 190 ++
- lib/Target/R600/AMDGPUIntrinsics.td | 62 +
- lib/Target/R600/AMDGPUMCInstLower.cpp | 83 +
- lib/Target/R600/AMDGPUMCInstLower.h | 34 +
- lib/Target/R600/AMDGPURegisterInfo.cpp | 51 +
- lib/Target/R600/AMDGPURegisterInfo.h | 63 +
- lib/Target/R600/AMDGPURegisterInfo.td | 22 +
- lib/Target/R600/AMDGPUStructurizeCFG.cpp | 714 +++++
- lib/Target/R600/AMDGPUSubtarget.cpp | 87 +
- lib/Target/R600/AMDGPUSubtarget.h | 65 +
- lib/Target/R600/AMDGPUTargetMachine.cpp | 142 +
- lib/Target/R600/AMDGPUTargetMachine.h | 70 +
- lib/Target/R600/AMDIL.h | 106 +
- lib/Target/R600/AMDIL7XXDevice.cpp | 115 +
- lib/Target/R600/AMDIL7XXDevice.h | 72 +
- lib/Target/R600/AMDILBase.td | 85 +
- lib/Target/R600/AMDILCFGStructurizer.cpp | 3049 ++++++++++++++++++++
- lib/Target/R600/AMDILDevice.cpp | 124 +
- lib/Target/R600/AMDILDevice.h | 117 +
- lib/Target/R600/AMDILDeviceInfo.cpp | 94 +
- lib/Target/R600/AMDILDeviceInfo.h | 88 +
- lib/Target/R600/AMDILDevices.h | 19 +
- lib/Target/R600/AMDILEvergreenDevice.cpp | 169 ++
- lib/Target/R600/AMDILEvergreenDevice.h | 93 +
- lib/Target/R600/AMDILFrameLowering.cpp | 47 +
- lib/Target/R600/AMDILFrameLowering.h | 40 +
- lib/Target/R600/AMDILISelDAGToDAG.cpp | 485 ++++
- lib/Target/R600/AMDILISelLowering.cpp | 651 +++++
- lib/Target/R600/AMDILInstrInfo.td | 208 ++
- lib/Target/R600/AMDILIntrinsicInfo.cpp | 79 +
- lib/Target/R600/AMDILIntrinsicInfo.h | 49 +
- lib/Target/R600/AMDILIntrinsics.td | 242 ++
- lib/Target/R600/AMDILNIDevice.cpp | 65 +
- lib/Target/R600/AMDILNIDevice.h | 57 +
- lib/Target/R600/AMDILPeepholeOptimizer.cpp | 1215 ++++++++
- lib/Target/R600/AMDILRegisterInfo.td | 107 +
- lib/Target/R600/AMDILSIDevice.cpp | 45 +
- lib/Target/R600/AMDILSIDevice.h | 39 +
- lib/Target/R600/CMakeLists.txt | 55 +
- lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp | 132 +
- lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h | 52 +
- lib/Target/R600/InstPrinter/CMakeLists.txt | 7 +
- lib/Target/R600/InstPrinter/LLVMBuild.txt | 24 +
- lib/Target/R600/InstPrinter/Makefile | 15 +
- lib/Target/R600/LLVMBuild.txt | 32 +
- lib/Target/R600/MCTargetDesc/AMDGPUAsmBackend.cpp | 90 +
- lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.cpp | 85 +
- lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.h | 30 +
- lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h | 60 +
- .../R600/MCTargetDesc/AMDGPUMCTargetDesc.cpp | 113 +
- lib/Target/R600/MCTargetDesc/AMDGPUMCTargetDesc.h | 55 +
- lib/Target/R600/MCTargetDesc/CMakeLists.txt | 10 +
- lib/Target/R600/MCTargetDesc/LLVMBuild.txt | 23 +
- lib/Target/R600/MCTargetDesc/Makefile | 16 +
- lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp | 575 ++++
- lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp | 298 ++
- lib/Target/R600/Makefile | 23 +
- lib/Target/R600/Processors.td | 29 +
- lib/Target/R600/R600Defines.h | 79 +
- lib/Target/R600/R600ExpandSpecialInstrs.cpp | 334 +++
- lib/Target/R600/R600ISelLowering.cpp | 909 ++++++
- lib/Target/R600/R600ISelLowering.h | 72 +
- lib/Target/R600/R600InstrInfo.cpp | 665 +++++
- lib/Target/R600/R600InstrInfo.h | 169 ++
- lib/Target/R600/R600Instructions.td | 1724 +++++++++++
- lib/Target/R600/R600Intrinsics.td | 32 +
- lib/Target/R600/R600MachineFunctionInfo.cpp | 34 +
- lib/Target/R600/R600MachineFunctionInfo.h | 39 +
- lib/Target/R600/R600RegisterInfo.cpp | 89 +
- lib/Target/R600/R600RegisterInfo.h | 55 +
- lib/Target/R600/R600RegisterInfo.td | 107 +
- lib/Target/R600/R600Schedule.td | 36 +
- lib/Target/R600/SIAnnotateControlFlow.cpp | 330 +++
- lib/Target/R600/SIAssignInterpRegs.cpp | 152 +
- lib/Target/R600/SIISelLowering.cpp | 512 ++++
- lib/Target/R600/SIISelLowering.h | 62 +
- lib/Target/R600/SIInstrFormats.td | 146 +
- lib/Target/R600/SIInstrInfo.cpp | 90 +
- lib/Target/R600/SIInstrInfo.h | 62 +
- lib/Target/R600/SIInstrInfo.td | 589 ++++
- lib/Target/R600/SIInstructions.td | 1351 +++++++++
- lib/Target/R600/SIIntrinsics.td | 52 +
- lib/Target/R600/SILowerControlFlow.cpp | 331 +++
- lib/Target/R600/SILowerLiteralConstants.cpp | 108 +
- lib/Target/R600/SIMachineFunctionInfo.cpp | 20 +
- lib/Target/R600/SIMachineFunctionInfo.h | 34 +
- lib/Target/R600/SIRegisterInfo.cpp | 48 +
- lib/Target/R600/SIRegisterInfo.h | 47 +
- lib/Target/R600/SIRegisterInfo.td | 167 ++
- lib/Target/R600/SISchedule.td | 15 +
- lib/Target/R600/TargetInfo/AMDGPUTargetInfo.cpp | 26 +
- lib/Target/R600/TargetInfo/CMakeLists.txt | 7 +
- lib/Target/R600/TargetInfo/LLVMBuild.txt | 23 +
- lib/Target/R600/TargetInfo/Makefile | 15 +
- test/CodeGen/R600/add.v4i32.ll | 15 +
- test/CodeGen/R600/and.v4i32.ll | 15 +
- test/CodeGen/R600/fabs.ll | 16 +
- test/CodeGen/R600/fadd.ll | 16 +
- test/CodeGen/R600/fadd.v4f32.ll | 15 +
- test/CodeGen/R600/fcmp-cnd.ll | 14 +
- test/CodeGen/R600/fcmp-cnde-int-args.ll | 16 +
- test/CodeGen/R600/fcmp.ll | 16 +
- test/CodeGen/R600/fdiv.v4f32.ll | 19 +
- test/CodeGen/R600/floor.ll | 16 +
- test/CodeGen/R600/fmax.ll | 16 +
- test/CodeGen/R600/fmin.ll | 16 +
- test/CodeGen/R600/fmul.ll | 16 +
- test/CodeGen/R600/fmul.v4f32.ll | 15 +
- test/CodeGen/R600/fsub.ll | 17 +
- test/CodeGen/R600/fsub.v4f32.ll | 15 +
- test/CodeGen/R600/i8_to_double_to_float.ll | 11 +
- test/CodeGen/R600/icmp-select-sete-reverse-args.ll | 18 +
- test/CodeGen/R600/lit.local.cfg | 13 +
- test/CodeGen/R600/literals.ll | 30 +
- test/CodeGen/R600/llvm.AMDGPU.mul.ll | 17 +
- test/CodeGen/R600/llvm.AMDGPU.trunc.ll | 16 +
- test/CodeGen/R600/llvm.cos.ll | 16 +
- test/CodeGen/R600/llvm.pow.ll | 19 +
- test/CodeGen/R600/llvm.sin.ll | 16 +
- test/CodeGen/R600/load.constant_addrspace.f32.ll | 9 +
- test/CodeGen/R600/load.i8.ll | 10 +
- test/CodeGen/R600/reciprocal.ll | 16 +
- test/CodeGen/R600/sdiv.ll | 21 +
- test/CodeGen/R600/selectcc-icmp-select-float.ll | 15 +
- test/CodeGen/R600/selectcc_cnde.ll | 11 +
- test/CodeGen/R600/selectcc_cnde_int.ll | 11 +
- test/CodeGen/R600/setcc.v4i32.ll | 12 +
- test/CodeGen/R600/short-args.ll | 37 +
- test/CodeGen/R600/store.v4f32.ll | 9 +
- test/CodeGen/R600/store.v4i32.ll | 9 +
- test/CodeGen/R600/udiv.v4i32.ll | 15 +
- test/CodeGen/R600/urem.v4i32.ll | 15 +
- test/CodeGen/R600/vec4-expand.ll | 52 +
- test/CodeGen/SI/sanity.ll | 37 +
- 149 files changed, 21461 insertions(+), 1 deletion(-)
- create mode 100644 include/llvm/IntrinsicsR600.td
- create mode 100644 lib/Target/R600/AMDGPU.h
- create mode 100644 lib/Target/R600/AMDGPU.td
- create mode 100644 lib/Target/R600/AMDGPUAsmPrinter.cpp
- create mode 100644 lib/Target/R600/AMDGPUAsmPrinter.h
- create mode 100644 lib/Target/R600/AMDGPUCodeEmitter.h
- create mode 100644 lib/Target/R600/AMDGPUConvertToISA.cpp
- create mode 100644 lib/Target/R600/AMDGPUISelLowering.cpp
- create mode 100644 lib/Target/R600/AMDGPUISelLowering.h
- create mode 100644 lib/Target/R600/AMDGPUInstrInfo.cpp
- create mode 100644 lib/Target/R600/AMDGPUInstrInfo.h
- create mode 100644 lib/Target/R600/AMDGPUInstrInfo.td
- create mode 100644 lib/Target/R600/AMDGPUInstructions.td
- create mode 100644 lib/Target/R600/AMDGPUIntrinsics.td
- create mode 100644 lib/Target/R600/AMDGPUMCInstLower.cpp
- create mode 100644 lib/Target/R600/AMDGPUMCInstLower.h
- create mode 100644 lib/Target/R600/AMDGPURegisterInfo.cpp
- create mode 100644 lib/Target/R600/AMDGPURegisterInfo.h
- create mode 100644 lib/Target/R600/AMDGPURegisterInfo.td
- create mode 100644 lib/Target/R600/AMDGPUStructurizeCFG.cpp
- create mode 100644 lib/Target/R600/AMDGPUSubtarget.cpp
- create mode 100644 lib/Target/R600/AMDGPUSubtarget.h
- create mode 100644 lib/Target/R600/AMDGPUTargetMachine.cpp
- create mode 100644 lib/Target/R600/AMDGPUTargetMachine.h
- create mode 100644 lib/Target/R600/AMDIL.h
- create mode 100644 lib/Target/R600/AMDIL7XXDevice.cpp
- create mode 100644 lib/Target/R600/AMDIL7XXDevice.h
- create mode 100644 lib/Target/R600/AMDILBase.td
- create mode 100644 lib/Target/R600/AMDILCFGStructurizer.cpp
- create mode 100644 lib/Target/R600/AMDILDevice.cpp
- create mode 100644 lib/Target/R600/AMDILDevice.h
- create mode 100644 lib/Target/R600/AMDILDeviceInfo.cpp
- create mode 100644 lib/Target/R600/AMDILDeviceInfo.h
- create mode 100644 lib/Target/R600/AMDILDevices.h
- create mode 100644 lib/Target/R600/AMDILEvergreenDevice.cpp
- create mode 100644 lib/Target/R600/AMDILEvergreenDevice.h
- create mode 100644 lib/Target/R600/AMDILFrameLowering.cpp
- create mode 100644 lib/Target/R600/AMDILFrameLowering.h
- create mode 100644 lib/Target/R600/AMDILISelDAGToDAG.cpp
- create mode 100644 lib/Target/R600/AMDILISelLowering.cpp
- create mode 100644 lib/Target/R600/AMDILInstrInfo.td
- create mode 100644 lib/Target/R600/AMDILIntrinsicInfo.cpp
- create mode 100644 lib/Target/R600/AMDILIntrinsicInfo.h
- create mode 100644 lib/Target/R600/AMDILIntrinsics.td
- create mode 100644 lib/Target/R600/AMDILNIDevice.cpp
- create mode 100644 lib/Target/R600/AMDILNIDevice.h
- create mode 100644 lib/Target/R600/AMDILPeepholeOptimizer.cpp
- create mode 100644 lib/Target/R600/AMDILRegisterInfo.td
- create mode 100644 lib/Target/R600/AMDILSIDevice.cpp
- create mode 100644 lib/Target/R600/AMDILSIDevice.h
- create mode 100644 lib/Target/R600/CMakeLists.txt
- create mode 100644 lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp
- create mode 100644 lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h
- create mode 100644 lib/Target/R600/InstPrinter/CMakeLists.txt
- create mode 100644 lib/Target/R600/InstPrinter/LLVMBuild.txt
- create mode 100644 lib/Target/R600/InstPrinter/Makefile
- create mode 100644 lib/Target/R600/LLVMBuild.txt
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUAsmBackend.cpp
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.cpp
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.h
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCTargetDesc.cpp
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCTargetDesc.h
- create mode 100644 lib/Target/R600/MCTargetDesc/CMakeLists.txt
- create mode 100644 lib/Target/R600/MCTargetDesc/LLVMBuild.txt
- create mode 100644 lib/Target/R600/MCTargetDesc/Makefile
- create mode 100644 lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
- create mode 100644 lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp
- create mode 100644 lib/Target/R600/Makefile
- create mode 100644 lib/Target/R600/Processors.td
- create mode 100644 lib/Target/R600/R600Defines.h
- create mode 100644 lib/Target/R600/R600ExpandSpecialInstrs.cpp
- create mode 100644 lib/Target/R600/R600ISelLowering.cpp
- create mode 100644 lib/Target/R600/R600ISelLowering.h
- create mode 100644 lib/Target/R600/R600InstrInfo.cpp
- create mode 100644 lib/Target/R600/R600InstrInfo.h
- create mode 100644 lib/Target/R600/R600Instructions.td
- create mode 100644 lib/Target/R600/R600Intrinsics.td
- create mode 100644 lib/Target/R600/R600MachineFunctionInfo.cpp
- create mode 100644 lib/Target/R600/R600MachineFunctionInfo.h
- create mode 100644 lib/Target/R600/R600RegisterInfo.cpp
- create mode 100644 lib/Target/R600/R600RegisterInfo.h
- create mode 100644 lib/Target/R600/R600RegisterInfo.td
- create mode 100644 lib/Target/R600/R600Schedule.td
- create mode 100644 lib/Target/R600/SIAnnotateControlFlow.cpp
- create mode 100644 lib/Target/R600/SIAssignInterpRegs.cpp
- create mode 100644 lib/Target/R600/SIISelLowering.cpp
- create mode 100644 lib/Target/R600/SIISelLowering.h
- create mode 100644 lib/Target/R600/SIInstrFormats.td
- create mode 100644 lib/Target/R600/SIInstrInfo.cpp
- create mode 100644 lib/Target/R600/SIInstrInfo.h
- create mode 100644 lib/Target/R600/SIInstrInfo.td
- create mode 100644 lib/Target/R600/SIInstructions.td
- create mode 100644 lib/Target/R600/SIIntrinsics.td
- create mode 100644 lib/Target/R600/SILowerControlFlow.cpp
- create mode 100644 lib/Target/R600/SILowerLiteralConstants.cpp
- create mode 100644 lib/Target/R600/SIMachineFunctionInfo.cpp
- create mode 100644 lib/Target/R600/SIMachineFunctionInfo.h
- create mode 100644 lib/Target/R600/SIRegisterInfo.cpp
- create mode 100644 lib/Target/R600/SIRegisterInfo.h
- create mode 100644 lib/Target/R600/SIRegisterInfo.td
- create mode 100644 lib/Target/R600/SISchedule.td
- create mode 100644 lib/Target/R600/TargetInfo/AMDGPUTargetInfo.cpp
- create mode 100644 lib/Target/R600/TargetInfo/CMakeLists.txt
- create mode 100644 lib/Target/R600/TargetInfo/LLVMBuild.txt
- create mode 100644 lib/Target/R600/TargetInfo/Makefile
- create mode 100644 test/CodeGen/R600/add.v4i32.ll
- create mode 100644 test/CodeGen/R600/and.v4i32.ll
- create mode 100644 test/CodeGen/R600/fabs.ll
- create mode 100644 test/CodeGen/R600/fadd.ll
- create mode 100644 test/CodeGen/R600/fadd.v4f32.ll
- create mode 100644 test/CodeGen/R600/fcmp-cnd.ll
- create mode 100644 test/CodeGen/R600/fcmp-cnde-int-args.ll
- create mode 100644 test/CodeGen/R600/fcmp.ll
- create mode 100644 test/CodeGen/R600/fdiv.v4f32.ll
- create mode 100644 test/CodeGen/R600/floor.ll
- create mode 100644 test/CodeGen/R600/fmax.ll
- create mode 100644 test/CodeGen/R600/fmin.ll
- create mode 100644 test/CodeGen/R600/fmul.ll
- create mode 100644 test/CodeGen/R600/fmul.v4f32.ll
- create mode 100644 test/CodeGen/R600/fsub.ll
- create mode 100644 test/CodeGen/R600/fsub.v4f32.ll
- create mode 100644 test/CodeGen/R600/i8_to_double_to_float.ll
- create mode 100644 test/CodeGen/R600/icmp-select-sete-reverse-args.ll
- create mode 100644 test/CodeGen/R600/lit.local.cfg
- create mode 100644 test/CodeGen/R600/literals.ll
- create mode 100644 test/CodeGen/R600/llvm.AMDGPU.mul.ll
- create mode 100644 test/CodeGen/R600/llvm.AMDGPU.trunc.ll
- create mode 100644 test/CodeGen/R600/llvm.cos.ll
- create mode 100644 test/CodeGen/R600/llvm.pow.ll
- create mode 100644 test/CodeGen/R600/llvm.sin.ll
- create mode 100644 test/CodeGen/R600/load.constant_addrspace.f32.ll
- create mode 100644 test/CodeGen/R600/load.i8.ll
- create mode 100644 test/CodeGen/R600/reciprocal.ll
- create mode 100644 test/CodeGen/R600/sdiv.ll
- create mode 100644 test/CodeGen/R600/selectcc-icmp-select-float.ll
- create mode 100644 test/CodeGen/R600/selectcc_cnde.ll
- create mode 100644 test/CodeGen/R600/selectcc_cnde_int.ll
- create mode 100644 test/CodeGen/R600/setcc.v4i32.ll
- create mode 100644 test/CodeGen/R600/short-args.ll
- create mode 100644 test/CodeGen/R600/store.v4f32.ll
- create mode 100644 test/CodeGen/R600/store.v4i32.ll
- create mode 100644 test/CodeGen/R600/udiv.v4i32.ll
- create mode 100644 test/CodeGen/R600/urem.v4i32.ll
- create mode 100644 test/CodeGen/R600/vec4-expand.ll
- create mode 100644 test/CodeGen/SI/sanity.ll
-
-diff --git a/CODE_OWNERS.TXT b/CODE_OWNERS.TXT
-index fd7bcda..90285be 100644
---- a/CODE_OWNERS.TXT
-+++ b/CODE_OWNERS.TXT
-@@ -49,3 +49,17 @@ D: Register allocators and TableGen
- N: Duncan Sands
- E: baldrick@××××.fr
- D: DragonEgg
-+
-+N: Tom Stellard
-+E: thomas.stellard@×××.com
-+E: mesa-dev@×××××××××××××××××.org
-+D: R600 Backend
-+
-+N: Andrew Trick
-+E: atrick@×××××.com
-+D: IndVar Simplify, Loop Strength Reduction, Instruction Scheduling
-+
-+N: Bill Wendling
-+E: wendling@×××××.com
-+D: libLTO & IR Linker
-+
-diff --git a/include/llvm/Intrinsics.td b/include/llvm/Intrinsics.td
-index 2e1597f..059bd80 100644
---- a/include/llvm/Intrinsics.td
-+++ b/include/llvm/Intrinsics.td
-@@ -469,3 +469,4 @@ include "llvm/IntrinsicsXCore.td"
- include "llvm/IntrinsicsHexagon.td"
- include "llvm/IntrinsicsNVVM.td"
- include "llvm/IntrinsicsMips.td"
-+include "llvm/IntrinsicsR600.td"
-diff --git a/include/llvm/IntrinsicsR600.td b/include/llvm/IntrinsicsR600.td
-new file mode 100644
-index 0000000..ecb5668
---- /dev/null
-+++ b/include/llvm/IntrinsicsR600.td
-@@ -0,0 +1,36 @@
-+//===- IntrinsicsR600.td - Defines R600 intrinsics ---------*- tablegen -*-===//
-+//
-+// The LLVM Compiler Infrastructure
-+//
-+// This file is distributed under the University of Illinois Open Source
-+// License. See LICENSE.TXT for details.
-+//
-+//===----------------------------------------------------------------------===//
-+//
-+// This file defines all of the R600-specific intrinsics.
-+//
-+//===----------------------------------------------------------------------===//
-+
-+let TargetPrefix = "r600" in {
-+
-+class R600ReadPreloadRegisterIntrinsic<string name>
-+ : Intrinsic<[llvm_i32_ty], [], [IntrNoMem]>,
-+ GCCBuiltin<name>;
-+
-+multiclass R600ReadPreloadRegisterIntrinsic_xyz<string prefix> {
-+ def _x : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_x")>;
-+ def _y : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_y")>;
-+ def _z : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_z")>;
-+}
-+
-+defm int_r600_read_global_size : R600ReadPreloadRegisterIntrinsic_xyz <
-+ "__builtin_r600_read_global_size">;
-+defm int_r600_read_local_size : R600ReadPreloadRegisterIntrinsic_xyz <
-+ "__builtin_r600_read_local_size">;
-+defm int_r600_read_ngroups : R600ReadPreloadRegisterIntrinsic_xyz <
-+ "__builtin_r600_read_ngroups">;
-+defm int_r600_read_tgid : R600ReadPreloadRegisterIntrinsic_xyz <
-+ "__builtin_r600_read_tgid">;
-+defm int_r600_read_tidig : R600ReadPreloadRegisterIntrinsic_xyz <
-+ "__builtin_r600_read_tidig">;
-+} // End TargetPrefix = "r600"
+diff --git a/autoconf/configure.ac b/autoconf/configure.ac
+index 7715531..1330c36 100644
+--- a/autoconf/configure.ac
++++ b/autoconf/configure.ac
+@@ -751,6 +751,11 @@ AC_ARG_ENABLE([experimental-targets],AS_HELP_STRING([--enable-experimental-targe
+
+ if test ${enableval} != "disable"
+ then
++ if test ${enableval} = "AMDGPU"
++ then
++ AC_MSG_ERROR([The AMDGPU target has been renamed to R600, please reconfigure with --enable-experimental-targets=R600])
++ enableval="R600"
++ fi
+ TARGETS_TO_BUILD="$enableval $TARGETS_TO_BUILD"
+ fi
+
+diff --git a/configure b/configure
+index 4fa0705..02012b9 100755
+--- a/configure
++++ b/configure
+@@ -5473,6 +5473,13 @@ fi
+
+ if test ${enableval} != "disable"
+ then
++ if test ${enableval} = "AMDGPU"
++ then
++ { { echo "$as_me:$LINENO: error: The AMDGPU target has been renamed to R600, please reconfigure with --enable-experimental-targets=R600" >&5
++echo "$as_me: error: The AMDGPU target has been renamed to R600, please reconfigure with --enable-experimental-targets=R600" >&2;}
++ { (exit 1); exit 1; }; }
++ enableval="R600"
++ fi
+ TARGETS_TO_BUILD="$enableval $TARGETS_TO_BUILD"
+ fi
+
+@@ -10316,7 +10323,7 @@ else
+ lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
+ lt_status=$lt_dlunknown
+ cat > conftest.$ac_ext <<EOF
+-#line 10317 "configure"
++#line 10326 "configure"
+ #include "confdefs.h"
+
+ #if HAVE_DLFCN_H
diff --git a/lib/Target/LLVMBuild.txt b/lib/Target/LLVMBuild.txt
index 8995080..84c4111 100644
--- a/lib/Target/LLVMBuild.txt
@@ -527,10 +56,10 @@ index 8995080..84c4111 100644
; with the best execution engine (the native JIT, if available, or the
diff --git a/lib/Target/R600/AMDGPU.h b/lib/Target/R600/AMDGPU.h
new file mode 100644
-index 0000000..0f5125d
+index 0000000..ba87918
--- /dev/null
+++ b/lib/Target/R600/AMDGPU.h
-@@ -0,0 +1,49 @@
+@@ -0,0 +1,51 @@
+//===-- AMDGPU.h - MachineFunction passes hw codegen --------------*- C++ -*-=//
+//
+// The LLVM Compiler Infrastructure
@@ -556,17 +85,19 @@ index 0000000..0f5125d
+// R600 Passes
+FunctionPass* createR600KernelParametersPass(const DataLayout *TD);
+FunctionPass *createR600ExpandSpecialInstrsPass(TargetMachine &tm);
++FunctionPass *createR600LowerConstCopy(TargetMachine &tm);
+
+// SI Passes
+FunctionPass *createSIAnnotateControlFlowPass();
+FunctionPass *createSIAssignInterpRegsPass(TargetMachine &tm);
+FunctionPass *createSILowerControlFlowPass(TargetMachine &tm);
+FunctionPass *createSICodeEmitterPass(formatted_raw_ostream &OS);
-+FunctionPass *createSILowerLiteralConstantsPass(TargetMachine &tm);
++FunctionPass *createSIInsertWaits(TargetMachine &tm);
+
+// Passes common to R600 and SI
+Pass *createAMDGPUStructurizeCFGPass();
+FunctionPass *createAMDGPUConvertToISAPass(TargetMachine &tm);
++FunctionPass* createAMDGPUIndirectAddressingPass(TargetMachine &tm);
+
+} // End namespace llvm
+
@@ -628,10 +159,10 @@ index 0000000..40f4741
+include "AMDGPUInstructions.td"
diff --git a/lib/Target/R600/AMDGPUAsmPrinter.cpp b/lib/Target/R600/AMDGPUAsmPrinter.cpp
new file mode 100644
-index 0000000..4553c45
+index 0000000..254e62e
--- /dev/null
+++ b/lib/Target/R600/AMDGPUAsmPrinter.cpp
-@@ -0,0 +1,138 @@
+@@ -0,0 +1,145 @@
+//===-- AMDGPUAsmPrinter.cpp - AMDGPU Assebly printer --------------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -681,6 +212,9 @@ index 0000000..4553c45
+#endif
+ }
+ SetupMachineFunction(MF);
++ if (OutStreamer.hasRawTextSupport()) {
++ OutStreamer.EmitRawText("@" + MF.getName() + ":");
++ }
+ OutStreamer.SwitchSection(getObjFileLowering().getTextSection());
+ if (STM.device()->getGeneration() > AMDGPUDeviceInfo::HD6XXX) {
+ EmitProgramInfo(MF);
@@ -722,8 +256,6 @@ index 0000000..4553c45
+ switch (reg) {
+ default: break;
+ case AMDGPU::EXEC:
-+ case AMDGPU::SI_LITERAL_CONSTANT:
-+ case AMDGPU::SREG_LIT_0:
+ case AMDGPU::M0:
+ continue;
+ }
@@ -749,10 +281,16 @@ index 0000000..4553c45
+ } else if (AMDGPU::SReg_256RegClass.contains(reg)) {
+ isSGPR = true;
+ width = 8;
++ } else if (AMDGPU::VReg_256RegClass.contains(reg)) {
++ isSGPR = false;
++ width = 8;
++ } else if (AMDGPU::VReg_512RegClass.contains(reg)) {
++ isSGPR = false;
++ width = 16;
+ } else {
+ assert(!"Unknown register class");
+ }
-+ hwReg = RI->getEncodingValue(reg);
++ hwReg = RI->getEncodingValue(reg) & 0xff;
+ maxUsed = hwReg + width - 1;
+ if (isSGPR) {
+ MaxSGPR = maxUsed > MaxSGPR ? maxUsed : MaxSGPR;
@@ -820,61 +358,6 @@ index 0000000..3812282
+} // End anonymous llvm
+
+#endif //AMDGPU_ASMPRINTER_H
675 |
-diff --git a/lib/Target/R600/AMDGPUCodeEmitter.h b/lib/Target/R600/AMDGPUCodeEmitter.h |
676 |
-new file mode 100644 |
677 |
-index 0000000..84f3588 |
678 |
---- /dev/null |
679 |
-+++ b/lib/Target/R600/AMDGPUCodeEmitter.h |
680 |
-@@ -0,0 +1,49 @@ |
681 |
-+//===-- AMDGPUCodeEmitter.h - AMDGPU Code Emitter interface -----------------===// |
682 |
-+// |
683 |
-+// The LLVM Compiler Infrastructure |
684 |
-+// |
685 |
-+// This file is distributed under the University of Illinois Open Source |
686 |
-+// License. See LICENSE.TXT for details. |
687 |
-+// |
688 |
-+//===----------------------------------------------------------------------===// |
689 |
-+// |
690 |
-+/// \file |
691 |
-+/// \brief CodeEmitter interface for R600 and SI codegen. |
692 |
-+// |
693 |
-+//===----------------------------------------------------------------------===// |
694 |
-+ |
695 |
-+#ifndef AMDGPUCODEEMITTER_H |
696 |
-+#define AMDGPUCODEEMITTER_H |
697 |
-+ |
698 |
-+namespace llvm { |
699 |
-+ |
700 |
-+class AMDGPUCodeEmitter { |
701 |
-+public: |
702 |
-+ uint64_t getBinaryCodeForInstr(const MachineInstr &MI) const; |
703 |
-+ virtual uint64_t getMachineOpValue(const MachineInstr &MI, |
704 |
-+ const MachineOperand &MO) const { return 0; } |
705 |
-+ virtual unsigned GPR4AlignEncode(const MachineInstr &MI, |
706 |
-+ unsigned OpNo) const { |
707 |
-+ return 0; |
708 |
-+ } |
709 |
-+ virtual unsigned GPR2AlignEncode(const MachineInstr &MI, |
710 |
-+ unsigned OpNo) const { |
711 |
-+ return 0; |
712 |
-+ } |
713 |
-+ virtual uint64_t VOPPostEncode(const MachineInstr &MI, |
714 |
-+ uint64_t Value) const { |
715 |
-+ return Value; |
716 |
-+ } |
717 |
-+ virtual uint64_t i32LiteralEncode(const MachineInstr &MI, |
718 |
-+ unsigned OpNo) const { |
719 |
-+ return 0; |
720 |
-+ } |
721 |
-+ virtual uint32_t SMRDmemriEncode(const MachineInstr &MI, unsigned OpNo) |
722 |
-+ const { |
723 |
-+ return 0; |
724 |
-+ } |
725 |
-+}; |
726 |
-+ |
727 |
-+} // End namespace llvm |
728 |
-+ |
729 |
-+#endif // AMDGPUCODEEMITTER_H |
730 |
diff --git a/lib/Target/R600/AMDGPUConvertToISA.cpp b/lib/Target/R600/AMDGPUConvertToISA.cpp |
731 |
new file mode 100644 |
732 |
index 0000000..50297d1 |
733 |
@@ -943,12 +426,190 @@ index 0000000..50297d1 |
734 |
+ } |
735 |
+ return false; |
736 |
+} |
737 |
+diff --git a/lib/Target/R600/AMDGPUFrameLowering.cpp b/lib/Target/R600/AMDGPUFrameLowering.cpp |
738 |
+new file mode 100644 |
739 |
+index 0000000..a3b6936 |
740 |
+--- /dev/null |
741 |
++++ b/lib/Target/R600/AMDGPUFrameLowering.cpp |
742 |
+@@ -0,0 +1,122 @@ |
743 |
++//===----------------------- AMDGPUFrameLowering.cpp ----------------------===// |
744 |
++// |
745 |
++// The LLVM Compiler Infrastructure |
746 |
++// |
747 |
++// This file is distributed under the University of Illinois Open Source |
748 |
++// License. See LICENSE.TXT for details. |
749 |
++// |
750 |
++//==-----------------------------------------------------------------------===// |
751 |
++// |
752 |
++// Interface to describe a layout of a stack frame on an AMDIL target machine |
753 |
++// |
754 |
++//===----------------------------------------------------------------------===// |
755 |
++#include "AMDGPUFrameLowering.h" |
756 |
++#include "AMDGPURegisterInfo.h" |
757 |
++#include "R600MachineFunctionInfo.h" |
758 |
++#include "llvm/CodeGen/MachineFrameInfo.h" |
759 |
++#include "llvm/CodeGen/MachineRegisterInfo.h" |
760 |
++#include "llvm/Instructions.h" |
761 |
++ |
762 |
++using namespace llvm; |
763 |
++AMDGPUFrameLowering::AMDGPUFrameLowering(StackDirection D, unsigned StackAl, |
764 |
++ int LAO, unsigned TransAl) |
765 |
++ : TargetFrameLowering(D, StackAl, LAO, TransAl) { } |
766 |
++ |
767 |
++AMDGPUFrameLowering::~AMDGPUFrameLowering() { } |
768 |
++ |
769 |
++unsigned AMDGPUFrameLowering::getStackWidth(const MachineFunction &MF) const { |
770 |
++ |
771 |
++ // XXX: Hardcoding to 1 for now. |
772 |
++ // |
773 |
++ // I think the StackWidth should be stored as metadata associated with the |
774 |
++ // MachineFunction. This metadata can either be added by a frontend, or |
775 |
++ // calculated by a R600 specific LLVM IR pass. |
776 |
++ // |
777 |
++ // The StackWidth determines how stack objects are laid out in memory. |
778 |
++ // For a vector stack variable, like: int4 stack[2], the data will be stored |
779 |
++ // in the following ways depending on the StackWidth. |
780 |
++ // |
781 |
++ // StackWidth = 1: |
782 |
++ // |
783 |
++ // T0.X = stack[0].x |
784 |
++ // T1.X = stack[0].y |
785 |
++ // T2.X = stack[0].z |
786 |
++ // T3.X = stack[0].w |
787 |
++ // T4.X = stack[1].x |
788 |
++ // T5.X = stack[1].y |
789 |
++ // T6.X = stack[1].z |
790 |
++ // T7.X = stack[1].w |
791 |
++ // |
792 |
++ // StackWidth = 2: |
793 |
++ // |
794 |
++ // T0.X = stack[0].x |
795 |
++ // T0.Y = stack[0].y |
796 |
++ // T1.X = stack[0].z |
797 |
++ // T1.Y = stack[0].w |
798 |
++ // T2.X = stack[1].x |
799 |
++ // T2.Y = stack[1].y |
800 |
++ // T3.X = stack[1].z |
801 |
++ // T3.Y = stack[1].w |
802 |
++ // |
803 |
++ // StackWidth = 4: |
804 |
++ // T0.X = stack[0].x |
805 |
++ // T0.Y = stack[0].y |
806 |
++ // T0.Z = stack[0].z |
807 |
++ // T0.W = stack[0].w |
808 |
++ // T1.X = stack[1].x |
809 |
++ // T1.Y = stack[1].y |
810 |
++ // T1.Z = stack[1].z |
811 |
++ // T1.W = stack[1].w |
812 |
++ return 1; |
813 |
++} |
814 |
++ |
815 |
++/// \returns The number of registers allocated for \p FI. |
816 |
++int AMDGPUFrameLowering::getFrameIndexOffset(const MachineFunction &MF, |
817 |
++ int FI) const { |
818 |
++ const MachineFrameInfo *MFI = MF.getFrameInfo(); |
819 |
++ unsigned Offset = 0; |
820 |
++ int UpperBound = FI == -1 ? MFI->getNumObjects() : FI; |
821 |
++ |
822 |
++ for (int i = MFI->getObjectIndexBegin(); i < UpperBound; ++i) { |
823 |
++ const AllocaInst *Alloca = MFI->getObjectAllocation(i); |
824 |
++ unsigned ArrayElements; |
825 |
++ const Type *AllocaType = Alloca->getAllocatedType(); |
826 |
++ const Type *ElementType; |
827 |
++ |
828 |
++ if (AllocaType->isArrayTy()) { |
829 |
++ ArrayElements = AllocaType->getArrayNumElements(); |
830 |
++ ElementType = AllocaType->getArrayElementType(); |
831 |
++ } else { |
832 |
++ ArrayElements = 1; |
833 |
++ ElementType = AllocaType; |
834 |
++ } |
835 |
++ |
836 |
++ unsigned VectorElements; |
837 |
++ if (ElementType->isVectorTy()) { |
838 |
++ VectorElements = ElementType->getVectorNumElements(); |
839 |
++ } else { |
840 |
++ VectorElements = 1; |
841 |
++ } |
842 |
++ |
843 |
++ Offset += (VectorElements / getStackWidth(MF)) * ArrayElements; |
844 |
++ } |
845 |
++ return Offset; |
846 |
++} |
847 |
++ |
848 |
++const TargetFrameLowering::SpillSlot * |
849 |
++AMDGPUFrameLowering::getCalleeSavedSpillSlots(unsigned &NumEntries) const { |
850 |
++ NumEntries = 0; |
851 |
++ return 0; |
852 |
++} |
853 |
++void |
854 |
++AMDGPUFrameLowering::emitPrologue(MachineFunction &MF) const { |
855 |
++} |
856 |
++void |
857 |
++AMDGPUFrameLowering::emitEpilogue(MachineFunction &MF, |
858 |
++ MachineBasicBlock &MBB) const { |
859 |
++} |
860 |
++ |
861 |
++bool |
862 |
++AMDGPUFrameLowering::hasFP(const MachineFunction &MF) const { |
863 |
++ return false; |
864 |
++} |
865 |
+diff --git a/lib/Target/R600/AMDGPUFrameLowering.h b/lib/Target/R600/AMDGPUFrameLowering.h |
866 |
+new file mode 100644 |
867 |
+index 0000000..cf5742e |
868 |
+--- /dev/null |
869 |
++++ b/lib/Target/R600/AMDGPUFrameLowering.h |
870 |
+@@ -0,0 +1,44 @@ |
871 |
++//===--------------------- AMDGPUFrameLowering.h ----------------*- C++ -*-===// |
872 |
++// |
873 |
++// The LLVM Compiler Infrastructure |
874 |
++// |
875 |
++// This file is distributed under the University of Illinois Open Source |
876 |
++// License. See LICENSE.TXT for details. |
877 |
++// |
878 |
++//===----------------------------------------------------------------------===// |
879 |
++// |
880 |
++/// \file |
881 |
++/// \brief Interface to describe a layout of a stack frame on an AMDIL target |
882 |
++/// machine. |
883 |
++// |
884 |
++//===----------------------------------------------------------------------===// |
885 |
++#ifndef AMDILFRAME_LOWERING_H |
886 |
++#define AMDILFRAME_LOWERING_H |
887 |
++ |
888 |
++#include "llvm/CodeGen/MachineFunction.h" |
889 |
++#include "llvm/Target/TargetFrameLowering.h" |
890 |
++ |
891 |
++namespace llvm { |
892 |
++ |
893 |
++/// \brief Information about the stack frame layout on the AMDGPU targets. |
894 |
++/// |
895 |
++/// It holds the direction of the stack growth, the known stack alignment on |
896 |
++/// entry to each function, and the offset to the locals area. |
897 |
++/// See TargetFrameLowering for more comments. |
898 |
++class AMDGPUFrameLowering : public TargetFrameLowering { |
899 |
++public: |
900 |
++ AMDGPUFrameLowering(StackDirection D, unsigned StackAl, int LAO, |
901 |
++ unsigned TransAl = 1); |
902 |
++ virtual ~AMDGPUFrameLowering(); |
903 |
++ |
904 |
++ /// \returns The number of 32-bit sub-registers that are used when storing |
905 |
++ /// values to the stack. |
906 |
++ virtual unsigned getStackWidth(const MachineFunction &MF) const; |
907 |
++ virtual int getFrameIndexOffset(const MachineFunction &MF, int FI) const; |
908 |
++ virtual const SpillSlot *getCalleeSavedSpillSlots(unsigned &NumEntries) const; |
909 |
++ virtual void emitPrologue(MachineFunction &MF) const; |
910 |
++ virtual void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const; |
911 |
++ virtual bool hasFP(const MachineFunction &MF) const; |
912 |
++}; |
913 |
++} // namespace llvm |
914 |
++#endif // AMDILFRAME_LOWERING_H |
915 |
diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp |
916 |
new file mode 100644 |
917 |
-index 0000000..473dac4 |
918 |
+index 0000000..d0d23d6 |
919 |
--- /dev/null |
920 |
+++ b/lib/Target/R600/AMDGPUISelLowering.cpp |
921 |
-@@ -0,0 +1,417 @@ |
922 |
+@@ -0,0 +1,418 @@ |
923 |
+//===-- AMDGPUISelLowering.cpp - AMDGPU Common DAG lowering functions -----===// |
924 |
+// |
925 |
+// The LLVM Compiler Infrastructure |
926 |
@@ -1361,17 +1022,18 @@ index 0000000..473dac4 |
927 |
+ NODE_NAME_CASE(SMIN) |
928 |
+ NODE_NAME_CASE(UMIN) |
929 |
+ NODE_NAME_CASE(URECIP) |
930 |
-+ NODE_NAME_CASE(INTERP) |
931 |
-+ NODE_NAME_CASE(INTERP_P0) |
932 |
+ NODE_NAME_CASE(EXPORT) |
933 |
++ NODE_NAME_CASE(CONST_ADDRESS) |
934 |
++ NODE_NAME_CASE(REGISTER_LOAD) |
935 |
++ NODE_NAME_CASE(REGISTER_STORE) |
936 |
+ } |
937 |
+} |
938 |
diff --git a/lib/Target/R600/AMDGPUISelLowering.h b/lib/Target/R600/AMDGPUISelLowering.h |
939 |
new file mode 100644 |
940 |
-index 0000000..c7abaf6 |
941 |
+index 0000000..99a11ff |
942 |
--- /dev/null |
943 |
+++ b/lib/Target/R600/AMDGPUISelLowering.h |
944 |
-@@ -0,0 +1,144 @@ |
945 |
+@@ -0,0 +1,140 @@ |
946 |
+//===-- AMDGPUISelLowering.h - AMDGPU Lowering Interface --------*- C++ -*-===// |
947 |
+// |
948 |
+// The LLVM Compiler Infrastructure |
949 |
@@ -1427,6 +1089,11 @@ index 0000000..c7abaf6 |
950 |
+ const SmallVectorImpl<ISD::OutputArg> &Outs, |
951 |
+ const SmallVectorImpl<SDValue> &OutVals, |
952 |
+ DebugLoc DL, SelectionDAG &DAG) const; |
953 |
++ virtual SDValue LowerCall(CallLoweringInfo &CLI, |
954 |
++ SmallVectorImpl<SDValue> &InVals) const { |
955 |
++ CLI.Callee.dump(); |
956 |
++ llvm_unreachable("Undefined function"); |
957 |
++ } |
958 |
+ |
959 |
+ virtual SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const; |
960 |
+ SDValue LowerIntrinsicIABS(SDValue Op, SelectionDAG &DAG) const; |
961 |
@@ -1494,35 +1161,26 @@ index 0000000..c7abaf6 |
962 |
+ SMIN, |
963 |
+ UMIN, |
964 |
+ URECIP, |
965 |
-+ INTERP, |
966 |
-+ INTERP_P0, |
967 |
+ EXPORT, |
968 |
++ CONST_ADDRESS, |
969 |
++ REGISTER_LOAD, |
970 |
++ REGISTER_STORE, |
971 |
+ LAST_AMDGPU_ISD_NUMBER |
972 |
+}; |
973 |
+ |
974 |
+ |
975 |
+} // End namespace AMDGPUISD |
976 |
+ |
977 |
-+namespace SIISD { |
978 |
-+ |
979 |
-+enum { |
980 |
-+ SI_FIRST = AMDGPUISD::LAST_AMDGPU_ISD_NUMBER, |
981 |
-+ VCC_AND, |
982 |
-+ VCC_BITCAST |
983 |
-+}; |
984 |
-+ |
985 |
-+} // End namespace SIISD |
986 |
-+ |
987 |
+} // End namespace llvm |
988 |
+ |
989 |
+#endif // AMDGPUISELLOWERING_H |
990 |
-diff --git a/lib/Target/R600/AMDGPUInstrInfo.cpp b/lib/Target/R600/AMDGPUInstrInfo.cpp |
991 |
+diff --git a/lib/Target/R600/AMDGPUIndirectAddressing.cpp b/lib/Target/R600/AMDGPUIndirectAddressing.cpp |
992 |
new file mode 100644 |
993 |
-index 0000000..e42a46d |
994 |
+index 0000000..15840b3 |
995 |
--- /dev/null |
996 |
-+++ b/lib/Target/R600/AMDGPUInstrInfo.cpp |
997 |
-@@ -0,0 +1,257 @@ |
998 |
-+//===-- AMDGPUInstrInfo.cpp - Base class for AMD GPU InstrInfo ------------===// |
999 |
++++ b/lib/Target/R600/AMDGPUIndirectAddressing.cpp |
1000 |
+@@ -0,0 +1,344 @@ |
1001 |
++//===-- AMDGPUIndirectAddressing.cpp - Indirect Addressing Support --------===// |
1002 |
+// |
1003 |
+// The LLVM Compiler Infrastructure |
1004 |
+// |
1005 |
@@ -1532,60 +1190,410 @@ index 0000000..e42a46d |
1006 |
+//===----------------------------------------------------------------------===// |
1007 |
+// |
1008 |
+/// \file |
1009 |
-+/// \brief Implementation of the TargetInstrInfo class that is common to all |
1010 |
-+/// AMD GPUs. |
1011 |
++/// |
1012 |
++/// Instructions can use indirect addressing to index the register file as if it |
1013 |
++/// were memory. This pass lowers RegisterLoad and RegisterStore instructions |
1014 |
++/// to either a COPY or a MOV that uses indirect addressing. |
1015 |
+// |
1016 |
+//===----------------------------------------------------------------------===// |
1017 |
+ |
1018 |
-+#include "AMDGPUInstrInfo.h" |
1019 |
-+#include "AMDGPURegisterInfo.h" |
1020 |
-+#include "AMDGPUTargetMachine.h" |
1021 |
-+#include "AMDIL.h" |
1022 |
-+#include "llvm/CodeGen/MachineFrameInfo.h" |
1023 |
++#include "AMDGPU.h" |
1024 |
++#include "R600InstrInfo.h" |
1025 |
++#include "R600MachineFunctionInfo.h" |
1026 |
++#include "llvm/CodeGen/MachineFunction.h" |
1027 |
++#include "llvm/CodeGen/MachineFunctionPass.h" |
1028 |
+#include "llvm/CodeGen/MachineInstrBuilder.h" |
1029 |
+#include "llvm/CodeGen/MachineRegisterInfo.h" |
1030 |
-+ |
1031 |
-+#define GET_INSTRINFO_CTOR |
1032 |
-+#include "AMDGPUGenInstrInfo.inc" |
1033 |
++#include "llvm/Support/Debug.h" |
1034 |
+ |
1035 |
+using namespace llvm; |
1036 |
+ |
1037 |
-+AMDGPUInstrInfo::AMDGPUInstrInfo(TargetMachine &tm) |
1038 |
-+ : AMDGPUGenInstrInfo(0,0), RI(tm, *this), TM(tm) { } |
1039 |
++namespace { |
1040 |
+ |
1041 |
-+const AMDGPURegisterInfo &AMDGPUInstrInfo::getRegisterInfo() const { |
1042 |
-+ return RI; |
1043 |
-+} |
1044 |
++class AMDGPUIndirectAddressingPass : public MachineFunctionPass { |
1045 |
+ |
1046 |
-+bool AMDGPUInstrInfo::isCoalescableExtInstr(const MachineInstr &MI, |
1047 |
-+ unsigned &SrcReg, unsigned &DstReg, |
1048 |
-+ unsigned &SubIdx) const { |
1049 |
-+// TODO: Implement this function |
1050 |
-+ return false; |
1051 |
-+} |
1052 |
++private: |
1053 |
++ static char ID; |
1054 |
++ const AMDGPUInstrInfo *TII; |
1055 |
+ |
1056 |
-+unsigned AMDGPUInstrInfo::isLoadFromStackSlot(const MachineInstr *MI, |
1057 |
-+ int &FrameIndex) const { |
1058 |
-+// TODO: Implement this function |
1059 |
-+ return 0; |
1060 |
-+} |
1061 |
++ bool regHasExplicitDef(MachineRegisterInfo &MRI, unsigned Reg) const; |
1062 |
+ |
1063 |
-+unsigned AMDGPUInstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI, |
1064 |
-+ int &FrameIndex) const { |
1065 |
-+// TODO: Implement this function |
1066 |
-+ return 0; |
1067 |
-+} |
1068 |
++public: |
1069 |
++ AMDGPUIndirectAddressingPass(TargetMachine &tm) : |
1070 |
++ MachineFunctionPass(ID), |
1071 |
++ TII(static_cast<const AMDGPUInstrInfo*>(tm.getInstrInfo())) |
1072 |
++ { } |
1073 |
+ |
1074 |
-+bool AMDGPUInstrInfo::hasLoadFromStackSlot(const MachineInstr *MI, |
1075 |
-+ const MachineMemOperand *&MMO, |
1076 |
-+ int &FrameIndex) const { |
1077 |
-+// TODO: Implement this function |
1078 |
-+ return false; |
1079 |
++ virtual bool runOnMachineFunction(MachineFunction &MF); |
1080 |
++ |
1081 |
++ const char *getPassName() const { return "R600 Handle indirect addressing"; } |
1082 |
++ |
1083 |
++}; |
1084 |
++ |
1085 |
++} // End anonymous namespace |
1086 |
++ |
1087 |
++char AMDGPUIndirectAddressingPass::ID = 0; |
1088 |
++ |
1089 |
++FunctionPass *llvm::createAMDGPUIndirectAddressingPass(TargetMachine &tm) { |
1090 |
++ return new AMDGPUIndirectAddressingPass(tm); |
1091 |
+} |
1092 |
-+unsigned AMDGPUInstrInfo::isStoreFromStackSlot(const MachineInstr *MI, |
1093 |
-+ int &FrameIndex) const { |
1094 |
-+// TODO: Implement this function |
1095 |
-+ return 0; |
1096 |
++ |
1097 |
++bool AMDGPUIndirectAddressingPass::runOnMachineFunction(MachineFunction &MF) { |
1098 |
++ MachineRegisterInfo &MRI = MF.getRegInfo(); |
1099 |
++ |
1100 |
++ int IndirectBegin = TII->getIndirectIndexBegin(MF); |
1101 |
++ int IndirectEnd = TII->getIndirectIndexEnd(MF); |
1102 |
++ |
1103 |
++ if (IndirectBegin == -1) { |
1104 |
++ // No indirect addressing, we can skip this pass |
1105 |
++ assert(IndirectEnd == -1); |
1106 |
++ return false; |
1107 |
++ } |
1108 |
++ |
1109 |
++ // The map keeps track of the indirect address that is represented by |
1110 |
++ // each virtual register. The key is the register and the value is the |
1111 |
++ // indirect address it uses. |
1112 |
++ std::map<unsigned, unsigned> RegisterAddressMap; |
1113 |
++ |
1114 |
++ // First pass - Lower all of the RegisterStore instructions and track which |
1115 |
++ // registers are live. |
1116 |
++ for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end(); |
1117 |
++ BB != BB_E; ++BB) { |
1118 |
++ // This map keeps track of the current live indirect registers. |
1119 |
++ // The key is the address and the value is the register |
1120 |
++ std::map<unsigned, unsigned> LiveAddressRegisterMap; |
1121 |
++ MachineBasicBlock &MBB = *BB; |
1122 |
++ |
1123 |
++ for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I); |
1124 |
++ I != MBB.end(); I = Next) { |
1125 |
++ Next = llvm::next(I); |
1126 |
++ MachineInstr &MI = *I; |
1127 |
++ |
1128 |
++ if (!TII->isRegisterStore(MI)) { |
1129 |
++ continue; |
1130 |
++ } |
1131 |
++ |
1132 |
++ // Lower RegisterStore |
1133 |
++ |
1134 |
++ unsigned RegIndex = MI.getOperand(2).getImm(); |
1135 |
++ unsigned Channel = MI.getOperand(3).getImm(); |
1136 |
++ unsigned Address = TII->calculateIndirectAddress(RegIndex, Channel); |
1137 |
++ const TargetRegisterClass *IndirectStoreRegClass = |
1138 |
++ TII->getIndirectAddrStoreRegClass(MI.getOperand(0).getReg()); |
1139 |
++ |
1140 |
++ if (MI.getOperand(1).getReg() == AMDGPU::INDIRECT_BASE_ADDR) { |
1141 |
++ // Direct register access. |
1142 |
++ unsigned DstReg = MRI.createVirtualRegister(IndirectStoreRegClass); |
1143 |
++ |
1144 |
++ BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY), DstReg) |
1145 |
++ .addOperand(MI.getOperand(0)); |
1146 |
++ |
1147 |
++ RegisterAddressMap[DstReg] = Address; |
1148 |
++ LiveAddressRegisterMap[Address] = DstReg; |
1149 |
++ } else { |
1150 |
++ // Indirect register access. |
1151 |
++ MachineInstrBuilder MOV = TII->buildIndirectWrite(BB, I, |
1152 |
++ MI.getOperand(0).getReg(), // Value |
1153 |
++ Address, |
1154 |
++ MI.getOperand(1).getReg()); // Offset |
1155 |
++ for (int i = IndirectBegin; i <= IndirectEnd; ++i) { |
1156 |
++ unsigned Addr = TII->calculateIndirectAddress(i, Channel); |
1157 |
++ unsigned DstReg = MRI.createVirtualRegister(IndirectStoreRegClass); |
1158 |
++ MOV.addReg(DstReg, RegState::Define | RegState::Implicit); |
1159 |
++ RegisterAddressMap[DstReg] = Addr; |
1160 |
++ LiveAddressRegisterMap[Addr] = DstReg; |
1161 |
++ } |
1162 |
++ } |
1163 |
++ MI.eraseFromParent(); |
1164 |
++ } |
1165 |
++ |
1166 |
++ // Update the live-ins of the successor blocks |
1167 |
++ for (MachineBasicBlock::succ_iterator Succ = MBB.succ_begin(), |
1168 |
++ SuccEnd = MBB.succ_end(); |
1169 |
++ SuccEnd != Succ; ++Succ) { |
1170 |
++ std::map<unsigned, unsigned>::const_iterator Key, KeyEnd; |
1171 |
++ for (Key = LiveAddressRegisterMap.begin(), |
1172 |
++ KeyEnd = LiveAddressRegisterMap.end(); KeyEnd != Key; ++Key) { |
1173 |
++ (*Succ)->addLiveIn(Key->second); |
1174 |
++ } |
1175 |
++ } |
1176 |
++ } |
1177 |
++ |
1178 |
++ // Second pass - Lower the RegisterLoad instructions |
1179 |
++ for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end(); |
1180 |
++ BB != BB_E; ++BB) { |
1181 |
++ // Key is the address and the value is the register |
1182 |
++ std::map<unsigned, unsigned> LiveAddressRegisterMap; |
1183 |
++ MachineBasicBlock &MBB = *BB; |
1184 |
++ |
1185 |
++ MachineBasicBlock::livein_iterator LI = MBB.livein_begin(); |
1186 |
++ while (LI != MBB.livein_end()) { |
1187 |
++ std::vector<unsigned> PhiRegisters; |
1188 |
++ |
1189 |
++ // Make sure this live in is used for indirect addressing |
1190 |
++ if (RegisterAddressMap.find(*LI) == RegisterAddressMap.end()) { |
1191 |
++ ++LI; |
1192 |
++ continue; |
1193 |
++ } |
1194 |
++ |
1195 |
++ unsigned Address = RegisterAddressMap[*LI]; |
1196 |
++ LiveAddressRegisterMap[Address] = *LI; |
1197 |
++ PhiRegisters.push_back(*LI); |
1198 |
++ |
1199 |
++ // Check if there are other live in registers which map to the same |
1200 |
++ // indirect address. |
1201 |
++ for (MachineBasicBlock::livein_iterator LJ = llvm::next(LI), |
1202 |
++ LE = MBB.livein_end(); |
1203 |
++ LJ != LE; ++LJ) { |
1204 |
++ unsigned Reg = *LJ; |
1205 |
++ if (RegisterAddressMap.find(Reg) == RegisterAddressMap.end()) { |
1206 |
++ continue; |
1207 |
++ } |
1208 |
++ |
1209 |
++ if (RegisterAddressMap[Reg] == Address) { |
1210 |
++ PhiRegisters.push_back(Reg); |
1211 |
++ } |
1212 |
++ } |
1213 |
++ |
1214 |
++ if (PhiRegisters.size() == 1) { |
1215 |
++ // We don't need to insert a Phi instruction, so we can just add the |
1216 |
++ // registers to the live list for the block. |
1217 |
++ LiveAddressRegisterMap[Address] = *LI; |
1218 |
++ MBB.removeLiveIn(*LI); |
1219 |
++ } else { |
1220 |
++ // We need to insert a PHI, because we have the same address being |
1221 |
++ // written in multiple predecessor blocks. |
1222 |
++ const TargetRegisterClass *PhiDstClass = |
1223 |
++ TII->getIndirectAddrStoreRegClass(*(PhiRegisters.begin())); |
1224 |
++ unsigned PhiDstReg = MRI.createVirtualRegister(PhiDstClass); |
1225 |
++ MachineInstrBuilder Phi = BuildMI(MBB, MBB.begin(), |
1226 |
++ MBB.findDebugLoc(MBB.begin()), |
1227 |
++ TII->get(AMDGPU::PHI), PhiDstReg); |
1228 |
++ |
1229 |
++ for (std::vector<unsigned>::const_iterator RI = PhiRegisters.begin(), |
1230 |
++ RE = PhiRegisters.end(); |
1231 |
++ RI != RE; ++RI) { |
1232 |
++ unsigned Reg = *RI; |
1233 |
++ MachineInstr *DefInst = MRI.getVRegDef(Reg); |
1234 |
++ assert(DefInst); |
1235 |
++ MachineBasicBlock *RegBlock = DefInst->getParent(); |
1236 |
++ Phi.addReg(Reg); |
1237 |
++ Phi.addMBB(RegBlock); |
1238 |
++ MBB.removeLiveIn(Reg); |
1239 |
++ } |
1240 |
++ RegisterAddressMap[PhiDstReg] = Address; |
1241 |
++ LiveAddressRegisterMap[Address] = PhiDstReg; |
1242 |
++ } |
1243 |
++ LI = MBB.livein_begin(); |
1244 |
++ } |
1245 |
++ |
1246 |
++ for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I); |
1247 |
++ I != MBB.end(); I = Next) { |
1248 |
++ Next = llvm::next(I); |
1249 |
++ MachineInstr &MI = *I; |
1250 |
++ |
1251 |
++ if (!TII->isRegisterLoad(MI)) { |
1252 |
++ if (MI.getOpcode() == AMDGPU::PHI) { |
1253 |
++ continue; |
1254 |
++ } |
1255 |
++ // Check for indirect register defs |
1256 |
++ for (unsigned OpIdx = 0, NumOperands = MI.getNumOperands(); |
1257 |
++ OpIdx < NumOperands; ++OpIdx) { |
1258 |
++ MachineOperand &MO = MI.getOperand(OpIdx); |
1259 |
++ if (MO.isReg() && MO.isDef() && |
1260 |
++ RegisterAddressMap.find(MO.getReg()) != RegisterAddressMap.end()) { |
1261 |
++ unsigned Reg = MO.getReg(); |
1262 |
++ unsigned LiveAddress = RegisterAddressMap[Reg]; |
1263 |
++ // Chain the live-ins |
1264 |
++ if (LiveAddressRegisterMap.find(LiveAddress) != |
1265 |
++ LiveAddressRegisterMap.end()) { |
1266 |
++ MI.addOperand(MachineOperand::CreateReg( |
1267 |
++ LiveAddressRegisterMap[LiveAddress], |
1268 |
++ false, // isDef |
1269 |
++ true, // isImp |
1270 |
++ true)); // isKill |
1271 |
++ } |
1272 |
++ LiveAddressRegisterMap[LiveAddress] = Reg; |
1273 |
++ } |
1274 |
++ } |
1275 |
++ continue; |
1276 |
++ } |
1277 |
++ |
1278 |
++ const TargetRegisterClass *SuperIndirectRegClass = |
1279 |
++ TII->getSuperIndirectRegClass(); |
1280 |
++ const TargetRegisterClass *IndirectLoadRegClass = |
1281 |
++ TII->getIndirectAddrLoadRegClass(); |
1282 |
++ unsigned IndirectReg = MRI.createVirtualRegister(SuperIndirectRegClass); |
1283 |
++ |
1284 |
++ unsigned RegIndex = MI.getOperand(2).getImm(); |
1285 |
++ unsigned Channel = MI.getOperand(3).getImm(); |
1286 |
++ unsigned Address = TII->calculateIndirectAddress(RegIndex, Channel); |
1287 |
++ |
1288 |
++ if (MI.getOperand(1).getReg() == AMDGPU::INDIRECT_BASE_ADDR) { |
1289 |
++ // Direct register access |
1290 |
++ unsigned Reg = LiveAddressRegisterMap[Address]; |
1291 |
++ unsigned AddrReg = IndirectLoadRegClass->getRegister(Address); |
1292 |
++ |
1293 |
++ if (regHasExplicitDef(MRI, Reg)) { |
1294 |
++ // If the register we are reading from has an explicit def, then that |
1295 |
++ // means it was written via a direct register access (i.e. COPY |
1296 |
++ // or other instruction that doesn't use indirect addressing). In |
1297 |
++ // this case we know where the value has been stored, so we can just |
1298 |
++ // issue a copy. |
1299 |
++ BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY), |
1300 |
++ MI.getOperand(0).getReg()) |
1301 |
++ .addReg(Reg); |
1302 |
++ } else { |
1303 |
++ // If the register we are reading has an implicit def, then that |
1304 |
++ // means it was written by an indirect register access (i.e. an |
1305 |
++ // instruction that uses indirect addressing). |
1306 |
++ BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY), |
1307 |
++ MI.getOperand(0).getReg()) |
1308 |
++ .addReg(AddrReg) |
1309 |
++ .addReg(Reg, RegState::Implicit); |
1310 |
++ } |
1311 |
++ } else { |
1312 |
++ // Indirect register access |
1313 |
++ |
1314 |
++ // Note on REG_SEQUENCE instructions: You can't actually use the register |
1315 |
++ // it defines unless you have an instruction that takes the defined |
1316 |
++ // register class as an operand. |
1317 |
++ |
1318 |
++ MachineInstrBuilder Sequence = BuildMI(MBB, I, MBB.findDebugLoc(I), |
1319 |
++ TII->get(AMDGPU::REG_SEQUENCE), |
1320 |
++ IndirectReg); |
1321 |
++ for (int i = IndirectBegin; i <= IndirectEnd; ++i) { |
1322 |
++ unsigned Addr = TII->calculateIndirectAddress(i, Channel); |
1323 |
++ if (LiveAddressRegisterMap.find(Addr) == LiveAddressRegisterMap.end()) { |
1324 |
++ continue; |
1325 |
++ } |
1326 |
++ unsigned Reg = LiveAddressRegisterMap[Addr]; |
1327 |
++ |
1328 |
++ // We only need to use REG_SEQUENCE for explicit defs, since the |
1329 |
++ // register coalescer won't do anything with the implicit defs. |
1330 |
++ MachineInstr *DefInstr = MRI.getVRegDef(Reg); |
1331 |
++ if (!regHasExplicitDef(MRI, Reg)) { |
1332 |
++ continue; |
1333 |
++ } |
1334 |
++ |
1335 |
++ // Insert a REG_SEQUENCE instruction to force the register allocator |
1336 |
++ // to allocate the virtual register to the correct physical register. |
1337 |
++ Sequence.addReg(LiveAddressRegisterMap[Addr]); |
1338 |
++ Sequence.addImm(TII->getRegisterInfo().getIndirectSubReg(Addr)); |
1339 |
++ } |
1340 |
++ MachineInstrBuilder Mov = TII->buildIndirectRead(BB, I, |
1341 |
++ MI.getOperand(0).getReg(), // Value |
1342 |
++ Address, |
1343 |
++ MI.getOperand(1).getReg()); // Offset |
1344 |
++ |
1345 |
++ |
1346 |
++ |
1347 |
++ Mov.addReg(IndirectReg, RegState::Implicit | RegState::Kill); |
1348 |
++ Mov.addReg(LiveAddressRegisterMap[Address], RegState::Implicit); |
1349 |
++ |
1350 |
++ } |
1351 |
++ MI.eraseFromParent(); |
1352 |
++ } |
1353 |
++ } |
1354 |
++ return false; |
1355 |
++} |
1356 |
++ |
1357 |
++bool AMDGPUIndirectAddressingPass::regHasExplicitDef(MachineRegisterInfo &MRI, |
1358 |
++ unsigned Reg) const { |
1359 |
++ MachineInstr *DefInstr = MRI.getVRegDef(Reg); |
1360 |
++ |
1361 |
++ if (!DefInstr) { |
1362 |
++ return false; |
1363 |
++ } |
1364 |
++ |
1365 |
++ if (DefInstr->getOpcode() == AMDGPU::PHI) { |
1366 |
++ bool Explicit = false; |
1367 |
++ for (MachineInstr::const_mop_iterator I = DefInstr->operands_begin(), |
1368 |
++ E = DefInstr->operands_end(); |
1369 |
++ I != E; ++I) { |
1370 |
++ const MachineOperand &MO = *I; |
1371 |
++ if (!MO.isReg() || MO.isDef()) { |
1372 |
++ continue; |
1373 |
++ } |
1374 |
++ |
1375 |
++ Explicit = Explicit || regHasExplicitDef(MRI, MO.getReg()); |
1376 |
++ } |
1377 |
++ return Explicit; |
1378 |
++ } |
1379 |
++ |
1380 |
++ return DefInstr->getOperand(0).isReg() && |
1381 |
++ DefInstr->getOperand(0).getReg() == Reg; |
1382 |
++} |
1383 |
+diff --git a/lib/Target/R600/AMDGPUInstrInfo.cpp b/lib/Target/R600/AMDGPUInstrInfo.cpp |
1384 |
+new file mode 100644 |
1385 |
+index 0000000..640707d |
1386 |
+--- /dev/null |
1387 |
++++ b/lib/Target/R600/AMDGPUInstrInfo.cpp |
1388 |
+@@ -0,0 +1,266 @@ |
1389 |
++//===-- AMDGPUInstrInfo.cpp - Base class for AMD GPU InstrInfo ------------===// |
1390 |
++// |
1391 |
++// The LLVM Compiler Infrastructure |
1392 |
++// |
1393 |
++// This file is distributed under the University of Illinois Open Source |
1394 |
++// License. See LICENSE.TXT for details. |
1395 |
++// |
1396 |
++//===----------------------------------------------------------------------===// |
1397 |
++// |
1398 |
++/// \file |
1399 |
++/// \brief Implementation of the TargetInstrInfo class that is common to all |
1400 |
++/// AMD GPUs. |
1401 |
++// |
1402 |
++//===----------------------------------------------------------------------===// |
1403 |
++ |
1404 |
++#include "AMDGPUInstrInfo.h" |
1405 |
++#include "AMDGPURegisterInfo.h" |
1406 |
++#include "AMDGPUTargetMachine.h" |
1407 |
++#include "AMDIL.h" |
1408 |
++#include "llvm/CodeGen/MachineFrameInfo.h" |
1409 |
++#include "llvm/CodeGen/MachineInstrBuilder.h" |
1410 |
++#include "llvm/CodeGen/MachineRegisterInfo.h" |
1411 |
++ |
1412 |
++#define GET_INSTRINFO_CTOR |
1413 |
++#include "AMDGPUGenInstrInfo.inc" |
1414 |
++ |
1415 |
++using namespace llvm; |
1416 |
++ |
1417 |
++AMDGPUInstrInfo::AMDGPUInstrInfo(TargetMachine &tm) |
1418 |
++ : AMDGPUGenInstrInfo(0,0), RI(tm, *this), TM(tm) { } |
1419 |
++ |
1420 |
++const AMDGPURegisterInfo &AMDGPUInstrInfo::getRegisterInfo() const { |
1421 |
++ return RI; |
1422 |
++} |
1423 |
++ |
1424 |
++bool AMDGPUInstrInfo::isCoalescableExtInstr(const MachineInstr &MI, |
1425 |
++ unsigned &SrcReg, unsigned &DstReg, |
1426 |
++ unsigned &SubIdx) const { |
1427 |
++// TODO: Implement this function |
1428 |
++ return false; |
1429 |
++} |
1430 |
++ |
1431 |
++unsigned AMDGPUInstrInfo::isLoadFromStackSlot(const MachineInstr *MI, |
1432 |
++ int &FrameIndex) const { |
1433 |
++// TODO: Implement this function |
1434 |
++ return 0; |
1435 |
++} |
1436 |
++ |
1437 |
++unsigned AMDGPUInstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI, |
1438 |
++ int &FrameIndex) const { |
1439 |
++// TODO: Implement this function |
1440 |
++ return 0; |
1441 |
++} |
1442 |
++ |
1443 |
++bool AMDGPUInstrInfo::hasLoadFromStackSlot(const MachineInstr *MI, |
1444 |
++ const MachineMemOperand *&MMO, |
1445 |
++ int &FrameIndex) const { |
1446 |
++// TODO: Implement this function |
1447 |
++ return false; |
1448 |
++} |
1449 |
++unsigned AMDGPUInstrInfo::isStoreFromStackSlot(const MachineInstr *MI, |
1450 |
++ int &FrameIndex) const { |
1451 |
++// TODO: Implement this function |
1452 |
++ return 0; |
1453 |
+} |
1454 |
+unsigned AMDGPUInstrInfo::isStoreFromStackSlotPostFE(const MachineInstr *MI, |
1455 |
+ int &FrameIndex) const { |
1456 |
@@ -1758,7 +1766,16 @@ index 0000000..e42a46d |
1457 |
+ // TODO: Implement this function |
1458 |
+ return true; |
1459 |
+} |
1460 |
-+ |
1461 |
++ |
1462 |
++bool AMDGPUInstrInfo::isRegisterStore(const MachineInstr &MI) const { |
1463 |
++ return get(MI.getOpcode()).TSFlags & AMDGPU_FLAG_REGISTER_STORE; |
1464 |
++} |
1465 |
++ |
1466 |
++bool AMDGPUInstrInfo::isRegisterLoad(const MachineInstr &MI) const { |
1467 |
++ return get(MI.getOpcode()).TSFlags & AMDGPU_FLAG_REGISTER_LOAD; |
1468 |
++} |
1469 |
++ |
1470 |
++ |
1471 |
+void AMDGPUInstrInfo::convertToISA(MachineInstr & MI, MachineFunction &MF, |
1472 |
+ DebugLoc DL) const { |
1473 |
+ MachineRegisterInfo &MRI = MF.getRegInfo(); |
1474 |
@@ -1781,10 +1798,10 @@ index 0000000..e42a46d |
1475 |
+} |
1476 |
diff --git a/lib/Target/R600/AMDGPUInstrInfo.h b/lib/Target/R600/AMDGPUInstrInfo.h |
1477 |
new file mode 100644 |
1478 |
-index 0000000..32ac691 |
1479 |
+index 0000000..5220aa0 |
1480 |
--- /dev/null |
1481 |
+++ b/lib/Target/R600/AMDGPUInstrInfo.h |
1482 |
-@@ -0,0 +1,149 @@ |
1483 |
+@@ -0,0 +1,207 @@ |
1484 |
+//===-- AMDGPUInstrInfo.h - AMDGPU Instruction Information ------*- C++ -*-===// |
1485 |
+// |
1486 |
+// The LLVM Compiler Infrastructure |
1487 |
@@ -1828,9 +1845,10 @@ index 0000000..32ac691 |
1488 |
+class AMDGPUInstrInfo : public AMDGPUGenInstrInfo { |
1489 |
+private: |
1490 |
+ const AMDGPURegisterInfo RI; |
1491 |
-+ TargetMachine &TM; |
1492 |
+ bool getNextBranchInstr(MachineBasicBlock::iterator &iter, |
1493 |
+ MachineBasicBlock &MBB) const; |
1494 |
++protected: |
1495 |
++ TargetMachine &TM; |
1496 |
+public: |
1497 |
+ explicit AMDGPUInstrInfo(TargetMachine &tm); |
1498 |
+ |
1499 |
@@ -1918,12 +1936,66 @@ index 0000000..32ac691 |
1500 |
+ bool isAExtLoadInst(llvm::MachineInstr *MI) const; |
1501 |
+ bool isStoreInst(llvm::MachineInstr *MI) const; |
1502 |
+ bool isTruncStoreInst(llvm::MachineInstr *MI) const; |
1503 |
++ bool isRegisterStore(const MachineInstr &MI) const; |
1504 |
++ bool isRegisterLoad(const MachineInstr &MI) const; |
1505 |
++ |
1506 |
++//===---------------------------------------------------------------------===// |
1507 |
++// Pure virtual functions to be implemented by sub-classes. |
1508 |
++//===---------------------------------------------------------------------===// |
1509 |
+ |
1510 |
+ virtual MachineInstr* getMovImmInstr(MachineFunction *MF, unsigned DstReg, |
1511 |
+ int64_t Imm) const = 0; |
1512 |
+ virtual unsigned getIEQOpcode() const = 0; |
1513 |
+ virtual bool isMov(unsigned opcode) const = 0; |
1514 |
+ |
1515 |
++ /// \returns the smallest register index that will be accessed by an indirect |
1516 |
++ /// read or write or -1 if indirect addressing is not used by this program. |
1517 |
++ virtual int getIndirectIndexBegin(const MachineFunction &MF) const = 0; |
1518 |
++ |
1519 |
++ /// \returns the largest register index that will be accessed by an indirect |
1520 |
++ /// read or write or -1 if indirect addressing is not used by this program. |
1521 |
++ virtual int getIndirectIndexEnd(const MachineFunction &MF) const = 0; |
1522 |
++ |
1523 |
++ /// \brief Calculate the "Indirect Address" for the given \p RegIndex and |
1524 |
++ /// \p Channel |
1525 |
++ /// |
1526 |
++ /// We model indirect addressing using a virtual address space that can be |
1527 |
++ /// accessed with loads and stores. The "Indirect Address" is the memory |
1528 |
++ /// address in this virtual address space that maps to the given \p RegIndex |
1529 |
++ /// and \p Channel. |
1530 |
++ virtual unsigned calculateIndirectAddress(unsigned RegIndex, |
1531 |
++ unsigned Channel) const = 0; |
1532 |
++ |
1533 |
++ /// \returns The register class to be used for storing values to an |
1534 |
++ /// "Indirect Address" . |
1535 |
++ virtual const TargetRegisterClass *getIndirectAddrStoreRegClass( |
1536 |
++ unsigned SourceReg) const = 0; |
1537 |
++ |
1538 |
++ /// \returns The register class to be used for loading values from |
1539 |
++ /// an "Indirect Address" . |
1540 |
++ virtual const TargetRegisterClass *getIndirectAddrLoadRegClass() const = 0; |
1541 |
++ |
1542 |
++ /// \brief Build instruction(s) for an indirect register write. |
1543 |
++ /// |
1544 |
++ /// \returns The instruction that performs the indirect register write |
1545 |
++ virtual MachineInstrBuilder buildIndirectWrite(MachineBasicBlock *MBB, |
1546 |
++ MachineBasicBlock::iterator I, |
1547 |
++ unsigned ValueReg, unsigned Address, |
1548 |
++ unsigned OffsetReg) const = 0; |
1549 |
++ |
1550 |
++ /// \brief Build instruction(s) for an indirect register read. |
1551 |
++ /// |
1552 |
++ /// \returns The instruction that performs the indirect register read |
1553 |
++ virtual MachineInstrBuilder buildIndirectRead(MachineBasicBlock *MBB, |
1554 |
++ MachineBasicBlock::iterator I, |
1555 |
++ unsigned ValueReg, unsigned Address, |
1556 |
++ unsigned OffsetReg) const = 0; |
1557 |
++ |
1558 |
++ /// \returns the register class whose sub registers are the set of all |
1559 |
++ /// possible registers that can be used for indirect addressing. |
1560 |
++ virtual const TargetRegisterClass *getSuperIndirectRegClass() const = 0; |
1561 |
++ |
1562 |
++ |
1563 |
+ /// \brief Convert the AMDIL MachineInstr to a supported ISA |
1564 |
+ /// MachineInstr |
1565 |
+ virtual void convertToISA(MachineInstr & MI, MachineFunction &MF, |
1566 |
@@ -1933,13 +2005,16 @@ index 0000000..32ac691 |
1567 |
+ |
1568 |
+} // End llvm namespace |
1569 |
+ |
1570 |
++#define AMDGPU_FLAG_REGISTER_LOAD (UINT64_C(1) << 63) |
1571 |
++#define AMDGPU_FLAG_REGISTER_STORE (UINT64_C(1) << 62) |
1572 |
++ |
1573 |
+#endif // AMDGPUINSTRINFO_H |
1574 |
diff --git a/lib/Target/R600/AMDGPUInstrInfo.td b/lib/Target/R600/AMDGPUInstrInfo.td |
1575 |
new file mode 100644 |
1576 |
-index 0000000..96368e8 |
1577 |
+index 0000000..b66ae87 |
1578 |
--- /dev/null |
1579 |
+++ b/lib/Target/R600/AMDGPUInstrInfo.td |
1580 |
-@@ -0,0 +1,74 @@ |
1581 |
+@@ -0,0 +1,82 @@ |
1582 |
+//===-- AMDGPUInstrInfo.td - AMDGPU DAG nodes --------------*- tablegen -*-===// |
1583 |
+// |
1584 |
+// The LLVM Compiler Infrastructure |
1585 |
@@ -2014,12 +2089,20 @@ index 0000000..96368e8 |
1586 |
+def AMDGPUurecip : SDNode<"AMDGPUISD::URECIP", SDTIntUnaryOp>; |
1587 |
+ |
1588 |
+def fpow : SDNode<"ISD::FPOW", SDTFPBinOp>; |
1589 |
++ |
1590 |
++def AMDGPUregister_load : SDNode<"AMDGPUISD::REGISTER_LOAD", |
1591 |
++ SDTypeProfile<1, 2, [SDTCisPtrTy<1>, SDTCisInt<2>]>, |
1592 |
++ [SDNPHasChain, SDNPMayLoad]>; |
1593 |
++ |
1594 |
++def AMDGPUregister_store : SDNode<"AMDGPUISD::REGISTER_STORE", |
1595 |
++ SDTypeProfile<0, 3, [SDTCisPtrTy<1>, SDTCisInt<2>]>, |
1596 |
++ [SDNPHasChain, SDNPMayStore]>; |
1597 |
diff --git a/lib/Target/R600/AMDGPUInstructions.td b/lib/Target/R600/AMDGPUInstructions.td |
1598 |
new file mode 100644 |
1599 |
-index 0000000..e634d20 |
1600 |
+index 0000000..0559a5a |
1601 |
--- /dev/null |
1602 |
+++ b/lib/Target/R600/AMDGPUInstructions.td |
1603 |
-@@ -0,0 +1,190 @@ |
1604 |
+@@ -0,0 +1,268 @@ |
1605 |
+//===-- AMDGPUInstructions.td - Common instruction defs ---*- tablegen -*-===// |
1606 |
+// |
1607 |
+// The LLVM Compiler Infrastructure |
1608 |
@@ -2035,8 +2118,8 @@ index 0000000..e634d20 |
1609 |
+//===----------------------------------------------------------------------===// |
1610 |
+ |
1611 |
+class AMDGPUInst <dag outs, dag ins, string asm, list<dag> pattern> : Instruction { |
1612 |
-+ field bits<16> AMDILOp = 0; |
1613 |
-+ field bits<3> Gen = 0; |
1614 |
++ field bit isRegisterLoad = 0; |
1615 |
++ field bit isRegisterStore = 0; |
1616 |
+ |
1617 |
+ let Namespace = "AMDGPU"; |
1618 |
+ let OutOperandList = outs; |
1619 |
@@ -2044,8 +2127,9 @@ index 0000000..e634d20 |
1620 |
+ let AsmString = asm; |
1621 |
+ let Pattern = pattern; |
1622 |
+ let Itinerary = NullALU; |
1623 |
-+ let TSFlags{42-40} = Gen; |
1624 |
-+ let TSFlags{63-48} = AMDILOp; |
1625 |
++ |
1626 |
++ let TSFlags{63} = isRegisterLoad; |
1627 |
++ let TSFlags{62} = isRegisterStore; |
1628 |
+} |
1629 |
+ |
1630 |
+class AMDGPUShaderInst <dag outs, dag ins, string asm, list<dag> pattern> |
1631 |
@@ -2123,7 +2207,9 @@ index 0000000..e634d20 |
1632 |
+ [{return N->isExactlyValue(1.0);}] |
1633 |
+>; |
1634 |
+ |
1635 |
-+let isCodeGenOnly = 1, isPseudo = 1, usesCustomInserter = 1 in { |
1636 |
++let isCodeGenOnly = 1, isPseudo = 1 in { |
1637 |
++ |
1638 |
++let usesCustomInserter = 1 in { |
1639 |
+ |
1640 |
+class CLAMP <RegisterClass rc> : AMDGPUShaderInst < |
1641 |
+ (outs rc:$dst), |
1642 |
@@ -2153,7 +2239,31 @@ index 0000000..e634d20 |
1643 |
+ [(int_AMDGPU_shader_type imm:$type)] |
1644 |
+>; |
1645 |
+ |
1646 |
-+} // End isCodeGenOnly = 1, isPseudo = 1, hasCustomInserter = 1 |
1647 |
++} // usesCustomInserter = 1 |
1648 |
++ |
1649 |
++multiclass RegisterLoadStore <RegisterClass dstClass, Operand addrClass, |
1650 |
++ ComplexPattern addrPat> { |
1651 |
++ def RegisterLoad : AMDGPUShaderInst < |
1652 |
++ (outs dstClass:$dst), |
1653 |
++ (ins addrClass:$addr, i32imm:$chan), |
1654 |
++ "RegisterLoad $dst, $addr", |
1655 |
++ [(set (i32 dstClass:$dst), (AMDGPUregister_load addrPat:$addr, |
1656 |
++ (i32 timm:$chan)))] |
1657 |
++ > { |
1658 |
++ let isRegisterLoad = 1; |
1659 |
++ } |
1660 |
++ |
1661 |
++ def RegisterStore : AMDGPUShaderInst < |
1662 |
++ (outs), |
1663 |
++ (ins dstClass:$val, addrClass:$addr, i32imm:$chan), |
1664 |
++ "RegisterStore $val, $addr", |
1665 |
++ [(AMDGPUregister_store (i32 dstClass:$val), addrPat:$addr, (i32 timm:$chan))] |
1666 |
++ > { |
1667 |
++ let isRegisterStore = 1; |
1668 |
++ } |
1669 |
++} |
1670 |
++ |
1671 |
++} // End isCodeGenOnly = 1, isPseudo = 1 |
1672 |
+ |
1673 |
+/* Generic helper patterns for intrinsics */ |
1674 |
+/* -------------------------------------- */ |
1675 |
@@ -2186,13 +2296,64 @@ index 0000000..e634d20 |
1676 |
+>; |
1677 |
+ |
1678 |
+// Vector Build pattern |
1679 |
++class Vector1_Build <ValueType vecType, RegisterClass vectorClass, |
1680 |
++ ValueType elemType, RegisterClass elemClass> : Pat < |
1681 |
++ (vecType (build_vector (elemType elemClass:$src))), |
1682 |
++ (vecType elemClass:$src) |
1683 |
++>; |
1684 |
++ |
1685 |
++class Vector2_Build <ValueType vecType, RegisterClass vectorClass, |
1686 |
++ ValueType elemType, RegisterClass elemClass> : Pat < |
1687 |
++ (vecType (build_vector (elemType elemClass:$sub0), (elemType elemClass:$sub1))), |
1688 |
++ (INSERT_SUBREG (INSERT_SUBREG |
1689 |
++ (vecType (IMPLICIT_DEF)), elemClass:$sub0, sub0), elemClass:$sub1, sub1) |
1690 |
++>; |
1691 |
++ |
1692 |
+class Vector_Build <ValueType vecType, RegisterClass vectorClass, |
1693 |
+ ValueType elemType, RegisterClass elemClass> : Pat < |
1694 |
+ (vecType (build_vector (elemType elemClass:$x), (elemType elemClass:$y), |
1695 |
+ (elemType elemClass:$z), (elemType elemClass:$w))), |
1696 |
+ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG |
1697 |
-+ (vecType (IMPLICIT_DEF)), elemClass:$x, sel_x), elemClass:$y, sel_y), |
1698 |
-+ elemClass:$z, sel_z), elemClass:$w, sel_w) |
1699 |
++ (vecType (IMPLICIT_DEF)), elemClass:$x, sub0), elemClass:$y, sub1), |
1700 |
++ elemClass:$z, sub2), elemClass:$w, sub3) |
1701 |
++>; |
1702 |
++ |
1703 |
++class Vector8_Build <ValueType vecType, RegisterClass vectorClass, |
1704 |
++ ValueType elemType, RegisterClass elemClass> : Pat < |
1705 |
++ (vecType (build_vector (elemType elemClass:$sub0), (elemType elemClass:$sub1), |
1706 |
++ (elemType elemClass:$sub2), (elemType elemClass:$sub3), |
1707 |
++ (elemType elemClass:$sub4), (elemType elemClass:$sub5), |
1708 |
++ (elemType elemClass:$sub6), (elemType elemClass:$sub7))), |
1709 |
++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG |
1710 |
++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG |
1711 |
++ (vecType (IMPLICIT_DEF)), elemClass:$sub0, sub0), elemClass:$sub1, sub1), |
1712 |
++ elemClass:$sub2, sub2), elemClass:$sub3, sub3), |
1713 |
++ elemClass:$sub4, sub4), elemClass:$sub5, sub5), |
1714 |
++ elemClass:$sub6, sub6), elemClass:$sub7, sub7) |
1715 |
++>; |
1716 |
++ |
1717 |
++class Vector16_Build <ValueType vecType, RegisterClass vectorClass, |
1718 |
++ ValueType elemType, RegisterClass elemClass> : Pat < |
1719 |
++ (vecType (build_vector (elemType elemClass:$sub0), (elemType elemClass:$sub1), |
1720 |
++ (elemType elemClass:$sub2), (elemType elemClass:$sub3), |
1721 |
++ (elemType elemClass:$sub4), (elemType elemClass:$sub5), |
1722 |
++ (elemType elemClass:$sub6), (elemType elemClass:$sub7), |
1723 |
++ (elemType elemClass:$sub8), (elemType elemClass:$sub9), |
1724 |
++ (elemType elemClass:$sub10), (elemType elemClass:$sub11), |
1725 |
++ (elemType elemClass:$sub12), (elemType elemClass:$sub13), |
1726 |
++ (elemType elemClass:$sub14), (elemType elemClass:$sub15))), |
1727 |
++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG |
1728 |
++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG |
1729 |
++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG |
1730 |
++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG |
1731 |
++ (vecType (IMPLICIT_DEF)), elemClass:$sub0, sub0), elemClass:$sub1, sub1), |
1732 |
++ elemClass:$sub2, sub2), elemClass:$sub3, sub3), |
1733 |
++ elemClass:$sub4, sub4), elemClass:$sub5, sub5), |
1734 |
++ elemClass:$sub6, sub6), elemClass:$sub7, sub7), |
1735 |
++ elemClass:$sub8, sub8), elemClass:$sub9, sub9), |
1736 |
++ elemClass:$sub10, sub10), elemClass:$sub11, sub11), |
1737 |
++ elemClass:$sub12, sub12), elemClass:$sub13, sub13), |
1738 |
++ elemClass:$sub14, sub14), elemClass:$sub15, sub15) |
1739 |
+>; |
1740 |
+ |
1741 |
+// bitconvert pattern |
1742 |
@@ -2409,10 +2570,10 @@ index 0000000..d7d538e |
1743 |
+#endif //AMDGPU_MCINSTLOWER_H |
1744 |
diff --git a/lib/Target/R600/AMDGPURegisterInfo.cpp b/lib/Target/R600/AMDGPURegisterInfo.cpp |
1745 |
new file mode 100644 |
1746 |
-index 0000000..eeafec8 |
1747 |
+index 0000000..d62e57b |
1748 |
--- /dev/null |
1749 |
+++ b/lib/Target/R600/AMDGPURegisterInfo.cpp |
1750 |
-@@ -0,0 +1,51 @@ |
1751 |
+@@ -0,0 +1,74 @@ |
1752 |
+//===-- AMDGPURegisterInfo.cpp - AMDGPU Register Information -------------===// |
1753 |
+// |
1754 |
+// The LLVM Compiler Infrastructure |
1755 |
@@ -2462,14 +2623,37 @@ index 0000000..eeafec8 |
1756 |
+ return 0; |
1757 |
+} |
1758 |
+ |
1759 |
++unsigned AMDGPURegisterInfo::getIndirectSubReg(unsigned IndirectIndex) const { |
1760 |
++ |
1761 |
++ switch(IndirectIndex) { |
1762 |
++ case 0: return AMDGPU::sub0; |
1763 |
++ case 1: return AMDGPU::sub1; |
1764 |
++ case 2: return AMDGPU::sub2; |
1765 |
++ case 3: return AMDGPU::sub3; |
1766 |
++ case 4: return AMDGPU::sub4; |
1767 |
++ case 5: return AMDGPU::sub5; |
1768 |
++ case 6: return AMDGPU::sub6; |
1769 |
++ case 7: return AMDGPU::sub7; |
1770 |
++ case 8: return AMDGPU::sub8; |
1771 |
++ case 9: return AMDGPU::sub9; |
1772 |
++ case 10: return AMDGPU::sub10; |
1773 |
++ case 11: return AMDGPU::sub11; |
1774 |
++ case 12: return AMDGPU::sub12; |
1775 |
++ case 13: return AMDGPU::sub13; |
1776 |
++ case 14: return AMDGPU::sub14; |
1777 |
++ case 15: return AMDGPU::sub15; |
1778 |
++ default: llvm_unreachable("indirect index out of range"); |
1779 |
++ } |
1780 |
++} |
1781 |
++ |
1782 |
+#define GET_REGINFO_TARGET_DESC |
1783 |
+#include "AMDGPUGenRegisterInfo.inc" |
1784 |
diff --git a/lib/Target/R600/AMDGPURegisterInfo.h b/lib/Target/R600/AMDGPURegisterInfo.h |
1785 |
new file mode 100644 |
1786 |
-index 0000000..76ee7ae |
1787 |
+index 0000000..5007ff5 |
1788 |
--- /dev/null |
1789 |
+++ b/lib/Target/R600/AMDGPURegisterInfo.h |
1790 |
-@@ -0,0 +1,63 @@ |
1791 |
+@@ -0,0 +1,65 @@ |
1792 |
+//===-- AMDGPURegisterInfo.h - AMDGPURegisterInfo Interface -*- C++ -*-----===// |
1793 |
+// |
1794 |
+// The LLVM Compiler Infrastructure |
1795 |
@@ -2528,6 +2712,8 @@ index 0000000..76ee7ae |
1796 |
+ RegScavenger *RS) const; |
1797 |
+ unsigned getFrameRegister(const MachineFunction &MF) const; |
1798 |
+ |
1799 |
++ unsigned getIndirectSubReg(unsigned IndirectIndex) const; |
1800 |
++ |
1801 |
+}; |
1802 |
+ |
1803 |
+} // End namespace llvm |
1804 |
@@ -2535,10 +2721,10 @@ index 0000000..76ee7ae |
1805 |
+#endif // AMDIDSAREGISTERINFO_H |
1806 |
diff --git a/lib/Target/R600/AMDGPURegisterInfo.td b/lib/Target/R600/AMDGPURegisterInfo.td |
1807 |
new file mode 100644 |
1808 |
-index 0000000..8181e02 |
1809 |
+index 0000000..b5aca03 |
1810 |
--- /dev/null |
1811 |
+++ b/lib/Target/R600/AMDGPURegisterInfo.td |
1812 |
-@@ -0,0 +1,22 @@ |
1813 |
+@@ -0,0 +1,25 @@ |
1814 |
+//===-- AMDGPURegisterInfo.td - AMDGPU register info -------*- tablegen -*-===// |
1815 |
+// |
1816 |
+// The LLVM Compiler Infrastructure |
1817 |
@@ -2553,20 +2739,23 @@ index 0000000..8181e02 |
1818 |
+//===----------------------------------------------------------------------===// |
1819 |
+ |
1820 |
+let Namespace = "AMDGPU" in { |
1821 |
-+ def sel_x : SubRegIndex; |
1822 |
-+ def sel_y : SubRegIndex; |
1823 |
-+ def sel_z : SubRegIndex; |
1824 |
-+ def sel_w : SubRegIndex; |
1825 |
++ |
1826 |
++foreach Index = 0-15 in { |
1827 |
++ def sub#Index : SubRegIndex; |
1828 |
++} |
1829 |
++ |
1830 |
++def INDIRECT_BASE_ADDR : Register <"INDIRECT_BASE_ADDR">; |
1831 |
++ |
1832 |
+} |
1833 |
+ |
1834 |
+include "R600RegisterInfo.td" |
1835 |
+include "SIRegisterInfo.td" |
1836 |
diff --git a/lib/Target/R600/AMDGPUStructurizeCFG.cpp b/lib/Target/R600/AMDGPUStructurizeCFG.cpp |
1837 |
new file mode 100644 |
1838 |
-index 0000000..22338b5 |
1839 |
+index 0000000..a8c9621 |
1840 |
--- /dev/null |
1841 |
+++ b/lib/Target/R600/AMDGPUStructurizeCFG.cpp |
1842 |
-@@ -0,0 +1,714 @@ |
1843 |
+@@ -0,0 +1,893 @@ |
1844 |
+//===-- AMDGPUStructurizeCFG.cpp - ------------------===// |
1845 |
+// |
1846 |
+// The LLVM Compiler Infrastructure |
1847 |
@@ -2591,30 +2780,101 @@ index 0000000..22338b5 |
1848 |
+#include "llvm/Analysis/RegionInfo.h" |
1849 |
+#include "llvm/Analysis/RegionPass.h" |
1850 |
+#include "llvm/Transforms/Utils/SSAUpdater.h" |
1851 |
++#include "llvm/Support/PatternMatch.h" |
1852 |
+ |
1853 |
+using namespace llvm; |
1854 |
++using namespace llvm::PatternMatch; |
1855 |
+ |
1856 |
+namespace { |
1857 |
+ |
1858 |
+// Definition of the complex types used in this pass. |
1859 |
+ |
1860 |
+typedef std::pair<BasicBlock *, Value *> BBValuePair; |
1861 |
-+typedef ArrayRef<BasicBlock*> BBVecRef; |
1862 |
+ |
1863 |
+typedef SmallVector<RegionNode*, 8> RNVector; |
1864 |
+typedef SmallVector<BasicBlock*, 8> BBVector; |
1865 |
++typedef SmallVector<BranchInst*, 8> BranchVector; |
1866 |
+typedef SmallVector<BBValuePair, 2> BBValueVector; |
1867 |
+ |
1868 |
++typedef SmallPtrSet<BasicBlock *, 8> BBSet; |
1869 |
++ |
1870 |
+typedef DenseMap<PHINode *, BBValueVector> PhiMap; |
1871 |
++typedef DenseMap<DomTreeNode *, unsigned> DTN2UnsignedMap; |
1872 |
+typedef DenseMap<BasicBlock *, PhiMap> BBPhiMap; |
1873 |
+typedef DenseMap<BasicBlock *, Value *> BBPredicates; |
1874 |
+typedef DenseMap<BasicBlock *, BBPredicates> PredMap; |
1875 |
-+typedef DenseMap<BasicBlock *, unsigned> VisitedMap; |
1876 |
++typedef DenseMap<BasicBlock *, BasicBlock*> BB2BBMap; |
1877 |
++typedef DenseMap<BasicBlock *, BBVector> BB2BBVecMap; |
1878 |
+ |
1879 |
+// The name for newly created blocks. |
1880 |
+ |
1881 |
+static const char *FlowBlockName = "Flow"; |
1882 |
+ |
1883 |
++/// @brief Find the nearest common dominator for multiple BasicBlocks |
1884 |
++/// |
1885 |
++/// Helper class for AMDGPUStructurizeCFG |
1886 |
++/// TODO: Maybe move into common code |
1887 |
++class NearestCommonDominator { |
1888 |
++ |
1889 |
++ DominatorTree *DT; |
1890 |
++ |
1891 |
++ DTN2UnsignedMap IndexMap; |
1892 |
++ |
1893 |
++ BasicBlock *Result; |
1894 |
++ unsigned ResultIndex; |
1895 |
++ bool ExplicitMentioned; |
1896 |
++ |
1897 |
++public: |
1898 |
++ /// \brief Start a new query |
1899 |
++ NearestCommonDominator(DominatorTree *DomTree) { |
1900 |
++ DT = DomTree; |
1901 |
++ Result = 0; |
1902 |
++ } |
1903 |
++ |
1904 |
++ /// \brief Add BB to the resulting dominator |
1905 |
++ void addBlock(BasicBlock *BB, bool Remember = true) { |
1906 |
++ |
1907 |
++ DomTreeNode *Node = DT->getNode(BB); |
1908 |
++ |
1909 |
++ if (Result == 0) { |
1910 |
++ unsigned Numbering = 0; |
1911 |
++ for (;Node;Node = Node->getIDom()) |
1912 |
++ IndexMap[Node] = ++Numbering; |
1913 |
++ Result = BB; |
1914 |
++ ResultIndex = 1; |
1915 |
++ ExplicitMentioned = Remember; |
1916 |
++ return; |
1917 |
++ } |
1918 |
++ |
1919 |
++ for (;Node;Node = Node->getIDom()) |
1920 |
++ if (IndexMap.count(Node)) |
1921 |
++ break; |
1922 |
++ else |
1923 |
++ IndexMap[Node] = 0; |
1924 |
++ |
1925 |
++ assert(Node && "Dominator tree invalid!"); |
1926 |
++ |
1927 |
++ unsigned Numbering = IndexMap[Node]; |
1928 |
++ if (Numbering > ResultIndex) { |
1929 |
++ Result = Node->getBlock(); |
1930 |
++ ResultIndex = Numbering; |
1931 |
++ ExplicitMentioned = Remember && (Result == BB); |
1932 |
++ } else if (Numbering == ResultIndex) { |
1933 |
++ ExplicitMentioned |= Remember; |
1934 |
++ } |
1935 |
++ } |
1936 |
++ |
1937 |
++ /// \brief Is "Result" one of the BBs added with "Remember" = True? |
1938 |
++ bool wasResultExplicitMentioned() { |
1939 |
++ return ExplicitMentioned; |
1940 |
++ } |
1941 |
++ |
1942 |
++ /// \brief Get the query result |
1943 |
++ BasicBlock *getResult() { |
1944 |
++ return Result; |
1945 |
++ } |
1946 |
++}; |
1947 |
++ |
1948 |
+/// @brief Transforms the control flow graph on one single entry/exit region |
1949 |
+/// at a time. |
1950 |
+/// |
1951 |
@@ -2675,45 +2935,62 @@ index 0000000..22338b5 |
1952 |
+ DominatorTree *DT; |
1953 |
+ |
1954 |
+ RNVector Order; |
1955 |
-+ VisitedMap Visited; |
1956 |
-+ PredMap Predicates; |
1957 |
++ BBSet Visited; |
1958 |
++ |
1959 |
+ BBPhiMap DeletedPhis; |
1960 |
-+ BBVector FlowsInserted; |
1961 |
++ BB2BBVecMap AddedPhis; |
1962 |
++ |
1963 |
++ PredMap Predicates; |
1964 |
++ BranchVector Conditions; |
1965 |
++ |
1966 |
++ BB2BBMap Loops; |
1967 |
++ PredMap LoopPreds; |
1968 |
++ BranchVector LoopConds; |
1969 |
+ |
1970 |
-+ BasicBlock *LoopStart; |
1971 |
-+ BasicBlock *LoopEnd; |
1972 |
-+ BBPredicates LoopPred; |
1973 |
++ RegionNode *PrevNode; |
1974 |
+ |
1975 |
+ void orderNodes(); |
1976 |
+ |
1977 |
-+ void buildPredicate(BranchInst *Term, unsigned Idx, |
1978 |
-+ BBPredicates &Pred, bool Invert); |
1979 |
++ void analyzeLoops(RegionNode *N); |
1980 |
+ |
1981 |
-+ void analyzeBlock(BasicBlock *BB); |
1982 |
++ Value *invert(Value *Condition); |
1983 |
+ |
1984 |
-+ void analyzeLoop(BasicBlock *BB, unsigned &LoopIdx); |
1985 |
++ Value *buildCondition(BranchInst *Term, unsigned Idx, bool Invert); |
1986 |
++ |
1987 |
++ void gatherPredicates(RegionNode *N); |
1988 |
+ |
1989 |
+ void collectInfos(); |
1990 |
+ |
1991 |
-+ bool dominatesPredicates(BasicBlock *A, BasicBlock *B); |
1992 |
++ void insertConditions(bool Loops); |
1993 |
++ |
1994 |
++ void delPhiValues(BasicBlock *From, BasicBlock *To); |
1995 |
++ |
1996 |
++ void addPhiValues(BasicBlock *From, BasicBlock *To); |
1997 |
++ |
1998 |
++ void setPhiValues(); |
1999 |
+ |
2000 |
+ void killTerminator(BasicBlock *BB); |
2001 |
+ |
2002 |
-+ RegionNode *skipChained(RegionNode *Node); |
2003 |
++ void changeExit(RegionNode *Node, BasicBlock *NewExit, |
2004 |
++ bool IncludeDominator); |
2005 |
+ |
2006 |
-+ void delPhiValues(BasicBlock *From, BasicBlock *To); |
2007 |
++ BasicBlock *getNextFlow(BasicBlock *Dominator); |
2008 |
+ |
2009 |
-+ void addPhiValues(BasicBlock *From, BasicBlock *To); |
2010 |
++ BasicBlock *needPrefix(bool NeedEmpty); |
2011 |
+ |
2012 |
-+ BasicBlock *getNextFlow(BasicBlock *Prev); |
2013 |
++ BasicBlock *needPostfix(BasicBlock *Flow, bool ExitUseAllowed); |
2014 |
+ |
2015 |
-+ bool isPredictableTrue(BasicBlock *Prev, BasicBlock *Node); |
2016 |
++ void setPrevNode(BasicBlock *BB); |
2017 |
+ |
2018 |
-+ BasicBlock *wireFlowBlock(BasicBlock *Prev, RegionNode *Node); |
2019 |
++ bool dominatesPredicates(BasicBlock *BB, RegionNode *Node); |
2020 |
+ |
2021 |
-+ void createFlow(); |
2022 |
++ bool isPredictableTrue(RegionNode *Node); |
2023 |
++ |
2024 |
++ void wireFlow(bool ExitUseAllowed, BasicBlock *LoopEnd); |
2025 |
+ |
2026 |
-+ void insertConditions(); |
2027 |
++ void handleLoops(bool ExitUseAllowed, BasicBlock *LoopEnd); |
2028 |
++ |
2029 |
++ void createFlow(); |
2030 |
+ |
2031 |
+ void rebuildSSA(); |
2032 |
+ |
2033 |
@@ -2767,212 +3044,214 @@ index 0000000..22338b5 |
2034 |
+ } |
2035 |
+} |
2036 |
+ |
2037 |
-+/// \brief Build blocks and loop predicates |
2038 |
-+void AMDGPUStructurizeCFG::buildPredicate(BranchInst *Term, unsigned Idx, |
2039 |
-+ BBPredicates &Pred, bool Invert) { |
2040 |
-+ Value *True = Invert ? BoolFalse : BoolTrue; |
2041 |
-+ Value *False = Invert ? BoolTrue : BoolFalse; |
2042 |
++/// \brief Determine the end of the loops |
2043 |
++void AMDGPUStructurizeCFG::analyzeLoops(RegionNode *N) { |
2044 |
+ |
2045 |
-+ RegionInfo *RI = ParentRegion->getRegionInfo(); |
2046 |
-+ BasicBlock *BB = Term->getParent(); |
2047 |
++ if (N->isSubRegion()) { |
2048 |
++ // Test for exit as back edge |
2049 |
++ BasicBlock *Exit = N->getNodeAs<Region>()->getExit(); |
2050 |
++ if (Visited.count(Exit)) |
2051 |
++ Loops[Exit] = N->getEntry(); |
2052 |
++ |
2053 |
++ } else { |
2054 |
++ // Test for successors as back edge |
2055 |
++ BasicBlock *BB = N->getNodeAs<BasicBlock>(); |
2056 |
++ BranchInst *Term = cast<BranchInst>(BB->getTerminator()); |
2057 |
+ |
2058 |
-+ // Handle the case where multiple regions start at the same block |
2059 |
-+ Region *R = BB != ParentRegion->getEntry() ? |
2060 |
-+ RI->getRegionFor(BB) : ParentRegion; |
2061 |
++ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) { |
2062 |
++ BasicBlock *Succ = Term->getSuccessor(i); |
2063 |
+ |
2064 |
-+ if (R == ParentRegion) { |
2065 |
-+ // It's a top level block in our region |
2066 |
-+ Value *Cond = True; |
2067 |
-+ if (Term->isConditional()) { |
2068 |
-+ BasicBlock *Other = Term->getSuccessor(!Idx); |
2069 |
++ if (Visited.count(Succ)) |
2070 |
++ Loops[Succ] = BB; |
2071 |
++ } |
2072 |
++ } |
2073 |
++} |
2074 |
+ |
2075 |
-+ if (Visited.count(Other)) { |
2076 |
-+ if (!Pred.count(Other)) |
2077 |
-+ Pred[Other] = False; |
2078 |
++/// \brief Invert the given condition |
2079 |
++Value *AMDGPUStructurizeCFG::invert(Value *Condition) { |
2080 |
+ |
2081 |
-+ if (!Pred.count(BB)) |
2082 |
-+ Pred[BB] = True; |
2083 |
-+ return; |
2084 |
-+ } |
2085 |
-+ Cond = Term->getCondition(); |
2086 |
++ // First: Check if it's a constant |
2087 |
++ if (Condition == BoolTrue) |
2088 |
++ return BoolFalse; |
2089 |
+ |
2090 |
-+ if (Idx != Invert) |
2091 |
-+ Cond = BinaryOperator::CreateNot(Cond, "", Term); |
2092 |
-+ } |
2093 |
++ if (Condition == BoolFalse) |
2094 |
++ return BoolTrue; |
2095 |
+ |
2096 |
-+ Pred[BB] = Cond; |
2097 |
++ if (Condition == BoolUndef) |
2098 |
++ return BoolUndef; |
2099 |
+ |
2100 |
-+ } else if (ParentRegion->contains(R)) { |
2101 |
-+ // It's a block in a sub region |
2102 |
-+ while(R->getParent() != ParentRegion) |
2103 |
-+ R = R->getParent(); |
2104 |
++ // Second: If the condition is already inverted, return the original value |
2105 |
++ if (match(Condition, m_Not(m_Value(Condition)))) |
2106 |
++ return Condition; |
2107 |
+ |
2108 |
-+ Pred[R->getEntry()] = True; |
2109 |
++ // Third: Check all the users for an invert |
2110 |
++ BasicBlock *Parent = cast<Instruction>(Condition)->getParent(); |
2111 |
++ for (Value::use_iterator I = Condition->use_begin(), |
2112 |
++ E = Condition->use_end(); I != E; ++I) { |
2113 |
+ |
2114 |
-+ } else { |
2115 |
-+ // It's a branch from outside into our parent region |
2116 |
-+ Pred[BB] = True; |
2117 |
++ Instruction *User = dyn_cast<Instruction>(*I); |
2118 |
++ if (!User || User->getParent() != Parent) |
2119 |
++ continue; |
2120 |
++ |
2121 |
++ if (match(*I, m_Not(m_Specific(Condition)))) |
2122 |
++ return *I; |
2123 |
+ } |
2124 |
-+} |
2125 |
+ |
2126 |
-+/// \brief Analyze the successors of each block and build up predicates |
2127 |
-+void AMDGPUStructurizeCFG::analyzeBlock(BasicBlock *BB) { |
2128 |
-+ pred_iterator PI = pred_begin(BB), PE = pred_end(BB); |
2129 |
-+ BBPredicates &Pred = Predicates[BB]; |
2130 |
++ // Last option: Create a new instruction |
2131 |
++ return BinaryOperator::CreateNot(Condition, "", Parent->getTerminator()); |
2132 |
++} |
2133 |
+ |
2134 |
-+ for (; PI != PE; ++PI) { |
2135 |
-+ BranchInst *Term = cast<BranchInst>((*PI)->getTerminator()); |
2136 |
++/// \brief Build the condition for one edge |
2137 |
++Value *AMDGPUStructurizeCFG::buildCondition(BranchInst *Term, unsigned Idx, |
2138 |
++ bool Invert) { |
2139 |
++ Value *Cond = Invert ? BoolFalse : BoolTrue; |
2140 |
++ if (Term->isConditional()) { |
2141 |
++ Cond = Term->getCondition(); |
2142 |
+ |
2143 |
-+ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) { |
2144 |
-+ BasicBlock *Succ = Term->getSuccessor(i); |
2145 |
-+ if (Succ != BB) |
2146 |
-+ continue; |
2147 |
-+ buildPredicate(Term, i, Pred, false); |
2148 |
-+ } |
2149 |
++ if (Idx != Invert) |
2150 |
++ Cond = invert(Cond); |
2151 |
+ } |
2152 |
++ return Cond; |
2153 |
+} |
2154 |
+ |
-+/// \brief Analyze the conditions leading to loop to a previous block
-+void AMDGPUStructurizeCFG::analyzeLoop(BasicBlock *BB, unsigned &LoopIdx) {
-+ BranchInst *Term = cast<BranchInst>(BB->getTerminator());
++/// \brief Analyze the predecessors of each block and build up predicates
++void AMDGPUStructurizeCFG::gatherPredicates(RegionNode *N) {
+
-+ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
-+ BasicBlock *Succ = Term->getSuccessor(i);
++ RegionInfo *RI = ParentRegion->getRegionInfo();
++ BasicBlock *BB = N->getEntry();
++ BBPredicates &Pred = Predicates[BB];
++ BBPredicates &LPred = LoopPreds[BB];
++
++ for (pred_iterator PI = pred_begin(BB), PE = pred_end(BB);
++ PI != PE; ++PI) {
+
-+ // Ignore it if it's not a back edge
-+ if (!Visited.count(Succ))
++ // Ignore it if it's a branch from outside into our region entry
++ if (!ParentRegion->contains(*PI))
+ continue;
+
-+ buildPredicate(Term, i, LoopPred, true);
++ Region *R = RI->getRegionFor(*PI);
++ if (R == ParentRegion) {
++
++ // It's a top level block in our region
++ BranchInst *Term = cast<BranchInst>((*PI)->getTerminator());
++ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
++ BasicBlock *Succ = Term->getSuccessor(i);
++ if (Succ != BB)
++ continue;
++
++ if (Visited.count(*PI)) {
++ // Normal forward edge
++ if (Term->isConditional()) {
++ // Try to treat it like an ELSE block
++ BasicBlock *Other = Term->getSuccessor(!i);
++ if (Visited.count(Other) && !Loops.count(Other) &&
++ !Pred.count(Other) && !Pred.count(*PI)) {
++
++ Pred[Other] = BoolFalse;
++ Pred[*PI] = BoolTrue;
++ continue;
++ }
++ }
++ Pred[*PI] = buildCondition(Term, i, false);
++
++ } else {
++ // Back edge
++ LPred[*PI] = buildCondition(Term, i, true);
++ }
++ }
++
++ } else {
++
++ // It's an exit from a sub region
++ while(R->getParent() != ParentRegion)
++ R = R->getParent();
++
++ // Edge from inside a subregion to its entry, ignore it
++ if (R == N)
++ continue;
+
-+ LoopEnd = BB;
-+ if (Visited[Succ] < LoopIdx) {
-+ LoopIdx = Visited[Succ];
-+ LoopStart = Succ;
++ BasicBlock *Entry = R->getEntry();
++ if (Visited.count(Entry))
++ Pred[Entry] = BoolTrue;
++ else
++ LPred[Entry] = BoolFalse;
+ }
+ }
+}
+
+/// \brief Collect various loop and predicate infos
+void AMDGPUStructurizeCFG::collectInfos() {
-+ unsigned Number = 0, LoopIdx = ~0;
+
+ // Reset predicate
+ Predicates.clear();
+
+ // and loop infos
-+ LoopStart = LoopEnd = 0;
-+ LoopPred.clear();
++ Loops.clear();
++ LoopPreds.clear();
++
++ // Reset the visited nodes
++ Visited.clear();
+
-+ RNVector::reverse_iterator OI = Order.rbegin(), OE = Order.rend();
-+ for (Visited.clear(); OI != OE; Visited[(*OI++)->getEntry()] = ++Number) {
++ for (RNVector::reverse_iterator OI = Order.rbegin(), OE = Order.rend();
++ OI != OE; ++OI) {
+
+ // Analyze all the conditions leading to a node
-+ analyzeBlock((*OI)->getEntry());
++ gatherPredicates(*OI);
+
-+ if ((*OI)->isSubRegion())
-+ continue;
++ // Remember that we've seen this node
++ Visited.insert((*OI)->getEntry());
+
-+ // Find the first/last loop nodes and loop predicates
-+ analyzeLoop((*OI)->getNodeAs<BasicBlock>(), LoopIdx);
++ // Find the last back edges
++ analyzeLoops(*OI);
+ }
+}
+
-+/// \brief Does A dominate all the predicates of B ?
-+bool AMDGPUStructurizeCFG::dominatesPredicates(BasicBlock *A, BasicBlock *B) {
-+ BBPredicates &Preds = Predicates[B];
-+ for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
-+ PI != PE; ++PI) {
++/// \brief Insert the missing branch conditions
++void AMDGPUStructurizeCFG::insertConditions(bool Loops) {
++ BranchVector &Conds = Loops ? LoopConds : Conditions;
++ Value *Default = Loops ? BoolTrue : BoolFalse;
++ SSAUpdater PhiInserter;
+
-+ if (!DT->dominates(A, PI->first))
-+ return false;
-+ }
-+ return true;
-+}
++ for (BranchVector::iterator I = Conds.begin(),
++ E = Conds.end(); I != E; ++I) {
+
-+/// \brief Remove phi values from all successors and the remove the terminator.
-+void AMDGPUStructurizeCFG::killTerminator(BasicBlock *BB) {
-+ TerminatorInst *Term = BB->getTerminator();
-+ if (!Term)
-+ return;
++ BranchInst *Term = *I;
++ assert(Term->isConditional());
+
-+ for (succ_iterator SI = succ_begin(BB), SE = succ_end(BB);
-+ SI != SE; ++SI) {
++ BasicBlock *Parent = Term->getParent();
++ BasicBlock *SuccTrue = Term->getSuccessor(0);
++ BasicBlock *SuccFalse = Term->getSuccessor(1);
+
-+ delPhiValues(BB, *SI);
-+ }
++ PhiInserter.Initialize(Boolean, "");
++ PhiInserter.AddAvailableValue(&Func->getEntryBlock(), Default);
++ PhiInserter.AddAvailableValue(Loops ? SuccFalse : Parent, Default);
+
-+ Term->eraseFromParent();
-+}
++ BBPredicates &Preds = Loops ? LoopPreds[SuccFalse] : Predicates[SuccTrue];
+
-+/// First: Skip forward to the first region node that either isn't a subregion or not
-+/// dominating it's exit, remove all the skipped nodes from the node order.
-+///
-+/// Second: Handle the first successor directly if the resulting nodes successor
-+/// predicates are still dominated by the original entry
-+RegionNode *AMDGPUStructurizeCFG::skipChained(RegionNode *Node) {
-+ BasicBlock *Entry = Node->getEntry();
++ NearestCommonDominator Dominator(DT);
++ Dominator.addBlock(Parent, false);
+
-+ // Skip forward as long as it is just a linear flow
-+ while (true) {
-+ BasicBlock *Entry = Node->getEntry();
-+ BasicBlock *Exit;
++ Value *ParentValue = 0;
++ for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
++ PI != PE; ++PI) {
+
-+ if (Node->isSubRegion()) {
-+ Exit = Node->getNodeAs<Region>()->getExit();
-+ } else {
-+ TerminatorInst *Term = Entry->getTerminator();
-+ if (Term->getNumSuccessors() != 1)
++ if (PI->first == Parent) {
++ ParentValue = PI->second;
+ break;
-+ Exit = Term->getSuccessor(0);
++ }
++ PhiInserter.AddAvailableValue(PI->first, PI->second);
++ Dominator.addBlock(PI->first);
+ }
+
-+ // It's a back edge, break here so we can insert a loop node
-+ if (!Visited.count(Exit))
-+ return Node;
-+
-+ // More than node edges are pointing to exit
-+ if (!DT->dominates(Entry, Exit))
-+ return Node;
-+
-+ RegionNode *Next = ParentRegion->getNode(Exit);
-+ RNVector::iterator I = std::find(Order.begin(), Order.end(), Next);
-+ assert(I != Order.end());
-+
-+ Visited.erase(Next->getEntry());
-+ Order.erase(I);
-+ Node = Next;
-+ }
++ if (ParentValue) {
++ Term->setCondition(ParentValue);
++ } else {
++ if (!Dominator.wasResultExplicitMentioned())
++ PhiInserter.AddAvailableValue(Dominator.getResult(), Default);
+
-+ BasicBlock *BB = Node->getEntry();
-+ TerminatorInst *Term = BB->getTerminator();
-+ if (Term->getNumSuccessors() != 2)
-+ return Node;
-+
-+ // Our node has exactly two succesors, check if we can handle
-+ // any of them directly
-+ BasicBlock *Succ = Term->getSuccessor(0);
-+ if (!Visited.count(Succ) || !dominatesPredicates(Entry, Succ)) {
-+ Succ = Term->getSuccessor(1);
-+ if (!Visited.count(Succ) || !dominatesPredicates(Entry, Succ))
-+ return Node;
-+ } else {
-+ BasicBlock *Succ2 = Term->getSuccessor(1);
-+ if (Visited.count(Succ2) && Visited[Succ] > Visited[Succ2] &&
-+ dominatesPredicates(Entry, Succ2))
-+ Succ = Succ2;
++ Term->setCondition(PhiInserter.GetValueInMiddleOfBlock(Parent));
++ }
+ }
-+
-+ RegionNode *Next = ParentRegion->getNode(Succ);
-+ RNVector::iterator E = Order.end();
-+ RNVector::iterator I = std::find(Order.begin(), E, Next);
-+ assert(I != E);
-+
-+ killTerminator(BB);
-+ FlowsInserted.push_back(BB);
-+ Visited.erase(Succ);
-+ Order.erase(I);
-+ return ParentRegion->getNode(wireFlowBlock(BB, Next));
+}
+
+/// \brief Remove all PHI values coming from "From" into "To" and remember
@@ -2990,224 +3269,306 @@ index 0000000..22338b5
+ }
+}
+
-+/// \brief Add the PHI values back once we knew the new predecessor
++/// \brief Add a dummy PHI value as soon as we knew the new predecessor
+void AMDGPUStructurizeCFG::addPhiValues(BasicBlock *From, BasicBlock *To) {
-+ if (!DeletedPhis.count(To))
-+ return;
++ for (BasicBlock::iterator I = To->begin(), E = To->end();
++ I != E && isa<PHINode>(*I);) {
+
-+ PhiMap &Map = DeletedPhis[To];
++ PHINode &Phi = cast<PHINode>(*I++);
++ Value *Undef = UndefValue::get(Phi.getType());
++ Phi.addIncoming(Undef, From);
++ }
++ AddedPhis[To].push_back(From);
++}
++
++/// \brief Add the real PHI value as soon as everything is set up
++void AMDGPUStructurizeCFG::setPhiValues() {
++
+ SSAUpdater Updater;
++ for (BB2BBVecMap::iterator AI = AddedPhis.begin(), AE = AddedPhis.end();
++ AI != AE; ++AI) {
+
-+ for (PhiMap::iterator I = Map.begin(), E = Map.end(); I != E; ++I) {
++ BasicBlock *To = AI->first;
++ BBVector &From = AI->second;
+
-+ PHINode *Phi = I->first;
-+ Updater.Initialize(Phi->getType(), "");
-+ BasicBlock *Fallback = To;
-+ bool HaveFallback = false;
++ if (!DeletedPhis.count(To))
++ continue;
+
-+ for (BBValueVector::iterator VI = I->second.begin(), VE = I->second.end();
-+ VI != VE; ++VI) {
++ PhiMap &Map = DeletedPhis[To];
++ for (PhiMap::iterator PI = Map.begin(), PE = Map.end();
++ PI != PE; ++PI) {
+
-+ Updater.AddAvailableValue(VI->first, VI->second);
-+ BasicBlock *Dom = DT->findNearestCommonDominator(Fallback, VI->first);
-+ if (Dom == VI->first)
-+ HaveFallback = true;
-+ else if (Dom != Fallback)
-+ HaveFallback = false;
-+ Fallback = Dom;
-+ }
-+ if (!HaveFallback) {
++ PHINode *Phi = PI->first;
+ Value *Undef = UndefValue::get(Phi->getType());
-+ Updater.AddAvailableValue(Fallback, Undef);
++ Updater.Initialize(Phi->getType(), "");
++ Updater.AddAvailableValue(&Func->getEntryBlock(), Undef);
++ Updater.AddAvailableValue(To, Undef);
++
++ NearestCommonDominator Dominator(DT);
++ Dominator.addBlock(To, false);
++ for (BBValueVector::iterator VI = PI->second.begin(),
++ VE = PI->second.end(); VI != VE; ++VI) {
++
++ Updater.AddAvailableValue(VI->first, VI->second);
++ Dominator.addBlock(VI->first);
++ }
++
++ if (!Dominator.wasResultExplicitMentioned())
++ Updater.AddAvailableValue(Dominator.getResult(), Undef);
++
++ for (BBVector::iterator FI = From.begin(), FE = From.end();
++ FI != FE; ++FI) {
++
++ int Idx = Phi->getBasicBlockIndex(*FI);
++ assert(Idx != -1);
++ Phi->setIncomingValue(Idx, Updater.GetValueAtEndOfBlock(*FI));
++ }
++ }
++
++ DeletedPhis.erase(To);
++ }
++ assert(DeletedPhis.empty());
++}
++
++/// \brief Remove phi values from all successors and then remove the terminator.
++void AMDGPUStructurizeCFG::killTerminator(BasicBlock *BB) {
++ TerminatorInst *Term = BB->getTerminator();
++ if (!Term)
++ return;
++
++ for (succ_iterator SI = succ_begin(BB), SE = succ_end(BB);
++ SI != SE; ++SI) {
++
++ delPhiValues(BB, *SI);
++ }
++
++ Term->eraseFromParent();
++}
++
++/// \brief Let node exit(s) point to NewExit
++void AMDGPUStructurizeCFG::changeExit(RegionNode *Node, BasicBlock *NewExit,
++ bool IncludeDominator) {
++
++ if (Node->isSubRegion()) {
++ Region *SubRegion = Node->getNodeAs<Region>();
++ BasicBlock *OldExit = SubRegion->getExit();
++ BasicBlock *Dominator = 0;
++
++ // Find all the edges from the sub region to the exit
++ for (pred_iterator I = pred_begin(OldExit), E = pred_end(OldExit);
++ I != E;) {
++
++ BasicBlock *BB = *I++;
++ if (!SubRegion->contains(BB))
++ continue;
++
++ // Modify the edges to point to the new exit
++ delPhiValues(BB, OldExit);
++ BB->getTerminator()->replaceUsesOfWith(OldExit, NewExit);
++ addPhiValues(BB, NewExit);
++
++ // Find the new dominator (if requested)
++ if (IncludeDominator) {
++ if (!Dominator)
++ Dominator = BB;
++ else
++ Dominator = DT->findNearestCommonDominator(Dominator, BB);
++ }
+ }
+
-+ Phi->addIncoming(Updater.GetValueAtEndOfBlock(From), From);
++ // Change the dominator (if requested)
++ if (Dominator)
++ DT->changeImmediateDominator(NewExit, Dominator);
++
++ // Update the region info
++ SubRegion->replaceExit(NewExit);
++
++ } else {
++ BasicBlock *BB = Node->getNodeAs<BasicBlock>();
++ killTerminator(BB);
++ BranchInst::Create(NewExit, BB);
++ addPhiValues(BB, NewExit);
++ if (IncludeDominator)
++ DT->changeImmediateDominator(NewExit, BB);
+ }
-+ DeletedPhis.erase(To);
+}
+
+/// \brief Create a new flow node and update dominator tree and region info
-+BasicBlock *AMDGPUStructurizeCFG::getNextFlow(BasicBlock *Prev) {
++BasicBlock *AMDGPUStructurizeCFG::getNextFlow(BasicBlock *Dominator) {
+ LLVMContext &Context = Func->getContext();
+ BasicBlock *Insert = Order.empty() ? ParentRegion->getExit() :
+ Order.back()->getEntry();
+ BasicBlock *Flow = BasicBlock::Create(Context, FlowBlockName,
+ Func, Insert);
-+ DT->addNewBlock(Flow, Prev);
++ DT->addNewBlock(Flow, Dominator);
+ ParentRegion->getRegionInfo()->setRegionFor(Flow, ParentRegion);
-+ FlowsInserted.push_back(Flow);
+ return Flow;
+}
+
++/// \brief Create a new or reuse the previous node as flow node
++BasicBlock *AMDGPUStructurizeCFG::needPrefix(bool NeedEmpty) {
++
++ BasicBlock *Entry = PrevNode->getEntry();
++
++ if (!PrevNode->isSubRegion()) {
++ killTerminator(Entry);
++ if (!NeedEmpty || Entry->getFirstInsertionPt() == Entry->end())
++ return Entry;
++
++ }
++
++ // create a new flow node
++ BasicBlock *Flow = getNextFlow(Entry);
++
++ // and wire it up
++ changeExit(PrevNode, Flow, true);
++ PrevNode = ParentRegion->getBBNode(Flow);
++ return Flow;
++}
++
++/// \brief Returns the region exit if possible, otherwise just a new flow node
++BasicBlock *AMDGPUStructurizeCFG::needPostfix(BasicBlock *Flow,
++ bool ExitUseAllowed) {
++
++ if (Order.empty() && ExitUseAllowed) {
++ BasicBlock *Exit = ParentRegion->getExit();
++ DT->changeImmediateDominator(Exit, Flow);
++ addPhiValues(Flow, Exit);
++ return Exit;
++ }
++ return getNextFlow(Flow);
++}
++
++/// \brief Set the previous node
++void AMDGPUStructurizeCFG::setPrevNode(BasicBlock *BB) {
++ PrevNode = ParentRegion->contains(BB) ? ParentRegion->getBBNode(BB) : 0;
++}
++
++/// \brief Does BB dominate all the predicates of Node ?
++bool AMDGPUStructurizeCFG::dominatesPredicates(BasicBlock *BB, RegionNode *Node) {
++ BBPredicates &Preds = Predicates[Node->getEntry()];
++ for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
++ PI != PE; ++PI) {
++
++ if (!DT->dominates(BB, PI->first))
++ return false;
++ }
++ return true;
++}
++
+/// \brief Can we predict that this node will always be called?
-+bool AMDGPUStructurizeCFG::isPredictableTrue(BasicBlock *Prev,
-+ BasicBlock *Node) {
-+ BBPredicates &Preds = Predicates[Node];
++bool AMDGPUStructurizeCFG::isPredictableTrue(RegionNode *Node) {
++
++ BBPredicates &Preds = Predicates[Node->getEntry()];
+ bool Dominated = false;
+
++ // Regionentry is always true
++ if (PrevNode == 0)
++ return true;
++
+ for (BBPredicates::iterator I = Preds.begin(), E = Preds.end();
+ I != E; ++I) {
+
+ if (I->second != BoolTrue)
+ return false;
+
-+ if (!Dominated && DT->dominates(I->first, Prev))
++ if (!Dominated && DT->dominates(I->first, PrevNode->getEntry()))
+ Dominated = true;
+ }
++
++ // TODO: The dominator check is too strict
+ return Dominated;
+}
+
-+/// \brief Wire up the new control flow by inserting or updating the branch
-+/// instructions at node exits
-+BasicBlock *AMDGPUStructurizeCFG::wireFlowBlock(BasicBlock *Prev,
-+ RegionNode *Node) {
-+ BasicBlock *Entry = Node->getEntry();
-+
-+ if (LoopStart == Entry) {
-+ LoopStart = Prev;
-+ LoopPred[Prev] = BoolTrue;
-+ }
++/// Take one node from the order vector and wire it up
++void AMDGPUStructurizeCFG::wireFlow(bool ExitUseAllowed,
++ BasicBlock *LoopEnd) {
+
-+ // Wire it up temporary, skipChained may recurse into us
-+ BranchInst::Create(Entry, Prev);
-+ DT->changeImmediateDominator(Entry, Prev);
-+ addPhiValues(Prev, Entry);
++ RegionNode *Node = Order.pop_back_val();
++ Visited.insert(Node->getEntry());
+
-+ Node = skipChained(Node);
++ if (isPredictableTrue(Node)) {
++ // Just a linear flow
++ if (PrevNode) {
++ changeExit(PrevNode, Node->getEntry(), true);
++ }
++ PrevNode = Node;
+
-+ BasicBlock *Next = getNextFlow(Prev);
-+ if (!isPredictableTrue(Prev, Entry)) {
-+ // Let Prev point to entry and next block
-+ Prev->getTerminator()->eraseFromParent();
-+ BranchInst::Create(Entry, Next, BoolUndef, Prev);
+ } else {
-+ DT->changeImmediateDominator(Next, Entry);
-+ }
++ // Insert extra prefix node (or reuse last one)
++ BasicBlock *Flow = needPrefix(false);
+
-+ // Let node exit(s) point to next block
-+ if (Node->isSubRegion()) {
-+ Region *SubRegion = Node->getNodeAs<Region>();
-+ BasicBlock *Exit = SubRegion->getExit();
++ // Insert extra postfix node (or use exit instead)
++ BasicBlock *Entry = Node->getEntry();
++ BasicBlock *Next = needPostfix(Flow, ExitUseAllowed);
+
-+ // Find all the edges from the sub region to the exit
-+ BBVector ToDo;
-+ for (pred_iterator I = pred_begin(Exit), E = pred_end(Exit); I != E; ++I) {
-+ if (SubRegion->contains(*I))
-+ ToDo.push_back(*I);
-+ }
++ // let it point to entry and next block
++ Conditions.push_back(BranchInst::Create(Entry, Next, BoolUndef, Flow));
++ addPhiValues(Flow, Entry);
++ DT->changeImmediateDominator(Entry, Flow);
+
-+ // Modify the edges to point to the new flow block
-+ for (BBVector::iterator I = ToDo.begin(), E = ToDo.end(); I != E; ++I) {
-+ delPhiValues(*I, Exit);
-+ TerminatorInst *Term = (*I)->getTerminator();
-+ Term->replaceUsesOfWith(Exit, Next);
++ PrevNode = Node;
++ while (!Order.empty() && !Visited.count(LoopEnd) &&
++ dominatesPredicates(Entry, Order.back())) {
++ handleLoops(false, LoopEnd);
+ }
+
-+ // Update the region info
-+ SubRegion->replaceExit(Next);
-+
-+ } else {
-+ BasicBlock *BB = Node->getNodeAs<BasicBlock>();
-+ killTerminator(BB);
-+ BranchInst::Create(Next, BB);
-+
-+ if (BB == LoopEnd)
-+ LoopEnd = 0;
++ changeExit(PrevNode, Next, false);
++ setPrevNode(Next);
+ }
-+
-+ return Next;
+}
+
-+/// Destroy node order and visited map, build up flow order instead.
-+/// After this function control flow looks like it should be, but
-+/// branches only have undefined conditions.
-+void AMDGPUStructurizeCFG::createFlow() {
-+ DeletedPhis.clear();
-+
-+ BasicBlock *Prev = Order.pop_back_val()->getEntry();
-+ assert(Prev == ParentRegion->getEntry() && "Incorrect node order!");
-+ Visited.erase(Prev);
-+
-+ if (LoopStart == Prev) {
-+ // Loop starts at entry, split entry so that we can predicate it
-+ BasicBlock::iterator Insert = Prev->getFirstInsertionPt();
-+ BasicBlock *Split = Prev->splitBasicBlock(Insert, FlowBlockName);
-+ DT->addNewBlock(Split, Prev);
-+ ParentRegion->getRegionInfo()->setRegionFor(Split, ParentRegion);
-+ Predicates[Split] = Predicates[Prev];
-+ Order.push_back(ParentRegion->getBBNode(Split));
-+ LoopPred[Prev] = BoolTrue;
-+
-+ } else if (LoopStart == Order.back()->getEntry()) {
-+ // Loop starts behind entry, split entry so that we can jump to it
-+ Instruction *Term = Prev->getTerminator();
-+ BasicBlock *Split = Prev->splitBasicBlock(Term, FlowBlockName);
-+ DT->addNewBlock(Split, Prev);
-+ ParentRegion->getRegionInfo()->setRegionFor(Split, ParentRegion);
-+ Prev = Split;
-+ }
-+
-+ killTerminator(Prev);
-+ FlowsInserted.clear();
-+ FlowsInserted.push_back(Prev);
++void AMDGPUStructurizeCFG::handleLoops(bool ExitUseAllowed,
++ BasicBlock *LoopEnd) {
++ RegionNode *Node = Order.back();
++ BasicBlock *LoopStart = Node->getEntry();
+
-+ while (!Order.empty()) {
-+ RegionNode *Node = Order.pop_back_val();
-+ Visited.erase(Node->getEntry());
-+ Prev = wireFlowBlock(Prev, Node);
-+ if (LoopStart && !LoopEnd) {
-+ // Create an extra loop end node
-+ LoopEnd = Prev;
-+ Prev = getNextFlow(LoopEnd);
-+ BranchInst::Create(Prev, LoopStart, BoolUndef, LoopEnd);
-+ addPhiValues(LoopEnd, LoopStart);
-+ }
++ if (!Loops.count(LoopStart)) {
++ wireFlow(ExitUseAllowed, LoopEnd);
++ return;
+ }
+
-+ BasicBlock *Exit = ParentRegion->getExit();
-+ BranchInst::Create(Exit, Prev);
-+ addPhiValues(Prev, Exit);
-+ if (DT->dominates(ParentRegion->getEntry(), Exit))
-+ DT->changeImmediateDominator(Exit, Prev);
-+
-+ if (LoopStart && LoopEnd) {
-+ BBVector::iterator FI = std::find(FlowsInserted.begin(),
-+ FlowsInserted.end(),
-+ LoopStart);
-+ for (; *FI != LoopEnd; ++FI) {
-+ addPhiValues(*FI, (*FI)->getTerminator()->getSuccessor(0));
-+ }
++ if (!isPredictableTrue(Node))
++ LoopStart = needPrefix(true);
++
++ LoopEnd = Loops[Node->getEntry()];
++ wireFlow(false, LoopEnd);
++ while (!Visited.count(LoopEnd)) {
++ handleLoops(false, LoopEnd);
+ }
+
-+ assert(Order.empty());
-+ assert(Visited.empty());
-+ assert(DeletedPhis.empty());
++ // Create an extra loop end node
++ LoopEnd = needPrefix(false);
++ BasicBlock *Next = needPostfix(LoopEnd, ExitUseAllowed);
++ LoopConds.push_back(BranchInst::Create(Next, LoopStart,
++ BoolUndef, LoopEnd));
++ addPhiValues(LoopEnd, LoopStart);
++ setPrevNode(Next);
+}
+
-+/// \brief Insert the missing branch conditions
-+void AMDGPUStructurizeCFG::insertConditions() {
-+ SSAUpdater PhiInserter;
-+
-+ for (BBVector::iterator FI = FlowsInserted.begin(), FE = FlowsInserted.end();
-+ FI != FE; ++FI) {
-+
-+ BranchInst *Term = cast<BranchInst>((*FI)->getTerminator());
-+ if (Term->isUnconditional())
-+ continue;
++/// After this function control flow looks like it should be, but
++/// branches and PHI nodes only have undefined conditions.
++void AMDGPUStructurizeCFG::createFlow() {
+
-+ PhiInserter.Initialize(Boolean, "");
-+ PhiInserter.AddAvailableValue(&Func->getEntryBlock(), BoolFalse);
++ BasicBlock *Exit = ParentRegion->getExit();
++ bool EntryDominatesExit = DT->dominates(ParentRegion->getEntry(), Exit);
+
-+ BasicBlock *Succ = Term->getSuccessor(0);
-+ BBPredicates &Preds = (*FI == LoopEnd) ? LoopPred : Predicates[Succ];
-+ for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
-+ PI != PE; ++PI) {
++ DeletedPhis.clear();
++ AddedPhis.clear();
++ Conditions.clear();
++ LoopConds.clear();
+
-+ PhiInserter.AddAvailableValue(PI->first, PI->second);
-+ }
++ PrevNode = 0;
++ Visited.clear();
+
-+ Term->setCondition(PhiInserter.GetValueAtEndOfBlock(*FI));
++ while (!Order.empty()) {
++ handleLoops(EntryDominatesExit, 0);
+ }
++
++ if (PrevNode)
++ changeExit(PrevNode, Exit, EntryDominatesExit);
++ else
++ assert(EntryDominatesExit);
+}
+
+/// Handle a rare case where the disintegrated nodes instructions
@@ -3265,14 +3626,21 @@ index 0000000..22338b5
+ orderNodes();
+ collectInfos();
+ createFlow();
-+ insertConditions();
++ insertConditions(false);
++ insertConditions(true);
++ setPhiValues();
+ rebuildSSA();
+
++ // Cleanup
+ Order.clear();
+ Visited.clear();
-+ Predicates.clear();
+ DeletedPhis.clear();
-+ FlowsInserted.clear();
++ AddedPhis.clear();
++ Predicates.clear();
++ Conditions.clear();
++ Loops.clear();
++ LoopPreds.clear();
++ LoopConds.clear();
+
+ return true;
+}
@@ -3447,10 +3815,10 @@ index 0000000..cab7884
+#endif // AMDGPUSUBTARGET_H
diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp b/lib/Target/R600/AMDGPUTargetMachine.cpp
new file mode 100644
-index 0000000..d09dc2e
+index 0000000..e2f00be
--- /dev/null
+++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
-@@ -0,0 +1,142 @@
+@@ -0,0 +1,153 @@
+//===-- AMDGPUTargetMachine.cpp - TargetMachine for hw codegen targets-----===//
+//
+// The LLVM Compiler Infrastructure
@@ -3555,6 +3923,12 @@ index 0000000..d09dc2e
+bool AMDGPUPassConfig::addInstSelector() {
+ addPass(createAMDGPUPeepholeOpt(*TM));
+ addPass(createAMDGPUISelDag(getAMDGPUTargetMachine()));
++
++ const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>();
++ if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) {
++ // This callbacks this pass uses are not implemented yet on SI.
++ addPass(createAMDGPUIndirectAddressingPass(*TM));
++ }
+ return false;
+}
+
@@ -3569,6 +3943,11 @@ index 0000000..d09dc2e
+}
+
+bool AMDGPUPassConfig::addPostRegAlloc() {
++ const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>();
++
++ if (ST.device()->getGeneration() > AMDGPUDeviceInfo::HD6XXX) {
++ addPass(createSIInsertWaits(*TM));
++ }
+ return false;
+}
+
@@ -3585,8 +3964,8 @@ index 0000000..d09dc2e
+ addPass(createAMDGPUCFGStructurizerPass(*TM));
+ addPass(createR600ExpandSpecialInstrsPass(*TM));
+ addPass(&FinalizeMachineBundlesID);
++ addPass(createR600LowerConstCopy(*TM));
+ } else {
-+ addPass(createSILowerLiteralConstantsPass(*TM));
+ addPass(createSILowerControlFlowPass(*TM));
+ }
+
@@ -3595,7 +3974,7 @@ index 0000000..d09dc2e
+
2940 |
new file mode 100644 |
2941 |
-index 0000000..399e55c |
2942 |
+index 0000000..5a1dcf4 |
2943 |
--- /dev/null |
2944 |
+++ b/lib/Target/R600/AMDGPUTargetMachine.h |
2945 |
@@ -0,0 +1,70 @@ |
2946 |
@@ -3616,9 +3995,9 @@ index 0000000..399e55c |
2947 |
+#ifndef AMDGPU_TARGET_MACHINE_H |
2948 |
+#define AMDGPU_TARGET_MACHINE_H |
2949 |
+ |
2950 |
++#include "AMDGPUFrameLowering.h" |
2951 |
+#include "AMDGPUInstrInfo.h" |
2952 |
+#include "AMDGPUSubtarget.h" |
2953 |
-+#include "AMDILFrameLowering.h" |
2954 |
+#include "AMDILIntrinsicInfo.h" |
2955 |
+#include "R600ISelLowering.h" |
2956 |
+#include "llvm/ADT/OwningPtr.h" |
2957 |
@@ -3671,10 +4050,10 @@ index 0000000..399e55c |
2958 |
+#endif // AMDGPU_TARGET_MACHINE_H |
2959 |
diff --git a/lib/Target/R600/AMDIL.h b/lib/Target/R600/AMDIL.h |
2960 |
new file mode 100644 |
2961 |
-index 0000000..4e577dc |
2962 |
+index 0000000..b39fbdb |
2963 |
--- /dev/null |
2964 |
+++ b/lib/Target/R600/AMDIL.h |
2965 |
-@@ -0,0 +1,106 @@ |
2966 |
+@@ -0,0 +1,122 @@ |
2967 |
+//===-- AMDIL.h - Top-level interface for AMDIL representation --*- C++ -*-===// |
2968 |
+// |
2969 |
+// The LLVM Compiler Infrastructure |
2970 |
@@ -3767,14 +4146,30 @@ index 0000000..4e577dc |
2971 |
+enum AddressSpaces { |
2972 |
+ PRIVATE_ADDRESS = 0, ///< Address space for private memory. |
2973 |
+ GLOBAL_ADDRESS = 1, ///< Address space for global memory (RAT0, VTX0). |
2974 |
-+ CONSTANT_ADDRESS = 2, ///< Address space for constant memory. |
2975 |
++ CONSTANT_ADDRESS = 2, ///< Address space for constant memory |
2976 |
+ LOCAL_ADDRESS = 3, ///< Address space for local memory. |
2977 |
+ REGION_ADDRESS = 4, ///< Address space for region memory. |
2978 |
+ ADDRESS_NONE = 5, ///< Address space for unknown memory. |
2979 |
+ PARAM_D_ADDRESS = 6, ///< Address space for direct addressible parameter memory (CONST0) |
2980 |
+ PARAM_I_ADDRESS = 7, ///< Address space for indirect addressible parameter memory (VTX1) |
2981 |
+ USER_SGPR_ADDRESS = 8, ///< Address space for USER_SGPRS on SI |
2982 |
-+ LAST_ADDRESS = 9 |
2983 |
++ CONSTANT_BUFFER_0 = 9, |
2984 |
++ CONSTANT_BUFFER_1 = 10, |
2985 |
++ CONSTANT_BUFFER_2 = 11, |
2986 |
++ CONSTANT_BUFFER_3 = 12, |
2987 |
++ CONSTANT_BUFFER_4 = 13, |
2988 |
++ CONSTANT_BUFFER_5 = 14, |
2989 |
++ CONSTANT_BUFFER_6 = 15, |
2990 |
++ CONSTANT_BUFFER_7 = 16, |
2991 |
++ CONSTANT_BUFFER_8 = 17, |
2992 |
++ CONSTANT_BUFFER_9 = 18, |
2993 |
++ CONSTANT_BUFFER_10 = 19, |
2994 |
++ CONSTANT_BUFFER_11 = 20, |
2995 |
++ CONSTANT_BUFFER_12 = 21, |
2996 |
++ CONSTANT_BUFFER_13 = 22, |
2997 |
++ CONSTANT_BUFFER_14 = 23, |
2998 |
++ CONSTANT_BUFFER_15 = 24, |
2999 |
++ LAST_ADDRESS = 25 |
3000 |
+}; |
3001 |
+ |
3002 |
+} // namespace AMDGPUAS |
3003 |
@@ -4073,10 +4468,10 @@ index 0000000..c12cedc |
3004 |
+ |
3005 |
diff --git a/lib/Target/R600/AMDILCFGStructurizer.cpp b/lib/Target/R600/AMDILCFGStructurizer.cpp |
3006 |
new file mode 100644 |
3007 |
-index 0000000..9de97b6 |
3008 |
+index 0000000..568d281 |
3009 |
--- /dev/null |
3010 |
+++ b/lib/Target/R600/AMDILCFGStructurizer.cpp |
3011 |
-@@ -0,0 +1,3049 @@ |
3012 |
+@@ -0,0 +1,3045 @@ |
3013 |
+//===-- AMDILCFGStructurizer.cpp - CFG Structurizer -----------------------===// |
3014 |
+// |
3015 |
+// The LLVM Compiler Infrastructure |
3016 |
@@ -6101,9 +6496,7 @@ index 0000000..9de97b6 |
3017 |
+ CFGTraits::insertAssignInstrBefore(insertPos, passRep, immReg, 1); |
3018 |
+ InstrT *newInstr = |
3019 |
+ CFGTraits::insertInstrBefore(insertPos, AMDGPU::BRANCH_COND_i32, passRep); |
3020 |
-+ MachineInstrBuilder MIB(*funcRep, newInstr); |
3021 |
-+ MIB.addMBB(loopHeader); |
3022 |
-+ MIB.addReg(immReg, false); |
3023 |
++ MachineInstrBuilder(newInstr).addMBB(loopHeader).addReg(immReg, false); |
3024 |
+ |
3025 |
+ SHOWNEWINSTR(newInstr); |
3026 |
+ |
3027 |
@@ -6925,12 +7318,13 @@ index 0000000..9de97b6 |
3028 |
+ MachineInstr *oldInstr = &(*instrPos); |
3029 |
+ const TargetInstrInfo *tii = passRep->getTargetInstrInfo(); |
3030 |
+ MachineBasicBlock *blk = oldInstr->getParent(); |
3031 |
-+ MachineFunction *MF = blk->getParent(); |
3032 |
-+ MachineInstr *newInstr = MF->CreateMachineInstr(tii->get(newOpcode), DL); |
3033 |
++ MachineInstr *newInstr = |
3034 |
++ blk->getParent()->CreateMachineInstr(tii->get(newOpcode), |
3035 |
++ DL); |
3036 |
+ |
3037 |
+ blk->insert(instrPos, newInstr); |
3038 |
-+ MachineInstrBuilder MIB(*MF, newInstr); |
3039 |
-+ MIB.addReg(oldInstr->getOperand(1).getReg(), false); |
3040 |
++ MachineInstrBuilder(newInstr).addReg(oldInstr->getOperand(1).getReg(), |
3041 |
++ false); |
3042 |
+ |
3043 |
+ SHOWNEWINSTR(newInstr); |
3044 |
+ //erase later oldInstr->eraseFromParent(); |
3045 |
@@ -6943,13 +7337,13 @@ index 0000000..9de97b6 |
3046 |
+ RegiT regNum, |
3047 |
+ DebugLoc DL) { |
3048 |
+ const TargetInstrInfo *tii = passRep->getTargetInstrInfo(); |
3049 |
-+ MachineFunction *MF = blk->getParent(); |
3050 |
+ |
3051 |
-+ MachineInstr *newInstr = MF->CreateMachineInstr(tii->get(newOpcode), DL); |
3052 |
++ MachineInstr *newInstr = |
3053 |
++ blk->getParent()->CreateMachineInstr(tii->get(newOpcode), DL); |
3054 |
+ |
3055 |
+ //insert before |
3056 |
+ blk->insert(insertPos, newInstr); |
3057 |
-+ MachineInstrBuilder(*MF, newInstr).addReg(regNum, false); |
3058 |
++ MachineInstrBuilder(newInstr).addReg(regNum, false); |
3059 |
+ |
3060 |
+ SHOWNEWINSTR(newInstr); |
3061 |
+ } //insertCondBranchBefore |
3062 |
@@ -6959,12 +7353,11 @@ index 0000000..9de97b6 |
3063 |
+ AMDGPUCFGStructurizer *passRep, |
3064 |
+ RegiT regNum) { |
3065 |
+ const TargetInstrInfo *tii = passRep->getTargetInstrInfo(); |
3066 |
-+ MachineFunction *MF = blk->getParent(); |
3067 |
+ MachineInstr *newInstr = |
3068 |
-+ MF->CreateMachineInstr(tii->get(newOpcode), DebugLoc()); |
3069 |
++ blk->getParent()->CreateMachineInstr(tii->get(newOpcode), DebugLoc()); |
3070 |
+ |
3071 |
+ blk->push_back(newInstr); |
3072 |
-+ MachineInstrBuilder(*MF, newInstr).addReg(regNum, false); |
3073 |
++ MachineInstrBuilder(newInstr).addReg(regNum, false); |
3074 |
+ |
3075 |
+ SHOWNEWINSTR(newInstr); |
3076 |
+ } //insertCondBranchEnd |
3077 |
@@ -7009,14 +7402,12 @@ index 0000000..9de97b6 |
3078 |
+ RegiT src2Reg) { |
3079 |
+ const AMDGPUInstrInfo *tii = |
3080 |
+ static_cast<const AMDGPUInstrInfo *>(passRep->getTargetInstrInfo()); |
3081 |
-+ MachineFunction *MF = blk->getParent(); |
3082 |
+ MachineInstr *newInstr = |
3083 |
-+ MF->CreateMachineInstr(tii->get(tii->getIEQOpcode()), DebugLoc()); |
3084 |
++ blk->getParent()->CreateMachineInstr(tii->get(tii->getIEQOpcode()), DebugLoc()); |
3085 |
+ |
3086 |
-+ MachineInstrBuilder MIB(*MF, newInstr); |
3087 |
-+ MIB.addReg(dstReg, RegState::Define); //set target |
3088 |
-+ MIB.addReg(src1Reg); //set src value |
3089 |
-+ MIB.addReg(src2Reg); //set src value |
3090 |
++ MachineInstrBuilder(newInstr).addReg(dstReg, RegState::Define); //set target |
3091 |
++ MachineInstrBuilder(newInstr).addReg(src1Reg); //set src value |
3092 |
++ MachineInstrBuilder(newInstr).addReg(src2Reg); //set src value |
3093 |
+ |
3094 |
+ blk->insert(instrPos, newInstr); |
3095 |
+ SHOWNEWINSTR(newInstr); |
3096 |
@@ -7872,13 +8263,13 @@ index 0000000..6dc2deb |
3097 |
+ |
3098 |
+} // namespace llvm |
3099 |
+#endif // AMDILEVERGREENDEVICE_H |
3100 |
-diff --git a/lib/Target/R600/AMDILFrameLowering.cpp b/lib/Target/R600/AMDILFrameLowering.cpp |
3101 |
+diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp b/lib/Target/R600/AMDILISelDAGToDAG.cpp |
3102 |
new file mode 100644 |
3103 |
-index 0000000..9ad495a |
3104 |
+index 0000000..2e726e9 |
3105 |
--- /dev/null |
3106 |
-+++ b/lib/Target/R600/AMDILFrameLowering.cpp |
3107 |
-@@ -0,0 +1,47 @@ |
3108 |
-+//===----------------------- AMDILFrameLowering.cpp -----------------*- C++ -*-===// |
3109 |
++++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp |
3110 |
+@@ -0,0 +1,577 @@ |
3111 |
++//===-- AMDILISelDAGToDAG.cpp - A dag to dag inst selector for AMDIL ------===// |
3112 |
+// |
3113 |
+// The LLVM Compiler Infrastructure |
3114 |
+// |
3115 |
@@ -7888,119 +8279,21 @@ index 0000000..9ad495a |
3116 |
+//==-----------------------------------------------------------------------===// |
3117 |
+// |
3118 |
+/// \file |
3119 |
-+/// \brief Interface to describe a layout of a stack frame on a AMDGPU target |
3120 |
-+/// machine. |
3121 |
++/// \brief Defines an instruction selector for the AMDGPU target. |
3122 |
+// |
3123 |
+//===----------------------------------------------------------------------===// |
3124 |
-+#include "AMDILFrameLowering.h" |
3125 |
-+#include "llvm/CodeGen/MachineFrameInfo.h" |
3126 |
-+ |
3127 |
-+using namespace llvm; |
3128 |
-+AMDGPUFrameLowering::AMDGPUFrameLowering(StackDirection D, unsigned StackAl, |
3129 |
-+ int LAO, unsigned TransAl) |
3130 |
-+ : TargetFrameLowering(D, StackAl, LAO, TransAl) { |
3131 |
-+} |
3132 |
-+ |
3133 |
-+AMDGPUFrameLowering::~AMDGPUFrameLowering() { |
3134 |
-+} |
3135 |
-+ |
3136 |
-+int AMDGPUFrameLowering::getFrameIndexOffset(const MachineFunction &MF, |
3137 |
-+ int FI) const { |
3138 |
-+ const MachineFrameInfo *MFI = MF.getFrameInfo(); |
3139 |
-+ return MFI->getObjectOffset(FI); |
3140 |
-+} |
3141 |
-+ |
3142 |
-+const TargetFrameLowering::SpillSlot * |
3143 |
-+AMDGPUFrameLowering::getCalleeSavedSpillSlots(unsigned &NumEntries) const { |
3144 |
-+ NumEntries = 0; |
3145 |
-+ return 0; |
3146 |
-+} |
3147 |
-+void |
3148 |
-+AMDGPUFrameLowering::emitPrologue(MachineFunction &MF) const { |
3149 |
-+} |
3150 |
-+void |
3151 |
-+AMDGPUFrameLowering::emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const { |
3152 |
-+} |
3153 |
-+bool |
3154 |
-+AMDGPUFrameLowering::hasFP(const MachineFunction &MF) const { |
3155 |
-+ return false; |
3156 |
-+} |
3157 |
-diff --git a/lib/Target/R600/AMDILFrameLowering.h b/lib/Target/R600/AMDILFrameLowering.h |
3158 |
-new file mode 100644 |
3159 |
-index 0000000..51337c3 |
3160 |
---- /dev/null |
3161 |
-+++ b/lib/Target/R600/AMDILFrameLowering.h |
3162 |
-@@ -0,0 +1,40 @@ |
3163 |
-+//===--------------------- AMDILFrameLowering.h -----------------*- C++ -*-===// |
3164 |
-+// |
3165 |
-+// The LLVM Compiler Infrastructure |
3166 |
-+// |
3167 |
-+// This file is distributed under the University of Illinois Open Source |
3168 |
-+// License. See LICENSE.TXT for details. |
3169 |
-+// |
3170 |
-+//===----------------------------------------------------------------------===// |
3171 |
-+// |
3172 |
-+/// \file |
3173 |
-+/// \brief Interface to describe a layout of a stack frame on a AMDIL target |
3174 |
-+/// machine. |
3175 |
-+// |
3176 |
-+//===----------------------------------------------------------------------===// |
3177 |
-+#ifndef AMDILFRAME_LOWERING_H |
3178 |
-+#define AMDILFRAME_LOWERING_H |
3179 |
-+ |
3180 |
-+#include "llvm/CodeGen/MachineFunction.h" |
3181 |
-+#include "llvm/Target/TargetFrameLowering.h" |
3182 |
-+ |
3183 |
-+namespace llvm { |
3184 |
-+ |
3185 |
-+/// \brief Information about the stack frame layout on the AMDGPU targets. |
3186 |
-+/// |
3187 |
-+/// It holds the direction of the stack growth, the known stack alignment on |
3188 |
-+/// entry to each function, and the offset to the locals area. |
3189 |
-+/// See TargetFrameInfo for more comments. |
3190 |
-+class AMDGPUFrameLowering : public TargetFrameLowering { |
3191 |
-+public: |
3192 |
-+ AMDGPUFrameLowering(StackDirection D, unsigned StackAl, int LAO, |
3193 |
-+ unsigned TransAl = 1); |
3194 |
-+ virtual ~AMDGPUFrameLowering(); |
3195 |
-+ virtual int getFrameIndexOffset(const MachineFunction &MF, int FI) const; |
3196 |
-+ virtual const SpillSlot *getCalleeSavedSpillSlots(unsigned &NumEntries) const; |
3197 |
-+ virtual void emitPrologue(MachineFunction &MF) const; |
3198 |
-+ virtual void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const; |
3199 |
-+ virtual bool hasFP(const MachineFunction &MF) const; |
3200 |
-+}; |
3201 |
-+} // namespace llvm |
3202 |
-+#endif // AMDILFRAME_LOWERING_H |
3203 |
-diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp b/lib/Target/R600/AMDILISelDAGToDAG.cpp |
3204 |
-new file mode 100644 |
3205 |
-index 0000000..d15ed39 |
3206 |
---- /dev/null |
3207 |
-+++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp |
3208 |
-@@ -0,0 +1,485 @@ |
3209 |
-+//===-- AMDILISelDAGToDAG.cpp - A dag to dag inst selector for AMDIL ------===// |
3210 |
-+// |
3211 |
-+// The LLVM Compiler Infrastructure |
3212 |
-+// |
3213 |
-+// This file is distributed under the University of Illinois Open Source |
3214 |
-+// License. See LICENSE.TXT for details. |
3215 |
-+// |
3216 |
-+//==-----------------------------------------------------------------------===// |
3217 |
-+// |
3218 |
-+/// \file |
3219 |
-+/// \brief Defines an instruction selector for the AMDGPU target. |
3220 |
-+// |
3221 |
-+//===----------------------------------------------------------------------===// |
3222 |
-+#include "AMDGPUInstrInfo.h" |
3223 |
-+#include "AMDGPUISelLowering.h" // For AMDGPUISD |
3224 |
-+#include "AMDGPURegisterInfo.h" |
3225 |
-+#include "AMDILDevices.h" |
3226 |
-+#include "R600InstrInfo.h" |
3227 |
-+#include "llvm/ADT/ValueMap.h" |
3228 |
-+#include "llvm/CodeGen/PseudoSourceValue.h" |
3229 |
-+#include "llvm/CodeGen/SelectionDAGISel.h" |
3230 |
-+#include "llvm/Support/Compiler.h" |
3231 |
-+#include <list> |
3232 |
-+#include <queue> |
3233 |
++#include "AMDGPUInstrInfo.h" |
3234 |
++#include "AMDGPUISelLowering.h" // For AMDGPUISD |
3235 |
++#include "AMDGPURegisterInfo.h" |
3236 |
++#include "AMDILDevices.h" |
3237 |
++#include "R600InstrInfo.h" |
3238 |
++#include "llvm/ADT/ValueMap.h" |
3239 |
++#include "llvm/CodeGen/PseudoSourceValue.h" |
3240 |
++#include "llvm/CodeGen/SelectionDAGISel.h" |
3241 |
++#include "llvm/Support/Compiler.h" |
3242 |
++#include "llvm/CodeGen/SelectionDAG.h" |
3243 |
++#include <list> |
3244 |
++#include <queue> |
3245 |
+ |
3246 |
+using namespace llvm; |
3247 |
+ |
3248 |
@@ -8024,6 +8317,7 @@ index 0000000..d15ed39 |
3249 |
+ |
3250 |
+private: |
3251 |
+ inline SDValue getSmallIPtrImm(unsigned Imm); |
3252 |
++ bool FoldOperands(unsigned, const R600InstrInfo *, std::vector<SDValue> &); |
3253 |
+ |
3254 |
+ // Complex pattern selectors |
3255 |
+ bool SelectADDRParam(SDValue Addr, SDValue& R1, SDValue& R2); |
3256 |
@@ -8046,9 +8340,11 @@ index 0000000..d15ed39 |
3257 |
+ static bool isLocalLoad(const LoadSDNode *N); |
3258 |
+ static bool isRegionLoad(const LoadSDNode *N); |
3259 |
+ |
3260 |
-+ bool SelectADDR8BitOffset(SDValue Addr, SDValue& Base, SDValue& Offset); |
3261 |
-+ bool SelectADDRReg(SDValue Addr, SDValue& Base, SDValue& Offset); |
3262 |
++ bool SelectGlobalValueConstantOffset(SDValue Addr, SDValue& IntPtr); |
3263 |
++ bool SelectGlobalValueVariableOffset(SDValue Addr, |
3264 |
++ SDValue &BaseReg, SDValue& Offset); |
3265 |
+ bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base, SDValue &Offset); |
3266 |
++ bool SelectADDRIndirect(SDValue Addr, SDValue &Base, SDValue &Offset); |
3267 |
+ |
3268 |
+ // Include the pieces autogenerated from the target description. |
3269 |
+#include "AMDGPUGenDAGISel.inc" |
3270 |
@@ -8135,16 +8431,6 @@ index 0000000..d15ed39 |
3271 |
+ } |
3272 |
+ switch (Opc) { |
3273 |
+ default: break; |
3274 |
-+ case ISD::FrameIndex: { |
3275 |
-+ if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(N)) { |
3276 |
-+ unsigned int FI = FIN->getIndex(); |
3277 |
-+ EVT OpVT = N->getValueType(0); |
3278 |
-+ unsigned int NewOpc = AMDGPU::COPY; |
3279 |
-+ SDValue TFI = CurDAG->getTargetFrameIndex(FI, MVT::i32); |
3280 |
-+ return CurDAG->SelectNodeTo(N, NewOpc, OpVT, TFI); |
3281 |
-+ } |
3282 |
-+ break; |
3283 |
-+ } |
3284 |
+ case ISD::ConstantFP: |
3285 |
+ case ISD::Constant: { |
3286 |
+ const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>(); |
3287 |
@@ -8203,7 +8489,9 @@ index 0000000..d15ed39 |
3288 |
+ continue; |
3289 |
+ } |
3290 |
+ } else { |
3291 |
-+ if (!TII->isALUInstr(Use->getMachineOpcode())) { |
3292 |
++ if (!TII->isALUInstr(Use->getMachineOpcode()) || |
3293 |
++ (TII->get(Use->getMachineOpcode()).TSFlags & |
3294 |
++ R600_InstFlag::VECTOR)) { |
3295 |
+ continue; |
3296 |
+ } |
3297 |
+ |
3298 |
@@ -8238,7 +8526,116 @@ index 0000000..d15ed39 |
3299 |
+ break; |
3300 |
+ } |
3301 |
+ } |
3302 |
-+ return SelectCode(N); |
3303 |
++ SDNode *Result = SelectCode(N); |
3304 |
++ |
3305 |
++ // Fold operands of selected node |
3306 |
++ |
3307 |
++ const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>(); |
3308 |
++ if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) { |
3309 |
++ const R600InstrInfo *TII = |
3310 |
++ static_cast<const R600InstrInfo*>(TM.getInstrInfo()); |
3311 |
++ if (Result && Result->isMachineOpcode() && |
3312 |
++ !(TII->get(Result->getMachineOpcode()).TSFlags & R600_InstFlag::VECTOR) |
3313 |
++ && TII->isALUInstr(Result->getMachineOpcode())) { |
3314 |
++ // Fold FNEG/FABS/CONST_ADDRESS |
3315 |
++ // TODO: Isel can generate multiple MachineInst, we need to recursively |
3316 |
++ // parse Result |
3317 |
++ bool IsModified = false; |
3318 |
++ do { |
3319 |
++ std::vector<SDValue> Ops; |
3320 |
++ for(SDNode::op_iterator I = Result->op_begin(), E = Result->op_end(); |
3321 |
++ I != E; ++I) |
3322 |
++ Ops.push_back(*I); |
3323 |
++ IsModified = FoldOperands(Result->getMachineOpcode(), TII, Ops); |
3324 |
++ if (IsModified) { |
3325 |
++ Result = CurDAG->UpdateNodeOperands(Result, Ops.data(), Ops.size()); |
3326 |
++ } |
3327 |
++ } while (IsModified); |
3328 |
++ |
3329 |
++ // If node has a single use which is CLAMP_R600, folds it |
3330 |
++ if (Result->hasOneUse() && Result->isMachineOpcode()) { |
3331 |
++ SDNode *PotentialClamp = *Result->use_begin(); |
3332 |
++ if (PotentialClamp->isMachineOpcode() && |
3333 |
++ PotentialClamp->getMachineOpcode() == AMDGPU::CLAMP_R600) { |
3334 |
++ unsigned ClampIdx = |
3335 |
++ TII->getOperandIdx(Result->getMachineOpcode(), R600Operands::CLAMP); |
3336 |
++ std::vector<SDValue> Ops; |
3337 |
++ unsigned NumOp = Result->getNumOperands(); |
3338 |
++ for (unsigned i = 0; i < NumOp; ++i) { |
3339 |
++ Ops.push_back(Result->getOperand(i)); |
3340 |
++ } |
3341 |
++ Ops[ClampIdx - 1] = CurDAG->getTargetConstant(1, MVT::i32); |
3342 |
++ Result = CurDAG->SelectNodeTo(PotentialClamp, |
3343 |
++ Result->getMachineOpcode(), PotentialClamp->getVTList(), |
3344 |
++ Ops.data(), NumOp); |
3345 |
++ } |
3346 |
++ } |
3347 |
++ } |
3348 |
++ } |
3349 |
++ |
3350 |
++ return Result; |
3351 |
++} |
3352 |
++ |
3353 |
++bool AMDGPUDAGToDAGISel::FoldOperands(unsigned Opcode, |
3354 |
++ const R600InstrInfo *TII, std::vector<SDValue> &Ops) { |
3355 |
++ int OperandIdx[] = { |
3356 |
++ TII->getOperandIdx(Opcode, R600Operands::SRC0), |
3357 |
++ TII->getOperandIdx(Opcode, R600Operands::SRC1), |
3358 |
++ TII->getOperandIdx(Opcode, R600Operands::SRC2) |
3359 |
++ }; |
3360 |
++ int SelIdx[] = { |
3361 |
++ TII->getOperandIdx(Opcode, R600Operands::SRC0_SEL), |
3362 |
++ TII->getOperandIdx(Opcode, R600Operands::SRC1_SEL), |
3363 |
++ TII->getOperandIdx(Opcode, R600Operands::SRC2_SEL) |
3364 |
++ }; |
3365 |
++ int NegIdx[] = { |
3366 |
++ TII->getOperandIdx(Opcode, R600Operands::SRC0_NEG), |
3367 |
++ TII->getOperandIdx(Opcode, R600Operands::SRC1_NEG), |
3368 |
++ TII->getOperandIdx(Opcode, R600Operands::SRC2_NEG) |
3369 |
++ }; |
3370 |
++ int AbsIdx[] = { |
3371 |
++ TII->getOperandIdx(Opcode, R600Operands::SRC0_ABS), |
3372 |
++ TII->getOperandIdx(Opcode, R600Operands::SRC1_ABS), |
3373 |
++ -1 |
3374 |
++ }; |
3375 |
++ |
3376 |
++ for (unsigned i = 0; i < 3; i++) { |
3377 |
++ if (OperandIdx[i] < 0) |
3378 |
++ return false; |
3379 |
++ SDValue Operand = Ops[OperandIdx[i] - 1]; |
3380 |
++ switch (Operand.getOpcode()) { |
3381 |
++ case AMDGPUISD::CONST_ADDRESS: { |
3382 |
++ if (i == 2) |
3383 |
++ break; |
3384 |
++ SDValue CstOffset; |
3385 |
++ if (!Operand.getValueType().isVector() && |
3386 |
++ SelectGlobalValueConstantOffset(Operand.getOperand(0), CstOffset)) { |
3387 |
++ Ops[OperandIdx[i] - 1] = CurDAG->getRegister(AMDGPU::ALU_CONST, MVT::f32); |
3388 |
++ Ops[SelIdx[i] - 1] = CstOffset; |
3389 |
++ return true; |
3390 |
++ } |
3391 |
++ } |
3392 |
++ break; |
3393 |
++ case ISD::FNEG: |
3394 |
++ if (NegIdx[i] < 0) |
3395 |
++ break; |
3396 |
++ Ops[OperandIdx[i] - 1] = Operand.getOperand(0); |
3397 |
++ Ops[NegIdx[i] - 1] = CurDAG->getTargetConstant(1, MVT::i32); |
3398 |
++ return true; |
3399 |
++ case ISD::FABS: |
3400 |
++ if (AbsIdx[i] < 0) |
3401 |
++ break; |
3402 |
++ Ops[OperandIdx[i] - 1] = Operand.getOperand(0); |
3403 |
++ Ops[AbsIdx[i] - 1] = CurDAG->getTargetConstant(1, MVT::i32); |
3404 |
++ return true; |
3405 |
++ case ISD::BITCAST: |
3406 |
++ Ops[OperandIdx[i] - 1] = Operand.getOperand(0); |
3407 |
++ return true; |
3408 |
++ default: |
3409 |
++ break; |
3410 |
++ } |
3411 |
++ } |
3412 |
++ return false; |
3413 |
+} |
3414 |
+ |
3415 |
+bool AMDGPUDAGToDAGISel::checkType(const Value *ptr, unsigned int addrspace) { |
3416 |
@@ -8385,41 +8782,23 @@ index 0000000..d15ed39 |
3417 |
+ |
3418 |
+///==== AMDGPU Functions ====/// |
3419 |
+ |
3420 |
-+bool AMDGPUDAGToDAGISel::SelectADDR8BitOffset(SDValue Addr, SDValue& Base, |
3421 |
-+ SDValue& Offset) { |
3422 |
-+ if (Addr.getOpcode() == ISD::TargetExternalSymbol || |
3423 |
-+ Addr.getOpcode() == ISD::TargetGlobalAddress) { |
3424 |
-+ return false; |
3425 |
++bool AMDGPUDAGToDAGISel::SelectGlobalValueConstantOffset(SDValue Addr, |
3426 |
++ SDValue& IntPtr) { |
3427 |
++ if (ConstantSDNode *Cst = dyn_cast<ConstantSDNode>(Addr)) { |
3428 |
++ IntPtr = CurDAG->getIntPtrConstant(Cst->getZExtValue() / 4, true); |
3429 |
++ return true; |
3430 |
+ } |
3431 |
++ return false; |
3432 |
++} |
3433 |
+ |
3434 |
-+ |
3435 |
-+ if (Addr.getOpcode() == ISD::ADD) { |
3436 |
-+ bool Match = false; |
3437 |
-+ |
3438 |
-+ // Find the base ptr and the offset |
3439 |
-+ for (unsigned i = 0; i < Addr.getNumOperands(); i++) { |
3440 |
-+ SDValue Arg = Addr.getOperand(i); |
3441 |
-+ ConstantSDNode * OffsetNode = dyn_cast<ConstantSDNode>(Arg); |
3442 |
-+ // This arg isn't a constant so it must be the base PTR. |
3443 |
-+ if (!OffsetNode) { |
3444 |
-+ Base = Addr.getOperand(i); |
3445 |
-+ continue; |
3446 |
-+ } |
3447 |
-+ // Check if the constant argument fits in 8-bits. The offset is in bytes |
3448 |
-+ // so we need to convert it to dwords. |
3449 |
-+ if (isUInt<8>(OffsetNode->getZExtValue() >> 2)) { |
3450 |
-+ Match = true; |
3451 |
-+ Offset = CurDAG->getTargetConstant(OffsetNode->getZExtValue() >> 2, |
3452 |
-+ MVT::i32); |
3453 |
-+ } |
3454 |
-+ } |
3455 |
-+ return Match; |
3456 |
++bool AMDGPUDAGToDAGISel::SelectGlobalValueVariableOffset(SDValue Addr, |
3457 |
++ SDValue& BaseReg, SDValue &Offset) { |
3458 |
++ if (!dyn_cast<ConstantSDNode>(Addr)) { |
3459 |
++ BaseReg = Addr; |
3460 |
++ Offset = CurDAG->getIntPtrConstant(0, true); |
3461 |
++ return true; |
3462 |
+ } |
3463 |
-+ |
3464 |
-+ // Default case, no offset |
3465 |
-+ Base = Addr; |
3466 |
-+ Offset = CurDAG->getTargetConstant(0, MVT::i32); |
3467 |
-+ return true; |
3468 |
++ return false; |
3469 |
+} |
3470 |
+ |
3471 |
+bool AMDGPUDAGToDAGISel::SelectADDRVTX_READ(SDValue Addr, SDValue &Base, |
3472 |
@@ -8449,16 +8828,21 @@ index 0000000..d15ed39 |
3473 |
+ return true; |
3474 |
+} |
3475 |
+ |
3476 |
-+bool AMDGPUDAGToDAGISel::SelectADDRReg(SDValue Addr, SDValue& Base, |
3477 |
-+ SDValue& Offset) { |
3478 |
-+ if (Addr.getOpcode() == ISD::TargetExternalSymbol || |
3479 |
-+ Addr.getOpcode() == ISD::TargetGlobalAddress || |
3480 |
-+ Addr.getOpcode() != ISD::ADD) { |
3481 |
-+ return false; |
3482 |
-+ } |
3483 |
++bool AMDGPUDAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base, |
3484 |
++ SDValue &Offset) { |
3485 |
++ ConstantSDNode *C; |
3486 |
+ |
3487 |
-+ Base = Addr.getOperand(0); |
3488 |
-+ Offset = Addr.getOperand(1); |
3489 |
++ if ((C = dyn_cast<ConstantSDNode>(Addr))) { |
3490 |
++ Base = CurDAG->getRegister(AMDGPU::INDIRECT_BASE_ADDR, MVT::i32); |
3491 |
++ Offset = CurDAG->getTargetConstant(C->getZExtValue(), MVT::i32); |
3492 |
++ } else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) && |
3493 |
++ (C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) { |
3494 |
++ Base = Addr.getOperand(0); |
3495 |
++ Offset = CurDAG->getTargetConstant(C->getZExtValue(), MVT::i32); |
3496 |
++ } else { |
3497 |
++ Base = Addr; |
3498 |
++ Offset = CurDAG->getTargetConstant(0, MVT::i32); |
3499 |
++ } |
3500 |
+ |
3501 |
+ return true; |
3502 |
+} |
3503 |
@@ -9857,10 +10241,10 @@ index 0000000..bc7df37 |
3504 |
+#endif // AMDILNIDEVICE_H |
3505 |
diff --git a/lib/Target/R600/AMDILPeepholeOptimizer.cpp b/lib/Target/R600/AMDILPeepholeOptimizer.cpp |
3506 |
new file mode 100644 |
3507 |
-index 0000000..4a748b8 |
3508 |
+index 0000000..57317ac |
3509 |
--- /dev/null |
3510 |
+++ b/lib/Target/R600/AMDILPeepholeOptimizer.cpp |
3511 |
-@@ -0,0 +1,1215 @@ |
3512 |
+@@ -0,0 +1,1256 @@ |
3513 |
+//===-- AMDILPeepholeOptimizer.cpp - AMDGPU Peephole optimizations ---------===// |
3514 |
+// |
3515 |
+// The LLVM Compiler Infrastructure |
3516 |
@@ -10409,14 +10793,51 @@ index 0000000..4a748b8 |
3517 |
+ lhsMaskOffset = lhsMaskVal ? CountTrailingZeros_32(lhsMaskVal) : lhsShiftVal; |
3518 |
+ rhsMaskOffset = rhsMaskVal ? CountTrailingZeros_32(rhsMaskVal) : rhsShiftVal; |
3519 |
+ // TODO: Handle the case of A & B | D & ~B(i.e. inverted masks). |
3520 |
++ if (mDebug) { |
3521 |
++ dbgs() << "Found pattern: \'((A" << (LHSMask ? " & B)" : ")"); |
3522 |
++ dbgs() << (LHSShift ? " << C)" : ")") << " | ((D" ; |
3523 |
++ dbgs() << (RHSMask ? " & E)" : ")"); |
3524 |
++ dbgs() << (RHSShift ? " << F)\'\n" : ")\'\n"); |
3525 |
++ dbgs() << "A = LHSSrc\t\tD = RHSSrc \n"; |
3526 |
++ dbgs() << "B = " << lhsMaskVal << "\t\tE = " << rhsMaskVal << "\n"; |
3527 |
++ dbgs() << "C = " << lhsShiftVal << "\t\tF = " << rhsShiftVal << "\n"; |
3528 |
++ dbgs() << "width(B) = " << lhsMaskWidth; |
3529 |
++ dbgs() << "\twidth(E) = " << rhsMaskWidth << "\n"; |
3530 |
++ dbgs() << "offset(B) = " << lhsMaskOffset; |
3531 |
++ dbgs() << "\toffset(E) = " << rhsMaskOffset << "\n"; |
3532 |
++ dbgs() << "Constraints: \n"; |
3533 |
++ dbgs() << "\t(1) B ^ E == 0\n"; |
3534 |
++ dbgs() << "\t(2-LHS) B is a mask\n"; |
3535 |
++ dbgs() << "\t(2-LHS) E is a mask\n"; |
3536 |
++ dbgs() << "\t(3-LHS) (offset(B)) >= (width(E) + offset(E))\n"; |
3537 |
++ dbgs() << "\t(3-RHS) (offset(E)) >= (width(B) + offset(B))\n"; |
3538 |
++ } |
3539 |
+ if ((lhsMaskVal || rhsMaskVal) && !(lhsMaskVal ^ rhsMaskVal)) { |
3540 |
++ if (mDebug) { |
3541 |
++ dbgs() << lhsMaskVal << " ^ " << rhsMaskVal; |
3542 |
++ dbgs() << " = " << (lhsMaskVal ^ rhsMaskVal) << "\n"; |
3543 |
++ dbgs() << "Failed constraint 1!\n"; |
3544 |
++ } |
3545 |
+ return false; |
3546 |
+ } |
3547 |
++ if (mDebug) { |
3548 |
++ dbgs() << "LHS = " << lhsMaskOffset << ""; |
3549 |
++ dbgs() << " >= (" << rhsMaskWidth << " + " << rhsMaskOffset << ") = "; |
3550 |
++ dbgs() << (lhsMaskOffset >= (rhsMaskWidth + rhsMaskOffset)); |
3551 |
++ dbgs() << "\nRHS = " << rhsMaskOffset << ""; |
3552 |
++ dbgs() << " >= (" << lhsMaskWidth << " + " << lhsMaskOffset << ") = "; |
3553 |
++ dbgs() << (rhsMaskOffset >= (lhsMaskWidth + lhsMaskOffset)); |
3554 |
++ dbgs() << "\n"; |
3555 |
++ } |
3556 |
+ if (lhsMaskOffset >= (rhsMaskWidth + rhsMaskOffset)) { |
3557 |
+ offset = ConstantInt::get(aType, lhsMaskOffset, false); |
3558 |
+ width = ConstantInt::get(aType, lhsMaskWidth, false); |
3559 |
+ RHSSrc = RHS; |
3560 |
+ if (!isMask_32(lhsMaskVal) && !isShiftedMask_32(lhsMaskVal)) { |
3561 |
++ if (mDebug) { |
3562 |
++ dbgs() << "Value is not a Mask: " << lhsMaskVal << "\n"; |
3563 |
++ dbgs() << "Failed constraint 2!\n"; |
3564 |
++ } |
3565 |
+ return false; |
3566 |
+ } |
3567 |
+ if (!LHSShift) { |
3568 |
@@ -10435,6 +10856,10 @@ index 0000000..4a748b8 |
3569 |
+ LHSSrc = RHSSrc; |
3570 |
+ RHSSrc = LHS; |
3571 |
+ if (!isMask_32(rhsMaskVal) && !isShiftedMask_32(rhsMaskVal)) { |
3572 |
++ if (mDebug) { |
3573 |
++ dbgs() << "Non-Mask: " << rhsMaskVal << "\n"; |
3574 |
++ dbgs() << "Failed constraint 2!\n"; |
3575 |
++ } |
3576 |
+ return false; |
3577 |
+ } |
3578 |
+ if (!RHSShift) { |
3579 |
@@ -11287,10 +11712,10 @@ index 0000000..5b2cb25 |
3580 |
+#endif // AMDILSIDEVICE_H |
3581 |
diff --git a/lib/Target/R600/CMakeLists.txt b/lib/Target/R600/CMakeLists.txt |
3582 |
new file mode 100644 |
3583 |
-index 0000000..ce0b56b |
3584 |
+index 0000000..8ef9f8c |
3585 |
--- /dev/null |
3586 |
+++ b/lib/Target/R600/CMakeLists.txt |
3587 |
-@@ -0,0 +1,55 @@ |
3588 |
+@@ -0,0 +1,56 @@ |
3589 |
+set(LLVM_TARGET_DEFINITIONS AMDGPU.td) |
3590 |
+ |
3591 |
+tablegen(LLVM AMDGPUGenRegisterInfo.inc -gen-register-info) |
3592 |
@@ -11304,7 +11729,7 @@ index 0000000..ce0b56b |
3593 |
+tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer) |
3594 |
+add_public_tablegen_target(AMDGPUCommonTableGen) |
3595 |
+ |
3596 |
-+add_llvm_target(R600CodeGen |
3597 |
++add_llvm_target(AMDGPUCodeGen |
3598 |
+ AMDIL7XXDevice.cpp |
3599 |
+ AMDILCFGStructurizer.cpp |
3600 |
+ AMDILDevice.cpp |
3601 |
@@ -11318,9 +11743,9 @@ index 0000000..ce0b56b |
3602 |
+ AMDILPeepholeOptimizer.cpp |
3603 |
+ AMDILSIDevice.cpp |
3604 |
+ AMDGPUAsmPrinter.cpp |
3605 |
++ AMDGPUIndirectAddressing.cpp |
3606 |
+ AMDGPUMCInstLower.cpp |
3607 |
+ AMDGPUSubtarget.cpp |
3608 |
-+ AMDGPUStructurizeCFG.cpp |
3609 |
+ AMDGPUTargetMachine.cpp |
3610 |
+ AMDGPUISelLowering.cpp |
3611 |
+ AMDGPUConvertToISA.cpp |
3612 |
@@ -11329,9 +11754,9 @@ index 0000000..ce0b56b |
3613 |
+ R600ExpandSpecialInstrs.cpp |
3614 |
+ R600InstrInfo.cpp |
3615 |
+ R600ISelLowering.cpp |
3616 |
++ R600LowerConstCopy.cpp |
3617 |
+ R600MachineFunctionInfo.cpp |
3618 |
+ R600RegisterInfo.cpp |
3619 |
-+ SIAnnotateControlFlow.cpp |
3620 |
+ SIAssignInterpRegs.cpp |
3621 |
+ SIInstrInfo.cpp |
3622 |
+ SIISelLowering.cpp |
3623 |
@@ -11339,6 +11764,7 @@ index 0000000..ce0b56b |
3624 |
+ SILowerControlFlow.cpp |
3625 |
+ SIMachineFunctionInfo.cpp |
3626 |
+ SIRegisterInfo.cpp |
3627 |
++ SIFixSGPRLiveness.cpp |
3628 |
+ ) |
3629 |
+ |
3630 |
+add_dependencies(LLVMR600CodeGen intrinsics_gen) |
3631 |
@@ -11348,10 +11774,10 @@ index 0000000..ce0b56b |
3632 |
+add_subdirectory(MCTargetDesc) |
3633 |
diff --git a/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp |
3634 |
new file mode 100644 |
3635 |
-index 0000000..e6c550b |
3636 |
+index 0000000..d6450a0 |
3637 |
--- /dev/null |
3638 |
+++ b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp |
3639 |
-@@ -0,0 +1,132 @@ |
3640 |
+@@ -0,0 +1,168 @@ |
3641 |
+//===-- AMDGPUInstPrinter.cpp - AMDGPU MC Inst -> ASM ---------------------===// |
3642 |
+// |
3643 |
+// The LLVM Compiler Infrastructure |
3644 |
@@ -11394,6 +11820,21 @@ index 0000000..e6c550b |
3645 |
+ } |
3646 |
+} |
3647 |
+ |
3648 |
++void AMDGPUInstPrinter::printInterpSlot(const MCInst *MI, unsigned OpNum, |
3649 |
++ raw_ostream &O) { |
3650 |
++ unsigned Imm = MI->getOperand(OpNum).getImm(); |
3651 |
++ |
3652 |
++ if (Imm == 2) { |
3653 |
++ O << "P0"; |
3654 |
++ } else if (Imm == 1) { |
3655 |
++ O << "P20"; |
3656 |
++ } else if (Imm == 0) { |
3657 |
++ O << "P10"; |
3658 |
++ } else { |
3659 |
++ assert(!"Invalid interpolation parameter slot"); |
3660 |
++ } |
3661 |
++} |
3662 |
++ |
3663 |
+void AMDGPUInstPrinter::printMemOperand(const MCInst *MI, unsigned OpNo, |
3664 |
+ raw_ostream &O) { |
3665 |
+ printOperand(MI, OpNo, O); |
3666 |
@@ -11459,10 +11900,7 @@ index 0000000..e6c550b |
3667 |
+ |
3668 |
+void AMDGPUInstPrinter::printRel(const MCInst *MI, unsigned OpNo, |
3669 |
+ raw_ostream &O) { |
3670 |
-+ const MCOperand &Op = MI->getOperand(OpNo); |
3671 |
-+ if (Op.getImm() != 0) { |
3672 |
-+ O << " + " << Op.getImm(); |
3673 |
-+ } |
3674 |
++ printIfSet(MI, OpNo, O, "+"); |
3675 |
+} |
3676 |
+ |
3677 |
+void AMDGPUInstPrinter::printUpdateExecMask(const MCInst *MI, unsigned OpNo, |
3678 |
@@ -11483,13 +11921,37 @@ index 0000000..e6c550b |
3679 |
+ } |
3680 |
+} |
3681 |
+ |
3682 |
++void AMDGPUInstPrinter::printSel(const MCInst *MI, unsigned OpNo, |
3683 |
++ raw_ostream &O) { |
3684 |
++ const char * chans = "XYZW"; |
3685 |
++ int sel = MI->getOperand(OpNo).getImm(); |
3686 |
++ |
3687 |
++ int chan = sel & 3; |
3688 |
++ sel >>= 2; |
3689 |
++ |
3690 |
++ if (sel >= 512) { |
3691 |
++ sel -= 512; |
3692 |
++ int cb = sel >> 12; |
3693 |
++ sel &= 4095; |
3694 |
++ O << cb << "[" << sel << "]"; |
3695 |
++ } else if (sel >= 448) { |
3696 |
++ sel -= 448; |
3697 |
++ O << sel; |
3698 |
++ } else if (sel >= 0){ |
3699 |
++ O << sel; |
3700 |
++ } |
3701 |
++ |
3702 |
++ if (sel >= 0) |
3703 |
++ O << "." << chans[chan]; |
3704 |
++} |
3705 |
++ |
3706 |
+#include "AMDGPUGenAsmWriter.inc" |
3707 |
diff --git a/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h |
3708 |
new file mode 100644 |
3709 |
-index 0000000..96e0e46 |
3710 |
+index 0000000..767a708 |
3711 |
--- /dev/null |
3712 |
+++ b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h |
3713 |
-@@ -0,0 +1,52 @@ |
3714 |
+@@ -0,0 +1,54 @@ |
3715 |
+//===-- AMDGPUInstPrinter.h - AMDGPU MC Inst -> ASM interface ---*- C++ -*-===// |
3716 |
+// |
3717 |
+// The LLVM Compiler Infrastructure |
3718 |
@@ -11525,6 +11987,7 @@ index 0000000..96e0e46 |
3719 |
+ |
3720 |
+private: |
3721 |
+ void printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O); |
3722 |
++ void printInterpSlot(const MCInst *MI, unsigned OpNum, raw_ostream &O); |
3723 |
+ void printMemOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O); |
3724 |
+ void printIfSet(const MCInst *MI, unsigned OpNo, raw_ostream &O, StringRef Asm); |
3725 |
+ void printAbs(const MCInst *MI, unsigned OpNo, raw_ostream &O); |
3726 |
@@ -11537,6 +12000,7 @@ index 0000000..96e0e46 |
3727 |
+ void printUpdateExecMask(const MCInst *MI, unsigned OpNo, raw_ostream &O); |
3728 |
+ void printUpdatePred(const MCInst *MI, unsigned OpNo, raw_ostream &O); |
3729 |
+ void printWrite(const MCInst *MI, unsigned OpNo, raw_ostream &O); |
3730 |
++ void printSel(const MCInst *MI, unsigned OpNo, raw_ostream &O); |
3731 |
+}; |
3732 |
+ |
3733 |
+} // End namespace llvm |
3734 |
@@ -11544,7 +12008,7 @@ index 0000000..96e0e46 |
3735 |
+#endif // AMDGPUINSTRPRINTER_H |
3736 |
diff --git a/lib/Target/R600/InstPrinter/CMakeLists.txt b/lib/Target/R600/InstPrinter/CMakeLists.txt |
3737 |
new file mode 100644 |
3738 |
-index 0000000..069c55b |
3739 |
+index 0000000..6776337 |
3740 |
--- /dev/null |
3741 |
+++ b/lib/Target/R600/InstPrinter/CMakeLists.txt |
3742 |
@@ -0,0 +1,7 @@ |
3743 |
@@ -11554,7 +12018,7 @@ index 0000000..069c55b |
3744 |
+ AMDGPUInstPrinter.cpp |
3745 |
+ ) |
3746 |
+ |
3747 |
-+add_dependencies(LLVMR600AsmPrinter AMDGPUCommonTableGen) |
3748 |
++add_dependencies(LLVMR600AsmPrinter R600CommonTableGen) |
3749 |
diff --git a/lib/Target/R600/InstPrinter/LLVMBuild.txt b/lib/Target/R600/InstPrinter/LLVMBuild.txt |
3750 |
new file mode 100644 |
3751 |
index 0000000..ec0be89 |
3752 |
@@ -11869,10 +12333,10 @@ index 0000000..3ad0fa6 |
3753 |
+#endif // AMDGPUMCASMINFO_H |
3754 |
diff --git a/lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h b/lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h |
3755 |
new file mode 100644 |
3756 |
-index 0000000..9d0d6cf |
3757 |
+index 0000000..8721f80 |
3758 |
--- /dev/null |
3759 |
+++ b/lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h |
3760 |
-@@ -0,0 +1,60 @@ |
3761 |
+@@ -0,0 +1,49 @@ |
3762 |
+//===-- AMDGPUCodeEmitter.h - AMDGPU Code Emitter interface -----------------===// |
3763 |
+// |
3764 |
+// The LLVM Compiler Infrastructure |
3765 |
@@ -11917,17 +12381,6 @@ index 0000000..9d0d6cf |
3766 |
+ SmallVectorImpl<MCFixup> &Fixups) const { |
3767 |
+ return 0; |
3768 |
+ } |
3769 |
-+ virtual uint64_t VOPPostEncode(const MCInst &MI, uint64_t Value) const { |
3770 |
-+ return Value; |
3771 |
-+ } |
3772 |
-+ virtual uint64_t i32LiteralEncode(const MCInst &MI, unsigned OpNo, |
3773 |
-+ SmallVectorImpl<MCFixup> &Fixups) const { |
3774 |
-+ return 0; |
3775 |
-+ } |
3776 |
-+ virtual uint32_t SMRDmemriEncode(const MCInst &MI, unsigned OpNo, |
3777 |
-+ SmallVectorImpl<MCFixup> &Fixups) const { |
3778 |
-+ return 0; |
3779 |
-+ } |
3780 |
+}; |
3781 |
+ |
3782 |
+} // End namespace llvm |
3783 |
@@ -12182,10 +12635,10 @@ index 0000000..8894a76 |
3784 |
+include $(LEVEL)/Makefile.common |
3785 |
diff --git a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp |
3786 |
new file mode 100644 |
3787 |
-index 0000000..dc91924 |
3788 |
+index 0000000..115fe8d |
3789 |
--- /dev/null |
3790 |
+++ b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp |
3791 |
-@@ -0,0 +1,575 @@ |
3792 |
+@@ -0,0 +1,582 @@ |
3793 |
+//===- R600MCCodeEmitter.cpp - Code Emitter for R600->Cayman GPU families -===// |
3794 |
+// |
3795 |
+// The LLVM Compiler Infrastructure |
3796 |
@@ -12252,8 +12705,8 @@ index 0000000..dc91924 |
3797 |
+ void EmitALUInstr(const MCInst &MI, SmallVectorImpl<MCFixup> &Fixups, |
3798 |
+ raw_ostream &OS) const; |
3799 |
+ void EmitSrc(const MCInst &MI, unsigned OpIdx, raw_ostream &OS) const; |
3800 |
-+ void EmitSrcISA(const MCInst &MI, unsigned OpIdx, uint64_t &Value, |
3801 |
-+ raw_ostream &OS) const; |
3802 |
++ void EmitSrcISA(const MCInst &MI, unsigned RegOpIdx, unsigned SelOpIdx, |
3803 |
++ raw_ostream &OS) const; |
3804 |
+ void EmitDst(const MCInst &MI, raw_ostream &OS) const; |
3805 |
+ void EmitTexInstr(const MCInst &MI, SmallVectorImpl<MCFixup> &Fixups, |
3806 |
+ raw_ostream &OS) const; |
3807 |
@@ -12350,9 +12803,12 @@ index 0000000..dc91924 |
3808 |
+ case AMDGPU::VTX_READ_PARAM_8_eg: |
3809 |
+ case AMDGPU::VTX_READ_PARAM_16_eg: |
3810 |
+ case AMDGPU::VTX_READ_PARAM_32_eg: |
3811 |
++ case AMDGPU::VTX_READ_PARAM_128_eg: |
3812 |
+ case AMDGPU::VTX_READ_GLOBAL_8_eg: |
3813 |
+ case AMDGPU::VTX_READ_GLOBAL_32_eg: |
3814 |
-+ case AMDGPU::VTX_READ_GLOBAL_128_eg: { |
3815 |
++ case AMDGPU::VTX_READ_GLOBAL_128_eg: |
3816 |
++ case AMDGPU::TEX_VTX_CONSTBUF: |
3817 |
++ case AMDGPU::TEX_VTX_TEXBUF : { |
3818 |
+ uint64_t InstWord01 = getBinaryCodeForInstr(MI, Fixups); |
3819 |
+ uint32_t InstWord2 = MI.getOperand(2).getImm(); // Offset |
3820 |
+ |
3821 |
@@ -12382,7 +12838,6 @@ index 0000000..dc91924 |
3822 |
+ SmallVectorImpl<MCFixup> &Fixups, |
3823 |
+ raw_ostream &OS) const { |
3824 |
+ const MCInstrDesc &MCDesc = MCII.get(MI.getOpcode()); |
3825 |
-+ unsigned NumOperands = MI.getNumOperands(); |
3826 |
+ |
3827 |
+ // Emit instruction type |
3828 |
+ EmitByte(INSTR_ALU, OS); |
3829 |
@@ -12398,19 +12853,21 @@ index 0000000..dc91924 |
3830 |
+ InstWord01 |= ISAOpCode << 1; |
3831 |
+ } |
3832 |
+ |
3833 |
-+ unsigned SrcIdx = 0; |
3834 |
-+ for (unsigned int OpIdx = 1; OpIdx < NumOperands; ++OpIdx) { |
3835 |
-+ if (MI.getOperand(OpIdx).isImm() || MI.getOperand(OpIdx).isFPImm() || |
3836 |
-+ OpIdx == (unsigned)MCDesc.findFirstPredOperandIdx()) { |
3837 |
-+ continue; |
3838 |
-+ } |
3839 |
-+ EmitSrcISA(MI, OpIdx, InstWord01, OS); |
3840 |
-+ SrcIdx++; |
3841 |
-+ } |
3842 |
++ unsigned SrcNum = MCDesc.TSFlags & R600_InstFlag::OP3 ? 3 : |
3843 |
++ MCDesc.TSFlags & R600_InstFlag::OP2 ? 2 : 1; |
3844 |
++ |
3845 |
++ EmitByte(SrcNum, OS); |
3846 |
++ |
3847 |
++ const unsigned SrcOps[3][2] = { |
3848 |
++ {R600Operands::SRC0, R600Operands::SRC0_SEL}, |
3849 |
++ {R600Operands::SRC1, R600Operands::SRC1_SEL}, |
3850 |
++ {R600Operands::SRC2, R600Operands::SRC2_SEL} |
3851 |
++ }; |
3852 |
+ |
3853 |
-+ // Emit zeros for unused sources |
3854 |
-+ for ( ; SrcIdx < 3; SrcIdx++) { |
3855 |
-+ EmitNullBytes(SRC_BYTE_COUNT - 6, OS); |
3856 |
++ for (unsigned SrcIdx = 0; SrcIdx < SrcNum; ++SrcIdx) { |
3857 |
++ unsigned RegOpIdx = R600Operands::ALUOpTable[SrcNum-1][SrcOps[SrcIdx][0]]; |
3858 |
++ unsigned SelOpIdx = R600Operands::ALUOpTable[SrcNum-1][SrcOps[SrcIdx][1]]; |
3859 |
++ EmitSrcISA(MI, RegOpIdx, SelOpIdx, OS); |
3860 |
+ } |
3861 |
+ |
3862 |
+ Emit(InstWord01, OS); |
3863 |
@@ -12481,34 +12938,37 @@ index 0000000..dc91924 |
3864 |
+ |
3865 |
+} |
3866 |
+ |
3867 |
-+void R600MCCodeEmitter::EmitSrcISA(const MCInst &MI, unsigned OpIdx, |
3868 |
-+ uint64_t &Value, raw_ostream &OS) const { |
3869 |
-+ const MCOperand &MO = MI.getOperand(OpIdx); |
3870 |
++void R600MCCodeEmitter::EmitSrcISA(const MCInst &MI, unsigned RegOpIdx, |
3871 |
++ unsigned SelOpIdx, raw_ostream &OS) const { |
3872 |
++ const MCOperand &RegMO = MI.getOperand(RegOpIdx); |
3873 |
++ const MCOperand &SelMO = MI.getOperand(SelOpIdx); |
3874 |
++ |
3875 |
+ union { |
3876 |
+ float f; |
3877 |
+ uint32_t i; |
3878 |
+ } InlineConstant; |
3879 |
+ InlineConstant.i = 0; |
3880 |
-+ // Emit the source select (2 bytes). For GPRs, this is the register index. |
3881 |
-+ // For other potential instruction operands, (e.g. constant registers) the |
3882 |
-+ // value of the source select is defined in the r600isa docs. |
3883 |
-+ if (MO.isReg()) { |
3884 |
-+ unsigned Reg = MO.getReg(); |
3885 |
-+ if (AMDGPUMCRegisterClasses[AMDGPU::R600_CReg32RegClassID].contains(Reg)) { |
3886 |
-+ EmitByte(1, OS); |
3887 |
-+ } else { |
3888 |
-+ EmitByte(0, OS); |
3889 |
-+ } |
3890 |
++ // Emit source type (1 byte) and source select (4 bytes). For GPRs type is 0 |
3891 |
++ // and select is 0 (GPR index is encoded in the instr encoding). For constants |
3892 |
++ // type is 1 and select is the original const select passed from the driver. |
3893 |
++ unsigned Reg = RegMO.getReg(); |
3894 |
++ if (Reg == AMDGPU::ALU_CONST) { |
3895 |
++ EmitByte(1, OS); |
3896 |
++ uint32_t Sel = SelMO.getImm(); |
3897 |
++ Emit(Sel, OS); |
3898 |
++ } else { |
3899 |
++ EmitByte(0, OS); |
3900 |
++ Emit((uint32_t)0, OS); |
3901 |
++ } |
3902 |
+ |
3903 |
-+ if (Reg == AMDGPU::ALU_LITERAL_X) { |
3904 |
-+ unsigned ImmOpIndex = MI.getNumOperands() - 1; |
3905 |
-+ MCOperand ImmOp = MI.getOperand(ImmOpIndex); |
3906 |
-+ if (ImmOp.isFPImm()) { |
3907 |
-+ InlineConstant.f = ImmOp.getFPImm(); |
3908 |
-+ } else { |
3909 |
-+ assert(ImmOp.isImm()); |
3910 |
-+ InlineConstant.i = ImmOp.getImm(); |
3911 |
-+ } |
3912 |
++ if (Reg == AMDGPU::ALU_LITERAL_X) { |
3913 |
++ unsigned ImmOpIndex = MI.getNumOperands() - 1; |
3914 |
++ MCOperand ImmOp = MI.getOperand(ImmOpIndex); |
3915 |
++ if (ImmOp.isFPImm()) { |
3916 |
++ InlineConstant.f = ImmOp.getFPImm(); |
3917 |
++ } else { |
3918 |
++ assert(ImmOp.isImm()); |
3919 |
++ InlineConstant.i = ImmOp.getImm(); |
3920 |
+ } |
3921 |
+ } |
3922 |
+ |
3923 |
@@ -12763,10 +13223,10 @@ index 0000000..dc91924 |
3924 |
+#include "AMDGPUGenMCCodeEmitter.inc" |
3925 |
diff --git a/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp b/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp |
3926 |
new file mode 100644 |
3927 |
-index 0000000..c47dc99 |
3928 |
+index 0000000..6dfbbe8 |
3929 |
--- /dev/null |
3930 |
+++ b/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp |
3931 |
-@@ -0,0 +1,298 @@ |
3932 |
+@@ -0,0 +1,235 @@ |
3933 |
+//===-- SIMCCodeEmitter.cpp - SI Code Emitter -------------------------------===// |
3934 |
+// |
3935 |
+// The LLVM Compiler Infrastructure |
3936 |
@@ -12793,38 +13253,16 @@ index 0000000..c47dc99 |
3937 |
+#include "llvm/MC/MCFixup.h" |
3938 |
+#include "llvm/Support/raw_ostream.h" |
3939 |
+ |
3940 |
-+#define VGPR_BIT(src_idx) (1ULL << (9 * src_idx - 1)) |
3941 |
-+#define SI_INSTR_FLAGS_ENCODING_MASK 0xf |
3942 |
-+ |
3943 |
-+// These must be kept in sync with SIInstructions.td and also the |
3944 |
-+// InstrEncodingInfo array in SIInstrInfo.cpp. |
3945 |
-+// |
3946 |
-+// NOTE: This enum is only used to identify the encoding type within LLVM, |
3947 |
-+// the actual encoding type that is part of the instruction format is different |
3948 |
-+namespace SIInstrEncodingType { |
3949 |
-+ enum Encoding { |
3950 |
-+ EXP = 0, |
3951 |
-+ LDS = 1, |
3952 |
-+ MIMG = 2, |
3953 |
-+ MTBUF = 3, |
3954 |
-+ MUBUF = 4, |
3955 |
-+ SMRD = 5, |
3956 |
-+ SOP1 = 6, |
3957 |
-+ SOP2 = 7, |
3958 |
-+ SOPC = 8, |
3959 |
-+ SOPK = 9, |
3960 |
-+ SOPP = 10, |
3961 |
-+ VINTRP = 11, |
3962 |
-+ VOP1 = 12, |
3963 |
-+ VOP2 = 13, |
3964 |
-+ VOP3 = 14, |
3965 |
-+ VOPC = 15 |
3966 |
-+ }; |
3967 |
-+} |
3968 |
-+ |
3969 |
+using namespace llvm; |
3970 |
+ |
3971 |
+namespace { |
3972 |
++ |
3973 |
++/// \brief Helper type used in encoding |
3974 |
++typedef union { |
3975 |
++ int32_t I; |
3976 |
++ float F; |
3977 |
++} IntFloatUnion; |
3978 |
++ |
3979 |
+class SIMCCodeEmitter : public AMDGPUMCCodeEmitter { |
3980 |
+ SIMCCodeEmitter(const SIMCCodeEmitter &); // DO NOT IMPLEMENT |
3981 |
+ void operator=(const SIMCCodeEmitter &); // DO NOT IMPLEMENT |
3982 |
@@ -12833,6 +13271,15 @@ index 0000000..c47dc99 |
3983 |
+ const MCSubtargetInfo &STI; |
3984 |
+ MCContext &Ctx; |
3985 |
+ |
3986 |
++ /// \brief Encode a sequence of registers with the correct alignment. |
3987 |
++ unsigned GPRAlign(const MCInst &MI, unsigned OpNo, unsigned shift) const; |
3988 |
++ |
3989 |
++ /// \brief Can this operand also contain immediate values? |
3990 |
++ bool isSrcOperand(const MCInstrDesc &Desc, unsigned OpNo) const; |
3991 |
++ |
3992 |
++ /// \brief Encode an fp or int literal |
3993 |
++ uint32_t getLitEncoding(const MCOperand &MO) const; |
3994 |
++ |
3995 |
+public: |
3996 |
+ SIMCCodeEmitter(const MCInstrInfo &mcii, const MCRegisterInfo &mri, |
3997 |
+ const MCSubtargetInfo &sti, MCContext &ctx) |
3998 |
@@ -12848,11 +13295,6 @@ index 0000000..c47dc99 |
3999 |
+ virtual uint64_t getMachineOpValue(const MCInst &MI, const MCOperand &MO, |
4000 |
+ SmallVectorImpl<MCFixup> &Fixups) const; |
4001 |
+ |
4002 |
-+public: |
4003 |
-+ |
4004 |
-+ /// \brief Encode a sequence of registers with the correct alignment. |
4005 |
-+ unsigned GPRAlign(const MCInst &MI, unsigned OpNo, unsigned shift) const; |
4006 |
-+ |
4007 |
+ /// \brief Encoding for when 2 consecutive registers are used |
4008 |
+ virtual unsigned GPR2AlignEncode(const MCInst &MI, unsigned OpNo, |
4009 |
+ SmallVectorImpl<MCFixup> &Fixup) const; |
4010 |
@@ -12860,73 +13302,142 @@ index 0000000..c47dc99 |
4011 |
+ /// \brief Encoding for when 4 consecutive registers are used |
4012 |
+ virtual unsigned GPR4AlignEncode(const MCInst &MI, unsigned OpNo, |
4013 |
+ SmallVectorImpl<MCFixup> &Fixup) const; |
4014 |
++}; |
4015 |
+ |
4016 |
-+ /// \brief Encoding for SMRD indexed loads |
4017 |
-+ virtual uint32_t SMRDmemriEncode(const MCInst &MI, unsigned OpNo, |
4018 |
-+ SmallVectorImpl<MCFixup> &Fixup) const; |
4019 |
++} // End anonymous namespace |
4020 |
++ |
4021 |
++MCCodeEmitter *llvm::createSIMCCodeEmitter(const MCInstrInfo &MCII, |
4022 |
++ const MCRegisterInfo &MRI, |
4023 |
++ const MCSubtargetInfo &STI, |
4024 |
++ MCContext &Ctx) { |
4025 |
++ return new SIMCCodeEmitter(MCII, MRI, STI, Ctx); |
4026 |
++} |
4027 |
+ |
4028 |
-+ /// \brief Post-Encoder method for VOP instructions |
4029 |
-+ virtual uint64_t VOPPostEncode(const MCInst &MI, uint64_t Value) const; |
4030 |
++bool SIMCCodeEmitter::isSrcOperand(const MCInstrDesc &Desc, |
4031 |
++ unsigned OpNo) const { |
4032 |
+ |
4033 |
-+private: |
4034 |
++ unsigned RegClass = Desc.OpInfo[OpNo].RegClass; |
4035 |
++ return (AMDGPU::SSrc_32RegClassID == RegClass) || |
4036 |
++ (AMDGPU::SSrc_64RegClassID == RegClass) || |
4037 |
++ (AMDGPU::VSrc_32RegClassID == RegClass) || |
4038 |
++ (AMDGPU::VSrc_64RegClassID == RegClass); |
4039 |
++} |
4040 |
+ |
4041 |
-+ /// \returns this SIInstrEncodingType for this instruction. |
4042 |
-+ unsigned getEncodingType(const MCInst &MI) const; |
4043 |
++uint32_t SIMCCodeEmitter::getLitEncoding(const MCOperand &MO) const { |
4044 |
+ |
4045 |
-+ /// \brief Get the size in bytes of this instruction's encoding. |
4046 |
-+ unsigned getEncodingBytes(const MCInst &MI) const; |
4047 |
++ IntFloatUnion Imm; |
4048 |
++ if (MO.isImm()) |
4049 |
++ Imm.I = MO.getImm(); |
4050 |
++ else if (MO.isFPImm()) |
4051 |
++ Imm.F = MO.getFPImm(); |
4052 |
++ else |
4053 |
++ return ~0; |
4054 |
+ |
4055 |
-+ /// \returns the hardware encoding for a register |
4056 |
-+ unsigned getRegBinaryCode(unsigned reg) const; |
4057 |
++ if (Imm.I >= 0 && Imm.I <= 64) |
4058 |
++ return 128 + Imm.I; |
4059 |
+ |
4060 |
-+ /// \brief Generated function that returns the hardware encoding for |
4061 |
-+ /// a register |
4062 |
-+ unsigned getHWRegNum(unsigned reg) const; |
4063 |
++ if (Imm.I >= -16 && Imm.I <= -1) |
4064 |
++ return 192 + abs(Imm.I); |
4065 |
+ |
4066 |
-+}; |
4067 |
++ if (Imm.F == 0.5f) |
4068 |
++ return 240; |
4069 |
+ |
4070 |
-+} // End anonymous namespace |
4071 |
++ if (Imm.F == -0.5f) |
4072 |
++ return 241; |
4073 |
+ |
4074 |
-+MCCodeEmitter *llvm::createSIMCCodeEmitter(const MCInstrInfo &MCII, |
4075 |
-+ const MCRegisterInfo &MRI, |
4076 |
-+ const MCSubtargetInfo &STI, |
4077 |
-+ MCContext &Ctx) { |
4078 |
-+ return new SIMCCodeEmitter(MCII, MRI, STI, Ctx); |
4079 |
++ if (Imm.F == 1.0f) |
4080 |
++ return 242; |
4081 |
++ |
4082 |
++ if (Imm.F == -1.0f) |
4083 |
++ return 243; |
4084 |
++ |
4085 |
++ if (Imm.F == 2.0f) |
4086 |
++ return 244; |
4087 |
++ |
4088 |
++ if (Imm.F == -2.0f) |
4089 |
++ return 245; |
4090 |
++ |
4091 |
++ if (Imm.F == 4.0f) |
4092 |
++ return 246; |
4093 |
++ |
4094 |
++ if (Imm.F == -4.0f) |
4095 |
++ return 247; |
4096 |
++ |
4097 |
++ return 255; |
4098 |
+} |
4099 |
+ |
4100 |
+void SIMCCodeEmitter::EncodeInstruction(const MCInst &MI, raw_ostream &OS, |
4101 |
+ SmallVectorImpl<MCFixup> &Fixups) const { |
4102 |
++ |
4103 |
+ uint64_t Encoding = getBinaryCodeForInstr(MI, Fixups); |
4104 |
-+ unsigned bytes = getEncodingBytes(MI); |
4105 |
++ const MCInstrDesc &Desc = MCII.get(MI.getOpcode()); |
4106 |
++ unsigned bytes = Desc.getSize(); |
4107 |
++ |
4108 |
+ for (unsigned i = 0; i < bytes; i++) { |
4109 |
+ OS.write((uint8_t) ((Encoding >> (8 * i)) & 0xff)); |
4110 |
+ } |
4111 |
++ |
4112 |
++ if (bytes > 4) |
4113 |
++ return; |
4114 |
++ |
4115 |
++ // Check for additional literals in SRC0/1/2 (Op 1/2/3) |
4116 |
++ for (unsigned i = 0, e = MI.getNumOperands(); i < e; ++i) { |
4117 |
++ |
4118 |
++ // Check if this operand should be encoded as [SV]Src |
4119 |
++ if (!isSrcOperand(Desc, i)) |
4120 |
++ continue; |
4121 |
++ |
4122 |
++ // Is this operand a literal immediate? |
4123 |
++ const MCOperand &Op = MI.getOperand(i); |
4124 |
++ if (getLitEncoding(Op) != 255) |
4125 |
++ continue; |
4126 |
++ |
4127 |
++ // Yes! Encode it |
4128 |
++ IntFloatUnion Imm; |
4129 |
++ if (Op.isImm()) |
4130 |
++ Imm.I = Op.getImm(); |
4131 |
++ else |
4132 |
++ Imm.F = Op.getFPImm(); |
4133 |
++ |
4134 |
++ for (unsigned j = 0; j < 4; j++) { |
4135 |
++ OS.write((uint8_t) ((Imm.I >> (8 * j)) & 0xff)); |
4136 |
++ } |
4137 |
++ |
4138 |
++ // Only one literal value allowed |
4139 |
++ break; |
4140 |
++ } |
4141 |
+} |
4142 |
+ |
4143 |
+uint64_t SIMCCodeEmitter::getMachineOpValue(const MCInst &MI, |
4144 |
+ const MCOperand &MO, |
4145 |
+ SmallVectorImpl<MCFixup> &Fixups) const { |
4146 |
-+ if (MO.isReg()) { |
4147 |
-+ return getRegBinaryCode(MO.getReg()); |
4148 |
-+ } else if (MO.isImm()) { |
4149 |
-+ return MO.getImm(); |
4150 |
-+ } else if (MO.isFPImm()) { |
4151 |
-+ // XXX: Not all instructions can use inline literals |
4152 |
-+ // XXX: We should make sure this is a 32-bit constant |
4153 |
-+ union { |
4154 |
-+ float F; |
4155 |
-+ uint32_t I; |
4156 |
-+ } Imm; |
4157 |
-+ Imm.F = MO.getFPImm(); |
4158 |
-+ return Imm.I; |
4159 |
-+ } else if (MO.isExpr()) { |
4160 |
++ if (MO.isReg()) |
4161 |
++ return MRI.getEncodingValue(MO.getReg()); |
4162 |
++ |
4163 |
++ if (MO.isExpr()) { |
4164 |
+ const MCExpr *Expr = MO.getExpr(); |
4165 |
+ MCFixupKind Kind = MCFixupKind(FK_PCRel_4); |
4166 |
+ Fixups.push_back(MCFixup::Create(0, Expr, Kind, MI.getLoc())); |
4167 |
+ return 0; |
4168 |
-+ } else{ |
4169 |
-+ llvm_unreachable("Encoding of this operand type is not supported yet."); |
4170 |
+ } |
4171 |
++ |
4172 |
++ // Figure out the operand number, needed for isSrcOperand check |
4173 |
++ unsigned OpNo = 0; |
4174 |
++ for (unsigned e = MI.getNumOperands(); OpNo < e; ++OpNo) { |
4175 |
++ if (&MO == &MI.getOperand(OpNo)) |
4176 |
++ break; |
4177 |
++ } |
4178 |
++ |
4179 |
++ const MCInstrDesc &Desc = MCII.get(MI.getOpcode()); |
4180 |
++ if (isSrcOperand(Desc, OpNo)) { |
4181 |
++ uint32_t Enc = getLitEncoding(MO); |
4182 |
++ if (Enc != ~0U && (Enc != 255 || Desc.getSize() == 4)) |
4183 |
++ return Enc; |
4184 |
++ |
4185 |
++ } else if (MO.isImm()) |
4186 |
++ return MO.getImm(); |
4187 |
++ |
4188 |
++ llvm_unreachable("Encoding of this operand type is not supported yet."); |
4189 |
+ return 0; |
4190 |
+} |
4191 |
+ |
4192 |
@@ -12936,10 +13447,10 @@ index 0000000..c47dc99 |
4193 |
+ |
4194 |
+unsigned SIMCCodeEmitter::GPRAlign(const MCInst &MI, unsigned OpNo, |
4195 |
+ unsigned shift) const { |
4196 |
-+ unsigned regCode = getRegBinaryCode(MI.getOperand(OpNo).getReg()); |
4197 |
-+ return regCode >> shift; |
4198 |
-+ return 0; |
4199 |
++ unsigned regCode = MRI.getEncodingValue(MI.getOperand(OpNo).getReg()); |
4200 |
++ return (regCode & 0xff) >> shift; |
4201 |
+} |
4202 |
++ |
4203 |
+unsigned SIMCCodeEmitter::GPR2AlignEncode(const MCInst &MI, |
4204 |
+ unsigned OpNo , |
4205 |
+ SmallVectorImpl<MCFixup> &Fixup) const { |
4206 |
@@ -12951,120 +13462,6 @@ index 0000000..c47dc99 |
4207 |
+ SmallVectorImpl<MCFixup> &Fixup) const { |
4208 |
+ return GPRAlign(MI, OpNo, 2); |
4209 |
+} |
4210 |
-+ |
4211 |
-+#define SMRD_OFFSET_MASK 0xff |
4212 |
-+#define SMRD_IMM_SHIFT 8 |
4213 |
-+#define SMRD_SBASE_MASK 0x3f |
4214 |
-+#define SMRD_SBASE_SHIFT 9 |
4215 |
-+/// This function is responsible for encoding the offset |
4216 |
-+/// and the base ptr for SMRD instructions it should return a bit string in |
4217 |
-+/// this format: |
4218 |
-+/// |
4219 |
-+/// OFFSET = bits{7-0} |
4220 |
-+/// IMM = bits{8} |
4221 |
-+/// SBASE = bits{14-9} |
4222 |
-+/// |
4223 |
-+uint32_t SIMCCodeEmitter::SMRDmemriEncode(const MCInst &MI, unsigned OpNo, |
4224 |
-+ SmallVectorImpl<MCFixup> &Fixup) const { |
4225 |
-+ uint32_t Encoding; |
4226 |
-+ |
4227 |
-+ const MCOperand &OffsetOp = MI.getOperand(OpNo + 1); |
4228 |
-+ |
4229 |
-+ //XXX: Use this function for SMRD loads with register offsets |
4230 |
-+ assert(OffsetOp.isImm()); |
4231 |
-+ |
4232 |
-+ Encoding = |
4233 |
-+ (getMachineOpValue(MI, OffsetOp, Fixup) & SMRD_OFFSET_MASK) |
4234 |
-+ | (1 << SMRD_IMM_SHIFT) //XXX If the Offset is a register we shouldn't set this bit |
4235 |
-+ | ((GPR2AlignEncode(MI, OpNo, Fixup) & SMRD_SBASE_MASK) << SMRD_SBASE_SHIFT) |
4236 |
-+ ; |
4237 |
-+ |
4238 |
-+ return Encoding; |
4239 |
-+} |
4240 |
-+ |
4241 |
-+//===----------------------------------------------------------------------===// |
4242 |
-+// Post Encoder Callbacks |
4243 |
-+//===----------------------------------------------------------------------===// |
4244 |
-+ |
4245 |
-+uint64_t SIMCCodeEmitter::VOPPostEncode(const MCInst &MI, uint64_t Value) const{ |
4246 |
-+ unsigned encodingType = getEncodingType(MI); |
4247 |
-+ unsigned numSrcOps; |
4248 |
-+ unsigned vgprBitOffset; |
4249 |
-+ |
4250 |
-+ if (encodingType == SIInstrEncodingType::VOP3) { |
4251 |
-+ numSrcOps = 3; |
4252 |
-+ vgprBitOffset = 32; |
4253 |
-+ } else { |
4254 |
-+ numSrcOps = 1; |
4255 |
-+ vgprBitOffset = 0; |
4256 |
-+ } |
4257 |
-+ |
4258 |
-+ // Add one to skip over the destination reg operand. |
4259 |
-+ for (unsigned opIdx = 1; opIdx < numSrcOps + 1; opIdx++) { |
4260 |
-+ const MCOperand &MO = MI.getOperand(opIdx); |
4261 |
-+ if (MO.isReg()) { |
4262 |
-+ unsigned reg = MI.getOperand(opIdx).getReg(); |
4263 |
-+ if (AMDGPUMCRegisterClasses[AMDGPU::VReg_32RegClassID].contains(reg) || |
4264 |
-+ AMDGPUMCRegisterClasses[AMDGPU::VReg_64RegClassID].contains(reg)) { |
4265 |
-+ Value |= (VGPR_BIT(opIdx)) << vgprBitOffset; |
4266 |
-+ } |
4267 |
-+ } else if (MO.isFPImm()) { |
4268 |
-+ union { |
4269 |
-+ float f; |
4270 |
-+ uint32_t i; |
4271 |
-+ } Imm; |
4272 |
-+ // XXX: Not all instructions can use inline literals |
4273 |
-+ // XXX: We should make sure this is a 32-bit constant |
4274 |
-+ Imm.f = MO.getFPImm(); |
4275 |
-+ Value |= ((uint64_t)Imm.i) << 32; |
4276 |
-+ } |
4277 |
-+ } |
4278 |
-+ return Value; |
4279 |
-+} |
4280 |
-+ |
4281 |
-+//===----------------------------------------------------------------------===// |
4282 |
-+// Encoding helper functions |
4283 |
-+//===----------------------------------------------------------------------===// |
4284 |
-+ |
4285 |
-+unsigned SIMCCodeEmitter::getEncodingType(const MCInst &MI) const { |
4286 |
-+ return MCII.get(MI.getOpcode()).TSFlags & SI_INSTR_FLAGS_ENCODING_MASK; |
4287 |
-+} |
4288 |
-+ |
4289 |
-+unsigned SIMCCodeEmitter::getEncodingBytes(const MCInst &MI) const { |
4290 |
-+ |
4291 |
-+ // These instructions aren't real instructions with an encoding type, so |
4292 |
-+ // we need to manually specify their size. |
4293 |
-+ switch (MI.getOpcode()) { |
4294 |
-+ default: break; |
4295 |
-+ case AMDGPU::SI_LOAD_LITERAL_I32: |
4296 |
-+ case AMDGPU::SI_LOAD_LITERAL_F32: |
4297 |
-+ return 4; |
4298 |
-+ } |
4299 |
-+ |
4300 |
-+ unsigned encoding_type = getEncodingType(MI); |
4301 |
-+ switch (encoding_type) { |
4302 |
-+ case SIInstrEncodingType::EXP: |
4303 |
-+ case SIInstrEncodingType::LDS: |
4304 |
-+ case SIInstrEncodingType::MUBUF: |
4305 |
-+ case SIInstrEncodingType::MTBUF: |
4306 |
-+ case SIInstrEncodingType::MIMG: |
4307 |
-+ case SIInstrEncodingType::VOP3: |
4308 |
-+ return 8; |
4309 |
-+ default: |
4310 |
-+ return 4; |
4311 |
-+ } |
4312 |
-+} |
4313 |
-+ |
4314 |
-+ |
4315 |
-+unsigned SIMCCodeEmitter::getRegBinaryCode(unsigned reg) const { |
4316 |
-+ switch (reg) { |
4317 |
-+ case AMDGPU::M0: return 124; |
4318 |
-+ case AMDGPU::SREG_LIT_0: return 128; |
4319 |
-+ case AMDGPU::SI_LITERAL_CONSTANT: return 255; |
4320 |
-+ default: return MRI.getEncodingValue(reg); |
4321 |
-+ } |
4322 |
-+} |
4323 |
-+ |
4324 |
diff --git a/lib/Target/R600/Makefile b/lib/Target/R600/Makefile |
4325 |
new file mode 100644 |
4326 |
index 0000000..1b3ebbe |
4327 |
@@ -13096,10 +13493,10 @@ index 0000000..1b3ebbe |
4328 |
+include $(LEVEL)/Makefile.common |
4329 |
diff --git a/lib/Target/R600/Processors.td b/lib/Target/R600/Processors.td |
4330 |
new file mode 100644 |
4331 |
-index 0000000..3dc1ecd |
4332 |
+index 0000000..868810c |
4333 |
--- /dev/null |
4334 |
+++ b/lib/Target/R600/Processors.td |
4335 |
-@@ -0,0 +1,29 @@ |
4336 |
+@@ -0,0 +1,30 @@ |
4337 |
+//===-- Processors.td - TODO: Add brief description -------===// |
4338 |
+// |
4339 |
+// The LLVM Compiler Infrastructure |
4340 |
@@ -13115,6 +13512,7 @@ index 0000000..3dc1ecd |
4341 |
+ |
4342 |
+class Proc<string Name, ProcessorItineraries itin, list<SubtargetFeature> Features> |
4343 |
+: Processor<Name, itin, Features>; |
4344 |
++def : Proc<"", R600_EG_Itin, [FeatureR600ALUInst]>; |
4345 |
+def : Proc<"r600", R600_EG_Itin, [FeatureR600ALUInst]>; |
4346 |
+def : Proc<"rv710", R600_EG_Itin, []>; |
4347 |
+def : Proc<"rv730", R600_EG_Itin, []>; |
4348 |
@@ -13131,10 +13529,10 @@ index 0000000..3dc1ecd |
4349 |
+ |
4350 |
diff --git a/lib/Target/R600/R600Defines.h b/lib/Target/R600/R600Defines.h |
4351 |
new file mode 100644 |
4352 |
-index 0000000..7dea8e4 |
4353 |
+index 0000000..16cfcf5 |
4354 |
--- /dev/null |
4355 |
+++ b/lib/Target/R600/R600Defines.h |
4356 |
-@@ -0,0 +1,79 @@ |
4357 |
+@@ -0,0 +1,97 @@ |
4358 |
+//===-- R600Defines.h - R600 Helper Macros ----------------------*- C++ -*-===// |
4359 |
+// |
4360 |
+// The LLVM Compiler Infrastructure |
4361 |
@@ -13186,6 +13584,9 @@ index 0000000..7dea8e4 |
4362 |
+#define HW_REG_MASK 0x1ff |
4363 |
+#define HW_CHAN_SHIFT 9 |
4364 |
+ |
4365 |
++#define GET_REG_CHAN(reg) ((reg) >> HW_CHAN_SHIFT) |
4366 |
++#define GET_REG_INDEX(reg) ((reg) & HW_REG_MASK) |
4367 |
++ |
4368 |
+namespace R600Operands { |
4369 |
+ enum Ops { |
4370 |
+ DST, |
4371 |
@@ -13199,27 +13600,42 @@ index 0000000..7dea8e4 |
4372 |
+ SRC0_NEG, |
4373 |
+ SRC0_REL, |
4374 |
+ SRC0_ABS, |
4375 |
++ SRC0_SEL, |
4376 |
+ SRC1, |
4377 |
+ SRC1_NEG, |
4378 |
+ SRC1_REL, |
4379 |
+ SRC1_ABS, |
4380 |
++ SRC1_SEL, |
4381 |
+ SRC2, |
4382 |
+ SRC2_NEG, |
4383 |
+ SRC2_REL, |
4384 |
++ SRC2_SEL, |
4385 |
+ LAST, |
4386 |
+ PRED_SEL, |
4387 |
+ IMM, |
4388 |
+ COUNT |
4389 |
+ }; |
4390 |
++ |
4391 |
++ const static int ALUOpTable[3][R600Operands::COUNT] = { |
4392 |
++// W C S S S S S S S S S S S |
4393 |
++// R O D L S R R R R S R R R R S R R R L P |
4394 |
++// D U I M R A R C C C C R C C C C R C C C A R I |
4395 |
++// S E U T O E M C 0 0 0 0 C 1 1 1 1 C 2 2 2 S E M |
4396 |
++// T M P E D L P 0 N R A S 1 N R A S 2 N R S T D M |
4397 |
++ {0,-1,-1, 1, 2, 3, 4, 5, 6, 7, 8, 9,-1,-1,-1,-1,-1,-1,-1,-1,-1,10,11,12}, |
4398 |
++ {0, 1, 2, 3, 4 ,5 ,6 ,7, 8, 9,10,11,12,13,14,15,16,-1,-1,-1,-1,17,18,19}, |
4399 |
++ {0,-1,-1,-1,-1, 1, 2, 3, 4, 5,-1, 6, 7, 8, 9,-1,10,11,12,13,14,15,16,17} |
4400 |
++ }; |
4401 |
++ |
4402 |
+} |
4403 |
+ |
4404 |
+#endif // R600DEFINES_H_ |
4405 |
diff --git a/lib/Target/R600/R600ExpandSpecialInstrs.cpp b/lib/Target/R600/R600ExpandSpecialInstrs.cpp |
4406 |
new file mode 100644 |
4407 |
-index 0000000..b6e62b7 |
4408 |
+index 0000000..c00c349 |
4409 |
--- /dev/null |
4410 |
+++ b/lib/Target/R600/R600ExpandSpecialInstrs.cpp |
4411 |
-@@ -0,0 +1,334 @@ |
4412 |
+@@ -0,0 +1,290 @@ |
4413 |
+//===-- R600ExpandSpecialInstrs.cpp - Expand special instructions ---------===// |
4414 |
+// |
4415 |
+// The LLVM Compiler Infrastructure |
4416 |
@@ -13277,118 +13693,6 @@ index 0000000..b6e62b7 |
4417 |
+ return new R600ExpandSpecialInstrsPass(TM); |
4418 |
+} |
4419 |
+ |
4420 |
-+bool R600ExpandSpecialInstrsPass::ExpandInputPerspective(MachineInstr &MI) { |
4421 |
-+ const R600RegisterInfo &TRI = TII->getRegisterInfo(); |
4422 |
-+ if (MI.getOpcode() != AMDGPU::input_perspective) |
4423 |
-+ return false; |
4424 |
-+ |
4425 |
-+ MachineBasicBlock::iterator I = &MI; |
4426 |
-+ unsigned DstReg = MI.getOperand(0).getReg(); |
4427 |
-+ R600MachineFunctionInfo *MFI = MI.getParent()->getParent() |
4428 |
-+ ->getInfo<R600MachineFunctionInfo>(); |
4429 |
-+ unsigned IJIndexBase; |
4430 |
-+ |
4431 |
-+ // In Evergreen ISA doc section 8.3.2 : |
4432 |
-+ // We need to interpolate XY and ZW in two different instruction groups. |
4433 |
-+ // An INTERP_* must occupy all 4 slots of an instruction group. |
4434 |
-+ // Output of INTERP_XY is written in X,Y slots |
4435 |
-+ // Output of INTERP_ZW is written in Z,W slots |
4436 |
-+ // |
4437 |
-+ // Thus interpolation requires the following sequences : |
4438 |
-+ // |
4439 |
-+ // AnyGPR.x = INTERP_ZW; (Write Masked Out) |
4440 |
-+ // AnyGPR.y = INTERP_ZW; (Write Masked Out) |
4441 |
-+ // DstGPR.z = INTERP_ZW; |
4442 |
-+ // DstGPR.w = INTERP_ZW; (End of first IG) |
4443 |
-+ // DstGPR.x = INTERP_XY; |
4444 |
-+ // DstGPR.y = INTERP_XY; |
4445 |
-+ // AnyGPR.z = INTERP_XY; (Write Masked Out) |
4446 |
-+ // AnyGPR.w = INTERP_XY; (Write Masked Out) (End of second IG) |
4447 |
-+ // |
4448 |
-+ switch (MI.getOperand(1).getImm()) { |
4449 |
-+ case 0: |
4450 |
-+ IJIndexBase = MFI->GetIJPerspectiveIndex(); |
4451 |
-+ break; |
4452 |
-+ case 1: |
4453 |
-+ IJIndexBase = MFI->GetIJLinearIndex(); |
4454 |
-+ break; |
4455 |
-+ default: |
4456 |
-+ assert(0 && "Unknow ij index"); |
4457 |
-+ } |
4458 |
-+ |
4459 |
-+ for (unsigned i = 0; i < 8; i++) { |
4460 |
-+ unsigned IJIndex = AMDGPU::R600_TReg32RegClass.getRegister( |
4461 |
-+ 2 * IJIndexBase + ((i + 1) % 2)); |
4462 |
-+ unsigned ReadReg = AMDGPU::R600_ArrayBaseRegClass.getRegister( |
4463 |
-+ MI.getOperand(2).getImm()); |
4464 |
-+ |
4465 |
-+ |
4466 |
-+ unsigned Sel = AMDGPU::sel_x; |
4467 |
-+ switch (i % 4) { |
4468 |
-+ case 0:Sel = AMDGPU::sel_x;break; |
4469 |
-+ case 1:Sel = AMDGPU::sel_y;break; |
4470 |
-+ case 2:Sel = AMDGPU::sel_z;break; |
4471 |
-+ case 3:Sel = AMDGPU::sel_w;break; |
4472 |
-+ default:break; |
4473 |
-+ } |
4474 |
-+ |
4475 |
-+ unsigned Res = TRI.getSubReg(DstReg, Sel); |
4476 |
-+ |
4477 |
-+ unsigned Opcode = (i < 4)?AMDGPU::INTERP_ZW:AMDGPU::INTERP_XY; |
4478 |
-+ |
4479 |
-+ MachineBasicBlock &MBB = *(MI.getParent()); |
4480 |
-+ MachineInstr *NewMI = |
4481 |
-+ TII->buildDefaultInstruction(MBB, I, Opcode, Res, IJIndex, ReadReg); |
4482 |
-+ |
4483 |
-+ if (!(i> 1 && i < 6)) { |
4484 |
-+ TII->addFlag(NewMI, 0, MO_FLAG_MASK); |
4485 |
-+ } |
4486 |
-+ |
4487 |
-+ if (i % 4 != 3) |
4488 |
-+ TII->addFlag(NewMI, 0, MO_FLAG_NOT_LAST); |
4489 |
-+ } |
4490 |
-+ |
4491 |
-+ MI.eraseFromParent(); |
4492 |
-+ |
4493 |
-+ return true; |
4494 |
-+} |
4495 |
-+ |
4496 |
-+bool R600ExpandSpecialInstrsPass::ExpandInputConstant(MachineInstr &MI) { |
4497 |
-+ const R600RegisterInfo &TRI = TII->getRegisterInfo(); |
4498 |
-+ if (MI.getOpcode() != AMDGPU::input_constant) |
4499 |
-+ return false; |
4500 |
-+ |
4501 |
-+ MachineBasicBlock::iterator I = &MI; |
4502 |
-+ unsigned DstReg = MI.getOperand(0).getReg(); |
4503 |
-+ |
4504 |
-+ for (unsigned i = 0; i < 4; i++) { |
4505 |
-+ unsigned ReadReg = AMDGPU::R600_ArrayBaseRegClass.getRegister( |
4506 |
-+ MI.getOperand(1).getImm()); |
4507 |
-+ |
4508 |
-+ unsigned Sel = AMDGPU::sel_x; |
4509 |
-+ switch (i % 4) { |
4510 |
-+ case 0:Sel = AMDGPU::sel_x;break; |
4511 |
-+ case 1:Sel = AMDGPU::sel_y;break; |
4512 |
-+ case 2:Sel = AMDGPU::sel_z;break; |
4513 |
-+ case 3:Sel = AMDGPU::sel_w;break; |
4514 |
-+ default:break; |
4515 |
-+ } |
4516 |
-+ |
4517 |
-+ unsigned Res = TRI.getSubReg(DstReg, Sel); |
4518 |
-+ |
4519 |
-+ MachineBasicBlock &MBB = *(MI.getParent()); |
4520 |
-+ MachineInstr *NewMI = TII->buildDefaultInstruction( |
4521 |
-+ MBB, I, AMDGPU::INTERP_LOAD_P0, Res, ReadReg); |
4522 |
-+ |
4523 |
-+ if (i % 4 != 3) |
4524 |
-+ TII->addFlag(NewMI, 0, MO_FLAG_NOT_LAST); |
4525 |
-+ } |
4526 |
-+ |
4527 |
-+ MI.eraseFromParent(); |
4528 |
-+ |
4529 |
-+ return true; |
4530 |
-+} |
4531 |
-+ |
4532 |
+bool R600ExpandSpecialInstrsPass::runOnMachineFunction(MachineFunction &MF) { |
4533 |
+ |
4534 |
+ const R600RegisterInfo &TRI = TII->getRegisterInfo(); |
4535 |
@@ -13422,7 +13726,7 @@ index 0000000..b6e62b7 |
4536 |
+ MI.eraseFromParent(); |
4537 |
+ continue; |
4538 |
+ } |
4539 |
-+ case AMDGPU::BREAK: |
4540 |
++ case AMDGPU::BREAK: { |
4541 |
+ MachineInstr *PredSet = TII->buildDefaultInstruction(MBB, I, |
4542 |
+ AMDGPU::PRED_SETE_INT, |
4543 |
+ AMDGPU::PREDICATE_BIT, |
4544 |
@@ -13436,12 +13740,81 @@ index 0000000..b6e62b7 |
4545 |
+ .addReg(AMDGPU::PREDICATE_BIT); |
4546 |
+ MI.eraseFromParent(); |
4547 |
+ continue; |
4548 |
-+ } |
4549 |
++ } |
4550 |
+ |
4551 |
-+ if (ExpandInputPerspective(MI)) |
4552 |
-+ continue; |
4553 |
-+ if (ExpandInputConstant(MI)) |
4554 |
-+ continue; |
4555 |
++ case AMDGPU::INTERP_PAIR_XY: { |
4556 |
++ MachineInstr *BMI; |
4557 |
++ unsigned PReg = AMDGPU::R600_ArrayBaseRegClass.getRegister( |
4558 |
++ MI.getOperand(2).getImm()); |
4559 |
++ |
4560 |
++ for (unsigned Chan = 0; Chan < 4; ++Chan) { |
4561 |
++ unsigned DstReg; |
4562 |
++ |
4563 |
++ if (Chan < 2) |
4564 |
++ DstReg = MI.getOperand(Chan).getReg(); |
4565 |
++ else |
4566 |
++ DstReg = Chan == 2 ? AMDGPU::T0_Z : AMDGPU::T0_W; |
4567 |
++ |
4568 |
++ BMI = TII->buildDefaultInstruction(MBB, I, AMDGPU::INTERP_XY, |
4569 |
++ DstReg, MI.getOperand(3 + (Chan % 2)).getReg(), PReg); |
4570 |
++ |
4571 |
++ BMI->setIsInsideBundle(Chan > 0); |
4572 |
++ if (Chan >= 2) |
4573 |
++ TII->addFlag(BMI, 0, MO_FLAG_MASK); |
4574 |
++ if (Chan != 3) |
4575 |
++ TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST); |
4576 |
++ } |
4577 |
++ |
4578 |
++ MI.eraseFromParent(); |
4579 |
++ continue; |
4580 |
++ } |
4581 |
++ |
4582 |
++ case AMDGPU::INTERP_PAIR_ZW: { |
4583 |
++ MachineInstr *BMI; |
4584 |
++ unsigned PReg = AMDGPU::R600_ArrayBaseRegClass.getRegister( |
4585 |
++ MI.getOperand(2).getImm()); |
4586 |
++ |
4587 |
++ for (unsigned Chan = 0; Chan < 4; ++Chan) { |
4588 |
++ unsigned DstReg; |
4589 |
++ |
4590 |
++ if (Chan < 2) |
4591 |
++ DstReg = Chan == 0 ? AMDGPU::T0_X : AMDGPU::T0_Y; |
4592 |
++ else |
4593 |
++ DstReg = MI.getOperand(Chan-2).getReg(); |
4594 |
++ |
4595 |
++ BMI = TII->buildDefaultInstruction(MBB, I, AMDGPU::INTERP_ZW, |
4596 |
++ DstReg, MI.getOperand(3 + (Chan % 2)).getReg(), PReg); |
4597 |
++ |
4598 |
++ BMI->setIsInsideBundle(Chan > 0); |
4599 |
++ if (Chan < 2) |
4600 |
++ TII->addFlag(BMI, 0, MO_FLAG_MASK); |
4601 |
++ if (Chan != 3) |
4602 |
++ TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST); |
4603 |
++ } |
4604 |
++ |
4605 |
++ MI.eraseFromParent(); |
4606 |
++ continue; |
4607 |
++ } |
4608 |
++ |
4609 |
++ case AMDGPU::INTERP_VEC_LOAD: { |
4610 |
++ const R600RegisterInfo &TRI = TII->getRegisterInfo(); |
4611 |
++ MachineInstr *BMI; |
4612 |
++ unsigned PReg = AMDGPU::R600_ArrayBaseRegClass.getRegister( |
4613 |
++ MI.getOperand(1).getImm()); |
4614 |
++ unsigned DstReg = MI.getOperand(0).getReg(); |
4615 |
++ |
4616 |
++ for (unsigned Chan = 0; Chan < 4; ++Chan) { |
4617 |
++ BMI = TII->buildDefaultInstruction(MBB, I, AMDGPU::INTERP_LOAD_P0, |
4618 |
++ TRI.getSubReg(DstReg, TRI.getSubRegFromChannel(Chan)), PReg); |
4619 |
++ BMI->setIsInsideBundle(Chan > 0); |
4620 |
++ if (Chan != 3) |
4621 |
++ TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST); |
4622 |
++ } |
4623 |
++ |
4624 |
++ MI.eraseFromParent(); |
4625 |
++ continue; |
4626 |
++ } |
4627 |
++ } |
4628 |
+ |
4629 |
+ bool IsReduction = TII->isReductionOp(MI.getOpcode()); |
4630 |
+ bool IsVector = TII->isVector(MI); |
4631 |
@@ -13540,8 +13913,7 @@ index 0000000..b6e62b7 |
4632 |
+ MachineInstr *NewMI = |
4633 |
+ TII->buildDefaultInstruction(MBB, I, Opcode, DstReg, Src0, Src1); |
4634 |
+ |
4635 |
-+ if (Chan != 0) |
4636 |
-+ NewMI->bundleWithPred(); |
4637 |
++ NewMI->setIsInsideBundle(Chan != 0); |
4638 |
+ if (Mask) { |
4639 |
+ TII->addFlag(NewMI, 0, MO_FLAG_MASK); |
4640 |
+ } |
4641 |
@@ -13556,10 +13928,10 @@ index 0000000..b6e62b7 |
4642 |
+} |
4643 |
diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp |
4644 |
new file mode 100644 |
4645 |
-index 0000000..d6b9d90 |
4646 |
+index 0000000..9c38522 |
4647 |
--- /dev/null |
4648 |
+++ b/lib/Target/R600/R600ISelLowering.cpp |
4649 |
-@@ -0,0 +1,909 @@ |
4650 |
+@@ -0,0 +1,1195 @@ |
4651 |
+//===-- R600ISelLowering.cpp - R600 DAG Lowering Implementation -----------===// |
4652 |
+// |
4653 |
+// The LLVM Compiler Infrastructure |
4654 |
@@ -13580,6 +13952,7 @@ index 0000000..d6b9d90 |
4655 |
+#include "R600MachineFunctionInfo.h" |
4656 |
+#include "llvm/Argument.h" |
4657 |
+#include "llvm/Function.h" |
4658 |
++#include "llvm/CodeGen/MachineFrameInfo.h" |
4659 |
+#include "llvm/CodeGen/MachineInstrBuilder.h" |
4660 |
+#include "llvm/CodeGen/MachineRegisterInfo.h" |
4661 |
+#include "llvm/CodeGen/SelectionDAG.h" |
4662 |
@@ -13633,10 +14006,27 @@ index 0000000..d6b9d90 |
4663 |
+ setOperationAction(ISD::SELECT, MVT::i32, Custom); |
4664 |
+ setOperationAction(ISD::SELECT, MVT::f32, Custom); |
4665 |
+ |
4666 |
++ // Legalize loads and stores to the private address space. |
4667 |
++ setOperationAction(ISD::LOAD, MVT::i32, Custom); |
4668 |
++ setOperationAction(ISD::LOAD, MVT::v2i32, Custom); |
4669 |
++ setOperationAction(ISD::LOAD, MVT::v4i32, Custom); |
4670 |
++ setLoadExtAction(ISD::EXTLOAD, MVT::v4i8, Custom); |
4671 |
++ setLoadExtAction(ISD::EXTLOAD, MVT::i8, Custom); |
4672 |
++ setLoadExtAction(ISD::ZEXTLOAD, MVT::i8, Custom); |
4673 |
++ setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i8, Custom); |
4674 |
++ setOperationAction(ISD::STORE, MVT::i8, Custom); |
4675 |
+ setOperationAction(ISD::STORE, MVT::i32, Custom); |
4676 |
++ setOperationAction(ISD::STORE, MVT::v2i32, Custom); |
4677 |
+ setOperationAction(ISD::STORE, MVT::v4i32, Custom); |
4678 |
+ |
4679 |
++ setOperationAction(ISD::LOAD, MVT::i32, Custom); |
4680 |
++ setOperationAction(ISD::LOAD, MVT::v4i32, Custom); |
4681 |
++ setOperationAction(ISD::FrameIndex, MVT::i32, Custom); |
4682 |
++ |
4683 |
+ setTargetDAGCombine(ISD::FP_ROUND); |
4684 |
++ setTargetDAGCombine(ISD::FP_TO_SINT); |
4685 |
++ setTargetDAGCombine(ISD::EXTRACT_VECTOR_ELT); |
4686 |
++ setTargetDAGCombine(ISD::SELECT_CC); |
4687 |
+ |
4688 |
+ setSchedulingPreference(Sched::VLIW); |
4689 |
+} |
4690 |
@@ -13677,15 +14067,6 @@ index 0000000..d6b9d90 |
4691 |
+ break; |
4692 |
+ } |
4693 |
+ |
4694 |
-+ case AMDGPU::R600_LOAD_CONST: { |
4695 |
-+ int64_t RegIndex = MI->getOperand(1).getImm(); |
4696 |
-+ unsigned ConstantReg = AMDGPU::R600_CReg32RegClass.getRegister(RegIndex); |
4697 |
-+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::COPY)) |
4698 |
-+ .addOperand(MI->getOperand(0)) |
4699 |
-+ .addReg(ConstantReg); |
4700 |
-+ break; |
4701 |
-+ } |
4702 |
-+ |
4703 |
+ case AMDGPU::MASK_WRITE: { |
4704 |
+ unsigned maskedRegister = MI->getOperand(0).getReg(); |
4705 |
+ assert(TargetRegisterInfo::isVirtualRegister(maskedRegister)); |
4706 |
@@ -13716,18 +14097,6 @@ index 0000000..d6b9d90 |
4707 |
+ break; |
4708 |
+ } |
4709 |
+ |
4710 |
-+ case AMDGPU::RESERVE_REG: { |
4711 |
-+ R600MachineFunctionInfo * MFI = MF->getInfo<R600MachineFunctionInfo>(); |
4712 |
-+ int64_t ReservedIndex = MI->getOperand(0).getImm(); |
4713 |
-+ unsigned ReservedReg = |
4714 |
-+ AMDGPU::R600_TReg32RegClass.getRegister(ReservedIndex); |
4715 |
-+ MFI->ReservedRegs.push_back(ReservedReg); |
4716 |
-+ unsigned SuperReg = |
4717 |
-+ AMDGPU::R600_Reg128RegClass.getRegister(ReservedIndex / 4); |
4718 |
-+ MFI->ReservedRegs.push_back(SuperReg); |
4719 |
-+ break; |
4720 |
-+ } |
4721 |
-+ |
4722 |
+ case AMDGPU::TXD: { |
4723 |
+ unsigned T0 = MRI.createVirtualRegister(&AMDGPU::R600_Reg128RegClass); |
4724 |
+ unsigned T1 = MRI.createVirtualRegister(&AMDGPU::R600_Reg128RegClass); |
4725 |
@@ -13812,33 +14181,26 @@ index 0000000..d6b9d90 |
4726 |
+ break; |
4727 |
+ } |
4728 |
+ |
4729 |
-+ case AMDGPU::input_perspective: { |
4730 |
-+ R600MachineFunctionInfo *MFI = MF->getInfo<R600MachineFunctionInfo>(); |
4731 |
-+ |
4732 |
-+ // XXX Be more fine about register reservation |
4733 |
-+ for (unsigned i = 0; i < 4; i ++) { |
4734 |
-+ unsigned ReservedReg = AMDGPU::R600_TReg32RegClass.getRegister(i); |
4735 |
-+ MFI->ReservedRegs.push_back(ReservedReg); |
4736 |
-+ } |
4737 |
-+ |
4738 |
-+ switch (MI->getOperand(1).getImm()) { |
4739 |
-+ case 0:// Perspective |
4740 |
-+ MFI->HasPerspectiveInterpolation = true; |
4741 |
-+ break; |
4742 |
-+ case 1:// Linear |
4743 |
-+ MFI->HasLinearInterpolation = true; |
4744 |
-+ break; |
4745 |
-+ default: |
4746 |
-+ assert(0 && "Unknow ij index"); |
4747 |
-+ } |
4748 |
-+ |
4749 |
-+ return BB; |
4750 |
-+ } |
4751 |
-+ |
4752 |
+ case AMDGPU::EG_ExportSwz: |
4753 |
+ case AMDGPU::R600_ExportSwz: { |
4754 |
++ // Instruction is left unmodified if it's not the last one of its type |
4755 |
++ bool isLastInstructionOfItsType = true; |
4756 |
++ unsigned InstExportType = MI->getOperand(1).getImm(); |
4757 |
++ for (MachineBasicBlock::iterator NextExportInst = llvm::next(I), |
4758 |
++ EndBlock = BB->end(); NextExportInst != EndBlock; |
4759 |
++ NextExportInst = llvm::next(NextExportInst)) { |
4760 |
++ if (NextExportInst->getOpcode() == AMDGPU::EG_ExportSwz || |
4761 |
++ NextExportInst->getOpcode() == AMDGPU::R600_ExportSwz) { |
4762 |
++ unsigned CurrentInstExportType = NextExportInst->getOperand(1) |
4763 |
++ .getImm(); |
4764 |
++ if (CurrentInstExportType == InstExportType) { |
4765 |
++ isLastInstructionOfItsType = false; |
4766 |
++ break; |
4767 |
++ } |
4768 |
++ } |
4769 |
++ } |
4770 |
+ bool EOP = (llvm::next(I)->getOpcode() == AMDGPU::RETURN)? 1 : 0; |
4771 |
-+ if (!EOP) |
4772 |
++ if (!EOP && !isLastInstructionOfItsType) |
4773 |
+ return BB; |
4774 |
+ unsigned CfInst = (MI->getOpcode() == AMDGPU::EG_ExportSwz)? 84 : 40; |
4775 |
+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI->getOpcode())) |
4776 |
@@ -13850,7 +14212,7 @@ index 0000000..d6b9d90 |
4777 |
+ .addOperand(MI->getOperand(5)) |
4778 |
+ .addOperand(MI->getOperand(6)) |
4779 |
+ .addImm(CfInst) |
4780 |
-+ .addImm(1); |
4781 |
++ .addImm(EOP); |
4782 |
+ break; |
4783 |
+ } |
4784 |
+ } |
4785 |
@@ -13926,7 +14288,9 @@ index 0000000..d6b9d90 |
4786 |
+ case ISD::SELECT: return LowerSELECT(Op, DAG); |
4787 |
+ case ISD::SETCC: return LowerSETCC(Op, DAG); |
4788 |
+ case ISD::STORE: return LowerSTORE(Op, DAG); |
4789 |
++ case ISD::LOAD: return LowerLOAD(Op, DAG); |
4790 |
+ case ISD::FPOW: return LowerFPOW(Op, DAG); |
4791 |
++ case ISD::FrameIndex: return LowerFrameIndex(Op, DAG); |
4792 |
+ case ISD::INTRINSIC_VOID: { |
4793 |
+ SDValue Chain = Op.getOperand(0); |
4794 |
+ unsigned IntrinsicID = |
4795 |
@@ -13953,39 +14317,7 @@ index 0000000..d6b9d90 |
4796 |
+ Chain); |
4797 |
+ |
4798 |
+ } |
4799 |
-+ case AMDGPUIntrinsic::R600_store_stream_output : { |
4800 |
-+ MachineFunction &MF = DAG.getMachineFunction(); |
4801 |
-+ R600MachineFunctionInfo *MFI = MF.getInfo<R600MachineFunctionInfo>(); |
4802 |
-+ int64_t RegIndex = cast<ConstantSDNode>(Op.getOperand(3))->getZExtValue(); |
4803 |
-+ int64_t BufIndex = cast<ConstantSDNode>(Op.getOperand(4))->getZExtValue(); |
4804 |
-+ |
4805 |
-+ SDNode **OutputsMap = MFI->StreamOutputs[BufIndex]; |
4806 |
-+ unsigned Inst; |
4807 |
-+ switch (cast<ConstantSDNode>(Op.getOperand(4))->getZExtValue() ) { |
4808 |
-+ // STREAM3 |
4809 |
-+ case 3: |
4810 |
-+ Inst = 4; |
4811 |
-+ break; |
4812 |
-+ // STREAM2 |
4813 |
-+ case 2: |
4814 |
-+ Inst = 3; |
4815 |
-+ break; |
4816 |
-+ // STREAM1 |
4817 |
-+ case 1: |
4818 |
-+ Inst = 2; |
4819 |
-+ break; |
4820 |
-+ // STREAM0 |
4821 |
-+ case 0: |
4822 |
-+ Inst = 1; |
4823 |
-+ break; |
4824 |
-+ default: |
4825 |
-+ assert(0 && "Wrong buffer id for stream outputs !"); |
4826 |
-+ } |
4827 |
+ |
4828 |
-+ return InsertScalarToRegisterExport(DAG, Op.getDebugLoc(), OutputsMap, |
4829 |
-+ RegIndex / 4, RegIndex % 4, Inst, 0, Op.getOperand(2), |
4830 |
-+ Chain); |
4831 |
-+ } |
4832 |
+ // default for switch(IntrinsicID) |
4833 |
+ default: break; |
4834 |
+ } |
4835 |
@@ -14004,38 +14336,35 @@ index 0000000..d6b9d90 |
4836 |
+ unsigned Reg = AMDGPU::R600_TReg32RegClass.getRegister(RegIndex); |
4837 |
+ return CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass, Reg, VT); |
4838 |
+ } |
4839 |
-+ case AMDGPUIntrinsic::R600_load_input_perspective: { |
4840 |
-+ int slot = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue(); |
4841 |
-+ if (slot < 0) |
4842 |
-+ return DAG.getUNDEF(MVT::f32); |
4843 |
-+ SDValue FullVector = DAG.getNode( |
4844 |
-+ AMDGPUISD::INTERP, |
4845 |
-+ DL, MVT::v4f32, |
4846 |
-+ DAG.getConstant(0, MVT::i32), DAG.getConstant(slot / 4 , MVT::i32)); |
4847 |
-+ return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, |
4848 |
-+ DL, VT, FullVector, DAG.getConstant(slot % 4, MVT::i32)); |
4849 |
-+ } |
4850 |
-+ case AMDGPUIntrinsic::R600_load_input_linear: { |
4851 |
-+ int slot = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue(); |
4852 |
-+ if (slot < 0) |
4853 |
-+ return DAG.getUNDEF(MVT::f32); |
4854 |
-+ SDValue FullVector = DAG.getNode( |
4855 |
-+ AMDGPUISD::INTERP, |
4856 |
-+ DL, MVT::v4f32, |
4857 |
-+ DAG.getConstant(1, MVT::i32), DAG.getConstant(slot / 4 , MVT::i32)); |
4858 |
-+ return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, |
4859 |
-+ DL, VT, FullVector, DAG.getConstant(slot % 4, MVT::i32)); |
4860 |
-+ } |
4861 |
-+ case AMDGPUIntrinsic::R600_load_input_constant: { |
4862 |
++ |
4863 |
++ case AMDGPUIntrinsic::R600_interp_input: { |
4864 |
+ int slot = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue(); |
4865 |
-+ if (slot < 0) |
4866 |
-+ return DAG.getUNDEF(MVT::f32); |
4867 |
-+ SDValue FullVector = DAG.getNode( |
4868 |
-+ AMDGPUISD::INTERP_P0, |
4869 |
-+ DL, MVT::v4f32, |
4870 |
-+ DAG.getConstant(slot / 4 , MVT::i32)); |
4871 |
-+ return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, |
4872 |
-+ DL, VT, FullVector, DAG.getConstant(slot % 4, MVT::i32)); |
4873 |
++ int ijb = cast<ConstantSDNode>(Op.getOperand(2))->getSExtValue(); |
4874 |
++ MachineSDNode *interp; |
4875 |
++ if (ijb < 0) { |
4876 |
++ interp = DAG.getMachineNode(AMDGPU::INTERP_VEC_LOAD, DL, |
4877 |
++ MVT::v4f32, DAG.getTargetConstant(slot / 4 , MVT::i32)); |
4878 |
++ return DAG.getTargetExtractSubreg( |
4879 |
++ TII->getRegisterInfo().getSubRegFromChannel(slot % 4), |
4880 |
++ DL, MVT::f32, SDValue(interp, 0)); |
4881 |
++ } |
4882 |
++ |
4883 |
++ if (slot % 4 < 2) |
4884 |
++ interp = DAG.getMachineNode(AMDGPU::INTERP_PAIR_XY, DL, |
4885 |
++ MVT::f32, MVT::f32, DAG.getTargetConstant(slot / 4 , MVT::i32), |
4886 |
++ CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass, |
4887 |
++ AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb + 1), MVT::f32), |
4888 |
++ CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass, |
4889 |
++ AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb), MVT::f32)); |
4890 |
++ else |
4891 |
++ interp = DAG.getMachineNode(AMDGPU::INTERP_PAIR_ZW, DL, |
4892 |
++ MVT::f32, MVT::f32, DAG.getTargetConstant(slot / 4 , MVT::i32), |
4893 |
++ CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass, |
4894 |
++ AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb + 1), MVT::f32), |
4895 |
++ CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass, |
4896 |
++ AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb), MVT::f32)); |
4897 |
++ |
4898 |
++ return SDValue(interp, slot % 2); |
4899 |
+ } |
4900 |
+ |
4901 |
+ case r600_read_ngroups_x: |
4902 |
@@ -14089,6 +14418,20 @@ index 0000000..d6b9d90 |
4903 |
+ switch (N->getOpcode()) { |
4904 |
+ default: return; |
4905 |
+ case ISD::FP_TO_UINT: Results.push_back(LowerFPTOUINT(N->getOperand(0), DAG)); |
4906 |
++ return; |
4907 |
++ case ISD::LOAD: { |
4908 |
++ SDNode *Node = LowerLOAD(SDValue(N, 0), DAG).getNode(); |
4909 |
++ Results.push_back(SDValue(Node, 0)); |
4910 |
++ Results.push_back(SDValue(Node, 1)); |
4911 |
++ // XXX: LLVM seems not to replace Chain Value inside CustomWidenLowerNode |
4912 |
++ // function |
4913 |
++ DAG.ReplaceAllUsesOfValueWith(SDValue(N,1), SDValue(Node, 1)); |
4914 |
++ return; |
4915 |
++ } |
4916 |
++ case ISD::STORE: |
4917 |
++ SDNode *Node = LowerSTORE(SDValue(N, 0), DAG).getNode(); |
4918 |
++ Results.push_back(SDValue(Node, 0)); |
4919 |
++ return; |
4920 |
+ } |
4921 |
+} |
4922 |
+ |
4923 |
@@ -14156,6 +14499,20 @@ index 0000000..d6b9d90 |
4924 |
+ false, false, false, 0); |
4925 |
+} |
4926 |
+ |
4927 |
++SDValue R600TargetLowering::LowerFrameIndex(SDValue Op, SelectionDAG &DAG) const { |
4928 |
++ |
4929 |
++ MachineFunction &MF = DAG.getMachineFunction(); |
4930 |
++ const AMDGPUFrameLowering *TFL = |
4931 |
++ static_cast<const AMDGPUFrameLowering*>(getTargetMachine().getFrameLowering()); |
4932 |
++ |
4933 |
++ FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(Op); |
4934 |
++ assert(FIN); |
4935 |
++ |
4936 |
++ unsigned FrameIndex = FIN->getIndex(); |
4937 |
++ unsigned Offset = TFL->getFrameIndexOffset(MF, FrameIndex); |
4938 |
++ return DAG.getConstant(Offset * 4 * TFL->getStackWidth(MF), MVT::i32); |
4939 |
++} |
4940 |
++ |
4941 |
+SDValue R600TargetLowering::LowerROTL(SDValue Op, SelectionDAG &DAG) const { |
4942 |
+ DebugLoc DL = Op.getDebugLoc(); |
4943 |
+ EVT VT = Op.getValueType(); |
4944 |
@@ -14242,9 +14599,12 @@ index 0000000..d6b9d90 |
4945 |
+ } |
4946 |
+ |
4947 |
+ // Try to lower to a SET* instruction: |
4948 |
-+ // We need all the operands of SELECT_CC to have the same value type, so if |
4949 |
-+ // necessary we need to change True and False to be the same type as LHS and |
4950 |
-+ // RHS, and then convert the result of the select_cc back to the correct type. |
4951 |
++ // |
4952 |
++ // CompareVT == MVT::f32 and VT == MVT::i32 is supported by the hardware, |
4953 |
++ // but for the other case where CompareVT != VT, all operands of |
4954 |
++ // SELECT_CC need to have the same value type, so we need to change True and |
4955 |
++ // False to be the same type as LHS and RHS, and then convert the result of |
4956 |
++ // the select_cc back to the correct type. |
4957 |
+ |
4958 |
+ // Move hardware True/False values to the correct operand. |
4959 |
+ if (isHWTrueValue(False) && isHWFalseValue(True)) { |
4960 |
@@ -14254,32 +14614,17 @@ index 0000000..d6b9d90 |
4961 |
+ } |
4962 |
+ |
4963 |
+ if (isHWTrueValue(True) && isHWFalseValue(False)) { |
4964 |
-+ if (CompareVT != VT) { |
4965 |
-+ if (VT == MVT::f32 && CompareVT == MVT::i32) { |
4966 |
-+ SDValue Boolean = DAG.getNode(ISD::SELECT_CC, DL, CompareVT, |
4967 |
-+ LHS, RHS, |
4968 |
-+ DAG.getConstant(-1, MVT::i32), |
4969 |
-+ DAG.getConstant(0, MVT::i32), |
4970 |
-+ CC); |
4971 |
-+ // Convert integer values of true (-1) and false (0) to fp values of |
4972 |
-+ // true (1.0f) and false (0.0f). |
4973 |
-+ SDValue LSB = DAG.getNode(ISD::AND, DL, MVT::i32, Boolean, |
4974 |
-+ DAG.getConstant(1, MVT::i32)); |
4975 |
-+ return DAG.getNode(ISD::UINT_TO_FP, DL, VT, LSB); |
4976 |
-+ } else if (VT == MVT::i32 && CompareVT == MVT::f32) { |
4977 |
-+ SDValue BoolAsFlt = DAG.getNode(ISD::SELECT_CC, DL, CompareVT, |
4978 |
-+ LHS, RHS, |
4979 |
-+ DAG.getConstantFP(1.0f, MVT::f32), |
4980 |
-+ DAG.getConstantFP(0.0f, MVT::f32), |
4981 |
-+ CC); |
4982 |
-+ // Convert fp values of true (1.0f) and false (0.0f) to integer values |
4983 |
-+ // of true (-1) and false (0). |
4984 |
-+ SDValue Neg = DAG.getNode(ISD::FNEG, DL, MVT::f32, BoolAsFlt); |
4985 |
-+ return DAG.getNode(ISD::FP_TO_SINT, DL, VT, Neg); |
4986 |
-+ } else { |
4987 |
-+ // I don't think there will be any other type pairings. |
4988 |
-+ assert(!"Unhandled operand type parings in SELECT_CC"); |
4989 |
-+ } |
4990 |
++ if (CompareVT != VT && VT == MVT::f32 && CompareVT == MVT::i32) { |
4991 |
++ SDValue Boolean = DAG.getNode(ISD::SELECT_CC, DL, CompareVT, |
4992 |
++ LHS, RHS, |
4993 |
++ DAG.getConstant(-1, MVT::i32), |
4994 |
++ DAG.getConstant(0, MVT::i32), |
4995 |
++ CC); |
4996 |
++ // Convert integer values of true (-1) and false (0) to fp values of |
4997 |
++ // true (1.0f) and false (0.0f). |
4998 |
++ SDValue LSB = DAG.getNode(ISD::AND, DL, MVT::i32, Boolean, |
4999 |
++ DAG.getConstant(1, MVT::i32)); |
5000 |
++ return DAG.getNode(ISD::UINT_TO_FP, DL, VT, LSB); |
5001 |
+ } else { |
5002 |
+ // This SELECT_CC is already legal. |
5003 |
+ return DAG.getNode(ISD::SELECT_CC, DL, VT, LHS, RHS, True, False, CC); |
5004 |
@@ -14370,6 +14715,61 @@ index 0000000..d6b9d90 |
5005 |
+ return Cond; |
5006 |
+} |
5007 |
+ |
5008 |
++/// LLVM generates byte-addressed pointers. For indirect addressing, we need to |
5009 |
++/// convert these pointers to a register index. Each register holds |
5010 |
++/// 16 bytes, (4 x 32bit sub-register), but we need to take into account the |
5011 |
++/// \p StackWidth, which tells us how many of the 4 sub-registers will be used |
5012 |
++/// for indirect addressing. |
5013 |
++SDValue R600TargetLowering::stackPtrToRegIndex(SDValue Ptr, |
5014 |
++ unsigned StackWidth, |
5015 |
++ SelectionDAG &DAG) const { |
5016 |
++ unsigned SRLPad; |
5017 |
++ switch(StackWidth) { |
5018 |
++ case 1: |
5019 |
++ SRLPad = 2; |
5020 |
++ break; |
5021 |
++ case 2: |
5022 |
++ SRLPad = 3; |
5023 |
++ break; |
5024 |
++ case 4: |
5025 |
++ SRLPad = 4; |
5026 |
++ break; |
5027 |
++ default: llvm_unreachable("Invalid stack width"); |
5028 |
++ } |
5029 |
++ |
5030 |
++ return DAG.getNode(ISD::SRL, Ptr.getDebugLoc(), Ptr.getValueType(), Ptr, |
5031 |
++ DAG.getConstant(SRLPad, MVT::i32)); |
5032 |
++} |
5033 |
++ |
5034 |
++void R600TargetLowering::getStackAddress(unsigned StackWidth, |
5035 |
++ unsigned ElemIdx, |
5036 |
++ unsigned &Channel, |
5037 |
++ unsigned &PtrIncr) const { |
5038 |
++ switch (StackWidth) { |
5039 |
++ default: |
5040 |
++ case 1: |
5041 |
++ Channel = 0; |
5042 |
++ if (ElemIdx > 0) { |
5043 |
++ PtrIncr = 1; |
5044 |
++ } else { |
5045 |
++ PtrIncr = 0; |
5046 |
++ } |
5047 |
++ break; |
5048 |
++ case 2: |
5049 |
++ Channel = ElemIdx % 2; |
5050 |
++ if (ElemIdx == 2) { |
5051 |
++ PtrIncr = 1; |
5052 |
++ } else { |
5053 |
++ PtrIncr = 0; |
5054 |
++ } |
5055 |
++ break; |
5056 |
++ case 4: |
5057 |
++ Channel = ElemIdx; |
5058 |
++ PtrIncr = 0; |
5059 |
++ break; |
5060 |
++ } |
5061 |
++} |
5062 |
++ |
5063 |
+SDValue R600TargetLowering::LowerSTORE(SDValue Op, SelectionDAG &DAG) const { |
5064 |
+ DebugLoc DL = Op.getDebugLoc(); |
5065 |
+ StoreSDNode *StoreNode = cast<StoreSDNode>(Op); |
5066 |
@@ -14391,23 +14791,202 @@ index 0000000..d6b9d90 |
5067 |
+ } |
5068 |
+ return Chain; |
5069 |
+ } |
5070 |
-+ return SDValue(); |
5071 |
-+} |
5072 |
+ |
5073 |
++ EVT ValueVT = Value.getValueType(); |
5074 |
+ |
5075 |
-+SDValue R600TargetLowering::LowerFPOW(SDValue Op, |
5076 |
-+ SelectionDAG &DAG) const { |
5077 |
-+ DebugLoc DL = Op.getDebugLoc(); |
5078 |
-+ EVT VT = Op.getValueType(); |
5079 |
-+ SDValue LogBase = DAG.getNode(ISD::FLOG2, DL, VT, Op.getOperand(0)); |
5080 |
-+ SDValue MulLogBase = DAG.getNode(ISD::FMUL, DL, VT, Op.getOperand(1), LogBase); |
5081 |
-+ return DAG.getNode(ISD::FEXP2, DL, VT, MulLogBase); |
5082 |
++ if (StoreNode->getAddressSpace() != AMDGPUAS::PRIVATE_ADDRESS) { |
5083 |
++ return SDValue(); |
5084 |
++ } |
5085 |
++ |
5086 |
++ // Lowering for indirect addressing |
5087 |
++ |
5088 |
++ const MachineFunction &MF = DAG.getMachineFunction(); |
5089 |
++ const AMDGPUFrameLowering *TFL = static_cast<const AMDGPUFrameLowering*>( |
5090 |
++ getTargetMachine().getFrameLowering()); |
5091 |
++ unsigned StackWidth = TFL->getStackWidth(MF); |
5092 |
++ |
5093 |
++ Ptr = stackPtrToRegIndex(Ptr, StackWidth, DAG); |
5094 |
++ |
5095 |
++ if (ValueVT.isVector()) { |
5096 |
++ unsigned NumElemVT = ValueVT.getVectorNumElements(); |
5097 |
++ EVT ElemVT = ValueVT.getVectorElementType(); |
5098 |
++ SDValue Stores[4]; |
5099 |
++ |
5100 |
++ assert(NumElemVT >= StackWidth && "Stack width cannot be greater than " |
5101 |
++ "vector width in store"); |
5102 |
++ |
5103 |
++ for (unsigned i = 0; i < NumElemVT; ++i) { |
5104 |
++ unsigned Channel, PtrIncr; |
5105 |
++ getStackAddress(StackWidth, i, Channel, PtrIncr); |
5106 |
++ Ptr = DAG.getNode(ISD::ADD, DL, MVT::i32, Ptr, |
5107 |
++ DAG.getConstant(PtrIncr, MVT::i32)); |
5108 |
++ SDValue Elem = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ElemVT, |
5109 |
++ Value, DAG.getConstant(i, MVT::i32)); |
5110 |
++ |
5111 |
++ Stores[i] = DAG.getNode(AMDGPUISD::REGISTER_STORE, DL, MVT::Other, |
5112 |
++ Chain, Elem, Ptr, |
5113 |
++ DAG.getTargetConstant(Channel, MVT::i32)); |
5114 |
++ } |
5115 |
++ Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Stores, NumElemVT); |
5116 |
++ } else { |
5117 |
++ if (ValueVT == MVT::i8) { |
5118 |
++ Value = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, Value); |
5119 |
++ } |
5120 |
++ Chain = DAG.getNode(AMDGPUISD::REGISTER_STORE, DL, MVT::Other, Chain, Value, Ptr, |
5121 |
++ DAG.getTargetConstant(0, MVT::i32)); // Channel |
5122 |
++ } |
5123 |
++ |
5124 |
++ return Chain; |
5125 |
+} |
5126 |
+ |
5127 |
-+/// XXX Only kernel functions are supported, so we can assume for now that |
5128 |
-+/// every function is a kernel function, but in the future we should use |
5129 |
-+/// separate calling conventions for kernel and non-kernel functions. |
5130 |
-+SDValue R600TargetLowering::LowerFormalArguments( |
5131 |
++// return (512 + (kc_bank << 12)) |
5132 |
++static int |
5133 |
++ConstantAddressBlock(unsigned AddressSpace) { |
5134 |
++ switch (AddressSpace) { |
5135 |
++ case AMDGPUAS::CONSTANT_BUFFER_0: |
5136 |
++ return 512; |
5137 |
++ case AMDGPUAS::CONSTANT_BUFFER_1: |
5138 |
++ return 512 + 4096; |
5139 |
++ case AMDGPUAS::CONSTANT_BUFFER_2: |
5140 |
++ return 512 + 4096 * 2; |
5141 |
++ case AMDGPUAS::CONSTANT_BUFFER_3: |
5142 |
++ return 512 + 4096 * 3; |
5143 |
++ case AMDGPUAS::CONSTANT_BUFFER_4: |
5144 |
++ return 512 + 4096 * 4; |
5145 |
++ case AMDGPUAS::CONSTANT_BUFFER_5: |
5146 |
++ return 512 + 4096 * 5; |
5147 |
++ case AMDGPUAS::CONSTANT_BUFFER_6: |
5148 |
++ return 512 + 4096 * 6; |
5149 |
++ case AMDGPUAS::CONSTANT_BUFFER_7: |
5150 |
++ return 512 + 4096 * 7; |
5151 |
++ case AMDGPUAS::CONSTANT_BUFFER_8: |
5152 |
++ return 512 + 4096 * 8; |
5153 |
++ case AMDGPUAS::CONSTANT_BUFFER_9: |
5154 |
++ return 512 + 4096 * 9; |
5155 |
++ case AMDGPUAS::CONSTANT_BUFFER_10: |
5156 |
++ return 512 + 4096 * 10; |
5157 |
++ case AMDGPUAS::CONSTANT_BUFFER_11: |
5158 |
++ return 512 + 4096 * 11; |
5159 |
++ case AMDGPUAS::CONSTANT_BUFFER_12: |
5160 |
++ return 512 + 4096 * 12; |
5161 |
++ case AMDGPUAS::CONSTANT_BUFFER_13: |
5162 |
++ return 512 + 4096 * 13; |
5163 |
++ case AMDGPUAS::CONSTANT_BUFFER_14: |
5164 |
++ return 512 + 4096 * 14; |
5165 |
++ case AMDGPUAS::CONSTANT_BUFFER_15: |
5166 |
++ return 512 + 4096 * 15; |
5167 |
++ default: |
5168 |
++ return -1; |
5169 |
++ } |
5170 |
++} |
5171 |
++ |
5172 |
++SDValue R600TargetLowering::LowerLOAD(SDValue Op, SelectionDAG &DAG) const |
5173 |
++{ |
5174 |
++ EVT VT = Op.getValueType(); |
5175 |
++ DebugLoc DL = Op.getDebugLoc(); |
5176 |
++ LoadSDNode *LoadNode = cast<LoadSDNode>(Op); |
5177 |
++ SDValue Chain = Op.getOperand(0); |
5178 |
++ SDValue Ptr = Op.getOperand(1); |
5179 |
++ SDValue LoweredLoad; |
5180 |
++ |
5181 |
++ int ConstantBlock = ConstantAddressBlock(LoadNode->getAddressSpace()); |
5182 |
++ if (ConstantBlock > -1) { |
5183 |
++ SDValue Result; |
5184 |
++ if (dyn_cast<ConstantExpr>(LoadNode->getSrcValue()) || |
5185 |
++ dyn_cast<Constant>(LoadNode->getSrcValue())) { |
5186 |
++ SDValue Slots[4]; |
5187 |
++ for (unsigned i = 0; i < 4; i++) { |
5188 |
++ // We want Const position encoded with the following formula : |
5189 |
++ // (((512 + (kc_bank << 12) + const_index) << 2) + chan) |
5190 |
++ // const_index is Ptr computed by llvm using an alignment of 16. |
5191 |
++ // Thus we add (((512 + (kc_bank << 12)) + chan ) * 4 here and |
5192 |
++ // then div by 4 at the ISel step |
5193 |
++ SDValue NewPtr = DAG.getNode(ISD::ADD, DL, Ptr.getValueType(), Ptr, |
5194 |
++ DAG.getConstant(4 * i + ConstantBlock * 16, MVT::i32)); |
5195 |
++ Slots[i] = DAG.getNode(AMDGPUISD::CONST_ADDRESS, DL, MVT::i32, NewPtr); |
5196 |
++ } |
5197 |
++ Result = DAG.getNode(ISD::BUILD_VECTOR, DL, MVT::v4i32, Slots, 4); |
5198 |
++ } else { |
5199 |
++ // A non-constant ptr can't be folded; keep it as a v4i32 load |
5200 |
++ Result = DAG.getNode(AMDGPUISD::CONST_ADDRESS, DL, MVT::v4i32, |
5201 |
++ DAG.getNode(ISD::SRL, DL, MVT::i32, Ptr, DAG.getConstant(4, MVT::i32)) |
5202 |
++ ); |
5203 |
++ } |
5204 |
++ |
5205 |
++ if (!VT.isVector()) { |
5206 |
++ Result = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i32, Result, |
5207 |
++ DAG.getConstant(0, MVT::i32)); |
5208 |
++ } |
5209 |
++ |
5210 |
++ SDValue MergedValues[2] = { |
5211 |
++ Result, |
5212 |
++ Chain |
5213 |
++ }; |
5214 |
++ return DAG.getMergeValues(MergedValues, 2, DL); |
5215 |
++ } |
5216 |
++ |
5217 |
++ if (LoadNode->getAddressSpace() != AMDGPUAS::PRIVATE_ADDRESS) { |
5218 |
++ return SDValue(); |
5219 |
++ } |
5220 |
++ |
5221 |
++ // Lowering for indirect addressing |
5222 |
++ const MachineFunction &MF = DAG.getMachineFunction(); |
5223 |
++ const AMDGPUFrameLowering *TFL = static_cast<const AMDGPUFrameLowering*>( |
5224 |
++ getTargetMachine().getFrameLowering()); |
5225 |
++ unsigned StackWidth = TFL->getStackWidth(MF); |
5226 |
++ |
5227 |
++ Ptr = stackPtrToRegIndex(Ptr, StackWidth, DAG); |
5228 |
++ |
5229 |
++ if (VT.isVector()) { |
5230 |
++ unsigned NumElemVT = VT.getVectorNumElements(); |
5231 |
++ EVT ElemVT = VT.getVectorElementType(); |
5232 |
++ SDValue Loads[4]; |
5233 |
++ |
5234 |
++ assert(NumElemVT >= StackWidth && "Stack width cannot be greater than " |
5235 |
++ "vector width in load"); |
5236 |
++ |
5237 |
++ for (unsigned i = 0; i < NumElemVT; ++i) { |
5238 |
++ unsigned Channel, PtrIncr; |
5239 |
++ getStackAddress(StackWidth, i, Channel, PtrIncr); |
5240 |
++ Ptr = DAG.getNode(ISD::ADD, DL, MVT::i32, Ptr, |
5241 |
++ DAG.getConstant(PtrIncr, MVT::i32)); |
5242 |
++ Loads[i] = DAG.getNode(AMDGPUISD::REGISTER_LOAD, DL, ElemVT, |
5243 |
++ Chain, Ptr, |
5244 |
++ DAG.getTargetConstant(Channel, MVT::i32), |
5245 |
++ Op.getOperand(2)); |
5246 |
++ } |
5247 |
++ for (unsigned i = NumElemVT; i < 4; ++i) { |
5248 |
++ Loads[i] = DAG.getUNDEF(ElemVT); |
5249 |
++ } |
5250 |
++ EVT TargetVT = EVT::getVectorVT(*DAG.getContext(), ElemVT, 4); |
5251 |
++ LoweredLoad = DAG.getNode(ISD::BUILD_VECTOR, DL, TargetVT, Loads, 4); |
5252 |
++ } else { |
5253 |
++ LoweredLoad = DAG.getNode(AMDGPUISD::REGISTER_LOAD, DL, VT, |
5254 |
++ Chain, Ptr, |
5255 |
++ DAG.getTargetConstant(0, MVT::i32), // Channel |
5256 |
++ Op.getOperand(2)); |
5257 |
++ } |
5258 |
++ |
5259 |
++ SDValue Ops[2]; |
5260 |
++ Ops[0] = LoweredLoad; |
5261 |
++ Ops[1] = Chain; |
5262 |
++ |
5263 |
++ return DAG.getMergeValues(Ops, 2, DL); |
5264 |
++} |
5265 |
++ |
5266 |
++SDValue R600TargetLowering::LowerFPOW(SDValue Op, |
5267 |
++ SelectionDAG &DAG) const { |
5268 |
++ DebugLoc DL = Op.getDebugLoc(); |
5269 |
++ EVT VT = Op.getValueType(); |
5270 |
++ SDValue LogBase = DAG.getNode(ISD::FLOG2, DL, VT, Op.getOperand(0)); |
5271 |
++ SDValue MulLogBase = DAG.getNode(ISD::FMUL, DL, VT, Op.getOperand(1), LogBase); |
5272 |
++ return DAG.getNode(ISD::FEXP2, DL, VT, MulLogBase); |
5273 |
++} |
5274 |
++ |
5275 |
++/// XXX Only kernel functions are supported, so we can assume for now that |
5276 |
++/// every function is a kernel function, but in the future we should use |
5277 |
++/// separate calling conventions for kernel and non-kernel functions. |
5278 |
++SDValue R600TargetLowering::LowerFormalArguments( |
5279 |
+ SDValue Chain, |
5280 |
+ CallingConv::ID CallConv, |
5281 |
+ bool isVarArg, |
5282 |
@@ -14435,7 +15014,7 @@ index 0000000..d6b9d90 |
5283 |
+ AMDGPUAS::PARAM_I_ADDRESS); |
5284 |
+ SDValue Arg = DAG.getExtLoad(ISD::ZEXTLOAD, DL, VT, DAG.getRoot(), |
5285 |
+ DAG.getConstant(ParamOffsetBytes, MVT::i32), |
5286 |
-+ MachinePointerInfo(new Argument(PtrTy)), |
5287 |
++ MachinePointerInfo(UndefValue::get(PtrTy)), |
5288 |
+ ArgVT, false, false, ArgBytes); |
5289 |
+ InVals.push_back(Arg); |
5290 |
+ ParamOffsetBytes += ArgBytes; |
5291 |
@@ -14466,15 +15045,94 @@ index 0000000..d6b9d90 |
5292 |
+ } |
5293 |
+ break; |
5294 |
+ } |
5295 |
++ |
5296 |
++ // (i32 fp_to_sint (fneg (select_cc f32, f32, 1.0, 0.0 cc))) -> |
5297 |
++ // (i32 select_cc f32, f32, -1, 0 cc) |
5298 |
++ // |
5299 |
++ // Mesa's GLSL frontend generates the above pattern a lot and we can lower |
5300 |
++ // this to one of the SET*_DX10 instructions. |
5301 |
++ case ISD::FP_TO_SINT: { |
5302 |
++ SDValue FNeg = N->getOperand(0); |
5303 |
++ if (FNeg.getOpcode() != ISD::FNEG) { |
5304 |
++ return SDValue(); |
5305 |
++ } |
5306 |
++ SDValue SelectCC = FNeg.getOperand(0); |
5307 |
++ if (SelectCC.getOpcode() != ISD::SELECT_CC || |
5308 |
++ SelectCC.getOperand(0).getValueType() != MVT::f32 || // LHS |
5309 |
++ SelectCC.getOperand(2).getValueType() != MVT::f32 || // True |
5310 |
++ !isHWTrueValue(SelectCC.getOperand(2)) || |
5311 |
++ !isHWFalseValue(SelectCC.getOperand(3))) { |
5312 |
++ return SDValue(); |
5313 |
++ } |
5314 |
++ |
5315 |
++ return DAG.getNode(ISD::SELECT_CC, N->getDebugLoc(), N->getValueType(0), |
5316 |
++ SelectCC.getOperand(0), // LHS |
5317 |
++ SelectCC.getOperand(1), // RHS |
5318 |
++ DAG.getConstant(-1, MVT::i32), // True |
5319 |
++ DAG.getConstant(0, MVT::i32), // False |
5320 |
++ SelectCC.getOperand(4)); // CC |
5321 |
++ |
5322 |
++ break; |
5323 |
++ } |
5324 |
++ // Extract_vec (Build_vector) generated by custom lowering |
5325 |
++ // also needs to be custom combined |
5326 |
++ case ISD::EXTRACT_VECTOR_ELT: { |
5327 |
++ SDValue Arg = N->getOperand(0); |
5328 |
++ if (Arg.getOpcode() == ISD::BUILD_VECTOR) { |
5329 |
++ if (ConstantSDNode *Const = dyn_cast<ConstantSDNode>(N->getOperand(1))) { |
5330 |
++ unsigned Element = Const->getZExtValue(); |
5331 |
++ return Arg->getOperand(Element); |
5332 |
++ } |
5333 |
++ } |
5334 |
++ if (Arg.getOpcode() == ISD::BITCAST && |
5335 |
++ Arg.getOperand(0).getOpcode() == ISD::BUILD_VECTOR) { |
5336 |
++ if (ConstantSDNode *Const = dyn_cast<ConstantSDNode>(N->getOperand(1))) { |
5337 |
++ unsigned Element = Const->getZExtValue(); |
5338 |
++ return DAG.getNode(ISD::BITCAST, N->getDebugLoc(), N->getVTList(), |
5339 |
++ Arg->getOperand(0).getOperand(Element)); |
5340 |
++ } |
5341 |
++ } |
5342 |
++ } |
5343 |
++ |
5344 |
++ case ISD::SELECT_CC: { |
5345 |
++ // fold selectcc (selectcc x, y, a, b, cc), b, a, b, seteq -> |
5346 |
++ // selectcc x, y, a, b, inv(cc) |
5347 |
++ SDValue LHS = N->getOperand(0); |
5348 |
++ if (LHS.getOpcode() != ISD::SELECT_CC) { |
5349 |
++ return SDValue(); |
5350 |
++ } |
5351 |
++ |
5352 |
++ SDValue RHS = N->getOperand(1); |
5353 |
++ SDValue True = N->getOperand(2); |
5354 |
++ SDValue False = N->getOperand(3); |
5355 |
++ |
5356 |
++ if (LHS.getOperand(2).getNode() != True.getNode() || |
5357 |
++ LHS.getOperand(3).getNode() != False.getNode() || |
5358 |
++ RHS.getNode() != False.getNode() || |
5359 |
++ cast<CondCodeSDNode>(N->getOperand(4))->get() != ISD::SETEQ) { |
5360 |
++ return SDValue(); |
5361 |
++ } |
5362 |
++ |
5363 |
++ ISD::CondCode CCOpcode = cast<CondCodeSDNode>(LHS->getOperand(4))->get(); |
5364 |
++ CCOpcode = ISD::getSetCCInverse( |
5365 |
++ CCOpcode, LHS.getOperand(0).getValueType().isInteger()); |
5366 |
++ return DAG.getSelectCC(N->getDebugLoc(), |
5367 |
++ LHS.getOperand(0), |
5368 |
++ LHS.getOperand(1), |
5369 |
++ LHS.getOperand(2), |
5370 |
++ LHS.getOperand(3), |
5371 |
++ CCOpcode); |
5372 |
++ |
5373 |
++ } |
5374 |
+ } |
5375 |
+ return SDValue(); |
5376 |
+} |
5377 |
diff --git a/lib/Target/R600/R600ISelLowering.h b/lib/Target/R600/R600ISelLowering.h |
5378 |
new file mode 100644 |
5379 |
-index 0000000..2b954da |
5380 |
+index 0000000..afa3897 |
5381 |
--- /dev/null |
5382 |
+++ b/lib/Target/R600/R600ISelLowering.h |
5383 |
-@@ -0,0 +1,72 @@ |
5384 |
+@@ -0,0 +1,78 @@ |
5385 |
+//===-- R600ISelLowering.h - R600 DAG Lowering Interface -*- C++ -*--------===// |
5386 |
+// |
5387 |
+// The LLVM Compiler Infrastructure |
5388 |
@@ -14540,7 +15198,13 @@ index 0000000..2b954da |
5389 |
+ SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const; |
5390 |
+ SDValue LowerFPTOUINT(SDValue Op, SelectionDAG &DAG) const; |
5391 |
+ SDValue LowerFPOW(SDValue Op, SelectionDAG &DAG) const; |
5392 |
-+ |
5393 |
++ SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const; |
5394 |
++ SDValue LowerFrameIndex(SDValue Op, SelectionDAG &DAG) const; |
5395 |
++ |
5396 |
++ SDValue stackPtrToRegIndex(SDValue Ptr, unsigned StackWidth, |
5397 |
++ SelectionDAG &DAG) const; |
5398 |
++ void getStackAddress(unsigned StackWidth, unsigned ElemIdx, |
5399 |
++ unsigned &Channel, unsigned &PtrIncr) const; |
5400 |
+ bool isZero(SDValue Op) const; |
5401 |
+}; |
5402 |
+ |
5403 |
@@ -14549,10 +15213,10 @@ index 0000000..2b954da |
5404 |
+#endif // R600ISELLOWERING_H |
5405 |
diff --git a/lib/Target/R600/R600InstrInfo.cpp b/lib/Target/R600/R600InstrInfo.cpp |
5406 |
new file mode 100644 |
5407 |
-index 0000000..70ed41aba |
5408 |
+index 0000000..31671ea |
5409 |
--- /dev/null |
5410 |
+++ b/lib/Target/R600/R600InstrInfo.cpp |
5411 |
-@@ -0,0 +1,665 @@ |
5412 |
+@@ -0,0 +1,776 @@ |
5413 |
+//===-- R600InstrInfo.cpp - R600 Instruction Information ------------------===// |
5414 |
+// |
5415 |
+// The LLVM Compiler Infrastructure |
5416 |
@@ -14571,8 +15235,12 @@ index 0000000..70ed41aba |
5417 |
+#include "AMDGPUTargetMachine.h" |
5418 |
+#include "AMDGPUSubtarget.h" |
5419 |
+#include "R600Defines.h" |
5420 |
++#include "R600MachineFunctionInfo.h" |
5421 |
+#include "R600RegisterInfo.h" |
5422 |
+#include "llvm/CodeGen/MachineInstrBuilder.h" |
5423 |
++#include "llvm/CodeGen/MachineFrameInfo.h" |
5424 |
++#include "llvm/CodeGen/MachineRegisterInfo.h" |
5425 |
++#include "llvm/Instructions.h" |
5426 |
+ |
5427 |
+#define GET_INSTRINFO_CTOR |
5428 |
+#include "AMDGPUGenDFAPacketizer.inc" |
5429 |
@@ -14627,11 +15295,10 @@ index 0000000..70ed41aba |
5430 |
+MachineInstr * R600InstrInfo::getMovImmInstr(MachineFunction *MF, |
5431 |
+ unsigned DstReg, int64_t Imm) const { |
5432 |
+ MachineInstr * MI = MF->CreateMachineInstr(get(AMDGPU::MOV), DebugLoc()); |
5433 |
-+ MachineInstrBuilder MIB(*MF, MI); |
5434 |
-+ MIB.addReg(DstReg, RegState::Define); |
5435 |
-+ MIB.addReg(AMDGPU::ALU_LITERAL_X); |
5436 |
-+ MIB.addImm(Imm); |
5437 |
-+ MIB.addReg(0); // PREDICATE_BIT |
5438 |
++ MachineInstrBuilder(MI).addReg(DstReg, RegState::Define); |
5439 |
++ MachineInstrBuilder(MI).addReg(AMDGPU::ALU_LITERAL_X); |
5440 |
++ MachineInstrBuilder(MI).addImm(Imm); |
5441 |
++ MachineInstrBuilder(MI).addReg(0); // PREDICATE_BIT |
5442 |
+ |
5443 |
+ return MI; |
5444 |
+} |
5445 |
@@ -14659,7 +15326,6 @@ index 0000000..70ed41aba |
5446 |
+ switch (Opcode) { |
5447 |
+ default: return false; |
5448 |
+ case AMDGPU::RETURN: |
5449 |
-+ case AMDGPU::RESERVE_REG: |
5450 |
+ return true; |
5451 |
+ } |
5452 |
+} |
5453 |
@@ -15005,8 +15671,7 @@ index 0000000..70ed41aba |
5454 |
+ if (PIdx != -1) { |
5455 |
+ MachineOperand &PMO = MI->getOperand(PIdx); |
5456 |
+ PMO.setReg(Pred[2].getReg()); |
5457 |
-+ MachineInstrBuilder MIB(*MI->getParent()->getParent(), MI); |
5458 |
-+ MIB.addReg(AMDGPU::PREDICATE_BIT, RegState::Implicit); |
5459 |
++ MachineInstrBuilder(MI).addReg(AMDGPU::PREDICATE_BIT, RegState::Implicit); |
5460 |
+ return true; |
5461 |
+ } |
5462 |
+ |
5463 |
@@ -15021,6 +15686,124 @@ index 0000000..70ed41aba |
5464 |
+ return 2; |
5465 |
+} |
5466 |
+ |
5467 |
++int R600InstrInfo::getIndirectIndexBegin(const MachineFunction &MF) const { |
5468 |
++ const MachineRegisterInfo &MRI = MF.getRegInfo(); |
5469 |
++ const MachineFrameInfo *MFI = MF.getFrameInfo(); |
5470 |
++ int Offset = 0; |
5471 |
++ |
5472 |
++ if (MFI->getNumObjects() == 0) { |
5473 |
++ return -1; |
5474 |
++ } |
5475 |
++ |
5476 |
++ if (MRI.livein_empty()) { |
5477 |
++ return 0; |
5478 |
++ } |
5479 |
++ |
5480 |
++ for (MachineRegisterInfo::livein_iterator LI = MRI.livein_begin(), |
5481 |
++ LE = MRI.livein_end(); |
5482 |
++ LI != LE; ++LI) { |
5483 |
++ Offset = std::max(Offset, |
5484 |
++ GET_REG_INDEX(RI.getEncodingValue(LI->first))); |
5485 |
++ } |
5486 |
++ |
5487 |
++ return Offset + 1; |
5488 |
++} |
5489 |
++ |
5490 |
++int R600InstrInfo::getIndirectIndexEnd(const MachineFunction &MF) const { |
5491 |
++ int Offset = 0; |
5492 |
++ const MachineFrameInfo *MFI = MF.getFrameInfo(); |
5493 |
++ |
5494 |
++ // Variable sized objects are not supported |
5495 |
++ assert(!MFI->hasVarSizedObjects()); |
5496 |
++ |
5497 |
++ if (MFI->getNumObjects() == 0) { |
5498 |
++ return -1; |
5499 |
++ } |
5500 |
++ |
5501 |
++ Offset = TM.getFrameLowering()->getFrameIndexOffset(MF, -1); |
5502 |
++ |
5503 |
++ return getIndirectIndexBegin(MF) + Offset; |
5504 |
++} |
5505 |
++ |
5506 |
++std::vector<unsigned> R600InstrInfo::getIndirectReservedRegs( |
5507 |
++ const MachineFunction &MF) const { |
5508 |
++ const AMDGPUFrameLowering *TFL = |
5509 |
++ static_cast<const AMDGPUFrameLowering*>(TM.getFrameLowering()); |
5510 |
++ std::vector<unsigned> Regs; |
5511 |
++ |
5512 |
++ unsigned StackWidth = TFL->getStackWidth(MF); |
5513 |
++ int End = getIndirectIndexEnd(MF); |
5514 |
++ |
5515 |
++ if (End == -1) { |
5516 |
++ return Regs; |
5517 |
++ } |
5518 |
++ |
5519 |
++ for (int Index = getIndirectIndexBegin(MF); Index <= End; ++Index) { |
5520 |
++ unsigned SuperReg = AMDGPU::R600_Reg128RegClass.getRegister(Index); |
5521 |
++ Regs.push_back(SuperReg); |
5522 |
++ for (unsigned Chan = 0; Chan < StackWidth; ++Chan) { |
5523 |
++ unsigned Reg = AMDGPU::R600_TReg32RegClass.getRegister((4 * Index) + Chan); |
5524 |
++ Regs.push_back(Reg); |
5525 |
++ } |
5526 |
++ } |
5527 |
++ return Regs; |
5528 |
++} |
5529 |
++ |
5530 |
++unsigned R600InstrInfo::calculateIndirectAddress(unsigned RegIndex, |
5531 |
++ unsigned Channel) const { |
5532 |
++ // XXX: Remove when we support a stack width > 2 |
5533 |
++ assert(Channel == 0); |
5534 |
++ return RegIndex; |
5535 |
++} |
5536 |
++ |
5537 |
++const TargetRegisterClass * R600InstrInfo::getIndirectAddrStoreRegClass( |
5538 |
++ unsigned SourceReg) const { |
5539 |
++ return &AMDGPU::R600_TReg32RegClass; |
5540 |
++} |
5541 |
++ |
5542 |
++const TargetRegisterClass *R600InstrInfo::getIndirectAddrLoadRegClass() const { |
5543 |
++ return &AMDGPU::TRegMemRegClass; |
5544 |
++} |
5545 |
++ |
5546 |
++MachineInstrBuilder R600InstrInfo::buildIndirectWrite(MachineBasicBlock *MBB, |
5547 |
++ MachineBasicBlock::iterator I, |
5548 |
++ unsigned ValueReg, unsigned Address, |
5549 |
++ unsigned OffsetReg) const { |
5550 |
++ unsigned AddrReg = AMDGPU::R600_AddrRegClass.getRegister(Address); |
5551 |
++ MachineInstr *MOVA = buildDefaultInstruction(*MBB, I, AMDGPU::MOVA_INT_eg, |
5552 |
++ AMDGPU::AR_X, OffsetReg); |
5553 |
++ setImmOperand(MOVA, R600Operands::WRITE, 0); |
5554 |
++ |
5555 |
++ MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, AMDGPU::MOV, |
5556 |
++ AddrReg, ValueReg) |
5557 |
++ .addReg(AMDGPU::AR_X, RegState::Implicit); |
5558 |
++ setImmOperand(Mov, R600Operands::DST_REL, 1); |
5559 |
++ return Mov; |
5560 |
++} |
5561 |
++ |
5562 |
++MachineInstrBuilder R600InstrInfo::buildIndirectRead(MachineBasicBlock *MBB, |
5563 |
++ MachineBasicBlock::iterator I, |
5564 |
++ unsigned ValueReg, unsigned Address, |
5565 |
++ unsigned OffsetReg) const { |
5566 |
++ unsigned AddrReg = AMDGPU::R600_AddrRegClass.getRegister(Address); |
5567 |
++ MachineInstr *MOVA = buildDefaultInstruction(*MBB, I, AMDGPU::MOVA_INT_eg, |
5568 |
++ AMDGPU::AR_X, |
5569 |
++ OffsetReg); |
5570 |
++ setImmOperand(MOVA, R600Operands::WRITE, 0); |
5571 |
++ MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, AMDGPU::MOV, |
5572 |
++ ValueReg, |
5573 |
++ AddrReg) |
5574 |
++ .addReg(AMDGPU::AR_X, RegState::Implicit); |
5575 |
++ setImmOperand(Mov, R600Operands::SRC0_REL, 1); |
5576 |
++ |
5577 |
++ return Mov; |
5578 |
++} |
5579 |
++ |
5580 |
++const TargetRegisterClass *R600InstrInfo::getSuperIndirectRegClass() const { |
5581 |
++ return &AMDGPU::IndirectRegRegClass; |
5582 |
++} |
5583 |
++ |
5584 |
++ |
5585 |
+MachineInstrBuilder R600InstrInfo::buildDefaultInstruction(MachineBasicBlock &MBB, |
5586 |
+ MachineBasicBlock::iterator I, |
5587 |
+ unsigned Opcode, |
5588 |
@@ -15041,13 +15824,15 @@ index 0000000..70ed41aba |
5589 |
+ .addReg(Src0Reg) // $src0 |
5590 |
+ .addImm(0) // $src0_neg |
5591 |
+ .addImm(0) // $src0_rel |
5592 |
-+ .addImm(0); // $src0_abs |
5593 |
++ .addImm(0) // $src0_abs |
5594 |
++ .addImm(-1); // $src0_sel |
5595 |
+ |
5596 |
+ if (Src1Reg) { |
5597 |
+ MIB.addReg(Src1Reg) // $src1 |
5598 |
+ .addImm(0) // $src1_neg |
5599 |
+ .addImm(0) // $src1_rel |
5600 |
-+ .addImm(0); // $src1_abs |
5601 |
++ .addImm(0) // $src1_abs |
5602 |
++ .addImm(-1); // $src1_sel |
5603 |
+ } |
5604 |
+ |
5605 |
+ //XXX: The r600g finalizer expects this to be 1, once we've moved the |
5606 |
@@ -15076,16 +15861,6 @@ index 0000000..70ed41aba |
5607 |
+ |
5608 |
+int R600InstrInfo::getOperandIdx(unsigned Opcode, |
5609 |
+ R600Operands::Ops Op) const { |
5610 |
-+ const static int OpTable[3][R600Operands::COUNT] = { |
5611 |
-+// W C S S S S S S S S |
5612 |
-+// R O D L S R R R S R R R S R R L P |
5613 |
-+// D U I M R A R C C C C C C C R C C A R I |
5614 |
-+// S E U T O E M C 0 0 0 C 1 1 1 C 2 2 S E M |
5615 |
-+// T M P E D L P 0 N R A 1 N R A 2 N R T D M |
5616 |
-+ {0,-1,-1, 1, 2, 3, 4, 5, 6, 7, 8,-1,-1,-1,-1,-1,-1,-1, 9,10,11}, |
5617 |
-+ {0, 1, 2, 3, 4 ,5 ,6 ,7, 8, 9,10,11,12,-1,-1,-1,13,14,15,16,17}, |
5618 |
-+ {0,-1,-1,-1,-1, 1, 2, 3, 4, 5,-1, 6, 7, 8,-1, 9,10,11,12,13,14} |
5619 |
-+ }; |
5620 |
+ unsigned TargetFlags = get(Opcode).TSFlags; |
5621 |
+ unsigned OpTableIdx; |
5622 |
+ |
5623 |
@@ -15111,7 +15886,7 @@ index 0000000..70ed41aba |
5624 |
+ OpTableIdx = 2; |
5625 |
+ } |
5626 |
+ |
5627 |
-+ return OpTable[OpTableIdx][Op]; |
5628 |
++ return R600Operands::ALUOpTable[OpTableIdx][Op]; |
5629 |
+} |
5630 |
+ |
5631 |
+void R600InstrInfo::setImmOperand(MachineInstr *MI, R600Operands::Ops Op, |
5632 |
@@ -15220,10 +15995,10 @@ index 0000000..70ed41aba |
5633 |
+} |
5634 |
diff --git a/lib/Target/R600/R600InstrInfo.h b/lib/Target/R600/R600InstrInfo.h |
5635 |
new file mode 100644 |
5636 |
-index 0000000..6bb0ca9 |
5637 |
+index 0000000..278fad1 |
5638 |
--- /dev/null |
5639 |
+++ b/lib/Target/R600/R600InstrInfo.h |
5640 |
-@@ -0,0 +1,169 @@ |
5641 |
+@@ -0,0 +1,201 @@ |
5642 |
+//===-- R600InstrInfo.h - R600 Instruction Info Interface -------*- C++ -*-===// |
5643 |
+// |
5644 |
+// The LLVM Compiler Infrastructure |
5645 |
@@ -15340,6 +16115,38 @@ index 0000000..6bb0ca9 |
5646 |
+ virtual int getInstrLatency(const InstrItineraryData *ItinData, |
5647 |
+ SDNode *Node) const { return 1;} |
5648 |
+ |
5649 |
++ /// \returns a list of all the registers that may be accessed using indirect |
5650 |
++ /// addressing. |
5651 |
++ std::vector<unsigned> getIndirectReservedRegs(const MachineFunction &MF) const; |
5652 |
++ |
5653 |
++ virtual int getIndirectIndexBegin(const MachineFunction &MF) const; |
5654 |
++ |
5655 |
++ virtual int getIndirectIndexEnd(const MachineFunction &MF) const; |
5656 |
++ |
5657 |
++ |
5658 |
++ virtual unsigned calculateIndirectAddress(unsigned RegIndex, |
5659 |
++ unsigned Channel) const; |
5660 |
++ |
5661 |
++ virtual const TargetRegisterClass *getIndirectAddrStoreRegClass( |
5662 |
++ unsigned SourceReg) const; |
5663 |
++ |
5664 |
++ virtual const TargetRegisterClass *getIndirectAddrLoadRegClass() const; |
5665 |
++ |
5666 |
++ virtual MachineInstrBuilder buildIndirectWrite(MachineBasicBlock *MBB, |
5667 |
++ MachineBasicBlock::iterator I, |
5668 |
++ unsigned ValueReg, unsigned Address, |
5669 |
++ unsigned OffsetReg) const; |
5670 |
++ |
5671 |
++ virtual MachineInstrBuilder buildIndirectRead(MachineBasicBlock *MBB, |
5672 |
++ MachineBasicBlock::iterator I, |
5673 |
++ unsigned ValueReg, unsigned Address, |
5674 |
++ unsigned OffsetReg) const; |
5675 |
++ |
5676 |
++ virtual const TargetRegisterClass *getSuperIndirectRegClass() const; |
5677 |
++ |
5678 |
++ |
5679 |
++ /// buildDefaultInstruction - This function returns a MachineInstr with |
5680 |
++ /// all the instruction modifiers initialized to their default values. |
5681 |
+ /// You can use this function to avoid manually specifying each instruction |
5682 |
+ /// modifier operand when building a new instruction. |
5683 |
+ /// |
5684 |
@@ -15395,10 +16202,10 @@ index 0000000..6bb0ca9 |
5685 |
+#endif // R600INSTRINFO_H_ |
5686 |
diff --git a/lib/Target/R600/R600Instructions.td b/lib/Target/R600/R600Instructions.td |
5687 |
new file mode 100644 |
5688 |
-index 0000000..64bab18 |
5689 |
+index 0000000..409da07 |
5690 |
--- /dev/null |
5691 |
+++ b/lib/Target/R600/R600Instructions.td |
5692 |
-@@ -0,0 +1,1724 @@ |
5693 |
+@@ -0,0 +1,1976 @@ |
5694 |
+//===-- R600Instructions.td - R600 Instruction defs -------*- tablegen -*-===// |
5695 |
+// |
5696 |
+// The LLVM Compiler Infrastructure |
5697 |
@@ -15471,6 +16278,11 @@ index 0000000..64bab18 |
5698 |
+ let PrintMethod = PM; |
5699 |
+} |
5700 |
+ |
5701 |
++// src_sel for ALU src operands, see also ALU_CONST, ALU_PARAM registers |
5702 |
++def SEL : OperandWithDefaultOps <i32, (ops (i32 -1))> { |
5703 |
++ let PrintMethod = "printSel"; |
5704 |
++} |
5705 |
++ |
5706 |
+def LITERAL : InstFlag<"printLiteral">; |
5707 |
+ |
5708 |
+def WRITE : InstFlag <"printWrite", 1>; |
5709 |
@@ -15487,9 +16299,16 @@ index 0000000..64bab18 |
5710 |
+// default to 0. |
5711 |
+def LAST : InstFlag<"printLast", 1>; |
5712 |
+ |
5713 |
++def FRAMEri : Operand<iPTR> { |
5714 |
++ let MIOperandInfo = (ops R600_Reg32:$ptr, i32imm:$index); |
5715 |
++} |
5716 |
++ |
5717 |
+def ADDRParam : ComplexPattern<i32, 2, "SelectADDRParam", [], []>; |
5718 |
+def ADDRDWord : ComplexPattern<i32, 1, "SelectADDRDWord", [], []>; |
5719 |
+def ADDRVTX_READ : ComplexPattern<i32, 2, "SelectADDRVTX_READ", [], []>; |
5720 |
++def ADDRGA_CONST_OFFSET : ComplexPattern<i32, 1, "SelectGlobalValueConstantOffset", [], []>; |
5721 |
++def ADDRGA_VAR_OFFSET : ComplexPattern<i32, 2, "SelectGlobalValueVariableOffset", [], []>; |
5722 |
++def ADDRIndirect : ComplexPattern<iPTR, 2, "SelectADDRIndirect", [], []>; |
5723 |
+ |
5724 |
+class R600ALU_Word0 { |
5725 |
+ field bits<32> Word0; |
5726 |
@@ -15574,6 +16393,55 @@ index 0000000..64bab18 |
5727 |
+ let Word1{17-13} = alu_inst; |
5728 |
+} |
5729 |
+ |
5730 |
++class VTX_WORD0 { |
5731 |
++ field bits<32> Word0; |
5732 |
++ bits<7> SRC_GPR; |
5733 |
++ bits<5> VC_INST; |
5734 |
++ bits<2> FETCH_TYPE; |
5735 |
++ bits<1> FETCH_WHOLE_QUAD; |
5736 |
++ bits<8> BUFFER_ID; |
5737 |
++ bits<1> SRC_REL; |
5738 |
++ bits<2> SRC_SEL_X; |
5739 |
++ bits<6> MEGA_FETCH_COUNT; |
5740 |
++ |
5741 |
++ let Word0{4-0} = VC_INST; |
5742 |
++ let Word0{6-5} = FETCH_TYPE; |
5743 |
++ let Word0{7} = FETCH_WHOLE_QUAD; |
5744 |
++ let Word0{15-8} = BUFFER_ID; |
5745 |
++ let Word0{22-16} = SRC_GPR; |
5746 |
++ let Word0{23} = SRC_REL; |
5747 |
++ let Word0{25-24} = SRC_SEL_X; |
5748 |
++ let Word0{31-26} = MEGA_FETCH_COUNT; |
5749 |
++} |
5750 |
++ |
5751 |
++class VTX_WORD1_GPR { |
5752 |
++ field bits<32> Word1; |
5753 |
++ bits<7> DST_GPR; |
5754 |
++ bits<1> DST_REL; |
5755 |
++ bits<3> DST_SEL_X; |
5756 |
++ bits<3> DST_SEL_Y; |
5757 |
++ bits<3> DST_SEL_Z; |
5758 |
++ bits<3> DST_SEL_W; |
5759 |
++ bits<1> USE_CONST_FIELDS; |
5760 |
++ bits<6> DATA_FORMAT; |
5761 |
++ bits<2> NUM_FORMAT_ALL; |
5762 |
++ bits<1> FORMAT_COMP_ALL; |
5763 |
++ bits<1> SRF_MODE_ALL; |
5764 |
++ |
5765 |
++ let Word1{6-0} = DST_GPR; |
5766 |
++ let Word1{7} = DST_REL; |
5767 |
++ let Word1{8} = 0; // Reserved |
5768 |
++ let Word1{11-9} = DST_SEL_X; |
5769 |
++ let Word1{14-12} = DST_SEL_Y; |
5770 |
++ let Word1{17-15} = DST_SEL_Z; |
5771 |
++ let Word1{20-18} = DST_SEL_W; |
5772 |
++ let Word1{21} = USE_CONST_FIELDS; |
5773 |
++ let Word1{27-22} = DATA_FORMAT; |
5774 |
++ let Word1{29-28} = NUM_FORMAT_ALL; |
5775 |
++ let Word1{30} = FORMAT_COMP_ALL; |
5776 |
++ let Word1{31} = SRF_MODE_ALL; |
5777 |
++} |
5778 |
++ |
5779 |
+/* |
5780 |
+XXX: R600 subtarget uses a slightly different encoding than the other |
5781 |
+subtargets. We currently handle this in R600MCCodeEmitter, but we may |
5782 |
@@ -15615,11 +16483,11 @@ index 0000000..64bab18 |
5783 |
+ InstR600 <0, |
5784 |
+ (outs R600_Reg32:$dst), |
5785 |
+ (ins WRITE:$write, OMOD:$omod, REL:$dst_rel, CLAMP:$clamp, |
5786 |
-+ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs, |
5787 |
++ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs, SEL:$src0_sel, |
5788 |
+ LAST:$last, R600_Pred:$pred_sel, LITERAL:$literal), |
5789 |
+ !strconcat(opName, |
5790 |
+ "$clamp $dst$write$dst_rel$omod, " |
5791 |
-+ "$src0_neg$src0_abs$src0$src0_abs$src0_rel, " |
5792 |
++ "$src0_neg$src0_abs$src0$src0_sel$src0_abs$src0_rel, " |
5793 |
+ "$literal $pred_sel$last"), |
5794 |
+ pattern, |
5795 |
+ itin>, |
5796 |
@@ -15655,13 +16523,13 @@ index 0000000..64bab18 |
5797 |
+ (outs R600_Reg32:$dst), |
5798 |
+ (ins UEM:$update_exec_mask, UP:$update_pred, WRITE:$write, |
5799 |
+ OMOD:$omod, REL:$dst_rel, CLAMP:$clamp, |
5800 |
-+ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs, |
5801 |
-+ R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, ABS:$src1_abs, |
5802 |
++ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs, SEL:$src0_sel, |
5803 |
++ R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, ABS:$src1_abs, SEL:$src1_sel, |
5804 |
+ LAST:$last, R600_Pred:$pred_sel, LITERAL:$literal), |
5805 |
+ !strconcat(opName, |
5806 |
+ "$clamp $update_exec_mask$update_pred$dst$write$dst_rel$omod, " |
5807 |
-+ "$src0_neg$src0_abs$src0$src0_abs$src0_rel, " |
5808 |
-+ "$src1_neg$src1_abs$src1$src1_abs$src1_rel, " |
5809 |
++ "$src0_neg$src0_abs$src0$src0_sel$src0_abs$src0_rel, " |
5810 |
++ "$src1_neg$src1_abs$src1$src1_sel$src1_abs$src1_rel, " |
5811 |
+ "$literal $pred_sel$last"), |
5812 |
+ pattern, |
5813 |
+ itin>, |
5814 |
@@ -15692,14 +16560,14 @@ index 0000000..64bab18 |
5815 |
+ InstR600 <0, |
5816 |
+ (outs R600_Reg32:$dst), |
5817 |
+ (ins REL:$dst_rel, CLAMP:$clamp, |
5818 |
-+ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, |
5819 |
-+ R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, |
5820 |
-+ R600_Reg32:$src2, NEG:$src2_neg, REL:$src2_rel, |
5821 |
++ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, SEL:$src0_sel, |
5822 |
++ R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, SEL:$src1_sel, |
5823 |
++ R600_Reg32:$src2, NEG:$src2_neg, REL:$src2_rel, SEL:$src2_sel, |
5824 |
+ LAST:$last, R600_Pred:$pred_sel, LITERAL:$literal), |
5825 |
+ !strconcat(opName, "$clamp $dst$dst_rel, " |
5826 |
-+ "$src0_neg$src0$src0_rel, " |
5827 |
-+ "$src1_neg$src1$src1_rel, " |
5828 |
-+ "$src2_neg$src2$src2_rel, " |
5829 |
++ "$src0_neg$src0$src0_sel$src0_rel, " |
5830 |
++ "$src1_neg$src1$src1_sel$src1_rel, " |
5831 |
++ "$src2_neg$src2$src2_sel$src2_rel, " |
5832 |
+ "$literal $pred_sel$last"), |
5833 |
+ pattern, |
5834 |
+ itin>, |
5835 |
@@ -15743,6 +16611,27 @@ index 0000000..64bab18 |
5836 |
+ }] |
5837 |
+>; |
5838 |
+ |
5839 |
++def TEX_RECT : PatLeaf< |
5840 |
++ (imm), |
5841 |
++ [{uint32_t TType = (uint32_t)N->getZExtValue(); |
5842 |
++ return TType == 5; |
5843 |
++ }] |
5844 |
++>; |
5845 |
++ |
5846 |
++def TEX_ARRAY : PatLeaf< |
5847 |
++ (imm), |
5848 |
++ [{uint32_t TType = (uint32_t)N->getZExtValue(); |
5849 |
++ return TType == 9 || TType == 10 || TType == 15 || TType == 16; |
5850 |
++ }] |
5851 |
++>; |
5852 |
++ |
5853 |
++def TEX_SHADOW_ARRAY : PatLeaf< |
5854 |
++ (imm), |
5855 |
++ [{uint32_t TType = (uint32_t)N->getZExtValue(); |
5856 |
++ return TType == 11 || TType == 12 || TType == 17; |
5857 |
++ }] |
5858 |
++>; |
5859 |
++ |
5860 |
+class EG_CF_RAT <bits <8> cf_inst, bits <6> rat_inst, bits<4> rat_id, dag outs, |
5861 |
+ dag ins, string asm, list<dag> pattern> : |
5862 |
+ InstR600ISA <outs, ins, asm, pattern> { |
5863 |
@@ -15815,32 +16704,35 @@ index 0000000..64bab18 |
5864 |
+ "Subtarget.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX">; |
5865 |
+ |
5866 |
+//===----------------------------------------------------------------------===// |
5867 |
-+// Interpolation Instructions |
5868 |
++// R600 SDNodes |
5869 |
+//===----------------------------------------------------------------------===// |
5870 |
+ |
5871 |
-+def INTERP: SDNode<"AMDGPUISD::INTERP", |
5872 |
-+ SDTypeProfile<1, 2, [SDTCisFP<0>, SDTCisInt<1>, SDTCisInt<2>]> |
5873 |
-+ >; |
5874 |
-+ |
5875 |
-+def INTERP_P0: SDNode<"AMDGPUISD::INTERP_P0", |
5876 |
-+ SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisInt<1>]> |
5877 |
-+ >; |
5878 |
++def INTERP_PAIR_XY : AMDGPUShaderInst < |
5879 |
++ (outs R600_TReg32_X:$dst0, R600_TReg32_Y:$dst1), |
5880 |
++ (ins i32imm:$src0, R600_Reg32:$src1, R600_Reg32:$src2), |
5881 |
++ "INTERP_PAIR_XY $src0 $src1 $src2 : $dst0 dst1", |
5882 |
++ []>; |
5883 |
++ |
5884 |
++def INTERP_PAIR_ZW : AMDGPUShaderInst < |
5885 |
++ (outs R600_TReg32_Z:$dst0, R600_TReg32_W:$dst1), |
5886 |
++ (ins i32imm:$src0, R600_Reg32:$src1, R600_Reg32:$src2), |
5887 |
++ "INTERP_PAIR_ZW $src0 $src1 $src2 : $dst0 dst1", |
5888 |
++ []>; |
5889 |
++ |
5890 |
++def CONST_ADDRESS: SDNode<"AMDGPUISD::CONST_ADDRESS", |
5891 |
++ SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisPtrTy<1>]>, |
5892 |
++ [SDNPMayLoad] |
5893 |
++>; |
5894 |
+ |
5895 |
-+let usesCustomInserter = 1 in { |
5896 |
-+def input_perspective : AMDGPUShaderInst < |
5897 |
-+ (outs R600_Reg128:$dst), |
5898 |
-+ (ins i32imm:$src0, i32imm:$src1), |
5899 |
-+ "input_perspective $src0 $src1 : dst", |
5900 |
-+ [(set R600_Reg128:$dst, (INTERP (i32 imm:$src0), (i32 imm:$src1)))]>; |
5901 |
-+} // End usesCustomInserter = 1 |
5902 |
++//===----------------------------------------------------------------------===// |
5903 |
++// Interpolation Instructions |
5904 |
++//===----------------------------------------------------------------------===// |
5905 |
+ |
5906 |
-+def input_constant : AMDGPUShaderInst < |
5907 |
++def INTERP_VEC_LOAD : AMDGPUShaderInst < |
5908 |
+ (outs R600_Reg128:$dst), |
5909 |
-+ (ins i32imm:$src), |
5910 |
-+ "input_perspective $src : dst", |
5911 |
-+ [(set R600_Reg128:$dst, (INTERP_P0 (i32 imm:$src)))]>; |
5912 |
-+ |
5913 |
-+ |
5914 |
++ (ins i32imm:$src0), |
5915 |
++ "INTERP_LOAD $src0 : $dst", |
5916 |
++ []>; |
5917 |
+ |
5918 |
+def INTERP_XY : R600_2OP <0xD6, "INTERP_XY", []> { |
5919 |
+ let bank_swizzle = 5; |
5920 |
@@ -15908,19 +16800,24 @@ index 0000000..64bab18 |
5921 |
+multiclass ExportPattern<Instruction ExportInst, bits<8> cf_inst> { |
5922 |
+ def : Pat<(int_R600_store_pixel_depth R600_Reg32:$reg), |
5923 |
+ (ExportInst |
5924 |
-+ (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sel_x), |
5925 |
++ (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sub0), |
5926 |
+ 0, 61, 0, 7, 7, 7, cf_inst, 0) |
5927 |
+ >; |
5928 |
+ |
5929 |
+ def : Pat<(int_R600_store_pixel_stencil R600_Reg32:$reg), |
5930 |
+ (ExportInst |
5931 |
-+ (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sel_x), |
5932 |
++ (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sub0), |
5933 |
+ 0, 61, 7, 0, 7, 7, cf_inst, 0) |
5934 |
+ >; |
5935 |
+ |
5936 |
-+ def : Pat<(int_R600_store_pixel_dummy), |
5937 |
++ def : Pat<(int_R600_store_dummy (i32 imm:$type)), |
5938 |
++ (ExportInst |
5939 |
++ (v4f32 (IMPLICIT_DEF)), imm:$type, 0, 7, 7, 7, 7, cf_inst, 0) |
5940 |
++ >; |
5941 |
++ |
5942 |
++ def : Pat<(int_R600_store_dummy 1), |
5943 |
+ (ExportInst |
5944 |
-+ (v4f32 (IMPLICIT_DEF)), 0, 0, 7, 7, 7, 7, cf_inst, 0) |
5945 |
++ (v4f32 (IMPLICIT_DEF)), 1, 60, 7, 7, 7, 7, cf_inst, 0) |
5946 |
+ >; |
5947 |
+ |
5948 |
+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 0), |
5949 |
@@ -15928,29 +16825,40 @@ index 0000000..64bab18 |
5950 |
+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase, |
5951 |
+ 0, 1, 2, 3, cf_inst, 0) |
5952 |
+ >; |
5953 |
++ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 1), |
5954 |
++ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm)), |
5955 |
++ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase, |
5956 |
++ 0, 1, 2, 3, cf_inst, 0) |
5957 |
++ >; |
5958 |
++ |
5959 |
++ def : Pat<(int_R600_store_swizzle (v4f32 R600_Reg128:$src), imm:$arraybase, |
5960 |
++ imm:$type), |
5961 |
++ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase, |
5962 |
++ 0, 1, 2, 3, cf_inst, 0) |
5963 |
++ >; |
5964 |
+} |
5965 |
+ |
5966 |
+multiclass SteamOutputExportPattern<Instruction ExportInst, |
5967 |
+ bits<8> buf0inst, bits<8> buf1inst, bits<8> buf2inst, bits<8> buf3inst> { |
5968 |
+// Stream0 |
5969 |
-+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 1), |
5970 |
-+ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)), |
5971 |
-+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase, |
5972 |
++ def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src), |
5973 |
++ (i32 imm:$arraybase), (i32 0), (i32 imm:$mask)), |
5974 |
++ (ExportInst R600_Reg128:$src, 0, imm:$arraybase, |
5975 |
+ 4095, imm:$mask, buf0inst, 0)>; |
5976 |
+// Stream1 |
5977 |
-+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 2), |
5978 |
-+ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)), |
5979 |
-+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase, |
5980 |
++ def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src), |
5981 |
++ (i32 imm:$arraybase), (i32 1), (i32 imm:$mask)), |
5982 |
++ (ExportInst R600_Reg128:$src, 0, imm:$arraybase, |
5983 |
+ 4095, imm:$mask, buf1inst, 0)>; |
5984 |
+// Stream2 |
5985 |
-+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 3), |
5986 |
-+ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)), |
5987 |
-+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase, |
5988 |
++ def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src), |
5989 |
++ (i32 imm:$arraybase), (i32 2), (i32 imm:$mask)), |
5990 |
++ (ExportInst R600_Reg128:$src, 0, imm:$arraybase, |
5991 |
+ 4095, imm:$mask, buf2inst, 0)>; |
5992 |
+// Stream3 |
5993 |
-+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 4), |
5994 |
-+ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)), |
5995 |
-+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase, |
5996 |
++ def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src), |
5997 |
++ (i32 imm:$arraybase), (i32 3), (i32 imm:$mask)), |
5998 |
++ (ExportInst R600_Reg128:$src, 0, imm:$arraybase, |
5999 |
+ 4095, imm:$mask, buf3inst, 0)>; |
6000 |
+} |
6001 |
+ |
6002 |
@@ -16025,6 +16933,34 @@ index 0000000..64bab18 |
6003 |
+ COND_NE))] |
6004 |
+>; |
6005 |
+ |
6006 |
++def SETE_DX10 : R600_2OP < |
6007 |
++ 0xC, "SETE_DX10", |
6008 |
++ [(set R600_Reg32:$dst, |
6009 |
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0), |
6010 |
++ COND_EQ))] |
6011 |
++>; |
6012 |
++ |
6013 |
++def SETGT_DX10 : R600_2OP < |
6014 |
++ 0xD, "SETGT_DX10", |
6015 |
++ [(set R600_Reg32:$dst, |
6016 |
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0), |
6017 |
++ COND_GT))] |
6018 |
++>; |
6019 |
++ |
6020 |
++def SETGE_DX10 : R600_2OP < |
6021 |
++ 0xE, "SETGE_DX10", |
6022 |
++ [(set R600_Reg32:$dst, |
6023 |
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0), |
6024 |
++ COND_GE))] |
6025 |
++>; |
6026 |
++ |
6027 |
++def SETNE_DX10 : R600_2OP < |
6028 |
++ 0xF, "SETNE_DX10", |
6029 |
++ [(set R600_Reg32:$dst, |
6030 |
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0), |
6031 |
++ COND_NE))] |
6032 |
++>; |
6033 |
++ |
6034 |
+def FRACT : R600_1OP_Helper <0x10, "FRACT", AMDGPUfract>; |
6035 |
+def TRUNC : R600_1OP_Helper <0x11, "TRUNC", int_AMDGPU_trunc>; |
6036 |
+def CEIL : R600_1OP_Helper <0x12, "CEIL", fceil>; |
6037 |
@@ -16085,7 +17021,7 @@ index 0000000..64bab18 |
6038 |
+>; |
6039 |
+ |
6040 |
+def SETGT_INT : R600_2OP < |
6041 |
-+ 0x3B, "SGT_INT", |
6042 |
++ 0x3B, "SETGT_INT", |
6043 |
+ [(set (i32 R600_Reg32:$dst), |
6044 |
+ (selectcc (i32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETGT))] |
6045 |
+>; |
6046 |
@@ -16539,6 +17475,10 @@ index 0000000..64bab18 |
6047 |
+ defm DOT4_eg : DOT4_Common<0xBE>; |
6048 |
+ defm CUBE_eg : CUBE_Common<0xC0>; |
6049 |
+ |
6050 |
++let hasSideEffects = 1 in { |
6051 |
++ def MOVA_INT_eg : R600_1OP <0xCC, "MOVA_INT", []>; |
6052 |
++} |
6053 |
++ |
6054 |
+ def TGSI_LIT_Z_eg : TGSI_LIT_Z_Common<MUL_LIT_eg, LOG_CLAMPED_eg, EXP_IEEE_eg>; |
6055 |
+ |
6056 |
+ def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> { |
6057 |
@@ -16629,37 +17569,30 @@ index 0000000..64bab18 |
6058 |
+>; |
6059 |
+ |
6060 |
+class VTX_READ_eg <string name, bits<8> buffer_id, dag outs, list<dag> pattern> |
6061 |
-+ : InstR600ISA <outs, (ins MEMxi:$ptr), name#" $dst, $ptr", pattern> { |
6062 |
-+ |
6063 |
-+ // Operands |
6064 |
-+ bits<7> DST_GPR; |
6065 |
-+ bits<7> SRC_GPR; |
6066 |
++ : InstR600ISA <outs, (ins MEMxi:$ptr), name#" $dst, $ptr", pattern>, |
6067 |
++ VTX_WORD1_GPR, VTX_WORD0 { |
6068 |
+ |
6069 |
+ // Static fields |
6070 |
-+ bits<5> VC_INST = 0; |
6071 |
-+ bits<2> FETCH_TYPE = 2; |
6072 |
-+ bits<1> FETCH_WHOLE_QUAD = 0; |
6073 |
-+ bits<8> BUFFER_ID = buffer_id; |
6074 |
-+ bits<1> SRC_REL = 0; |
6075 |
++ let VC_INST = 0; |
6076 |
++ let FETCH_TYPE = 2; |
6077 |
++ let FETCH_WHOLE_QUAD = 0; |
6078 |
++ let BUFFER_ID = buffer_id; |
6079 |
++ let SRC_REL = 0; |
6080 |
+ // XXX: We can infer this field based on the SRC_GPR. This would allow us |
6081 |
+ // to store vertex addresses in any channel, not just X. |
6082 |
-+ bits<2> SRC_SEL_X = 0; |
6083 |
-+ bits<6> MEGA_FETCH_COUNT; |
6084 |
-+ bits<1> DST_REL = 0; |
6085 |
-+ bits<3> DST_SEL_X; |
6086 |
-+ bits<3> DST_SEL_Y; |
6087 |
-+ bits<3> DST_SEL_Z; |
6088 |
-+ bits<3> DST_SEL_W; |
6089 |
++ let SRC_SEL_X = 0; |
6090 |
++ let DST_REL = 0; |
6091 |
+ // The docs say that if this bit is set, then DATA_FORMAT, NUM_FORMAT_ALL, |
6092 |
+ // FORMAT_COMP_ALL, SRF_MODE_ALL, and ENDIAN_SWAP fields will be ignored, |
6093 |
+ // however, based on my testing if USE_CONST_FIELDS is set, then all |
6094 |
+ // these fields need to be set to 0. |
6095 |
-+ bits<1> USE_CONST_FIELDS = 0; |
6096 |
-+ bits<6> DATA_FORMAT; |
6097 |
-+ bits<2> NUM_FORMAT_ALL = 1; |
6098 |
-+ bits<1> FORMAT_COMP_ALL = 0; |
6099 |
-+ bits<1> SRF_MODE_ALL = 0; |
6100 |
++ let USE_CONST_FIELDS = 0; |
6101 |
++ let NUM_FORMAT_ALL = 1; |
6102 |
++ let FORMAT_COMP_ALL = 0; |
6103 |
++ let SRF_MODE_ALL = 0; |
6104 |
+ |
6105 |
++ let Inst{31-0} = Word0; |
6106 |
++ let Inst{63-32} = Word1; |
6107 |
+ // LLVM can only encode 64-bit instructions, so these fields are manually |
6108 |
+ // encoded in R600CodeEmitter |
6109 |
+ // |
6110 |
@@ -16670,29 +17603,7 @@ index 0000000..64bab18 |
6111 |
+ // bits<1> ALT_CONST = 0; |
6112 |
+ // bits<2> BUFFER_INDEX_MODE = 0; |
6113 |
+ |
6114 |
-+ // VTX_WORD0 |
6115 |
-+ let Inst{4-0} = VC_INST; |
6116 |
-+ let Inst{6-5} = FETCH_TYPE; |
6117 |
-+ let Inst{7} = FETCH_WHOLE_QUAD; |
6118 |
-+ let Inst{15-8} = BUFFER_ID; |
6119 |
-+ let Inst{22-16} = SRC_GPR; |
6120 |
-+ let Inst{23} = SRC_REL; |
6121 |
-+ let Inst{25-24} = SRC_SEL_X; |
6122 |
-+ let Inst{31-26} = MEGA_FETCH_COUNT; |
6123 |
-+ |
6124 |
-+ // VTX_WORD1_GPR |
6125 |
-+ let Inst{38-32} = DST_GPR; |
6126 |
-+ let Inst{39} = DST_REL; |
6127 |
-+ let Inst{40} = 0; // Reserved |
6128 |
-+ let Inst{43-41} = DST_SEL_X; |
6129 |
-+ let Inst{46-44} = DST_SEL_Y; |
6130 |
-+ let Inst{49-47} = DST_SEL_Z; |
6131 |
-+ let Inst{52-50} = DST_SEL_W; |
6132 |
-+ let Inst{53} = USE_CONST_FIELDS; |
6133 |
-+ let Inst{59-54} = DATA_FORMAT; |
6134 |
-+ let Inst{61-60} = NUM_FORMAT_ALL; |
6135 |
-+ let Inst{62} = FORMAT_COMP_ALL; |
6136 |
-+ let Inst{63} = SRF_MODE_ALL; |
6137 |
++ |
6138 |
+ |
6139 |
+ // VTX_WORD2 (LLVM can only encode 64-bit instructions, so WORD2 encoding |
6140 |
+ // is done in R600CodeEmitter |
6141 |
@@ -16788,6 +17699,10 @@ index 0000000..64bab18 |
6142 |
+ [(set (i32 R600_TReg32_X:$dst), (load_param ADDRVTX_READ:$ptr))] |
6143 |
+>; |
6144 |
+ |
6145 |
++def VTX_READ_PARAM_128_eg : VTX_READ_128_eg <0, |
6146 |
++ [(set (v4i32 R600_Reg128:$dst), (load_param ADDRVTX_READ:$ptr))] |
6147 |
++>; |
6148 |
++ |
6149 |
+//===----------------------------------------------------------------------===// |
6150 |
+// VTX Read from global memory space |
6151 |
+//===----------------------------------------------------------------------===// |
6152 |
@@ -16818,6 +17733,12 @@ index 0000000..64bab18 |
6153 |
+ |
6154 |
+} |
6155 |
+ |
6156 |
++//===----------------------------------------------------------------------===// |
6157 |
++// Regist loads and stores - for indirect addressing |
6158 |
++//===----------------------------------------------------------------------===// |
6159 |
++ |
6160 |
++defm R600_ : RegisterLoadStore <R600_Reg32, FRAMEri, ADDRIndirect>; |
6161 |
++ |
6162 |
+let Predicates = [isCayman] in { |
6163 |
+ |
6164 |
+let isVector = 1 in { |
6165 |
@@ -16877,6 +17798,7 @@ index 0000000..64bab18 |
6166 |
+ (ins R600_Reg32:$src0, i32imm:$src1, i32imm:$flags), |
6167 |
+ "", [], NullALU> { |
6168 |
+ let FlagOperandIdx = 3; |
6169 |
++ let isTerminator = 1; |
6170 |
+} |
6171 |
+ |
6172 |
+let isTerminator = 1, isBranch = 1, isBarrier = 1 in { |
6173 |
@@ -16903,19 +17825,6 @@ index 0000000..64bab18 |
6174 |
+ |
6175 |
+} // End mayLoad = 0, mayStore = 0, hasSideEffects = 1 |
6176 |
+ |
6177 |
-+def R600_LOAD_CONST : AMDGPUShaderInst < |
6178 |
-+ (outs R600_Reg32:$dst), |
6179 |
-+ (ins i32imm:$src0), |
6180 |
-+ "R600_LOAD_CONST $dst, $src0", |
6181 |
-+ [(set R600_Reg32:$dst, (int_AMDGPU_load_const imm:$src0))] |
6182 |
-+>; |
6183 |
-+ |
6184 |
-+def RESERVE_REG : AMDGPUShaderInst < |
6185 |
-+ (outs), |
6186 |
-+ (ins i32imm:$src), |
6187 |
-+ "RESERVE_REG $src", |
6188 |
-+ [(int_AMDGPU_reserve_reg imm:$src)] |
6189 |
-+>; |
6190 |
+ |
6191 |
+def TXD: AMDGPUShaderInst < |
6192 |
+ (outs R600_Reg128:$dst), |
6193 |
@@ -16946,22 +17855,148 @@ index 0000000..64bab18 |
6194 |
+ "RETURN", [(IL_retflag)]>; |
6195 |
+} |
6196 |
+ |
6197 |
-+//===--------------------------------------------------------------------===// |
6198 |
-+// Instructions support |
6199 |
-+//===--------------------------------------------------------------------===// |
6200 |
-+//===---------------------------------------------------------------------===// |
6201 |
-+// Custom Inserter for Branches and returns, this eventually will be a |
6202 |
-+// seperate pass |
6203 |
-+//===---------------------------------------------------------------------===// |
6204 |
-+let isTerminator = 1, usesCustomInserter = 1, isBranch = 1, isBarrier = 1 in { |
6205 |
-+ def BRANCH : ILFormat<(outs), (ins brtarget:$target), |
6206 |
-+ "; Pseudo unconditional branch instruction", |
6207 |
-+ [(br bb:$target)]>; |
6208 |
-+ defm BRANCH_COND : BranchConditional<IL_brcond>; |
6209 |
-+} |
6210 |
+ |
6211 |
-+//===---------------------------------------------------------------------===// |
6212 |
-+// Flow and Program control Instructions |
6213 |
++//===----------------------------------------------------------------------===// |
6214 |
++// Constant Buffer Addressing Support |
6215 |
++//===----------------------------------------------------------------------===// |
6216 |
++ |
6217 |
++let isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU" in { |
6218 |
++def CONST_COPY : Instruction { |
6219 |
++ let OutOperandList = (outs R600_Reg32:$dst); |
6220 |
++ let InOperandList = (ins i32imm:$src); |
6221 |
++ let Pattern = [(set R600_Reg32:$dst, (CONST_ADDRESS ADDRGA_CONST_OFFSET:$src))]; |
6222 |
++ let AsmString = "CONST_COPY"; |
6223 |
++ let neverHasSideEffects = 1; |
6224 |
++ let isAsCheapAsAMove = 1; |
6225 |
++ let Itinerary = NullALU; |
6226 |
++} |
6227 |
++} // end isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU" |
6228 |
++ |
6229 |
++def TEX_VTX_CONSTBUF : |
6230 |
++ InstR600ISA <(outs R600_Reg128:$dst), (ins MEMxi:$ptr), "VTX_READ_eg $dst, $ptr", |
6231 |
++ [(set R600_Reg128:$dst, (CONST_ADDRESS ADDRGA_VAR_OFFSET:$ptr))]>, |
6232 |
++ VTX_WORD1_GPR, VTX_WORD0 { |
6233 |
++ |
6234 |
++ let VC_INST = 0; |
6235 |
++ let FETCH_TYPE = 2; |
6236 |
++ let FETCH_WHOLE_QUAD = 0; |
6237 |
++ let BUFFER_ID = 0; |
6238 |
++ let SRC_REL = 0; |
6239 |
++ let SRC_SEL_X = 0; |
6240 |
++ let DST_REL = 0; |
6241 |
++ let USE_CONST_FIELDS = 0; |
6242 |
++ let NUM_FORMAT_ALL = 2; |
6243 |
++ let FORMAT_COMP_ALL = 1; |
6244 |
++ let SRF_MODE_ALL = 1; |
6245 |
++ let MEGA_FETCH_COUNT = 16; |
6246 |
++ let DST_SEL_X = 0; |
6247 |
++ let DST_SEL_Y = 1; |
6248 |
++ let DST_SEL_Z = 2; |
6249 |
++ let DST_SEL_W = 3; |
6250 |
++ let DATA_FORMAT = 35; |
6251 |
++ |
6252 |
++ let Inst{31-0} = Word0; |
6253 |
++ let Inst{63-32} = Word1; |
6254 |
++ |
6255 |
++// LLVM can only encode 64-bit instructions, so these fields are manually |
6256 |
++// encoded in R600CodeEmitter |
6257 |
++// |
6258 |
++// bits<16> OFFSET; |
6259 |
++// bits<2> ENDIAN_SWAP = 0; |
6260 |
++// bits<1> CONST_BUF_NO_STRIDE = 0; |
6261 |
++// bits<1> MEGA_FETCH = 0; |
6262 |
++// bits<1> ALT_CONST = 0; |
6263 |
++// bits<2> BUFFER_INDEX_MODE = 0; |
6264 |
++ |
6265 |
++ |
6266 |
++ |
6267 |
++// VTX_WORD2 (LLVM can only encode 64-bit instructions, so WORD2 encoding |
6268 |
++// is done in R600CodeEmitter |
6269 |
++// |
6270 |
++// Inst{79-64} = OFFSET; |
6271 |
++// Inst{81-80} = ENDIAN_SWAP; |
6272 |
++// Inst{82} = CONST_BUF_NO_STRIDE; |
6273 |
++// Inst{83} = MEGA_FETCH; |
6274 |
++// Inst{84} = ALT_CONST; |
6275 |
++// Inst{86-85} = BUFFER_INDEX_MODE; |
6276 |
++// Inst{95-86} = 0; Reserved |
6277 |
++ |
6278 |
++// VTX_WORD3 (Padding) |
6279 |
++// |
6280 |
++// Inst{127-96} = 0; |
6281 |
++} |
6282 |
++ |
6283 |
++def TEX_VTX_TEXBUF: |
6284 |
++ InstR600ISA <(outs R600_Reg128:$dst), (ins MEMxi:$ptr, i32imm:$BUFFER_ID), "TEX_VTX_EXPLICIT_READ $dst, $ptr", |
6285 |
++ [(set R600_Reg128:$dst, (int_R600_load_texbuf ADDRGA_VAR_OFFSET:$ptr, imm:$BUFFER_ID))]>, |
6286 |
++VTX_WORD1_GPR, VTX_WORD0 { |
6287 |
++ |
6288 |
++let VC_INST = 0; |
6289 |
++let FETCH_TYPE = 2; |
6290 |
++let FETCH_WHOLE_QUAD = 0; |
6291 |
++let SRC_REL = 0; |
6292 |
++let SRC_SEL_X = 0; |
6293 |
++let DST_REL = 0; |
6294 |
++let USE_CONST_FIELDS = 1; |
6295 |
++let NUM_FORMAT_ALL = 0; |
6296 |
++let FORMAT_COMP_ALL = 0; |
6297 |
++let SRF_MODE_ALL = 1; |
6298 |
++let MEGA_FETCH_COUNT = 16; |
6299 |
++let DST_SEL_X = 0; |
6300 |
++let DST_SEL_Y = 1; |
6301 |
++let DST_SEL_Z = 2; |
6302 |
++let DST_SEL_W = 3; |
6303 |
++let DATA_FORMAT = 0; |
6304 |
++ |
6305 |
++let Inst{31-0} = Word0; |
6306 |
++let Inst{63-32} = Word1; |
6307 |
++ |
6308 |
++// LLVM can only encode 64-bit instructions, so these fields are manually |
6309 |
++// encoded in R600CodeEmitter |
6310 |
++// |
6311 |
++// bits<16> OFFSET; |
6312 |
++// bits<2> ENDIAN_SWAP = 0; |
6313 |
++// bits<1> CONST_BUF_NO_STRIDE = 0; |
6314 |
++// bits<1> MEGA_FETCH = 0; |
6315 |
++// bits<1> ALT_CONST = 0; |
6316 |
++// bits<2> BUFFER_INDEX_MODE = 0; |
6317 |
++ |
6318 |
++ |
6319 |
++ |
6320 |
++// VTX_WORD2 (LLVM can only encode 64-bit instructions, so WORD2 encoding |
6321 |
++// is done in R600CodeEmitter |
6322 |
++// |
6323 |
++// Inst{79-64} = OFFSET; |
6324 |
++// Inst{81-80} = ENDIAN_SWAP; |
6325 |
++// Inst{82} = CONST_BUF_NO_STRIDE; |
6326 |
++// Inst{83} = MEGA_FETCH; |
6327 |
++// Inst{84} = ALT_CONST; |
6328 |
++// Inst{86-85} = BUFFER_INDEX_MODE; |
6329 |
++// Inst{95-86} = 0; Reserved |
6330 |
++ |
6331 |
++// VTX_WORD3 (Padding) |
6332 |
++// |
6333 |
++// Inst{127-96} = 0; |
6334 |
++} |
6335 |
++ |
6336 |
++ |
6337 |
++ |
6338 |
++//===--------------------------------------------------------------------===// |
6339 |
++// Instructions support |
6340 |
++//===--------------------------------------------------------------------===// |
6341 |
++//===---------------------------------------------------------------------===// |
6342 |
++// Custom Inserter for Branches and returns, this eventually will be a |
6343 |
++// seperate pass |
6344 |
++//===---------------------------------------------------------------------===// |
6345 |
++let isTerminator = 1, usesCustomInserter = 1, isBranch = 1, isBarrier = 1 in { |
6346 |
++ def BRANCH : ILFormat<(outs), (ins brtarget:$target), |
6347 |
++ "; Pseudo unconditional branch instruction", |
6348 |
++ [(br bb:$target)]>; |
6349 |
++ defm BRANCH_COND : BranchConditional<IL_brcond>; |
6350 |
++} |
6351 |
++ |
6352 |
++//===---------------------------------------------------------------------===// |
6353 |
++// Flow and Program control Instructions |
6354 |
+//===---------------------------------------------------------------------===// |
6355 |
+let isTerminator=1 in { |
6356 |
+ def SWITCH : ILFormat< (outs), (ins GPRI32:$src), |
6357 |
@@ -17045,6 +18080,18 @@ index 0000000..64bab18 |
6358 |
+ (SGE R600_Reg32:$src1, R600_Reg32:$src0) |
6359 |
+>; |
6360 |
+ |
6361 |
++// SETGT_DX10 reverse args |
6362 |
++def : Pat < |
6363 |
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, COND_LT), |
6364 |
++ (SETGT_DX10 R600_Reg32:$src1, R600_Reg32:$src0) |
6365 |
++>; |
6366 |
++ |
6367 |
++// SETGE_DX10 reverse args |
6368 |
++def : Pat < |
6369 |
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, COND_LE), |
6370 |
++ (SETGE_DX10 R600_Reg32:$src1, R600_Reg32:$src0) |
6371 |
++>; |
6372 |
++ |
6373 |
+// SETGT_INT reverse args |
6374 |
+def : Pat < |
6375 |
+ (selectcc (i32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETLT), |
6376 |
@@ -17083,31 +18130,43 @@ index 0000000..64bab18 |
6377 |
+ (SETE R600_Reg32:$src0, R600_Reg32:$src1) |
6378 |
+>; |
6379 |
+ |
6380 |
++//SETE_DX10 - 'true if ordered' |
6381 |
++def : Pat < |
6382 |
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETO), |
6383 |
++ (SETE_DX10 R600_Reg32:$src0, R600_Reg32:$src1) |
6384 |
++>; |
6385 |
++ |
6386 |
+//SNE - 'true if unordered' |
6387 |
+def : Pat < |
6388 |
+ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, FP_ONE, FP_ZERO, SETUO), |
6389 |
+ (SNE R600_Reg32:$src0, R600_Reg32:$src1) |
6390 |
+>; |
6391 |
+ |
6392 |
-+def : Extract_Element <f32, v4f32, R600_Reg128, 0, sel_x>; |
6393 |
-+def : Extract_Element <f32, v4f32, R600_Reg128, 1, sel_y>; |
6394 |
-+def : Extract_Element <f32, v4f32, R600_Reg128, 2, sel_z>; |
6395 |
-+def : Extract_Element <f32, v4f32, R600_Reg128, 3, sel_w>; |
6396 |
++//SETNE_DX10 - 'true if ordered' |
6397 |
++def : Pat < |
6398 |
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETUO), |
6399 |
++ (SETNE_DX10 R600_Reg32:$src0, R600_Reg32:$src1) |
6400 |
++>; |
6401 |
+ |
6402 |
-+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 0, sel_x>; |
6403 |
-+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 1, sel_y>; |
6404 |
-+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 2, sel_z>; |
6405 |
-+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 3, sel_w>; |
6406 |
++def : Extract_Element <f32, v4f32, R600_Reg128, 0, sub0>; |
6407 |
++def : Extract_Element <f32, v4f32, R600_Reg128, 1, sub1>; |
6408 |
++def : Extract_Element <f32, v4f32, R600_Reg128, 2, sub2>; |
6409 |
++def : Extract_Element <f32, v4f32, R600_Reg128, 3, sub3>; |
6410 |
+ |
6411 |
-+def : Extract_Element <i32, v4i32, R600_Reg128, 0, sel_x>; |
6412 |
-+def : Extract_Element <i32, v4i32, R600_Reg128, 1, sel_y>; |
6413 |
-+def : Extract_Element <i32, v4i32, R600_Reg128, 2, sel_z>; |
6414 |
-+def : Extract_Element <i32, v4i32, R600_Reg128, 3, sel_w>; |
6415 |
++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 0, sub0>; |
6416 |
++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 1, sub1>; |
6417 |
++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 2, sub2>; |
6418 |
++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 3, sub3>; |
6419 |
+ |
6420 |
-+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 0, sel_x>; |
6421 |
-+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 1, sel_y>; |
6422 |
-+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 2, sel_z>; |
6423 |
-+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 3, sel_w>; |
6424 |
++def : Extract_Element <i32, v4i32, R600_Reg128, 0, sub0>; |
6425 |
++def : Extract_Element <i32, v4i32, R600_Reg128, 1, sub1>; |
6426 |
++def : Extract_Element <i32, v4i32, R600_Reg128, 2, sub2>; |
6427 |
++def : Extract_Element <i32, v4i32, R600_Reg128, 3, sub3>; |
6428 |
++ |
6429 |
++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 0, sub0>; |
6430 |
++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 1, sub1>; |
6431 |
++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 2, sub2>; |
6432 |
++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 3, sub3>; |
6433 |
+ |
6434 |
+def : Vector_Build <v4f32, R600_Reg128, f32, R600_Reg32>; |
6435 |
+def : Vector_Build <v4i32, R600_Reg128, i32, R600_Reg32>; |
6436 |
@@ -17125,10 +18184,10 @@ index 0000000..64bab18 |
6437 |
+} // End isR600toCayman Predicate |
6438 |
diff --git a/lib/Target/R600/R600Intrinsics.td b/lib/Target/R600/R600Intrinsics.td |
6439 |
new file mode 100644 |
6440 |
-index 0000000..3825bc4 |
6441 |
+index 0000000..6046f0d |
6442 |
--- /dev/null |
6443 |
+++ b/lib/Target/R600/R600Intrinsics.td |
6444 |
-@@ -0,0 +1,32 @@ |
6445 |
+@@ -0,0 +1,57 @@ |
6446 |
+//===-- R600Intrinsics.td - R600 Instrinsic defs -------*- tablegen -*-----===// |
6447 |
+// |
6448 |
+// The LLVM Compiler Infrastructure |
6449 |
@@ -17143,30 +18202,283 @@ index 0000000..3825bc4 |
6450 |
+//===----------------------------------------------------------------------===// |
6451 |
+ |
6452 |
+let TargetPrefix = "R600", isTarget = 1 in { |
6453 |
-+ def int_R600_load_input : Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem]>; |
6454 |
-+ def int_R600_load_input_perspective : |
6455 |
-+ Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrReadMem]>; |
6456 |
-+ def int_R600_load_input_constant : |
6457 |
-+ Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrReadMem]>; |
6458 |
-+ def int_R600_load_input_linear : |
6459 |
-+ Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrReadMem]>; |
6460 |
++ def int_R600_load_input : |
6461 |
++ Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem]>; |
6462 |
++ def int_R600_interp_input : |
6463 |
++ Intrinsic<[llvm_float_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem]>; |
6464 |
++ def int_R600_load_texbuf : |
6465 |
++ Intrinsic<[llvm_v4f32_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem]>; |
6466 |
++ def int_R600_store_swizzle : |
6467 |
++ Intrinsic<[], [llvm_v4f32_ty, llvm_i32_ty, llvm_i32_ty], []>; |
6468 |
++ |
6469 |
+ def int_R600_store_stream_output : |
6470 |
-+ Intrinsic<[], [llvm_float_ty, llvm_i32_ty, llvm_i32_ty], []>; |
6471 |
++ Intrinsic<[], [llvm_v4f32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty], []>; |
6472 |
+ def int_R600_store_pixel_color : |
6473 |
+ Intrinsic<[], [llvm_float_ty, llvm_i32_ty], []>; |
6474 |
+ def int_R600_store_pixel_depth : |
6475 |
+ Intrinsic<[], [llvm_float_ty], []>; |
6476 |
+ def int_R600_store_pixel_stencil : |
6477 |
+ Intrinsic<[], [llvm_float_ty], []>; |
6478 |
-+ def int_R600_store_pixel_dummy : |
6479 |
-+ Intrinsic<[], [], []>; |
6480 |
++ def int_R600_store_dummy : |
6481 |
++ Intrinsic<[], [llvm_i32_ty], []>; |
6482 |
++} |
6483 |
++let TargetPrefix = "r600", isTarget = 1 in { |
6484 |
++ |
6485 |
++class R600ReadPreloadRegisterIntrinsic<string name> |
6486 |
++ : Intrinsic<[llvm_i32_ty], [], [IntrNoMem]>, |
6487 |
++ GCCBuiltin<name>; |
6488 |
++ |
6489 |
++multiclass R600ReadPreloadRegisterIntrinsic_xyz<string prefix> { |
6490 |
++ def _x : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_x")>; |
6491 |
++ def _y : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_y")>; |
6492 |
++ def _z : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_z")>; |
6493 |
++} |
6494 |
++ |
6495 |
++defm int_r600_read_global_size : R600ReadPreloadRegisterIntrinsic_xyz < |
6496 |
++ "__builtin_r600_read_global_size">; |
6497 |
++defm int_r600_read_local_size : R600ReadPreloadRegisterIntrinsic_xyz < |
6498 |
++ "__builtin_r600_read_local_size">; |
6499 |
++defm int_r600_read_ngroups : R600ReadPreloadRegisterIntrinsic_xyz < |
6500 |
++ "__builtin_r600_read_ngroups">; |
6501 |
++defm int_r600_read_tgid : R600ReadPreloadRegisterIntrinsic_xyz < |
6502 |
++ "__builtin_r600_read_tgid">; |
6503 |
++defm int_r600_read_tidig : R600ReadPreloadRegisterIntrinsic_xyz < |
6504 |
++ "__builtin_r600_read_tidig">; |
6505 |
++} |
6506 |
+diff --git a/lib/Target/R600/R600LowerConstCopy.cpp b/lib/Target/R600/R600LowerConstCopy.cpp |
6507 |
+new file mode 100644 |
6508 |
+index 0000000..c8c27a8 |
6509 |
+--- /dev/null |
6510 |
++++ b/lib/Target/R600/R600LowerConstCopy.cpp |
6511 |
+@@ -0,0 +1,222 @@ |
6512 |
++//===-- R600LowerConstCopy.cpp - Propagate ConstCopy / lower them to MOV---===// |
6513 |
++// |
6514 |
++// The LLVM Compiler Infrastructure |
6515 |
++// |
6516 |
++// This file is distributed under the University of Illinois Open Source |
6517 |
++// License. See LICENSE.TXT for details. |
6518 |
++// |
6519 |
++//===----------------------------------------------------------------------===// |
6520 |
++// |
6521 |
++/// \file |
6522 |
++/// This pass is intended to handle remaining ConstCopy pseudo MachineInstr. |
6523 |
++/// ISel will fold each Const Buffer read inside scalar ALU. However it cannot |
6524 |
++/// fold them inside vector instruction, like DOT4 or Cube ; ISel emits |
6525 |
++/// ConstCopy instead. This pass (executed after ExpandingSpecialInstr) will try |
6526 |
++/// to fold them if possible or replace them by MOV otherwise. |
6527 |
++// |
6528 |
++//===----------------------------------------------------------------------===// |
6529 |
++ |
6530 |
++#include "AMDGPU.h" |
6531 |
++#include "llvm/CodeGen/MachineFunction.h" |
6532 |
++#include "llvm/CodeGen/MachineFunctionPass.h" |
6533 |
++#include "R600InstrInfo.h" |
6534 |
++#include "llvm/GlobalValue.h" |
6535 |
++#include "llvm/CodeGen/MachineInstrBuilder.h" |
6536 |
++ |
6537 |
++namespace llvm { |
6538 |
++ |
6539 |
++class R600LowerConstCopy : public MachineFunctionPass { |
6540 |
++private: |
6541 |
++ static char ID; |
6542 |
++ const R600InstrInfo *TII; |
6543 |
++ |
6544 |
++ struct ConstPairs { |
6545 |
++ unsigned XYPair; |
6546 |
++ unsigned ZWPair; |
6547 |
++ }; |
6548 |
++ |
6549 |
++ bool canFoldInBundle(ConstPairs &UsedConst, unsigned ReadConst) const; |
6550 |
++public: |
6551 |
++ R600LowerConstCopy(TargetMachine &tm); |
6552 |
++ virtual bool runOnMachineFunction(MachineFunction &MF); |
6553 |
++ |
6554 |
++ const char *getPassName() const { return "R600 Eliminate Symbolic Operand"; } |
6555 |
++}; |
6556 |
++ |
6557 |
++char R600LowerConstCopy::ID = 0; |
6558 |
++ |
6559 |
++R600LowerConstCopy::R600LowerConstCopy(TargetMachine &tm) : |
6560 |
++ MachineFunctionPass(ID), |
6561 |
++ TII (static_cast<const R600InstrInfo *>(tm.getInstrInfo())) |
6562 |
++{ |
6563 |
++} |
6564 |
++ |
6565 |
++bool R600LowerConstCopy::canFoldInBundle(ConstPairs &UsedConst, |
6566 |
++ unsigned ReadConst) const { |
6567 |
++ unsigned ReadConstChan = ReadConst & 3; |
6568 |
++ unsigned ReadConstIndex = ReadConst & (~3); |
6569 |
++ if (ReadConstChan < 2) { |
6570 |
++ if (!UsedConst.XYPair) { |
6571 |
++ UsedConst.XYPair = ReadConstIndex; |
6572 |
++ } |
6573 |
++ return UsedConst.XYPair == ReadConstIndex; |
6574 |
++ } else { |
6575 |
++ if (!UsedConst.ZWPair) { |
6576 |
++ UsedConst.ZWPair = ReadConstIndex; |
6577 |
++ } |
6578 |
++ return UsedConst.ZWPair == ReadConstIndex; |
6579 |
++ } |
6580 |
++} |
6581 |
++ |
6582 |
++static bool isControlFlow(const MachineInstr &MI) { |
6583 |
++ return (MI.getOpcode() == AMDGPU::IF_PREDICATE_SET) || |
6584 |
++ (MI.getOpcode() == AMDGPU::ENDIF) || |
6585 |
++ (MI.getOpcode() == AMDGPU::ELSE) || |
6586 |
++ (MI.getOpcode() == AMDGPU::WHILELOOP) || |
6587 |
++ (MI.getOpcode() == AMDGPU::BREAK); |
6588 |
++} |
6589 |
++ |
6590 |
++bool R600LowerConstCopy::runOnMachineFunction(MachineFunction &MF) { |
6591 |
++ |
6592 |
++ for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end(); |
6593 |
++ BB != BB_E; ++BB) { |
6594 |
++ MachineBasicBlock &MBB = *BB; |
6595 |
++ DenseMap<unsigned, MachineInstr *> RegToConstIndex; |
6596 |
++ for (MachineBasicBlock::instr_iterator I = MBB.instr_begin(), |
6597 |
++ E = MBB.instr_end(); I != E;) { |
6598 |
++ |
6599 |
++ if (I->getOpcode() == AMDGPU::CONST_COPY) { |
6600 |
++ MachineInstr &MI = *I; |
6601 |
++ I = llvm::next(I); |
6602 |
++ unsigned DstReg = MI.getOperand(0).getReg(); |
6603 |
++ DenseMap<unsigned, MachineInstr *>::iterator SrcMI = |
6604 |
++ RegToConstIndex.find(DstReg); |
6605 |
++ if (SrcMI != RegToConstIndex.end()) { |
6606 |
++ SrcMI->second->eraseFromParent(); |
6607 |
++ RegToConstIndex.erase(SrcMI); |
6608 |
++ } |
6609 |
++ MachineInstr *NewMI = |
6610 |
++ TII->buildDefaultInstruction(MBB, &MI, AMDGPU::MOV, |
6611 |
++ MI.getOperand(0).getReg(), AMDGPU::ALU_CONST); |
6612 |
++ TII->setImmOperand(NewMI, R600Operands::SRC0_SEL, |
6613 |
++ MI.getOperand(1).getImm()); |
6614 |
++ RegToConstIndex[DstReg] = NewMI; |
6615 |
++ MI.eraseFromParent(); |
6616 |
++ continue; |
6617 |
++ } |
6618 |
++ |
6619 |
++ std::vector<unsigned> Defs; |
6620 |
++ // We consider all Instructions as bundled because algorithm that handle |
6621 |
++ // const read port limitations inside an IG is still valid with single |
6622 |
++ // instructions. |
6623 |
++ std::vector<MachineInstr *> Bundle; |
6624 |
++ |
6625 |
++ if (I->isBundle()) { |
6626 |
++ unsigned BundleSize = I->getBundleSize(); |
6627 |
++ for (unsigned i = 0; i < BundleSize; i++) { |
6628 |
++ I = llvm::next(I); |
6629 |
++ Bundle.push_back(I); |
6630 |
++ } |
6631 |
++ } else if (TII->isALUInstr(I->getOpcode())){ |
6632 |
++ Bundle.push_back(I); |
6633 |
++ } else if (isControlFlow(*I)) { |
6634 |
++ RegToConstIndex.clear(); |
6635 |
++ I = llvm::next(I); |
6636 |
++ continue; |
6637 |
++ } else { |
6638 |
++ MachineInstr &MI = *I; |
6639 |
++ for (MachineInstr::mop_iterator MOp = MI.operands_begin(), |
6640 |
++ MOpE = MI.operands_end(); MOp != MOpE; ++MOp) { |
6641 |
++ MachineOperand &MO = *MOp; |
6642 |
++ if (!MO.isReg()) |
6643 |
++ continue; |
6644 |
++ if (MO.isDef()) { |
6645 |
++ Defs.push_back(MO.getReg()); |
6646 |
++ } else { |
6647 |
++ // Either a TEX or an Export inst, prevent from erasing def of used |
6648 |
++ // operand |
6649 |
++ RegToConstIndex.erase(MO.getReg()); |
6650 |
++ for (MCSubRegIterator SR(MO.getReg(), &TII->getRegisterInfo()); |
6651 |
++ SR.isValid(); ++SR) { |
6652 |
++ RegToConstIndex.erase(*SR); |
6653 |
++ } |
6654 |
++ } |
6655 |
++ } |
6656 |
++ } |
6657 |
++ |
6658 |
++ |
6659 |
++ R600Operands::Ops OpTable[3][2] = { |
6660 |
++ {R600Operands::SRC0, R600Operands::SRC0_SEL}, |
6661 |
++ {R600Operands::SRC1, R600Operands::SRC1_SEL}, |
6662 |
++ {R600Operands::SRC2, R600Operands::SRC2_SEL}, |
6663 |
++ }; |
6664 |
++ |
6665 |
++ for(std::vector<MachineInstr *>::iterator It = Bundle.begin(), |
6666 |
++ ItE = Bundle.end(); It != ItE; ++It) { |
6667 |
++ MachineInstr *MI = *It; |
6668 |
++ if (TII->isPredicated(MI)) { |
6669 |
++ // We don't want to erase previous assignment |
6670 |
++ RegToConstIndex.erase(MI->getOperand(0).getReg()); |
6671 |
++ } else { |
6672 |
++ int WriteIDX = TII->getOperandIdx(MI->getOpcode(), R600Operands::WRITE); |
6673 |
++ if (WriteIDX < 0 || MI->getOperand(WriteIDX).getImm()) |
6674 |
++ Defs.push_back(MI->getOperand(0).getReg()); |
6675 |
++ } |
6676 |
++ } |
6677 |
++ |
6678 |
++ ConstPairs CP = {0,0}; |
6679 |
++ for (unsigned SrcOp = 0; SrcOp < 3; SrcOp++) { |
6680 |
++ for(std::vector<MachineInstr *>::iterator It = Bundle.begin(), |
6681 |
++ ItE = Bundle.end(); It != ItE; ++It) { |
6682 |
++ MachineInstr *MI = *It; |
6683 |
++ int SrcIdx = TII->getOperandIdx(MI->getOpcode(), OpTable[SrcOp][0]); |
6684 |
++ if (SrcIdx < 0) |
6685 |
++ continue; |
6686 |
++ MachineOperand &MO = MI->getOperand(SrcIdx); |
6687 |
++ DenseMap<unsigned, MachineInstr *>::iterator SrcMI = |
6688 |
++ RegToConstIndex.find(MO.getReg()); |
6689 |
++ if (SrcMI != RegToConstIndex.end()) { |
6690 |
++ MachineInstr *CstMov = SrcMI->second; |
6691 |
++ int ConstMovSel = |
6692 |
++ TII->getOperandIdx(CstMov->getOpcode(), R600Operands::SRC0_SEL); |
6693 |
++ unsigned ConstIndex = CstMov->getOperand(ConstMovSel).getImm(); |
6694 |
++ if (MI->isInsideBundle() && canFoldInBundle(CP, ConstIndex)) { |
6695 |
++ TII->setImmOperand(MI, OpTable[SrcOp][1], ConstIndex); |
6696 |
++ MI->getOperand(SrcIdx).setReg(AMDGPU::ALU_CONST); |
6697 |
++ } else { |
6698 |
++ RegToConstIndex.erase(SrcMI); |
6699 |
++ } |
6700 |
++ } |
6701 |
++ } |
6702 |
++ } |
6703 |
++ |
6704 |
++ for (std::vector<unsigned>::iterator It = Defs.begin(), ItE = Defs.end(); |
6705 |
++ It != ItE; ++It) { |
6706 |
++ DenseMap<unsigned, MachineInstr *>::iterator SrcMI = |
6707 |
++ RegToConstIndex.find(*It); |
6708 |
++ if (SrcMI != RegToConstIndex.end()) { |
6709 |
++ SrcMI->second->eraseFromParent(); |
6710 |
++ RegToConstIndex.erase(SrcMI); |
6711 |
++ } |
6712 |
++ } |
6713 |
++ I = llvm::next(I); |
6714 |
++ } |
6715 |
++ |
6716 |
++ if (MBB.succ_empty()) { |
6717 |
++ for (DenseMap<unsigned, MachineInstr *>::iterator |
6718 |
++ DI = RegToConstIndex.begin(), DE = RegToConstIndex.end(); |
6719 |
++ DI != DE; ++DI) { |
6720 |
++ DI->second->eraseFromParent(); |
6721 |
++ } |
6722 |
++ } |
6723 |
++ } |
6724 |
++ return false; |
6725 |
++} |
6726 |
++ |
6727 |
++FunctionPass *createR600LowerConstCopy(TargetMachine &tm) { |
6728 |
++ return new R600LowerConstCopy(tm); |
6729 |
++} |
6730 |
++ |
6731 |
+} |
6732 |
++ |
6733 |
++ |
6734 |
diff --git a/lib/Target/R600/R600MachineFunctionInfo.cpp b/lib/Target/R600/R600MachineFunctionInfo.cpp |
6735 |
new file mode 100644 |
6736 |
-index 0000000..4eb5efa |
6737 |
+index 0000000..40aec83 |
6738 |
--- /dev/null |
6739 |
+++ b/lib/Target/R600/R600MachineFunctionInfo.cpp |
6740 |
-@@ -0,0 +1,34 @@ |
6741 |
+@@ -0,0 +1,18 @@ |
6742 |
+//===-- R600MachineFunctionInfo.cpp - R600 Machine Function Info-*- C++ -*-===// |
6743 |
+// |
6744 |
+// The LLVM Compiler Infrastructure |
6745 |
@@ -17182,31 +18494,15 @@ index 0000000..4eb5efa |
6746 |
+using namespace llvm; |
6747 |
+ |
6748 |
+R600MachineFunctionInfo::R600MachineFunctionInfo(const MachineFunction &MF) |
6749 |
-+ : MachineFunctionInfo(), |
6750 |
-+ HasLinearInterpolation(false), |
6751 |
-+ HasPerspectiveInterpolation(false) { |
6752 |
++ : MachineFunctionInfo() { |
6753 |
+ memset(Outputs, 0, sizeof(Outputs)); |
6754 |
-+ memset(StreamOutputs, 0, sizeof(StreamOutputs)); |
6755 |
+ } |
6756 |
-+ |
6757 |
-+unsigned R600MachineFunctionInfo::GetIJPerspectiveIndex() const { |
6758 |
-+ assert(HasPerspectiveInterpolation); |
6759 |
-+ return 0; |
6760 |
-+} |
6761 |
-+ |
6762 |
-+unsigned R600MachineFunctionInfo::GetIJLinearIndex() const { |
6763 |
-+ assert(HasLinearInterpolation); |
6764 |
-+ if (HasPerspectiveInterpolation) |
6765 |
-+ return 1; |
6766 |
-+ else |
6767 |
-+ return 0; |
6768 |
-+} |
6769 |
diff --git a/lib/Target/R600/R600MachineFunctionInfo.h b/lib/Target/R600/R600MachineFunctionInfo.h |
6770 |
new file mode 100644 |
6771 |
-index 0000000..e97fb5b |
6772 |
+index 0000000..41e4894 |
6773 |
--- /dev/null |
6774 |
+++ b/lib/Target/R600/R600MachineFunctionInfo.h |
6775 |
-@@ -0,0 +1,39 @@ |
6776 |
+@@ -0,0 +1,33 @@ |
6777 |
+//===-- R600MachineFunctionInfo.h - R600 Machine Function Info ----*- C++ -*-=// |
6778 |
+// |
6779 |
+// The LLVM Compiler Infrastructure |
6780 |
@@ -17222,6 +18518,7 @@ index 0000000..e97fb5b |
6781 |
+#ifndef R600MACHINEFUNCTIONINFO_H |
6782 |
+#define R600MACHINEFUNCTIONINFO_H |
6783 |
+ |
6784 |
++#include "llvm/ADT/BitVector.h" |
6785 |
+#include "llvm/CodeGen/MachineFunction.h" |
6786 |
+#include "llvm/CodeGen/SelectionDAG.h" |
6787 |
+#include <vector> |
6788 |
@@ -17232,15 +18529,8 @@ index 0000000..e97fb5b |
6789 |
+ |
6790 |
+public: |
6791 |
+ R600MachineFunctionInfo(const MachineFunction &MF); |
6792 |
-+ std::vector<unsigned> ReservedRegs; |
6793 |
++ std::vector<unsigned> IndirectRegs; |
6794 |
+ SDNode *Outputs[16]; |
6795 |
-+ SDNode *StreamOutputs[64][4]; |
6796 |
-+ bool HasLinearInterpolation; |
6797 |
-+ bool HasPerspectiveInterpolation; |
6798 |
-+ |
6799 |
-+ unsigned GetIJLinearIndex() const; |
6800 |
-+ unsigned GetIJPerspectiveIndex() const; |
6801 |
-+ |
6802 |
+}; |
6803 |
+ |
6804 |
+} // End llvm namespace |
6805 |
@@ -17248,10 +18538,10 @@ index 0000000..e97fb5b |
6806 |
+#endif //R600MACHINEFUNCTIONINFO_H |
6807 |
diff --git a/lib/Target/R600/R600RegisterInfo.cpp b/lib/Target/R600/R600RegisterInfo.cpp |
6808 |
new file mode 100644 |
6809 |
-index 0000000..a39f83d |
6810 |
+index 0000000..bbd7995 |
6811 |
--- /dev/null |
6812 |
+++ b/lib/Target/R600/R600RegisterInfo.cpp |
6813 |
-@@ -0,0 +1,89 @@ |
6814 |
+@@ -0,0 +1,99 @@ |
6815 |
+//===-- R600RegisterInfo.cpp - R600 Register Information ------------------===// |
6816 |
+// |
6817 |
+// The LLVM Compiler Infrastructure |
6818 |
@@ -17269,6 +18559,7 @@ index 0000000..a39f83d |
6819 |
+#include "R600RegisterInfo.h" |
6820 |
+#include "AMDGPUTargetMachine.h" |
6821 |
+#include "R600Defines.h" |
6822 |
++#include "R600InstrInfo.h" |
6823 |
+#include "R600MachineFunctionInfo.h" |
6824 |
+ |
6825 |
+using namespace llvm; |
6826 |
@@ -17282,7 +18573,6 @@ index 0000000..a39f83d |
6827 |
+ |
6828 |
+BitVector R600RegisterInfo::getReservedRegs(const MachineFunction &MF) const { |
6829 |
+ BitVector Reserved(getNumRegs()); |
6830 |
-+ const R600MachineFunctionInfo * MFI = MF.getInfo<R600MachineFunctionInfo>(); |
6831 |
+ |
6832 |
+ Reserved.set(AMDGPU::ZERO); |
6833 |
+ Reserved.set(AMDGPU::HALF); |
6834 |
@@ -17292,21 +18582,30 @@ index 0000000..a39f83d |
6835 |
+ Reserved.set(AMDGPU::NEG_ONE); |
6836 |
+ Reserved.set(AMDGPU::PV_X); |
6837 |
+ Reserved.set(AMDGPU::ALU_LITERAL_X); |
6838 |
++ Reserved.set(AMDGPU::ALU_CONST); |
6839 |
+ Reserved.set(AMDGPU::PREDICATE_BIT); |
6840 |
+ Reserved.set(AMDGPU::PRED_SEL_OFF); |
6841 |
+ Reserved.set(AMDGPU::PRED_SEL_ZERO); |
6842 |
+ Reserved.set(AMDGPU::PRED_SEL_ONE); |
6843 |
+ |
6844 |
-+ for (TargetRegisterClass::iterator I = AMDGPU::R600_CReg32RegClass.begin(), |
6845 |
-+ E = AMDGPU::R600_CReg32RegClass.end(); I != E; ++I) { |
6846 |
++ for (TargetRegisterClass::iterator I = AMDGPU::R600_AddrRegClass.begin(), |
6847 |
++ E = AMDGPU::R600_AddrRegClass.end(); I != E; ++I) { |
6848 |
+ Reserved.set(*I); |
6849 |
+ } |
6850 |
+ |
6851 |
-+ for (std::vector<unsigned>::const_iterator I = MFI->ReservedRegs.begin(), |
6852 |
-+ E = MFI->ReservedRegs.end(); I != E; ++I) { |
6853 |
++ for (TargetRegisterClass::iterator I = AMDGPU::TRegMemRegClass.begin(), |
6854 |
++ E = AMDGPU::TRegMemRegClass.end(); |
6855 |
++ I != E; ++I) { |
6856 |
+ Reserved.set(*I); |
6857 |
+ } |
6858 |
+ |
6859 |
++ const R600InstrInfo *RII = static_cast<const R600InstrInfo*>(&TII); |
6860 |
++ std::vector<unsigned> IndirectRegs = RII->getIndirectReservedRegs(MF); |
6861 |
++ for (std::vector<unsigned>::iterator I = IndirectRegs.begin(), |
6862 |
++ E = IndirectRegs.end(); |
6863 |
++ I != E; ++I) { |
6864 |
++ Reserved.set(*I); |
6865 |
++ } |
6866 |
+ return Reserved; |
6867 |
+} |
6868 |
+ |
6869 |
@@ -17335,12 +18634,13 @@ index 0000000..a39f83d |
6870 |
+unsigned R600RegisterInfo::getSubRegFromChannel(unsigned Channel) const { |
6871 |
+ switch (Channel) { |
6872 |
+ default: assert(!"Invalid channel index"); return 0; |
6873 |
-+ case 0: return AMDGPU::sel_x; |
6874 |
-+ case 1: return AMDGPU::sel_y; |
6875 |
-+ case 2: return AMDGPU::sel_z; |
6876 |
-+ case 3: return AMDGPU::sel_w; |
6877 |
++ case 0: return AMDGPU::sub0; |
6878 |
++ case 1: return AMDGPU::sub1; |
6879 |
++ case 2: return AMDGPU::sub2; |
6880 |
++ case 3: return AMDGPU::sub3; |
6881 |
+ } |
6882 |
+} |
6883 |
++ |
6884 |
diff --git a/lib/Target/R600/R600RegisterInfo.h b/lib/Target/R600/R600RegisterInfo.h |
6885 |
new file mode 100644 |
6886 |
index 0000000..c170ccb |
6887 |
@@ -17404,10 +18704,10 @@ index 0000000..c170ccb |
6888 |
+#endif // AMDIDSAREGISTERINFO_H_ |
6889 |
diff --git a/lib/Target/R600/R600RegisterInfo.td b/lib/Target/R600/R600RegisterInfo.td |
6890 |
new file mode 100644 |
6891 |
-index 0000000..d3d6d25 |
6892 |
+index 0000000..a7d847a |
6893 |
--- /dev/null |
6894 |
+++ b/lib/Target/R600/R600RegisterInfo.td |
6895 |
-@@ -0,0 +1,107 @@ |
6896 |
+@@ -0,0 +1,146 @@ |
6897 |
+ |
6898 |
+class R600Reg <string name, bits<16> encoding> : Register<name> { |
6899 |
+ let Namespace = "AMDGPU"; |
6900 |
@@ -17429,7 +18729,7 @@ index 0000000..d3d6d25 |
6901 |
+class R600Reg_128<string n, list<Register> subregs, bits<16> encoding> : |
6902 |
+ RegisterWithSubRegs<n, subregs> { |
6903 |
+ let Namespace = "AMDGPU"; |
6904 |
-+ let SubRegIndices = [sel_x, sel_y, sel_z, sel_w]; |
6905 |
++ let SubRegIndices = [sub0, sub1, sub2, sub3]; |
6906 |
+ let HWEncoding = encoding; |
6907 |
+} |
6908 |
+ |
6909 |
@@ -17438,9 +18738,11 @@ index 0000000..d3d6d25 |
6910 |
+ // 32-bit Temporary Registers |
6911 |
+ def T#Index#_#Chan : R600RegWithChan <"T"#Index#"."#Chan, Index, Chan>; |
6912 |
+ |
6913 |
-+ // 32-bit Constant Registers (There are more than 128, this the number |
6914 |
-+ // that is currently supported. |
6915 |
-+ def C#Index#_#Chan : R600RegWithChan <"C"#Index#"."#Chan, Index, Chan>; |
6916 |
++ // Indirect addressing offset registers |
6917 |
++ def Addr#Index#_#Chan : R600RegWithChan <"T("#Index#" + AR.x)."#Chan, |
6918 |
++ Index, Chan>; |
6919 |
++ def TRegMem#Index#_#Chan : R600RegWithChan <"T"#Index#"."#Chan, Index, |
6920 |
++ Chan>; |
6921 |
+ } |
6922 |
+ // 128-bit Temporary Registers |
6923 |
+ def T#Index#_XYZW : R600Reg_128 <"T"#Index#".XYZW", |
6924 |
@@ -17471,19 +18773,25 @@ index 0000000..d3d6d25 |
6925 |
+def PRED_SEL_OFF: R600Reg<"Pred_sel_off", 0>; |
6926 |
+def PRED_SEL_ZERO : R600Reg<"Pred_sel_zero", 2>; |
6927 |
+def PRED_SEL_ONE : R600Reg<"Pred_sel_one", 3>; |
6928 |
++def AR_X : R600Reg<"AR.x", 0>; |
6929 |
+ |
6930 |
+def R600_ArrayBase : RegisterClass <"AMDGPU", [f32, i32], 32, |
6931 |
+ (add (sequence "ArrayBase%u", 448, 464))>; |
6932 |
++// special registers for ALU src operands |
6933 |
++// const buffer reference, SRCx_SEL contains index |
6934 |
++def ALU_CONST : R600Reg<"CBuf", 0>; |
6935 |
++// interpolation param reference, SRCx_SEL contains index |
6936 |
++def ALU_PARAM : R600Reg<"Param", 0>; |
6937 |
++ |
6938 |
++let isAllocatable = 0 in { |
6939 |
++ |
6940 |
++// XXX: Only use the X channel, until we support wider stack widths |
6941 |
++def R600_Addr : RegisterClass <"AMDGPU", [i32], 127, (add (sequence "Addr%u_X", 0, 127))>; |
6942 |
+ |
6943 |
-+def R600_CReg32 : RegisterClass <"AMDGPU", [f32, i32], 32, |
6944 |
-+ (add (interleave |
6945 |
-+ (interleave (sequence "C%u_X", 0, 127), |
6946 |
-+ (sequence "C%u_Z", 0, 127)), |
6947 |
-+ (interleave (sequence "C%u_Y", 0, 127), |
6948 |
-+ (sequence "C%u_W", 0, 127))))>; |
6949 |
++} // End isAllocatable = 0 |
6950 |
+ |
6951 |
+def R600_TReg32_X : RegisterClass <"AMDGPU", [f32, i32], 32, |
6952 |
-+ (add (sequence "T%u_X", 0, 127))>; |
6953 |
++ (add (sequence "T%u_X", 0, 127), AR_X)>; |
6954 |
+ |
6955 |
+def R600_TReg32_Y : RegisterClass <"AMDGPU", [f32, i32], 32, |
6956 |
+ (add (sequence "T%u_Y", 0, 127))>; |
6957 |
@@ -17495,15 +18803,16 @@ index 0000000..d3d6d25 |
6958 |
+ (add (sequence "T%u_W", 0, 127))>; |
6959 |
+ |
6960 |
+def R600_TReg32 : RegisterClass <"AMDGPU", [f32, i32], 32, |
6961 |
-+ (add (interleave |
6962 |
-+ (interleave R600_TReg32_X, R600_TReg32_Z), |
6963 |
-+ (interleave R600_TReg32_Y, R600_TReg32_W)))>; |
6964 |
++ (interleave R600_TReg32_X, R600_TReg32_Y, |
6965 |
++ R600_TReg32_Z, R600_TReg32_W)>; |
6966 |
+ |
6967 |
+def R600_Reg32 : RegisterClass <"AMDGPU", [f32, i32], 32, (add |
6968 |
+ R600_TReg32, |
6969 |
-+ R600_CReg32, |
6970 |
+ R600_ArrayBase, |
6971 |
-+ ZERO, HALF, ONE, ONE_INT, PV_X, ALU_LITERAL_X, NEG_ONE, NEG_HALF)>; |
6972 |
++ R600_Addr, |
6973 |
++ ZERO, HALF, ONE, ONE_INT, PV_X, ALU_LITERAL_X, NEG_ONE, NEG_HALF, |
6974 |
++ ALU_CONST, ALU_PARAM |
6975 |
++ )>; |
6976 |
+ |
6977 |
+def R600_Predicate : RegisterClass <"AMDGPU", [i32], 32, (add |
6978 |
+ PRED_SEL_OFF, PRED_SEL_ZERO, PRED_SEL_ONE)>; |
6979 |
@@ -17515,6 +18824,36 @@ index 0000000..d3d6d25 |
6980 |
+ (add (sequence "T%u_XYZW", 0, 127))> { |
6981 |
+ let CopyCost = -1; |
6982 |
+} |
6983 |
++ |
6984 |
++//===----------------------------------------------------------------------===// |
6985 |
++// Register classes for indirect addressing |
6986 |
++//===----------------------------------------------------------------------===// |
6987 |
++ |
6988 |
++// Super register for all the Indirect Registers. This register class is used |
6989 |
++// by the REG_SEQUENCE instruction to specify the registers to use for direct |
6990 |
++// reads / writes which may be written / read by an indirect address. |
6991 |
++class IndirectSuper<string n, list<Register> subregs> : |
6992 |
++ RegisterWithSubRegs<n, subregs> { |
6993 |
++ let Namespace = "AMDGPU"; |
6994 |
++ let SubRegIndices = |
6995 |
++ [sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7, |
6996 |
++ sub8, sub9, sub10, sub11, sub12, sub13, sub14, sub15]; |
6997 |
++} |
6998 |
++ |
6999 |
++def IndirectSuperReg : IndirectSuper<"Indirect", |
7000 |
++ [TRegMem0_X, TRegMem1_X, TRegMem2_X, TRegMem3_X, TRegMem4_X, TRegMem5_X, |
7001 |
++ TRegMem6_X, TRegMem7_X, TRegMem8_X, TRegMem9_X, TRegMem10_X, TRegMem11_X, |
7002 |
++ TRegMem12_X, TRegMem13_X, TRegMem14_X, TRegMem15_X] |
7003 |
++>; |
7004 |
++ |
7005 |
++def IndirectReg : RegisterClass<"AMDGPU", [f32, i32], 32, (add IndirectSuperReg)>; |
7006 |
++ |
7007 |
++// This register class defines the registers that are the storage units for |
7008 |
++// the "Indirect Addressing" pseudo memory space. |
7009 |
++// XXX: Only use the X channel, until we support wider stack widths |
7010 |
++def TRegMem : RegisterClass<"AMDGPU", [f32, i32], 32, |
7011 |
++ (add (sequence "TRegMem%u_X", 0, 16)) |
7012 |
++>; |
7013 |
diff --git a/lib/Target/R600/R600Schedule.td b/lib/Target/R600/R600Schedule.td |
7014 |
new file mode 100644 |
7015 |
index 0000000..7ede181 |
7016 |
@@ -18053,10 +19392,10 @@ index 0000000..832e44d |
7017 |
+} |
7018 |
diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp |
7019 |
new file mode 100644 |
7020 |
-index 0000000..cd6e0e9 |
7021 |
+index 0000000..694c045 |
7022 |
--- /dev/null |
7023 |
+++ b/lib/Target/R600/SIISelLowering.cpp |
7024 |
-@@ -0,0 +1,512 @@ |
7025 |
+@@ -0,0 +1,399 @@ |
7026 |
+//===-- SIISelLowering.cpp - SI DAG Lowering Implementation ---------------===// |
7027 |
+// |
7028 |
+// The LLVM Compiler Infrastructure |
7029 |
@@ -18090,16 +19429,16 @@ index 0000000..cd6e0e9 |
7030 |
+ addRegisterClass(MVT::f32, &AMDGPU::VReg_32RegClass); |
7031 |
+ addRegisterClass(MVT::i32, &AMDGPU::VReg_32RegClass); |
7032 |
+ addRegisterClass(MVT::i64, &AMDGPU::SReg_64RegClass); |
7033 |
-+ addRegisterClass(MVT::i1, &AMDGPU::SCCRegRegClass); |
7034 |
-+ addRegisterClass(MVT::i1, &AMDGPU::VCCRegRegClass); |
7035 |
++ addRegisterClass(MVT::i1, &AMDGPU::SReg_64RegClass); |
7036 |
+ |
7037 |
-+ addRegisterClass(MVT::v4i32, &AMDGPU::SReg_128RegClass); |
7038 |
-+ addRegisterClass(MVT::v8i32, &AMDGPU::SReg_256RegClass); |
7039 |
++ addRegisterClass(MVT::v1i32, &AMDGPU::VReg_32RegClass); |
7040 |
++ addRegisterClass(MVT::v2i32, &AMDGPU::VReg_64RegClass); |
7041 |
++ addRegisterClass(MVT::v4i32, &AMDGPU::VReg_128RegClass); |
7042 |
++ addRegisterClass(MVT::v8i32, &AMDGPU::VReg_256RegClass); |
7043 |
++ addRegisterClass(MVT::v16i32, &AMDGPU::VReg_512RegClass); |
7044 |
+ |
7045 |
+ computeRegisterProperties(); |
7046 |
+ |
7047 |
-+ setOperationAction(ISD::AND, MVT::i1, Custom); |
7048 |
-+ |
7049 |
+ setOperationAction(ISD::ADD, MVT::i64, Legal); |
7050 |
+ setOperationAction(ISD::ADD, MVT::i32, Legal); |
7051 |
+ |
7052 |
@@ -18125,23 +19464,16 @@ index 0000000..cd6e0e9 |
7053 |
+ MachineRegisterInfo & MRI = BB->getParent()->getRegInfo(); |
7054 |
+ MachineBasicBlock::iterator I = MI; |
7055 |
+ |
7056 |
-+ if (TII->get(MI->getOpcode()).TSFlags & SIInstrFlags::NEED_WAIT) { |
7057 |
-+ AppendS_WAITCNT(MI, *BB, llvm::next(I)); |
7058 |
-+ return BB; |
7059 |
-+ } |
7060 |
-+ |
7061 |
+ switch (MI->getOpcode()) { |
7062 |
+ default: |
7063 |
+ return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB); |
7064 |
+ case AMDGPU::BRANCH: return BB; |
7065 |
+ case AMDGPU::CLAMP_SI: |
7066 |
-+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_MOV_B32_e64)) |
7067 |
++ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64)) |
7068 |
+ .addOperand(MI->getOperand(0)) |
7069 |
+ .addOperand(MI->getOperand(1)) |
7070 |
-+ // VSRC1-2 are unused, but we still need to fill all the |
7071 |
-+ // operand slots, so we just reuse the VSRC0 operand |
7072 |
-+ .addOperand(MI->getOperand(1)) |
7073 |
-+ .addOperand(MI->getOperand(1)) |
7074 |
++ .addImm(0x80) // SRC1 |
7075 |
++ .addImm(0x80) // SRC2 |
7076 |
+ .addImm(0) // ABS |
7077 |
+ .addImm(1) // CLAMP |
7078 |
+ .addImm(0) // OMOD |
7079 |
@@ -18150,13 +19482,11 @@ index 0000000..cd6e0e9 |
7080 |
+ break; |
7081 |
+ |
7082 |
+ case AMDGPU::FABS_SI: |
7083 |
-+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_MOV_B32_e64)) |
7084 |
++ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64)) |
7085 |
+ .addOperand(MI->getOperand(0)) |
7086 |
+ .addOperand(MI->getOperand(1)) |
7087 |
-+ // VSRC1-2 are unused, but we still need to fill all the |
7088 |
-+ // operand slots, so we just reuse the VSRC0 operand |
7089 |
-+ .addOperand(MI->getOperand(1)) |
7090 |
-+ .addOperand(MI->getOperand(1)) |
7091 |
++ .addImm(0x80) // SRC1 |
7092 |
++ .addImm(0x80) // SRC2 |
7093 |
+ .addImm(1) // ABS |
7094 |
+ .addImm(0) // CLAMP |
7095 |
+ .addImm(0) // OMOD |
7096 |
@@ -18165,13 +19495,11 @@ index 0000000..cd6e0e9 |
7097 |
+ break; |
7098 |
+ |
7099 |
+ case AMDGPU::FNEG_SI: |
7100 |
-+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_MOV_B32_e64)) |
7101 |
++ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64)) |
7102 |
+ .addOperand(MI->getOperand(0)) |
7103 |
+ .addOperand(MI->getOperand(1)) |
7104 |
-+ // VSRC1-2 are unused, but we still need to fill all the |
7105 |
-+ // operand slots, so we just reuse the VSRC0 operand |
7106 |
-+ .addOperand(MI->getOperand(1)) |
7107 |
-+ .addOperand(MI->getOperand(1)) |
7108 |
++ .addImm(0x80) // SRC1 |
7109 |
++ .addImm(0x80) // SRC2 |
7110 |
+ .addImm(0) // ABS |
7111 |
+ .addImm(0) // CLAMP |
7112 |
+ .addImm(0) // OMOD |
7113 |
@@ -18187,29 +19515,13 @@ index 0000000..cd6e0e9 |
7114 |
+ case AMDGPU::SI_INTERP: |
7115 |
+ LowerSI_INTERP(MI, *BB, I, MRI); |
7116 |
+ break; |
7117 |
-+ case AMDGPU::SI_INTERP_CONST: |
7118 |
-+ LowerSI_INTERP_CONST(MI, *BB, I, MRI); |
7119 |
-+ break; |
7120 |
-+ case AMDGPU::SI_KIL: |
7121 |
-+ LowerSI_KIL(MI, *BB, I, MRI); |
7122 |
-+ break; |
7123 |
+ case AMDGPU::SI_WQM: |
7124 |
+ LowerSI_WQM(MI, *BB, I, MRI); |
7125 |
+ break; |
7126 |
-+ case AMDGPU::SI_V_CNDLT: |
7127 |
-+ LowerSI_V_CNDLT(MI, *BB, I, MRI); |
7128 |
-+ break; |
7129 |
+ } |
7130 |
+ return BB; |
7131 |
+} |
7132 |
+ |
7133 |
-+void SITargetLowering::AppendS_WAITCNT(MachineInstr *MI, MachineBasicBlock &BB, |
7134 |
-+ MachineBasicBlock::iterator I) const { |
7135 |
-+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_WAITCNT)) |
7136 |
-+ .addImm(0); |
7137 |
-+} |
7138 |
-+ |
7139 |
-+ |
7140 |
+void SITargetLowering::LowerSI_WQM(MachineInstr *MI, MachineBasicBlock &BB, |
7141 |
+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const { |
7142 |
+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_WQM_B64), AMDGPU::EXEC) |
7143 |
@@ -18249,57 +19561,6 @@ index 0000000..cd6e0e9 |
7144 |
+ MI->eraseFromParent(); |
7145 |
+} |
7146 |
+ |
7147 |
-+void SITargetLowering::LowerSI_INTERP_CONST(MachineInstr *MI, |
7148 |
-+ MachineBasicBlock &BB, MachineBasicBlock::iterator I, |
7149 |
-+ MachineRegisterInfo &MRI) const { |
7150 |
-+ MachineOperand dst = MI->getOperand(0); |
7151 |
-+ MachineOperand attr_chan = MI->getOperand(1); |
7152 |
-+ MachineOperand attr = MI->getOperand(2); |
7153 |
-+ MachineOperand params = MI->getOperand(3); |
7154 |
-+ unsigned M0 = MRI.createVirtualRegister(&AMDGPU::M0RegRegClass); |
7155 |
-+ |
7156 |
-+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_MOV_B32), M0) |
7157 |
-+ .addOperand(params); |
7158 |
-+ |
7159 |
-+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_INTERP_MOV_F32)) |
7160 |
-+ .addOperand(dst) |
7161 |
-+ .addOperand(attr_chan) |
7162 |
-+ .addOperand(attr) |
7163 |
-+ .addReg(M0); |
7164 |
-+ |
7165 |
-+ MI->eraseFromParent(); |
7166 |
-+} |
7167 |
-+ |
7168 |
-+void SITargetLowering::LowerSI_KIL(MachineInstr *MI, MachineBasicBlock &BB, |
7169 |
-+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const { |
7170 |
-+ // Clear this pixel from the exec mask if the operand is negative |
7171 |
-+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_CMPX_LE_F32_e32), |
7172 |
-+ AMDGPU::VCC) |
7173 |
-+ .addReg(AMDGPU::SREG_LIT_0) |
7174 |
-+ .addOperand(MI->getOperand(0)); |
7175 |
-+ |
7176 |
-+ MI->eraseFromParent(); |
7177 |
-+} |
7178 |
-+ |
7179 |
-+void SITargetLowering::LowerSI_V_CNDLT(MachineInstr *MI, MachineBasicBlock &BB, |
7180 |
-+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const { |
7181 |
-+ unsigned VCC = MRI.createVirtualRegister(&AMDGPU::SReg_64RegClass); |
7182 |
-+ |
7183 |
-+ BuildMI(BB, I, BB.findDebugLoc(I), |
7184 |
-+ TII->get(AMDGPU::V_CMP_GT_F32_e32), |
7185 |
-+ VCC) |
7186 |
-+ .addReg(AMDGPU::SREG_LIT_0) |
7187 |
-+ .addOperand(MI->getOperand(1)); |
7188 |
-+ |
7189 |
-+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_CNDMASK_B32_e32)) |
7190 |
-+ .addOperand(MI->getOperand(0)) |
7191 |
-+ .addOperand(MI->getOperand(3)) |
7192 |
-+ .addOperand(MI->getOperand(2)) |
7193 |
-+ .addReg(VCC); |
7194 |
-+ |
7195 |
-+ MI->eraseFromParent(); |
7196 |
-+} |
7197 |
-+ |
7198 |
+EVT SITargetLowering::getSetCCResultType(EVT VT) const { |
7199 |
+ return MVT::i1; |
7200 |
+} |
7201 |
@@ -18314,7 +19575,6 @@ index 0000000..cd6e0e9 |
7202 |
+ case ISD::BRCOND: return LowerBRCOND(Op, DAG); |
7203 |
+ case ISD::LOAD: return LowerLOAD(Op, DAG); |
7204 |
+ case ISD::SELECT_CC: return LowerSELECT_CC(Op, DAG); |
7205 |
-+ case ISD::AND: return Loweri1ContextSwitch(Op, DAG, ISD::AND); |
7206 |
+ case ISD::INTRINSIC_WO_CHAIN: { |
7207 |
+ unsigned IntrinsicID = |
7208 |
+ cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue(); |
7209 |
@@ -18331,30 +19591,6 @@ index 0000000..cd6e0e9 |
7210 |
+ return SDValue(); |
7211 |
+} |
7212 |
+ |
7213 |
-+/// \brief The function is for lowering i1 operations on the |
7214 |
-+/// VCC register. |
7215 |
-+/// |
7216 |
-+/// In the VALU context, VCC is a one bit register, but in the |
7217 |
-+/// SALU context the VCC is a 64-bit register (1-bit per thread). Since only |
7218 |
-+/// the SALU can perform operations on the VCC register, we need to promote |
7219 |
-+/// the operand types from i1 to i64 in order for tablegen to be able to match |
7220 |
-+/// this operation to the correct SALU instruction. We do this promotion by |
7221 |
-+/// wrapping the operands in a CopyToReg node. |
7222 |
-+/// |
7223 |
-+SDValue SITargetLowering::Loweri1ContextSwitch(SDValue Op, |
7224 |
-+ SelectionDAG &DAG, |
7225 |
-+ unsigned VCCNode) const { |
7226 |
-+ DebugLoc DL = Op.getDebugLoc(); |
7227 |
-+ |
7228 |
-+ SDValue OpNode = DAG.getNode(VCCNode, DL, MVT::i64, |
7229 |
-+ DAG.getNode(SIISD::VCC_BITCAST, DL, MVT::i64, |
7230 |
-+ Op.getOperand(0)), |
7231 |
-+ DAG.getNode(SIISD::VCC_BITCAST, DL, MVT::i64, |
7232 |
-+ Op.getOperand(1))); |
7233 |
-+ |
7234 |
-+ return DAG.getNode(SIISD::VCC_BITCAST, DL, MVT::i1, OpNode); |
7235 |
-+} |
7236 |
-+ |
7237 |
+/// \brief Helper function for LowerBRCOND |
7238 |
+static SDNode *findUser(SDValue Value, unsigned Opcode) { |
7239 |
+ |
7240 |
@@ -18559,22 +19795,12 @@ index 0000000..cd6e0e9 |
7241 |
+ } |
7242 |
+ return SDValue(); |
7243 |
+} |
7244 |
-+ |
7245 |
-+#define NODE_NAME_CASE(node) case SIISD::node: return #node; |
7246 |
-+ |
7247 |
-+const char* SITargetLowering::getTargetNodeName(unsigned Opcode) const { |
7248 |
-+ switch (Opcode) { |
7249 |
-+ default: return AMDGPUTargetLowering::getTargetNodeName(Opcode); |
7250 |
-+ NODE_NAME_CASE(VCC_AND) |
7251 |
-+ NODE_NAME_CASE(VCC_BITCAST) |
7252 |
-+ } |
7253 |
-+} |
7254 |
diff --git a/lib/Target/R600/SIISelLowering.h b/lib/Target/R600/SIISelLowering.h |
7255 |
new file mode 100644 |
7256 |
-index 0000000..c088112 |
7257 |
+index 0000000..5d048f8 |
7258 |
--- /dev/null |
7259 |
+++ b/lib/Target/R600/SIISelLowering.h |
7260 |
-@@ -0,0 +1,62 @@ |
7261 |
+@@ -0,0 +1,48 @@ |
7262 |
+//===-- SIISelLowering.h - SI DAG Lowering Interface ------------*- C++ -*-===// |
7263 |
+// |
7264 |
+// The LLVM Compiler Infrastructure |
7265 |
@@ -18600,26 +19826,13 @@ index 0000000..c088112 |
7266 |
+class SITargetLowering : public AMDGPUTargetLowering { |
7267 |
+ const SIInstrInfo * TII; |
7268 |
+ |
7269 |
-+ /// Memory reads and writes are syncronized using the S_WAITCNT instruction. |
7270 |
-+ /// This function takes the most conservative approach and inserts an |
7271 |
-+ /// S_WAITCNT instruction after every read and write. |
7272 |
-+ void AppendS_WAITCNT(MachineInstr *MI, MachineBasicBlock &BB, |
7273 |
-+ MachineBasicBlock::iterator I) const; |
7274 |
+ void LowerMOV_IMM(MachineInstr *MI, MachineBasicBlock &BB, |
7275 |
+ MachineBasicBlock::iterator I, unsigned Opcode) const; |
7276 |
+ void LowerSI_INTERP(MachineInstr *MI, MachineBasicBlock &BB, |
7277 |
+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const; |
7278 |
-+ void LowerSI_INTERP_CONST(MachineInstr *MI, MachineBasicBlock &BB, |
7279 |
-+ MachineBasicBlock::iterator I, MachineRegisterInfo &MRI) const; |
7280 |
-+ void LowerSI_KIL(MachineInstr *MI, MachineBasicBlock &BB, |
7281 |
-+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const; |
7282 |
+ void LowerSI_WQM(MachineInstr *MI, MachineBasicBlock &BB, |
7283 |
+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const; |
7284 |
-+ void LowerSI_V_CNDLT(MachineInstr *MI, MachineBasicBlock &BB, |
7285 |
-+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const; |
7286 |
+ |
7287 |
-+ SDValue Loweri1ContextSwitch(SDValue Op, SelectionDAG &DAG, |
7288 |
-+ unsigned VCCNode) const; |
7289 |
+ SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const; |
7290 |
+ SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const; |
7291 |
+ SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const; |
7292 |
@@ -18631,18 +19844,376 @@ index 0000000..c088112 |
7293 |
+ virtual EVT getSetCCResultType(EVT VT) const; |
7294 |
+ virtual SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const; |
7295 |
+ virtual SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const; |
7296 |
-+ virtual const char* getTargetNodeName(unsigned Opcode) const; |
7297 |
+}; |
7298 |
+ |
7299 |
+} // End namespace llvm |
7300 |
+ |
7301 |
+#endif //SIISELLOWERING_H |
7302 |
+diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp |
7303 |
+new file mode 100644 |
7304 |
+index 0000000..24fc929 |
7305 |
+--- /dev/null |
7306 |
++++ b/lib/Target/R600/SIInsertWaits.cpp |
7307 |
+@@ -0,0 +1,353 @@ |
7308 |
++//===-- SIInsertWaits.cpp - Insert wait instructions for memory ops -------===// |
7309 |
++// |
7310 |
++// The LLVM Compiler Infrastructure |
7311 |
++// |
7312 |
++// This file is distributed under the University of Illinois Open Source |
7313 |
++// License. See LICENSE.TXT for details. |
7314 |
++// |
7315 |
++//===----------------------------------------------------------------------===// |
7316 |
++// |
7317 |
++/// \file |
7318 |
++/// \brief Insert wait instructions for memory reads and writes. |
7319 |
++/// |
7320 |
++/// Memory reads and writes are issued asynchronously, so we need to insert |
7321 |
++/// S_WAITCNT instructions when we want to access any of their results or |
7322 |
++/// overwrite any register that's used asynchronously. |
7323 |
++// |
7324 |
++//===----------------------------------------------------------------------===// |
7325 |
++ |
7326 |
++#include "AMDGPU.h" |
7327 |
++#include "SIInstrInfo.h" |
7328 |
++#include "SIMachineFunctionInfo.h" |
7329 |
++#include "llvm/CodeGen/MachineFunction.h" |
7330 |
++#include "llvm/CodeGen/MachineFunctionPass.h" |
7331 |
++#include "llvm/CodeGen/MachineInstrBuilder.h" |
7332 |
++#include "llvm/CodeGen/MachineRegisterInfo.h" |
7333 |
++ |
7334 |
++using namespace llvm; |
7335 |
++ |
7336 |
++namespace { |
7337 |
++ |
7338 |
++/// \brief One variable for each of the hardware counters |
7339 |
++typedef union { |
7340 |
++ struct { |
7341 |
++ unsigned VM; |
7342 |
++ unsigned EXP; |
7343 |
++ unsigned LGKM; |
7344 |
++ } Named; |
7345 |
++ unsigned Array[3]; |
7346 |
++ |
7347 |
++} Counters; |
7348 |
++ |
7349 |
++typedef Counters RegCounters[512]; |
7350 |
++typedef std::pair<unsigned, unsigned> RegInterval; |
7351 |
++ |
7352 |
++class SIInsertWaits : public MachineFunctionPass { |
7353 |
++ |
7354 |
++private: |
7355 |
++ static char ID; |
7356 |
++ const SIInstrInfo *TII; |
7357 |
++ const SIRegisterInfo &TRI; |
7358 |
++ const MachineRegisterInfo *MRI; |
7359 |
++ |
7360 |
++ /// \brief Constant hardware limits |
7361 |
++ static const Counters WaitCounts; |
7362 |
++ |
7363 |
++ /// \brief Constant zero value |
7364 |
++ static const Counters ZeroCounts; |
7365 |
++ |
7366 |
++ /// \brief Counter values we have already waited on. |
7367 |
++ Counters WaitedOn; |
7368 |
++ |
7369 |
++ /// \brief Counter values for last instruction issued. |
7370 |
++ Counters LastIssued; |
7371 |
++ |
7372 |
++ /// \brief Registers used by async instructions. |
7373 |
++ RegCounters UsedRegs; |
7374 |
++ |
7375 |
++ /// \brief Registers defined by async instructions. |
7376 |
++ RegCounters DefinedRegs; |
7377 |
++ |
7378 |
++ /// \brief Different export instruction types seen since last wait. |
7379 |
++ unsigned ExpInstrTypesSeen; |
7380 |
++ |
7381 |
++ /// \brief Get increment/decrement amount for this instruction. |
7382 |
++ Counters getHwCounts(MachineInstr &MI); |
7383 |
++ |
7384 |
++ /// \brief Is operand relevant for async execution? |
7385 |
++ bool isOpRelevant(MachineOperand &Op); |
7386 |
++ |
7387 |
++ /// \brief Get register interval an operand affects. |
7388 |
++ RegInterval getRegInterval(MachineOperand &Op); |
7389 |
++ |
7390 |
++ /// \brief Handle an instruction's async components |
7391 |
++ void pushInstruction(MachineInstr &MI); |
7392 |
++ |
7393 |
++ /// \brief Insert the actual wait instruction |
7394 |
++ bool insertWait(MachineBasicBlock &MBB, |
7395 |
++ MachineBasicBlock::iterator I, |
7396 |
++ const Counters &Counts); |
7397 |
++ |
7398 |
++ /// \brief Resolve all operand dependencies to counter requirements |
7399 |
++ Counters handleOperands(MachineInstr &MI); |
7400 |
++ |
7401 |
++public: |
7402 |
++ SIInsertWaits(TargetMachine &tm) : |
7403 |
++ MachineFunctionPass(ID), |
7404 |
++ TII(static_cast<const SIInstrInfo*>(tm.getInstrInfo())), |
7405 |
++ TRI(TII->getRegisterInfo()) { } |
7406 |
++ |
7407 |
++ virtual bool runOnMachineFunction(MachineFunction &MF); |
7408 |
++ |
7409 |
++ const char *getPassName() const { |
7410 |
++ return "SI insert wait instructions"; |
7411 |
++ } |
7412 |
++ |
7413 |
++}; |
7414 |
++ |
7415 |
++} // End anonymous namespace |
7416 |
++ |
7417 |
++char SIInsertWaits::ID = 0; |
7418 |
++ |
7419 |
++const Counters SIInsertWaits::WaitCounts = { { 15, 7, 7 } }; |
7420 |
++const Counters SIInsertWaits::ZeroCounts = { { 0, 0, 0 } }; |
7421 |
++ |
7422 |
++FunctionPass *llvm::createSIInsertWaits(TargetMachine &tm) { |
7423 |
++ return new SIInsertWaits(tm); |
7424 |
++} |
7425 |
++ |
7426 |
++Counters SIInsertWaits::getHwCounts(MachineInstr &MI) { |
7427 |
++ |
7428 |
++ uint64_t TSFlags = TII->get(MI.getOpcode()).TSFlags; |
7429 |
++ Counters Result; |
7430 |
++ |
7431 |
++ Result.Named.VM = !!(TSFlags & SIInstrFlags::VM_CNT); |
7432 |
++ |
7433 |
++ // Only consider stores or EXP for EXP_CNT |
7434 |
++ Result.Named.EXP = !!(TSFlags & SIInstrFlags::EXP_CNT && |
7435 |
++ (MI.getOpcode() == AMDGPU::EXP || MI.getDesc().mayStore())); |
7436 |
++ |
7437 |
++ // LGKM may use larger values |
7438 |
++ if (TSFlags & SIInstrFlags::LGKM_CNT) { |
7439 |
++ |
7440 |
++ MachineOperand &Op = MI.getOperand(0); |
7441 |
++ assert(Op.isReg() && "First LGKM operand must be a register!"); |
7442 |
++ |
7443 |
++ unsigned Reg = Op.getReg(); |
7444 |
++ unsigned Size = TRI.getMinimalPhysRegClass(Reg)->getSize(); |
7445 |
++ Result.Named.LGKM = Size > 4 ? 2 : 1; |
7446 |
++ |
7447 |
++ } else { |
7448 |
++ Result.Named.LGKM = 0; |
7449 |
++ } |
7450 |
++ |
7451 |
++ return Result; |
7452 |
++} |
7453 |
++ |
7454 |
++bool SIInsertWaits::isOpRelevant(MachineOperand &Op) { |
7455 |
++ |
7456 |
++ // Constants are always irrelevant |
7457 |
++ if (!Op.isReg()) |
7458 |
++ return false; |
7459 |
++ |
7460 |
++ // Defines are always relevant |
7461 |
++ if (Op.isDef()) |
7462 |
++ return true; |
7463 |
++ |
7464 |
++ // For exports all registers are relevant |
7465 |
++ MachineInstr &MI = *Op.getParent(); |
7466 |
++ if (MI.getOpcode() == AMDGPU::EXP) |
7467 |
++ return true; |
7468 |
++ |
7469 |
++ // For stores the stored value is also relevant |
7470 |
++ if (!MI.getDesc().mayStore()) |
7471 |
++ return false; |
7472 |
++ |
7473 |
++ for (MachineInstr::mop_iterator I = MI.operands_begin(), |
7474 |
++ E = MI.operands_end(); I != E; ++I) { |
7475 |
++ |
7476 |
++ if (I->isReg() && I->isUse()) |
7477 |
++ return Op.isIdenticalTo(*I); |
7478 |
++ } |
7479 |
++ |
7480 |
++ return false; |
7481 |
++} |
7482 |
++ |
7483 |
++RegInterval SIInsertWaits::getRegInterval(MachineOperand &Op) { |
7484 |
++ |
7485 |
++ if (!Op.isReg()) |
7486 |
++ return std::make_pair(0, 0); |
7487 |
++ |
7488 |
++ unsigned Reg = Op.getReg(); |
7489 |
++ unsigned Size = TRI.getMinimalPhysRegClass(Reg)->getSize(); |
7490 |
++ |
7491 |
++ assert(Size >= 4); |
7492 |
++ |
7493 |
++ RegInterval Result; |
7494 |
++ Result.first = TRI.getEncodingValue(Reg); |
7495 |
++ Result.second = Result.first + Size / 4; |
7496 |
++ |
7497 |
++ return Result; |
7498 |
++} |
7499 |
++ |
7500 |
++void SIInsertWaits::pushInstruction(MachineInstr &MI) { |
7501 |
++ |
7502 |
++ // Get the hardware counter increments and sum them up |
7503 |
++ Counters Increment = getHwCounts(MI); |
7504 |
++ unsigned Sum = 0; |
7505 |
++ |
7506 |
++ for (unsigned i = 0; i < 3; ++i) { |
7507 |
++ LastIssued.Array[i] += Increment.Array[i]; |
7508 |
++ Sum += Increment.Array[i]; |
7509 |
++ } |
7510 |
++ |
7511 |
++ // If we don't increase anything then that's it |
7512 |
++ if (Sum == 0) |
7513 |
++ return; |
7514 |
++ |
7515 |
++ // Remember which export instructions we have seen |
7516 |
++ if (Increment.Named.EXP) { |
7517 |
++ ExpInstrTypesSeen |= MI.getOpcode() == AMDGPU::EXP ? 1 : 2; |
7518 |
++ } |
7519 |
++ |
7520 |
++ for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) { |
7521 |
++ |
7522 |
++ MachineOperand &Op = MI.getOperand(i); |
7523 |
++ if (!isOpRelevant(Op)) |
7524 |
++ continue; |
7525 |
++ |
7526 |
++ RegInterval Interval = getRegInterval(Op); |
7527 |
++ for (unsigned j = Interval.first; j < Interval.second; ++j) { |
7528 |
++ |
7529 |
++ // Remember which registers we define |
7530 |
++ if (Op.isDef()) |
7531 |
++ DefinedRegs[j] = LastIssued; |
7532 |
++ |
7533 |
++ // and which ones we are using |
7534 |
++ if (Op.isUse()) |
7535 |
++ UsedRegs[j] = LastIssued; |
7536 |
++ } |
7537 |
++ } |
7538 |
++} |
7539 |
++ |
7540 |
++bool SIInsertWaits::insertWait(MachineBasicBlock &MBB, |
7541 |
++ MachineBasicBlock::iterator I, |
7542 |
++ const Counters &Required) { |
7543 |
++ |
7544 |
++ // End of program? No need to wait on anything |
7545 |
++ if (I != MBB.end() && I->getOpcode() == AMDGPU::S_ENDPGM) |
7546 |
++ return false; |
7547 |
++ |
7548 |
++ // Figure out if the async instructions execute in order |
7549 |
++ bool Ordered[3]; |
7550 |
++ |
7551 |
++ // VM_CNT is always ordered |
7552 |
++ Ordered[0] = true; |
7553 |
++ |
7554 |
++ // EXP_CNT is unordered if we have both EXP & VM-writes |
7555 |
++ Ordered[1] = ExpInstrTypesSeen == 3; |
7556 |
++ |
7557 |
++ // LGKM_CNT is handled as always unordered. TODO: Handle LDS and GDS |
7558 |
++ Ordered[2] = false; |
7559 |
++ |
7560 |
++ // The values we are going to put into the S_WAITCNT instruction |
7561 |
++ Counters Counts = WaitCounts; |
7562 |
++ |
7563 |
++ // Do we really need to wait? |
7564 |
++ bool NeedWait = false; |
7565 |
++ |
7566 |
++ for (unsigned i = 0; i < 3; ++i) { |
7567 |
++ |
7568 |
++ if (Required.Array[i] <= WaitedOn.Array[i]) |
7569 |
++ continue; |
7570 |
++ |
7571 |
++ NeedWait = true; |
7572 |
++ |
7573 |
++ if (Ordered[i]) { |
7574 |
++ unsigned Value = LastIssued.Array[i] - Required.Array[i]; |
7575 |
++ |
7576 |
++ // adjust the value to the real hardware possibilities |
7577 |
++ Counts.Array[i] = std::min(Value, WaitCounts.Array[i]); |
7578 |
++ |
7579 |
++ } else |
7580 |
++ Counts.Array[i] = 0; |
7581 |
++ |
7582 |
++ // Remember what we have waited on |
7583 |
++ WaitedOn.Array[i] = LastIssued.Array[i] - Counts.Array[i]; |
7584 |
++ } |
7585 |
++ |
7586 |
++ if (!NeedWait) |
7587 |
++ return false; |
7588 |
++ |
7589 |
++ // Reset EXP_CNT instruction types |
7590 |
++ if (Counts.Named.EXP == 0) |
7591 |
++ ExpInstrTypesSeen = 0; |
7592 |
++ |
7593 |
++ // Build the wait instruction |
7594 |
++ BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)) |
7595 |
++ .addImm((Counts.Named.VM & 0xF) | |
7596 |
++ ((Counts.Named.EXP & 0x7) << 4) | |
7597 |
++ ((Counts.Named.LGKM & 0x7) << 8)); |
7598 |
++ |
7599 |
++ return true; |
7600 |
++} |
7601 |
++ |
7602 |
++/// \brief helper function for handleOperands |
7603 |
++static void increaseCounters(Counters &Dst, const Counters &Src) { |
7604 |
++ |
7605 |
++ for (unsigned i = 0; i < 3; ++i) |
7606 |
++ Dst.Array[i] = std::max(Dst.Array[i], Src.Array[i]); |
7607 |
++} |
7608 |
++ |
7609 |
++Counters SIInsertWaits::handleOperands(MachineInstr &MI) { |
7610 |
++ |
7611 |
++ Counters Result = ZeroCounts; |
7612 |
++ |
7613 |
++ // For each register affected by this |
7614 |
++ // instruction increase the result sequence |
7615 |
++ for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) { |
7616 |
++ |
7617 |
++ MachineOperand &Op = MI.getOperand(i); |
7618 |
++ RegInterval Interval = getRegInterval(Op); |
7619 |
++ for (unsigned j = Interval.first; j < Interval.second; ++j) { |
7620 |
++ |
7621 |
++ if (Op.isDef()) |
7622 |
++ increaseCounters(Result, UsedRegs[j]); |
7623 |
++ |
7624 |
++ if (Op.isUse()) |
7625 |
++ increaseCounters(Result, DefinedRegs[j]); |
7626 |
++ } |
7627 |
++ } |
7628 |
++ |
7629 |
++ return Result; |
7630 |
++} |
7631 |
++ |
7632 |
++bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) { |
7633 |
++ |
7634 |
++ bool Changes = false; |
7635 |
++ |
7636 |
++ MRI = &MF.getRegInfo(); |
7637 |
++ |
7638 |
++ WaitedOn = ZeroCounts; |
7639 |
++ LastIssued = ZeroCounts; |
7640 |
++ |
7641 |
++ memset(&UsedRegs, 0, sizeof(UsedRegs)); |
7642 |
++ memset(&DefinedRegs, 0, sizeof(DefinedRegs)); |
7643 |
++ |
7644 |
++ for (MachineFunction::iterator BI = MF.begin(), BE = MF.end(); |
7645 |
++ BI != BE; ++BI) { |
7646 |
++ |
7647 |
++ MachineBasicBlock &MBB = *BI; |
7648 |
++ for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end(); |
7649 |
++ I != E; ++I) { |
7650 |
++ |
7651 |
++ Changes |= insertWait(MBB, I, handleOperands(*I)); |
7652 |
++ pushInstruction(*I); |
7653 |
++ } |
7654 |
++ |
7655 |
++ // Wait for everything at the end of the MBB |
7656 |
++ Changes |= insertWait(MBB, MBB.getFirstTerminator(), LastIssued); |
7657 |
++ } |
7658 |
++ |
7659 |
++ return Changes; |
7660 |
++} |
7661 |
diff --git a/lib/Target/R600/SIInstrFormats.td b/lib/Target/R600/SIInstrFormats.td |
7662 |
new file mode 100644 |
7663 |
-index 0000000..aea3b5a |
7664 |
+index 0000000..40e37aa |
7665 |
--- /dev/null |
7666 |
+++ b/lib/Target/R600/SIInstrFormats.td |
7667 |
-@@ -0,0 +1,146 @@ |
7668 |
+@@ -0,0 +1,188 @@ |
7669 |
+//===-- SIInstrFormats.td - SI Instruction Formats ------------------------===// |
7670 |
+// |
7671 |
+// The LLVM Compiler Infrastructure |
7672 |
@@ -18666,40 +20237,23 @@ index 0000000..aea3b5a |
7673 |
+// |
7674 |
+//===----------------------------------------------------------------------===// |
7675 |
+ |
7676 |
-+class VOP3b_2IN <bits<9> op, string opName, RegisterClass dstClass, |
7677 |
-+ RegisterClass src0Class, RegisterClass src1Class, |
7678 |
-+ list<dag> pattern> |
7679 |
-+ : VOP3b <op, (outs dstClass:$vdst), |
7680 |
-+ (ins src0Class:$src0, src1Class:$src1, InstFlag:$src2, InstFlag:$sdst, |
7681 |
-+ InstFlag:$omod, InstFlag:$neg), |
7682 |
-+ opName, pattern |
7683 |
-+>; |
7684 |
-+ |
7685 |
-+ |
7686 |
-+class VOP3_1_32 <bits<9> op, string opName, list<dag> pattern> |
7687 |
-+ : VOP3b_2IN <op, opName, SReg_1, AllReg_32, VReg_32, pattern>; |
7688 |
-+ |
7689 |
+class VOP3_32 <bits<9> op, string opName, list<dag> pattern> |
7690 |
-+ : VOP3 <op, (outs VReg_32:$dst), (ins AllReg_32:$src0, VReg_32:$src1, VReg_32:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>; |
7691 |
++ : VOP3 <op, (outs VReg_32:$dst), (ins VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>; |
7692 |
+ |
7693 |
+class VOP3_64 <bits<9> op, string opName, list<dag> pattern> |
7694 |
-+ : VOP3 <op, (outs VReg_64:$dst), (ins AllReg_64:$src0, VReg_64:$src1, VReg_64:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>; |
7695 |
-+ |
7696 |
++ : VOP3 <op, (outs VReg_64:$dst), (ins VSrc_64:$src0, VReg_64:$src1, VReg_64:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>; |
7697 |
+ |
7698 |
+class SOP1_32 <bits<8> op, string opName, list<dag> pattern> |
7699 |
-+ : SOP1 <op, (outs SReg_32:$dst), (ins SReg_32:$src0), opName, pattern>; |
7700 |
++ : SOP1 <op, (outs SReg_32:$dst), (ins SSrc_32:$src0), opName, pattern>; |
7701 |
+ |
7702 |
+class SOP1_64 <bits<8> op, string opName, list<dag> pattern> |
7703 |
-+ : SOP1 <op, (outs SReg_64:$dst), (ins SReg_64:$src0), opName, pattern>; |
7704 |
++ : SOP1 <op, (outs SReg_64:$dst), (ins SSrc_64:$src0), opName, pattern>; |
7705 |
+ |
7706 |
+class SOP2_32 <bits<7> op, string opName, list<dag> pattern> |
7707 |
-+ : SOP2 <op, (outs SReg_32:$dst), (ins SReg_32:$src0, SReg_32:$src1), opName, pattern>; |
7708 |
++ : SOP2 <op, (outs SReg_32:$dst), (ins SSrc_32:$src0, SSrc_32:$src1), opName, pattern>; |
7709 |
+ |
7710 |
+class SOP2_64 <bits<7> op, string opName, list<dag> pattern> |
7711 |
-+ : SOP2 <op, (outs SReg_64:$dst), (ins SReg_64:$src0, SReg_64:$src1), opName, pattern>; |
7712 |
-+ |
7713 |
-+class SOP2_VCC <bits<7> op, string opName, list<dag> pattern> |
7714 |
-+ : SOP2 <op, (outs SReg_1:$vcc), (ins SReg_64:$src0, SReg_64:$src1), opName, pattern>; |
7715 |
++ : SOP2 <op, (outs SReg_64:$dst), (ins SSrc_64:$src0, SSrc_64:$src1), opName, pattern>; |
7716 |
+ |
7717 |
+class VOP1_Helper <bits<8> op, RegisterClass vrc, RegisterClass arc, |
7718 |
+ string opName, list<dag> pattern> : |
7719 |
@@ -18708,7 +20262,7 @@ index 0000000..aea3b5a |
7720 |
+ >; |
7721 |
+ |
7722 |
+multiclass VOP1_32 <bits<8> op, string opName, list<dag> pattern> { |
7723 |
-+ def _e32: VOP1_Helper <op, VReg_32, AllReg_32, opName, pattern>; |
7724 |
++ def _e32: VOP1_Helper <op, VReg_32, VSrc_32, opName, pattern>; |
7725 |
+ def _e64 : VOP3_32 <{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, |
7726 |
+ opName, [] |
7727 |
+ >; |
7728 |
@@ -18716,7 +20270,7 @@ index 0000000..aea3b5a |
7729 |
+ |
7730 |
+multiclass VOP1_64 <bits<8> op, string opName, list<dag> pattern> { |
7731 |
+ |
7732 |
-+ def _e32 : VOP1_Helper <op, VReg_64, AllReg_64, opName, pattern>; |
7733 |
++ def _e32 : VOP1_Helper <op, VReg_64, VSrc_64, opName, pattern>; |
7734 |
+ |
7735 |
+ def _e64 : VOP3_64 < |
7736 |
+ {1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, |
7737 |
@@ -18732,7 +20286,7 @@ index 0000000..aea3b5a |
7738 |
+ |
7739 |
+multiclass VOP2_32 <bits<6> op, string opName, list<dag> pattern> { |
7740 |
+ |
7741 |
-+ def _e32 : VOP2_Helper <op, VReg_32, AllReg_32, opName, pattern>; |
7742 |
++ def _e32 : VOP2_Helper <op, VReg_32, VSrc_32, opName, pattern>; |
7743 |
+ |
7744 |
+ def _e64 : VOP3_32 <{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, |
7745 |
+ opName, [] |
7746 |
@@ -18740,7 +20294,7 @@ index 0000000..aea3b5a |
7747 |
+} |
7748 |
+ |
7749 |
+multiclass VOP2_64 <bits<6> op, string opName, list<dag> pattern> { |
7750 |
-+ def _e32: VOP2_Helper <op, VReg_64, AllReg_64, opName, pattern>; |
7751 |
++ def _e32: VOP2_Helper <op, VReg_64, VSrc_64, opName, pattern>; |
7752 |
+ |
7753 |
+ def _e64 : VOP3_64 < |
7754 |
+ {1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, |
7755 |
@@ -18754,47 +20308,106 @@ index 0000000..aea3b5a |
7756 |
+class SOPK_64 <bits<5> op, string opName, list<dag> pattern> |
7757 |
+ : SOPK <op, (outs SReg_64:$dst), (ins i16imm:$src0), opName, pattern>; |
7758 |
+ |
7759 |
-+class VOPC_Helper <bits<8> op, RegisterClass vrc, RegisterClass arc, |
7760 |
-+ string opName, list<dag> pattern> : |
7761 |
-+ VOPC < |
7762 |
-+ op, (ins arc:$src0, vrc:$src1), opName, pattern |
7763 |
-+ >; |
7764 |
++multiclass VOPC_Helper <bits<8> op, RegisterClass vrc, RegisterClass arc, |
7765 |
++ string opName, list<dag> pattern> { |
7766 |
+ |
7767 |
-+multiclass VOPC_32 <bits<9> op, string opName, list<dag> pattern> { |
7768 |
++ def _e32 : VOPC <op, (ins arc:$src0, vrc:$src1), opName, pattern>; |
7769 |
++ def _e64 : VOP3 < |
7770 |
++ {0, op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, |
7771 |
++ (outs SReg_64:$dst), |
7772 |
++ (ins arc:$src0, vrc:$src1, |
7773 |
++ InstFlag:$abs, InstFlag:$clamp, |
7774 |
++ InstFlag:$omod, InstFlag:$neg), |
7775 |
++ opName, pattern |
7776 |
++ > { |
7777 |
++ let SRC2 = 0x80; |
7778 |
++ } |
7779 |
++} |
7780 |
+ |
7781 |
-+ def _e32 : VOPC_Helper < |
7782 |
-+ {op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, |
7783 |
-+ VReg_32, AllReg_32, opName, pattern |
7784 |
-+ >; |
7785 |
++multiclass VOPC_32 <bits<8> op, string opName, list<dag> pattern> |
7786 |
++ : VOPC_Helper <op, VReg_32, VSrc_32, opName, pattern>; |
7787 |
+ |
7788 |
-+ def _e64 : VOP3_1_32 < |
7789 |
-+ op, |
7790 |
-+ opName, pattern |
7791 |
-+ >; |
7792 |
++multiclass VOPC_64 <bits<8> op, string opName, list<dag> pattern> |
7793 |
++ : VOPC_Helper <op, VReg_64, VSrc_64, opName, pattern>; |
7794 |
++ |
7795 |
++class SOPC_32 <bits<7> op, string opName, list<dag> pattern> |
7796 |
++ : SOPC <op, (outs SCCReg:$dst), (ins SSrc_32:$src0, SSrc_32:$src1), opName, pattern>; |
7797 |
++ |
7798 |
++class SOPC_64 <bits<7> op, string opName, list<dag> pattern> |
7799 |
++ : SOPC <op, (outs SCCReg:$dst), (ins SSrc_64:$src0, SSrc_64:$src1), opName, pattern>; |
7800 |
++ |
7801 |
++class MIMG_Load_Helper <bits<7> op, string asm> : MIMG < |
7802 |
++ op, |
7803 |
++ (outs VReg_128:$vdata), |
7804 |
++ (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128, |
7805 |
++ i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_32:$vaddr, |
7806 |
++ GPR4Align<SReg_256>:$srsrc, GPR4Align<SReg_128>:$ssamp), |
7807 |
++ asm, |
7808 |
++ []> { |
7809 |
++ let mayLoad = 1; |
7810 |
++ let mayStore = 0; |
7811 |
+} |
7812 |
+ |
7813 |
-+multiclass VOPC_64 <bits<8> op, string opName, list<dag> pattern> { |
7814 |
++class MTBUF_Store_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF < |
7815 |
++ op, |
7816 |
++ (outs), |
7817 |
++ (ins regClass:$vdata, i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, |
7818 |
++ i1imm:$addr64, i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, |
7819 |
++ GPR4Align<SReg_128>:$srsrc, i1imm:$slc, i1imm:$tfe, SSrc_32:$soffset), |
7820 |
++ asm, |
7821 |
++ []> { |
7822 |
++ let mayStore = 1; |
7823 |
++ let mayLoad = 0; |
7824 |
++} |
7825 |
+ |
7826 |
-+ def _e32 : VOPC_Helper <op, VReg_64, AllReg_64, opName, pattern>; |
7827 |
++class MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass> : MUBUF < |
7828 |
++ op, |
7829 |
++ (outs regClass:$dst), |
7830 |
++ (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64, |
7831 |
++ i1imm:$lds, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc, i1imm:$slc, |
7832 |
++ i1imm:$tfe, SSrc_32:$soffset), |
7833 |
++ asm, |
7834 |
++ []> { |
7835 |
++ let mayLoad = 1; |
7836 |
++ let mayStore = 0; |
7837 |
++} |
7838 |
+ |
7839 |
-+ def _e64 : VOP3_64 < |
7840 |
-+ {0, op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}}, |
7841 |
-+ opName, [] |
7842 |
-+ >; |
7843 |
++class MTBUF_Load_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF < |
7844 |
++ op, |
7845 |
++ (outs regClass:$dst), |
7846 |
++ (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64, |
7847 |
++ i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc, |
7848 |
++ i1imm:$slc, i1imm:$tfe, SSrc_32:$soffset), |
7849 |
++ asm, |
7850 |
++ []> { |
7851 |
++ let mayLoad = 1; |
7852 |
++ let mayStore = 0; |
7853 |
+} |
7854 |
+ |
7855 |
-+class SOPC_32 <bits<7> op, string opName, list<dag> pattern> |
7856 |
-+ : SOPC <op, (outs SCCReg:$dst), (ins SReg_32:$src0, SReg_32:$src1), opName, pattern>; |
7857 |
++multiclass SMRD_Helper <bits<5> op, string asm, RegisterClass dstClass> { |
7858 |
++ def _IMM : SMRD < |
7859 |
++ op, 1, |
7860 |
++ (outs dstClass:$dst), |
7861 |
++ (ins GPR2Align<SReg_64>:$sbase, i32imm:$offset), |
7862 |
++ asm, |
7863 |
++ [] |
7864 |
++ >; |
7865 |
+ |
7866 |
-+class SOPC_64 <bits<7> op, string opName, list<dag> pattern> |
7867 |
-+ : SOPC <op, (outs SCCReg:$dst), (ins SReg_64:$src0, SReg_64:$src1), opName, pattern>; |
7868 |
++ def _SGPR : SMRD < |
7869 |
++ op, 0, |
7870 |
++ (outs dstClass:$dst), |
7871 |
++ (ins GPR2Align<SReg_64>:$sbase, SReg_32:$soff), |
7872 |
++ asm, |
7873 |
++ [] |
7874 |
++ >; |
7875 |
++} |
7876 |
+ |
7877 |
diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp |
7878 |
new file mode 100644 |
7879 |
-index 0000000..adcffa8 |
7880 |
+index 0000000..1c4b3cf |
7881 |
--- /dev/null |
7882 |
+++ b/lib/Target/R600/SIInstrInfo.cpp |
7883 |
-@@ -0,0 +1,90 @@ |
7884 |
+@@ -0,0 +1,143 @@ |
7885 |
+//===-- SIInstrInfo.cpp - SI Instruction Information ---------------------===// |
7886 |
+// |
7887 |
+// The LLVM Compiler Infrastructure |
7888 |
@@ -18839,7 +20452,15 @@ index 0000000..adcffa8 |
7889 |
+ // never be necessary. |
7890 |
+ assert(DestReg != AMDGPU::SCC && SrcReg != AMDGPU::SCC); |
7891 |
+ |
7892 |
-+ if (AMDGPU::SReg_64RegClass.contains(DestReg)) { |
7893 |
++ if (AMDGPU::VReg_64RegClass.contains(DestReg)) { |
7894 |
++ assert(AMDGPU::VReg_64RegClass.contains(SrcReg) || |
7895 |
++ AMDGPU::SReg_64RegClass.contains(SrcReg)); |
7896 |
++ BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), RI.getSubReg(DestReg, AMDGPU::sub0)) |
7897 |
++ .addReg(RI.getSubReg(SrcReg, AMDGPU::sub0), getKillRegState(KillSrc)) |
7898 |
++ .addReg(DestReg, RegState::Define | RegState::Implicit); |
7899 |
++ BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), RI.getSubReg(DestReg, AMDGPU::sub1)) |
7900 |
++ .addReg(RI.getSubReg(SrcReg, AMDGPU::sub1), getKillRegState(KillSrc)); |
7901 |
++ } else if (AMDGPU::SReg_64RegClass.contains(DestReg)) { |
7902 |
+ assert(AMDGPU::SReg_64RegClass.contains(SrcReg)); |
7903 |
+ BuildMI(MBB, MI, DL, get(AMDGPU::S_MOV_B64), DestReg) |
7904 |
+ .addReg(SrcReg, getKillRegState(KillSrc)); |
7905 |
@@ -18858,8 +20479,8 @@ index 0000000..adcffa8 |
7906 |
+ |
7907 |
+MachineInstr * SIInstrInfo::getMovImmInstr(MachineFunction *MF, unsigned DstReg, |
7908 |
+ int64_t Imm) const { |
7909 |
-+ MachineInstr * MI = MF->CreateMachineInstr(get(AMDGPU::V_MOV_IMM_I32), DebugLoc()); |
7910 |
-+ MachineInstrBuilder MIB(*MF, MI); |
7911 |
++ MachineInstr * MI = MF->CreateMachineInstr(get(AMDGPU::V_MOV_B32_e32), DebugLoc()); |
7912 |
++ MachineInstrBuilder MIB(MI); |
7913 |
+ MIB.addReg(DstReg, RegState::Define); |
7914 |
+ MIB.addImm(Imm); |
7915 |
+ |
7916 |
@@ -18874,9 +20495,6 @@ index 0000000..adcffa8 |
7917 |
+ case AMDGPU::S_MOV_B64: |
7918 |
+ case AMDGPU::V_MOV_B32_e32: |
7919 |
+ case AMDGPU::V_MOV_B32_e64: |
7920 |
-+ case AMDGPU::V_MOV_IMM_F32: |
7921 |
-+ case AMDGPU::V_MOV_IMM_I32: |
7922 |
-+ case AMDGPU::S_MOV_IMM_I32: |
7923 |
+ return true; |
7924 |
+ } |
7925 |
+} |
7926 |
@@ -18885,12 +20503,60 @@ index 0000000..adcffa8 |
7927 |
+SIInstrInfo::isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const { |
7928 |
+ return RC != &AMDGPU::EXECRegRegClass; |
7929 |
+} |
7930 |
++ |
7931 |
++//===----------------------------------------------------------------------===// |
7932 |
++// Indirect addressing callbacks |
7933 |
++//===----------------------------------------------------------------------===// |
7934 |
++ |
7935 |
++unsigned SIInstrInfo::calculateIndirectAddress(unsigned RegIndex, |
7936 |
++ unsigned Channel) const { |
7937 |
++ assert(Channel == 0); |
7938 |
++ return RegIndex; |
7939 |
++} |
7940 |
++ |
7941 |
++ |
7942 |
++int SIInstrInfo::getIndirectIndexBegin(const MachineFunction &MF) const { |
7943 |
++ llvm_unreachable("Unimplemented"); |
7944 |
++} |
7945 |
++ |
7946 |
++int SIInstrInfo::getIndirectIndexEnd(const MachineFunction &MF) const { |
7947 |
++ llvm_unreachable("Unimplemented"); |
7948 |
++} |
7949 |
++ |
7950 |
++const TargetRegisterClass *SIInstrInfo::getIndirectAddrStoreRegClass( |
7951 |
++ unsigned SourceReg) const { |
7952 |
++ llvm_unreachable("Unimplemented"); |
7953 |
++} |
7954 |
++ |
7955 |
++const TargetRegisterClass *SIInstrInfo::getIndirectAddrLoadRegClass() const { |
7956 |
++ llvm_unreachable("Unimplemented"); |
7957 |
++} |
7958 |
++ |
7959 |
++MachineInstrBuilder SIInstrInfo::buildIndirectWrite( |
7960 |
++ MachineBasicBlock *MBB, |
7961 |
++ MachineBasicBlock::iterator I, |
7962 |
++ unsigned ValueReg, |
7963 |
++ unsigned Address, unsigned OffsetReg) const { |
7964 |
++ llvm_unreachable("Unimplemented"); |
7965 |
++} |
7966 |
++ |
7967 |
++MachineInstrBuilder SIInstrInfo::buildIndirectRead( |
7968 |
++ MachineBasicBlock *MBB, |
7969 |
++ MachineBasicBlock::iterator I, |
7970 |
++ unsigned ValueReg, |
7971 |
++ unsigned Address, unsigned OffsetReg) const { |
7972 |
++ llvm_unreachable("Unimplemented"); |
7973 |
++} |
7974 |
++ |
7975 |
++const TargetRegisterClass *SIInstrInfo::getSuperIndirectRegClass() const { |
7976 |
++ llvm_unreachable("Unimplemented"); |
7977 |
++} |
7978 |
diff --git a/lib/Target/R600/SIInstrInfo.h b/lib/Target/R600/SIInstrInfo.h |
7979 |
new file mode 100644 |
7980 |
-index 0000000..631f6c0 |
7981 |
+index 0000000..a65f7b6 |
7982 |
--- /dev/null |
7983 |
+++ b/lib/Target/R600/SIInstrInfo.h |
7984 |
-@@ -0,0 +1,62 @@ |
7985 |
+@@ -0,0 +1,84 @@ |
7986 |
+//===-- SIInstrInfo.h - SI Instruction Info Interface ---------------------===// |
7987 |
+// |
7988 |
+// The LLVM Compiler Infrastructure |
7989 |
@@ -18928,12 +20594,6 @@ index 0000000..631f6c0 |
7990 |
+ unsigned DestReg, unsigned SrcReg, |
7991 |
+ bool KillSrc) const; |
7992 |
+ |
7993 |
-+ /// \returns the encoding type of this instruction. |
7994 |
-+ unsigned getEncodingType(const MachineInstr &MI) const; |
7995 |
-+ |
7996 |
-+ /// \returns the size of this instructions encoding in number of bytes. |
7997 |
-+ unsigned getEncodingBytes(const MachineInstr &MI) const; |
7998 |
-+ |
7999 |
+ virtual MachineInstr * getMovImmInstr(MachineFunction *MF, unsigned DstReg, |
8000 |
+ int64_t Imm) const; |
8001 |
+ |
8002 |
@@ -18941,6 +20601,32 @@ index 0000000..631f6c0 |
8003 |
+ virtual bool isMov(unsigned Opcode) const; |
8004 |
+ |
8005 |
+ virtual bool isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const; |
8006 |
++ |
8007 |
++ virtual int getIndirectIndexBegin(const MachineFunction &MF) const; |
8008 |
++ |
8009 |
++ virtual int getIndirectIndexEnd(const MachineFunction &MF) const; |
8010 |
++ |
8011 |
++ virtual unsigned calculateIndirectAddress(unsigned RegIndex, |
8012 |
++ unsigned Channel) const; |
8013 |
++ |
8014 |
++ virtual const TargetRegisterClass *getIndirectAddrStoreRegClass( |
8015 |
++ unsigned SourceReg) const; |
8016 |
++ |
8017 |
++ virtual const TargetRegisterClass *getIndirectAddrLoadRegClass() const; |
8018 |
++ |
8019 |
++ virtual MachineInstrBuilder buildIndirectWrite(MachineBasicBlock *MBB, |
8020 |
++ MachineBasicBlock::iterator I, |
8021 |
++ unsigned ValueReg, |
8022 |
++ unsigned Address, |
8023 |
++ unsigned OffsetReg) const; |
8024 |
++ |
8025 |
++ virtual MachineInstrBuilder buildIndirectRead(MachineBasicBlock *MBB, |
8026 |
++ MachineBasicBlock::iterator I, |
8027 |
++ unsigned ValueReg, |
8028 |
++ unsigned Address, |
8029 |
++ unsigned OffsetReg) const; |
8030 |
++ |
8031 |
++ virtual const TargetRegisterClass *getSuperIndirectRegClass() const; |
8032 |
+ }; |
8033 |
+ |
8034 |
+} // End namespace llvm |
8035 |
@@ -18948,17 +20634,19 @@ index 0000000..631f6c0 |
8036 |
+namespace SIInstrFlags { |
8037 |
+ enum Flags { |
8038 |
+ // First 4 bits are the instruction encoding |
8039 |
-+ NEED_WAIT = 1 << 4 |
8040 |
++ VM_CNT = 1 << 0, |
8041 |
++ EXP_CNT = 1 << 1, |
8042 |
++ LGKM_CNT = 1 << 2 |
8043 |
+ }; |
8044 |
+} |
8045 |
+ |
8046 |
+#endif //SIINSTRINFO_H |
8047 |
diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td |
8048 |
new file mode 100644 |
8049 |
-index 0000000..873a451 |
8050 |
+index 0000000..8c4e5af |
8051 |
--- /dev/null |
8052 |
+++ b/lib/Target/R600/SIInstrInfo.td |
8053 |
-@@ -0,0 +1,589 @@ |
8054 |
+@@ -0,0 +1,465 @@ |
8055 |
+//===-- SIInstrInfo.td - SI Instruction Encodings ---------*- tablegen -*--===// |
8056 |
+// |
8057 |
+// The LLVM Compiler Infrastructure |
8058 |
@@ -18969,57 +20657,66 @@ index 0000000..873a451 |
8059 |
+//===----------------------------------------------------------------------===// |
8060 |
+ |
8061 |
+//===----------------------------------------------------------------------===// |
8062 |
-+// SI DAG Profiles |
8063 |
-+//===----------------------------------------------------------------------===// |
8064 |
-+def SDTVCCBinaryOp : SDTypeProfile<1, 2, [ |
8065 |
-+ SDTCisInt<0>, SDTCisInt<1>, SDTCisSameAs<1, 2> |
8066 |
-+]>; |
8067 |
-+ |
8068 |
-+//===----------------------------------------------------------------------===// |
8069 |
+// SI DAG Nodes |
8070 |
+//===----------------------------------------------------------------------===// |
8071 |
+ |
8072 |
-+// and operation on 64-bit wide vcc |
8073 |
-+def SIsreg1_and : SDNode<"SIISD::VCC_AND", SDTVCCBinaryOp, |
8074 |
-+ [SDNPCommutative, SDNPAssociative] |
8075 |
++// SMRD takes a 64bit memory address and can only add a 32bit offset |
8076 |
++def SIadd64bit32bit : SDNode<"ISD::ADD", |
8077 |
++ SDTypeProfile<1, 2, [SDTCisSameAs<0, 1>, SDTCisVT<0, i64>, SDTCisVT<2, i32>]> |
8078 |
+>; |
8079 |
+ |
8080 |
-+// Special bitcast node for sharing VCC register between VALU and SALU |
8081 |
-+def SIsreg1_bitcast : SDNode<"SIISD::VCC_BITCAST", |
8082 |
-+ SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisInt<1>]> |
8083 |
-+>; |
8084 |
++// Transformation function, extract the lower 32bit of a 64bit immediate |
8085 |
++def LO32 : SDNodeXForm<imm, [{ |
8086 |
++ return CurDAG->getTargetConstant(N->getZExtValue() & 0xffffffff, MVT::i32); |
8087 |
++}]>; |
8088 |
+ |
8089 |
-+// and operation on 64-bit wide vcc |
8090 |
-+def SIvcc_and : SDNode<"SIISD::VCC_AND", SDTVCCBinaryOp, |
8091 |
-+ [SDNPCommutative, SDNPAssociative] |
8092 |
++// Transformation function, extract the upper 32bit of a 64bit immediate |
8093 |
++def HI32 : SDNodeXForm<imm, [{ |
8094 |
++ return CurDAG->getTargetConstant(N->getZExtValue() >> 32, MVT::i32); |
8095 |
++}]>; |
8096 |
++ |
8097 |
++def IMM8bitDWORD : ImmLeaf < |
8098 |
++ i32, [{ |
8099 |
++ return (Imm & ~0x3FC) == 0; |
8100 |
++ }], SDNodeXForm<imm, [{ |
8101 |
++ return CurDAG->getTargetConstant( |
8102 |
++ N->getZExtValue() >> 2, MVT::i32); |
8103 |
++ }]> |
8104 |
+>; |
8105 |
+ |
8106 |
-+// Special bitcast node for sharing VCC register between VALU and SALU |
8107 |
-+def SIvcc_bitcast : SDNode<"SIISD::VCC_BITCAST", |
8108 |
-+ SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisInt<1>]> |
8109 |
++def IMM12bit : ImmLeaf < |
8110 |
++ i16, |
8111 |
++ [{return isUInt<12>(Imm);}] |
8112 |
+>; |
8113 |
+ |
8114 |
++class InlineImm <ValueType vt> : ImmLeaf <vt, [{ |
8115 |
++ return -16 <= Imm && Imm <= 64; |
8116 |
++}]>; |
8117 |
++ |
8118 |
+class InstSI <dag outs, dag ins, string asm, list<dag> pattern> : |
8119 |
+ AMDGPUInst<outs, ins, asm, pattern> { |
8120 |
+ |
8121 |
-+ field bits<4> EncodingType = 0; |
8122 |
-+ field bits<1> NeedWait = 0; |
8123 |
-+ |
8124 |
-+ let TSFlags{3-0} = EncodingType; |
8125 |
-+ let TSFlags{4} = NeedWait; |
8126 |
++ field bits<1> VM_CNT = 0; |
8127 |
++ field bits<1> EXP_CNT = 0; |
8128 |
++ field bits<1> LGKM_CNT = 0; |
8129 |
+ |
8130 |
++ let TSFlags{0} = VM_CNT; |
8131 |
++ let TSFlags{1} = EXP_CNT; |
8132 |
++ let TSFlags{2} = LGKM_CNT; |
8133 |
+} |
8134 |
+ |
8135 |
+class Enc32 <dag outs, dag ins, string asm, list<dag> pattern> : |
8136 |
+ InstSI <outs, ins, asm, pattern> { |
8137 |
+ |
8138 |
+ field bits<32> Inst; |
8139 |
++ let Size = 4; |
8140 |
+} |
8141 |
+ |
8142 |
+class Enc64 <dag outs, dag ins, string asm, list<dag> pattern> : |
8143 |
+ InstSI <outs, ins, asm, pattern> { |
8144 |
+ |
8145 |
+ field bits<64> Inst; |
8146 |
++ let Size = 8; |
8147 |
+} |
8148 |
+ |
8149 |
+class SIOperand <ValueType vt, dag opInfo>: Operand <vt> { |
8150 |
@@ -19027,49 +20724,16 @@ index 0000000..873a451 |
8151 |
+ let MIOperandInfo = opInfo; |
8152 |
+} |
8153 |
+ |
8154 |
-+def IMM16bit : ImmLeaf < |
8155 |
-+ i16, |
8156 |
-+ [{return isInt<16>(Imm);}] |
8157 |
-+>; |
8158 |
-+ |
8159 |
-+def IMM8bit : ImmLeaf < |
8160 |
-+ i32, |
8161 |
-+ [{return (int32_t)Imm >= 0 && (int32_t)Imm <= 0xff;}] |
8162 |
-+>; |
8163 |
-+ |
8164 |
-+def IMM12bit : ImmLeaf < |
8165 |
-+ i16, |
8166 |
-+ [{return (int16_t)Imm >= 0 && (int16_t)Imm <= 0xfff;}] |
8167 |
-+>; |
8168 |
-+ |
8169 |
-+def IMM32bitIn64bit : ImmLeaf < |
8170 |
-+ i64, |
8171 |
-+ [{return isInt<32>(Imm);}] |
8172 |
-+>; |
8173 |
-+ |
8174 |
+class GPR4Align <RegisterClass rc> : Operand <vAny> { |
8175 |
+ let EncoderMethod = "GPR4AlignEncode"; |
8176 |
+ let MIOperandInfo = (ops rc:$reg); |
8177 |
+} |
8178 |
+ |
8179 |
-+class GPR2Align <RegisterClass rc, ValueType vt> : Operand <vt> { |
8180 |
++class GPR2Align <RegisterClass rc> : Operand <iPTR> { |
8181 |
+ let EncoderMethod = "GPR2AlignEncode"; |
8182 |
+ let MIOperandInfo = (ops rc:$reg); |
8183 |
+} |
8184 |
+ |
8185 |
-+def SMRDmemrr : Operand<iPTR> { |
8186 |
-+ let MIOperandInfo = (ops SReg_64, SReg_32); |
8187 |
-+ let EncoderMethod = "GPR2AlignEncode"; |
8188 |
-+} |
8189 |
-+ |
8190 |
-+def SMRDmemri : Operand<iPTR> { |
8191 |
-+ let MIOperandInfo = (ops SReg_64, i32imm); |
8192 |
-+ let EncoderMethod = "SMRDmemriEncode"; |
8193 |
-+} |
8194 |
-+ |
8195 |
-+def ADDR_Reg : ComplexPattern<i64, 2, "SelectADDRReg", [], []>; |
8196 |
-+def ADDR_Offset8 : ComplexPattern<i64, 2, "SelectADDR8BitOffset", [], []>; |
8197 |
-+ |
8198 |
+let Uses = [EXEC] in { |
8199 |
+ |
8200 |
+def EXP : Enc64< |
8201 |
@@ -19099,10 +20763,8 @@ index 0000000..873a451 |
8202 |
+ let Inst{47-40} = VSRC1; |
8203 |
+ let Inst{55-48} = VSRC2; |
8204 |
+ let Inst{63-56} = VSRC3; |
8205 |
-+ let EncodingType = 0; //SIInstrEncodingType::EXP |
8206 |
+ |
8207 |
-+ let NeedWait = 1; |
8208 |
-+ let usesCustomInserter = 1; |
8209 |
++ let EXP_CNT = 1; |
8210 |
+} |
8211 |
+ |
8212 |
+class MIMG <bits<7> op, dag outs, dag ins, string asm, list<dag> pattern> : |
8213 |
@@ -19136,10 +20798,8 @@ index 0000000..873a451 |
8214 |
+ let Inst{52-48} = SRSRC; |
8215 |
+ let Inst{57-53} = SSAMP; |
8216 |
+ |
8217 |
-+ let EncodingType = 2; //SIInstrEncodingType::MIMG |
8218 |
-+ |
8219 |
-+ let NeedWait = 1; |
8220 |
-+ let usesCustomInserter = 1; |
8221 |
++ let VM_CNT = 1; |
8222 |
++ let EXP_CNT = 1; |
8223 |
+} |
8224 |
+ |
8225 |
+class MTBUF <bits<3> op, dag outs, dag ins, string asm, list<dag> pattern> : |
8226 |
@@ -19174,10 +20834,10 @@ index 0000000..873a451 |
8227 |
+ let Inst{54} = SLC; |
8228 |
+ let Inst{55} = TFE; |
8229 |
+ let Inst{63-56} = SOFFSET; |
8230 |
-+ let EncodingType = 3; //SIInstrEncodingType::MTBUF |
8231 |
+ |
8232 |
-+ let NeedWait = 1; |
8233 |
-+ let usesCustomInserter = 1; |
8234 |
++ let VM_CNT = 1; |
8235 |
++ let EXP_CNT = 1; |
8236 |
++ |
8237 |
+ let neverHasSideEffects = 1; |
8238 |
+} |
8239 |
+ |
8240 |
@@ -19211,34 +20871,30 @@ index 0000000..873a451 |
8241 |
+ let Inst{54} = SLC; |
8242 |
+ let Inst{55} = TFE; |
8243 |
+ let Inst{63-56} = SOFFSET; |
8244 |
-+ let EncodingType = 4; //SIInstrEncodingType::MUBUF |
8245 |
+ |
8246 |
-+ let NeedWait = 1; |
8247 |
-+ let usesCustomInserter = 1; |
8248 |
++ let VM_CNT = 1; |
8249 |
++ let EXP_CNT = 1; |
8250 |
++ |
8251 |
+ let neverHasSideEffects = 1; |
8252 |
+} |
8253 |
+ |
8254 |
+} // End Uses = [EXEC] |
8255 |
+ |
8256 |
-+class SMRD <bits<5> op, dag outs, dag ins, string asm, list<dag> pattern> : |
8257 |
-+ Enc32<outs, ins, asm, pattern> { |
8258 |
++class SMRD <bits<5> op, bits<1> imm, dag outs, dag ins, string asm, |
8259 |
++ list<dag> pattern> : Enc32<outs, ins, asm, pattern> { |
8260 |
+ |
8261 |
+ bits<7> SDST; |
8262 |
-+ bits<15> PTR; |
8263 |
-+ bits<8> OFFSET = PTR{7-0}; |
8264 |
-+ bits<1> IMM = PTR{8}; |
8265 |
-+ bits<6> SBASE = PTR{14-9}; |
8266 |
++ bits<6> SBASE; |
8267 |
++ bits<8> OFFSET; |
8268 |
+ |
8269 |
+ let Inst{7-0} = OFFSET; |
8270 |
-+ let Inst{8} = IMM; |
8271 |
++ let Inst{8} = imm; |
8272 |
+ let Inst{14-9} = SBASE; |
8273 |
+ let Inst{21-15} = SDST; |
8274 |
+ let Inst{26-22} = op; |
8275 |
+ let Inst{31-27} = 0x18; //encoding |
8276 |
-+ let EncodingType = 5; //SIInstrEncodingType::SMRD |
8277 |
+ |
8278 |
-+ let NeedWait = 1; |
8279 |
-+ let usesCustomInserter = 1; |
8280 |
++ let LGKM_CNT = 1; |
8281 |
+} |
8282 |
+ |
8283 |
+class SOP1 <bits<8> op, dag outs, dag ins, string asm, list<dag> pattern> : |
8284 |
@@ -19251,7 +20907,6 @@ index 0000000..873a451 |
8285 |
+ let Inst{15-8} = op; |
8286 |
+ let Inst{22-16} = SDST; |
8287 |
+ let Inst{31-23} = 0x17d; //encoding; |
8288 |
-+ let EncodingType = 6; //SIInstrEncodingType::SOP1 |
8289 |
+ |
8290 |
+ let mayLoad = 0; |
8291 |
+ let mayStore = 0; |
8292 |
@@ -19270,7 +20925,6 @@ index 0000000..873a451 |
8293 |
+ let Inst{22-16} = SDST; |
8294 |
+ let Inst{29-23} = op; |
8295 |
+ let Inst{31-30} = 0x2; // encoding |
8296 |
-+ let EncodingType = 7; // SIInstrEncodingType::SOP2 |
8297 |
+ |
8298 |
+ let mayLoad = 0; |
8299 |
+ let mayStore = 0; |
8300 |
@@ -19287,7 +20941,6 @@ index 0000000..873a451 |
8301 |
+ let Inst{15-8} = SSRC1; |
8302 |
+ let Inst{22-16} = op; |
8303 |
+ let Inst{31-23} = 0x17e; |
8304 |
-+ let EncodingType = 8; // SIInstrEncodingType::SOPC |
8305 |
+ |
8306 |
+ let DisableEncoding = "$dst"; |
8307 |
+ let mayLoad = 0; |
8308 |
@@ -19305,7 +20958,6 @@ index 0000000..873a451 |
8309 |
+ let Inst{22-16} = SDST; |
8310 |
+ let Inst{27-23} = op; |
8311 |
+ let Inst{31-28} = 0xb; //encoding |
8312 |
-+ let EncodingType = 9; // SIInstrEncodingType::SOPK |
8313 |
+ |
8314 |
+ let mayLoad = 0; |
8315 |
+ let mayStore = 0; |
8316 |
@@ -19323,7 +20975,6 @@ index 0000000..873a451 |
8317 |
+ let Inst{15-0} = SIMM16; |
8318 |
+ let Inst{22-16} = op; |
8319 |
+ let Inst{31-23} = 0x17f; // encoding |
8320 |
-+ let EncodingType = 10; // SIInstrEncodingType::SOPP |
8321 |
+ |
8322 |
+ let mayLoad = 0; |
8323 |
+ let mayStore = 0; |
8324 |
@@ -19346,7 +20997,6 @@ index 0000000..873a451 |
8325 |
+ let Inst{17-16} = op; |
8326 |
+ let Inst{25-18} = VDST; |
8327 |
+ let Inst{31-26} = 0x32; // encoding |
8328 |
-+ let EncodingType = 11; // SIInstrEncodingType::VINTRP |
8329 |
+ |
8330 |
+ let neverHasSideEffects = 1; |
8331 |
+ let mayLoad = 1; |
8332 |
@@ -19364,9 +21014,6 @@ index 0000000..873a451 |
8333 |
+ let Inst{24-17} = VDST; |
8334 |
+ let Inst{31-25} = 0x3f; //encoding |
8335 |
+ |
8336 |
-+ let EncodingType = 12; // SIInstrEncodingType::VOP1 |
8337 |
-+ let PostEncoderMethod = "VOPPostEncode"; |
8338 |
-+ |
8339 |
+ let mayLoad = 0; |
8340 |
+ let mayStore = 0; |
8341 |
+ let hasSideEffects = 0; |
8342 |
@@ -19385,9 +21032,6 @@ index 0000000..873a451 |
8343 |
+ let Inst{30-25} = op; |
8344 |
+ let Inst{31} = 0x0; //encoding |
8345 |
+ |
8346 |
-+ let EncodingType = 13; // SIInstrEncodingType::VOP2 |
8347 |
-+ let PostEncoderMethod = "VOPPostEncode"; |
8348 |
-+ |
8349 |
+ let mayLoad = 0; |
8350 |
+ let mayStore = 0; |
8351 |
+ let hasSideEffects = 0; |
8352 |
@@ -19416,9 +21060,6 @@ index 0000000..873a451 |
8353 |
+ let Inst{60-59} = OMOD; |
8354 |
+ let Inst{63-61} = NEG; |
8355 |
+ |
8356 |
-+ let EncodingType = 14; // SIInstrEncodingType::VOP3 |
8357 |
-+ let PostEncoderMethod = "VOPPostEncode"; |
8358 |
-+ |
8359 |
+ let mayLoad = 0; |
8360 |
+ let mayStore = 0; |
8361 |
+ let hasSideEffects = 0; |
8362 |
@@ -19433,127 +21074,50 @@ index 0000000..873a451 |
8363 |
+ bits<9> SRC2; |
8364 |
+ bits<7> SDST; |
8365 |
+ bits<2> OMOD; |
8366 |
-+ bits<3> NEG; |
8367 |
-+ |
8368 |
-+ let Inst{7-0} = VDST; |
8369 |
-+ let Inst{14-8} = SDST; |
8370 |
-+ let Inst{25-17} = op; |
8371 |
-+ let Inst{31-26} = 0x34; //encoding |
8372 |
-+ let Inst{40-32} = SRC0; |
8373 |
-+ let Inst{49-41} = SRC1; |
8374 |
-+ let Inst{58-50} = SRC2; |
8375 |
-+ let Inst{60-59} = OMOD; |
8376 |
-+ let Inst{63-61} = NEG; |
8377 |
-+ |
8378 |
-+ let EncodingType = 14; // SIInstrEncodingType::VOP3 |
8379 |
-+ let PostEncoderMethod = "VOPPostEncode"; |
8380 |
-+ |
8381 |
-+ let mayLoad = 0; |
8382 |
-+ let mayStore = 0; |
8383 |
-+ let hasSideEffects = 0; |
8384 |
-+} |
8385 |
-+ |
8386 |
-+class VOPC <bits<8> op, dag ins, string asm, list<dag> pattern> : |
8387 |
-+ Enc32 <(outs VCCReg:$dst), ins, asm, pattern> { |
8388 |
-+ |
8389 |
-+ bits<9> SRC0; |
8390 |
-+ bits<8> VSRC1; |
8391 |
-+ |
8392 |
-+ let Inst{8-0} = SRC0; |
8393 |
-+ let Inst{16-9} = VSRC1; |
8394 |
-+ let Inst{24-17} = op; |
8395 |
-+ let Inst{31-25} = 0x3e; |
8396 |
-+ |
8397 |
-+ let EncodingType = 15; //SIInstrEncodingType::VOPC |
8398 |
-+ let PostEncoderMethod = "VOPPostEncode"; |
8399 |
-+ let DisableEncoding = "$dst"; |
8400 |
-+ let mayLoad = 0; |
8401 |
-+ let mayStore = 0; |
8402 |
-+ let hasSideEffects = 0; |
8403 |
-+} |
8404 |
-+ |
8405 |
-+} // End Uses = [EXEC] |
8406 |
-+ |
8407 |
-+class MIMG_Load_Helper <bits<7> op, string asm> : MIMG < |
8408 |
-+ op, |
8409 |
-+ (outs VReg_128:$vdata), |
8410 |
-+ (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128, |
8411 |
-+ i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_128:$vaddr, |
8412 |
-+ GPR4Align<SReg_256>:$srsrc, GPR4Align<SReg_128>:$ssamp), |
8413 |
-+ asm, |
8414 |
-+ []> { |
8415 |
-+ let mayLoad = 1; |
8416 |
-+ let mayStore = 0; |
8417 |
-+} |
8418 |
-+ |
8419 |
-+class MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass> : MUBUF < |
8420 |
-+ op, |
8421 |
-+ (outs regClass:$dst), |
8422 |
-+ (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64, |
8423 |
-+ i1imm:$lds, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc, i1imm:$slc, |
8424 |
-+ i1imm:$tfe, SReg_32:$soffset), |
8425 |
-+ asm, |
8426 |
-+ []> { |
8427 |
-+ let mayLoad = 1; |
8428 |
-+ let mayStore = 0; |
8429 |
-+} |
8430 |
++ bits<3> NEG; |
8431 |
+ |
8432 |
-+class MTBUF_Load_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF < |
8433 |
-+ op, |
8434 |
-+ (outs regClass:$dst), |
8435 |
-+ (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64, |
8436 |
-+ i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc, |
8437 |
-+ i1imm:$slc, i1imm:$tfe, SReg_32:$soffset), |
8438 |
-+ asm, |
8439 |
-+ []> { |
8440 |
-+ let mayLoad = 1; |
8441 |
-+ let mayStore = 0; |
8442 |
-+} |
8443 |
++ let Inst{7-0} = VDST; |
8444 |
++ let Inst{14-8} = SDST; |
8445 |
++ let Inst{25-17} = op; |
8446 |
++ let Inst{31-26} = 0x34; //encoding |
8447 |
++ let Inst{40-32} = SRC0; |
8448 |
++ let Inst{49-41} = SRC1; |
8449 |
++ let Inst{58-50} = SRC2; |
8450 |
++ let Inst{60-59} = OMOD; |
8451 |
++ let Inst{63-61} = NEG; |
8452 |
+ |
8453 |
-+class MTBUF_Store_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF < |
8454 |
-+ op, |
8455 |
-+ (outs), |
8456 |
-+ (ins regClass:$vdata, i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, |
8457 |
-+ i1imm:$addr64, i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, |
8458 |
-+ GPR4Align<SReg_128>:$srsrc, i1imm:$slc, i1imm:$tfe, SReg_32:$soffset), |
8459 |
-+ asm, |
8460 |
-+ []> { |
8461 |
-+ let mayStore = 1; |
8462 |
+ let mayLoad = 0; |
8463 |
++ let mayStore = 0; |
8464 |
++ let hasSideEffects = 0; |
8465 |
+} |
8466 |
+ |
8467 |
-+multiclass SMRD_Helper <bits<5> op, string asm, RegisterClass dstClass, |
8468 |
-+ ValueType vt> { |
8469 |
-+ def _IMM : SMRD < |
8470 |
-+ op, |
8471 |
-+ (outs dstClass:$dst), |
8472 |
-+ (ins SMRDmemri:$src0), |
8473 |
-+ asm, |
8474 |
-+ [(set (vt dstClass:$dst), (constant_load ADDR_Offset8:$src0))] |
8475 |
-+ >; |
8476 |
++class VOPC <bits<8> op, dag ins, string asm, list<dag> pattern> : |
8477 |
++ Enc32 <(outs VCCReg:$dst), ins, asm, pattern> { |
8478 |
+ |
8479 |
-+ def _SGPR : SMRD < |
8480 |
-+ op, |
8481 |
-+ (outs dstClass:$dst), |
8482 |
-+ (ins SMRDmemrr:$src0), |
8483 |
-+ asm, |
8484 |
-+ [(set (vt dstClass:$dst), (constant_load ADDR_Reg:$src0))] |
8485 |
-+ >; |
8486 |
-+} |
8487 |
++ bits<9> SRC0; |
8488 |
++ bits<8> VSRC1; |
8489 |
+ |
8490 |
-+multiclass SMRD_32 <bits<5> op, string asm, RegisterClass dstClass> { |
8491 |
-+ defm _F32 : SMRD_Helper <op, asm, dstClass, f32>; |
8492 |
-+ defm _I32 : SMRD_Helper <op, asm, dstClass, i32>; |
8493 |
++ let Inst{8-0} = SRC0; |
8494 |
++ let Inst{16-9} = VSRC1; |
8495 |
++ let Inst{24-17} = op; |
8496 |
++ let Inst{31-25} = 0x3e; |
8497 |
++ |
8498 |
++ let DisableEncoding = "$dst"; |
8499 |
++ let mayLoad = 0; |
8500 |
++ let mayStore = 0; |
8501 |
++ let hasSideEffects = 0; |
8502 |
+} |
8503 |
+ |
8504 |
++} // End Uses = [EXEC] |
8505 |
++ |
8506 |
+include "SIInstrFormats.td" |
8507 |
+include "SIInstructions.td" |
8508 |
diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td |
8509 |
new file mode 100644 |
8510 |
-index 0000000..005be96 |
8511 |
+index 0000000..3a9822a |
8512 |
--- /dev/null |
8513 |
+++ b/lib/Target/R600/SIInstructions.td |
8514 |
-@@ -0,0 +1,1351 @@ |
8515 |
+@@ -0,0 +1,1462 @@ |
8516 |
+//===-- SIInstructions.td - SI Instruction Defintions ---------------------===// |
8517 |
+// |
8518 |
+// The LLVM Compiler Infrastructure |
8519 |
@@ -19567,6 +21131,17 @@ index 0000000..005be96 |
8520 |
+// that are not yet supported remain commented out. |
8521 |
+//===----------------------------------------------------------------------===// |
8522 |
+ |
8523 |
++class InterpSlots { |
8524 |
++int P0 = 2; |
8525 |
++int P10 = 0; |
8526 |
++int P20 = 1; |
8527 |
++} |
8528 |
++def INTERP : InterpSlots; |
8529 |
++ |
8530 |
++def InterpSlot : Operand<i32> { |
8531 |
++ let PrintMethod = "printInterpSlot"; |
8532 |
++} |
8533 |
++ |
8534 |
+def isSI : Predicate<"Subtarget.device()" |
8535 |
+ "->getGeneration() == AMDGPUDeviceInfo::HD7XXX">; |
8536 |
+ |
8537 |
@@ -19675,33 +21250,33 @@ index 0000000..005be96 |
8538 |
+defm V_CMP_F_F32 : VOPC_32 <0x00000000, "V_CMP_F_F32", []>; |
8539 |
+defm V_CMP_LT_F32 : VOPC_32 <0x00000001, "V_CMP_LT_F32", []>; |
8540 |
+def : Pat < |
8541 |
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_LT)), |
8542 |
-+ (V_CMP_LT_F32_e64 AllReg_32:$src0, VReg_32:$src1) |
8543 |
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_LT)), |
8544 |
++ (V_CMP_LT_F32_e64 VSrc_32:$src0, VReg_32:$src1) |
8545 |
+>; |
8546 |
+defm V_CMP_EQ_F32 : VOPC_32 <0x00000002, "V_CMP_EQ_F32", []>; |
8547 |
+def : Pat < |
8548 |
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_EQ)), |
8549 |
-+ (V_CMP_EQ_F32_e64 AllReg_32:$src0, VReg_32:$src1) |
8550 |
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_EQ)), |
8551 |
++ (V_CMP_EQ_F32_e64 VSrc_32:$src0, VReg_32:$src1) |
8552 |
+>; |
8553 |
+defm V_CMP_LE_F32 : VOPC_32 <0x00000003, "V_CMP_LE_F32", []>; |
8554 |
+def : Pat < |
8555 |
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_LE)), |
8556 |
-+ (V_CMP_LE_F32_e64 AllReg_32:$src0, VReg_32:$src1) |
8557 |
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_LE)), |
8558 |
++ (V_CMP_LE_F32_e64 VSrc_32:$src0, VReg_32:$src1) |
8559 |
+>; |
8560 |
+defm V_CMP_GT_F32 : VOPC_32 <0x00000004, "V_CMP_GT_F32", []>; |
8561 |
+def : Pat < |
8562 |
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_GT)), |
8563 |
-+ (V_CMP_GT_F32_e64 AllReg_32:$src0, VReg_32:$src1) |
8564 |
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_GT)), |
8565 |
++ (V_CMP_GT_F32_e64 VSrc_32:$src0, VReg_32:$src1) |
8566 |
+>; |
8567 |
+defm V_CMP_LG_F32 : VOPC_32 <0x00000005, "V_CMP_LG_F32", []>; |
8568 |
+def : Pat < |
8569 |
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_NE)), |
8570 |
-+ (V_CMP_LG_F32_e64 AllReg_32:$src0, VReg_32:$src1) |
8571 |
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_NE)), |
8572 |
++ (V_CMP_LG_F32_e64 VSrc_32:$src0, VReg_32:$src1) |
8573 |
+>; |
8574 |
+defm V_CMP_GE_F32 : VOPC_32 <0x00000006, "V_CMP_GE_F32", []>; |
8575 |
+def : Pat < |
8576 |
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_GE)), |
8577 |
-+ (V_CMP_GE_F32_e64 AllReg_32:$src0, VReg_32:$src1) |
8578 |
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_GE)), |
8579 |
++ (V_CMP_GE_F32_e64 VSrc_32:$src0, VReg_32:$src1) |
8580 |
+>; |
8581 |
+defm V_CMP_O_F32 : VOPC_32 <0x00000007, "V_CMP_O_F32", []>; |
8582 |
+defm V_CMP_U_F32 : VOPC_32 <0x00000008, "V_CMP_U_F32", []>; |
8583 |
@@ -19711,8 +21286,8 @@ index 0000000..005be96 |
8584 |
+defm V_CMP_NLE_F32 : VOPC_32 <0x0000000c, "V_CMP_NLE_F32", []>; |
8585 |
+defm V_CMP_NEQ_F32 : VOPC_32 <0x0000000d, "V_CMP_NEQ_F32", []>; |
8586 |
+def : Pat < |
8587 |
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_NE)), |
8588 |
-+ (V_CMP_NEQ_F32_e64 AllReg_32:$src0, VReg_32:$src1) |
8589 |
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_NE)), |
8590 |
++ (V_CMP_NEQ_F32_e64 VSrc_32:$src0, VReg_32:$src1) |
8591 |
+>; |
8592 |
+defm V_CMP_NLT_F32 : VOPC_32 <0x0000000e, "V_CMP_NLT_F32", []>; |
8593 |
+defm V_CMP_TRU_F32 : VOPC_32 <0x0000000f, "V_CMP_TRU_F32", []>; |
8594 |
@@ -19845,33 +21420,33 @@ index 0000000..005be96 |
8595 |
+defm V_CMP_F_I32 : VOPC_32 <0x00000080, "V_CMP_F_I32", []>; |
8596 |
+defm V_CMP_LT_I32 : VOPC_32 <0x00000081, "V_CMP_LT_I32", []>; |
8597 |
+def : Pat < |
8598 |
-+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_LT)), |
8599 |
-+ (V_CMP_LT_I32_e64 AllReg_32:$src0, VReg_32:$src1) |
8600 |
++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_LT)), |
8601 |
++ (V_CMP_LT_I32_e64 VSrc_32:$src0, VReg_32:$src1) |
8602 |
+>; |
8603 |
+defm V_CMP_EQ_I32 : VOPC_32 <0x00000082, "V_CMP_EQ_I32", []>; |
8604 |
+def : Pat < |
8605 |
-+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_EQ)), |
8606 |
-+ (V_CMP_EQ_I32_e64 AllReg_32:$src0, VReg_32:$src1) |
8607 |
++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_EQ)), |
8608 |
++ (V_CMP_EQ_I32_e64 VSrc_32:$src0, VReg_32:$src1) |
8609 |
+>; |
8610 |
+defm V_CMP_LE_I32 : VOPC_32 <0x00000083, "V_CMP_LE_I32", []>; |
8611 |
+def : Pat < |
8612 |
-+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_LE)), |
8613 |
-+ (V_CMP_LE_I32_e64 AllReg_32:$src0, VReg_32:$src1) |
8614 |
++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_LE)), |
8615 |
++ (V_CMP_LE_I32_e64 VSrc_32:$src0, VReg_32:$src1) |
8616 |
+>; |
8617 |
+defm V_CMP_GT_I32 : VOPC_32 <0x00000084, "V_CMP_GT_I32", []>; |
8618 |
+def : Pat < |
8619 |
-+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_GT)), |
8620 |
-+ (V_CMP_GT_I32_e64 AllReg_32:$src0, VReg_32:$src1) |
8621 |
++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_GT)), |
8622 |
++ (V_CMP_GT_I32_e64 VSrc_32:$src0, VReg_32:$src1) |
8623 |
+>; |
8624 |
+defm V_CMP_NE_I32 : VOPC_32 <0x00000085, "V_CMP_NE_I32", []>; |
8625 |
+def : Pat < |
8626 |
-+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_NE)), |
8627 |
-+ (V_CMP_NE_I32_e64 AllReg_32:$src0, VReg_32:$src1) |
8628 |
++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_NE)), |
8629 |
++ (V_CMP_NE_I32_e64 VSrc_32:$src0, VReg_32:$src1) |
8630 |
+>; |
8631 |
+defm V_CMP_GE_I32 : VOPC_32 <0x00000086, "V_CMP_GE_I32", []>; |
8632 |
+def : Pat < |
8633 |
-+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_GE)), |
8634 |
-+ (V_CMP_GE_I32_e64 AllReg_32:$src0, VReg_32:$src1) |
8635 |
++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_GE)), |
8636 |
++ (V_CMP_GE_I32_e64 VSrc_32:$src0, VReg_32:$src1) |
8637 |
+>; |
8638 |
+defm V_CMP_T_I32 : VOPC_32 <0x00000087, "V_CMP_T_I32", []>; |
8639 |
+ |
8640 |
@@ -20017,11 +21592,13 @@ index 0000000..005be96 |
8641 |
+//def TBUFFER_STORE_FORMAT_XYZ : MTBUF_ <0x00000006, "TBUFFER_STORE_FORMAT_XYZ", []>; |
8642 |
+//def TBUFFER_STORE_FORMAT_XYZW : MTBUF_ <0x00000007, "TBUFFER_STORE_FORMAT_XYZW", []>; |
8643 |
+ |
8644 |
-+defm S_LOAD_DWORD : SMRD_32 <0x00000000, "S_LOAD_DWORD", SReg_32>; |
8645 |
++let mayLoad = 1 in { |
8646 |
++ |
8647 |
++defm S_LOAD_DWORD : SMRD_Helper <0x00000000, "S_LOAD_DWORD", SReg_32>; |
8648 |
+ |
8649 |
+//def S_LOAD_DWORDX2 : SMRD_DWORDX2 <0x00000001, "S_LOAD_DWORDX2", []>; |
8650 |
-+defm S_LOAD_DWORDX4 : SMRD_Helper <0x00000002, "S_LOAD_DWORDX4", SReg_128, v4i32>; |
8651 |
-+defm S_LOAD_DWORDX8 : SMRD_Helper <0x00000003, "S_LOAD_DWORDX8", SReg_256, v8i32>; |
8652 |
++defm S_LOAD_DWORDX4 : SMRD_Helper <0x00000002, "S_LOAD_DWORDX4", SReg_128>; |
8653 |
++defm S_LOAD_DWORDX8 : SMRD_Helper <0x00000003, "S_LOAD_DWORDX8", SReg_256>; |
8654 |
+//def S_LOAD_DWORDX16 : SMRD_DWORDX16 <0x00000004, "S_LOAD_DWORDX16", []>; |
8655 |
+//def S_BUFFER_LOAD_DWORD : SMRD_ <0x00000008, "S_BUFFER_LOAD_DWORD", []>; |
8656 |
+//def S_BUFFER_LOAD_DWORDX2 : SMRD_DWORDX2 <0x00000009, "S_BUFFER_LOAD_DWORDX2", []>; |
8657 |
@@ -20029,6 +21606,8 @@ index 0000000..005be96 |
8658 |
+//def S_BUFFER_LOAD_DWORDX8 : SMRD_DWORDX8 <0x0000000b, "S_BUFFER_LOAD_DWORDX8", []>; |
8659 |
+//def S_BUFFER_LOAD_DWORDX16 : SMRD_DWORDX16 <0x0000000c, "S_BUFFER_LOAD_DWORDX16", []>; |
8660 |
+ |
8661 |
++} // mayLoad = 1 |
8662 |
++ |
8663 |
+//def S_MEMTIME : SMRD_ <0x0000001e, "S_MEMTIME", []>; |
8664 |
+//def S_DCACHE_INV : SMRD_ <0x0000001f, "S_DCACHE_INV", []>; |
8665 |
+//def IMAGE_LOAD : MIMG_NoPattern_ <"IMAGE_LOAD", 0x00000000>; |
8666 |
@@ -20067,12 +21646,12 @@ index 0000000..005be96 |
8667 |
+def IMAGE_SAMPLE_B : MIMG_Load_Helper <0x00000025, "IMAGE_SAMPLE_B">; |
8668 |
+//def IMAGE_SAMPLE_B_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_B_CL", 0x00000026>; |
8669 |
+//def IMAGE_SAMPLE_LZ : MIMG_NoPattern_ <"IMAGE_SAMPLE_LZ", 0x00000027>; |
8670 |
-+//def IMAGE_SAMPLE_C : MIMG_NoPattern_ <"IMAGE_SAMPLE_C", 0x00000028>; |
8671 |
++def IMAGE_SAMPLE_C : MIMG_Load_Helper <0x00000028, "IMAGE_SAMPLE_C">; |
8672 |
+//def IMAGE_SAMPLE_C_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_CL", 0x00000029>; |
8673 |
+//def IMAGE_SAMPLE_C_D : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D", 0x0000002a>; |
8674 |
+//def IMAGE_SAMPLE_C_D_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D_CL", 0x0000002b>; |
8675 |
-+//def IMAGE_SAMPLE_C_L : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_L", 0x0000002c>; |
8676 |
-+//def IMAGE_SAMPLE_C_B : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_B", 0x0000002d>; |
8677 |
++def IMAGE_SAMPLE_C_L : MIMG_Load_Helper <0x0000002c, "IMAGE_SAMPLE_C_L">; |
8678 |
++def IMAGE_SAMPLE_C_B : MIMG_Load_Helper <0x0000002d, "IMAGE_SAMPLE_C_B">; |
8679 |
+//def IMAGE_SAMPLE_C_B_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_B_CL", 0x0000002e>; |
8680 |
+//def IMAGE_SAMPLE_C_LZ : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_LZ", 0x0000002f>; |
8681 |
+//def IMAGE_SAMPLE_O : MIMG_NoPattern_ <"IMAGE_SAMPLE_O", 0x00000030>; |
8682 |
@@ -20135,12 +21714,12 @@ index 0000000..005be96 |
8683 |
+//defm V_CVT_I32_F64 : VOP1_32 <0x00000003, "V_CVT_I32_F64", []>; |
8684 |
+//defm V_CVT_F64_I32 : VOP1_64 <0x00000004, "V_CVT_F64_I32", []>; |
8685 |
+defm V_CVT_F32_I32 : VOP1_32 <0x00000005, "V_CVT_F32_I32", |
8686 |
-+ [(set VReg_32:$dst, (sint_to_fp AllReg_32:$src0))] |
8687 |
++ [(set VReg_32:$dst, (sint_to_fp VSrc_32:$src0))] |
8688 |
+>; |
8689 |
+//defm V_CVT_F32_U32 : VOP1_32 <0x00000006, "V_CVT_F32_U32", []>; |
8690 |
+//defm V_CVT_U32_F32 : VOP1_32 <0x00000007, "V_CVT_U32_F32", []>; |
8691 |
+defm V_CVT_I32_F32 : VOP1_32 <0x00000008, "V_CVT_I32_F32", |
8692 |
-+ [(set VReg_32:$dst, (fp_to_sint AllReg_32:$src0))] |
8693 |
++ [(set (i32 VReg_32:$dst), (fp_to_sint VSrc_32:$src0))] |
8694 |
+>; |
8695 |
+defm V_MOV_FED_B32 : VOP1_32 <0x00000009, "V_MOV_FED_B32", []>; |
8696 |
+////def V_CVT_F16_F32 : VOP1_F16 <0x0000000a, "V_CVT_F16_F32", []>; |
8697 |
@@ -20157,31 +21736,35 @@ index 0000000..005be96 |
8698 |
+//defm V_CVT_U32_F64 : VOP1_32 <0x00000015, "V_CVT_U32_F64", []>; |
8699 |
+//defm V_CVT_F64_U32 : VOP1_64 <0x00000016, "V_CVT_F64_U32", []>; |
8700 |
+defm V_FRACT_F32 : VOP1_32 <0x00000020, "V_FRACT_F32", |
8701 |
-+ [(set VReg_32:$dst, (AMDGPUfract AllReg_32:$src0))] |
8702 |
++ [(set VReg_32:$dst, (AMDGPUfract VSrc_32:$src0))] |
8703 |
+>; |
8704 |
+defm V_TRUNC_F32 : VOP1_32 <0x00000021, "V_TRUNC_F32", []>; |
8705 |
-+defm V_CEIL_F32 : VOP1_32 <0x00000022, "V_CEIL_F32", []>; |
8706 |
++defm V_CEIL_F32 : VOP1_32 <0x00000022, "V_CEIL_F32", |
8707 |
++ [(set VReg_32:$dst, (fceil VSrc_32:$src0))] |
8708 |
++>; |
8709 |
+defm V_RNDNE_F32 : VOP1_32 <0x00000023, "V_RNDNE_F32", |
8710 |
-+ [(set VReg_32:$dst, (frint AllReg_32:$src0))] |
8711 |
++ [(set VReg_32:$dst, (frint VSrc_32:$src0))] |
8712 |
+>; |
8713 |
+defm V_FLOOR_F32 : VOP1_32 <0x00000024, "V_FLOOR_F32", |
8714 |
-+ [(set VReg_32:$dst, (ffloor AllReg_32:$src0))] |
8715 |
++ [(set VReg_32:$dst, (ffloor VSrc_32:$src0))] |
8716 |
+>; |
8717 |
+defm V_EXP_F32 : VOP1_32 <0x00000025, "V_EXP_F32", |
8718 |
-+ [(set VReg_32:$dst, (fexp2 AllReg_32:$src0))] |
8719 |
++ [(set VReg_32:$dst, (fexp2 VSrc_32:$src0))] |
8720 |
+>; |
8721 |
+defm V_LOG_CLAMP_F32 : VOP1_32 <0x00000026, "V_LOG_CLAMP_F32", []>; |
8722 |
-+defm V_LOG_F32 : VOP1_32 <0x00000027, "V_LOG_F32", []>; |
8723 |
++defm V_LOG_F32 : VOP1_32 <0x00000027, "V_LOG_F32", |
8724 |
++ [(set VReg_32:$dst, (flog2 VSrc_32:$src0))] |
8725 |
++>; |
8726 |
+defm V_RCP_CLAMP_F32 : VOP1_32 <0x00000028, "V_RCP_CLAMP_F32", []>; |
8727 |
+defm V_RCP_LEGACY_F32 : VOP1_32 <0x00000029, "V_RCP_LEGACY_F32", []>; |
8728 |
+defm V_RCP_F32 : VOP1_32 <0x0000002a, "V_RCP_F32", |
8729 |
-+ [(set VReg_32:$dst, (fdiv FP_ONE, AllReg_32:$src0))] |
8730 |
++ [(set VReg_32:$dst, (fdiv FP_ONE, VSrc_32:$src0))] |
8731 |
+>; |
8732 |
+defm V_RCP_IFLAG_F32 : VOP1_32 <0x0000002b, "V_RCP_IFLAG_F32", []>; |
8733 |
+defm V_RSQ_CLAMP_F32 : VOP1_32 <0x0000002c, "V_RSQ_CLAMP_F32", []>; |
8734 |
+defm V_RSQ_LEGACY_F32 : VOP1_32 < |
8735 |
+ 0x0000002d, "V_RSQ_LEGACY_F32", |
8736 |
-+ [(set VReg_32:$dst, (int_AMDGPU_rsq AllReg_32:$src0))] |
8737 |
++ [(set VReg_32:$dst, (int_AMDGPU_rsq VSrc_32:$src0))] |
8738 |
+>; |
8739 |
+defm V_RSQ_F32 : VOP1_32 <0x0000002e, "V_RSQ_F32", []>; |
8740 |
+defm V_RCP_F64 : VOP1_64 <0x0000002f, "V_RCP_F64", []>; |
8741 |
@@ -20231,10 +21814,9 @@ index 0000000..005be96 |
8742 |
+def V_INTERP_MOV_F32 : VINTRP < |
8743 |
+ 0x00000002, |
8744 |
+ (outs VReg_32:$dst), |
8745 |
-+ (ins i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0), |
8746 |
-+ "V_INTERP_MOV_F32", |
8747 |
++ (ins InterpSlot:$src0, i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0), |
8748 |
++ "V_INTERP_MOV_F32 $dst, $src0, $attr_chan, $attr", |
8749 |
+ []> { |
8750 |
-+ let VSRC = 0; |
8751 |
+ let DisableEncoding = "$m0"; |
8752 |
+} |
8753 |
+ |
8754 |
@@ -20314,22 +21896,22 @@ index 0000000..005be96 |
8755 |
+//def S_TTRACEDATA : SOPP_ <0x00000016, "S_TTRACEDATA", []>; |
8756 |
+ |
8757 |
+def V_CNDMASK_B32_e32 : VOP2 <0x00000000, (outs VReg_32:$dst), |
8758 |
-+ (ins AllReg_32:$src0, VReg_32:$src1, VCCReg:$vcc), "V_CNDMASK_B32_e32", |
8759 |
++ (ins VSrc_32:$src0, VReg_32:$src1, VCCReg:$vcc), "V_CNDMASK_B32_e32", |
8760 |
+ [] |
8761 |
+>{ |
8762 |
+ let DisableEncoding = "$vcc"; |
8763 |
+} |
8764 |
+ |
8765 |
+def V_CNDMASK_B32_e64 : VOP3 <0x00000100, (outs VReg_32:$dst), |
8766 |
-+ (ins VReg_32:$src0, VReg_32:$src1, SReg_1:$src2, InstFlag:$abs, InstFlag:$clamp, InstFlag:$omod, InstFlag:$neg), |
8767 |
++ (ins VReg_32:$src0, VReg_32:$src1, SReg_64:$src2, InstFlag:$abs, InstFlag:$clamp, InstFlag:$omod, InstFlag:$neg), |
8768 |
+ "V_CNDMASK_B32_e64", |
8769 |
-+ [(set (i32 VReg_32:$dst), (select SReg_1:$src2, VReg_32:$src1, VReg_32:$src0))] |
8770 |
++ [(set (i32 VReg_32:$dst), (select (i1 SReg_64:$src2), VReg_32:$src1, VReg_32:$src0))] |
8771 |
+>; |
8772 |
+ |
8773 |
+//f32 pattern for V_CNDMASK_B32_e64 |
8774 |
+def : Pat < |
8775 |
-+ (f32 (select SReg_1:$src2, VReg_32:$src1, VReg_32:$src0)), |
8776 |
-+ (V_CNDMASK_B32_e64 VReg_32:$src0, VReg_32:$src1, SReg_1:$src2) |
8777 |
++ (f32 (select (i1 SReg_64:$src2), VReg_32:$src1, VReg_32:$src0)), |
8778 |
++ (V_CNDMASK_B32_e64 VReg_32:$src0, VReg_32:$src1, SReg_64:$src2) |
8779 |
+>; |
8780 |
+ |
8781 |
+defm V_READLANE_B32 : VOP2_32 <0x00000001, "V_READLANE_B32", []>; |
8782 |
@@ -20337,35 +21919,35 @@ index 0000000..005be96 |
8783 |
+ |
8784 |
+defm V_ADD_F32 : VOP2_32 <0x00000003, "V_ADD_F32", []>; |
8785 |
+def : Pat < |
8786 |
-+ (f32 (fadd AllReg_32:$src0, VReg_32:$src1)), |
8787 |
-+ (V_ADD_F32_e32 AllReg_32:$src0, VReg_32:$src1) |
8788 |
++ (f32 (fadd VSrc_32:$src0, VReg_32:$src1)), |
8789 |
++ (V_ADD_F32_e32 VSrc_32:$src0, VReg_32:$src1) |
8790 |
+>; |
8791 |
+ |
8792 |
+defm V_SUB_F32 : VOP2_32 <0x00000004, "V_SUB_F32", []>; |
8793 |
+def : Pat < |
8794 |
-+ (f32 (fsub AllReg_32:$src0, VReg_32:$src1)), |
8795 |
-+ (V_SUB_F32_e32 AllReg_32:$src0, VReg_32:$src1) |
8796 |
++ (f32 (fsub VSrc_32:$src0, VReg_32:$src1)), |
8797 |
++ (V_SUB_F32_e32 VSrc_32:$src0, VReg_32:$src1) |
8798 |
+>; |
8799 |
+defm V_SUBREV_F32 : VOP2_32 <0x00000005, "V_SUBREV_F32", []>; |
8800 |
+defm V_MAC_LEGACY_F32 : VOP2_32 <0x00000006, "V_MAC_LEGACY_F32", []>; |
8801 |
+defm V_MUL_LEGACY_F32 : VOP2_32 < |
8802 |
+ 0x00000007, "V_MUL_LEGACY_F32", |
8803 |
-+ [(set VReg_32:$dst, (int_AMDGPU_mul AllReg_32:$src0, VReg_32:$src1))] |
8804 |
++ [(set VReg_32:$dst, (int_AMDGPU_mul VSrc_32:$src0, VReg_32:$src1))] |
8805 |
+>; |
8806 |
+ |
8807 |
+defm V_MUL_F32 : VOP2_32 <0x00000008, "V_MUL_F32", |
8808 |
-+ [(set VReg_32:$dst, (fmul AllReg_32:$src0, VReg_32:$src1))] |
8809 |
++ [(set VReg_32:$dst, (fmul VSrc_32:$src0, VReg_32:$src1))] |
8810 |
+>; |
8811 |
+//defm V_MUL_I32_I24 : VOP2_32 <0x00000009, "V_MUL_I32_I24", []>; |
8812 |
+//defm V_MUL_HI_I32_I24 : VOP2_32 <0x0000000a, "V_MUL_HI_I32_I24", []>; |
8813 |
+//defm V_MUL_U32_U24 : VOP2_32 <0x0000000b, "V_MUL_U32_U24", []>; |
8814 |
+//defm V_MUL_HI_U32_U24 : VOP2_32 <0x0000000c, "V_MUL_HI_U32_U24", []>; |
8815 |
+defm V_MIN_LEGACY_F32 : VOP2_32 <0x0000000d, "V_MIN_LEGACY_F32", |
8816 |
-+ [(set VReg_32:$dst, (AMDGPUfmin AllReg_32:$src0, VReg_32:$src1))] |
8817 |
++ [(set VReg_32:$dst, (AMDGPUfmin VSrc_32:$src0, VReg_32:$src1))] |
8818 |
+>; |
8819 |
+ |
8820 |
+defm V_MAX_LEGACY_F32 : VOP2_32 <0x0000000e, "V_MAX_LEGACY_F32", |
8821 |
-+ [(set VReg_32:$dst, (AMDGPUfmax AllReg_32:$src0, VReg_32:$src1))] |
8822 |
++ [(set VReg_32:$dst, (AMDGPUfmax VSrc_32:$src0, VReg_32:$src1))] |
8823 |
+>; |
8824 |
+defm V_MIN_F32 : VOP2_32 <0x0000000f, "V_MIN_F32", []>; |
8825 |
+defm V_MAX_F32 : VOP2_32 <0x00000010, "V_MAX_F32", []>; |
8826 |
@@ -20380,13 +21962,13 @@ index 0000000..005be96 |
8827 |
+defm V_LSHL_B32 : VOP2_32 <0x00000019, "V_LSHL_B32", []>; |
8828 |
+defm V_LSHLREV_B32 : VOP2_32 <0x0000001a, "V_LSHLREV_B32", []>; |
8829 |
+defm V_AND_B32 : VOP2_32 <0x0000001b, "V_AND_B32", |
8830 |
-+ [(set VReg_32:$dst, (and AllReg_32:$src0, VReg_32:$src1))] |
8831 |
++ [(set VReg_32:$dst, (and VSrc_32:$src0, VReg_32:$src1))] |
8832 |
+>; |
8833 |
+defm V_OR_B32 : VOP2_32 <0x0000001c, "V_OR_B32", |
8834 |
-+ [(set VReg_32:$dst, (or AllReg_32:$src0, VReg_32:$src1))] |
8835 |
++ [(set VReg_32:$dst, (or VSrc_32:$src0, VReg_32:$src1))] |
8836 |
+>; |
8837 |
+defm V_XOR_B32 : VOP2_32 <0x0000001d, "V_XOR_B32", |
8838 |
-+ [(set VReg_32:$dst, (xor AllReg_32:$src0, VReg_32:$src1))] |
8839 |
++ [(set VReg_32:$dst, (xor VSrc_32:$src0, VReg_32:$src1))] |
8840 |
+>; |
8841 |
+defm V_BFM_B32 : VOP2_32 <0x0000001e, "V_BFM_B32", []>; |
8842 |
+defm V_MAC_F32 : VOP2_32 <0x0000001f, "V_MAC_F32", []>; |
8843 |
@@ -20397,10 +21979,10 @@ index 0000000..005be96 |
8844 |
+//defm V_MBCNT_HI_U32_B32 : VOP2_32 <0x00000024, "V_MBCNT_HI_U32_B32", []>; |
8845 |
+let Defs = [VCC] in { // Carry-out goes to VCC |
8846 |
+defm V_ADD_I32 : VOP2_32 <0x00000025, "V_ADD_I32", |
8847 |
-+ [(set VReg_32:$dst, (add (i32 AllReg_32:$src0), (i32 VReg_32:$src1)))] |
8848 |
++ [(set VReg_32:$dst, (add (i32 VSrc_32:$src0), (i32 VReg_32:$src1)))] |
8849 |
+>; |
8850 |
+defm V_SUB_I32 : VOP2_32 <0x00000026, "V_SUB_I32", |
8851 |
-+ [(set VReg_32:$dst, (sub (i32 AllReg_32:$src0), (i32 VReg_32:$src1)))] |
8852 |
++ [(set VReg_32:$dst, (sub (i32 VSrc_32:$src0), (i32 VReg_32:$src1)))] |
8853 |
+>; |
8854 |
+} // End Defs = [VCC] |
8855 |
+defm V_SUBREV_I32 : VOP2_32 <0x00000027, "V_SUBREV_I32", []>; |
8856 |
@@ -20412,7 +21994,7 @@ index 0000000..005be96 |
8857 |
+////def V_CVT_PKNORM_I16_F32 : VOP2_I16 <0x0000002d, "V_CVT_PKNORM_I16_F32", []>; |
8858 |
+////def V_CVT_PKNORM_U16_F32 : VOP2_U16 <0x0000002e, "V_CVT_PKNORM_U16_F32", []>; |
8859 |
+defm V_CVT_PKRTZ_F16_F32 : VOP2_32 <0x0000002f, "V_CVT_PKRTZ_F16_F32", |
8860 |
-+ [(set VReg_32:$dst, (int_SI_packf16 AllReg_32:$src0, VReg_32:$src1))] |
8861 |
++ [(set VReg_32:$dst, (int_SI_packf16 VSrc_32:$src0, VReg_32:$src1))] |
8862 |
+>; |
8863 |
+////def V_CVT_PK_U16_U32 : VOP2_U16 <0x00000030, "V_CVT_PK_U16_U32", []>; |
8864 |
+////def V_CVT_PK_I16_I32 : VOP2_I16 <0x00000031, "V_CVT_PK_I16_I32", []>; |
8865 |
@@ -20482,6 +22064,10 @@ index 0000000..005be96 |
8866 |
+def V_MUL_LO_U32 : VOP3_32 <0x00000169, "V_MUL_LO_U32", []>; |
8867 |
+def V_MUL_HI_U32 : VOP3_32 <0x0000016a, "V_MUL_HI_U32", []>; |
8868 |
+def V_MUL_LO_I32 : VOP3_32 <0x0000016b, "V_MUL_LO_I32", []>; |
8869 |
++def : Pat < |
8870 |
++ (mul VSrc_32:$src0, VReg_32:$src1), |
8871 |
++ (V_MUL_LO_I32 VSrc_32:$src0, VReg_32:$src1, (IMPLICIT_DEF), 0, 0, 0, 0) |
8872 |
++>; |
8873 |
+def V_MUL_HI_I32 : VOP3_32 <0x0000016c, "V_MUL_HI_I32", []>; |
8874 |
+def V_DIV_SCALE_F32 : VOP3_32 <0x0000016d, "V_DIV_SCALE_F32", []>; |
8875 |
+def V_DIV_SCALE_F64 : VOP3_64 <0x0000016e, "V_DIV_SCALE_F64", []>; |
8876 |
@@ -20519,13 +22105,20 @@ index 0000000..005be96 |
8877 |
+def S_AND_B32 : SOP2_32 <0x0000000e, "S_AND_B32", []>; |
8878 |
+ |
8879 |
+def S_AND_B64 : SOP2_64 <0x0000000f, "S_AND_B64", |
8880 |
-+ [(set SReg_64:$dst, (and SReg_64:$src0, SReg_64:$src1))] |
8881 |
++ [(set SReg_64:$dst, (i64 (and SSrc_64:$src0, SSrc_64:$src1)))] |
8882 |
+>; |
8883 |
-+def S_AND_VCC : SOP2_VCC <0x0000000f, "S_AND_B64", |
8884 |
-+ [(set SReg_1:$vcc, (SIvcc_and SReg_64:$src0, SReg_64:$src1))] |
8885 |
++ |
8886 |
++def : Pat < |
8887 |
++ (i1 (and SSrc_64:$src0, SSrc_64:$src1)), |
8888 |
++ (S_AND_B64 SSrc_64:$src0, SSrc_64:$src1) |
8889 |
+>; |
8890 |
++ |
8891 |
+def S_OR_B32 : SOP2_32 <0x00000010, "S_OR_B32", []>; |
8892 |
+def S_OR_B64 : SOP2_64 <0x00000011, "S_OR_B64", []>; |
8893 |
++def : Pat < |
8894 |
++ (i1 (or SSrc_64:$src0, SSrc_64:$src1)), |
8895 |
++ (S_OR_B64 SSrc_64:$src0, SSrc_64:$src1) |
8896 |
++>; |
8897 |
+def S_XOR_B32 : SOP2_32 <0x00000012, "S_XOR_B32", []>; |
8898 |
+def S_XOR_B64 : SOP2_64 <0x00000013, "S_XOR_B64", []>; |
8899 |
+def S_ANDN2_B32 : SOP2_32 <0x00000014, "S_ANDN2_B32", []>; |
8900 |
@@ -20554,48 +22147,6 @@ index 0000000..005be96 |
8901 |
+//def S_CBRANCH_G_FORK : SOP2_ <0x0000002b, "S_CBRANCH_G_FORK", []>; |
8902 |
+def S_ABSDIFF_I32 : SOP2_32 <0x0000002c, "S_ABSDIFF_I32", []>; |
8903 |
+ |
8904 |
-+class V_MOV_IMM <Operand immType, SDNode immNode> : InstSI < |
8905 |
-+ (outs VReg_32:$dst), |
8906 |
-+ (ins immType:$src0), |
8907 |
-+ "V_MOV_IMM", |
8908 |
-+ [(set VReg_32:$dst, (immNode:$src0))] |
8909 |
-+>; |
8910 |
-+ |
8911 |
-+let isCodeGenOnly = 1, isPseudo = 1 in { |
8912 |
-+ |
8913 |
-+def V_MOV_IMM_I32 : V_MOV_IMM<i32imm, imm>; |
8914 |
-+def V_MOV_IMM_F32 : V_MOV_IMM<f32imm, fpimm>; |
8915 |
-+ |
8916 |
-+def S_MOV_IMM_I32 : InstSI < |
8917 |
-+ (outs SReg_32:$dst), |
8918 |
-+ (ins i32imm:$src0), |
8919 |
-+ "S_MOV_IMM_I32", |
8920 |
-+ [(set SReg_32:$dst, (imm:$src0))] |
8921 |
-+>; |
8922 |
-+ |
8923 |
-+// i64 immediates aren't really supported in hardware, but LLVM will use the i64 |
8924 |
-+// type for indices on load and store instructions. The pattern for |
8925 |
-+// S_MOV_IMM_I64 will only match i64 immediates that can fit into 32-bits, |
8926 |
-+// which the hardware can handle. |
8927 |
-+def S_MOV_IMM_I64 : InstSI < |
8928 |
-+ (outs SReg_64:$dst), |
8929 |
-+ (ins i64imm:$src0), |
8930 |
-+ "S_MOV_IMM_I64 $dst, $src0", |
8931 |
-+ [(set SReg_64:$dst, (IMM32bitIn64bit:$src0))] |
8932 |
-+>; |
8933 |
-+ |
8934 |
-+} // End isCodeGenOnly, isPseudo = 1 |
8935 |
-+ |
8936 |
-+class SI_LOAD_LITERAL<Operand ImmType> : |
8937 |
-+ Enc32 <(outs), (ins ImmType:$imm), "LOAD_LITERAL $imm", []> { |
8938 |
-+ |
8939 |
-+ bits<32> imm; |
8940 |
-+ let Inst{31-0} = imm; |
8941 |
-+} |
8942 |
-+ |
8943 |
-+def SI_LOAD_LITERAL_I32 : SI_LOAD_LITERAL<i32imm>; |
8944 |
-+def SI_LOAD_LITERAL_F32 : SI_LOAD_LITERAL<f32imm>; |
8945 |
-+ |
8946 |
+let isCodeGenOnly = 1, isPseudo = 1 in { |
8947 |
+ |
8948 |
+def SET_M0 : InstSI < |
8949 |
@@ -20614,13 +22165,6 @@ index 0000000..005be96 |
8950 |
+ |
8951 |
+let usesCustomInserter = 1 in { |
8952 |
+ |
8953 |
-+def SI_V_CNDLT : InstSI < |
8954 |
-+ (outs VReg_32:$dst), |
8955 |
-+ (ins VReg_32:$src0, VReg_32:$src1, VReg_32:$src2), |
8956 |
-+ "SI_V_CNDLT $dst, $src0, $src1, $src2", |
8957 |
-+ [(set VReg_32:$dst, (int_AMDGPU_cndlt VReg_32:$src0, VReg_32:$src1, VReg_32:$src2))] |
8958 |
-+>; |
8959 |
-+ |
8960 |
+def SI_INTERP : InstSI < |
8961 |
+ (outs VReg_32:$dst), |
8962 |
+ (ins VReg_32:$i, VReg_32:$j, i32imm:$attr_chan, i32imm:$attr, SReg_32:$params), |
8963 |
@@ -20628,21 +22172,6 @@ index 0000000..005be96 |
8964 |
+ [] |
8965 |
+>; |
8966 |
+ |
8967 |
-+def SI_INTERP_CONST : InstSI < |
8968 |
-+ (outs VReg_32:$dst), |
8969 |
-+ (ins i32imm:$attr_chan, i32imm:$attr, SReg_32:$params), |
8970 |
-+ "SI_INTERP_CONST $dst, $attr_chan, $attr, $params", |
8971 |
-+ [(set VReg_32:$dst, (int_SI_fs_interp_constant imm:$attr_chan, |
8972 |
-+ imm:$attr, SReg_32:$params))] |
8973 |
-+>; |
8974 |
-+ |
8975 |
-+def SI_KIL : InstSI < |
8976 |
-+ (outs), |
8977 |
-+ (ins VReg_32:$src), |
8978 |
-+ "SI_KIL $src", |
8979 |
-+ [(int_AMDGPU_kill VReg_32:$src)] |
8980 |
-+>; |
8981 |
-+ |
8982 |
+def SI_WQM : InstSI < |
8983 |
+ (outs), |
8984 |
+ (ins), |
8985 |
@@ -20662,9 +22191,9 @@ index 0000000..005be96 |
8986 |
+ |
8987 |
+def SI_IF : InstSI < |
8988 |
+ (outs SReg_64:$dst), |
8989 |
-+ (ins SReg_1:$vcc, brtarget:$target), |
8990 |
++ (ins SReg_64:$vcc, brtarget:$target), |
8991 |
+ "SI_IF", |
8992 |
-+ [(set SReg_64:$dst, (int_SI_if SReg_1:$vcc, bb:$target))] |
8993 |
++ [(set SReg_64:$dst, (int_SI_if SReg_64:$vcc, bb:$target))] |
8994 |
+>; |
8995 |
+ |
8996 |
+def SI_ELSE : InstSI < |
8997 |
@@ -20694,9 +22223,9 @@ index 0000000..005be96 |
8998 |
+ |
8999 |
+def SI_IF_BREAK : InstSI < |
9000 |
+ (outs SReg_64:$dst), |
9001 |
-+ (ins SReg_1:$vcc, SReg_64:$src), |
9002 |
++ (ins SReg_64:$vcc, SReg_64:$src), |
9003 |
+ "SI_IF_BREAK", |
9004 |
-+ [(set SReg_64:$dst, (int_SI_if_break SReg_1:$vcc, SReg_64:$src))] |
9005 |
++ [(set SReg_64:$dst, (int_SI_if_break SReg_64:$vcc, SReg_64:$src))] |
9006 |
+>; |
9007 |
+ |
9008 |
+def SI_ELSE_BREAK : InstSI < |
9009 |
@@ -20713,18 +22242,35 @@ index 0000000..005be96 |
9010 |
+ [(int_SI_end_cf SReg_64:$saved)] |
9011 |
+>; |
9012 |
+ |
9013 |
++def SI_KILL : InstSI < |
9014 |
++ (outs), |
9015 |
++ (ins VReg_32:$src), |
9016 |
++ "SI_KIL $src", |
9017 |
++ [(int_AMDGPU_kill VReg_32:$src)] |
9018 |
++>; |
9019 |
++ |
9020 |
+} // end mayLoad = 1, mayStore = 1, hasSideEffects = 1 |
9021 |
+ // Uses = [EXEC], Defs = [EXEC] |
9022 |
+ |
9023 |
+} // end IsCodeGenOnly, isPseudo |
9024 |
+ |
9025 |
++def : Pat< |
9026 |
++ (int_AMDGPU_cndlt VReg_32:$src0, VReg_32:$src1, VReg_32:$src2), |
9027 |
++ (V_CNDMASK_B32_e64 VReg_32:$src2, VReg_32:$src1, (V_CMP_GT_F32_e64 0, VReg_32:$src0)) |
9028 |
++>; |
9029 |
++ |
9030 |
++def : Pat < |
9031 |
++ (int_AMDGPU_kilp), |
9032 |
++ (SI_KILL (V_MOV_B32_e32 0xbf800000)) |
9033 |
++>; |
9034 |
++ |
9035 |
+/* int_SI_vs_load_input */ |
9036 |
+def : Pat< |
9037 |
+ (int_SI_vs_load_input SReg_128:$tlst, IMM12bit:$attr_offset, |
9038 |
+ VReg_32:$buf_idx_vgpr), |
9039 |
+ (BUFFER_LOAD_FORMAT_XYZW imm:$attr_offset, 0, 1, 0, 0, 0, |
9040 |
+ VReg_32:$buf_idx_vgpr, SReg_128:$tlst, |
9041 |
-+ 0, 0, (i32 SREG_LIT_0)) |
9042 |
++ 0, 0, 0) |
9043 |
+>; |
9044 |
+ |
9045 |
+/* int_SI_export */ |
9046 |
@@ -20735,43 +22281,105 @@ index 0000000..005be96 |
9047 |
+ VReg_32:$src0, VReg_32:$src1, VReg_32:$src2, VReg_32:$src3) |
9048 |
+>; |
9049 |
+ |
9050 |
-+/* int_SI_sample */ |
9051 |
++ |
9052 |
++/* int_SI_sample for simple 1D texture lookup */ |
9053 |
+def : Pat < |
9054 |
-+ (int_SI_sample imm:$writemask, VReg_128:$coord, SReg_256:$rsrc, SReg_128:$sampler), |
9055 |
-+ (IMAGE_SAMPLE imm:$writemask, 0, 0, 0, 0, 0, 0, 0, VReg_128:$coord, |
9056 |
++ (int_SI_sample imm:$writemask, (v1i32 VReg_32:$addr), |
9057 |
++ SReg_256:$rsrc, SReg_128:$sampler, imm), |
9058 |
++ (IMAGE_SAMPLE imm:$writemask, 0, 0, 0, 0, 0, 0, 0, |
9059 |
++ (i32 (COPY_TO_REGCLASS VReg_32:$addr, VReg_32)), |
9060 |
+ SReg_256:$rsrc, SReg_128:$sampler) |
9061 |
+>; |
9062 |
+ |
9063 |
-+/* int_SI_sample_lod */ |
9064 |
-+def : Pat < |
9065 |
-+ (int_SI_sample_lod imm:$writemask, VReg_128:$coord, SReg_256:$rsrc, SReg_128:$sampler), |
9066 |
-+ (IMAGE_SAMPLE_L imm:$writemask, 0, 0, 0, 0, 0, 0, 0, VReg_128:$coord, |
9067 |
-+ SReg_256:$rsrc, SReg_128:$sampler) |
9068 |
++class SamplePattern<Intrinsic name, MIMG opcode, RegisterClass addr_class, |
9069 |
++ ValueType addr_type> : Pat < |
9070 |
++ (name imm:$writemask, (addr_type addr_class:$addr), |
9071 |
++ SReg_256:$rsrc, SReg_128:$sampler, imm), |
9072 |
++ (opcode imm:$writemask, 0, 0, 0, 0, 0, 0, 0, |
9073 |
++ (EXTRACT_SUBREG addr_class:$addr, sub0), |
9074 |
++ SReg_256:$rsrc, SReg_128:$sampler) |
9075 |
+>; |
9076 |
+ |
9077 |
-+/* int_SI_sample_bias */ |
9078 |
-+def : Pat < |
9079 |
-+ (int_SI_sample_bias imm:$writemask, VReg_128:$coord, SReg_256:$rsrc, SReg_128:$sampler), |
9080 |
-+ (IMAGE_SAMPLE_B imm:$writemask, 0, 0, 0, 0, 0, 0, 0, VReg_128:$coord, |
9081 |
-+ SReg_256:$rsrc, SReg_128:$sampler) |
9082 |
++class SampleRectPattern<Intrinsic name, MIMG opcode, RegisterClass addr_class, |
9083 |
++ ValueType addr_type> : Pat < |
9084 |
++ (name imm:$writemask, (addr_type addr_class:$addr), |
9085 |
++ SReg_256:$rsrc, SReg_128:$sampler, TEX_RECT), |
9086 |
++ (opcode imm:$writemask, 1, 0, 0, 0, 0, 0, 0, |
9087 |
++ (EXTRACT_SUBREG addr_class:$addr, sub0), |
9088 |
++ SReg_256:$rsrc, SReg_128:$sampler) |
9089 |
++>; |
9090 |
++ |
9091 |
++class SampleArrayPattern<Intrinsic name, MIMG opcode, RegisterClass addr_class, |
9092 |
++ ValueType addr_type> : Pat < |
9093 |
++ (name imm:$writemask, (addr_type addr_class:$addr), |
9094 |
++ SReg_256:$rsrc, SReg_128:$sampler, TEX_ARRAY), |
9095 |
++ (opcode imm:$writemask, 0, 0, 1, 0, 0, 0, 0, |
9096 |
++ (EXTRACT_SUBREG addr_class:$addr, sub0), |
9097 |
++ SReg_256:$rsrc, SReg_128:$sampler) |
9098 |
++>; |
9099 |
++ |
9100 |
++class SampleShadowPattern<Intrinsic name, MIMG opcode, |
9101 |
++ RegisterClass addr_class, ValueType addr_type> : Pat < |
9102 |
++ (name imm:$writemask, (addr_type addr_class:$addr), |
9103 |
++ SReg_256:$rsrc, SReg_128:$sampler, TEX_SHADOW), |
9104 |
++ (opcode imm:$writemask, 0, 0, 0, 0, 0, 0, 0, |
9105 |
++ (EXTRACT_SUBREG addr_class:$addr, sub0), |
9106 |
++ SReg_256:$rsrc, SReg_128:$sampler) |
9107 |
++>; |
9108 |
++ |
9109 |
++class SampleShadowArrayPattern<Intrinsic name, MIMG opcode, |
9110 |
++ RegisterClass addr_class, ValueType addr_type> : Pat < |
9111 |
++ (name imm:$writemask, (addr_type addr_class:$addr), |
9112 |
++ SReg_256:$rsrc, SReg_128:$sampler, TEX_SHADOW_ARRAY), |
9113 |
++ (opcode imm:$writemask, 0, 0, 1, 0, 0, 0, 0, |
9114 |
++ (EXTRACT_SUBREG addr_class:$addr, sub0), |
9115 |
++ SReg_256:$rsrc, SReg_128:$sampler) |
9116 |
+>; |
9117 |
+ |
9118 |
++/* int_SI_sample* for texture lookups consuming more address parameters */ |
9119 |
++multiclass SamplePatterns<RegisterClass addr_class, ValueType addr_type> { |
9120 |
++ def : SamplePattern <int_SI_sample, IMAGE_SAMPLE, addr_class, addr_type>; |
9121 |
++ def : SampleRectPattern <int_SI_sample, IMAGE_SAMPLE, addr_class, addr_type>; |
9122 |
++ def : SampleArrayPattern <int_SI_sample, IMAGE_SAMPLE, addr_class, addr_type>; |
9123 |
++ def : SampleShadowPattern <int_SI_sample, IMAGE_SAMPLE_C, addr_class, addr_type>; |
9124 |
++ def : SampleShadowArrayPattern <int_SI_sample, IMAGE_SAMPLE_C, addr_class, addr_type>; |
9125 |
++ |
9126 |
++ def : SamplePattern <int_SI_samplel, IMAGE_SAMPLE_L, addr_class, addr_type>; |
9127 |
++ def : SampleArrayPattern <int_SI_samplel, IMAGE_SAMPLE_L, addr_class, addr_type>; |
9128 |
++ def : SampleShadowPattern <int_SI_samplel, IMAGE_SAMPLE_C_L, addr_class, addr_type>; |
9129 |
++ def : SampleShadowArrayPattern <int_SI_samplel, IMAGE_SAMPLE_C_L, addr_class, addr_type>; |
9130 |
++ |
9131 |
++ def : SamplePattern <int_SI_sampleb, IMAGE_SAMPLE_B, addr_class, addr_type>; |
9132 |
++ def : SampleArrayPattern <int_SI_sampleb, IMAGE_SAMPLE_B, addr_class, addr_type>; |
9133 |
++ def : SampleShadowPattern <int_SI_sampleb, IMAGE_SAMPLE_C_B, addr_class, addr_type>; |
9134 |
++ def : SampleShadowArrayPattern <int_SI_sampleb, IMAGE_SAMPLE_C_B, addr_class, addr_type>; |
9135 |
++} |
9136 |
++ |
9137 |
++defm : SamplePatterns<VReg_64, v2i32>; |
9138 |
++defm : SamplePatterns<VReg_128, v4i32>; |
9139 |
++defm : SamplePatterns<VReg_256, v8i32>; |
9140 |
++defm : SamplePatterns<VReg_512, v16i32>; |
9141 |
++ |
9142 |
+def CLAMP_SI : CLAMP<VReg_32>; |
9143 |
+def FABS_SI : FABS<VReg_32>; |
9144 |
+def FNEG_SI : FNEG<VReg_32>; |
9145 |
+ |
9146 |
-+def : Extract_Element <f32, v4f32, VReg_128, 0, sel_x>; |
9147 |
-+def : Extract_Element <f32, v4f32, VReg_128, 1, sel_y>; |
9148 |
-+def : Extract_Element <f32, v4f32, VReg_128, 2, sel_z>; |
9149 |
-+def : Extract_Element <f32, v4f32, VReg_128, 3, sel_w>; |
9150 |
++def : Extract_Element <f32, v4f32, VReg_128, 0, sub0>; |
9151 |
++def : Extract_Element <f32, v4f32, VReg_128, 1, sub1>; |
9152 |
++def : Extract_Element <f32, v4f32, VReg_128, 2, sub2>; |
9153 |
++def : Extract_Element <f32, v4f32, VReg_128, 3, sub3>; |
9154 |
+ |
9155 |
-+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 4, sel_x>; |
9156 |
-+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 5, sel_y>; |
9157 |
-+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 6, sel_z>; |
9158 |
-+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 7, sel_w>; |
9159 |
++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 4, sub0>; |
9160 |
++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 5, sub1>; |
9161 |
++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 6, sub2>; |
9162 |
++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 7, sub3>; |
9163 |
+ |
9164 |
++def : Vector1_Build <v1i32, VReg_32, i32, VReg_32>; |
9165 |
++def : Vector2_Build <v2i32, VReg_64, i32, VReg_32>; |
9166 |
+def : Vector_Build <v4f32, VReg_128, f32, VReg_32>; |
9167 |
-+def : Vector_Build <v4i32, SReg_128, i32, SReg_32>; |
9168 |
++def : Vector_Build <v4i32, VReg_128, i32, VReg_32>; |
9169 |
++def : Vector8_Build <v8i32, VReg_256, i32, VReg_32>; |
9170 |
++def : Vector16_Build <v16i32, VReg_512, i32, VReg_32>; |
9171 |
+ |
9172 |
+def : BitConvert <i32, f32, SReg_32>; |
9173 |
+def : BitConvert <i32, f32, VReg_32>; |
9174 |
@@ -20779,24 +22387,46 @@ index 0000000..005be96 |
9175 |
+def : BitConvert <f32, i32, SReg_32>; |
9176 |
+def : BitConvert <f32, i32, VReg_32>; |
9177 |
+ |
9178 |
++/********** ================== **********/ |
9179 |
++/********** Immediate Patterns **********/ |
9180 |
++/********** ================== **********/ |
9181 |
++ |
9182 |
++def : Pat < |
9183 |
++ (i1 imm:$imm), |
9184 |
++ (S_MOV_B64 imm:$imm) |
9185 |
++>; |
9186 |
++ |
9187 |
++def : Pat < |
9188 |
++ (i32 imm:$imm), |
9189 |
++ (V_MOV_B32_e32 imm:$imm) |
9190 |
++>; |
9191 |
++ |
9192 |
++def : Pat < |
9193 |
++ (f32 fpimm:$imm), |
9194 |
++ (V_MOV_B32_e32 fpimm:$imm) |
9195 |
++>; |
9196 |
++ |
9197 |
+def : Pat < |
9198 |
-+ (i64 (SIsreg1_bitcast SReg_1:$vcc)), |
9199 |
-+ (S_MOV_B64 (COPY_TO_REGCLASS SReg_1:$vcc, SReg_64)) |
9200 |
++ (i32 imm:$imm), |
9201 |
++ (S_MOV_B32 imm:$imm) |
9202 |
+>; |
9203 |
+ |
9204 |
+def : Pat < |
9205 |
-+ (i1 (SIsreg1_bitcast SReg_64:$vcc)), |
9206 |
-+ (COPY_TO_REGCLASS SReg_64:$vcc, SReg_1) |
9207 |
++ (f32 fpimm:$imm), |
9208 |
++ (S_MOV_B32 fpimm:$imm) |
9209 |
+>; |
9210 |
+ |
9211 |
+def : Pat < |
9212 |
-+ (i64 (SIvcc_bitcast VCCReg:$vcc)), |
9213 |
-+ (S_MOV_B64 (COPY_TO_REGCLASS VCCReg:$vcc, SReg_64)) |
9214 |
++ (i64 InlineImm<i64>:$imm), |
9215 |
++ (S_MOV_B64 InlineImm<i64>:$imm) |
9216 |
+>; |
9217 |
+ |
9218 |
++// i64 immediates aren't supported in hardware, split it into two 32bit values |
9219 |
+def : Pat < |
9220 |
-+ (i1 (SIvcc_bitcast SReg_64:$vcc)), |
9221 |
-+ (COPY_TO_REGCLASS SReg_64:$vcc, VCCReg) |
9222 |
++ (i64 imm:$imm), |
9223 |
++ (INSERT_SUBREG (INSERT_SUBREG (i64 (IMPLICIT_DEF)), |
9224 |
++ (S_MOV_B32 (i32 (LO32 imm:$imm))), sub0), |
9225 |
++ (S_MOV_B32 (i32 (HI32 imm:$imm))), sub1) |
9226 |
+>; |
9227 |
+ |
9228 |
+/********** ===================== **********/ |
9229 |
@@ -20804,6 +22434,12 @@ index 0000000..005be96 |
9230 |
+/********** ===================== **********/ |
9231 |
+ |
9232 |
+def : Pat < |
9233 |
++ (int_SI_fs_interp_constant imm:$attr_chan, imm:$attr, SReg_32:$params), |
9234 |
++ (V_INTERP_MOV_F32 INTERP.P0, imm:$attr_chan, imm:$attr, |
9235 |
++ (S_MOV_B32 SReg_32:$params)) |
9236 |
++>; |
9237 |
++ |
9238 |
++def : Pat < |
9239 |
+ (int_SI_fs_interp_linear_center imm:$attr_chan, imm:$attr, SReg_32:$params), |
9240 |
+ (SI_INTERP (f32 LINEAR_CENTER_I), (f32 LINEAR_CENTER_J), imm:$attr_chan, |
9241 |
+ imm:$attr, SReg_32:$params) |
9242 |
@@ -20861,56 +22497,95 @@ index 0000000..005be96 |
9243 |
+def : POW_Common <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_F32_e32, VReg_32>; |
9244 |
+ |
9245 |
+def : Pat < |
9246 |
-+ (int_AMDGPU_div AllReg_32:$src0, AllReg_32:$src1), |
9247 |
-+ (V_MUL_LEGACY_F32_e32 AllReg_32:$src0, (V_RCP_LEGACY_F32_e32 AllReg_32:$src1)) |
9248 |
++ (int_AMDGPU_div VSrc_32:$src0, VSrc_32:$src1), |
9249 |
++ (V_MUL_LEGACY_F32_e32 VSrc_32:$src0, (V_RCP_LEGACY_F32_e32 VSrc_32:$src1)) |
9250 |
+>; |
9251 |
+ |
9252 |
+def : Pat< |
9253 |
-+ (fdiv AllReg_32:$src0, AllReg_32:$src1), |
9254 |
-+ (V_MUL_F32_e32 AllReg_32:$src0, (V_RCP_F32_e32 AllReg_32:$src1)) |
9255 |
++ (fdiv VSrc_32:$src0, VSrc_32:$src1), |
9256 |
++ (V_MUL_F32_e32 VSrc_32:$src0, (V_RCP_F32_e32 VSrc_32:$src1)) |
9257 |
+>; |
9258 |
+ |
9259 |
+def : Pat < |
9260 |
-+ (int_AMDGPU_kilp), |
9261 |
-+ (SI_KIL (V_MOV_IMM_I32 0xbf800000)) |
9262 |
++ (fcos VSrc_32:$src0), |
9263 |
++ (V_COS_F32_e32 (V_MUL_F32_e32 VSrc_32:$src0, (V_MOV_B32_e32 CONST.TWO_PI_INV))) |
9264 |
++>; |
9265 |
++ |
9266 |
++def : Pat < |
9267 |
++ (fsin VSrc_32:$src0), |
9268 |
++ (V_SIN_F32_e32 (V_MUL_F32_e32 VSrc_32:$src0, (V_MOV_B32_e32 CONST.TWO_PI_INV))) |
9269 |
+>; |
9270 |
+ |
9271 |
+def : Pat < |
9272 |
+ (int_AMDGPU_cube VReg_128:$src), |
9273 |
+ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), |
9274 |
-+ (V_CUBETC_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x), |
9275 |
-+ (EXTRACT_SUBREG VReg_128:$src, sel_y), |
9276 |
-+ (EXTRACT_SUBREG VReg_128:$src, sel_z), |
9277 |
-+ 0, 0, 0, 0), sel_x), |
9278 |
-+ (V_CUBESC_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x), |
9279 |
-+ (EXTRACT_SUBREG VReg_128:$src, sel_y), |
9280 |
-+ (EXTRACT_SUBREG VReg_128:$src, sel_z), |
9281 |
-+ 0, 0, 0, 0), sel_y), |
9282 |
-+ (V_CUBEMA_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x), |
9283 |
-+ (EXTRACT_SUBREG VReg_128:$src, sel_y), |
9284 |
-+ (EXTRACT_SUBREG VReg_128:$src, sel_z), |
9285 |
-+ 0, 0, 0, 0), sel_z), |
9286 |
-+ (V_CUBEID_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x), |
9287 |
-+ (EXTRACT_SUBREG VReg_128:$src, sel_y), |
9288 |
-+ (EXTRACT_SUBREG VReg_128:$src, sel_z), |
9289 |
-+ 0, 0, 0, 0), sel_w) |
9290 |
++ (V_CUBETC_F32 (EXTRACT_SUBREG VReg_128:$src, sub0), |
9291 |
++ (EXTRACT_SUBREG VReg_128:$src, sub1), |
9292 |
++ (EXTRACT_SUBREG VReg_128:$src, sub2), |
9293 |
++ 0, 0, 0, 0), sub0), |
9294 |
++ (V_CUBESC_F32 (EXTRACT_SUBREG VReg_128:$src, sub0), |
9295 |
++ (EXTRACT_SUBREG VReg_128:$src, sub1), |
9296 |
++ (EXTRACT_SUBREG VReg_128:$src, sub2), |
9297 |
++ 0, 0, 0, 0), sub1), |
9298 |
++ (V_CUBEMA_F32 (EXTRACT_SUBREG VReg_128:$src, sub0), |
9299 |
++ (EXTRACT_SUBREG VReg_128:$src, sub1), |
9300 |
++ (EXTRACT_SUBREG VReg_128:$src, sub2), |
9301 |
++ 0, 0, 0, 0), sub2), |
9302 |
++ (V_CUBEID_F32 (EXTRACT_SUBREG VReg_128:$src, sub0), |
9303 |
++ (EXTRACT_SUBREG VReg_128:$src, sub1), |
9304 |
++ (EXTRACT_SUBREG VReg_128:$src, sub2), |
9305 |
++ 0, 0, 0, 0), sub3) |
9306 |
++>; |
9307 |
++ |
9308 |
++def : Pat < |
9309 |
++ (i32 (sext (i1 SReg_64:$src0))), |
9310 |
++ (V_CNDMASK_B32_e64 (i32 0), (i32 -1), SReg_64:$src0) |
9311 |
+>; |
9312 |
+ |
9313 |
+/********** ================== **********/ |
9314 |
+/********** VOP3 Patterns **********/ |
9315 |
+/********** ================== **********/ |
9316 |
+ |
9317 |
-+def : Pat <(f32 (IL_mad AllReg_32:$src0, VReg_32:$src1, VReg_32:$src2)), |
9318 |
-+ (V_MAD_LEGACY_F32 AllReg_32:$src0, VReg_32:$src1, VReg_32:$src2, |
9319 |
++def : Pat <(f32 (IL_mad VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2)), |
9320 |
++ (V_MAD_LEGACY_F32 VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2, |
9321 |
+ 0, 0, 0, 0)>; |
9322 |
+ |
9323 |
++/********** ================== **********/ |
9324 |
++/********** SMRD Patterns **********/ |
9325 |
++/********** ================== **********/ |
9326 |
++ |
9327 |
++multiclass SMRD_Pattern <SMRD Instr_IMM, SMRD Instr_SGPR, ValueType vt> { |
9328 |
++ // 1. Offset as 8bit DWORD immediate |
9329 |
++ def : Pat < |
9330 |
++ (constant_load (SIadd64bit32bit SReg_64:$sbase, IMM8bitDWORD:$offset)), |
9331 |
++ (vt (Instr_IMM SReg_64:$sbase, IMM8bitDWORD:$offset)) |
9332 |
++ >; |
9333 |
++ |
9334 |
++ // 2. Offset loaded in an 32bit SGPR |
9335 |
++ def : Pat < |
9336 |
++ (constant_load (SIadd64bit32bit SReg_64:$sbase, imm:$offset)), |
9337 |
++ (vt (Instr_SGPR SReg_64:$sbase, (S_MOV_B32 imm:$offset))) |
9338 |
++ >; |
9339 |
++ |
9340 |
++ // 3. No offset at all |
9341 |
++ def : Pat < |
9342 |
++ (constant_load SReg_64:$sbase), |
9343 |
++ (vt (Instr_IMM SReg_64:$sbase, 0)) |
9344 |
++ >; |
9345 |
++} |
9346 |
++ |
9347 |
++defm : SMRD_Pattern <S_LOAD_DWORD_IMM, S_LOAD_DWORD_SGPR, f32>; |
9348 |
++defm : SMRD_Pattern <S_LOAD_DWORD_IMM, S_LOAD_DWORD_SGPR, i32>; |
9349 |
++defm : SMRD_Pattern <S_LOAD_DWORDX4_IMM, S_LOAD_DWORDX4_SGPR, v4i32>; |
9350 |
++defm : SMRD_Pattern <S_LOAD_DWORDX8_IMM, S_LOAD_DWORDX8_SGPR, v8i32>; |
9351 |
++ |
9352 |
+} // End isSI predicate |
9353 |
diff --git a/lib/Target/R600/SIIntrinsics.td b/lib/Target/R600/SIIntrinsics.td |
9354 |
new file mode 100644 |
9355 |
-index 0000000..c322fef |
9356 |
+index 0000000..611b9c4 |
9357 |
--- /dev/null |
9358 |
+++ b/lib/Target/R600/SIIntrinsics.td |
9359 |
-@@ -0,0 +1,52 @@ |
9360 |
+@@ -0,0 +1,54 @@ |
9361 |
+//===-- SIIntrinsics.td - SI Intrinsic defs ----------------*- tablegen -*-===// |
9362 |
+// |
9363 |
+// The LLVM Compiler Infrastructure |
9364 |
@@ -20935,9 +22610,11 @@ index 0000000..c322fef |
9365 |
+ def int_SI_vs_load_input : Intrinsic <[llvm_v4f32_ty], [llvm_v4i32_ty, llvm_i16_ty, llvm_i32_ty], [IntrReadMem]> ; |
9366 |
+ def int_SI_wqm : Intrinsic <[], [], []>; |
9367 |
+ |
9368 |
-+ def int_SI_sample : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_v4f32_ty, llvm_v8i32_ty, llvm_v4i32_ty], [IntrReadMem]>; |
9369 |
-+ def int_SI_sample_bias : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_v4f32_ty, llvm_v8i32_ty, llvm_v4i32_ty], [IntrReadMem]>; |
9370 |
-+ def int_SI_sample_lod : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_v4f32_ty, llvm_v8i32_ty, llvm_v4i32_ty], [IntrReadMem]>; |
9371 |
++ class Sample : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_anyvector_ty, llvm_v8i32_ty, llvm_v4i32_ty, llvm_i32_ty], [IntrReadMem]>; |
9372 |
++ |
9373 |
++ def int_SI_sample : Sample; |
9374 |
++ def int_SI_sampleb : Sample; |
9375 |
++ def int_SI_samplel : Sample; |
9376 |
+ |
9377 |
+ /* Interpolation Intrinsics */ |
9378 |
+ |
9379 |
@@ -20965,10 +22642,10 @@ index 0000000..c322fef |
9380 |
+} |
9381 |
diff --git a/lib/Target/R600/SILowerControlFlow.cpp b/lib/Target/R600/SILowerControlFlow.cpp |
9382 |
new file mode 100644 |
9383 |
-index 0000000..3fbe653 |
9384 |
+index 0000000..2007d30 |
9385 |
--- /dev/null |
9386 |
+++ b/lib/Target/R600/SILowerControlFlow.cpp |
9387 |
-@@ -0,0 +1,331 @@ |
9388 |
+@@ -0,0 +1,372 @@ |
9389 |
+//===-- SILowerControlFlow.cpp - Use predicates for control flow ----------===// |
9390 |
+// |
9391 |
+// The LLVM Compiler Infrastructure |
9392 |
@@ -21039,7 +22716,10 @@ index 0000000..3fbe653 |
9393 |
+ static char ID; |
9394 |
+ const TargetInstrInfo *TII; |
9395 |
+ |
9396 |
-+ void Skip(MachineInstr &MI, MachineOperand &To); |
9397 |
++ bool shouldSkip(MachineBasicBlock *From, MachineBasicBlock *To); |
9398 |
++ |
9399 |
++ void Skip(MachineInstr &From, MachineOperand &To); |
9400 |
++ void SkipIfDead(MachineInstr &MI); |
9401 |
+ |
9402 |
+ void If(MachineInstr &MI); |
9403 |
+ void Else(MachineInstr &MI); |
9404 |
@@ -21049,6 +22729,7 @@ index 0000000..3fbe653 |
9405 |
+ void Loop(MachineInstr &MI); |
9406 |
+ void EndCf(MachineInstr &MI); |
9407 |
+ |
9408 |
++ void Kill(MachineInstr &MI); |
9409 |
+ void Branch(MachineInstr &MI); |
9410 |
+ |
9411 |
+public: |
9412 |
@@ -21071,22 +22752,29 @@ index 0000000..3fbe653 |
9413 |
+ return new SILowerControlFlowPass(tm); |
9414 |
+} |
9415 |
+ |
9416 |
-+void SILowerControlFlowPass::Skip(MachineInstr &From, MachineOperand &To) { |
9417 |
++bool SILowerControlFlowPass::shouldSkip(MachineBasicBlock *From, |
9418 |
++ MachineBasicBlock *To) { |
9419 |
++ |
9420 |
+ unsigned NumInstr = 0; |
9421 |
+ |
9422 |
-+ for (MachineBasicBlock *MBB = *From.getParent()->succ_begin(); |
9423 |
-+ NumInstr < SkipThreshold && MBB != To.getMBB() && !MBB->succ_empty(); |
9424 |
++ for (MachineBasicBlock *MBB = From; MBB != To && !MBB->succ_empty(); |
9425 |
+ MBB = *MBB->succ_begin()) { |
9426 |
+ |
9427 |
+ for (MachineBasicBlock::iterator I = MBB->begin(), E = MBB->end(); |
9428 |
+ NumInstr < SkipThreshold && I != E; ++I) { |
9429 |
+ |
9430 |
+ if (I->isBundle() || !I->isBundled()) |
9431 |
-+ ++NumInstr; |
9432 |
++ if (++NumInstr >= SkipThreshold) |
9433 |
++ return true; |
9434 |
+ } |
9435 |
+ } |
9436 |
+ |
9437 |
-+ if (NumInstr < SkipThreshold) |
9438 |
++ return false; |
9439 |
++} |
9440 |
++ |
9441 |
++void SILowerControlFlowPass::Skip(MachineInstr &From, MachineOperand &To) { |
9442 |
++ |
9443 |
++ if (!shouldSkip(*From.getParent()->succ_begin(), To.getMBB())) |
9444 |
+ return; |
9445 |
+ |
9446 |
+ DebugLoc DL = From.getDebugLoc(); |
9447 |
@@ -21095,6 +22783,38 @@ index 0000000..3fbe653 |
9448 |
+ .addReg(AMDGPU::EXEC); |
9449 |
+} |
9450 |
+ |
9451 |
++void SILowerControlFlowPass::SkipIfDead(MachineInstr &MI) { |
9452 |
++ |
9453 |
++ MachineBasicBlock &MBB = *MI.getParent(); |
9454 |
++ DebugLoc DL = MI.getDebugLoc(); |
9455 |
++ |
9456 |
++ if (!shouldSkip(&MBB, &MBB.getParent()->back())) |
9457 |
++ return; |
9458 |
++ |
9459 |
++ MachineBasicBlock::iterator Insert = &MI; |
9460 |
++ ++Insert; |
9461 |
++ |
9462 |
++ // If the exec mask is non-zero, skip the next two instructions |
9463 |
++ BuildMI(MBB, Insert, DL, TII->get(AMDGPU::S_CBRANCH_EXECNZ)) |
9464 |
++ .addImm(3) |
9465 |
++ .addReg(AMDGPU::EXEC); |
9466 |
++ |
9467 |
++ // Exec mask is zero: Export to NULL target... |
9468 |
++ BuildMI(MBB, Insert, DL, TII->get(AMDGPU::EXP)) |
9469 |
++ .addImm(0) |
9470 |
++ .addImm(0x09) // V_008DFC_SQ_EXP_NULL |
9471 |
++ .addImm(0) |
9472 |
++ .addImm(1) |
9473 |
++ .addImm(1) |
9474 |
++ .addReg(AMDGPU::VGPR0) |
9475 |
++ .addReg(AMDGPU::VGPR0) |
9476 |
++ .addReg(AMDGPU::VGPR0) |
9477 |
++ .addReg(AMDGPU::VGPR0); |
9478 |
++ |
9479 |
++ // ... and terminate wavefront |
9480 |
++ BuildMI(MBB, Insert, DL, TII->get(AMDGPU::S_ENDPGM)); |
9481 |
++} |
9482 |
++ |
9483 |
+void SILowerControlFlowPass::If(MachineInstr &MI) { |
9484 |
+ MachineBasicBlock &MBB = *MI.getParent(); |
9485 |
+ DebugLoc DL = MI.getDebugLoc(); |
9486 |
@@ -21213,8 +22933,28 @@ index 0000000..3fbe653 |
9487 |
+ assert(0); |
9488 |
+} |
9489 |
+ |
9490 |
++void SILowerControlFlowPass::Kill(MachineInstr &MI) { |
9491 |
++ |
9492 |
++ MachineBasicBlock &MBB = *MI.getParent(); |
9493 |
++ DebugLoc DL = MI.getDebugLoc(); |
9494 |
++ |
9495 |
++ // Kill is only allowed in pixel shaders |
9496 |
++ MachineFunction &MF = *MBB.getParent(); |
9497 |
++ SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>(); |
9498 |
++ assert(Info->ShaderType == ShaderType::PIXEL); |
9499 |
++ |
9500 |
++ // Clear this pixel from the exec mask if the operand is negative |
9501 |
++ BuildMI(MBB, &MI, DL, TII->get(AMDGPU::V_CMPX_LE_F32_e32), AMDGPU::VCC) |
9502 |
++ .addImm(0) |
9503 |
++ .addOperand(MI.getOperand(0)); |
9504 |
++ |
9505 |
++ MI.eraseFromParent(); |
9506 |
++} |
9507 |
++ |
9508 |
+bool SILowerControlFlowPass::runOnMachineFunction(MachineFunction &MF) { |
9509 |
-+ bool HaveCf = false; |
9510 |
++ |
9511 |
++ bool HaveKill = false; |
9512 |
++ unsigned Depth = 0; |
9513 |
+ |
9514 |
+ for (MachineFunction::iterator BI = MF.begin(), BE = MF.end(); |
9515 |
+ BI != BE; ++BI) { |
9516 |
@@ -21228,6 +22968,7 @@ index 0000000..3fbe653 |
9517 |
+ switch (MI.getOpcode()) { |
9518 |
+ default: break; |
9519 |
+ case AMDGPU::SI_IF: |
9520 |
++ ++Depth; |
9521 |
+ If(MI); |
9522 |
+ break; |
9523 |
+ |
9524 |
@@ -21248,171 +22989,34 @@ index 0000000..3fbe653 |
9525 |
+ break; |
9526 |
+ |
9527 |
+ case AMDGPU::SI_LOOP: |
9528 |
++ ++Depth; |
9529 |
+ Loop(MI); |
9530 |
+ break; |
9531 |
+ |
9532 |
-+ case AMDGPU::SI_END_CF: |
9533 |
-+ HaveCf = true; |
9534 |
-+ EndCf(MI); |
9535 |
-+ break; |
9536 |
-+ |
9537 |
-+ case AMDGPU::S_BRANCH: |
9538 |
-+ Branch(MI); |
9539 |
-+ break; |
9540 |
-+ } |
9541 |
-+ } |
9542 |
-+ } |
9543 |
-+ |
9544 |
-+ // TODO: What is this good for? |
9545 |
-+ unsigned ShaderType = MF.getInfo<SIMachineFunctionInfo>()->ShaderType; |
9546 |
-+ if (HaveCf && ShaderType == ShaderType::PIXEL) { |
9547 |
-+ for (MachineFunction::iterator BI = MF.begin(), BE = MF.end(); |
9548 |
-+ BI != BE; ++BI) { |
9549 |
-+ |
9550 |
-+ MachineBasicBlock &MBB = *BI; |
9551 |
-+ if (MBB.succ_empty()) { |
9552 |
-+ |
9553 |
-+ MachineInstr &MI = *MBB.getFirstNonPHI(); |
9554 |
-+ DebugLoc DL = MI.getDebugLoc(); |
9555 |
-+ |
9556 |
-+ // If the exec mask is non-zero, skip the next two instructions |
9557 |
-+ BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_CBRANCH_EXECNZ)) |
9558 |
-+ .addImm(3) |
9559 |
-+ .addReg(AMDGPU::EXEC); |
9560 |
-+ |
9561 |
-+ // Exec mask is zero: Export to NULL target... |
9562 |
-+ BuildMI(MBB, &MI, DL, TII->get(AMDGPU::EXP)) |
9563 |
-+ .addImm(0) |
9564 |
-+ .addImm(0x09) // V_008DFC_SQ_EXP_NULL |
9565 |
-+ .addImm(0) |
9566 |
-+ .addImm(1) |
9567 |
-+ .addImm(1) |
9568 |
-+ .addReg(AMDGPU::SREG_LIT_0) |
9569 |
-+ .addReg(AMDGPU::SREG_LIT_0) |
9570 |
-+ .addReg(AMDGPU::SREG_LIT_0) |
9571 |
-+ .addReg(AMDGPU::SREG_LIT_0); |
9572 |
-+ |
9573 |
-+ // ... and terminate wavefront |
9574 |
-+ BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_ENDPGM)); |
9575 |
-+ } |
9576 |
-+ } |
9577 |
-+ } |
9578 |
-+ |
9579 |
-+ return true; |
9580 |
-+} |
9581 |
-diff --git a/lib/Target/R600/SILowerLiteralConstants.cpp b/lib/Target/R600/SILowerLiteralConstants.cpp |
9582 |
-new file mode 100644 |
9583 |
-index 0000000..c0411e9 |
9584 |
---- /dev/null |
9585 |
-+++ b/lib/Target/R600/SILowerLiteralConstants.cpp |
9586 |
-@@ -0,0 +1,108 @@ |
9587 |
-+//===-- SILowerLiteralConstants.cpp - Lower intrs using literal constants--===// |
9588 |
-+// |
9589 |
-+// The LLVM Compiler Infrastructure |
9590 |
-+// |
9591 |
-+// This file is distributed under the University of Illinois Open Source |
9592 |
-+// License. See LICENSE.TXT for details. |
9593 |
-+// |
9594 |
-+//===----------------------------------------------------------------------===// |
9595 |
-+// |
9596 |
-+/// \file |
9597 |
-+/// \brief This pass performs the following transformation on instructions with |
9598 |
-+/// literal constants: |
9599 |
-+/// |
9600 |
-+/// %VGPR0 = V_MOV_IMM_I32 1 |
9601 |
-+/// |
9602 |
-+/// becomes: |
9603 |
-+/// |
9604 |
-+/// BUNDLE |
9605 |
-+/// * %VGPR = V_MOV_B32_32 SI_LITERAL_CONSTANT |
9606 |
-+/// * SI_LOAD_LITERAL 1 |
9607 |
-+/// |
9608 |
-+/// The resulting sequence matches exactly how the hardware handles immediate |
9609 |
-+/// operands, so this transformation greatly simplifies the code generator. |
9610 |
-+/// |
9611 |
-+/// Only the *_MOV_IMM_* support immediate operands at the moment, but when |
9612 |
-+/// support for immediate operands is added to other instructions, they |
9613 |
-+/// will be lowered here as well. |
9614 |
-+//===----------------------------------------------------------------------===// |
9615 |
-+ |
9616 |
-+#include "AMDGPU.h" |
9617 |
-+#include "llvm/CodeGen/MachineFunction.h" |
9618 |
-+#include "llvm/CodeGen/MachineFunctionPass.h" |
9619 |
-+#include "llvm/CodeGen/MachineInstrBuilder.h" |
9620 |
-+#include "llvm/CodeGen/MachineInstrBundle.h" |
9621 |
-+ |
9622 |
-+using namespace llvm; |
9623 |
-+ |
9624 |
-+namespace { |
9625 |
-+ |
9626 |
-+class SILowerLiteralConstantsPass : public MachineFunctionPass { |
9627 |
-+ |
9628 |
-+private: |
9629 |
-+ static char ID; |
9630 |
-+ const TargetInstrInfo *TII; |
9631 |
-+ |
9632 |
-+public: |
9633 |
-+ SILowerLiteralConstantsPass(TargetMachine &tm) : |
9634 |
-+ MachineFunctionPass(ID), TII(tm.getInstrInfo()) { } |
9635 |
-+ |
9636 |
-+ virtual bool runOnMachineFunction(MachineFunction &MF); |
9637 |
-+ |
9638 |
-+ const char *getPassName() const { |
9639 |
-+ return "SI Lower literal constants pass"; |
9640 |
-+ } |
9641 |
-+}; |
9642 |
-+ |
9643 |
-+} // End anonymous namespace |
9644 |
-+ |
9645 |
-+char SILowerLiteralConstantsPass::ID = 0; |
9646 |
-+ |
9647 |
-+FunctionPass *llvm::createSILowerLiteralConstantsPass(TargetMachine &tm) { |
9648 |
-+ return new SILowerLiteralConstantsPass(tm); |
9649 |
-+} |
9650 |
-+ |
9651 |
-+bool SILowerLiteralConstantsPass::runOnMachineFunction(MachineFunction &MF) { |
9652 |
-+ for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end(); |
9653 |
-+ BB != BB_E; ++BB) { |
9654 |
-+ MachineBasicBlock &MBB = *BB; |
9655 |
-+ for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I); |
9656 |
-+ I != MBB.end(); I = Next) { |
9657 |
-+ Next = llvm::next(I); |
9658 |
-+ MachineInstr &MI = *I; |
9659 |
-+ switch (MI.getOpcode()) { |
9660 |
-+ default: break; |
9661 |
-+ case AMDGPU::S_MOV_IMM_I32: |
9662 |
-+ case AMDGPU::S_MOV_IMM_I64: |
9663 |
-+ case AMDGPU::V_MOV_IMM_F32: |
9664 |
-+ case AMDGPU::V_MOV_IMM_I32: { |
9665 |
-+ unsigned MovOpcode; |
9666 |
-+ unsigned LoadLiteralOpcode; |
9667 |
-+ MachineOperand LiteralOp = MI.getOperand(1); |
9668 |
-+ if (AMDGPU::VReg_32RegClass.contains(MI.getOperand(0).getReg())) { |
9669 |
-+ MovOpcode = AMDGPU::V_MOV_B32_e32; |
9670 |
-+ } else { |
9671 |
-+ MovOpcode = AMDGPU::S_MOV_B32; |
9672 |
-+ } |
9673 |
-+ if (LiteralOp.isImm()) { |
9674 |
-+ LoadLiteralOpcode = AMDGPU::SI_LOAD_LITERAL_I32; |
9675 |
-+ } else { |
9676 |
-+ LoadLiteralOpcode = AMDGPU::SI_LOAD_LITERAL_F32; |
9677 |
++ case AMDGPU::SI_END_CF: |
9678 |
++ if (--Depth == 0 && HaveKill) { |
9679 |
++ SkipIfDead(MI); |
9680 |
++ HaveKill = false; |
9681 |
+ } |
9682 |
-+ MIBundleBuilder Bundle(MBB, I); |
9683 |
-+ Bundle |
9684 |
-+ .append(BuildMI(MF, MBB.findDebugLoc(I), TII->get(MovOpcode), |
9685 |
-+ MI.getOperand(0).getReg()) |
9686 |
-+ .addReg(AMDGPU::SI_LITERAL_CONSTANT)) |
9687 |
-+ .append(BuildMI(MF, MBB.findDebugLoc(I), |
9688 |
-+ TII->get(LoadLiteralOpcode)) |
9689 |
-+ .addOperand(MI.getOperand(1))); |
9690 |
-+ llvm::finalizeBundle(MBB, Bundle.begin()); |
9691 |
-+ MI.eraseFromParent(); |
9692 |
++ EndCf(MI); |
9693 |
++ break; |
9694 |
++ |
9695 |
++ case AMDGPU::SI_KILL: |
9696 |
++ if (Depth == 0) |
9697 |
++ SkipIfDead(MI); |
9698 |
++ else |
9699 |
++ HaveKill = true; |
9700 |
++ Kill(MI); |
9701 |
++ break; |
9702 |
++ |
9703 |
++ case AMDGPU::S_BRANCH: |
9704 |
++ Branch(MI); |
9705 |
+ break; |
9706 |
-+ } |
9707 |
+ } |
9708 |
+ } |
9709 |
+ } |
9710 |
-+ return false; |
9711 |
++ |
9712 |
++ return true; |
9713 |
+} |
9714 |
diff --git a/lib/Target/R600/SIMachineFunctionInfo.cpp b/lib/Target/R600/SIMachineFunctionInfo.cpp |
9715 |
new file mode 100644 |
9716 |
@@ -21589,24 +23193,10 @@ index 0000000..40171e4 |
9717 |
+#endif // SIREGISTERINFO_H_ |
9718 |
diff --git a/lib/Target/R600/SIRegisterInfo.td b/lib/Target/R600/SIRegisterInfo.td |
9719 |
new file mode 100644 |
9720 |
-index 0000000..c3f1361 |
9721 |
+index 0000000..ab36b87 |
9722 |
--- /dev/null |
9723 |
+++ b/lib/Target/R600/SIRegisterInfo.td |
9724 |
-@@ -0,0 +1,167 @@ |
9725 |
-+ |
9726 |
-+let Namespace = "AMDGPU" in { |
9727 |
-+ def low : SubRegIndex; |
9728 |
-+ def high : SubRegIndex; |
9729 |
-+ |
9730 |
-+ def sub0 : SubRegIndex; |
9731 |
-+ def sub1 : SubRegIndex; |
9732 |
-+ def sub2 : SubRegIndex; |
9733 |
-+ def sub3 : SubRegIndex; |
9734 |
-+ def sub4 : SubRegIndex; |
9735 |
-+ def sub5 : SubRegIndex; |
9736 |
-+ def sub6 : SubRegIndex; |
9737 |
-+ def sub7 : SubRegIndex; |
9738 |
-+} |
9739 |
+@@ -0,0 +1,190 @@ |
9740 |
+ |
9741 |
+class SIReg <string n, bits<16> encoding = 0> : Register<n> { |
9742 |
+ let Namespace = "AMDGPU"; |
9743 |
@@ -21615,13 +23205,15 @@ index 0000000..c3f1361 |
9744 |
+ |
9745 |
+class SI_64 <string n, list<Register> subregs, bits<16> encoding> : RegisterWithSubRegs<n, subregs> { |
9746 |
+ let Namespace = "AMDGPU"; |
9747 |
-+ let SubRegIndices = [low, high]; |
9748 |
++ let SubRegIndices = [sub0, sub1]; |
9749 |
+ let HWEncoding = encoding; |
9750 |
+} |
9751 |
+ |
9752 |
+class SGPR_32 <bits<16> num, string name> : SIReg<name, num>; |
9753 |
+ |
9754 |
-+class VGPR_32 <bits<16> num, string name> : SIReg<name, num>; |
9755 |
++class VGPR_32 <bits<16> num, string name> : SIReg<name, num> { |
9756 |
++ let HWEncoding{8} = 1; |
9757 |
++} |
9758 |
+ |
9759 |
+// Special Registers |
9760 |
+def VCC : SIReg<"VCC", 106>; |
9761 |
@@ -21629,8 +23221,6 @@ index 0000000..c3f1361 |
9762 |
+def EXEC_HI : SIReg <"EXEC HI", 127>; |
9763 |
+def EXEC : SI_64<"EXEC", [EXEC_LO, EXEC_HI], 126>; |
9764 |
+def SCC : SIReg<"SCC", 253>; |
9765 |
-+def SREG_LIT_0 : SIReg <"S LIT 0", 128>; |
9766 |
-+def SI_LITERAL_CONSTANT : SIReg<"LITERAL CONSTANT", 255>; |
9767 |
+def M0 : SIReg <"M0", 124>; |
9768 |
+ |
9769 |
+//Interpolation registers |
9770 |
@@ -21668,12 +23258,12 @@ index 0000000..c3f1361 |
9771 |
+ (add (sequence "SGPR%u", 0, 101))>; |
9772 |
+ |
9773 |
+// SGPR 64-bit registers |
9774 |
-+def SGPR_64 : RegisterTuples<[low, high], |
9775 |
++def SGPR_64 : RegisterTuples<[sub0, sub1], |
9776 |
+ [(add (decimate SGPR_32, 2)), |
9777 |
+ (add(decimate (rotl SGPR_32, 1), 2))]>; |
9778 |
+ |
9779 |
+// SGPR 128-bit registers |
9780 |
-+def SGPR_128 : RegisterTuples<[sel_x, sel_y, sel_z, sel_w], |
9781 |
++def SGPR_128 : RegisterTuples<[sub0, sub1, sub2, sub3], |
9782 |
+ [(add (decimate SGPR_32, 4)), |
9783 |
+ (add (decimate (rotl SGPR_32, 1), 4)), |
9784 |
+ (add (decimate (rotl SGPR_32, 2), 4)), |
9785 |
@@ -21699,32 +23289,61 @@ index 0000000..c3f1361 |
9786 |
+ (add (sequence "VGPR%u", 0, 255))>; |
9787 |
+ |
9788 |
+// VGPR 64-bit registers |
9789 |
-+def VGPR_64 : RegisterTuples<[low, high], |
9790 |
++def VGPR_64 : RegisterTuples<[sub0, sub1], |
9791 |
+ [(add VGPR_32), |
9792 |
+ (add (rotl VGPR_32, 1))]>; |
9793 |
+ |
9794 |
+// VGPR 128-bit registers |
9795 |
-+def VGPR_128 : RegisterTuples<[sel_x, sel_y, sel_z, sel_w], |
9796 |
++def VGPR_128 : RegisterTuples<[sub0, sub1, sub2, sub3], |
9797 |
+ [(add VGPR_32), |
9798 |
+ (add (rotl VGPR_32, 1)), |
9799 |
+ (add (rotl VGPR_32, 2)), |
9800 |
+ (add (rotl VGPR_32, 3))]>; |
9801 |
+ |
9802 |
++// VGPR 256-bit registers |
9803 |
++def VGPR_256 : RegisterTuples<[sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7], |
9804 |
++ [(add VGPR_32), |
9805 |
++ (add (rotl VGPR_32, 1)), |
9806 |
++ (add (rotl VGPR_32, 2)), |
9807 |
++ (add (rotl VGPR_32, 3)), |
9808 |
++ (add (rotl VGPR_32, 4)), |
9809 |
++ (add (rotl VGPR_32, 5)), |
9810 |
++ (add (rotl VGPR_32, 6)), |
9811 |
++ (add (rotl VGPR_32, 7))]>; |
9812 |
++ |
9813 |
++// VGPR 512-bit registers |
9814 |
++def VGPR_512 : RegisterTuples<[sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7, |
9815 |
++ sub8, sub9, sub10, sub11, sub12, sub13, sub14, sub15], |
9816 |
++ [(add VGPR_32), |
9817 |
++ (add (rotl VGPR_32, 1)), |
9818 |
++ (add (rotl VGPR_32, 2)), |
9819 |
++ (add (rotl VGPR_32, 3)), |
9820 |
++ (add (rotl VGPR_32, 4)), |
9821 |
++ (add (rotl VGPR_32, 5)), |
9822 |
++ (add (rotl VGPR_32, 6)), |
9823 |
++ (add (rotl VGPR_32, 7)), |
9824 |
++ (add (rotl VGPR_32, 8)), |
9825 |
++ (add (rotl VGPR_32, 9)), |
9826 |
++ (add (rotl VGPR_32, 10)), |
9827 |
++ (add (rotl VGPR_32, 11)), |
9828 |
++ (add (rotl VGPR_32, 12)), |
9829 |
++ (add (rotl VGPR_32, 13)), |
9830 |
++ (add (rotl VGPR_32, 14)), |
9831 |
++ (add (rotl VGPR_32, 15))]>; |
9832 |
++ |
9833 |
+// Register class for all scalar registers (SGPRs + Special Registers) |
9834 |
+def SReg_32 : RegisterClass<"AMDGPU", [f32, i32], 32, |
9835 |
-+ (add SGPR_32, SREG_LIT_0, M0, EXEC_LO, EXEC_HI) |
9836 |
++ (add SGPR_32, M0, EXEC_LO, EXEC_HI) |
9837 |
+>; |
9838 |
+ |
9839 |
-+def SReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add SGPR_64, VCC, EXEC)>; |
9840 |
-+ |
9841 |
-+def SReg_1 : RegisterClass<"AMDGPU", [i1], 1, (add VCC, SGPR_64, EXEC)>; |
9842 |
++def SReg_64 : RegisterClass<"AMDGPU", [i1, i64], 64, (add SGPR_64, VCC, EXEC)>; |
9843 |
+ |
9844 |
+def SReg_128 : RegisterClass<"AMDGPU", [v4f32, v4i32], 128, (add SGPR_128)>; |
9845 |
+ |
9846 |
+def SReg_256 : RegisterClass<"AMDGPU", [v8i32], 256, (add SGPR_256)>; |
9847 |
+ |
9848 |
+// Register class for all vector registers (VGPRs + Interploation Registers) |
9849 |
-+def VReg_32 : RegisterClass<"AMDGPU", [f32, i32], 32, |
9850 |
++def VReg_32 : RegisterClass<"AMDGPU", [f32, i32, v1i32], 32, |
9851 |
+ (add VGPR_32, |
9852 |
+ PERSP_SAMPLE_I, PERSP_SAMPLE_J, |
9853 |
+ PERSP_CENTER_I, PERSP_CENTER_J, |
9854 |
@@ -21745,14 +23364,22 @@ index 0000000..c3f1361 |
9855 |
+ ) |
9856 |
+>; |
9857 |
+ |
9858 |
-+def VReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add VGPR_64)>; |
9859 |
++def VReg_64 : RegisterClass<"AMDGPU", [i64, v2i32], 64, (add VGPR_64)>; |
9860 |
++ |
9861 |
++def VReg_128 : RegisterClass<"AMDGPU", [v4f32, v4i32], 128, (add VGPR_128)>; |
9862 |
++ |
9863 |
++def VReg_256 : RegisterClass<"AMDGPU", [v8i32], 256, (add VGPR_256)>; |
9864 |
+ |
9865 |
-+def VReg_128 : RegisterClass<"AMDGPU", [v4f32], 128, (add VGPR_128)>; |
9866 |
++def VReg_512 : RegisterClass<"AMDGPU", [v16i32], 512, (add VGPR_512)>; |
9867 |
+ |
9868 |
-+// AllReg_* - A set of all scalar and vector registers of a given width. |
9869 |
-+def AllReg_32 : RegisterClass<"AMDGPU", [f32, i32], 32, (add VReg_32, SReg_32)>; |
9870 |
++// [SV]Src_* operands can have either an immediate or an register |
9871 |
++def SSrc_32 : RegisterClass<"AMDGPU", [i32, f32], 32, (add SReg_32)>; |
9872 |
+ |
9873 |
-+def AllReg_64 : RegisterClass<"AMDGPU", [f64, i64], 64, (add SReg_64, VReg_64)>; |
9874 |
++def SSrc_64 : RegisterClass<"AMDGPU", [i1, i64], 64, (add SReg_64)>; |
9875 |
++ |
9876 |
++def VSrc_32 : RegisterClass<"AMDGPU", [i32, f32], 32, (add VReg_32, SReg_32)>; |
9877 |
++ |
9878 |
++def VSrc_64 : RegisterClass<"AMDGPU", [i64], 64, (add SReg_64, VReg_64)>; |
9879 |
+ |
9880 |
+// Special register classes for predicates and the M0 register |
9881 |
+def SCCReg : RegisterClass<"AMDGPU", [i1], 1, (add SCC)>; |
9882 |
@@ -21876,6 +23503,30 @@ index 0000000..b8ac4e7 |
9883 |
+CPPFLAGS = -I$(PROJ_OBJ_DIR)/.. -I$(PROJ_SRC_DIR)/.. |
9884 |
+ |
9885 |
+include $(LEVEL)/Makefile.common |
9886 |
+diff --git a/test/CodeGen/R600/128bit-kernel-args.ll b/test/CodeGen/R600/128bit-kernel-args.ll |
9887 |
+new file mode 100644 |
9888 |
+index 0000000..114f9e7 |
9889 |
+--- /dev/null |
9890 |
++++ b/test/CodeGen/R600/128bit-kernel-args.ll |
9891 |
+@@ -0,0 +1,18 @@ |
9892 |
++;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s |
9893 |
++ |
9894 |
++; CHECK: @v4i32_kernel_arg |
9895 |
++; CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40 |
9896 |
++ |
9897 |
++define void @v4i32_kernel_arg(<4 x i32> addrspace(1)* %out, <4 x i32> %in) { |
9898 |
++entry: |
9899 |
++ store <4 x i32> %in, <4 x i32> addrspace(1)* %out |
9900 |
++ ret void |
9901 |
++} |
9902 |
++ |
9903 |
++; CHECK: @v4f32_kernel_arg |
9904 |
++; CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40 |
9905 |
++define void @v4f32_kernel_args(<4 x float> addrspace(1)* %out, <4 x float> %in) { |
9906 |
++entry: |
9907 |
++ store <4 x float> %in, <4 x float> addrspace(1)* %out |
9908 |
++ ret void |
9909 |
++} |
9910 |
diff --git a/test/CodeGen/R600/add.v4i32.ll b/test/CodeGen/R600/add.v4i32.ll |
9911 |
new file mode 100644 |
9912 |
index 0000000..ac4a874 |
9913 |
@@ -21918,6 +23569,82 @@ index 0000000..662085e |
9914 |
+ store <4 x i32> %result, <4 x i32> addrspace(1)* %out |
9915 |
+ ret void |
9916 |
+} |
9917 |
+diff --git a/test/CodeGen/R600/dagcombiner-bug-illegal-vec4-int-to-fp.ll b/test/CodeGen/R600/dagcombiner-bug-illegal-vec4-int-to-fp.ll |
9918 |
+new file mode 100644 |
9919 |
+index 0000000..fd958b3 |
9920 |
+--- /dev/null |
9921 |
++++ b/test/CodeGen/R600/dagcombiner-bug-illegal-vec4-int-to-fp.ll |
9922 |
+@@ -0,0 +1,36 @@ |
9923 |
++;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s |
9924 |
++ |
9925 |
++; This test is for a bug in |
9926 |
++; DAGCombiner::reduceBuildVecConvertToConvertBuildVec() where |
9927 |
++; the wrong type was being passed to |
9928 |
++; TargetLowering::getOperationAction() when checking the legality of |
9929 |
++; ISD::UINT_TO_FP and ISD::SINT_TO_FP opcodes. |
9930 |
++ |
9931 |
++ |
9932 |
++; CHECK: @sint |
9933 |
++; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} |
9934 |
++ |
9935 |
++define void @sint(<4 x float> addrspace(1)* %out, i32 addrspace(1)* %in) { |
9936 |
++entry: |
9937 |
++ %ptr = getelementptr i32 addrspace(1)* %in, i32 1 |
9938 |
++ %sint = load i32 addrspace(1) * %in |
9939 |
++ %conv = sitofp i32 %sint to float |
9940 |
++ %0 = insertelement <4 x float> undef, float %conv, i32 0 |
9941 |
++ %splat = shufflevector <4 x float> %0, <4 x float> undef, <4 x i32> zeroinitializer |
9942 |
++ store <4 x float> %splat, <4 x float> addrspace(1)* %out |
9943 |
++ ret void |
9944 |
++} |
9945 |
++ |
9946 |
++;CHECK: @uint |
9947 |
++;CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} |
9948 |
++ |
9949 |
++define void @uint(<4 x float> addrspace(1)* %out, i32 addrspace(1)* %in) { |
9950 |
++entry: |
9951 |
++ %ptr = getelementptr i32 addrspace(1)* %in, i32 1 |
9952 |
++ %uint = load i32 addrspace(1) * %in |
9953 |
++ %conv = uitofp i32 %uint to float |
9954 |
++ %0 = insertelement <4 x float> undef, float %conv, i32 0 |
9955 |
++ %splat = shufflevector <4 x float> %0, <4 x float> undef, <4 x i32> zeroinitializer |
9956 |
++ store <4 x float> %splat, <4 x float> addrspace(1)* %out |
9957 |
++ ret void |
9958 |
++} |
9959 |
+diff --git a/test/CodeGen/R600/disconnected-predset-break-bug.ll b/test/CodeGen/R600/disconnected-predset-break-bug.ll |
9960 |
+new file mode 100644 |
9961 |
+index 0000000..a586742 |
9962 |
+--- /dev/null |
9963 |
++++ b/test/CodeGen/R600/disconnected-predset-break-bug.ll |
9964 |
+@@ -0,0 +1,28 @@ |
9965 |
++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s |
9966 |
++ |
9967 |
++; PRED_SET* instructions must be tied to any instruction that uses their |
9968 |
++; result. This tests that there are no instructions between the PRED_SET* |
9969 |
++; and the PREDICATE_BREAK in this loop. |
9970 |
++ |
9971 |
++; CHECK: @loop_ge |
9972 |
++; CHECK: WHILE |
9973 |
++; CHECK: PRED_SET |
9974 |
++; CHECK-NEXT: PREDICATED_BREAK |
9975 |
++define void @loop_ge(i32 addrspace(1)* nocapture %out, i32 %iterations) nounwind { |
9976 |
++entry: |
9977 |
++ %cmp5 = icmp sgt i32 %iterations, 0 |
9978 |
++ br i1 %cmp5, label %for.body, label %for.end |
9979 |
++ |
9980 |
++for.body: ; preds = %for.body, %entry |
9981 |
++ %i.07.in = phi i32 [ %i.07, %for.body ], [ %iterations, %entry ] |
9982 |
++ %ai.06 = phi i32 [ %add, %for.body ], [ 0, %entry ] |
9983 |
++ %i.07 = add nsw i32 %i.07.in, -1 |
9984 |
++ %arrayidx = getelementptr inbounds i32 addrspace(1)* %out, i32 %ai.06 |
9985 |
++ store i32 %i.07, i32 addrspace(1)* %arrayidx, align 4 |
9986 |
++ %add = add nsw i32 %ai.06, 1 |
9987 |
++ %exitcond = icmp eq i32 %add, %iterations |
9988 |
++ br i1 %exitcond, label %for.end, label %for.body |
9989 |
++ |
9990 |
++for.end: ; preds = %for.body, %entry |
9991 |
++ ret void |
9992 |
++} |
9993 |
diff --git a/test/CodeGen/R600/fabs.ll b/test/CodeGen/R600/fabs.ll |
9994 |
new file mode 100644 |
9995 |
index 0000000..0407533 |
9996 |
@@ -22027,15 +23754,13 @@ index 0000000..5c981ef |
9997 |
+} |
9998 |
diff --git a/test/CodeGen/R600/fcmp.ll b/test/CodeGen/R600/fcmp.ll |
9999 |
new file mode 100644 |
10000 |
-index 0000000..1dcd07c |
10001 |
+index 0000000..89f5e9e |
10002 |
--- /dev/null |
10003 |
+++ b/test/CodeGen/R600/fcmp.ll |
10004 |
-@@ -0,0 +1,16 @@ |
10005 |
+@@ -0,0 +1,14 @@ |
10006 |
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s |
10007 |
+ |
10008 |
-+;CHECK: SETE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} |
10009 |
-+;CHECK: MOV T{{[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}} |
10010 |
-+;CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} |
10011 |
++;CHECK: SETE_DX10 T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} |
10012 |
+ |
10013 |
+define void @test(i32 addrspace(1)* %out, float addrspace(1)* %in) { |
10014 |
+entry: |
10015 |
@@ -22183,14 +23908,13 @@ index 0000000..6d44a0c |
10016 |
+} |
10017 |
diff --git a/test/CodeGen/R600/fsub.ll b/test/CodeGen/R600/fsub.ll |
10018 |
new file mode 100644 |
10019 |
-index 0000000..0ec1c37 |
10020 |
+index 0000000..591aa52 |
10021 |
--- /dev/null |
10022 |
+++ b/test/CodeGen/R600/fsub.ll |
10023 |
-@@ -0,0 +1,17 @@ |
10024 |
+@@ -0,0 +1,16 @@ |
10025 |
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s |
10026 |
+ |
10027 |
-+; CHECK: MOV T{{[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}} |
10028 |
-+; CHECK: ADD T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} |
10029 |
++; CHECK: ADD T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}} |
10030 |
+ |
10031 |
+define void @test() { |
10032 |
+ %r0 = call float @llvm.R600.load.input(i32 0) |
10033 |
@@ -22266,6 +23990,64 @@ index 0000000..aad44d9 |
10034 |
+ store i32 %value, i32 addrspace(1)* %out |
10035 |
+ ret void |
10036 |
+} |
10037 |
+diff --git a/test/CodeGen/R600/kcache-fold.ll b/test/CodeGen/R600/kcache-fold.ll |
10038 |
+new file mode 100644 |
10039 |
+index 0000000..382f78c |
10040 |
+--- /dev/null |
10041 |
++++ b/test/CodeGen/R600/kcache-fold.ll |
10042 |
+@@ -0,0 +1,52 @@ |
10043 |
++;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s |
10044 |
++ |
10045 |
++; CHECK: MOV T{{[0-9]+\.[XYZW], CBuf0\[[0-9]+\]\.[XYZW]}} |
10046 |
++ |
10047 |
++define void @main() { |
10048 |
++main_body: |
10049 |
++ %0 = load <4 x float> addrspace(9)* null |
10050 |
++ %1 = extractelement <4 x float> %0, i32 0 |
10051 |
++ %2 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1) |
10052 |
++ %3 = extractelement <4 x float> %2, i32 0 |
10053 |
++ %4 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2) |
10054 |
++ %5 = extractelement <4 x float> %4, i32 0 |
10055 |
++ %6 = fcmp ult float %1, 0.000000e+00 |
10056 |
++ %7 = select i1 %6, float %3, float %5 |
10057 |
++ %8 = load <4 x float> addrspace(9)* null |
10058 |
++ %9 = extractelement <4 x float> %8, i32 1 |
10059 |
++ %10 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1) |
10060 |
++ %11 = extractelement <4 x float> %10, i32 1 |
10061 |
++ %12 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2) |
10062 |
++ %13 = extractelement <4 x float> %12, i32 1 |
10063 |
++ %14 = fcmp ult float %9, 0.000000e+00 |
10064 |
++ %15 = select i1 %14, float %11, float %13 |
10065 |
++ %16 = load <4 x float> addrspace(9)* null |
10066 |
++ %17 = extractelement <4 x float> %16, i32 2 |
10067 |
++ %18 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1) |
10068 |
++ %19 = extractelement <4 x float> %18, i32 2 |
10069 |
++ %20 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2) |
10070 |
++ %21 = extractelement <4 x float> %20, i32 2 |
10071 |
++ %22 = fcmp ult float %17, 0.000000e+00 |
10072 |
++ %23 = select i1 %22, float %19, float %21 |
10073 |
++ %24 = load <4 x float> addrspace(9)* null |
10074 |
++ %25 = extractelement <4 x float> %24, i32 3 |
10075 |
++ %26 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1) |
10076 |
++ %27 = extractelement <4 x float> %26, i32 3 |
10077 |
++ %28 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2) |
10078 |
++ %29 = extractelement <4 x float> %28, i32 3 |
10079 |
++ %30 = fcmp ult float %25, 0.000000e+00 |
10080 |
++ %31 = select i1 %30, float %27, float %29 |
10081 |
++ %32 = call float @llvm.AMDIL.clamp.(float %7, float 0.000000e+00, float 1.000000e+00) |
10082 |
++ %33 = call float @llvm.AMDIL.clamp.(float %15, float 0.000000e+00, float 1.000000e+00) |
10083 |
++ %34 = call float @llvm.AMDIL.clamp.(float %23, float 0.000000e+00, float 1.000000e+00) |
10084 |
++ %35 = call float @llvm.AMDIL.clamp.(float %31, float 0.000000e+00, float 1.000000e+00) |
10085 |
++ %36 = insertelement <4 x float> undef, float %32, i32 0 |
10086 |
++ %37 = insertelement <4 x float> %36, float %33, i32 1 |
10087 |
++ %38 = insertelement <4 x float> %37, float %34, i32 2 |
10088 |
++ %39 = insertelement <4 x float> %38, float %35, i32 3 |
10089 |
++ call void @llvm.R600.store.swizzle(<4 x float> %39, i32 0, i32 0) |
10090 |
++ ret void |
10091 |
++} |
10092 |
++ |
10093 |
++declare float @llvm.AMDIL.clamp.(float, float, float) readnone |
10094 |
++declare void @llvm.R600.store.swizzle(<4 x float>, i32, i32) |
10095 |
diff --git a/test/CodeGen/R600/lit.local.cfg b/test/CodeGen/R600/lit.local.cfg |
10096 |
new file mode 100644 |
10097 |
index 0000000..36ee493 |
10098 |
@@ -22287,10 +24069,10 @@ index 0000000..36ee493 |
10099 |
+ |
10100 |
diff --git a/test/CodeGen/R600/literals.ll b/test/CodeGen/R600/literals.ll |
10101 |
new file mode 100644 |
10102 |
-index 0000000..4c731b2 |
10103 |
+index 0000000..be62342 |
10104 |
--- /dev/null |
10105 |
+++ b/test/CodeGen/R600/literals.ll |
10106 |
-@@ -0,0 +1,30 @@ |
10107 |
+@@ -0,0 +1,32 @@ |
10108 |
+; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s |
10109 |
+ |
10110 |
+; Test using an integer literal constant. |
10111 |
@@ -22299,6 +24081,7 @@ index 0000000..4c731b2 |
10112 |
+; or |
10113 |
+; ADD_INT literal.x REG, 5 |
10114 |
+ |
10115 |
++; CHECK; @i32_literal |
10116 |
+; CHECK: ADD_INT {{[A-Z0-9,. ]*}}literal.x,{{[A-Z0-9,. ]*}} 5 |
10117 |
+define void @i32_literal(i32 addrspace(1)* %out, i32 %in) { |
10118 |
+entry: |
10119 |
@@ -22313,6 +24096,7 @@ index 0000000..4c731b2 |
10120 |
+; or |
10121 |
+; ADD literal.x REG, 5.0 |
10122 |
+ |
10123 |
++; CHECK: @float_literal |
10124 |
+; CHECK: ADD {{[A-Z0-9,. ]*}}literal.x,{{[A-Z0-9,. ]*}} {{[0-9]+}}(5.0 |
10125 |
+define void @float_literal(float addrspace(1)* %out, float %in) { |
10126 |
+entry: |
10127 |
@@ -22366,6 +24150,35 @@ index 0000000..fac957f |
10128 |
+declare void @llvm.AMDGPU.store.output(float, i32) |
10129 |
+ |
10130 |
+declare float @llvm.AMDGPU.trunc(float ) readnone |
10131 |
+diff --git a/test/CodeGen/R600/llvm.SI.fs.interp.constant.ll b/test/CodeGen/R600/llvm.SI.fs.interp.constant.ll |
10132 |
+new file mode 100644 |
10133 |
+index 0000000..0c19f14 |
10134 |
+--- /dev/null |
10135 |
++++ b/test/CodeGen/R600/llvm.SI.fs.interp.constant.ll |
10136 |
+@@ -0,0 +1,23 @@ |
10137 |
++;RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s |
10138 |
++ |
10139 |
++;CHECK: S_MOV_B32 |
10140 |
++;CHECK-NEXT: V_INTERP_MOV_F32 |
10141 |
++ |
10142 |
++define void @main() { |
10143 |
++main_body: |
10144 |
++ call void @llvm.AMDGPU.shader.type(i32 0) |
10145 |
++ %0 = load i32 addrspace(8)* inttoptr (i32 6 to i32 addrspace(8)*) |
10146 |
++ %1 = call float @llvm.SI.fs.interp.constant(i32 0, i32 0, i32 %0) |
10147 |
++ %2 = call i32 @llvm.SI.packf16(float %1, float %1) |
10148 |
++ %3 = bitcast i32 %2 to float |
10149 |
++ call void @llvm.SI.export(i32 15, i32 1, i32 1, i32 0, i32 1, float %3, float %3, float %3, float %3) |
10150 |
++ ret void |
10151 |
++} |
10152 |
++ |
10153 |
++declare void @llvm.AMDGPU.shader.type(i32) |
10154 |
++ |
10155 |
++declare float @llvm.SI.fs.interp.constant(i32, i32, i32) readonly |
10156 |
++ |
10157 |
++declare i32 @llvm.SI.packf16(float, float) readnone |
10158 |
++ |
10159 |
++declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float) |
10160 |
diff --git a/test/CodeGen/R600/llvm.cos.ll b/test/CodeGen/R600/llvm.cos.ll |
10161 |
new file mode 100644 |
10162 |
index 0000000..dc120bf |
10163 |
@@ -22466,6 +24279,112 @@ index 0000000..b070dcd |
10164 |
+ store i32 %2, i32 addrspace(1)* %out |
10165 |
+ ret void |
10166 |
+} |
10167 |
+diff --git a/test/CodeGen/R600/predicates.ll b/test/CodeGen/R600/predicates.ll |
10168 |
+new file mode 100644 |
10169 |
+index 0000000..18895a4 |
10170 |
+--- /dev/null |
10171 |
++++ b/test/CodeGen/R600/predicates.ll |
10172 |
+@@ -0,0 +1,100 @@ |
10173 |
++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s |
10174 |
++ |
10175 |
++; These tests make sure the compiler is optimizing branches using predicates |
10176 |
++; when it is legal to do so. |
10177 |
++ |
10178 |
++; CHECK: @simple_if |
10179 |
++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred, |
10180 |
++; CHECK: LSHL T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel |
10181 |
++define void @simple_if(i32 addrspace(1)* %out, i32 %in) { |
10182 |
++entry: |
10183 |
++ %0 = icmp sgt i32 %in, 0 |
10184 |
++ br i1 %0, label %IF, label %ENDIF |
10185 |
++ |
10186 |
++IF: |
10187 |
++ %1 = shl i32 %in, 1 |
10188 |
++ br label %ENDIF |
10189 |
++ |
10190 |
++ENDIF: |
10191 |
++ %2 = phi i32 [ %in, %entry ], [ %1, %IF ] |
10192 |
++ store i32 %2, i32 addrspace(1)* %out |
10193 |
++ ret void |
10194 |
++} |
10195 |
++ |
10196 |
++; CHECK: @simple_if_else |
10197 |
++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred, |
10198 |
++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel |
10199 |
++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel |
10200 |
++define void @simple_if_else(i32 addrspace(1)* %out, i32 %in) { |
10201 |
++entry: |
10202 |
++ %0 = icmp sgt i32 %in, 0 |
10203 |
++ br i1 %0, label %IF, label %ELSE |
10204 |
++ |
10205 |
++IF: |
10206 |
++ %1 = shl i32 %in, 1 |
10207 |
++ br label %ENDIF |
10208 |
++ |
10209 |
++ELSE: |
10210 |
++ %2 = lshr i32 %in, 1 |
10211 |
++ br label %ENDIF |
10212 |
++ |
10213 |
++ENDIF: |
10214 |
++ %3 = phi i32 [ %1, %IF ], [ %2, %ELSE ] |
10215 |
++ store i32 %3, i32 addrspace(1)* %out |
10216 |
++ ret void |
10217 |
++} |
10218 |
++ |
10219 |
++; CHECK: @nested_if |
10220 |
++; CHECK: IF_PREDICATE_SET |
10221 |
++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred, |
10222 |
++; CHECK: LSHL T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel |
10223 |
++; CHECK: ENDIF |
10224 |
++define void @nested_if(i32 addrspace(1)* %out, i32 %in) { |
10225 |
++entry: |
10226 |
++ %0 = icmp sgt i32 %in, 0 |
10227 |
++ br i1 %0, label %IF0, label %ENDIF |
10228 |
++ |
10229 |
++IF0: |
10230 |
++ %1 = add i32 %in, 10 |
10231 |
++ %2 = icmp sgt i32 %1, 0 |
10232 |
++ br i1 %2, label %IF1, label %ENDIF |
10233 |
++ |
10234 |
++IF1: |
10235 |
++ %3 = shl i32 %1, 1 |
10236 |
++ br label %ENDIF |
10237 |
++ |
10238 |
++ENDIF: |
10239 |
++ %4 = phi i32 [%in, %entry], [%1, %IF0], [%3, %IF1] |
10240 |
++ store i32 %4, i32 addrspace(1)* %out |
10241 |
++ ret void |
10242 |
++} |
10243 |
++ |
10244 |
++; CHECK: @nested_if_else |
10245 |
++; CHECK: IF_PREDICATE_SET |
10246 |
++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred, |
10247 |
++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel |
10248 |
++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel |
10249 |
++; CHECK: ENDIF |
10250 |
++define void @nested_if_else(i32 addrspace(1)* %out, i32 %in) { |
10251 |
++entry: |
10252 |
++ %0 = icmp sgt i32 %in, 0 |
10253 |
++ br i1 %0, label %IF0, label %ENDIF |
10254 |
++ |
10255 |
++IF0: |
10256 |
++ %1 = add i32 %in, 10 |
10257 |
++ %2 = icmp sgt i32 %1, 0 |
10258 |
++ br i1 %2, label %IF1, label %ELSE1 |
10259 |
++ |
10260 |
++IF1: |
10261 |
++ %3 = shl i32 %1, 1 |
10262 |
++ br label %ENDIF |
10263 |
++ |
10264 |
++ELSE1: |
10265 |
++ %4 = lshr i32 %in, 1 |
10266 |
++ br label %ENDIF |
10267 |
++ |
10268 |
++ENDIF: |
10269 |
++ %5 = phi i32 [%in, %entry], [%3, %IF1], [%4, %ELSE1] |
10270 |
++ store i32 %5, i32 addrspace(1)* %out |
10271 |
++ ret void |
10272 |
++} |
10273 |
diff --git a/test/CodeGen/R600/reciprocal.ll b/test/CodeGen/R600/reciprocal.ll |
10274 |
new file mode 100644 |
10275 |
index 0000000..6838c1a |
10276 |
@@ -22517,7 +24436,7 @@ index 0000000..3556fac |
10277 |
+} |
10278 |
diff --git a/test/CodeGen/R600/selectcc-icmp-select-float.ll b/test/CodeGen/R600/selectcc-icmp-select-float.ll |
10279 |
new file mode 100644 |
10280 |
-index 0000000..f65a300 |
10281 |
+index 0000000..359ca1e |
10282 |
--- /dev/null |
10283 |
+++ b/test/CodeGen/R600/selectcc-icmp-select-float.ll |
10284 |
@@ -0,0 +1,15 @@ |
10285 |
@@ -22525,7 +24444,7 @@ index 0000000..f65a300 |
10286 |
+ |
10287 |
+; Note additional optimizations may cause this SGT to be replaced with a |
10288 |
+; CND* instruction. |
10289 |
-+; CHECK: SGT_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], literal.x, -1}} |
10290 |
++; CHECK: SETGT_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], literal.x, -1}} |
10291 |
+; Test a selectcc with i32 LHS/RHS and float True/False |
10292 |
+ |
10293 |
+define void @test(float addrspace(1)* %out, i32 addrspace(1)* %in) { |
10294 |
@@ -22570,6 +24489,149 @@ index 0000000..b38078e |
10295 |
+ store i32 %3, i32 addrspace(1)* %out |
10296 |
+ ret void |
10297 |
+} |
10298 |
+diff --git a/test/CodeGen/R600/set-dx10.ll b/test/CodeGen/R600/set-dx10.ll |
10299 |
+new file mode 100644 |
10300 |
+index 0000000..54febcf |
10301 |
+--- /dev/null |
10302 |
++++ b/test/CodeGen/R600/set-dx10.ll |
10303 |
+@@ -0,0 +1,137 @@ |
10304 |
++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s |
10305 |
++ |
10306 |
++; These tests check that floating point comparisons which are used by select |
10307 |
++; to store integer true (-1) and false (0) values are lowered to one of the |
10308 |
++; SET*DX10 instructions. |
10309 |
++ |
10310 |
++; CHECK: @fcmp_une_select_fptosi |
10311 |
++; CHECK: SETNE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00) |
10312 |
++define void @fcmp_une_select_fptosi(i32 addrspace(1)* %out, float %in) { |
10313 |
++entry: |
10314 |
++ %0 = fcmp une float %in, 5.0 |
10315 |
++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00 |
10316 |
++ %2 = fsub float -0.000000e+00, %1 |
10317 |
++ %3 = fptosi float %2 to i32 |
10318 |
++ store i32 %3, i32 addrspace(1)* %out |
10319 |
++ ret void |
10320 |
++} |
10321 |
++ |
10322 |
++; CHECK: @fcmp_une_select_i32 |
10323 |
++; CHECK: SETNE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00) |
10324 |
++define void @fcmp_une_select_i32(i32 addrspace(1)* %out, float %in) { |
10325 |
++entry: |
10326 |
++ %0 = fcmp une float %in, 5.0 |
10327 |
++ %1 = select i1 %0, i32 -1, i32 0 |
10328 |
++ store i32 %1, i32 addrspace(1)* %out |
10329 |
++ ret void |
10330 |
++} |
10331 |
++ |
10332 |
++; CHECK: @fcmp_ueq_select_fptosi |
10333 |
++; CHECK: SETE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00) |
10334 |
++define void @fcmp_ueq_select_fptosi(i32 addrspace(1)* %out, float %in) { |
10335 |
++entry: |
10336 |
++ %0 = fcmp ueq float %in, 5.0 |
10337 |
++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00 |
10338 |
++ %2 = fsub float -0.000000e+00, %1 |
10339 |
++ %3 = fptosi float %2 to i32 |
10340 |
++ store i32 %3, i32 addrspace(1)* %out |
10341 |
++ ret void |
10342 |
++} |
10343 |
++ |
10344 |
++; CHECK: @fcmp_ueq_select_i32 |
10345 |
++; CHECK: SETE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00) |
10346 |
++define void @fcmp_ueq_select_i32(i32 addrspace(1)* %out, float %in) { |
10347 |
++entry: |
10348 |
++ %0 = fcmp ueq float %in, 5.0 |
10349 |
++ %1 = select i1 %0, i32 -1, i32 0 |
10350 |
++ store i32 %1, i32 addrspace(1)* %out |
10351 |
++ ret void |
10352 |
++} |
10353 |
++ |
10354 |
++; CHECK: @fcmp_ugt_select_fptosi |
10355 |
++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00) |
10356 |
++define void @fcmp_ugt_select_fptosi(i32 addrspace(1)* %out, float %in) { |
10357 |
++entry: |
10358 |
++ %0 = fcmp ugt float %in, 5.0 |
10359 |
++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00 |
10360 |
++ %2 = fsub float -0.000000e+00, %1 |
10361 |
++ %3 = fptosi float %2 to i32 |
10362 |
++ store i32 %3, i32 addrspace(1)* %out |
10363 |
++ ret void |
10364 |
++} |
10365 |
++ |
10366 |
++; CHECK: @fcmp_ugt_select_i32 |
10367 |
++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00) |
10368 |
++define void @fcmp_ugt_select_i32(i32 addrspace(1)* %out, float %in) { |
10369 |
++entry: |
10370 |
++ %0 = fcmp ugt float %in, 5.0 |
10371 |
++ %1 = select i1 %0, i32 -1, i32 0 |
10372 |
++ store i32 %1, i32 addrspace(1)* %out |
10373 |
++ ret void |
10374 |
++} |
10375 |
++ |
10376 |
++; CHECK: @fcmp_uge_select_fptosi |
10377 |
++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00) |
10378 |
++define void @fcmp_uge_select_fptosi(i32 addrspace(1)* %out, float %in) { |
10379 |
++entry: |
10380 |
++ %0 = fcmp uge float %in, 5.0 |
10381 |
++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00 |
10382 |
++ %2 = fsub float -0.000000e+00, %1 |
10383 |
++ %3 = fptosi float %2 to i32 |
10384 |
++ store i32 %3, i32 addrspace(1)* %out |
10385 |
++ ret void |
10386 |
++} |
10387 |
++ |
10388 |
++; CHECK: @fcmp_uge_select_i32 |
10389 |
++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00) |
10390 |
++define void @fcmp_uge_select_i32(i32 addrspace(1)* %out, float %in) { |
10391 |
++entry: |
10392 |
++ %0 = fcmp uge float %in, 5.0 |
10393 |
++ %1 = select i1 %0, i32 -1, i32 0 |
10394 |
++ store i32 %1, i32 addrspace(1)* %out |
10395 |
++ ret void |
10396 |
++} |
10397 |
++ |
10398 |
++; CHECK: @fcmp_ule_select_fptosi |
10399 |
++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00) |
10400 |
++define void @fcmp_ule_select_fptosi(i32 addrspace(1)* %out, float %in) { |
10401 |
++entry: |
10402 |
++ %0 = fcmp ule float %in, 5.0 |
10403 |
++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00 |
10404 |
++ %2 = fsub float -0.000000e+00, %1 |
10405 |
++ %3 = fptosi float %2 to i32 |
10406 |
++ store i32 %3, i32 addrspace(1)* %out |
10407 |
++ ret void |
10408 |
++} |
10409 |
++ |
10410 |
++; CHECK: @fcmp_ule_select_i32 |
10411 |
++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00) |
10412 |
++define void @fcmp_ule_select_i32(i32 addrspace(1)* %out, float %in) { |
10413 |
++entry: |
10414 |
++ %0 = fcmp ule float %in, 5.0 |
10415 |
++ %1 = select i1 %0, i32 -1, i32 0 |
10416 |
++ store i32 %1, i32 addrspace(1)* %out |
10417 |
++ ret void |
10418 |
++} |
10419 |
++ |
10420 |
++; CHECK: @fcmp_ult_select_fptosi |
10421 |
++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00) |
10422 |
++define void @fcmp_ult_select_fptosi(i32 addrspace(1)* %out, float %in) { |
10423 |
++entry: |
10424 |
++ %0 = fcmp ult float %in, 5.0 |
10425 |
++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00 |
10426 |
++ %2 = fsub float -0.000000e+00, %1 |
10427 |
++ %3 = fptosi float %2 to i32 |
10428 |
++ store i32 %3, i32 addrspace(1)* %out |
10429 |
++ ret void |
10430 |
++} |
10431 |
++ |
10432 |
++; CHECK: @fcmp_ult_select_i32 |
10433 |
++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00) |
10434 |
++define void @fcmp_ult_select_i32(i32 addrspace(1)* %out, float %in) { |
10435 |
++entry: |
10436 |
++ %0 = fcmp ult float %in, 5.0 |
10437 |
++ %1 = select i1 %0, i32 -1, i32 0 |
10438 |
++ store i32 %1, i32 addrspace(1)* %out |
10439 |
++ ret void |
10440 |
++} |
10441 |
diff --git a/test/CodeGen/R600/setcc.v4i32.ll b/test/CodeGen/R600/setcc.v4i32.ll
new file mode 100644
index 0000000..0752f2e
@@ -22590,12 +24652,13 @@ index 0000000..0752f2e
+}
diff --git a/test/CodeGen/R600/short-args.ll b/test/CodeGen/R600/short-args.ll
new file mode 100644
-index 0000000..1070250
+index 0000000..b69e327
--- /dev/null
+++ b/test/CodeGen/R600/short-args.ll
-@@ -0,0 +1,37 @@
+@@ -0,0 +1,41 @@
+; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+
++; CHECK: @i8_arg
+; CHECK: VTX_READ_8 T{{[0-9]+\.X, T[0-9]+\.X}}
+
+define void @i8_arg(i32 addrspace(1)* nocapture %out, i8 %in) nounwind {
@@ -22605,6 +24668,7 @@ index 0000000..1070250
+ ret void
+}
+
++; CHECK: @i8_zext_arg
+; CHECK: VTX_READ_8 T{{[0-9]+\.X, T[0-9]+\.X}}
+
+define void @i8_zext_arg(i32 addrspace(1)* nocapture %out, i8 zeroext %in) nounwind {
@@ -22614,6 +24678,7 @@ index 0000000..1070250
+ ret void
+}
+
++; CHECK: @i16_arg
+; CHECK: VTX_READ_16 T{{[0-9]+\.X, T[0-9]+\.X}}
+
+define void @i16_arg(i32 addrspace(1)* nocapture %out, i16 %in) nounwind {
@@ -22623,6 +24688,7 @@ index 0000000..1070250
+ ret void
+}
+
++; CHECK: @i16_zext_arg
+; CHECK: VTX_READ_16 T{{[0-9]+\.X, T[0-9]+\.X}}
+
+define void @i16_zext_arg(i32 addrspace(1)* nocapture %out, i16 zeroext %in) nounwind {
@@ -22682,6 +24748,95 @@ index 0000000..47657a6
+ store <4 x i32> %result, <4 x i32> addrspace(1)* %out
+ ret void
+}
+diff --git a/test/CodeGen/R600/unsupported-cc.ll b/test/CodeGen/R600/unsupported-cc.ll
+new file mode 100644
+index 0000000..b48c591
+--- /dev/null
++++ b/test/CodeGen/R600/unsupported-cc.ll
+@@ -0,0 +1,83 @@
++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; These tests are for condition codes that are not supported by the hardware
++
++; CHECK: @slt
++; CHECK: SETGT_INT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 5(7.006492e-45)
++define void @slt(i32 addrspace(1)* %out, i32 %in) {
++entry:
++ %0 = icmp slt i32 %in, 5
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @ult_i32
++; CHECK: SETGT_UINT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 5(7.006492e-45)
++define void @ult_i32(i32 addrspace(1)* %out, i32 %in) {
++entry:
++ %0 = icmp ult i32 %in, 5
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @ult_float
++; CHECK: SETGT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @ult_float(float addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ult float %in, 5.0
++ %1 = select i1 %0, float 1.0, float 0.0
++ store float %1, float addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @olt
++; CHECK: SETGT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @olt(float addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp olt float %in, 5.0
++ %1 = select i1 %0, float 1.0, float 0.0
++ store float %1, float addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @sle
++; CHECK: SETGT_INT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 6(8.407791e-45)
++define void @sle(i32 addrspace(1)* %out, i32 %in) {
++entry:
++ %0 = icmp sle i32 %in, 5
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @ule_i32
++; CHECK: SETGT_UINT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 6(8.407791e-45)
++define void @ule_i32(i32 addrspace(1)* %out, i32 %in) {
++entry:
++ %0 = icmp ule i32 %in, 5
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @ule_float
++; CHECK: SETGE T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @ule_float(float addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ule float %in, 5.0
++ %1 = select i1 %0, float 1.0, float 0.0
++ store float %1, float addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @ole
++; CHECK: SETGE T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @ole(float addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ole float %in, 5.0
++ %1 = select i1 %0, float 1.0, float 0.0
++ store float %1, float addrspace(1)* %out
++ ret void
++}
diff --git a/test/CodeGen/R600/urem.v4i32.ll b/test/CodeGen/R600/urem.v4i32.ll
new file mode 100644
index 0000000..2e7388c
@@ -22705,15 +24860,13 @@ index 0000000..2e7388c
+}
diff --git a/test/CodeGen/R600/vec4-expand.ll b/test/CodeGen/R600/vec4-expand.ll
new file mode 100644
-index 0000000..47cbf82
+index 0000000..8f62bc6
--- /dev/null
+++ b/test/CodeGen/R600/vec4-expand.ll
-@@ -0,0 +1,52 @@
-+; There are bugs in the DAGCombiner that prevent this test from passing.
-+; XFAIL: *
-+
+@@ -0,0 +1,53 @@
+; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+
++; CHECK: @fp_to_sint
+; CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
@@ -22726,6 +24879,7 @@ index 0000000..47cbf82
+ ret void
+}
+
++; CHECK: @fp_to_uint
+; CHECK: FLT_TO_UINT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: FLT_TO_UINT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: FLT_TO_UINT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
@@ -22738,6 +24892,7 @@ index 0000000..47cbf82
+ ret void
+}
+
++; CHECK: @sint_to_fp
+; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
@@ -22750,6 +24905,7 @@ index 0000000..47cbf82
+ ret void
+}
+
++; CHECK: @uint_to_fp
+; CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
@@ -22804,6 +24960,15 @@ index 0000000..62cdcf5
+declare <4 x float> @llvm.SI.vs.load.input(<4 x i32>, i32, i32)
+
+declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)
---
-1.8.0.2
-
+diff --git a/test/CodeGen/X86/cvtv2f32.ll b/test/CodeGen/X86/cvtv2f32.ll
+index 466b096..d11bb9e 100644
+--- a/test/CodeGen/X86/cvtv2f32.ll
++++ b/test/CodeGen/X86/cvtv2f32.ll
+@@ -1,3 +1,7 @@
++; A bug fix in the DAGCombiner made this test fail, so marking as xfail
++; until this can be investigated further.
++; XFAIL: *
++
+ ; RUN: llc < %s -mtriple=i686-linux-pc -mcpu=corei7 | FileCheck %s
+
+ define <2 x float> @foo(i32 %x, i32 %y, <2 x float> %v) {

diff --git a/sys-devel/llvm/llvm-3.2.ebuild b/sys-devel/llvm/llvm-3.2.ebuild
index 7171bfc..ceb16bb 100644
--- a/sys-devel/llvm/llvm-3.2.ebuild
+++ b/sys-devel/llvm/llvm-3.2.ebuild
@@ -1,33 +1,38 @@
-# Copyright 1999-2012 Gentoo Foundation
+# Copyright 1999-2013 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
-# $Header: /var/cvsroot/gentoo-x86/sys-devel/llvm/llvm-3.2.ebuild,v 1.1 2012/12/21 09:18:12 voyageur Exp $
+# $Header: /var/cvsroot/gentoo-x86/sys-devel/llvm/llvm-3.2.ebuild,v 1.6 2013/02/27 06:02:15 zmedico Exp $

EAPI=5
-PYTHON_DEPEND="2"
-inherit eutils flag-o-matic multilib toolchain-funcs python pax-utils
+
+# pypy gives me around 1700 unresolved tests due to open file limit
+# being exceeded. probably GC does not close them fast enough.
+PYTHON_COMPAT=( python{2_5,2_6,2_7} )
+
+inherit eutils flag-o-matic multilib python-any-r1 toolchain-funcs pax-utils

DESCRIPTION="Low Level Virtual Machine"
HOMEPAGE="http://llvm.org/"
-SRC_URI="http://llvm.org/releases/${PV}/${P}.src.tar.gz"
+SRC_URI="http://llvm.org/releases/${PV}/${P}.src.tar.gz
+ !doc? ( http://dev.gentoo.org/~voyageur/distfiles/${P}-manpages.tar.bz2 )"

LICENSE="UoI-NCSA"
SLOT="0"
-KEYWORDS="~amd64 ~arm ~ppc ~x86 ~amd64-fbsd ~x86-fbsd ~x64-freebsd ~amd64-linux ~x86-linux ~ppc-macos ~x64-macos"
+KEYWORDS="~amd64 ~arm ~ppc ~x86 ~amd64-fbsd ~x86-fbsd ~x64-freebsd ~amd64-linux ~arm-linux ~x86-linux ~ppc-macos ~x64-macos"
IUSE="debug doc gold +libffi multitarget ocaml test udis86 vim-syntax"

DEPEND="dev-lang/perl
- dev-python/sphinx
>=sys-devel/make-3.79
>=sys-devel/flex-2.5.4
>=sys-devel/bison-1.875d
|| ( >=sys-devel/gcc-3.0 >=sys-devel/gcc-apple-4.2.1 )
|| ( >=sys-devel/binutils-2.18 >=sys-devel/binutils-apple-3.2.3 )
+ doc? ( dev-python/sphinx )
gold? ( >=sys-devel/binutils-2.22[cxx] )
libffi? ( virtual/pkgconfig
virtual/libffi )
ocaml? ( dev-lang/ocaml )
- udis86? ( amd64? ( dev-libs/udis86[pic] )
- !amd64? ( dev-libs/udis86 ) )"
+ udis86? ( dev-libs/udis86[pic(+)] )
+ ${PYTHON_DEPS}"
RDEPEND="dev-lang/perl
libffi? ( virtual/libffi )
vim-syntax? ( || ( app-editors/vim app-editors/gvim ) )"
@@ -36,8 +41,7 @@ S=${WORKDIR}/${P}.src

pkg_setup() {
# Required for test and build
- python_set_active_version 2
- python_pkg_setup
+ python-any-r1_pkg_setup

# need to check if the active compiler is ok

@@ -64,12 +68,12 @@ pkg_setup() {

if [[ ${CHOST} == x86_64-* && ${broken_gcc_amd64} == *" ${version} "* ]];
then
- elog "Your version of gcc is known to miscompile llvm in amd64"
- elog "architectures. Check"
- elog "http://www.llvm.org/docs/GettingStarted.html for possible"
- elog "solutions."
+ elog "Your version of gcc is known to miscompile llvm in amd64"
+ elog "architectures. Check"
+ elog "http://www.llvm.org/docs/GettingStarted.html for possible"
+ elog "solutions."
die "Your currently active version of gcc is known to miscompile llvm"
- fi
+ fi
}

src_prepare() {
@@ -96,12 +100,9 @@ src_prepare() {
sed -e "/NO_INSTALL = 1/s/^/#/" -i utils/FileCheck/Makefile \
|| die "FileCheck Makefile sed failed"

- # Specify python version
- python_convert_shebangs -r 2 test/Scripts
-
epatch "${FILESDIR}"/${PN}-3.2-nodoctargz.patch
epatch "${FILESDIR}"/${PN}-3.0-PPC_macro.patch
- epatch "${FILESDIR}"/0001-Add-R600-backend.patch
+ epatch "${FILESDIR}"/R600-Mesa-9.1.patch

# User patches
epatch_user
@@ -150,20 +151,28 @@ src_configure() {
src_compile() {
emake VERBOSE=1 KEEP_SYMBOLS=1 REQUIRES_RTTI=1

- emake -C docs -f Makefile.sphinx man
- use doc && emake -C docs -f Makefile.sphinx html
+ if use doc; then
+ emake -C docs -f Makefile.sphinx man html
+ fi
+ # emake -C docs -f Makefile.sphinx html

pax-mark m Release/bin/lli
if use test; then
pax-mark m unittests/ExecutionEngine/JIT/Release/JITTests
+ pax-mark m unittests/ExecutionEngine/MCJIT/Release/MCJITTests
+ pax-mark m unittests/Support/Release/SupportTests
fi
}

src_install() {
emake KEEP_SYMBOLS=1 DESTDIR="${D}" install

- doman docs/_build/man/*.1
- use doc && dohtml -r docs/_build/html/
+ if use doc; then
+ doman docs/_build/man/*.1
+ dohtml -r docs/_build/html/
+ else
+ doman "${WORKDIR}"/${P}-manpages/*.1
+ fi

if use vim-syntax; then
insinto /usr/share/vim/vimfiles/syntax

diff --git a/sys-devel/llvm/metadata.xml b/sys-devel/llvm/metadata.xml
index e5a362b..38e16d8 100644
--- a/sys-devel/llvm/metadata.xml
+++ b/sys-devel/llvm/metadata.xml
@@ -16,7 +16,6 @@
4. LLVM does not imply things that you would expect from a high-level virtual machine. It does not require garbage collection or run-time code generation (In fact, LLVM makes a great static compiler!). Note that optional LLVM components can be used to build high-level virtual machines and other systems that need these services.</longdescription>
<use>
<flag name='gold'>Build the gold linker plugin</flag>
- <flag name='llvm-gcc'>Build LLVM with <pkg>sys-devel/llvm-gcc</pkg></flag>
<flag name='multitarget'>Build all host targets (default: host only)</flag>
<flag name='udis86'>Enable support for <pkg>dev-libs/udis86</pkg> disassembler library</flag>
</use>