Gentoo Archives: gentoo-commits

From: Alexey Shvetsov <alexxy@g.o>
To: gentoo-commits@l.g.o
Subject: [gentoo-commits] proj/x11:opencl commit in: sys-devel/llvm/files/, sys-devel/llvm/
Date: Tue, 05 Mar 2013 05:38:28
Message-Id: 1362461694.9597d3a8e121c0e0961de5642b1f550700dc8496.alexxy@gentoo
1 commit: 9597d3a8e121c0e0961de5642b1f550700dc8496
2 Author: Alexey Shvetsov <alexxy <AT> gentoo <DOT> org>
3 AuthorDate: Tue Mar 5 05:34:54 2013 +0000
4 Commit: Alexey Shvetsov <alexxy <AT> gentoo <DOT> org>
5 CommitDate: Tue Mar 5 05:34:54 2013 +0000
6 URL: http://git.overlays.gentoo.org/gitweb/?p=proj/x11.git;a=commit;h=9597d3a8
7
8 Update llvm R600 patch
9
10 Package-Manager: portage-2.2.0_alpha166
11 RepoMan-Options: --force
12
13 ---
14 ...-Add-R600-backend.patch => R600-Mesa-9.1.patch} | 7809 +++++++++++++-------
15 sys-devel/llvm/llvm-3.2.ebuild | 57 +-
16 sys-devel/llvm/metadata.xml | 1 -
17 3 files changed, 5020 insertions(+), 2847 deletions(-)
18
19 diff --git a/sys-devel/llvm/files/0001-Add-R600-backend.patch b/sys-devel/llvm/files/R600-Mesa-9.1.patch
20 similarity index 81%
21 rename from sys-devel/llvm/files/0001-Add-R600-backend.patch
22 rename to sys-devel/llvm/files/R600-Mesa-9.1.patch
23 index 4ebe499..9b9e1f5 100644
24 --- a/sys-devel/llvm/files/0001-Add-R600-backend.patch
25 +++ b/sys-devel/llvm/files/R600-Mesa-9.1.patch
26 @@ -1,517 +1,46 @@
27 -From 07d146158af424e4c0aa85a3de49516d97affbb9 Mon Sep 17 00:00:00 2001
28 -From: Tom Stellard <thomas.stellard@×××.com>
29 -Date: Tue, 11 Dec 2012 21:25:42 +0000
30 -Subject: [PATCH] Add R600 backend
31 -MIME-Version: 1.0
32 -Content-Type: text/plain; charset=UTF-8
33 -Content-Transfer-Encoding: 8bit
34 -
35 -A new backend supporting AMD GPUs: Radeon HD2XXX - HD7XXX
36 -
37 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169915 91177308-0d34-0410-b5e6-96231b3b80d8
38 -
39 -Conflicts:
40 - lib/Target/LLVMBuild.txt
41 -
42 -[CMake] Fixup R600.
43 -
44 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169962 91177308-0d34-0410-b5e6-96231b3b80d8
45 -
46 -Avoid setIsInsideBundle in Target/R600.
47 -
48 -This function is going to be removed.
49 -
50 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170064 91177308-0d34-0410-b5e6-96231b3b80d8
51 -
52 -R600: remove nonsense setPrefLoopAlignment
53 -
54 -The Align parameter is a power of two, so 16 results in 64K
55 -alignment. Additional to that even 16 byte alignment doesn't
56 -make any sense, so just remove it.
57 -
58 -Patch by: Christian König
59 -
60 -Reviewed-by: Tom Stellard <thomas.stellard@×××.com>
61 -Tested-by: Michel Dänzer <michel.daenzer@×××.com>
62 -Signed-off-by: Christian König <deathsimple@××××××××.de>
63 -
64 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170341 91177308-0d34-0410-b5e6-96231b3b80d8
65 -
66 -R600: BB operand support for SI
67 -
68 -Patch by: Christian König
69 -
70 -Reviewed-by: Tom Stellard <thomas.stellard@×××.com>
71 -Tested-by: Michel Dänzer <michel.daenzer@×××.com>
72 -Signed-off-by: Christian König <deathsimple@××××××××.de>
73 -
74 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170342 91177308-0d34-0410-b5e6-96231b3b80d8
75 -
76 -R600: enable S_*N2_* instructions
77 -
78 -They seem to work fine.
79 -
80 -Patch by: Christian König
81 -
82 -Reviewed-by: Tom Stellard <thomas.stellard@×××.com>
83 -Tested-by: Michel Dänzer <michel.daenzer@×××.com>
84 -Signed-off-by: Christian König <deathsimple@××××××××.de>
85 -
86 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170343 91177308-0d34-0410-b5e6-96231b3b80d8
87 -
88 -R600: New control flow for SI v2
89 -
90 -This patch replaces the control flow handling with a new
91 -pass which structurize the graph before transforming it to
92 -machine instruction. This has a couple of different advantages
93 -and currently fixes 20 piglit tests without a single regression.
94 -
95 -It is now a general purpose transformation that could be not
96 -only be used for SI/R6xx, but also for other hardware
97 -implementations that use a form of structurized control flow.
98 -
99 -v2: further cleanup, fixes and documentation
100 -
101 -Patch by: Christian König
102 -
103 -Signed-off-by: Christian König <deathsimple@××××××××.de>
104 -Reviewed-by: Tom Stellard <thomas.stellard@×××.com>
105 -Tested-by: Michel Dänzer <michel.daenzer@×××.com>
106 -
107 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170591 91177308-0d34-0410-b5e6-96231b3b80d8
108 -
109 -R600: control flow optimization
110 -
111 -Branch if we have enough instructions so that it makes sense.
112 -Also remove branches if they don't make sense.
113 -
114 -Patch by: Christian König
115 -
116 -Reviewed-by: Tom Stellard <thomas.stellard@×××.com>
117 -Tested-by: Michel Dänzer <michel.daenzer@×××.com>
118 -Signed-off-by: Christian König <deathsimple@××××××××.de>
119 -
120 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170592 91177308-0d34-0410-b5e6-96231b3b80d8
121 -
122 -R600: Remove unecessary VREG alignment.
123 -
124 -Unlike SGPRs VGPRs doesn't need to be aligned.
125 -
126 -Patch by: Christian König
127 -
128 -Reviewed-by: Tom Stellard <thomas.stellard@×××.com>
129 -Tested-by: Michel Dänzer <michel.daenzer@×××.com>
130 -Signed-off-by: Christian König <deathsimple@××××××××.de>
131 -
132 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170593 91177308-0d34-0410-b5e6-96231b3b80d8
133 -
134 -R600: Add entry in CODE_OWNERS.TXT
135 -
136 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170594 91177308-0d34-0410-b5e6-96231b3b80d8
137 -
138 -Conflicts:
139 - CODE_OWNERS.TXT
140 -
141 -Target/R600: Update MIB according to r170588.
142 -
143 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170620 91177308-0d34-0410-b5e6-96231b3b80d8
144 -
145 -R600: Expand vec4 INT <-> FP conversions
146 -
147 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170901 91177308-0d34-0410-b5e6-96231b3b80d8
148 -
149 -R600: Add SHADOWCUBE to TEX_SHADOW pattern
150 -
151 -Patch by: Vadim Girlin
152 -
153 -Reviewed-by: Michel Dänzer <michel.daenzer@×××.com>
154 -
155 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170921 91177308-0d34-0410-b5e6-96231b3b80d8
156 -
157 -R600: Fix MAX_UINT definition
158 -
159 -Patch by: Vadim Girlin
160 -
161 -Reviewed-by: Michel Dänzer <michel.daenzer@×××.com>
162 -
163 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170922 91177308-0d34-0410-b5e6-96231b3b80d8
164 -
165 -R600: Coding style - remove empty spaces from the beginning of functions
166 -
167 -No functionality change.
168 -
169 -git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170923 91177308-0d34-0410-b5e6-96231b3b80d8
170 ----
171 - CODE_OWNERS.TXT | 14 +
172 - include/llvm/Intrinsics.td | 1 +
173 - include/llvm/IntrinsicsR600.td | 36 +
174 - lib/Target/LLVMBuild.txt | 2 +-
175 - lib/Target/R600/AMDGPU.h | 49 +
176 - lib/Target/R600/AMDGPU.td | 40 +
177 - lib/Target/R600/AMDGPUAsmPrinter.cpp | 138 +
178 - lib/Target/R600/AMDGPUAsmPrinter.h | 44 +
179 - lib/Target/R600/AMDGPUCodeEmitter.h | 49 +
180 - lib/Target/R600/AMDGPUConvertToISA.cpp | 62 +
181 - lib/Target/R600/AMDGPUISelLowering.cpp | 417 +++
182 - lib/Target/R600/AMDGPUISelLowering.h | 144 +
183 - lib/Target/R600/AMDGPUInstrInfo.cpp | 257 ++
184 - lib/Target/R600/AMDGPUInstrInfo.h | 149 +
185 - lib/Target/R600/AMDGPUInstrInfo.td | 74 +
186 - lib/Target/R600/AMDGPUInstructions.td | 190 ++
187 - lib/Target/R600/AMDGPUIntrinsics.td | 62 +
188 - lib/Target/R600/AMDGPUMCInstLower.cpp | 83 +
189 - lib/Target/R600/AMDGPUMCInstLower.h | 34 +
190 - lib/Target/R600/AMDGPURegisterInfo.cpp | 51 +
191 - lib/Target/R600/AMDGPURegisterInfo.h | 63 +
192 - lib/Target/R600/AMDGPURegisterInfo.td | 22 +
193 - lib/Target/R600/AMDGPUStructurizeCFG.cpp | 714 +++++
194 - lib/Target/R600/AMDGPUSubtarget.cpp | 87 +
195 - lib/Target/R600/AMDGPUSubtarget.h | 65 +
196 - lib/Target/R600/AMDGPUTargetMachine.cpp | 142 +
197 - lib/Target/R600/AMDGPUTargetMachine.h | 70 +
198 - lib/Target/R600/AMDIL.h | 106 +
199 - lib/Target/R600/AMDIL7XXDevice.cpp | 115 +
200 - lib/Target/R600/AMDIL7XXDevice.h | 72 +
201 - lib/Target/R600/AMDILBase.td | 85 +
202 - lib/Target/R600/AMDILCFGStructurizer.cpp | 3049 ++++++++++++++++++++
203 - lib/Target/R600/AMDILDevice.cpp | 124 +
204 - lib/Target/R600/AMDILDevice.h | 117 +
205 - lib/Target/R600/AMDILDeviceInfo.cpp | 94 +
206 - lib/Target/R600/AMDILDeviceInfo.h | 88 +
207 - lib/Target/R600/AMDILDevices.h | 19 +
208 - lib/Target/R600/AMDILEvergreenDevice.cpp | 169 ++
209 - lib/Target/R600/AMDILEvergreenDevice.h | 93 +
210 - lib/Target/R600/AMDILFrameLowering.cpp | 47 +
211 - lib/Target/R600/AMDILFrameLowering.h | 40 +
212 - lib/Target/R600/AMDILISelDAGToDAG.cpp | 485 ++++
213 - lib/Target/R600/AMDILISelLowering.cpp | 651 +++++
214 - lib/Target/R600/AMDILInstrInfo.td | 208 ++
215 - lib/Target/R600/AMDILIntrinsicInfo.cpp | 79 +
216 - lib/Target/R600/AMDILIntrinsicInfo.h | 49 +
217 - lib/Target/R600/AMDILIntrinsics.td | 242 ++
218 - lib/Target/R600/AMDILNIDevice.cpp | 65 +
219 - lib/Target/R600/AMDILNIDevice.h | 57 +
220 - lib/Target/R600/AMDILPeepholeOptimizer.cpp | 1215 ++++++++
221 - lib/Target/R600/AMDILRegisterInfo.td | 107 +
222 - lib/Target/R600/AMDILSIDevice.cpp | 45 +
223 - lib/Target/R600/AMDILSIDevice.h | 39 +
224 - lib/Target/R600/CMakeLists.txt | 55 +
225 - lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp | 132 +
226 - lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h | 52 +
227 - lib/Target/R600/InstPrinter/CMakeLists.txt | 7 +
228 - lib/Target/R600/InstPrinter/LLVMBuild.txt | 24 +
229 - lib/Target/R600/InstPrinter/Makefile | 15 +
230 - lib/Target/R600/LLVMBuild.txt | 32 +
231 - lib/Target/R600/MCTargetDesc/AMDGPUAsmBackend.cpp | 90 +
232 - lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.cpp | 85 +
233 - lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.h | 30 +
234 - lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h | 60 +
235 - .../R600/MCTargetDesc/AMDGPUMCTargetDesc.cpp | 113 +
236 - lib/Target/R600/MCTargetDesc/AMDGPUMCTargetDesc.h | 55 +
237 - lib/Target/R600/MCTargetDesc/CMakeLists.txt | 10 +
238 - lib/Target/R600/MCTargetDesc/LLVMBuild.txt | 23 +
239 - lib/Target/R600/MCTargetDesc/Makefile | 16 +
240 - lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp | 575 ++++
241 - lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp | 298 ++
242 - lib/Target/R600/Makefile | 23 +
243 - lib/Target/R600/Processors.td | 29 +
244 - lib/Target/R600/R600Defines.h | 79 +
245 - lib/Target/R600/R600ExpandSpecialInstrs.cpp | 334 +++
246 - lib/Target/R600/R600ISelLowering.cpp | 909 ++++++
247 - lib/Target/R600/R600ISelLowering.h | 72 +
248 - lib/Target/R600/R600InstrInfo.cpp | 665 +++++
249 - lib/Target/R600/R600InstrInfo.h | 169 ++
250 - lib/Target/R600/R600Instructions.td | 1724 +++++++++++
251 - lib/Target/R600/R600Intrinsics.td | 32 +
252 - lib/Target/R600/R600MachineFunctionInfo.cpp | 34 +
253 - lib/Target/R600/R600MachineFunctionInfo.h | 39 +
254 - lib/Target/R600/R600RegisterInfo.cpp | 89 +
255 - lib/Target/R600/R600RegisterInfo.h | 55 +
256 - lib/Target/R600/R600RegisterInfo.td | 107 +
257 - lib/Target/R600/R600Schedule.td | 36 +
258 - lib/Target/R600/SIAnnotateControlFlow.cpp | 330 +++
259 - lib/Target/R600/SIAssignInterpRegs.cpp | 152 +
260 - lib/Target/R600/SIISelLowering.cpp | 512 ++++
261 - lib/Target/R600/SIISelLowering.h | 62 +
262 - lib/Target/R600/SIInstrFormats.td | 146 +
263 - lib/Target/R600/SIInstrInfo.cpp | 90 +
264 - lib/Target/R600/SIInstrInfo.h | 62 +
265 - lib/Target/R600/SIInstrInfo.td | 589 ++++
266 - lib/Target/R600/SIInstructions.td | 1351 +++++++++
267 - lib/Target/R600/SIIntrinsics.td | 52 +
268 - lib/Target/R600/SILowerControlFlow.cpp | 331 +++
269 - lib/Target/R600/SILowerLiteralConstants.cpp | 108 +
270 - lib/Target/R600/SIMachineFunctionInfo.cpp | 20 +
271 - lib/Target/R600/SIMachineFunctionInfo.h | 34 +
272 - lib/Target/R600/SIRegisterInfo.cpp | 48 +
273 - lib/Target/R600/SIRegisterInfo.h | 47 +
274 - lib/Target/R600/SIRegisterInfo.td | 167 ++
275 - lib/Target/R600/SISchedule.td | 15 +
276 - lib/Target/R600/TargetInfo/AMDGPUTargetInfo.cpp | 26 +
277 - lib/Target/R600/TargetInfo/CMakeLists.txt | 7 +
278 - lib/Target/R600/TargetInfo/LLVMBuild.txt | 23 +
279 - lib/Target/R600/TargetInfo/Makefile | 15 +
280 - test/CodeGen/R600/add.v4i32.ll | 15 +
281 - test/CodeGen/R600/and.v4i32.ll | 15 +
282 - test/CodeGen/R600/fabs.ll | 16 +
283 - test/CodeGen/R600/fadd.ll | 16 +
284 - test/CodeGen/R600/fadd.v4f32.ll | 15 +
285 - test/CodeGen/R600/fcmp-cnd.ll | 14 +
286 - test/CodeGen/R600/fcmp-cnde-int-args.ll | 16 +
287 - test/CodeGen/R600/fcmp.ll | 16 +
288 - test/CodeGen/R600/fdiv.v4f32.ll | 19 +
289 - test/CodeGen/R600/floor.ll | 16 +
290 - test/CodeGen/R600/fmax.ll | 16 +
291 - test/CodeGen/R600/fmin.ll | 16 +
292 - test/CodeGen/R600/fmul.ll | 16 +
293 - test/CodeGen/R600/fmul.v4f32.ll | 15 +
294 - test/CodeGen/R600/fsub.ll | 17 +
295 - test/CodeGen/R600/fsub.v4f32.ll | 15 +
296 - test/CodeGen/R600/i8_to_double_to_float.ll | 11 +
297 - test/CodeGen/R600/icmp-select-sete-reverse-args.ll | 18 +
298 - test/CodeGen/R600/lit.local.cfg | 13 +
299 - test/CodeGen/R600/literals.ll | 30 +
300 - test/CodeGen/R600/llvm.AMDGPU.mul.ll | 17 +
301 - test/CodeGen/R600/llvm.AMDGPU.trunc.ll | 16 +
302 - test/CodeGen/R600/llvm.cos.ll | 16 +
303 - test/CodeGen/R600/llvm.pow.ll | 19 +
304 - test/CodeGen/R600/llvm.sin.ll | 16 +
305 - test/CodeGen/R600/load.constant_addrspace.f32.ll | 9 +
306 - test/CodeGen/R600/load.i8.ll | 10 +
307 - test/CodeGen/R600/reciprocal.ll | 16 +
308 - test/CodeGen/R600/sdiv.ll | 21 +
309 - test/CodeGen/R600/selectcc-icmp-select-float.ll | 15 +
310 - test/CodeGen/R600/selectcc_cnde.ll | 11 +
311 - test/CodeGen/R600/selectcc_cnde_int.ll | 11 +
312 - test/CodeGen/R600/setcc.v4i32.ll | 12 +
313 - test/CodeGen/R600/short-args.ll | 37 +
314 - test/CodeGen/R600/store.v4f32.ll | 9 +
315 - test/CodeGen/R600/store.v4i32.ll | 9 +
316 - test/CodeGen/R600/udiv.v4i32.ll | 15 +
317 - test/CodeGen/R600/urem.v4i32.ll | 15 +
318 - test/CodeGen/R600/vec4-expand.ll | 52 +
319 - test/CodeGen/SI/sanity.ll | 37 +
320 - 149 files changed, 21461 insertions(+), 1 deletion(-)
321 - create mode 100644 include/llvm/IntrinsicsR600.td
322 - create mode 100644 lib/Target/R600/AMDGPU.h
323 - create mode 100644 lib/Target/R600/AMDGPU.td
324 - create mode 100644 lib/Target/R600/AMDGPUAsmPrinter.cpp
325 - create mode 100644 lib/Target/R600/AMDGPUAsmPrinter.h
326 - create mode 100644 lib/Target/R600/AMDGPUCodeEmitter.h
327 - create mode 100644 lib/Target/R600/AMDGPUConvertToISA.cpp
328 - create mode 100644 lib/Target/R600/AMDGPUISelLowering.cpp
329 - create mode 100644 lib/Target/R600/AMDGPUISelLowering.h
330 - create mode 100644 lib/Target/R600/AMDGPUInstrInfo.cpp
331 - create mode 100644 lib/Target/R600/AMDGPUInstrInfo.h
332 - create mode 100644 lib/Target/R600/AMDGPUInstrInfo.td
333 - create mode 100644 lib/Target/R600/AMDGPUInstructions.td
334 - create mode 100644 lib/Target/R600/AMDGPUIntrinsics.td
335 - create mode 100644 lib/Target/R600/AMDGPUMCInstLower.cpp
336 - create mode 100644 lib/Target/R600/AMDGPUMCInstLower.h
337 - create mode 100644 lib/Target/R600/AMDGPURegisterInfo.cpp
338 - create mode 100644 lib/Target/R600/AMDGPURegisterInfo.h
339 - create mode 100644 lib/Target/R600/AMDGPURegisterInfo.td
340 - create mode 100644 lib/Target/R600/AMDGPUStructurizeCFG.cpp
341 - create mode 100644 lib/Target/R600/AMDGPUSubtarget.cpp
342 - create mode 100644 lib/Target/R600/AMDGPUSubtarget.h
343 - create mode 100644 lib/Target/R600/AMDGPUTargetMachine.cpp
344 - create mode 100644 lib/Target/R600/AMDGPUTargetMachine.h
345 - create mode 100644 lib/Target/R600/AMDIL.h
346 - create mode 100644 lib/Target/R600/AMDIL7XXDevice.cpp
347 - create mode 100644 lib/Target/R600/AMDIL7XXDevice.h
348 - create mode 100644 lib/Target/R600/AMDILBase.td
349 - create mode 100644 lib/Target/R600/AMDILCFGStructurizer.cpp
350 - create mode 100644 lib/Target/R600/AMDILDevice.cpp
351 - create mode 100644 lib/Target/R600/AMDILDevice.h
352 - create mode 100644 lib/Target/R600/AMDILDeviceInfo.cpp
353 - create mode 100644 lib/Target/R600/AMDILDeviceInfo.h
354 - create mode 100644 lib/Target/R600/AMDILDevices.h
355 - create mode 100644 lib/Target/R600/AMDILEvergreenDevice.cpp
356 - create mode 100644 lib/Target/R600/AMDILEvergreenDevice.h
357 - create mode 100644 lib/Target/R600/AMDILFrameLowering.cpp
358 - create mode 100644 lib/Target/R600/AMDILFrameLowering.h
359 - create mode 100644 lib/Target/R600/AMDILISelDAGToDAG.cpp
360 - create mode 100644 lib/Target/R600/AMDILISelLowering.cpp
361 - create mode 100644 lib/Target/R600/AMDILInstrInfo.td
362 - create mode 100644 lib/Target/R600/AMDILIntrinsicInfo.cpp
363 - create mode 100644 lib/Target/R600/AMDILIntrinsicInfo.h
364 - create mode 100644 lib/Target/R600/AMDILIntrinsics.td
365 - create mode 100644 lib/Target/R600/AMDILNIDevice.cpp
366 - create mode 100644 lib/Target/R600/AMDILNIDevice.h
367 - create mode 100644 lib/Target/R600/AMDILPeepholeOptimizer.cpp
368 - create mode 100644 lib/Target/R600/AMDILRegisterInfo.td
369 - create mode 100644 lib/Target/R600/AMDILSIDevice.cpp
370 - create mode 100644 lib/Target/R600/AMDILSIDevice.h
371 - create mode 100644 lib/Target/R600/CMakeLists.txt
372 - create mode 100644 lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp
373 - create mode 100644 lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h
374 - create mode 100644 lib/Target/R600/InstPrinter/CMakeLists.txt
375 - create mode 100644 lib/Target/R600/InstPrinter/LLVMBuild.txt
376 - create mode 100644 lib/Target/R600/InstPrinter/Makefile
377 - create mode 100644 lib/Target/R600/LLVMBuild.txt
378 - create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUAsmBackend.cpp
379 - create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.cpp
380 - create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.h
381 - create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h
382 - create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCTargetDesc.cpp
383 - create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCTargetDesc.h
384 - create mode 100644 lib/Target/R600/MCTargetDesc/CMakeLists.txt
385 - create mode 100644 lib/Target/R600/MCTargetDesc/LLVMBuild.txt
386 - create mode 100644 lib/Target/R600/MCTargetDesc/Makefile
387 - create mode 100644 lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
388 - create mode 100644 lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp
389 - create mode 100644 lib/Target/R600/Makefile
390 - create mode 100644 lib/Target/R600/Processors.td
391 - create mode 100644 lib/Target/R600/R600Defines.h
392 - create mode 100644 lib/Target/R600/R600ExpandSpecialInstrs.cpp
393 - create mode 100644 lib/Target/R600/R600ISelLowering.cpp
394 - create mode 100644 lib/Target/R600/R600ISelLowering.h
395 - create mode 100644 lib/Target/R600/R600InstrInfo.cpp
396 - create mode 100644 lib/Target/R600/R600InstrInfo.h
397 - create mode 100644 lib/Target/R600/R600Instructions.td
398 - create mode 100644 lib/Target/R600/R600Intrinsics.td
399 - create mode 100644 lib/Target/R600/R600MachineFunctionInfo.cpp
400 - create mode 100644 lib/Target/R600/R600MachineFunctionInfo.h
401 - create mode 100644 lib/Target/R600/R600RegisterInfo.cpp
402 - create mode 100644 lib/Target/R600/R600RegisterInfo.h
403 - create mode 100644 lib/Target/R600/R600RegisterInfo.td
404 - create mode 100644 lib/Target/R600/R600Schedule.td
405 - create mode 100644 lib/Target/R600/SIAnnotateControlFlow.cpp
406 - create mode 100644 lib/Target/R600/SIAssignInterpRegs.cpp
407 - create mode 100644 lib/Target/R600/SIISelLowering.cpp
408 - create mode 100644 lib/Target/R600/SIISelLowering.h
409 - create mode 100644 lib/Target/R600/SIInstrFormats.td
410 - create mode 100644 lib/Target/R600/SIInstrInfo.cpp
411 - create mode 100644 lib/Target/R600/SIInstrInfo.h
412 - create mode 100644 lib/Target/R600/SIInstrInfo.td
413 - create mode 100644 lib/Target/R600/SIInstructions.td
414 - create mode 100644 lib/Target/R600/SIIntrinsics.td
415 - create mode 100644 lib/Target/R600/SILowerControlFlow.cpp
416 - create mode 100644 lib/Target/R600/SILowerLiteralConstants.cpp
417 - create mode 100644 lib/Target/R600/SIMachineFunctionInfo.cpp
418 - create mode 100644 lib/Target/R600/SIMachineFunctionInfo.h
419 - create mode 100644 lib/Target/R600/SIRegisterInfo.cpp
420 - create mode 100644 lib/Target/R600/SIRegisterInfo.h
421 - create mode 100644 lib/Target/R600/SIRegisterInfo.td
422 - create mode 100644 lib/Target/R600/SISchedule.td
423 - create mode 100644 lib/Target/R600/TargetInfo/AMDGPUTargetInfo.cpp
424 - create mode 100644 lib/Target/R600/TargetInfo/CMakeLists.txt
425 - create mode 100644 lib/Target/R600/TargetInfo/LLVMBuild.txt
426 - create mode 100644 lib/Target/R600/TargetInfo/Makefile
427 - create mode 100644 test/CodeGen/R600/add.v4i32.ll
428 - create mode 100644 test/CodeGen/R600/and.v4i32.ll
429 - create mode 100644 test/CodeGen/R600/fabs.ll
430 - create mode 100644 test/CodeGen/R600/fadd.ll
431 - create mode 100644 test/CodeGen/R600/fadd.v4f32.ll
432 - create mode 100644 test/CodeGen/R600/fcmp-cnd.ll
433 - create mode 100644 test/CodeGen/R600/fcmp-cnde-int-args.ll
434 - create mode 100644 test/CodeGen/R600/fcmp.ll
435 - create mode 100644 test/CodeGen/R600/fdiv.v4f32.ll
436 - create mode 100644 test/CodeGen/R600/floor.ll
437 - create mode 100644 test/CodeGen/R600/fmax.ll
438 - create mode 100644 test/CodeGen/R600/fmin.ll
439 - create mode 100644 test/CodeGen/R600/fmul.ll
440 - create mode 100644 test/CodeGen/R600/fmul.v4f32.ll
441 - create mode 100644 test/CodeGen/R600/fsub.ll
442 - create mode 100644 test/CodeGen/R600/fsub.v4f32.ll
443 - create mode 100644 test/CodeGen/R600/i8_to_double_to_float.ll
444 - create mode 100644 test/CodeGen/R600/icmp-select-sete-reverse-args.ll
445 - create mode 100644 test/CodeGen/R600/lit.local.cfg
446 - create mode 100644 test/CodeGen/R600/literals.ll
447 - create mode 100644 test/CodeGen/R600/llvm.AMDGPU.mul.ll
448 - create mode 100644 test/CodeGen/R600/llvm.AMDGPU.trunc.ll
449 - create mode 100644 test/CodeGen/R600/llvm.cos.ll
450 - create mode 100644 test/CodeGen/R600/llvm.pow.ll
451 - create mode 100644 test/CodeGen/R600/llvm.sin.ll
452 - create mode 100644 test/CodeGen/R600/load.constant_addrspace.f32.ll
453 - create mode 100644 test/CodeGen/R600/load.i8.ll
454 - create mode 100644 test/CodeGen/R600/reciprocal.ll
455 - create mode 100644 test/CodeGen/R600/sdiv.ll
456 - create mode 100644 test/CodeGen/R600/selectcc-icmp-select-float.ll
457 - create mode 100644 test/CodeGen/R600/selectcc_cnde.ll
458 - create mode 100644 test/CodeGen/R600/selectcc_cnde_int.ll
459 - create mode 100644 test/CodeGen/R600/setcc.v4i32.ll
460 - create mode 100644 test/CodeGen/R600/short-args.ll
461 - create mode 100644 test/CodeGen/R600/store.v4f32.ll
462 - create mode 100644 test/CodeGen/R600/store.v4i32.ll
463 - create mode 100644 test/CodeGen/R600/udiv.v4i32.ll
464 - create mode 100644 test/CodeGen/R600/urem.v4i32.ll
465 - create mode 100644 test/CodeGen/R600/vec4-expand.ll
466 - create mode 100644 test/CodeGen/SI/sanity.ll
467 -
468 -diff --git a/CODE_OWNERS.TXT b/CODE_OWNERS.TXT
469 -index fd7bcda..90285be 100644
470 ---- a/CODE_OWNERS.TXT
471 -+++ b/CODE_OWNERS.TXT
472 -@@ -49,3 +49,17 @@ D: Register allocators and TableGen
473 - N: Duncan Sands
474 - E: baldrick@××××.fr
475 - D: DragonEgg
476 -+
477 -+N: Tom Stellard
478 -+E: thomas.stellard@×××.com
479 -+E: mesa-dev@×××××××××××××××××.org
480 -+D: R600 Backend
481 -+
482 -+N: Andrew Trick
483 -+E: atrick@×××××.com
484 -+D: IndVar Simplify, Loop Strength Reduction, Instruction Scheduling
485 -+
486 -+N: Bill Wendling
487 -+E: wendling@×××××.com
488 -+D: libLTO & IR Linker
489 -+
490 -diff --git a/include/llvm/Intrinsics.td b/include/llvm/Intrinsics.td
491 -index 2e1597f..059bd80 100644
492 ---- a/include/llvm/Intrinsics.td
493 -+++ b/include/llvm/Intrinsics.td
494 -@@ -469,3 +469,4 @@ include "llvm/IntrinsicsXCore.td"
495 - include "llvm/IntrinsicsHexagon.td"
496 - include "llvm/IntrinsicsNVVM.td"
497 - include "llvm/IntrinsicsMips.td"
498 -+include "llvm/IntrinsicsR600.td"
499 -diff --git a/include/llvm/IntrinsicsR600.td b/include/llvm/IntrinsicsR600.td
500 -new file mode 100644
501 -index 0000000..ecb5668
502 ---- /dev/null
503 -+++ b/include/llvm/IntrinsicsR600.td
504 -@@ -0,0 +1,36 @@
505 -+//===- IntrinsicsR600.td - Defines R600 intrinsics ---------*- tablegen -*-===//
506 -+//
507 -+// The LLVM Compiler Infrastructure
508 -+//
509 -+// This file is distributed under the University of Illinois Open Source
510 -+// License. See LICENSE.TXT for details.
511 -+//
512 -+//===----------------------------------------------------------------------===//
513 -+//
514 -+// This file defines all of the R600-specific intrinsics.
515 -+//
516 -+//===----------------------------------------------------------------------===//
517 -+
518 -+let TargetPrefix = "r600" in {
519 -+
520 -+class R600ReadPreloadRegisterIntrinsic<string name>
521 -+ : Intrinsic<[llvm_i32_ty], [], [IntrNoMem]>,
522 -+ GCCBuiltin<name>;
523 -+
524 -+multiclass R600ReadPreloadRegisterIntrinsic_xyz<string prefix> {
525 -+ def _x : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_x")>;
526 -+ def _y : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_y")>;
527 -+ def _z : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_z")>;
528 -+}
529 -+
530 -+defm int_r600_read_global_size : R600ReadPreloadRegisterIntrinsic_xyz <
531 -+ "__builtin_r600_read_global_size">;
532 -+defm int_r600_read_local_size : R600ReadPreloadRegisterIntrinsic_xyz <
533 -+ "__builtin_r600_read_local_size">;
534 -+defm int_r600_read_ngroups : R600ReadPreloadRegisterIntrinsic_xyz <
535 -+ "__builtin_r600_read_ngroups">;
536 -+defm int_r600_read_tgid : R600ReadPreloadRegisterIntrinsic_xyz <
537 -+ "__builtin_r600_read_tgid">;
538 -+defm int_r600_read_tidig : R600ReadPreloadRegisterIntrinsic_xyz <
539 -+ "__builtin_r600_read_tidig">;
540 -+} // End TargetPrefix = "r600"
541 +diff --git a/autoconf/configure.ac b/autoconf/configure.ac
542 +index 7715531..1330c36 100644
543 +--- a/autoconf/configure.ac
544 ++++ b/autoconf/configure.ac
545 +@@ -751,6 +751,11 @@ AC_ARG_ENABLE([experimental-targets],AS_HELP_STRING([--enable-experimental-targe
546 +
547 + if test ${enableval} != "disable"
548 + then
549 ++ if test ${enableval} = "AMDGPU"
550 ++ then
551 ++ AC_MSG_ERROR([The AMDGPU target has been renamed to R600, please reconfigure with --enable-experimental-targets=R600])
552 ++ enableval="R600"
553 ++ fi
554 + TARGETS_TO_BUILD="$enableval $TARGETS_TO_BUILD"
555 + fi
556 +
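
The guard above turns the stale flag into an explicit configure-time failure: a tree still configured with --enable-experimental-targets=AMDGPU now stops immediately with the rename message instead of silently building nothing, and the build must be reconfigured with --enable-experimental-targets=R600. Note that the enableval="R600" assignment after AC_MSG_ERROR never executes, since AC_MSG_ERROR aborts the script; it is effectively dead code.
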
557 +diff --git a/configure b/configure
558 +index 4fa0705..02012b9 100755
559 +--- a/configure
560 ++++ b/configure
561 +@@ -5473,6 +5473,13 @@ fi
562 +
563 + if test ${enableval} != "disable"
564 + then
565 ++ if test ${enableval} = "AMDGPU"
566 ++ then
567 ++ { { echo "$as_me:$LINENO: error: The AMDGPU target has been renamed to R600, please reconfigure with --enable-experimental-targets=R600" >&5
568 ++echo "$as_me: error: The AMDGPU target has been renamed to R600, please reconfigure with --enable-experimental-targets=R600" >&2;}
569 ++ { (exit 1); exit 1; }; }
570 ++ enableval="R600"
571 ++ fi
572 + TARGETS_TO_BUILD="$enableval $TARGETS_TO_BUILD"
573 + fi
574 +
575 +@@ -10316,7 +10323,7 @@ else
576 + lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
577 + lt_status=$lt_dlunknown
578 + cat > conftest.$ac_ext <<EOF
579 +-#line 10317 "configure"
580 ++#line 10326 "configure"
581 + #include "confdefs.h"
582 +
583 + #if HAVE_DLFCN_H
584 diff --git a/lib/Target/LLVMBuild.txt b/lib/Target/LLVMBuild.txt
585 index 8995080..84c4111 100644
586 --- a/lib/Target/LLVMBuild.txt
587 @@ -527,10 +56,10 @@ index 8995080..84c4111 100644
588 ; with the best execution engine (the native JIT, if available, or the
589 diff --git a/lib/Target/R600/AMDGPU.h b/lib/Target/R600/AMDGPU.h
590 new file mode 100644
591 -index 0000000..0f5125d
592 +index 0000000..ba87918
593 --- /dev/null
594 +++ b/lib/Target/R600/AMDGPU.h
595 -@@ -0,0 +1,49 @@
596 +@@ -0,0 +1,51 @@
597 +//===-- AMDGPU.h - MachineFunction passes hw codegen --------------*- C++ -*-=//
598 +//
599 +// The LLVM Compiler Infrastructure
600 @@ -556,17 +85,19 @@ index 0000000..0f5125d
601 +// R600 Passes
602 +FunctionPass* createR600KernelParametersPass(const DataLayout *TD);
603 +FunctionPass *createR600ExpandSpecialInstrsPass(TargetMachine &tm);
604 ++FunctionPass *createR600LowerConstCopy(TargetMachine &tm);
605 +
606 +// SI Passes
607 +FunctionPass *createSIAnnotateControlFlowPass();
608 +FunctionPass *createSIAssignInterpRegsPass(TargetMachine &tm);
609 +FunctionPass *createSILowerControlFlowPass(TargetMachine &tm);
610 +FunctionPass *createSICodeEmitterPass(formatted_raw_ostream &OS);
611 -+FunctionPass *createSILowerLiteralConstantsPass(TargetMachine &tm);
612 ++FunctionPass *createSIInsertWaits(TargetMachine &tm);
613 +
614 +// Passes common to R600 and SI
615 +Pass *createAMDGPUStructurizeCFGPass();
616 +FunctionPass *createAMDGPUConvertToISAPass(TargetMachine &tm);
617 ++FunctionPass* createAMDGPUIndirectAddressingPass(TargetMachine &tm);
618 +
619 +} // End namespace llvm
620 +
621 @@ -628,10 +159,10 @@ index 0000000..40f4741
622 +include "AMDGPUInstructions.td"
623 diff --git a/lib/Target/R600/AMDGPUAsmPrinter.cpp b/lib/Target/R600/AMDGPUAsmPrinter.cpp
624 new file mode 100644
625 -index 0000000..4553c45
626 +index 0000000..254e62e
627 --- /dev/null
628 +++ b/lib/Target/R600/AMDGPUAsmPrinter.cpp
629 -@@ -0,0 +1,138 @@
630 +@@ -0,0 +1,145 @@
631 +//===-- AMDGPUAsmPrinter.cpp - AMDGPU Assembly printer -------------------===//
632 +//
633 +// The LLVM Compiler Infrastructure
634 @@ -681,6 +212,9 @@ index 0000000..4553c45
635 +#endif
636 + }
637 + SetupMachineFunction(MF);
638 ++ if (OutStreamer.hasRawTextSupport()) {
639 ++ OutStreamer.EmitRawText("@" + MF.getName() + ":");
640 ++ }
641 + OutStreamer.SwitchSection(getObjFileLowering().getTextSection());
642 + if (STM.device()->getGeneration() > AMDGPUDeviceInfo::HD6XXX) {
643 + EmitProgramInfo(MF);
644 @@ -722,8 +256,6 @@ index 0000000..4553c45
645 + switch (reg) {
646 + default: break;
647 + case AMDGPU::EXEC:
648 -+ case AMDGPU::SI_LITERAL_CONSTANT:
649 -+ case AMDGPU::SREG_LIT_0:
650 + case AMDGPU::M0:
651 + continue;
652 + }
653 @@ -749,10 +281,16 @@ index 0000000..4553c45
654 + } else if (AMDGPU::SReg_256RegClass.contains(reg)) {
655 + isSGPR = true;
656 + width = 8;
657 ++ } else if (AMDGPU::VReg_256RegClass.contains(reg)) {
658 ++ isSGPR = false;
659 ++ width = 8;
660 ++ } else if (AMDGPU::VReg_512RegClass.contains(reg)) {
661 ++ isSGPR = false;
662 ++ width = 16;
663 + } else {
664 + assert(!"Unknown register class");
665 + }
666 -+ hwReg = RI->getEncodingValue(reg);
667 ++ hwReg = RI->getEncodingValue(reg) & 0xff;
668 + maxUsed = hwReg + width - 1;
669 + if (isSGPR) {
670 + MaxSGPR = maxUsed > MaxSGPR ? maxUsed : MaxSGPR;
671 @@ -820,61 +358,6 @@ index 0000000..3812282
672 +} // End namespace llvm
673 +
674 +#endif //AMDGPU_ASMPRINTER_H
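
For reference, the register-counting loop in EmitProgramInfo() above works in three steps: each operand's width in 32-bit registers comes from its register class (1, 2, 4, 8 or 16), its hardware index is the low byte of the encoding, and the running maxima for SGPRs and VGPRs are folded over hwReg + width - 1. A standalone C++ sketch of that fold, with a mock operand type in place of the LLVM MachineOperand API:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Mock register operand: its encoded value, the number of 32-bit
    // registers its class spans, and which bank (scalar/vector) it is in.
    struct RegUse {
      uint32_t Encoding;
      unsigned Width;   // 1, 2, 4, 8 or 16
      bool IsSGPR;
    };

    // Same fold as the EmitProgramInfo loop: the highest register an
    // operand touches is (encoding & 0xff) + width - 1.
    static void countGPRs(const std::vector<RegUse> &Uses,
                          unsigned &MaxSGPR, unsigned &MaxVGPR) {
      MaxSGPR = MaxVGPR = 0;
      for (const RegUse &U : Uses) {
        unsigned MaxUsed = (U.Encoding & 0xff) + U.Width - 1;
        if (U.IsSGPR)
          MaxSGPR = std::max(MaxSGPR, MaxUsed);
        else
          MaxVGPR = std::max(MaxVGPR, MaxUsed);
      }
    }

The new & 0xff mask (the old patch used the raw encoding value) suggests the upper bits of the encoding carry something other than the register index on SI, so only the low byte is meaningful here.
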
675 -diff --git a/lib/Target/R600/AMDGPUCodeEmitter.h b/lib/Target/R600/AMDGPUCodeEmitter.h
676 -new file mode 100644
677 -index 0000000..84f3588
678 ---- /dev/null
679 -+++ b/lib/Target/R600/AMDGPUCodeEmitter.h
680 -@@ -0,0 +1,49 @@
681 -+//===-- AMDGPUCodeEmitter.h - AMDGPU Code Emitter interface -----------------===//
682 -+//
683 -+// The LLVM Compiler Infrastructure
684 -+//
685 -+// This file is distributed under the University of Illinois Open Source
686 -+// License. See LICENSE.TXT for details.
687 -+//
688 -+//===----------------------------------------------------------------------===//
689 -+//
690 -+/// \file
691 -+/// \brief CodeEmitter interface for R600 and SI codegen.
692 -+//
693 -+//===----------------------------------------------------------------------===//
694 -+
695 -+#ifndef AMDGPUCODEEMITTER_H
696 -+#define AMDGPUCODEEMITTER_H
697 -+
698 -+namespace llvm {
699 -+
700 -+class AMDGPUCodeEmitter {
701 -+public:
702 -+ uint64_t getBinaryCodeForInstr(const MachineInstr &MI) const;
703 -+ virtual uint64_t getMachineOpValue(const MachineInstr &MI,
704 -+ const MachineOperand &MO) const { return 0; }
705 -+ virtual unsigned GPR4AlignEncode(const MachineInstr &MI,
706 -+ unsigned OpNo) const {
707 -+ return 0;
708 -+ }
709 -+ virtual unsigned GPR2AlignEncode(const MachineInstr &MI,
710 -+ unsigned OpNo) const {
711 -+ return 0;
712 -+ }
713 -+ virtual uint64_t VOPPostEncode(const MachineInstr &MI,
714 -+ uint64_t Value) const {
715 -+ return Value;
716 -+ }
717 -+ virtual uint64_t i32LiteralEncode(const MachineInstr &MI,
718 -+ unsigned OpNo) const {
719 -+ return 0;
720 -+ }
721 -+ virtual uint32_t SMRDmemriEncode(const MachineInstr &MI, unsigned OpNo)
722 -+ const {
723 -+ return 0;
724 -+ }
725 -+};
726 -+
727 -+} // End namespace llvm
728 -+
729 -+#endif // AMDGPUCODEEMITTER_H
730 diff --git a/lib/Target/R600/AMDGPUConvertToISA.cpp b/lib/Target/R600/AMDGPUConvertToISA.cpp
731 new file mode 100644
732 index 0000000..50297d1
733 @@ -943,12 +426,190 @@ index 0000000..50297d1
734 + }
735 + return false;
736 +}
737 +diff --git a/lib/Target/R600/AMDGPUFrameLowering.cpp b/lib/Target/R600/AMDGPUFrameLowering.cpp
738 +new file mode 100644
739 +index 0000000..a3b6936
740 +--- /dev/null
741 ++++ b/lib/Target/R600/AMDGPUFrameLowering.cpp
742 +@@ -0,0 +1,122 @@
743 ++//===----------------------- AMDGPUFrameLowering.cpp ----------------------===//
744 ++//
745 ++// The LLVM Compiler Infrastructure
746 ++//
747 ++// This file is distributed under the University of Illinois Open Source
748 ++// License. See LICENSE.TXT for details.
749 ++//
750 ++//==-----------------------------------------------------------------------===//
751 ++//
752 ++// Interface to describe a layout of a stack frame on an AMDIL target machine
753 ++//
754 ++//===----------------------------------------------------------------------===//
755 ++#include "AMDGPUFrameLowering.h"
756 ++#include "AMDGPURegisterInfo.h"
757 ++#include "R600MachineFunctionInfo.h"
758 ++#include "llvm/CodeGen/MachineFrameInfo.h"
759 ++#include "llvm/CodeGen/MachineRegisterInfo.h"
760 ++#include "llvm/Instructions.h"
761 ++
762 ++using namespace llvm;
763 ++AMDGPUFrameLowering::AMDGPUFrameLowering(StackDirection D, unsigned StackAl,
764 ++ int LAO, unsigned TransAl)
765 ++ : TargetFrameLowering(D, StackAl, LAO, TransAl) { }
766 ++
767 ++AMDGPUFrameLowering::~AMDGPUFrameLowering() { }
768 ++
769 ++unsigned AMDGPUFrameLowering::getStackWidth(const MachineFunction &MF) const {
770 ++
771 ++ // XXX: Hardcoding to 1 for now.
772 ++ //
773 ++ // I think the StackWidth should be stored as metadata associated with the
774 ++ // MachineFunction. This metadata can either be added by a frontend, or
775 ++ // calculated by an R600-specific LLVM IR pass.
776 ++ //
777 ++ // The StackWidth determines how stack objects are laid out in memory.
778 ++ // For a vector stack variable, like: int4 stack[2], the data will be stored
779 ++ // in the following ways depending on the StackWidth.
780 ++ //
781 ++ // StackWidth = 1:
782 ++ //
783 ++ // T0.X = stack[0].x
784 ++ // T1.X = stack[0].y
785 ++ // T2.X = stack[0].z
786 ++ // T3.X = stack[0].w
787 ++ // T4.X = stack[1].x
788 ++ // T5.X = stack[1].y
789 ++ // T6.X = stack[1].z
790 ++ // T7.X = stack[1].w
791 ++ //
792 ++ // StackWidth = 2:
793 ++ //
794 ++ // T0.X = stack[0].x
795 ++ // T0.Y = stack[0].y
796 ++ // T1.X = stack[0].z
797 ++ // T1.Y = stack[0].w
798 ++ // T2.X = stack[1].x
799 ++ // T2.Y = stack[1].y
800 ++ // T3.X = stack[1].z
801 ++ // T3.Y = stack[1].w
802 ++ //
803 ++ // StackWidth = 4:
804 ++ // T0.X = stack[0].x
805 ++ // T0.Y = stack[0].y
806 ++ // T0.Z = stack[0].z
807 ++ // T0.W = stack[0].w
808 ++ // T1.X = stack[1].x
809 ++ // T1.Y = stack[1].y
810 ++ // T1.Z = stack[1].z
811 ++ // T1.W = stack[1].w
812 ++ return 1;
813 ++}
814 ++
815 ++/// \returns The number of registers allocated for \p FI.
816 ++int AMDGPUFrameLowering::getFrameIndexOffset(const MachineFunction &MF,
817 ++ int FI) const {
818 ++ const MachineFrameInfo *MFI = MF.getFrameInfo();
819 ++ unsigned Offset = 0;
820 ++ int UpperBound = FI == -1 ? MFI->getNumObjects() : FI;
821 ++
822 ++ for (int i = MFI->getObjectIndexBegin(); i < UpperBound; ++i) {
823 ++ const AllocaInst *Alloca = MFI->getObjectAllocation(i);
824 ++ unsigned ArrayElements;
825 ++ const Type *AllocaType = Alloca->getAllocatedType();
826 ++ const Type *ElementType;
827 ++
828 ++ if (AllocaType->isArrayTy()) {
829 ++ ArrayElements = AllocaType->getArrayNumElements();
830 ++ ElementType = AllocaType->getArrayElementType();
831 ++ } else {
832 ++ ArrayElements = 1;
833 ++ ElementType = AllocaType;
834 ++ }
835 ++
836 ++ unsigned VectorElements;
837 ++ if (ElementType->isVectorTy()) {
838 ++ VectorElements = ElementType->getVectorNumElements();
839 ++ } else {
840 ++ VectorElements = 1;
841 ++ }
842 ++
843 ++ Offset += (VectorElements / getStackWidth(MF)) * ArrayElements;
844 ++ }
845 ++ return Offset;
846 ++}
847 ++
848 ++const TargetFrameLowering::SpillSlot *
849 ++AMDGPUFrameLowering::getCalleeSavedSpillSlots(unsigned &NumEntries) const {
850 ++ NumEntries = 0;
851 ++ return 0;
852 ++}
853 ++void
854 ++AMDGPUFrameLowering::emitPrologue(MachineFunction &MF) const {
855 ++}
856 ++void
857 ++AMDGPUFrameLowering::emitEpilogue(MachineFunction &MF,
858 ++ MachineBasicBlock &MBB) const {
859 ++}
860 ++
861 ++bool
862 ++AMDGPUFrameLowering::hasFP(const MachineFunction &MF) const {
863 ++ return false;
864 ++}
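
The getFrameIndexOffset() above returns offsets in units of 32-bit registers, not bytes: every frame object before FI contributes (VectorElements / StackWidth) * ArrayElements registers. A standalone C++ sketch of the same arithmetic, using a mock frame-object list instead of MachineFrameInfo:

    #include <cassert>
    #include <vector>

    // Mock frame object: an alloca reduced to (array length, vector width).
    struct FrameObject {
      unsigned ArrayElements;   // 1 for non-array allocas
      unsigned VectorElements;  // 1 for scalar element types
    };

    // Accumulates register offsets the same way as
    // AMDGPUFrameLowering::getFrameIndexOffset above.
    static unsigned frameIndexOffset(const std::vector<FrameObject> &Objects,
                                     unsigned FI, unsigned StackWidth = 1) {
      unsigned Offset = 0;
      for (unsigned i = 0; i < FI; ++i)
        Offset += (Objects[i].VectorElements / StackWidth) *
                  Objects[i].ArrayElements;
      return Offset;
    }

    int main() {
      // int4 stack[2] occupies (4 / 1) * 2 = 8 registers at StackWidth 1,
      // so the next frame object starts at register offset 8 -- consistent
      // with the T0.X..T7.X layout in the comment above.
      std::vector<FrameObject> Objects = {{2, 4}};
      assert(frameIndexOffset(Objects, 1) == 8);
      return 0;
    }

At StackWidth 4 the same object would span only (4 / 4) * 2 = 2 registers, matching the third layout in the comment.
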
865 +diff --git a/lib/Target/R600/AMDGPUFrameLowering.h b/lib/Target/R600/AMDGPUFrameLowering.h
866 +new file mode 100644
867 +index 0000000..cf5742e
868 +--- /dev/null
869 ++++ b/lib/Target/R600/AMDGPUFrameLowering.h
870 +@@ -0,0 +1,44 @@
871 ++//===--------------------- AMDGPUFrameLowering.h ----------------*- C++ -*-===//
872 ++//
873 ++// The LLVM Compiler Infrastructure
874 ++//
875 ++// This file is distributed under the University of Illinois Open Source
876 ++// License. See LICENSE.TXT for details.
877 ++//
878 ++//===----------------------------------------------------------------------===//
879 ++//
880 ++/// \file
881 ++/// \brief Interface to describe a layout of a stack frame on an AMDIL target
882 ++/// machine.
883 ++//
884 ++//===----------------------------------------------------------------------===//
885 ++#ifndef AMDILFRAME_LOWERING_H
886 ++#define AMDILFRAME_LOWERING_H
887 ++
888 ++#include "llvm/CodeGen/MachineFunction.h"
889 ++#include "llvm/Target/TargetFrameLowering.h"
890 ++
891 ++namespace llvm {
892 ++
893 ++/// \brief Information about the stack frame layout on the AMDGPU targets.
894 ++///
895 ++/// It holds the direction of the stack growth, the known stack alignment on
896 ++/// entry to each function, and the offset to the locals area.
897 ++/// See TargetFrameInfo for more comments.
898 ++class AMDGPUFrameLowering : public TargetFrameLowering {
899 ++public:
900 ++ AMDGPUFrameLowering(StackDirection D, unsigned StackAl, int LAO,
901 ++ unsigned TransAl = 1);
902 ++ virtual ~AMDGPUFrameLowering();
903 ++
904 ++ /// \returns The number of 32-bit sub-registers that are used when storing
905 ++ /// values to the stack.
906 ++ virtual unsigned getStackWidth(const MachineFunction &MF) const;
907 ++ virtual int getFrameIndexOffset(const MachineFunction &MF, int FI) const;
908 ++ virtual const SpillSlot *getCalleeSavedSpillSlots(unsigned &NumEntries) const;
909 ++ virtual void emitPrologue(MachineFunction &MF) const;
910 ++ virtual void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const;
911 ++ virtual bool hasFP(const MachineFunction &MF) const;
912 ++};
913 ++} // namespace llvm
914 ++#endif // AMDILFRAME_LOWERING_H
915 diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
916 new file mode 100644
917 -index 0000000..473dac4
918 +index 0000000..d0d23d6
919 --- /dev/null
920 +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
921 -@@ -0,0 +1,417 @@
922 +@@ -0,0 +1,418 @@
923 +//===-- AMDGPUISelLowering.cpp - AMDGPU Common DAG lowering functions -----===//
924 +//
925 +// The LLVM Compiler Infrastructure
926 @@ -1361,17 +1022,18 @@ index 0000000..473dac4
927 + NODE_NAME_CASE(SMIN)
928 + NODE_NAME_CASE(UMIN)
929 + NODE_NAME_CASE(URECIP)
930 -+ NODE_NAME_CASE(INTERP)
931 -+ NODE_NAME_CASE(INTERP_P0)
932 + NODE_NAME_CASE(EXPORT)
933 ++ NODE_NAME_CASE(CONST_ADDRESS)
934 ++ NODE_NAME_CASE(REGISTER_LOAD)
935 ++ NODE_NAME_CASE(REGISTER_STORE)
936 + }
937 +}
938 diff --git a/lib/Target/R600/AMDGPUISelLowering.h b/lib/Target/R600/AMDGPUISelLowering.h
939 new file mode 100644
940 -index 0000000..c7abaf6
941 +index 0000000..99a11ff
942 --- /dev/null
943 +++ b/lib/Target/R600/AMDGPUISelLowering.h
944 -@@ -0,0 +1,144 @@
945 +@@ -0,0 +1,140 @@
946 +//===-- AMDGPUISelLowering.h - AMDGPU Lowering Interface --------*- C++ -*-===//
947 +//
948 +// The LLVM Compiler Infrastructure
949 @@ -1427,6 +1089,11 @@ index 0000000..c7abaf6
950 + const SmallVectorImpl<ISD::OutputArg> &Outs,
951 + const SmallVectorImpl<SDValue> &OutVals,
952 + DebugLoc DL, SelectionDAG &DAG) const;
953 ++ virtual SDValue LowerCall(CallLoweringInfo &CLI,
954 ++ SmallVectorImpl<SDValue> &InVals) const {
955 ++ CLI.Callee.dump();
956 ++ llvm_unreachable("Undefined function");
957 ++ }
958 +
959 + virtual SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const;
960 + SDValue LowerIntrinsicIABS(SDValue Op, SelectionDAG &DAG) const;
961 @@ -1494,35 +1161,26 @@ index 0000000..c7abaf6
962 + SMIN,
963 + UMIN,
964 + URECIP,
965 -+ INTERP,
966 -+ INTERP_P0,
967 + EXPORT,
968 ++ CONST_ADDRESS,
969 ++ REGISTER_LOAD,
970 ++ REGISTER_STORE,
971 + LAST_AMDGPU_ISD_NUMBER
972 +};
973 +
974 +
975 +} // End namespace AMDGPUISD
976 +
977 -+namespace SIISD {
978 -+
979 -+enum {
980 -+ SI_FIRST = AMDGPUISD::LAST_AMDGPU_ISD_NUMBER,
981 -+ VCC_AND,
982 -+ VCC_BITCAST
983 -+};
984 -+
985 -+} // End namespace SIISD
986 -+
987 +} // End namespace llvm
988 +
989 +#endif // AMDGPUISELLOWERING_H
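
The LowerCall() override added above makes the no-calls policy of these targets explicit: any call that survives front-end inlining aborts code generation through llvm_unreachable instead of being silently miscompiled. The same guard pattern, reduced to a hypothetical standalone sketch (the names here are illustrative, not the LLVM interface):

    #include <cstdio>
    #include <cstdlib>

    // Hypothetical stand-in for an unsupported lowering hook: dump what we
    // can for diagnosis, then abort. Well-formed GPU kernels never reach
    // this path because every call is expected to have been inlined.
    [[noreturn]] static void lowerCallUnsupported(const char *CalleeName) {
      std::fprintf(stderr, "unsupported call to '%s' reached lowering\n",
                   CalleeName);
      std::abort();
    }
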
990 -diff --git a/lib/Target/R600/AMDGPUInstrInfo.cpp b/lib/Target/R600/AMDGPUInstrInfo.cpp
991 +diff --git a/lib/Target/R600/AMDGPUIndirectAddressing.cpp b/lib/Target/R600/AMDGPUIndirectAddressing.cpp
992 new file mode 100644
993 -index 0000000..e42a46d
994 +index 0000000..15840b3
995 --- /dev/null
996 -+++ b/lib/Target/R600/AMDGPUInstrInfo.cpp
997 -@@ -0,0 +1,257 @@
998 -+//===-- AMDGPUInstrInfo.cpp - Base class for AMD GPU InstrInfo ------------===//
999 ++++ b/lib/Target/R600/AMDGPUIndirectAddressing.cpp
1000 +@@ -0,0 +1,344 @@
1001 ++//===-- AMDGPUIndirectAddressing.cpp - Indirect Addressing Support --------===//
1002 +//
1003 +// The LLVM Compiler Infrastructure
1004 +//
1005 @@ -1532,60 +1190,410 @@ index 0000000..e42a46d
1006 +//===----------------------------------------------------------------------===//
1007 +//
1008 +/// \file
1009 -+/// \brief Implementation of the TargetInstrInfo class that is common to all
1010 -+/// AMD GPUs.
1011 ++///
1012 ++/// Instructions can use indirect addressing to index the register file as if it
1013 ++/// were memory. This pass lowers RegisterLoad and RegisterStore instructions
1014 ++/// to either a COPY or a MOV that uses indirect addressing.
1015 +//
1016 +//===----------------------------------------------------------------------===//
1017 +
1018 -+#include "AMDGPUInstrInfo.h"
1019 -+#include "AMDGPURegisterInfo.h"
1020 -+#include "AMDGPUTargetMachine.h"
1021 -+#include "AMDIL.h"
1022 -+#include "llvm/CodeGen/MachineFrameInfo.h"
1023 ++#include "AMDGPU.h"
1024 ++#include "R600InstrInfo.h"
1025 ++#include "R600MachineFunctionInfo.h"
1026 ++#include "llvm/CodeGen/MachineFunction.h"
1027 ++#include "llvm/CodeGen/MachineFunctionPass.h"
1028 +#include "llvm/CodeGen/MachineInstrBuilder.h"
1029 +#include "llvm/CodeGen/MachineRegisterInfo.h"
1030 -+
1031 -+#define GET_INSTRINFO_CTOR
1032 -+#include "AMDGPUGenInstrInfo.inc"
1033 ++#include "llvm/Support/Debug.h"
1034 +
1035 +using namespace llvm;
1036 +
1037 -+AMDGPUInstrInfo::AMDGPUInstrInfo(TargetMachine &tm)
1038 -+ : AMDGPUGenInstrInfo(0,0), RI(tm, *this), TM(tm) { }
1039 ++namespace {
1040 +
1041 -+const AMDGPURegisterInfo &AMDGPUInstrInfo::getRegisterInfo() const {
1042 -+ return RI;
1043 -+}
1044 ++class AMDGPUIndirectAddressingPass : public MachineFunctionPass {
1045 +
1046 -+bool AMDGPUInstrInfo::isCoalescableExtInstr(const MachineInstr &MI,
1047 -+ unsigned &SrcReg, unsigned &DstReg,
1048 -+ unsigned &SubIdx) const {
1049 -+// TODO: Implement this function
1050 -+ return false;
1051 -+}
1052 ++private:
1053 ++ static char ID;
1054 ++ const AMDGPUInstrInfo *TII;
1055 +
1056 -+unsigned AMDGPUInstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
1057 -+ int &FrameIndex) const {
1058 -+// TODO: Implement this function
1059 -+ return 0;
1060 -+}
1061 ++ bool regHasExplicitDef(MachineRegisterInfo &MRI, unsigned Reg) const;
1062 +
1063 -+unsigned AMDGPUInstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI,
1064 -+ int &FrameIndex) const {
1065 -+// TODO: Implement this function
1066 -+ return 0;
1067 -+}
1068 ++public:
1069 ++ AMDGPUIndirectAddressingPass(TargetMachine &tm) :
1070 ++ MachineFunctionPass(ID),
1071 ++ TII(static_cast<const AMDGPUInstrInfo*>(tm.getInstrInfo()))
1072 ++ { }
1073 +
1074 -+bool AMDGPUInstrInfo::hasLoadFromStackSlot(const MachineInstr *MI,
1075 -+ const MachineMemOperand *&MMO,
1076 -+ int &FrameIndex) const {
1077 -+// TODO: Implement this function
1078 -+ return false;
1079 ++ virtual bool runOnMachineFunction(MachineFunction &MF);
1080 ++
1081 ++ const char *getPassName() const { return "R600 Handle indirect addressing"; }
1082 ++
1083 ++};
1084 ++
1085 ++} // End anonymous namespace
1086 ++
1087 ++char AMDGPUIndirectAddressingPass::ID = 0;
1088 ++
1089 ++FunctionPass *llvm::createAMDGPUIndirectAddressingPass(TargetMachine &tm) {
1090 ++ return new AMDGPUIndirectAddressingPass(tm);
1091 +}
1092 -+unsigned AMDGPUInstrInfo::isStoreFromStackSlot(const MachineInstr *MI,
1093 -+ int &FrameIndex) const {
1094 -+// TODO: Implement this function
1095 -+ return 0;
1096 ++
1097 ++bool AMDGPUIndirectAddressingPass::runOnMachineFunction(MachineFunction &MF) {
1098 ++ MachineRegisterInfo &MRI = MF.getRegInfo();
1099 ++
1100 ++ int IndirectBegin = TII->getIndirectIndexBegin(MF);
1101 ++ int IndirectEnd = TII->getIndirectIndexEnd(MF);
1102 ++
1103 ++ if (IndirectBegin == -1) {
1104 ++ // No indirect addressing, we can skip this pass
1105 ++ assert(IndirectEnd == -1);
1106 ++ return false;
1107 ++ }
1108 ++
1109 ++ // The map keeps track of the indirect address that is represented by
1110 ++ // each virtual register. The key is the register and the value is the
1111 ++ // indirect address it uses.
1112 ++ std::map<unsigned, unsigned> RegisterAddressMap;
1113 ++
1114 ++ // First pass - Lower all of the RegisterStore instructions and track which
1115 ++ // registers are live.
1116 ++ for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
1117 ++ BB != BB_E; ++BB) {
1118 ++ // This map keeps track of the current live indirect registers.
1119 ++ // The key is the address and the value is the register
1120 ++ std::map<unsigned, unsigned> LiveAddressRegisterMap;
1121 ++ MachineBasicBlock &MBB = *BB;
1122 ++
1123 ++ for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I);
1124 ++ I != MBB.end(); I = Next) {
1125 ++ Next = llvm::next(I);
1126 ++ MachineInstr &MI = *I;
1127 ++
1128 ++ if (!TII->isRegisterStore(MI)) {
1129 ++ continue;
1130 ++ }
1131 ++
1132 ++ // Lower RegisterStore
1133 ++
1134 ++ unsigned RegIndex = MI.getOperand(2).getImm();
1135 ++ unsigned Channel = MI.getOperand(3).getImm();
1136 ++ unsigned Address = TII->calculateIndirectAddress(RegIndex, Channel);
1137 ++ const TargetRegisterClass *IndirectStoreRegClass =
1138 ++ TII->getIndirectAddrStoreRegClass(MI.getOperand(0).getReg());
1139 ++
1140 ++ if (MI.getOperand(1).getReg() == AMDGPU::INDIRECT_BASE_ADDR) {
1141 ++ // Direct register access.
1142 ++ unsigned DstReg = MRI.createVirtualRegister(IndirectStoreRegClass);
1143 ++
1144 ++ BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY), DstReg)
1145 ++ .addOperand(MI.getOperand(0));
1146 ++
1147 ++ RegisterAddressMap[DstReg] = Address;
1148 ++ LiveAddressRegisterMap[Address] = DstReg;
1149 ++ } else {
1150 ++ // Indirect register access.
1151 ++ MachineInstrBuilder MOV = TII->buildIndirectWrite(BB, I,
1152 ++ MI.getOperand(0).getReg(), // Value
1153 ++ Address,
1154 ++ MI.getOperand(1).getReg()); // Offset
1155 ++ for (int i = IndirectBegin; i <= IndirectEnd; ++i) {
1156 ++ unsigned Addr = TII->calculateIndirectAddress(i, Channel);
1157 ++ unsigned DstReg = MRI.createVirtualRegister(IndirectStoreRegClass);
1158 ++ MOV.addReg(DstReg, RegState::Define | RegState::Implicit);
1159 ++ RegisterAddressMap[DstReg] = Addr;
1160 ++ LiveAddressRegisterMap[Addr] = DstReg;
1161 ++ }
1162 ++ }
1163 ++ MI.eraseFromParent();
1164 ++ }
1165 ++
1166 ++ // Update the live-ins of the successor blocks
1167 ++ for (MachineBasicBlock::succ_iterator Succ = MBB.succ_begin(),
1168 ++ SuccEnd = MBB.succ_end();
1169 ++ SuccEnd != Succ; ++Succ) {
1170 ++ std::map<unsigned, unsigned>::const_iterator Key, KeyEnd;
1171 ++ for (Key = LiveAddressRegisterMap.begin(),
1172 ++ KeyEnd = LiveAddressRegisterMap.end(); KeyEnd != Key; ++Key) {
1173 ++ (*Succ)->addLiveIn(Key->second);
1174 ++ }
1175 ++ }
1176 ++ }
1177 ++
1178 ++ // Second pass - Lower the RegisterLoad instructions
1179 ++ for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
1180 ++ BB != BB_E; ++BB) {
1181 ++ // Key is the address and the value is the register
1182 ++ std::map<unsigned, unsigned> LiveAddressRegisterMap;
1183 ++ MachineBasicBlock &MBB = *BB;
1184 ++
1185 ++ MachineBasicBlock::livein_iterator LI = MBB.livein_begin();
1186 ++ while (LI != MBB.livein_end()) {
1187 ++ std::vector<unsigned> PhiRegisters;
1188 ++
1189 ++ // Make sure this live in is used for indirect addressing
1190 ++ if (RegisterAddressMap.find(*LI) == RegisterAddressMap.end()) {
1191 ++ ++LI;
1192 ++ continue;
1193 ++ }
1194 ++
1195 ++ unsigned Address = RegisterAddressMap[*LI];
1196 ++ LiveAddressRegisterMap[Address] = *LI;
1197 ++ PhiRegisters.push_back(*LI);
1198 ++
1199 ++ // Check if there are other live in registers which map to the same
1200 ++ // indirect address.
1201 ++ for (MachineBasicBlock::livein_iterator LJ = llvm::next(LI),
1202 ++ LE = MBB.livein_end();
1203 ++ LJ != LE; ++LJ) {
1204 ++ unsigned Reg = *LJ;
1205 ++ if (RegisterAddressMap.find(Reg) == RegisterAddressMap.end()) {
1206 ++ continue;
1207 ++ }
1208 ++
1209 ++ if (RegisterAddressMap[Reg] == Address) {
1210 ++ PhiRegisters.push_back(Reg);
1211 ++ }
1212 ++ }
1213 ++
1214 ++ if (PhiRegisters.size() == 1) {
1215 ++ // We don't need to insert a Phi instruction, so we can just add the
1216 ++ // registers to the live list for the block.
1217 ++ LiveAddressRegisterMap[Address] = *LI;
1218 ++ MBB.removeLiveIn(*LI);
1219 ++ } else {
1220 ++ // We need to insert a PHI, because we have the same address being
1221 ++ // written in multiple predecessor blocks.
1222 ++ const TargetRegisterClass *PhiDstClass =
1223 ++ TII->getIndirectAddrStoreRegClass(*(PhiRegisters.begin()));
1224 ++ unsigned PhiDstReg = MRI.createVirtualRegister(PhiDstClass);
1225 ++ MachineInstrBuilder Phi = BuildMI(MBB, MBB.begin(),
1226 ++ MBB.findDebugLoc(MBB.begin()),
1227 ++ TII->get(AMDGPU::PHI), PhiDstReg);
1228 ++
1229 ++ for (std::vector<unsigned>::const_iterator RI = PhiRegisters.begin(),
1230 ++ RE = PhiRegisters.end();
1231 ++ RI != RE; ++RI) {
1232 ++ unsigned Reg = *RI;
1233 ++ MachineInstr *DefInst = MRI.getVRegDef(Reg);
1234 ++ assert(DefInst);
1235 ++ MachineBasicBlock *RegBlock = DefInst->getParent();
1236 ++ Phi.addReg(Reg);
1237 ++ Phi.addMBB(RegBlock);
1238 ++ MBB.removeLiveIn(Reg);
1239 ++ }
1240 ++ RegisterAddressMap[PhiDstReg] = Address;
1241 ++ LiveAddressRegisterMap[Address] = PhiDstReg;
1242 ++ }
1243 ++ LI = MBB.livein_begin();
1244 ++ }
1245 ++
1246 ++ for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I);
1247 ++ I != MBB.end(); I = Next) {
1248 ++ Next = llvm::next(I);
1249 ++ MachineInstr &MI = *I;
1250 ++
1251 ++ if (!TII->isRegisterLoad(MI)) {
1252 ++ if (MI.getOpcode() == AMDGPU::PHI) {
1253 ++ continue;
1254 ++ }
1255 ++ // Check for indirect register defs
1256 ++ for (unsigned OpIdx = 0, NumOperands = MI.getNumOperands();
1257 ++ OpIdx < NumOperands; ++OpIdx) {
1258 ++ MachineOperand &MO = MI.getOperand(OpIdx);
1259 ++ if (MO.isReg() && MO.isDef() &&
1260 ++ RegisterAddressMap.find(MO.getReg()) != RegisterAddressMap.end()) {
1261 ++ unsigned Reg = MO.getReg();
1262 ++ unsigned LiveAddress = RegisterAddressMap[Reg];
1263 ++ // Chain the live-ins
1264 ++ if (LiveAddressRegisterMap.find(LiveAddress) !=
1265 ++ RegisterAddressMap.end()) {
1266 ++ MI.addOperand(MachineOperand::CreateReg(
1267 ++ LiveAddressRegisterMap[LiveAddress],
1268 ++ false, // isDef
1269 ++ true, // isImp
1270 ++ true)); // isKill
1271 ++ }
1272 ++ LiveAddressRegisterMap[LiveAddress] = Reg;
1273 ++ }
1274 ++ }
1275 ++ continue;
1276 ++ }
1277 ++
1278 ++ const TargetRegisterClass *SuperIndirectRegClass =
1279 ++ TII->getSuperIndirectRegClass();
1280 ++ const TargetRegisterClass *IndirectLoadRegClass =
1281 ++ TII->getIndirectAddrLoadRegClass();
1282 ++ unsigned IndirectReg = MRI.createVirtualRegister(SuperIndirectRegClass);
1283 ++
1284 ++ unsigned RegIndex = MI.getOperand(2).getImm();
1285 ++ unsigned Channel = MI.getOperand(3).getImm();
1286 ++ unsigned Address = TII->calculateIndirectAddress(RegIndex, Channel);
1287 ++
1288 ++ if (MI.getOperand(1).getReg() == AMDGPU::INDIRECT_BASE_ADDR) {
1289 ++ // Direct register access
1290 ++ unsigned Reg = LiveAddressRegisterMap[Address];
1291 ++ unsigned AddrReg = IndirectLoadRegClass->getRegister(Address);
1292 ++
1293 ++ if (regHasExplicitDef(MRI, Reg)) {
1294 ++ // If the register we are reading from has an explicit def, then that
1295 ++ // means it was written via a direct register access (i.e. COPY
1296 ++ // or other instruction that doesn't use indirect addressing). In
1297 ++ // this case we know where the value has been stored, so we can just
1298 ++ // issue a copy.
1299 ++ BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY),
1300 ++ MI.getOperand(0).getReg())
1301 ++ .addReg(Reg);
1302 ++ } else {
1303 ++ // If the register we are reading has an implicit def, then that
1304 ++ // means it was written by an indirect register access (i.e. An
1305 ++ // instruction that uses indirect addressing.
1306 ++ BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY),
1307 ++ MI.getOperand(0).getReg())
1308 ++ .addReg(AddrReg)
1309 ++ .addReg(Reg, RegState::Implicit);
1310 ++ }
1311 ++ } else {
1312 ++ // Indirect register access
1313 ++
1314 ++ // Note on REG_SEQUENCE instructions: You can't actually use the register
1315 ++ // it defines unless you have an instruction that takes the defined
1316 ++ // register class as an operand.
1317 ++
1318 ++ MachineInstrBuilder Sequence = BuildMI(MBB, I, MBB.findDebugLoc(I),
1319 ++ TII->get(AMDGPU::REG_SEQUENCE),
1320 ++ IndirectReg);
1321 ++ for (int i = IndirectBegin; i <= IndirectEnd; ++i) {
1322 ++ unsigned Addr = TII->calculateIndirectAddress(i, Channel);
1323 ++ if (LiveAddressRegisterMap.find(Addr) == LiveAddressRegisterMap.end()) {
1324 ++ continue;
1325 ++ }
1326 ++ unsigned Reg = LiveAddressRegisterMap[Addr];
1327 ++
1328 ++ // We only need to use REG_SEQUENCE for explicit defs, since the
1329 ++ // register coalescer won't do anything with the implicit defs.
1330 ++ MachineInstr *DefInstr = MRI.getVRegDef(Reg);
1331 ++ if (!regHasExplicitDef(MRI, Reg)) {
1332 ++ continue;
1333 ++ }
1334 ++
1335 ++ // Insert a REG_SEQUENCE instruction to force the register allocator
1336 ++ // to allocate the virtual register to the correct physical register.
1337 ++ Sequence.addReg(LiveAddressRegisterMap[Addr]);
1338 ++ Sequence.addImm(TII->getRegisterInfo().getIndirectSubReg(Addr));
1339 ++ }
1340 ++ MachineInstrBuilder Mov = TII->buildIndirectRead(BB, I,
1341 ++ MI.getOperand(0).getReg(), // Value
1342 ++ Address,
1343 ++ MI.getOperand(1).getReg()); // Offset
1344 ++
1345 ++
1346 ++
1347 ++ Mov.addReg(IndirectReg, RegState::Implicit | RegState::Kill);
1348 ++ Mov.addReg(LiveAddressRegisterMap[Address], RegState::Implicit);
1349 ++
1350 ++ }
1351 ++ MI.eraseFromParent();
1352 ++ }
1353 ++ }
1354 ++ return false;
1355 ++}
1356 ++
1357 ++bool AMDGPUIndirectAddressingPass::regHasExplicitDef(MachineRegisterInfo &MRI,
1358 ++ unsigned Reg) const {
1359 ++ MachineInstr *DefInstr = MRI.getVRegDef(Reg);
1360 ++
1361 ++ if (!DefInstr) {
1362 ++ return false;
1363 ++ }
1364 ++
1365 ++ if (DefInstr->getOpcode() == AMDGPU::PHI) {
1366 ++ bool Explicit = false;
1367 ++ for (MachineInstr::const_mop_iterator I = DefInstr->operands_begin(),
1368 ++ E = DefInstr->operands_end();
1369 ++ I != E; ++I) {
1370 ++ const MachineOperand &MO = *I;
1371 ++ if (!MO.isReg() || MO.isDef()) {
1372 ++ continue;
1373 ++ }
1374 ++
1375 ++ Explicit = Explicit || regHasExplicitDef(MRI, MO.getReg());
1376 ++ }
1377 ++ return Explicit;
1378 ++ }
1379 ++
1380 ++ return DefInstr->getOperand(0).isReg() &&
1381 ++ DefInstr->getOperand(0).getReg() == Reg;
1382 ++}
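
For reference, a minimal standalone sketch of the lowering decision made in the
pass above, with toy types and register numbers standing in for the LLVM API:
a direct access whose source has an explicit def becomes a plain COPY from the
known live register, while an implicitly-defined source keeps an implicit use
so liveness is preserved.

// Toy model of the direct-access branch of the RegisterLoad lowering above.
// All types and register numbers are illustrative stand-ins, not LLVM classes.
#include <cstdio>
#include <map>

struct LoweredInst { const char *Op; unsigned Dst, Src; bool KeepsImplicitUse; };

LoweredInst lowerDirectRead(unsigned Address, bool HasExplicitDef,
                            const std::map<unsigned, unsigned> &LiveAddrMap,
                            unsigned AddrReg, unsigned DstReg) {
  unsigned LiveReg = LiveAddrMap.at(Address);
  if (HasExplicitDef)
    return {"COPY", DstReg, LiveReg, false};  // value location is known
  return {"COPY", DstReg, AddrReg, true};     // written indirectly: read the
                                              // address reg, keep LiveReg live
}

int main() {
  std::map<unsigned, unsigned> Live = {{0u, 100u}};
  LoweredInst I = lowerDirectRead(0, true, Live, 200, 300);
  std::printf("%s v%u <- v%u\n", I.Op, I.Dst, I.Src);
  return 0;
}
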
1383 +diff --git a/lib/Target/R600/AMDGPUInstrInfo.cpp b/lib/Target/R600/AMDGPUInstrInfo.cpp
1384 +new file mode 100644
1385 +index 0000000..640707d
1386 +--- /dev/null
1387 ++++ b/lib/Target/R600/AMDGPUInstrInfo.cpp
1388 +@@ -0,0 +1,266 @@
1389 ++//===-- AMDGPUInstrInfo.cpp - Base class for AMD GPU InstrInfo ------------===//
1390 ++//
1391 ++// The LLVM Compiler Infrastructure
1392 ++//
1393 ++// This file is distributed under the University of Illinois Open Source
1394 ++// License. See LICENSE.TXT for details.
1395 ++//
1396 ++//===----------------------------------------------------------------------===//
1397 ++//
1398 ++/// \file
1399 ++/// \brief Implementation of the TargetInstrInfo class that is common to all
1400 ++/// AMD GPUs.
1401 ++//
1402 ++//===----------------------------------------------------------------------===//
1403 ++
1404 ++#include "AMDGPUInstrInfo.h"
1405 ++#include "AMDGPURegisterInfo.h"
1406 ++#include "AMDGPUTargetMachine.h"
1407 ++#include "AMDIL.h"
1408 ++#include "llvm/CodeGen/MachineFrameInfo.h"
1409 ++#include "llvm/CodeGen/MachineInstrBuilder.h"
1410 ++#include "llvm/CodeGen/MachineRegisterInfo.h"
1411 ++
1412 ++#define GET_INSTRINFO_CTOR
1413 ++#include "AMDGPUGenInstrInfo.inc"
1414 ++
1415 ++using namespace llvm;
1416 ++
1417 ++AMDGPUInstrInfo::AMDGPUInstrInfo(TargetMachine &tm)
1418 ++ : AMDGPUGenInstrInfo(0,0), RI(tm, *this), TM(tm) { }
1419 ++
1420 ++const AMDGPURegisterInfo &AMDGPUInstrInfo::getRegisterInfo() const {
1421 ++ return RI;
1422 ++}
1423 ++
1424 ++bool AMDGPUInstrInfo::isCoalescableExtInstr(const MachineInstr &MI,
1425 ++ unsigned &SrcReg, unsigned &DstReg,
1426 ++ unsigned &SubIdx) const {
1427 ++// TODO: Implement this function
1428 ++ return false;
1429 ++}
1430 ++
1431 ++unsigned AMDGPUInstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
1432 ++ int &FrameIndex) const {
1433 ++// TODO: Implement this function
1434 ++ return 0;
1435 ++}
1436 ++
1437 ++unsigned AMDGPUInstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI,
1438 ++ int &FrameIndex) const {
1439 ++// TODO: Implement this function
1440 ++ return 0;
1441 ++}
1442 ++
1443 ++bool AMDGPUInstrInfo::hasLoadFromStackSlot(const MachineInstr *MI,
1444 ++ const MachineMemOperand *&MMO,
1445 ++ int &FrameIndex) const {
1446 ++// TODO: Implement this function
1447 ++ return false;
1448 ++}
1449 ++unsigned AMDGPUInstrInfo::isStoreFromStackSlot(const MachineInstr *MI,
1450 ++ int &FrameIndex) const {
1451 ++// TODO: Implement this function
1452 ++ return 0;
1453 +}
1454 +unsigned AMDGPUInstrInfo::isStoreFromStackSlotPostFE(const MachineInstr *MI,
1455 + int &FrameIndex) const {
1456 @@ -1758,7 +1766,16 @@ index 0000000..e42a46d
1457 + // TODO: Implement this function
1458 + return true;
1459 +}
1460 -+
1461 ++
1462 ++bool AMDGPUInstrInfo::isRegisterStore(const MachineInstr &MI) const {
1463 ++ return get(MI.getOpcode()).TSFlags & AMDGPU_FLAG_REGISTER_STORE;
1464 ++}
1465 ++
1466 ++bool AMDGPUInstrInfo::isRegisterLoad(const MachineInstr &MI) const {
1467 ++ return get(MI.getOpcode()).TSFlags & AMDGPU_FLAG_REGISTER_LOAD;
1468 ++}
1469 ++
1470 ++
1471 +void AMDGPUInstrInfo::convertToISA(MachineInstr & MI, MachineFunction &MF,
1472 + DebugLoc DL) const {
1473 + MachineRegisterInfo &MRI = MF.getRegInfo();
1474 @@ -1781,10 +1798,10 @@ index 0000000..e42a46d
1475 +}
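
A standalone illustration of the TSFlags scheme used by isRegisterLoad and
isRegisterStore above: the two top bits of the 64-bit per-instruction flags
word mark the register-load and register-store pseudos.

// Minimal sketch of the TSFlags bit test; plain C++, not the LLVM MCInstrDesc API.
#include <cassert>
#include <cstdint>

constexpr uint64_t FLAG_REGISTER_LOAD  = UINT64_C(1) << 63;
constexpr uint64_t FLAG_REGISTER_STORE = UINT64_C(1) << 62;

bool isRegisterLoad(uint64_t TSFlags)  { return (TSFlags & FLAG_REGISTER_LOAD) != 0; }
bool isRegisterStore(uint64_t TSFlags) { return (TSFlags & FLAG_REGISTER_STORE) != 0; }

int main() {
  uint64_t Flags = FLAG_REGISTER_LOAD;  // what TableGen emits for isRegisterLoad = 1
  assert(isRegisterLoad(Flags) && !isRegisterStore(Flags));
  return 0;
}
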
1476 diff --git a/lib/Target/R600/AMDGPUInstrInfo.h b/lib/Target/R600/AMDGPUInstrInfo.h
1477 new file mode 100644
1478 -index 0000000..32ac691
1479 +index 0000000..5220aa0
1480 --- /dev/null
1481 +++ b/lib/Target/R600/AMDGPUInstrInfo.h
1482 -@@ -0,0 +1,149 @@
1483 +@@ -0,0 +1,207 @@
1484 +//===-- AMDGPUInstrInfo.h - AMDGPU Instruction Information ------*- C++ -*-===//
1485 +//
1486 +// The LLVM Compiler Infrastructure
1487 @@ -1828,9 +1845,10 @@ index 0000000..32ac691
1488 +class AMDGPUInstrInfo : public AMDGPUGenInstrInfo {
1489 +private:
1490 + const AMDGPURegisterInfo RI;
1491 -+ TargetMachine &TM;
1492 + bool getNextBranchInstr(MachineBasicBlock::iterator &iter,
1493 + MachineBasicBlock &MBB) const;
1494 ++protected:
1495 ++ TargetMachine &TM;
1496 +public:
1497 + explicit AMDGPUInstrInfo(TargetMachine &tm);
1498 +
1499 @@ -1918,12 +1936,66 @@ index 0000000..32ac691
1500 + bool isAExtLoadInst(llvm::MachineInstr *MI) const;
1501 + bool isStoreInst(llvm::MachineInstr *MI) const;
1502 + bool isTruncStoreInst(llvm::MachineInstr *MI) const;
1503 ++ bool isRegisterStore(const MachineInstr &MI) const;
1504 ++ bool isRegisterLoad(const MachineInstr &MI) const;
1505 ++
1506 ++//===---------------------------------------------------------------------===//
1507 ++// Pure virtual functions to be implemented by sub-classes.
1508 ++//===---------------------------------------------------------------------===//
1509 +
1510 + virtual MachineInstr* getMovImmInstr(MachineFunction *MF, unsigned DstReg,
1511 + int64_t Imm) const = 0;
1512 + virtual unsigned getIEQOpcode() const = 0;
1513 + virtual bool isMov(unsigned opcode) const = 0;
1514 +
1515 ++ /// \returns the smallest register index that will be accessed by an indirect
1516 ++ /// read or write, or -1 if indirect addressing is not used by this program.
1517 ++ virtual int getIndirectIndexBegin(const MachineFunction &MF) const = 0;
1518 ++
1519 ++ /// \returns the largest register index that will be accessed by an indirect
1520 ++ /// read or write, or -1 if indirect addressing is not used by this program.
1521 ++ virtual int getIndirectIndexEnd(const MachineFunction &MF) const = 0;
1522 ++
1523 ++ /// \brief Calculate the "Indirect Address" for the given \p RegIndex and
1524 ++ /// \p Channel
1525 ++ ///
1526 ++ /// We model indirect addressing using a virtual address space that can be
1527 ++ /// accessed with loads and stores. The "Indirect Address" is the memory
1528 ++ /// address in this virtual address space that maps to the given \p RegIndex
1529 ++ /// and \p Channel.
1530 ++ virtual unsigned calculateIndirectAddress(unsigned RegIndex,
1531 ++ unsigned Channel) const = 0;
1532 ++
1533 ++ /// \returns The register class to be used for storing values to an
1534 ++ /// "Indirect Address".
1535 ++ virtual const TargetRegisterClass *getIndirectAddrStoreRegClass(
1536 ++ unsigned SourceReg) const = 0;
1537 ++
1538 ++ /// \returns The register class to be used for loading values from
1539 ++ /// an "Indirect Address".
1540 ++ virtual const TargetRegisterClass *getIndirectAddrLoadRegClass() const = 0;
1541 ++
1542 ++ /// \brief Build instruction(s) for an indirect register write.
1543 ++ ///
1544 ++ /// \returns The instruction that performs the indirect register write
1545 ++ virtual MachineInstrBuilder buildIndirectWrite(MachineBasicBlock *MBB,
1546 ++ MachineBasicBlock::iterator I,
1547 ++ unsigned ValueReg, unsigned Address,
1548 ++ unsigned OffsetReg) const = 0;
1549 ++
1550 ++ /// \brief Build instruction(s) for an indirect register read.
1551 ++ ///
1552 ++ /// \returns The instruction that performs the indirect register read
1553 ++ virtual MachineInstrBuilder buildIndirectRead(MachineBasicBlock *MBB,
1554 ++ MachineBasicBlock::iterator I,
1555 ++ unsigned ValueReg, unsigned Address,
1556 ++ unsigned OffsetReg) const = 0;
1557 ++
1558 ++ /// \returns the register class whose sub registers are the set of all
1559 ++ /// possible registers that can be used for indirect addressing.
1560 ++ virtual const TargetRegisterClass *getSuperIndirectRegClass() const = 0;
1561 ++
1562 ++
1563 + /// \brief Convert the AMDIL MachineInstr to a supported ISA
1564 + /// MachineInstr
1565 + virtual void convertToISA(MachineInstr & MI, MachineFunction &MF,
1566 @@ -1933,13 +2005,16 @@ index 0000000..32ac691
1567 +
1568 +} // End llvm namespace
1569 +
1570 ++#define AMDGPU_FLAG_REGISTER_LOAD (UINT64_C(1) << 63)
1571 ++#define AMDGPU_FLAG_REGISTER_STORE (UINT64_C(1) << 62)
1572 ++
1573 +#endif // AMDGPUINSTRINFO_H
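
calculateIndirectAddress is left pure virtual above; one plausible mapping,
assuming a register file with four channels per register (the real formula is
defined by each subclass):

// Assumed example only: linearize (RegIndex, Channel) into one address.
#include <cassert>

unsigned calculateIndirectAddress(unsigned RegIndex, unsigned Channel) {
  const unsigned ChannelsPerReg = 4;           // x, y, z, w
  assert(Channel < ChannelsPerReg && "channel out of range");
  return RegIndex * ChannelsPerReg + Channel;
}

int main() {
  assert(calculateIndirectAddress(2, 3) == 11);  // reg 2, channel w
  return 0;
}
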
1574 diff --git a/lib/Target/R600/AMDGPUInstrInfo.td b/lib/Target/R600/AMDGPUInstrInfo.td
1575 new file mode 100644
1576 -index 0000000..96368e8
1577 +index 0000000..b66ae87
1578 --- /dev/null
1579 +++ b/lib/Target/R600/AMDGPUInstrInfo.td
1580 -@@ -0,0 +1,74 @@
1581 +@@ -0,0 +1,82 @@
1582 +//===-- AMDGPUInstrInfo.td - AMDGPU DAG nodes --------------*- tablegen -*-===//
1583 +//
1584 +// The LLVM Compiler Infrastructure
1585 @@ -2014,12 +2089,20 @@ index 0000000..96368e8
1586 +def AMDGPUurecip : SDNode<"AMDGPUISD::URECIP", SDTIntUnaryOp>;
1587 +
1588 +def fpow : SDNode<"ISD::FPOW", SDTFPBinOp>;
1589 ++
1590 ++def AMDGPUregister_load : SDNode<"AMDGPUISD::REGISTER_LOAD",
1591 ++ SDTypeProfile<1, 2, [SDTCisPtrTy<1>, SDTCisInt<2>]>,
1592 ++ [SDNPHasChain, SDNPMayLoad]>;
1593 ++
1594 ++def AMDGPUregister_store : SDNode<"AMDGPUISD::REGISTER_STORE",
1595 ++ SDTypeProfile<0, 3, [SDTCisPtrTy<1>, SDTCisInt<2>]>,
1596 ++ [SDNPHasChain, SDNPMayStore]>;
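
A toy decoding of the SDTypeProfile declarations above (plain C++, not the
SelectionDAG API): REGISTER_LOAD produces one result from two operands
(pointer, channel immediate), while REGISTER_STORE produces no results from
three operands (value, pointer, channel immediate); both also carry a chain,
which the profile does not count.

#include <cstdio>

struct NodeShape { const char *Name; int NumResults; int NumOperands; };

int main() {
  const NodeShape Shapes[] = {
    {"AMDGPUISD::REGISTER_LOAD",  1, 2},
    {"AMDGPUISD::REGISTER_STORE", 0, 3},
  };
  for (const NodeShape &S : Shapes)
    std::printf("%-28s %d result(s), %d operand(s)\n",
                S.Name, S.NumResults, S.NumOperands);
  return 0;
}
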
1597 diff --git a/lib/Target/R600/AMDGPUInstructions.td b/lib/Target/R600/AMDGPUInstructions.td
1598 new file mode 100644
1599 -index 0000000..e634d20
1600 +index 0000000..0559a5a
1601 --- /dev/null
1602 +++ b/lib/Target/R600/AMDGPUInstructions.td
1603 -@@ -0,0 +1,190 @@
1604 +@@ -0,0 +1,268 @@
1605 +//===-- AMDGPUInstructions.td - Common instruction defs ---*- tablegen -*-===//
1606 +//
1607 +// The LLVM Compiler Infrastructure
1608 @@ -2035,8 +2118,8 @@ index 0000000..e634d20
1609 +//===----------------------------------------------------------------------===//
1610 +
1611 +class AMDGPUInst <dag outs, dag ins, string asm, list<dag> pattern> : Instruction {
1612 -+ field bits<16> AMDILOp = 0;
1613 -+ field bits<3> Gen = 0;
1614 ++ field bit isRegisterLoad = 0;
1615 ++ field bit isRegisterStore = 0;
1616 +
1617 + let Namespace = "AMDGPU";
1618 + let OutOperandList = outs;
1619 @@ -2044,8 +2127,9 @@ index 0000000..e634d20
1620 + let AsmString = asm;
1621 + let Pattern = pattern;
1622 + let Itinerary = NullALU;
1623 -+ let TSFlags{42-40} = Gen;
1624 -+ let TSFlags{63-48} = AMDILOp;
1625 ++
1626 ++ let TSFlags{63} = isRegisterLoad;
1627 ++ let TSFlags{62} = isRegisterStore;
1628 +}
1629 +
1630 +class AMDGPUShaderInst <dag outs, dag ins, string asm, list<dag> pattern>
1631 @@ -2123,7 +2207,9 @@ index 0000000..e634d20
1632 + [{return N->isExactlyValue(1.0);}]
1633 +>;
1634 +
1635 -+let isCodeGenOnly = 1, isPseudo = 1, usesCustomInserter = 1 in {
1636 ++let isCodeGenOnly = 1, isPseudo = 1 in {
1637 ++
1638 ++let usesCustomInserter = 1 in {
1639 +
1640 +class CLAMP <RegisterClass rc> : AMDGPUShaderInst <
1641 + (outs rc:$dst),
1642 @@ -2153,7 +2239,31 @@ index 0000000..e634d20
1643 + [(int_AMDGPU_shader_type imm:$type)]
1644 +>;
1645 +
1646 -+} // End isCodeGenOnly = 1, isPseudo = 1, hasCustomInserter = 1
1647 ++} // usesCustomInserter = 1
1648 ++
1649 ++multiclass RegisterLoadStore <RegisterClass dstClass, Operand addrClass,
1650 ++ ComplexPattern addrPat> {
1651 ++ def RegisterLoad : AMDGPUShaderInst <
1652 ++ (outs dstClass:$dst),
1653 ++ (ins addrClass:$addr, i32imm:$chan),
1654 ++ "RegisterLoad $dst, $addr",
1655 ++ [(set (i32 dstClass:$dst), (AMDGPUregister_load addrPat:$addr,
1656 ++ (i32 timm:$chan)))]
1657 ++ > {
1658 ++ let isRegisterLoad = 1;
1659 ++ }
1660 ++
1661 ++ def RegisterStore : AMDGPUShaderInst <
1662 ++ (outs),
1663 ++ (ins dstClass:$val, addrClass:$addr, i32imm:$chan),
1664 ++ "RegisterStore $val, $addr",
1665 ++ [(AMDGPUregister_store (i32 dstClass:$val), addrPat:$addr, (i32 timm:$chan))]
1666 ++ > {
1667 ++ let isRegisterStore = 1;
1668 ++ }
1669 ++}
1670 ++
1671 ++} // End isCodeGenOnly = 1, isPseudo = 1
1672 +
1673 +/* Generic helper patterns for intrinsics */
1674 +/* -------------------------------------- */
1675 @@ -2186,13 +2296,64 @@ index 0000000..e634d20
1676 +>;
1677 +
1678 +// Vector Build pattern
1679 ++class Vector1_Build <ValueType vecType, RegisterClass vectorClass,
1680 ++ ValueType elemType, RegisterClass elemClass> : Pat <
1681 ++ (vecType (build_vector (elemType elemClass:$src))),
1682 ++ (vecType elemClass:$src)
1683 ++>;
1684 ++
1685 ++class Vector2_Build <ValueType vecType, RegisterClass vectorClass,
1686 ++ ValueType elemType, RegisterClass elemClass> : Pat <
1687 ++ (vecType (build_vector (elemType elemClass:$sub0), (elemType elemClass:$sub1))),
1688 ++ (INSERT_SUBREG (INSERT_SUBREG
1689 ++ (vecType (IMPLICIT_DEF)), elemClass:$sub0, sub0), elemClass:$sub1, sub1)
1690 ++>;
1691 ++
1692 +class Vector_Build <ValueType vecType, RegisterClass vectorClass,
1693 + ValueType elemType, RegisterClass elemClass> : Pat <
1694 + (vecType (build_vector (elemType elemClass:$x), (elemType elemClass:$y),
1695 + (elemType elemClass:$z), (elemType elemClass:$w))),
1696 + (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
1697 -+ (vecType (IMPLICIT_DEF)), elemClass:$x, sel_x), elemClass:$y, sel_y),
1698 -+ elemClass:$z, sel_z), elemClass:$w, sel_w)
1699 ++ (vecType (IMPLICIT_DEF)), elemClass:$x, sub0), elemClass:$y, sub1),
1700 ++ elemClass:$z, sub2), elemClass:$w, sub3)
1701 ++>;
1702 ++
1703 ++class Vector8_Build <ValueType vecType, RegisterClass vectorClass,
1704 ++ ValueType elemType, RegisterClass elemClass> : Pat <
1705 ++ (vecType (build_vector (elemType elemClass:$sub0), (elemType elemClass:$sub1),
1706 ++ (elemType elemClass:$sub2), (elemType elemClass:$sub3),
1707 ++ (elemType elemClass:$sub4), (elemType elemClass:$sub5),
1708 ++ (elemType elemClass:$sub6), (elemType elemClass:$sub7))),
1709 ++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
1710 ++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
1711 ++ (vecType (IMPLICIT_DEF)), elemClass:$sub0, sub0), elemClass:$sub1, sub1),
1712 ++ elemClass:$sub2, sub2), elemClass:$sub3, sub3),
1713 ++ elemClass:$sub4, sub4), elemClass:$sub5, sub5),
1714 ++ elemClass:$sub6, sub6), elemClass:$sub7, sub7)
1715 ++>;
1716 ++
1717 ++class Vector16_Build <ValueType vecType, RegisterClass vectorClass,
1718 ++ ValueType elemType, RegisterClass elemClass> : Pat <
1719 ++ (vecType (build_vector (elemType elemClass:$sub0), (elemType elemClass:$sub1),
1720 ++ (elemType elemClass:$sub2), (elemType elemClass:$sub3),
1721 ++ (elemType elemClass:$sub4), (elemType elemClass:$sub5),
1722 ++ (elemType elemClass:$sub6), (elemType elemClass:$sub7),
1723 ++ (elemType elemClass:$sub8), (elemType elemClass:$sub9),
1724 ++ (elemType elemClass:$sub10), (elemType elemClass:$sub11),
1725 ++ (elemType elemClass:$sub12), (elemType elemClass:$sub13),
1726 ++ (elemType elemClass:$sub14), (elemType elemClass:$sub15))),
1727 ++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
1728 ++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
1729 ++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
1730 ++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
1731 ++ (vecType (IMPLICIT_DEF)), elemClass:$sub0, sub0), elemClass:$sub1, sub1),
1732 ++ elemClass:$sub2, sub2), elemClass:$sub3, sub3),
1733 ++ elemClass:$sub4, sub4), elemClass:$sub5, sub5),
1734 ++ elemClass:$sub6, sub6), elemClass:$sub7, sub7),
1735 ++ elemClass:$sub8, sub8), elemClass:$sub9, sub9),
1736 ++ elemClass:$sub10, sub10), elemClass:$sub11, sub11),
1737 ++ elemClass:$sub12, sub12), elemClass:$sub13, sub13),
1738 ++ elemClass:$sub14, sub14), elemClass:$sub15, sub15)
1739 +>;
1740 +
1741 +// bitconvert pattern
1742 @@ -2409,10 +2570,10 @@ index 0000000..d7d538e
1743 +#endif //AMDGPU_MCINSTLOWER_H
1744 diff --git a/lib/Target/R600/AMDGPURegisterInfo.cpp b/lib/Target/R600/AMDGPURegisterInfo.cpp
1745 new file mode 100644
1746 -index 0000000..eeafec8
1747 +index 0000000..d62e57b
1748 --- /dev/null
1749 +++ b/lib/Target/R600/AMDGPURegisterInfo.cpp
1750 -@@ -0,0 +1,51 @@
1751 +@@ -0,0 +1,74 @@
1752 +//===-- AMDGPURegisterInfo.cpp - AMDGPU Register Information -------------===//
1753 +//
1754 +// The LLVM Compiler Infrastructure
1755 @@ -2462,14 +2623,37 @@ index 0000000..eeafec8
1756 + return 0;
1757 +}
1758 +
1759 ++unsigned AMDGPURegisterInfo::getIndirectSubReg(unsigned IndirectIndex) const {
1760 ++
1761 ++ switch(IndirectIndex) {
1762 ++ case 0: return AMDGPU::sub0;
1763 ++ case 1: return AMDGPU::sub1;
1764 ++ case 2: return AMDGPU::sub2;
1765 ++ case 3: return AMDGPU::sub3;
1766 ++ case 4: return AMDGPU::sub4;
1767 ++ case 5: return AMDGPU::sub5;
1768 ++ case 6: return AMDGPU::sub6;
1769 ++ case 7: return AMDGPU::sub7;
1770 ++ case 8: return AMDGPU::sub8;
1771 ++ case 9: return AMDGPU::sub9;
1772 ++ case 10: return AMDGPU::sub10;
1773 ++ case 11: return AMDGPU::sub11;
1774 ++ case 12: return AMDGPU::sub12;
1775 ++ case 13: return AMDGPU::sub13;
1776 ++ case 14: return AMDGPU::sub14;
1777 ++ case 15: return AMDGPU::sub15;
1778 ++ default: llvm_unreachable("indirect index out of range");
1779 ++ }
1780 ++}
1781 ++
1782 +#define GET_REGINFO_TARGET_DESC
1783 +#include "AMDGPUGenRegisterInfo.inc"
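
The switch in getIndirectSubReg above is equivalent to a bounds-checked table
lookup; a standalone sketch, with the TableGen-generated enum values
(AMDGPU::sub0 .. AMDGPU::sub15) modeled as plain integers:

#include <cassert>

unsigned getIndirectSubReg(unsigned IndirectIndex) {
  static const unsigned SubRegs[16] = {
    0, 1, 2, 3, 4, 5, 6, 7,          // stand-ins for AMDGPU::sub0..sub7
    8, 9, 10, 11, 12, 13, 14, 15     // stand-ins for AMDGPU::sub8..sub15
  };
  assert(IndirectIndex < 16 && "indirect index out of range");
  return SubRegs[IndirectIndex];
}

int main() {
  assert(getIndirectSubReg(5) == 5);
  return 0;
}
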
1784 diff --git a/lib/Target/R600/AMDGPURegisterInfo.h b/lib/Target/R600/AMDGPURegisterInfo.h
1785 new file mode 100644
1786 -index 0000000..76ee7ae
1787 +index 0000000..5007ff5
1788 --- /dev/null
1789 +++ b/lib/Target/R600/AMDGPURegisterInfo.h
1790 -@@ -0,0 +1,63 @@
1791 +@@ -0,0 +1,65 @@
1792 +//===-- AMDGPURegisterInfo.h - AMDGPURegisterInfo Interface -*- C++ -*-----===//
1793 +//
1794 +// The LLVM Compiler Infrastructure
1795 @@ -2528,6 +2712,8 @@ index 0000000..76ee7ae
1796 + RegScavenger *RS) const;
1797 + unsigned getFrameRegister(const MachineFunction &MF) const;
1798 +
1799 ++ unsigned getIndirectSubReg(unsigned IndirectIndex) const;
1800 ++
1801 +};
1802 +
1803 +} // End namespace llvm
1804 @@ -2535,10 +2721,10 @@ index 0000000..76ee7ae
1805 +#endif // AMDIDSAREGISTERINFO_H
1806 diff --git a/lib/Target/R600/AMDGPURegisterInfo.td b/lib/Target/R600/AMDGPURegisterInfo.td
1807 new file mode 100644
1808 -index 0000000..8181e02
1809 +index 0000000..b5aca03
1810 --- /dev/null
1811 +++ b/lib/Target/R600/AMDGPURegisterInfo.td
1812 -@@ -0,0 +1,22 @@
1813 +@@ -0,0 +1,25 @@
1814 +//===-- AMDGPURegisterInfo.td - AMDGPU register info -------*- tablegen -*-===//
1815 +//
1816 +// The LLVM Compiler Infrastructure
1817 @@ -2553,20 +2739,23 @@ index 0000000..8181e02
1818 +//===----------------------------------------------------------------------===//
1819 +
1820 +let Namespace = "AMDGPU" in {
1821 -+ def sel_x : SubRegIndex;
1822 -+ def sel_y : SubRegIndex;
1823 -+ def sel_z : SubRegIndex;
1824 -+ def sel_w : SubRegIndex;
1825 ++
1826 ++foreach Index = 0-15 in {
1827 ++ def sub#Index : SubRegIndex;
1828 ++}
1829 ++
1830 ++def INDIRECT_BASE_ADDR : Register <"INDIRECT_BASE_ADDR">;
1831 ++
1832 +}
1833 +
1834 +include "R600RegisterInfo.td"
1835 +include "SIRegisterInfo.td"
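
The INDIRECT_BASE_ADDR register declared above works as a sentinel: when a
RegisterLoad/RegisterStore pseudo names it as the base operand, the access is
a direct register access; any other base register means a true indirect
access. A minimal standalone model (register numbers assumed):

#include <cstdio>

enum : unsigned { INDIRECT_BASE_ADDR = 0 };  // sentinel, as in the pass above

bool isDirectAccess(unsigned BaseReg) { return BaseReg == INDIRECT_BASE_ADDR; }

int main() {
  std::printf("base = sentinel -> %s\n",
              isDirectAccess(INDIRECT_BASE_ADDR) ? "direct" : "indirect");
  std::printf("base = v7       -> %s\n",
              isDirectAccess(7) ? "direct" : "indirect");
  return 0;
}
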
1836 diff --git a/lib/Target/R600/AMDGPUStructurizeCFG.cpp b/lib/Target/R600/AMDGPUStructurizeCFG.cpp
1837 new file mode 100644
1838 -index 0000000..22338b5
1839 +index 0000000..a8c9621
1840 --- /dev/null
1841 +++ b/lib/Target/R600/AMDGPUStructurizeCFG.cpp
1842 -@@ -0,0 +1,714 @@
1843 +@@ -0,0 +1,893 @@
1844 +//===-- AMDGPUStructurizeCFG.cpp - ------------------===//
1845 +//
1846 +// The LLVM Compiler Infrastructure
1847 @@ -2591,30 +2780,101 @@ index 0000000..22338b5
1848 +#include "llvm/Analysis/RegionInfo.h"
1849 +#include "llvm/Analysis/RegionPass.h"
1850 +#include "llvm/Transforms/Utils/SSAUpdater.h"
1851 ++#include "llvm/Support/PatternMatch.h"
1852 +
1853 +using namespace llvm;
1854 ++using namespace llvm::PatternMatch;
1855 +
1856 +namespace {
1857 +
1858 +// Definition of the complex types used in this pass.
1859 +
1860 +typedef std::pair<BasicBlock *, Value *> BBValuePair;
1861 -+typedef ArrayRef<BasicBlock*> BBVecRef;
1862 +
1863 +typedef SmallVector<RegionNode*, 8> RNVector;
1864 +typedef SmallVector<BasicBlock*, 8> BBVector;
1865 ++typedef SmallVector<BranchInst*, 8> BranchVector;
1866 +typedef SmallVector<BBValuePair, 2> BBValueVector;
1867 +
1868 ++typedef SmallPtrSet<BasicBlock *, 8> BBSet;
1869 ++
1870 +typedef DenseMap<PHINode *, BBValueVector> PhiMap;
1871 ++typedef DenseMap<DomTreeNode *, unsigned> DTN2UnsignedMap;
1872 +typedef DenseMap<BasicBlock *, PhiMap> BBPhiMap;
1873 +typedef DenseMap<BasicBlock *, Value *> BBPredicates;
1874 +typedef DenseMap<BasicBlock *, BBPredicates> PredMap;
1875 -+typedef DenseMap<BasicBlock *, unsigned> VisitedMap;
1876 ++typedef DenseMap<BasicBlock *, BasicBlock*> BB2BBMap;
1877 ++typedef DenseMap<BasicBlock *, BBVector> BB2BBVecMap;
1878 +
1879 +// The name for newly created blocks.
1880 +
1881 +static const char *FlowBlockName = "Flow";
1882 +
1883 ++/// @brief Find the nearest common dominator for multiple BasicBlocks
1884 ++///
1885 ++/// Helper class for AMDGPUStructurizeCFG
1886 ++/// TODO: Maybe move into common code
1887 ++class NearestCommonDominator {
1888 ++
1889 ++ DominatorTree *DT;
1890 ++
1891 ++ DTN2UnsignedMap IndexMap;
1892 ++
1893 ++ BasicBlock *Result;
1894 ++ unsigned ResultIndex;
1895 ++ bool ExplicitMentioned;
1896 ++
1897 ++public:
1898 ++ /// \brief Start a new query
1899 ++ NearestCommonDominator(DominatorTree *DomTree) {
1900 ++ DT = DomTree;
1901 ++ Result = 0;
1902 ++ }
1903 ++
1904 ++ /// \brief Add BB to the resulting dominator
1905 ++ void addBlock(BasicBlock *BB, bool Remember = true) {
1906 ++
1907 ++ DomTreeNode *Node = DT->getNode(BB);
1908 ++
1909 ++ if (Result == 0) {
1910 ++ unsigned Numbering = 0;
1911 ++ for (;Node;Node = Node->getIDom())
1912 ++ IndexMap[Node] = ++Numbering;
1913 ++ Result = BB;
1914 ++ ResultIndex = 1;
1915 ++ ExplicitMentioned = Remember;
1916 ++ return;
1917 ++ }
1918 ++
1919 ++ for (;Node;Node = Node->getIDom())
1920 ++ if (IndexMap.count(Node))
1921 ++ break;
1922 ++ else
1923 ++ IndexMap[Node] = 0;
1924 ++
1925 ++ assert(Node && "Dominator tree invalid!");
1926 ++
1927 ++ unsigned Numbering = IndexMap[Node];
1928 ++ if (Numbering > ResultIndex) {
1929 ++ Result = Node->getBlock();
1930 ++ ResultIndex = Numbering;
1931 ++ ExplicitMentioned = Remember && (Result == BB);
1932 ++ } else if (Numbering == ResultIndex) {
1933 ++ ExplicitMentioned |= Remember;
1934 ++ }
1935 ++ }
1936 ++
1937 ++ /// \brief Is "Result" one of the BBs added with "Remember" = True?
1938 ++ bool wasResultExplicitMentioned() {
1939 ++ return ExplicitMentioned;
1940 ++ }
1941 ++
1942 ++ /// \brief Get the query result
1943 ++ BasicBlock *getResult() {
1944 ++ return Result;
1945 ++ }
1946 ++};
1947 ++
1948 +/// @brief Transforms the control flow graph on one single entry/exit region
1949 +/// at a time.
1950 +///
1951 @@ -2675,45 +2935,62 @@ index 0000000..22338b5
1952 + DominatorTree *DT;
1953 +
1954 + RNVector Order;
1955 -+ VisitedMap Visited;
1956 -+ PredMap Predicates;
1957 ++ BBSet Visited;
1958 ++
1959 + BBPhiMap DeletedPhis;
1960 -+ BBVector FlowsInserted;
1961 ++ BB2BBVecMap AddedPhis;
1962 ++
1963 ++ PredMap Predicates;
1964 ++ BranchVector Conditions;
1965 ++
1966 ++ BB2BBMap Loops;
1967 ++ PredMap LoopPreds;
1968 ++ BranchVector LoopConds;
1969 +
1970 -+ BasicBlock *LoopStart;
1971 -+ BasicBlock *LoopEnd;
1972 -+ BBPredicates LoopPred;
1973 ++ RegionNode *PrevNode;
1974 +
1975 + void orderNodes();
1976 +
1977 -+ void buildPredicate(BranchInst *Term, unsigned Idx,
1978 -+ BBPredicates &Pred, bool Invert);
1979 ++ void analyzeLoops(RegionNode *N);
1980 +
1981 -+ void analyzeBlock(BasicBlock *BB);
1982 ++ Value *invert(Value *Condition);
1983 +
1984 -+ void analyzeLoop(BasicBlock *BB, unsigned &LoopIdx);
1985 ++ Value *buildCondition(BranchInst *Term, unsigned Idx, bool Invert);
1986 ++
1987 ++ void gatherPredicates(RegionNode *N);
1988 +
1989 + void collectInfos();
1990 +
1991 -+ bool dominatesPredicates(BasicBlock *A, BasicBlock *B);
1992 ++ void insertConditions(bool Loops);
1993 ++
1994 ++ void delPhiValues(BasicBlock *From, BasicBlock *To);
1995 ++
1996 ++ void addPhiValues(BasicBlock *From, BasicBlock *To);
1997 ++
1998 ++ void setPhiValues();
1999 +
2000 + void killTerminator(BasicBlock *BB);
2001 +
2002 -+ RegionNode *skipChained(RegionNode *Node);
2003 ++ void changeExit(RegionNode *Node, BasicBlock *NewExit,
2004 ++ bool IncludeDominator);
2005 +
2006 -+ void delPhiValues(BasicBlock *From, BasicBlock *To);
2007 ++ BasicBlock *getNextFlow(BasicBlock *Dominator);
2008 +
2009 -+ void addPhiValues(BasicBlock *From, BasicBlock *To);
2010 ++ BasicBlock *needPrefix(bool NeedEmpty);
2011 +
2012 -+ BasicBlock *getNextFlow(BasicBlock *Prev);
2013 ++ BasicBlock *needPostfix(BasicBlock *Flow, bool ExitUseAllowed);
2014 +
2015 -+ bool isPredictableTrue(BasicBlock *Prev, BasicBlock *Node);
2016 ++ void setPrevNode(BasicBlock *BB);
2017 +
2018 -+ BasicBlock *wireFlowBlock(BasicBlock *Prev, RegionNode *Node);
2019 ++ bool dominatesPredicates(BasicBlock *BB, RegionNode *Node);
2020 +
2021 -+ void createFlow();
2022 ++ bool isPredictableTrue(RegionNode *Node);
2023 ++
2024 ++ void wireFlow(bool ExitUseAllowed, BasicBlock *LoopEnd);
2025 +
2026 -+ void insertConditions();
2027 ++ void handleLoops(bool ExitUseAllowed, BasicBlock *LoopEnd);
2028 ++
2029 ++ void createFlow();
2030 +
2031 + void rebuildSSA();
2032 +
2033 @@ -2767,212 +3044,214 @@ index 0000000..22338b5
2034 + }
2035 +}
2036 +
2037 -+/// \brief Build blocks and loop predicates
2038 -+void AMDGPUStructurizeCFG::buildPredicate(BranchInst *Term, unsigned Idx,
2039 -+ BBPredicates &Pred, bool Invert) {
2040 -+ Value *True = Invert ? BoolFalse : BoolTrue;
2041 -+ Value *False = Invert ? BoolTrue : BoolFalse;
2042 ++/// \brief Determine the end of the loops
2043 ++void AMDGPUStructurizeCFG::analyzeLoops(RegionNode *N) {
2044 +
2045 -+ RegionInfo *RI = ParentRegion->getRegionInfo();
2046 -+ BasicBlock *BB = Term->getParent();
2047 ++ if (N->isSubRegion()) {
2048 ++ // Test for exit as back edge
2049 ++ BasicBlock *Exit = N->getNodeAs<Region>()->getExit();
2050 ++ if (Visited.count(Exit))
2051 ++ Loops[Exit] = N->getEntry();
2052 ++
2053 ++ } else {
2054 ++ // Test for successors as back edges
2055 ++ BasicBlock *BB = N->getNodeAs<BasicBlock>();
2056 ++ BranchInst *Term = cast<BranchInst>(BB->getTerminator());
2057 +
2058 -+ // Handle the case where multiple regions start at the same block
2059 -+ Region *R = BB != ParentRegion->getEntry() ?
2060 -+ RI->getRegionFor(BB) : ParentRegion;
2061 ++ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
2062 ++ BasicBlock *Succ = Term->getSuccessor(i);
2063 +
2064 -+ if (R == ParentRegion) {
2065 -+ // It's a top level block in our region
2066 -+ Value *Cond = True;
2067 -+ if (Term->isConditional()) {
2068 -+ BasicBlock *Other = Term->getSuccessor(!Idx);
2069 ++ if (Visited.count(Succ))
2070 ++ Loops[Succ] = BB;
2071 ++ }
2072 ++ }
2073 ++}
2074 +
2075 -+ if (Visited.count(Other)) {
2076 -+ if (!Pred.count(Other))
2077 -+ Pred[Other] = False;
2078 ++/// \brief Invert the given condition
2079 ++Value *AMDGPUStructurizeCFG::invert(Value *Condition) {
2080 +
2081 -+ if (!Pred.count(BB))
2082 -+ Pred[BB] = True;
2083 -+ return;
2084 -+ }
2085 -+ Cond = Term->getCondition();
2086 ++ // First: Check if it's a constant
2087 ++ if (Condition == BoolTrue)
2088 ++ return BoolFalse;
2089 +
2090 -+ if (Idx != Invert)
2091 -+ Cond = BinaryOperator::CreateNot(Cond, "", Term);
2092 -+ }
2093 ++ if (Condition == BoolFalse)
2094 ++ return BoolTrue;
2095 +
2096 -+ Pred[BB] = Cond;
2097 ++ if (Condition == BoolUndef)
2098 ++ return BoolUndef;
2099 +
2100 -+ } else if (ParentRegion->contains(R)) {
2101 -+ // It's a block in a sub region
2102 -+ while(R->getParent() != ParentRegion)
2103 -+ R = R->getParent();
2104 ++ // Second: If the condition is already inverted, return the original value
2105 ++ if (match(Condition, m_Not(m_Value(Condition))))
2106 ++ return Condition;
2107 +
2108 -+ Pred[R->getEntry()] = True;
2109 ++ // Third: Check all the users for an invert
2110 ++ BasicBlock *Parent = cast<Instruction>(Condition)->getParent();
2111 ++ for (Value::use_iterator I = Condition->use_begin(),
2112 ++ E = Condition->use_end(); I != E; ++I) {
2113 +
2114 -+ } else {
2115 -+ // It's a branch from outside into our parent region
2116 -+ Pred[BB] = True;
2117 ++ Instruction *User = dyn_cast<Instruction>(*I);
2118 ++ if (!User || User->getParent() != Parent)
2119 ++ continue;
2120 ++
2121 ++ if (match(*I, m_Not(m_Specific(Condition))))
2122 ++ return *I;
2123 + }
2124 -+}
2125 +
2126 -+/// \brief Analyze the successors of each block and build up predicates
2127 -+void AMDGPUStructurizeCFG::analyzeBlock(BasicBlock *BB) {
2128 -+ pred_iterator PI = pred_begin(BB), PE = pred_end(BB);
2129 -+ BBPredicates &Pred = Predicates[BB];
2130 ++ // Last option: Create a new instruction
2131 ++ return BinaryOperator::CreateNot(Condition, "", Parent->getTerminator());
2132 ++}
2133 +
2134 -+ for (; PI != PE; ++PI) {
2135 -+ BranchInst *Term = cast<BranchInst>((*PI)->getTerminator());
2136 ++/// \brief Build the condition for one edge
2137 ++Value *AMDGPUStructurizeCFG::buildCondition(BranchInst *Term, unsigned Idx,
2138 ++ bool Invert) {
2139 ++ Value *Cond = Invert ? BoolFalse : BoolTrue;
2140 ++ if (Term->isConditional()) {
2141 ++ Cond = Term->getCondition();
2142 +
2143 -+ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
2144 -+ BasicBlock *Succ = Term->getSuccessor(i);
2145 -+ if (Succ != BB)
2146 -+ continue;
2147 -+ buildPredicate(Term, i, Pred, false);
2148 -+ }
2149 ++ if (Idx != Invert)
2150 ++ Cond = invert(Cond);
2151 + }
2152 ++ return Cond;
2153 +}
2154 +
2155 -+/// \brief Analyze the conditions leading to loop to a previous block
2156 -+void AMDGPUStructurizeCFG::analyzeLoop(BasicBlock *BB, unsigned &LoopIdx) {
2157 -+ BranchInst *Term = cast<BranchInst>(BB->getTerminator());
2158 ++/// \brief Analyze the predecessors of each block and build up predicates
2159 ++void AMDGPUStructurizeCFG::gatherPredicates(RegionNode *N) {
2160 +
2161 -+ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
2162 -+ BasicBlock *Succ = Term->getSuccessor(i);
2163 ++ RegionInfo *RI = ParentRegion->getRegionInfo();
2164 ++ BasicBlock *BB = N->getEntry();
2165 ++ BBPredicates &Pred = Predicates[BB];
2166 ++ BBPredicates &LPred = LoopPreds[BB];
2167 ++
2168 ++ for (pred_iterator PI = pred_begin(BB), PE = pred_end(BB);
2169 ++ PI != PE; ++PI) {
2170 +
2171 -+ // Ignore it if it's not a back edge
2172 -+ if (!Visited.count(Succ))
2173 ++ // Ignore it if it's a branch from outside into our region entry
2174 ++ if (!ParentRegion->contains(*PI))
2175 + continue;
2176 +
2177 -+ buildPredicate(Term, i, LoopPred, true);
2178 ++ Region *R = RI->getRegionFor(*PI);
2179 ++ if (R == ParentRegion) {
2180 ++
2181 ++ // It's a top level block in our region
2182 ++ BranchInst *Term = cast<BranchInst>((*PI)->getTerminator());
2183 ++ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
2184 ++ BasicBlock *Succ = Term->getSuccessor(i);
2185 ++ if (Succ != BB)
2186 ++ continue;
2187 ++
2188 ++ if (Visited.count(*PI)) {
2189 ++ // Normal forward edge
2190 ++ if (Term->isConditional()) {
2191 ++ // Try to treat it like an ELSE block
2192 ++ BasicBlock *Other = Term->getSuccessor(!i);
2193 ++ if (Visited.count(Other) && !Loops.count(Other) &&
2194 ++ !Pred.count(Other) && !Pred.count(*PI)) {
2195 ++
2196 ++ Pred[Other] = BoolFalse;
2197 ++ Pred[*PI] = BoolTrue;
2198 ++ continue;
2199 ++ }
2200 ++ }
2201 ++ Pred[*PI] = buildCondition(Term, i, false);
2202 ++
2203 ++ } else {
2204 ++ // Back edge
2205 ++ LPred[*PI] = buildCondition(Term, i, true);
2206 ++ }
2207 ++ }
2208 ++
2209 ++ } else {
2210 ++
2211 ++ // It's an exit from a sub region
2212 ++ while(R->getParent() != ParentRegion)
2213 ++ R = R->getParent();
2214 ++
2215 ++ // Edge from inside a subregion to its entry, ignore it
2216 ++ if (R == N)
2217 ++ continue;
2218 +
2219 -+ LoopEnd = BB;
2220 -+ if (Visited[Succ] < LoopIdx) {
2221 -+ LoopIdx = Visited[Succ];
2222 -+ LoopStart = Succ;
2223 ++ BasicBlock *Entry = R->getEntry();
2224 ++ if (Visited.count(Entry))
2225 ++ Pred[Entry] = BoolTrue;
2226 ++ else
2227 ++ LPred[Entry] = BoolFalse;
2228 + }
2229 + }
2230 +}
2231 +
2232 +/// \brief Collect various loop and predicate infos
2233 +void AMDGPUStructurizeCFG::collectInfos() {
2234 -+ unsigned Number = 0, LoopIdx = ~0;
2235 +
2236 + // Reset predicate
2237 + Predicates.clear();
2238 +
2239 + // and loop infos
2240 -+ LoopStart = LoopEnd = 0;
2241 -+ LoopPred.clear();
2242 ++ Loops.clear();
2243 ++ LoopPreds.clear();
2244 ++
2245 ++ // Reset the visited nodes
2246 ++ Visited.clear();
2247 +
2248 -+ RNVector::reverse_iterator OI = Order.rbegin(), OE = Order.rend();
2249 -+ for (Visited.clear(); OI != OE; Visited[(*OI++)->getEntry()] = ++Number) {
2250 ++ for (RNVector::reverse_iterator OI = Order.rbegin(), OE = Order.rend();
2251 ++ OI != OE; ++OI) {
2252 +
2253 + // Analyze all the conditions leading to a node
2254 -+ analyzeBlock((*OI)->getEntry());
2255 ++ gatherPredicates(*OI);
2256 +
2257 -+ if ((*OI)->isSubRegion())
2258 -+ continue;
2259 ++ // Remember that we've seen this node
2260 ++ Visited.insert((*OI)->getEntry());
2261 +
2262 -+ // Find the first/last loop nodes and loop predicates
2263 -+ analyzeLoop((*OI)->getNodeAs<BasicBlock>(), LoopIdx);
2264 ++ // Find the last back edges
2265 ++ analyzeLoops(*OI);
2266 + }
2267 +}
2268 +
2269 -+/// \brief Does A dominate all the predicates of B ?
2270 -+bool AMDGPUStructurizeCFG::dominatesPredicates(BasicBlock *A, BasicBlock *B) {
2271 -+ BBPredicates &Preds = Predicates[B];
2272 -+ for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
2273 -+ PI != PE; ++PI) {
2274 ++/// \brief Insert the missing branch conditions
2275 ++void AMDGPUStructurizeCFG::insertConditions(bool Loops) {
2276 ++ BranchVector &Conds = Loops ? LoopConds : Conditions;
2277 ++ Value *Default = Loops ? BoolTrue : BoolFalse;
2278 ++ SSAUpdater PhiInserter;
2279 +
2280 -+ if (!DT->dominates(A, PI->first))
2281 -+ return false;
2282 -+ }
2283 -+ return true;
2284 -+}
2285 ++ for (BranchVector::iterator I = Conds.begin(),
2286 ++ E = Conds.end(); I != E; ++I) {
2287 +
2288 -+/// \brief Remove phi values from all successors and the remove the terminator.
2289 -+void AMDGPUStructurizeCFG::killTerminator(BasicBlock *BB) {
2290 -+ TerminatorInst *Term = BB->getTerminator();
2291 -+ if (!Term)
2292 -+ return;
2293 ++ BranchInst *Term = *I;
2294 ++ assert(Term->isConditional());
2295 +
2296 -+ for (succ_iterator SI = succ_begin(BB), SE = succ_end(BB);
2297 -+ SI != SE; ++SI) {
2298 ++ BasicBlock *Parent = Term->getParent();
2299 ++ BasicBlock *SuccTrue = Term->getSuccessor(0);
2300 ++ BasicBlock *SuccFalse = Term->getSuccessor(1);
2301 +
2302 -+ delPhiValues(BB, *SI);
2303 -+ }
2304 ++ PhiInserter.Initialize(Boolean, "");
2305 ++ PhiInserter.AddAvailableValue(&Func->getEntryBlock(), Default);
2306 ++ PhiInserter.AddAvailableValue(Loops ? SuccFalse : Parent, Default);
2307 +
2308 -+ Term->eraseFromParent();
2309 -+}
2310 ++ BBPredicates &Preds = Loops ? LoopPreds[SuccFalse] : Predicates[SuccTrue];
2311 +
2312 -+/// First: Skip forward to the first region node that either isn't a subregion or not
2313 -+/// dominating it's exit, remove all the skipped nodes from the node order.
2314 -+///
2315 -+/// Second: Handle the first successor directly if the resulting nodes successor
2316 -+/// predicates are still dominated by the original entry
2317 -+RegionNode *AMDGPUStructurizeCFG::skipChained(RegionNode *Node) {
2318 -+ BasicBlock *Entry = Node->getEntry();
2319 ++ NearestCommonDominator Dominator(DT);
2320 ++ Dominator.addBlock(Parent, false);
2321 +
2322 -+ // Skip forward as long as it is just a linear flow
2323 -+ while (true) {
2324 -+ BasicBlock *Entry = Node->getEntry();
2325 -+ BasicBlock *Exit;
2326 ++ Value *ParentValue = 0;
2327 ++ for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
2328 ++ PI != PE; ++PI) {
2329 +
2330 -+ if (Node->isSubRegion()) {
2331 -+ Exit = Node->getNodeAs<Region>()->getExit();
2332 -+ } else {
2333 -+ TerminatorInst *Term = Entry->getTerminator();
2334 -+ if (Term->getNumSuccessors() != 1)
2335 ++ if (PI->first == Parent) {
2336 ++ ParentValue = PI->second;
2337 + break;
2338 -+ Exit = Term->getSuccessor(0);
2339 ++ }
2340 ++ PhiInserter.AddAvailableValue(PI->first, PI->second);
2341 ++ Dominator.addBlock(PI->first);
2342 + }
2343 +
2344 -+ // It's a back edge, break here so we can insert a loop node
2345 -+ if (!Visited.count(Exit))
2346 -+ return Node;
2347 -+
2348 -+ // More than node edges are pointing to exit
2349 -+ if (!DT->dominates(Entry, Exit))
2350 -+ return Node;
2351 -+
2352 -+ RegionNode *Next = ParentRegion->getNode(Exit);
2353 -+ RNVector::iterator I = std::find(Order.begin(), Order.end(), Next);
2354 -+ assert(I != Order.end());
2355 -+
2356 -+ Visited.erase(Next->getEntry());
2357 -+ Order.erase(I);
2358 -+ Node = Next;
2359 -+ }
2360 ++ if (ParentValue) {
2361 ++ Term->setCondition(ParentValue);
2362 ++ } else {
2363 ++ if (!Dominator.wasResultExplicitMentioned())
2364 ++ PhiInserter.AddAvailableValue(Dominator.getResult(), Default);
2365 +
2366 -+ BasicBlock *BB = Node->getEntry();
2367 -+ TerminatorInst *Term = BB->getTerminator();
2368 -+ if (Term->getNumSuccessors() != 2)
2369 -+ return Node;
2370 -+
2371 -+ // Our node has exactly two succesors, check if we can handle
2372 -+ // any of them directly
2373 -+ BasicBlock *Succ = Term->getSuccessor(0);
2374 -+ if (!Visited.count(Succ) || !dominatesPredicates(Entry, Succ)) {
2375 -+ Succ = Term->getSuccessor(1);
2376 -+ if (!Visited.count(Succ) || !dominatesPredicates(Entry, Succ))
2377 -+ return Node;
2378 -+ } else {
2379 -+ BasicBlock *Succ2 = Term->getSuccessor(1);
2380 -+ if (Visited.count(Succ2) && Visited[Succ] > Visited[Succ2] &&
2381 -+ dominatesPredicates(Entry, Succ2))
2382 -+ Succ = Succ2;
2383 ++ Term->setCondition(PhiInserter.GetValueInMiddleOfBlock(Parent));
2384 ++ }
2385 + }
2386 -+
2387 -+ RegionNode *Next = ParentRegion->getNode(Succ);
2388 -+ RNVector::iterator E = Order.end();
2389 -+ RNVector::iterator I = std::find(Order.begin(), E, Next);
2390 -+ assert(I != E);
2391 -+
2392 -+ killTerminator(BB);
2393 -+ FlowsInserted.push_back(BB);
2394 -+ Visited.erase(Succ);
2395 -+ Order.erase(I);
2396 -+ return ParentRegion->getNode(wireFlowBlock(BB, Next));
2397 +}
2398 +
2399 +/// \brief Remove all PHI values coming from "From" into "To" and remember
2400 @@ -2990,224 +3269,306 @@ index 0000000..22338b5
2401 + }
2402 +}
2403 +
2404 -+/// \brief Add the PHI values back once we knew the new predecessor
2405 ++/// \brief Add a dummy PHI value as soon as we know the new predecessor
2406 +void AMDGPUStructurizeCFG::addPhiValues(BasicBlock *From, BasicBlock *To) {
2407 -+ if (!DeletedPhis.count(To))
2408 -+ return;
2409 ++ for (BasicBlock::iterator I = To->begin(), E = To->end();
2410 ++ I != E && isa<PHINode>(*I);) {
2411 +
2412 -+ PhiMap &Map = DeletedPhis[To];
2413 ++ PHINode &Phi = cast<PHINode>(*I++);
2414 ++ Value *Undef = UndefValue::get(Phi.getType());
2415 ++ Phi.addIncoming(Undef, From);
2416 ++ }
2417 ++ AddedPhis[To].push_back(From);
2418 ++}
2419 ++
2420 ++/// \brief Add the real PHI value as soon as everything is set up
2421 ++void AMDGPUStructurizeCFG::setPhiValues() {
2422 ++
2423 + SSAUpdater Updater;
2424 ++ for (BB2BBVecMap::iterator AI = AddedPhis.begin(), AE = AddedPhis.end();
2425 ++ AI != AE; ++AI) {
2426 +
2427 -+ for (PhiMap::iterator I = Map.begin(), E = Map.end(); I != E; ++I) {
2428 ++ BasicBlock *To = AI->first;
2429 ++ BBVector &From = AI->second;
2430 +
2431 -+ PHINode *Phi = I->first;
2432 -+ Updater.Initialize(Phi->getType(), "");
2433 -+ BasicBlock *Fallback = To;
2434 -+ bool HaveFallback = false;
2435 ++ if (!DeletedPhis.count(To))
2436 ++ continue;
2437 +
2438 -+ for (BBValueVector::iterator VI = I->second.begin(), VE = I->second.end();
2439 -+ VI != VE; ++VI) {
2440 ++ PhiMap &Map = DeletedPhis[To];
2441 ++ for (PhiMap::iterator PI = Map.begin(), PE = Map.end();
2442 ++ PI != PE; ++PI) {
2443 +
2444 -+ Updater.AddAvailableValue(VI->first, VI->second);
2445 -+ BasicBlock *Dom = DT->findNearestCommonDominator(Fallback, VI->first);
2446 -+ if (Dom == VI->first)
2447 -+ HaveFallback = true;
2448 -+ else if (Dom != Fallback)
2449 -+ HaveFallback = false;
2450 -+ Fallback = Dom;
2451 -+ }
2452 -+ if (!HaveFallback) {
2453 ++ PHINode *Phi = PI->first;
2454 + Value *Undef = UndefValue::get(Phi->getType());
2455 -+ Updater.AddAvailableValue(Fallback, Undef);
2456 ++ Updater.Initialize(Phi->getType(), "");
2457 ++ Updater.AddAvailableValue(&Func->getEntryBlock(), Undef);
2458 ++ Updater.AddAvailableValue(To, Undef);
2459 ++
2460 ++ NearestCommonDominator Dominator(DT);
2461 ++ Dominator.addBlock(To, false);
2462 ++ for (BBValueVector::iterator VI = PI->second.begin(),
2463 ++ VE = PI->second.end(); VI != VE; ++VI) {
2464 ++
2465 ++ Updater.AddAvailableValue(VI->first, VI->second);
2466 ++ Dominator.addBlock(VI->first);
2467 ++ }
2468 ++
2469 ++ if (!Dominator.wasResultExplicitMentioned())
2470 ++ Updater.AddAvailableValue(Dominator.getResult(), Undef);
2471 ++
2472 ++ for (BBVector::iterator FI = From.begin(), FE = From.end();
2473 ++ FI != FE; ++FI) {
2474 ++
2475 ++ int Idx = Phi->getBasicBlockIndex(*FI);
2476 ++ assert(Idx != -1);
2477 ++ Phi->setIncomingValue(Idx, Updater.GetValueAtEndOfBlock(*FI));
2478 ++ }
2479 ++ }
2480 ++
2481 ++ DeletedPhis.erase(To);
2482 ++ }
2483 ++ assert(DeletedPhis.empty());
2484 ++}
2485 ++
2486 ++/// \brief Remove phi values from all successors and then remove the terminator.
2487 ++void AMDGPUStructurizeCFG::killTerminator(BasicBlock *BB) {
2488 ++ TerminatorInst *Term = BB->getTerminator();
2489 ++ if (!Term)
2490 ++ return;
2491 ++
2492 ++ for (succ_iterator SI = succ_begin(BB), SE = succ_end(BB);
2493 ++ SI != SE; ++SI) {
2494 ++
2495 ++ delPhiValues(BB, *SI);
2496 ++ }
2497 ++
2498 ++ Term->eraseFromParent();
2499 ++}
2500 ++
2501 ++/// \brief Let node exit(s) point to NewExit
2502 ++void AMDGPUStructurizeCFG::changeExit(RegionNode *Node, BasicBlock *NewExit,
2503 ++ bool IncludeDominator) {
2504 ++
2505 ++ if (Node->isSubRegion()) {
2506 ++ Region *SubRegion = Node->getNodeAs<Region>();
2507 ++ BasicBlock *OldExit = SubRegion->getExit();
2508 ++ BasicBlock *Dominator = 0;
2509 ++
2510 ++ // Find all the edges from the sub region to the exit
2511 ++ for (pred_iterator I = pred_begin(OldExit), E = pred_end(OldExit);
2512 ++ I != E;) {
2513 ++
2514 ++ BasicBlock *BB = *I++;
2515 ++ if (!SubRegion->contains(BB))
2516 ++ continue;
2517 ++
2518 ++ // Modify the edges to point to the new exit
2519 ++ delPhiValues(BB, OldExit);
2520 ++ BB->getTerminator()->replaceUsesOfWith(OldExit, NewExit);
2521 ++ addPhiValues(BB, NewExit);
2522 ++
2523 ++ // Find the new dominator (if requested)
2524 ++ if (IncludeDominator) {
2525 ++ if (!Dominator)
2526 ++ Dominator = BB;
2527 ++ else
2528 ++ Dominator = DT->findNearestCommonDominator(Dominator, BB);
2529 ++ }
2530 + }
2531 +
2532 -+ Phi->addIncoming(Updater.GetValueAtEndOfBlock(From), From);
2533 ++ // Change the dominator (if requested)
2534 ++ if (Dominator)
2535 ++ DT->changeImmediateDominator(NewExit, Dominator);
2536 ++
2537 ++ // Update the region info
2538 ++ SubRegion->replaceExit(NewExit);
2539 ++
2540 ++ } else {
2541 ++ BasicBlock *BB = Node->getNodeAs<BasicBlock>();
2542 ++ killTerminator(BB);
2543 ++ BranchInst::Create(NewExit, BB);
2544 ++ addPhiValues(BB, NewExit);
2545 ++ if (IncludeDominator)
2546 ++ DT->changeImmediateDominator(NewExit, BB);
2547 + }
2548 -+ DeletedPhis.erase(To);
2549 +}
2550 +
2551 +/// \brief Create a new flow node and update dominator tree and region info
2552 -+BasicBlock *AMDGPUStructurizeCFG::getNextFlow(BasicBlock *Prev) {
2553 ++BasicBlock *AMDGPUStructurizeCFG::getNextFlow(BasicBlock *Dominator) {
2554 + LLVMContext &Context = Func->getContext();
2555 + BasicBlock *Insert = Order.empty() ? ParentRegion->getExit() :
2556 + Order.back()->getEntry();
2557 + BasicBlock *Flow = BasicBlock::Create(Context, FlowBlockName,
2558 + Func, Insert);
2559 -+ DT->addNewBlock(Flow, Prev);
2560 ++ DT->addNewBlock(Flow, Dominator);
2561 + ParentRegion->getRegionInfo()->setRegionFor(Flow, ParentRegion);
2562 -+ FlowsInserted.push_back(Flow);
2563 + return Flow;
2564 +}
2565 +
2566 ++/// \brief Create a new flow node or reuse the previous one
2567 ++BasicBlock *AMDGPUStructurizeCFG::needPrefix(bool NeedEmpty) {
2568 ++
2569 ++ BasicBlock *Entry = PrevNode->getEntry();
2570 ++
2571 ++ if (!PrevNode->isSubRegion()) {
2572 ++ killTerminator(Entry);
2573 ++ if (!NeedEmpty || Entry->getFirstInsertionPt() == Entry->end())
2574 ++ return Entry;
2575 ++
2576 ++ }
2577 ++
2578 ++ // create a new flow node
2579 ++ BasicBlock *Flow = getNextFlow(Entry);
2580 ++
2581 ++ // and wire it up
2582 ++ changeExit(PrevNode, Flow, true);
2583 ++ PrevNode = ParentRegion->getBBNode(Flow);
2584 ++ return Flow;
2585 ++}
2586 ++
2587 ++/// \brief Returns the region exit if possible, otherwise just a new flow node
2588 ++BasicBlock *AMDGPUStructurizeCFG::needPostfix(BasicBlock *Flow,
2589 ++ bool ExitUseAllowed) {
2590 ++
2591 ++ if (Order.empty() && ExitUseAllowed) {
2592 ++ BasicBlock *Exit = ParentRegion->getExit();
2593 ++ DT->changeImmediateDominator(Exit, Flow);
2594 ++ addPhiValues(Flow, Exit);
2595 ++ return Exit;
2596 ++ }
2597 ++ return getNextFlow(Flow);
2598 ++}
2599 ++
2600 ++/// \brief Set the previous node
2601 ++void AMDGPUStructurizeCFG::setPrevNode(BasicBlock *BB) {
2602 ++ PrevNode = ParentRegion->contains(BB) ? ParentRegion->getBBNode(BB) : 0;
2603 ++}
2604 ++
2605 ++/// \brief Does BB dominate all the predicates of Node?
2606 ++bool AMDGPUStructurizeCFG::dominatesPredicates(BasicBlock *BB, RegionNode *Node) {
2607 ++ BBPredicates &Preds = Predicates[Node->getEntry()];
2608 ++ for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
2609 ++ PI != PE; ++PI) {
2610 ++
2611 ++ if (!DT->dominates(BB, PI->first))
2612 ++ return false;
2613 ++ }
2614 ++ return true;
2615 ++}
2616 ++
2617 +/// \brief Can we predict that this node will always be called?
2618 -+bool AMDGPUStructurizeCFG::isPredictableTrue(BasicBlock *Prev,
2619 -+ BasicBlock *Node) {
2620 -+ BBPredicates &Preds = Predicates[Node];
2621 ++bool AMDGPUStructurizeCFG::isPredictableTrue(RegionNode *Node) {
2622 ++
2623 ++ BBPredicates &Preds = Predicates[Node->getEntry()];
2624 + bool Dominated = false;
2625 +
2626 ++ // Region entry is always true
2627 ++ if (PrevNode == 0)
2628 ++ return true;
2629 ++
2630 + for (BBPredicates::iterator I = Preds.begin(), E = Preds.end();
2631 + I != E; ++I) {
2632 +
2633 + if (I->second != BoolTrue)
2634 + return false;
2635 +
2636 -+ if (!Dominated && DT->dominates(I->first, Prev))
2637 ++ if (!Dominated && DT->dominates(I->first, PrevNode->getEntry()))
2638 + Dominated = true;
2639 + }
2640 ++
2641 ++ // TODO: The dominator check is too strict
2642 + return Dominated;
2643 +}
2644 +
2645 -+/// \brief Wire up the new control flow by inserting or updating the branch
2646 -+/// instructions at node exits
2647 -+BasicBlock *AMDGPUStructurizeCFG::wireFlowBlock(BasicBlock *Prev,
2648 -+ RegionNode *Node) {
2649 -+ BasicBlock *Entry = Node->getEntry();
2650 -+
2651 -+ if (LoopStart == Entry) {
2652 -+ LoopStart = Prev;
2653 -+ LoopPred[Prev] = BoolTrue;
2654 -+ }
2655 ++/// Take one node from the order vector and wire it up
2656 ++void AMDGPUStructurizeCFG::wireFlow(bool ExitUseAllowed,
2657 ++ BasicBlock *LoopEnd) {
2658 +
2659 -+ // Wire it up temporary, skipChained may recurse into us
2660 -+ BranchInst::Create(Entry, Prev);
2661 -+ DT->changeImmediateDominator(Entry, Prev);
2662 -+ addPhiValues(Prev, Entry);
2663 ++ RegionNode *Node = Order.pop_back_val();
2664 ++ Visited.insert(Node->getEntry());
2665 +
2666 -+ Node = skipChained(Node);
2667 ++ if (isPredictableTrue(Node)) {
2668 ++ // Just a linear flow
2669 ++ if (PrevNode) {
2670 ++ changeExit(PrevNode, Node->getEntry(), true);
2671 ++ }
2672 ++ PrevNode = Node;
2673 +
2674 -+ BasicBlock *Next = getNextFlow(Prev);
2675 -+ if (!isPredictableTrue(Prev, Entry)) {
2676 -+ // Let Prev point to entry and next block
2677 -+ Prev->getTerminator()->eraseFromParent();
2678 -+ BranchInst::Create(Entry, Next, BoolUndef, Prev);
2679 + } else {
2680 -+ DT->changeImmediateDominator(Next, Entry);
2681 -+ }
2682 ++ // Insert extra prefix node (or reuse last one)
2683 ++ BasicBlock *Flow = needPrefix(false);
2684 +
2685 -+ // Let node exit(s) point to next block
2686 -+ if (Node->isSubRegion()) {
2687 -+ Region *SubRegion = Node->getNodeAs<Region>();
2688 -+ BasicBlock *Exit = SubRegion->getExit();
2689 ++ // Insert extra postfix node (or use exit instead)
2690 ++ BasicBlock *Entry = Node->getEntry();
2691 ++ BasicBlock *Next = needPostfix(Flow, ExitUseAllowed);
2692 +
2693 -+ // Find all the edges from the sub region to the exit
2694 -+ BBVector ToDo;
2695 -+ for (pred_iterator I = pred_begin(Exit), E = pred_end(Exit); I != E; ++I) {
2696 -+ if (SubRegion->contains(*I))
2697 -+ ToDo.push_back(*I);
2698 -+ }
2699 ++ // let it point to entry and next block
2700 ++ Conditions.push_back(BranchInst::Create(Entry, Next, BoolUndef, Flow));
2701 ++ addPhiValues(Flow, Entry);
2702 ++ DT->changeImmediateDominator(Entry, Flow);
2703 +
2704 -+ // Modify the edges to point to the new flow block
2705 -+ for (BBVector::iterator I = ToDo.begin(), E = ToDo.end(); I != E; ++I) {
2706 -+ delPhiValues(*I, Exit);
2707 -+ TerminatorInst *Term = (*I)->getTerminator();
2708 -+ Term->replaceUsesOfWith(Exit, Next);
2709 ++ PrevNode = Node;
2710 ++ while (!Order.empty() && !Visited.count(LoopEnd) &&
2711 ++ dominatesPredicates(Entry, Order.back())) {
2712 ++ handleLoops(false, LoopEnd);
2713 + }
2714 +
2715 -+ // Update the region info
2716 -+ SubRegion->replaceExit(Next);
2717 -+
2718 -+ } else {
2719 -+ BasicBlock *BB = Node->getNodeAs<BasicBlock>();
2720 -+ killTerminator(BB);
2721 -+ BranchInst::Create(Next, BB);
2722 -+
2723 -+ if (BB == LoopEnd)
2724 -+ LoopEnd = 0;
2725 ++ changeExit(PrevNode, Next, false);
2726 ++ setPrevNode(Next);
2727 + }
2728 -+
2729 -+ return Next;
2730 +}
2731 +
2732 -+/// Destroy node order and visited map, build up flow order instead.
2733 -+/// After this function control flow looks like it should be, but
2734 -+/// branches only have undefined conditions.
2735 -+void AMDGPUStructurizeCFG::createFlow() {
2736 -+ DeletedPhis.clear();
2737 -+
2738 -+ BasicBlock *Prev = Order.pop_back_val()->getEntry();
2739 -+ assert(Prev == ParentRegion->getEntry() && "Incorrect node order!");
2740 -+ Visited.erase(Prev);
2741 -+
2742 -+ if (LoopStart == Prev) {
2743 -+ // Loop starts at entry, split entry so that we can predicate it
2744 -+ BasicBlock::iterator Insert = Prev->getFirstInsertionPt();
2745 -+ BasicBlock *Split = Prev->splitBasicBlock(Insert, FlowBlockName);
2746 -+ DT->addNewBlock(Split, Prev);
2747 -+ ParentRegion->getRegionInfo()->setRegionFor(Split, ParentRegion);
2748 -+ Predicates[Split] = Predicates[Prev];
2749 -+ Order.push_back(ParentRegion->getBBNode(Split));
2750 -+ LoopPred[Prev] = BoolTrue;
2751 -+
2752 -+ } else if (LoopStart == Order.back()->getEntry()) {
2753 -+ // Loop starts behind entry, split entry so that we can jump to it
2754 -+ Instruction *Term = Prev->getTerminator();
2755 -+ BasicBlock *Split = Prev->splitBasicBlock(Term, FlowBlockName);
2756 -+ DT->addNewBlock(Split, Prev);
2757 -+ ParentRegion->getRegionInfo()->setRegionFor(Split, ParentRegion);
2758 -+ Prev = Split;
2759 -+ }
2760 -+
2761 -+ killTerminator(Prev);
2762 -+ FlowsInserted.clear();
2763 -+ FlowsInserted.push_back(Prev);
2764 ++void AMDGPUStructurizeCFG::handleLoops(bool ExitUseAllowed,
2765 ++ BasicBlock *LoopEnd) {
2766 ++ RegionNode *Node = Order.back();
2767 ++ BasicBlock *LoopStart = Node->getEntry();
2768 +
2769 -+ while (!Order.empty()) {
2770 -+ RegionNode *Node = Order.pop_back_val();
2771 -+ Visited.erase(Node->getEntry());
2772 -+ Prev = wireFlowBlock(Prev, Node);
2773 -+ if (LoopStart && !LoopEnd) {
2774 -+ // Create an extra loop end node
2775 -+ LoopEnd = Prev;
2776 -+ Prev = getNextFlow(LoopEnd);
2777 -+ BranchInst::Create(Prev, LoopStart, BoolUndef, LoopEnd);
2778 -+ addPhiValues(LoopEnd, LoopStart);
2779 -+ }
2780 ++ if (!Loops.count(LoopStart)) {
2781 ++ wireFlow(ExitUseAllowed, LoopEnd);
2782 ++ return;
2783 + }
2784 +
2785 -+ BasicBlock *Exit = ParentRegion->getExit();
2786 -+ BranchInst::Create(Exit, Prev);
2787 -+ addPhiValues(Prev, Exit);
2788 -+ if (DT->dominates(ParentRegion->getEntry(), Exit))
2789 -+ DT->changeImmediateDominator(Exit, Prev);
2790 -+
2791 -+ if (LoopStart && LoopEnd) {
2792 -+ BBVector::iterator FI = std::find(FlowsInserted.begin(),
2793 -+ FlowsInserted.end(),
2794 -+ LoopStart);
2795 -+ for (; *FI != LoopEnd; ++FI) {
2796 -+ addPhiValues(*FI, (*FI)->getTerminator()->getSuccessor(0));
2797 -+ }
2798 ++ if (!isPredictableTrue(Node))
2799 ++ LoopStart = needPrefix(true);
2800 ++
2801 ++ LoopEnd = Loops[Node->getEntry()];
2802 ++ wireFlow(false, LoopEnd);
2803 ++ while (!Visited.count(LoopEnd)) {
2804 ++ handleLoops(false, LoopEnd);
2805 + }
2806 +
2807 -+ assert(Order.empty());
2808 -+ assert(Visited.empty());
2809 -+ assert(DeletedPhis.empty());
2810 ++ // Create an extra loop end node
2811 ++ LoopEnd = needPrefix(false);
2812 ++ BasicBlock *Next = needPostfix(LoopEnd, ExitUseAllowed);
2813 ++ LoopConds.push_back(BranchInst::Create(Next, LoopStart,
2814 ++ BoolUndef, LoopEnd));
2815 ++ addPhiValues(LoopEnd, LoopStart);
2816 ++ setPrevNode(Next);
2817 +}
2818 +
2819 -+/// \brief Insert the missing branch conditions
2820 -+void AMDGPUStructurizeCFG::insertConditions() {
2821 -+ SSAUpdater PhiInserter;
2822 -+
2823 -+ for (BBVector::iterator FI = FlowsInserted.begin(), FE = FlowsInserted.end();
2824 -+ FI != FE; ++FI) {
2825 -+
2826 -+ BranchInst *Term = cast<BranchInst>((*FI)->getTerminator());
2827 -+ if (Term->isUnconditional())
2828 -+ continue;
2829 ++/// After this function control flow looks like it should be, but
2830 ++/// branches and PHI nodes only have undefined conditions.
2831 ++void AMDGPUStructurizeCFG::createFlow() {
2832 +
2833 -+ PhiInserter.Initialize(Boolean, "");
2834 -+ PhiInserter.AddAvailableValue(&Func->getEntryBlock(), BoolFalse);
2835 ++ BasicBlock *Exit = ParentRegion->getExit();
2836 ++ bool EntryDominatesExit = DT->dominates(ParentRegion->getEntry(), Exit);
2837 +
2838 -+ BasicBlock *Succ = Term->getSuccessor(0);
2839 -+ BBPredicates &Preds = (*FI == LoopEnd) ? LoopPred : Predicates[Succ];
2840 -+ for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
2841 -+ PI != PE; ++PI) {
2842 ++ DeletedPhis.clear();
2843 ++ AddedPhis.clear();
2844 ++ Conditions.clear();
2845 ++ LoopConds.clear();
2846 +
2847 -+ PhiInserter.AddAvailableValue(PI->first, PI->second);
2848 -+ }
2849 ++ PrevNode = 0;
2850 ++ Visited.clear();
2851 +
2852 -+ Term->setCondition(PhiInserter.GetValueAtEndOfBlock(*FI));
2853 ++ while (!Order.empty()) {
2854 ++ handleLoops(EntryDominatesExit, 0);
2855 + }
2856 ++
2857 ++ if (PrevNode)
2858 ++ changeExit(PrevNode, Exit, EntryDominatesExit);
2859 ++ else
2860 ++ assert(EntryDominatesExit);
2861 +}
2862 +
2863 +/// Handle a rare case where the disintegrated nodes' instructions
2864 @@ -3265,14 +3626,21 @@ index 0000000..22338b5
2865 + orderNodes();
2866 + collectInfos();
2867 + createFlow();
2868 -+ insertConditions();
2869 ++ insertConditions(false);
2870 ++ insertConditions(true);
2871 ++ setPhiValues();
2872 + rebuildSSA();
2873 +
2874 ++ // Cleanup
2875 + Order.clear();
2876 + Visited.clear();
2877 -+ Predicates.clear();
2878 + DeletedPhis.clear();
2879 -+ FlowsInserted.clear();
2880 ++ AddedPhis.clear();
2881 ++ Predicates.clear();
2882 ++ Conditions.clear();
2883 ++ Loops.clear();
2884 ++ LoopPreds.clear();
2885 ++ LoopConds.clear();
2886 +
2887 + return true;
2888 +}
2889 @@ -3447,10 +3815,10 @@ index 0000000..cab7884
2890 +#endif // AMDGPUSUBTARGET_H
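
The NearestCommonDominator helper introduced in the structurizer above amounts
to walking immediate-dominator chains until they meet. A standalone sketch of
that idea, with the dominator tree reduced to a plain idom array instead of
LLVM's DominatorTree:

#include <cassert>
#include <map>
#include <vector>

int nearestCommonDominator(const std::vector<int> &IDom, int A, int B) {
  std::map<int, int> OnChainOfA;            // nodes along A's idom chain
  for (int N = A; N != -1; N = IDom[N])
    OnChainOfA[N] = 1;
  for (int N = B; N != -1; N = IDom[N])     // first node also on A's chain wins
    if (OnChainOfA.count(N))
      return N;
  return -1;                                // disconnected; cannot happen in a CFG
}

int main() {
  // Dominator tree: 0 dominates 1 and 2, 1 dominates 3.
  std::vector<int> IDom = {-1, 0, 0, 1};    // immediate dominator per node, -1 = root
  assert(nearestCommonDominator(IDom, 3, 2) == 0);
  assert(nearestCommonDominator(IDom, 3, 1) == 1);
  return 0;
}
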
2891 diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp b/lib/Target/R600/AMDGPUTargetMachine.cpp
2892 new file mode 100644
2893 -index 0000000..d09dc2e
2894 +index 0000000..e2f00be
2895 --- /dev/null
2896 +++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
2897 -@@ -0,0 +1,142 @@
2898 +@@ -0,0 +1,153 @@
2899 +//===-- AMDGPUTargetMachine.cpp - TargetMachine for hw codegen targets-----===//
2900 +//
2901 +// The LLVM Compiler Infrastructure
2902 @@ -3555,6 +3923,12 @@ index 0000000..d09dc2e
2903 +bool AMDGPUPassConfig::addInstSelector() {
2904 + addPass(createAMDGPUPeepholeOpt(*TM));
2905 + addPass(createAMDGPUISelDag(getAMDGPUTargetMachine()));
2906 ++
2907 ++ const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>();
2908 ++ if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) {
2909 ++ // The callbacks this pass uses are not implemented yet on SI.
2910 ++ addPass(createAMDGPUIndirectAddressingPass(*TM));
2911 ++ }
2912 + return false;
2913 +}
2914 +
2915 @@ -3569,6 +3943,11 @@ index 0000000..d09dc2e
2916 +}
2917 +
2918 +bool AMDGPUPassConfig::addPostRegAlloc() {
2919 ++ const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>();
2920 ++
2921 ++ if (ST.device()->getGeneration() > AMDGPUDeviceInfo::HD6XXX) {
2922 ++ addPass(createSIInsertWaits(*TM));
2923 ++ }
2924 + return false;
2925 +}
2926 +
2927 @@ -3585,8 +3964,8 @@ index 0000000..d09dc2e
2928 + addPass(createAMDGPUCFGStructurizerPass(*TM));
2929 + addPass(createR600ExpandSpecialInstrsPass(*TM));
2930 + addPass(&FinalizeMachineBundlesID);
2931 ++ addPass(createR600LowerConstCopy(*TM));
2932 + } else {
2933 -+ addPass(createSILowerLiteralConstantsPass(*TM));
2934 + addPass(createSILowerControlFlowPass(*TM));
2935 + }
2936 +
2937 @@ -3595,7 +3974,7 @@ index 0000000..d09dc2e
2938 +
2939 diff --git a/lib/Target/R600/AMDGPUTargetMachine.h b/lib/Target/R600/AMDGPUTargetMachine.h
2940 new file mode 100644
2941 -index 0000000..399e55c
2942 +index 0000000..5a1dcf4
2943 --- /dev/null
2944 +++ b/lib/Target/R600/AMDGPUTargetMachine.h
2945 @@ -0,0 +1,70 @@
2946 @@ -3616,9 +3995,9 @@ index 0000000..399e55c
2947 +#ifndef AMDGPU_TARGET_MACHINE_H
2948 +#define AMDGPU_TARGET_MACHINE_H
2949 +
2950 ++#include "AMDGPUFrameLowering.h"
2951 +#include "AMDGPUInstrInfo.h"
2952 +#include "AMDGPUSubtarget.h"
2953 -+#include "AMDILFrameLowering.h"
2954 +#include "AMDILIntrinsicInfo.h"
2955 +#include "R600ISelLowering.h"
2956 +#include "llvm/ADT/OwningPtr.h"
2957 @@ -3671,10 +4050,10 @@ index 0000000..399e55c
2958 +#endif // AMDGPU_TARGET_MACHINE_H
2959 diff --git a/lib/Target/R600/AMDIL.h b/lib/Target/R600/AMDIL.h
2960 new file mode 100644
2961 -index 0000000..4e577dc
2962 +index 0000000..b39fbdb
2963 --- /dev/null
2964 +++ b/lib/Target/R600/AMDIL.h
2965 -@@ -0,0 +1,106 @@
2966 +@@ -0,0 +1,122 @@
2967 +//===-- AMDIL.h - Top-level interface for AMDIL representation --*- C++ -*-===//
2968 +//
2969 +// The LLVM Compiler Infrastructure
2970 @@ -3767,14 +4146,30 @@ index 0000000..4e577dc
2971 +enum AddressSpaces {
2972 + PRIVATE_ADDRESS = 0, ///< Address space for private memory.
2973 + GLOBAL_ADDRESS = 1, ///< Address space for global memory (RAT0, VTX0).
2974 -+ CONSTANT_ADDRESS = 2, ///< Address space for constant memory.
2975 ++ CONSTANT_ADDRESS = 2, ///< Address space for constant memory
2976 + LOCAL_ADDRESS = 3, ///< Address space for local memory.
2977 + REGION_ADDRESS = 4, ///< Address space for region memory.
2978 + ADDRESS_NONE = 5, ///< Address space for unknown memory.
2979 + PARAM_D_ADDRESS = 6, ///< Address space for directly addressable parameter memory (CONST0)
2980 + PARAM_I_ADDRESS = 7, ///< Address space for indirectly addressable parameter memory (VTX1)
2981 + USER_SGPR_ADDRESS = 8, ///< Address space for USER_SGPRS on SI
2982 -+ LAST_ADDRESS = 9
2983 ++ CONSTANT_BUFFER_0 = 9,
2984 ++ CONSTANT_BUFFER_1 = 10,
2985 ++ CONSTANT_BUFFER_2 = 11,
2986 ++ CONSTANT_BUFFER_3 = 12,
2987 ++ CONSTANT_BUFFER_4 = 13,
2988 ++ CONSTANT_BUFFER_5 = 14,
2989 ++ CONSTANT_BUFFER_6 = 15,
2990 ++ CONSTANT_BUFFER_7 = 16,
2991 ++ CONSTANT_BUFFER_8 = 17,
2992 ++ CONSTANT_BUFFER_9 = 18,
2993 ++ CONSTANT_BUFFER_10 = 19,
2994 ++ CONSTANT_BUFFER_11 = 20,
2995 ++ CONSTANT_BUFFER_12 = 21,
2996 ++ CONSTANT_BUFFER_13 = 22,
2997 ++ CONSTANT_BUFFER_14 = 23,
2998 ++ CONSTANT_BUFFER_15 = 24,
2999 ++ LAST_ADDRESS = 25
3000 +};
3001 +
3002 +} // namespace AMDGPUAS
3003 @@ -4073,10 +4468,10 @@ index 0000000..c12cedc
3004 +
3005 diff --git a/lib/Target/R600/AMDILCFGStructurizer.cpp b/lib/Target/R600/AMDILCFGStructurizer.cpp
3006 new file mode 100644
3007 -index 0000000..9de97b6
3008 +index 0000000..568d281
3009 --- /dev/null
3010 +++ b/lib/Target/R600/AMDILCFGStructurizer.cpp
3011 -@@ -0,0 +1,3049 @@
3012 +@@ -0,0 +1,3045 @@
3013 +//===-- AMDILCFGStructurizer.cpp - CFG Structurizer -----------------------===//
3014 +//
3015 +// The LLVM Compiler Infrastructure
3016 @@ -6101,9 +6496,7 @@ index 0000000..9de97b6
3017 + CFGTraits::insertAssignInstrBefore(insertPos, passRep, immReg, 1);
3018 + InstrT *newInstr =
3019 + CFGTraits::insertInstrBefore(insertPos, AMDGPU::BRANCH_COND_i32, passRep);
3020 -+ MachineInstrBuilder MIB(*funcRep, newInstr);
3021 -+ MIB.addMBB(loopHeader);
3022 -+ MIB.addReg(immReg, false);
3023 ++ MachineInstrBuilder(newInstr).addMBB(loopHeader).addReg(immReg, false);
3024 +
3025 + SHOWNEWINSTR(newInstr);
3026 +
3027 @@ -6925,12 +7318,13 @@ index 0000000..9de97b6
3028 + MachineInstr *oldInstr = &(*instrPos);
3029 + const TargetInstrInfo *tii = passRep->getTargetInstrInfo();
3030 + MachineBasicBlock *blk = oldInstr->getParent();
3031 -+ MachineFunction *MF = blk->getParent();
3032 -+ MachineInstr *newInstr = MF->CreateMachineInstr(tii->get(newOpcode), DL);
3033 ++ MachineInstr *newInstr =
3034 ++ blk->getParent()->CreateMachineInstr(tii->get(newOpcode),
3035 ++ DL);
3036 +
3037 + blk->insert(instrPos, newInstr);
3038 -+ MachineInstrBuilder MIB(*MF, newInstr);
3039 -+ MIB.addReg(oldInstr->getOperand(1).getReg(), false);
3040 ++ MachineInstrBuilder(newInstr).addReg(oldInstr->getOperand(1).getReg(),
3041 ++ false);
3042 +
3043 + SHOWNEWINSTR(newInstr);
3044 + //erase later oldInstr->eraseFromParent();
3045 @@ -6943,13 +7337,13 @@ index 0000000..9de97b6
3046 + RegiT regNum,
3047 + DebugLoc DL) {
3048 + const TargetInstrInfo *tii = passRep->getTargetInstrInfo();
3049 -+ MachineFunction *MF = blk->getParent();
3050 +
3051 -+ MachineInstr *newInstr = MF->CreateMachineInstr(tii->get(newOpcode), DL);
3052 ++ MachineInstr *newInstr =
3053 ++ blk->getParent()->CreateMachineInstr(tii->get(newOpcode), DL);
3054 +
3055 + //insert before
3056 + blk->insert(insertPos, newInstr);
3057 -+ MachineInstrBuilder(*MF, newInstr).addReg(regNum, false);
3058 ++ MachineInstrBuilder(newInstr).addReg(regNum, false);
3059 +
3060 + SHOWNEWINSTR(newInstr);
3061 + } //insertCondBranchBefore
3062 @@ -6959,12 +7353,11 @@ index 0000000..9de97b6
3063 + AMDGPUCFGStructurizer *passRep,
3064 + RegiT regNum) {
3065 + const TargetInstrInfo *tii = passRep->getTargetInstrInfo();
3066 -+ MachineFunction *MF = blk->getParent();
3067 + MachineInstr *newInstr =
3068 -+ MF->CreateMachineInstr(tii->get(newOpcode), DebugLoc());
3069 ++ blk->getParent()->CreateMachineInstr(tii->get(newOpcode), DebugLoc());
3070 +
3071 + blk->push_back(newInstr);
3072 -+ MachineInstrBuilder(*MF, newInstr).addReg(regNum, false);
3073 ++ MachineInstrBuilder(newInstr).addReg(regNum, false);
3074 +
3075 + SHOWNEWINSTR(newInstr);
3076 + } //insertCondBranchEnd
3077 @@ -7009,14 +7402,12 @@ index 0000000..9de97b6
3078 + RegiT src2Reg) {
3079 + const AMDGPUInstrInfo *tii =
3080 + static_cast<const AMDGPUInstrInfo *>(passRep->getTargetInstrInfo());
3081 -+ MachineFunction *MF = blk->getParent();
3082 + MachineInstr *newInstr =
3083 -+ MF->CreateMachineInstr(tii->get(tii->getIEQOpcode()), DebugLoc());
3084 ++ blk->getParent()->CreateMachineInstr(tii->get(tii->getIEQOpcode()), DebugLoc());
3085 +
3086 -+ MachineInstrBuilder MIB(*MF, newInstr);
3087 -+ MIB.addReg(dstReg, RegState::Define); //set target
3088 -+ MIB.addReg(src1Reg); //set src value
3089 -+ MIB.addReg(src2Reg); //set src value
3090 ++ MachineInstrBuilder(newInstr).addReg(dstReg, RegState::Define); //set target
3091 ++ MachineInstrBuilder(newInstr).addReg(src1Reg); //set src value
3092 ++ MachineInstrBuilder(newInstr).addReg(src2Reg); //set src value
3093 +
3094 + blk->insert(instrPos, newInstr);
3095 + SHOWNEWINSTR(newInstr);
3096 @@ -7872,13 +8263,13 @@ index 0000000..6dc2deb
3097 +
3098 +} // namespace llvm
3099 +#endif // AMDILEVERGREENDEVICE_H
3100 -diff --git a/lib/Target/R600/AMDILFrameLowering.cpp b/lib/Target/R600/AMDILFrameLowering.cpp
3101 +diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp b/lib/Target/R600/AMDILISelDAGToDAG.cpp
3102 new file mode 100644
3103 -index 0000000..9ad495a
3104 +index 0000000..2e726e9
3105 --- /dev/null
3106 -+++ b/lib/Target/R600/AMDILFrameLowering.cpp
3107 -@@ -0,0 +1,47 @@
3108 -+//===----------------------- AMDILFrameLowering.cpp -----------------*- C++ -*-===//
3109 ++++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp
3110 +@@ -0,0 +1,577 @@
3111 ++//===-- AMDILISelDAGToDAG.cpp - A dag to dag inst selector for AMDIL ------===//
3112 +//
3113 +// The LLVM Compiler Infrastructure
3114 +//
3115 @@ -7888,119 +8279,21 @@ index 0000000..9ad495a
3116 +//==-----------------------------------------------------------------------===//
3117 +//
3118 +/// \file
3119 -+/// \brief Interface to describe a layout of a stack frame on a AMDGPU target
3120 -+/// machine.
3121 ++/// \brief Defines an instruction selector for the AMDGPU target.
3122 +//
3123 +//===----------------------------------------------------------------------===//
3124 -+#include "AMDILFrameLowering.h"
3125 -+#include "llvm/CodeGen/MachineFrameInfo.h"
3126 -+
3127 -+using namespace llvm;
3128 -+AMDGPUFrameLowering::AMDGPUFrameLowering(StackDirection D, unsigned StackAl,
3129 -+ int LAO, unsigned TransAl)
3130 -+ : TargetFrameLowering(D, StackAl, LAO, TransAl) {
3131 -+}
3132 -+
3133 -+AMDGPUFrameLowering::~AMDGPUFrameLowering() {
3134 -+}
3135 -+
3136 -+int AMDGPUFrameLowering::getFrameIndexOffset(const MachineFunction &MF,
3137 -+ int FI) const {
3138 -+ const MachineFrameInfo *MFI = MF.getFrameInfo();
3139 -+ return MFI->getObjectOffset(FI);
3140 -+}
3141 -+
3142 -+const TargetFrameLowering::SpillSlot *
3143 -+AMDGPUFrameLowering::getCalleeSavedSpillSlots(unsigned &NumEntries) const {
3144 -+ NumEntries = 0;
3145 -+ return 0;
3146 -+}
3147 -+void
3148 -+AMDGPUFrameLowering::emitPrologue(MachineFunction &MF) const {
3149 -+}
3150 -+void
3151 -+AMDGPUFrameLowering::emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const {
3152 -+}
3153 -+bool
3154 -+AMDGPUFrameLowering::hasFP(const MachineFunction &MF) const {
3155 -+ return false;
3156 -+}
3157 -diff --git a/lib/Target/R600/AMDILFrameLowering.h b/lib/Target/R600/AMDILFrameLowering.h
3158 -new file mode 100644
3159 -index 0000000..51337c3
3160 ---- /dev/null
3161 -+++ b/lib/Target/R600/AMDILFrameLowering.h
3162 -@@ -0,0 +1,40 @@
3163 -+//===--------------------- AMDILFrameLowering.h -----------------*- C++ -*-===//
3164 -+//
3165 -+// The LLVM Compiler Infrastructure
3166 -+//
3167 -+// This file is distributed under the University of Illinois Open Source
3168 -+// License. See LICENSE.TXT for details.
3169 -+//
3170 -+//===----------------------------------------------------------------------===//
3171 -+//
3172 -+/// \file
3173 -+/// \brief Interface to describe a layout of a stack frame on a AMDIL target
3174 -+/// machine.
3175 -+//
3176 -+//===----------------------------------------------------------------------===//
3177 -+#ifndef AMDILFRAME_LOWERING_H
3178 -+#define AMDILFRAME_LOWERING_H
3179 -+
3180 -+#include "llvm/CodeGen/MachineFunction.h"
3181 -+#include "llvm/Target/TargetFrameLowering.h"
3182 -+
3183 -+namespace llvm {
3184 -+
3185 -+/// \brief Information about the stack frame layout on the AMDGPU targets.
3186 -+///
3187 -+/// It holds the direction of the stack growth, the known stack alignment on
3188 -+/// entry to each function, and the offset to the locals area.
3189 -+/// See TargetFrameInfo for more comments.
3190 -+class AMDGPUFrameLowering : public TargetFrameLowering {
3191 -+public:
3192 -+ AMDGPUFrameLowering(StackDirection D, unsigned StackAl, int LAO,
3193 -+ unsigned TransAl = 1);
3194 -+ virtual ~AMDGPUFrameLowering();
3195 -+ virtual int getFrameIndexOffset(const MachineFunction &MF, int FI) const;
3196 -+ virtual const SpillSlot *getCalleeSavedSpillSlots(unsigned &NumEntries) const;
3197 -+ virtual void emitPrologue(MachineFunction &MF) const;
3198 -+ virtual void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const;
3199 -+ virtual bool hasFP(const MachineFunction &MF) const;
3200 -+};
3201 -+} // namespace llvm
3202 -+#endif // AMDILFRAME_LOWERING_H
3203 -diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp b/lib/Target/R600/AMDILISelDAGToDAG.cpp
3204 -new file mode 100644
3205 -index 0000000..d15ed39
3206 ---- /dev/null
3207 -+++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp
3208 -@@ -0,0 +1,485 @@
3209 -+//===-- AMDILISelDAGToDAG.cpp - A dag to dag inst selector for AMDIL ------===//
3210 -+//
3211 -+// The LLVM Compiler Infrastructure
3212 -+//
3213 -+// This file is distributed under the University of Illinois Open Source
3214 -+// License. See LICENSE.TXT for details.
3215 -+//
3216 -+//==-----------------------------------------------------------------------===//
3217 -+//
3218 -+/// \file
3219 -+/// \brief Defines an instruction selector for the AMDGPU target.
3220 -+//
3221 -+//===----------------------------------------------------------------------===//
3222 -+#include "AMDGPUInstrInfo.h"
3223 -+#include "AMDGPUISelLowering.h" // For AMDGPUISD
3224 -+#include "AMDGPURegisterInfo.h"
3225 -+#include "AMDILDevices.h"
3226 -+#include "R600InstrInfo.h"
3227 -+#include "llvm/ADT/ValueMap.h"
3228 -+#include "llvm/CodeGen/PseudoSourceValue.h"
3229 -+#include "llvm/CodeGen/SelectionDAGISel.h"
3230 -+#include "llvm/Support/Compiler.h"
3231 -+#include <list>
3232 -+#include <queue>
3233 ++#include "AMDGPUInstrInfo.h"
3234 ++#include "AMDGPUISelLowering.h" // For AMDGPUISD
3235 ++#include "AMDGPURegisterInfo.h"
3236 ++#include "AMDILDevices.h"
3237 ++#include "R600InstrInfo.h"
3238 ++#include "llvm/ADT/ValueMap.h"
3239 ++#include "llvm/CodeGen/PseudoSourceValue.h"
3240 ++#include "llvm/CodeGen/SelectionDAGISel.h"
3241 ++#include "llvm/Support/Compiler.h"
3242 ++#include "llvm/CodeGen/SelectionDAG.h"
3243 ++#include <list>
3244 ++#include <queue>
3245 +
3246 +using namespace llvm;
3247 +
3248 @@ -8024,6 +8317,7 @@ index 0000000..d15ed39
3249 +
3250 +private:
3251 + inline SDValue getSmallIPtrImm(unsigned Imm);
3252 ++ bool FoldOperands(unsigned, const R600InstrInfo *, std::vector<SDValue> &);
3253 +
3254 + // Complex pattern selectors
3255 + bool SelectADDRParam(SDValue Addr, SDValue& R1, SDValue& R2);
3256 @@ -8046,9 +8340,11 @@ index 0000000..d15ed39
3257 + static bool isLocalLoad(const LoadSDNode *N);
3258 + static bool isRegionLoad(const LoadSDNode *N);
3259 +
3260 -+ bool SelectADDR8BitOffset(SDValue Addr, SDValue& Base, SDValue& Offset);
3261 -+ bool SelectADDRReg(SDValue Addr, SDValue& Base, SDValue& Offset);
3262 ++ bool SelectGlobalValueConstantOffset(SDValue Addr, SDValue& IntPtr);
3263 ++ bool SelectGlobalValueVariableOffset(SDValue Addr,
3264 ++ SDValue &BaseReg, SDValue& Offset);
3265 + bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base, SDValue &Offset);
3266 ++ bool SelectADDRIndirect(SDValue Addr, SDValue &Base, SDValue &Offset);
3267 +
3268 + // Include the pieces autogenerated from the target description.
3269 +#include "AMDGPUGenDAGISel.inc"
3270 @@ -8135,16 +8431,6 @@ index 0000000..d15ed39
3271 + }
3272 + switch (Opc) {
3273 + default: break;
3274 -+ case ISD::FrameIndex: {
3275 -+ if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(N)) {
3276 -+ unsigned int FI = FIN->getIndex();
3277 -+ EVT OpVT = N->getValueType(0);
3278 -+ unsigned int NewOpc = AMDGPU::COPY;
3279 -+ SDValue TFI = CurDAG->getTargetFrameIndex(FI, MVT::i32);
3280 -+ return CurDAG->SelectNodeTo(N, NewOpc, OpVT, TFI);
3281 -+ }
3282 -+ break;
3283 -+ }
3284 + case ISD::ConstantFP:
3285 + case ISD::Constant: {
3286 + const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>();
3287 @@ -8203,7 +8489,9 @@ index 0000000..d15ed39
3288 + continue;
3289 + }
3290 + } else {
3291 -+ if (!TII->isALUInstr(Use->getMachineOpcode())) {
3292 ++ if (!TII->isALUInstr(Use->getMachineOpcode()) ||
3293 ++ (TII->get(Use->getMachineOpcode()).TSFlags &
3294 ++ R600_InstFlag::VECTOR)) {
3295 + continue;
3296 + }
3297 +
3298 @@ -8238,7 +8526,116 @@ index 0000000..d15ed39
3299 + break;
3300 + }
3301 + }
3302 -+ return SelectCode(N);
3303 ++ SDNode *Result = SelectCode(N);
3304 ++
3305 ++ // Fold operands of selected node
3306 ++
3307 ++ const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>();
3308 ++ if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) {
3309 ++ const R600InstrInfo *TII =
3310 ++ static_cast<const R600InstrInfo*>(TM.getInstrInfo());
3311 ++ if (Result && Result->isMachineOpcode() &&
3312 ++ !(TII->get(Result->getMachineOpcode()).TSFlags & R600_InstFlag::VECTOR)
3313 ++ && TII->isALUInstr(Result->getMachineOpcode())) {
3314 ++ // Fold FNEG/FABS/CONST_ADDRESS
3315 ++ // TODO: ISel can generate multiple MachineInstrs; we need to recursively
3316 ++ // parse Result.
3317 ++ bool IsModified = false;
3318 ++ do {
3319 ++ std::vector<SDValue> Ops;
3320 ++ for(SDNode::op_iterator I = Result->op_begin(), E = Result->op_end();
3321 ++ I != E; ++I)
3322 ++ Ops.push_back(*I);
3323 ++ IsModified = FoldOperands(Result->getMachineOpcode(), TII, Ops);
3324 ++ if (IsModified) {
3325 ++ Result = CurDAG->UpdateNodeOperands(Result, Ops.data(), Ops.size());
3326 ++ }
3327 ++ } while (IsModified);
3328 ++
3329 ++ // If the node has a single use which is CLAMP_R600, fold it
3330 ++ if (Result->hasOneUse() && Result->isMachineOpcode()) {
3331 ++ SDNode *PotentialClamp = *Result->use_begin();
3332 ++ if (PotentialClamp->isMachineOpcode() &&
3333 ++ PotentialClamp->getMachineOpcode() == AMDGPU::CLAMP_R600) {
3334 ++ unsigned ClampIdx =
3335 ++ TII->getOperandIdx(Result->getMachineOpcode(), R600Operands::CLAMP);
3336 ++ std::vector<SDValue> Ops;
3337 ++ unsigned NumOp = Result->getNumOperands();
3338 ++ for (unsigned i = 0; i < NumOp; ++i) {
3339 ++ Ops.push_back(Result->getOperand(i));
3340 ++ }
3341 ++ Ops[ClampIdx - 1] = CurDAG->getTargetConstant(1, MVT::i32);
3342 ++ Result = CurDAG->SelectNodeTo(PotentialClamp,
3343 ++ Result->getMachineOpcode(), PotentialClamp->getVTList(),
3344 ++ Ops.data(), NumOp);
3345 ++ }
3346 ++ }
3347 ++ }
3348 ++ }
3349 ++
3350 ++ return Result;
3351 ++}
3352 ++
3353 ++bool AMDGPUDAGToDAGISel::FoldOperands(unsigned Opcode,
3354 ++ const R600InstrInfo *TII, std::vector<SDValue> &Ops) {
3355 ++ int OperandIdx[] = {
3356 ++ TII->getOperandIdx(Opcode, R600Operands::SRC0),
3357 ++ TII->getOperandIdx(Opcode, R600Operands::SRC1),
3358 ++ TII->getOperandIdx(Opcode, R600Operands::SRC2)
3359 ++ };
3360 ++ int SelIdx[] = {
3361 ++ TII->getOperandIdx(Opcode, R600Operands::SRC0_SEL),
3362 ++ TII->getOperandIdx(Opcode, R600Operands::SRC1_SEL),
3363 ++ TII->getOperandIdx(Opcode, R600Operands::SRC2_SEL)
3364 ++ };
3365 ++ int NegIdx[] = {
3366 ++ TII->getOperandIdx(Opcode, R600Operands::SRC0_NEG),
3367 ++ TII->getOperandIdx(Opcode, R600Operands::SRC1_NEG),
3368 ++ TII->getOperandIdx(Opcode, R600Operands::SRC2_NEG)
3369 ++ };
3370 ++ int AbsIdx[] = {
3371 ++ TII->getOperandIdx(Opcode, R600Operands::SRC0_ABS),
3372 ++ TII->getOperandIdx(Opcode, R600Operands::SRC1_ABS),
3373 ++ -1
3374 ++ };
3375 ++
3376 ++ for (unsigned i = 0; i < 3; i++) {
3377 ++ if (OperandIdx[i] < 0)
3378 ++ return false;
3379 ++ SDValue Operand = Ops[OperandIdx[i] - 1];
3380 ++ switch (Operand.getOpcode()) {
3381 ++ case AMDGPUISD::CONST_ADDRESS: {
3382 ++ if (i == 2)
3383 ++ break;
3384 ++ SDValue CstOffset;
3385 ++ if (!Operand.getValueType().isVector() &&
3386 ++ SelectGlobalValueConstantOffset(Operand.getOperand(0), CstOffset)) {
3387 ++ Ops[OperandIdx[i] - 1] = CurDAG->getRegister(AMDGPU::ALU_CONST, MVT::f32);
3388 ++ Ops[SelIdx[i] - 1] = CstOffset;
3389 ++ return true;
3390 ++ }
3391 ++ }
3392 ++ break;
3393 ++ case ISD::FNEG:
3394 ++ if (NegIdx[i] < 0)
3395 ++ break;
3396 ++ Ops[OperandIdx[i] - 1] = Operand.getOperand(0);
3397 ++ Ops[NegIdx[i] - 1] = CurDAG->getTargetConstant(1, MVT::i32);
3398 ++ return true;
3399 ++ case ISD::FABS:
3400 ++ if (AbsIdx[i] < 0)
3401 ++ break;
3402 ++ Ops[OperandIdx[i] - 1] = Operand.getOperand(0);
3403 ++ Ops[AbsIdx[i] - 1] = CurDAG->getTargetConstant(1, MVT::i32);
3404 ++ return true;
3405 ++ case ISD::BITCAST:
3406 ++ Ops[OperandIdx[i] - 1] = Operand.getOperand(0);
3407 ++ return true;
3408 ++ default:
3409 ++ break;
3410 ++ }
3411 ++ }
3412 ++ return false;
3413 +}
3414 +
3415 +bool AMDGPUDAGToDAGISel::checkType(const Value *ptr, unsigned int addrspace) {
3416 @@ -8385,41 +8782,23 @@ index 0000000..d15ed39
3417 +
3418 +///==== AMDGPU Functions ====///
3419 +
3420 -+bool AMDGPUDAGToDAGISel::SelectADDR8BitOffset(SDValue Addr, SDValue& Base,
3421 -+ SDValue& Offset) {
3422 -+ if (Addr.getOpcode() == ISD::TargetExternalSymbol ||
3423 -+ Addr.getOpcode() == ISD::TargetGlobalAddress) {
3424 -+ return false;
3425 ++bool AMDGPUDAGToDAGISel::SelectGlobalValueConstantOffset(SDValue Addr,
3426 ++ SDValue& IntPtr) {
3427 ++ if (ConstantSDNode *Cst = dyn_cast<ConstantSDNode>(Addr)) {
3428 ++ IntPtr = CurDAG->getIntPtrConstant(Cst->getZExtValue() / 4, true);
3429 ++ return true;
3430 + }
3431 ++ return false;
3432 ++}
3433 +
3434 -+
3435 -+ if (Addr.getOpcode() == ISD::ADD) {
3436 -+ bool Match = false;
3437 -+
3438 -+ // Find the base ptr and the offset
3439 -+ for (unsigned i = 0; i < Addr.getNumOperands(); i++) {
3440 -+ SDValue Arg = Addr.getOperand(i);
3441 -+ ConstantSDNode * OffsetNode = dyn_cast<ConstantSDNode>(Arg);
3442 -+ // This arg isn't a constant so it must be the base PTR.
3443 -+ if (!OffsetNode) {
3444 -+ Base = Addr.getOperand(i);
3445 -+ continue;
3446 -+ }
3447 -+ // Check if the constant argument fits in 8-bits. The offset is in bytes
3448 -+ // so we need to convert it to dwords.
3449 -+ if (isUInt<8>(OffsetNode->getZExtValue() >> 2)) {
3450 -+ Match = true;
3451 -+ Offset = CurDAG->getTargetConstant(OffsetNode->getZExtValue() >> 2,
3452 -+ MVT::i32);
3453 -+ }
3454 -+ }
3455 -+ return Match;
3456 ++bool AMDGPUDAGToDAGISel::SelectGlobalValueVariableOffset(SDValue Addr,
3457 ++ SDValue& BaseReg, SDValue &Offset) {
3458 ++ if (!dyn_cast<ConstantSDNode>(Addr)) {
3459 ++ BaseReg = Addr;
3460 ++ Offset = CurDAG->getIntPtrConstant(0, true);
3461 ++ return true;
3462 + }
3463 -+
3464 -+ // Default case, no offset
3465 -+ Base = Addr;
3466 -+ Offset = CurDAG->getTargetConstant(0, MVT::i32);
3467 -+ return true;
3468 ++ return false;
3469 +}
3470 +
3471 +bool AMDGPUDAGToDAGISel::SelectADDRVTX_READ(SDValue Addr, SDValue &Base,
3472 @@ -8449,16 +8828,21 @@ index 0000000..d15ed39
3473 + return true;
3474 +}
3475 +
3476 -+bool AMDGPUDAGToDAGISel::SelectADDRReg(SDValue Addr, SDValue& Base,
3477 -+ SDValue& Offset) {
3478 -+ if (Addr.getOpcode() == ISD::TargetExternalSymbol ||
3479 -+ Addr.getOpcode() == ISD::TargetGlobalAddress ||
3480 -+ Addr.getOpcode() != ISD::ADD) {
3481 -+ return false;
3482 -+ }
3483 ++bool AMDGPUDAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base,
3484 ++ SDValue &Offset) {
3485 ++ ConstantSDNode *C;
3486 +
3487 -+ Base = Addr.getOperand(0);
3488 -+ Offset = Addr.getOperand(1);
3489 ++ if ((C = dyn_cast<ConstantSDNode>(Addr))) {
3490 ++ Base = CurDAG->getRegister(AMDGPU::INDIRECT_BASE_ADDR, MVT::i32);
3491 ++ Offset = CurDAG->getTargetConstant(C->getZExtValue(), MVT::i32);
3492 ++ } else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) &&
3493 ++ (C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) {
3494 ++ Base = Addr.getOperand(0);
3495 ++ Offset = CurDAG->getTargetConstant(C->getZExtValue(), MVT::i32);
3496 ++ } else {
3497 ++ Base = Addr;
3498 ++ Offset = CurDAG->getTargetConstant(0, MVT::i32);
3499 ++ }
3500 +
3501 + return true;
3502 +}
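
FoldOperands() above repeatedly strips FNEG/FABS/CONST_ADDRESS wrappers off ALU sources and records them in the matching NEG/ABS/SEL operand slots, returning true so Select() can iterate to a fixpoint. A minimal standalone sketch of that shape (all types and names hypothetical, not part of the patch):

#include <cstdio>
#include <vector>

// Simplified stand-ins for the selected-node operands.
enum Opc { OPC_VALUE, OPC_FNEG, OPC_FABS };

struct Val {
  Opc Opcode;
  int Payload; // the value wrapped by FNEG/FABS
};

// One folding step: strip a FNEG/FABS wrapper off a source operand and
// record it in the matching NEG/ABS immediate slot instead. Returns true
// if anything changed, so the caller can iterate to a fixpoint.
bool foldOnce(std::vector<Val> &Src, std::vector<int> &Neg,
              std::vector<int> &Abs) {
  for (size_t i = 0; i < Src.size(); ++i) {
    if (Src[i].Opcode == OPC_FNEG) {
      Src[i] = Val{OPC_VALUE, Src[i].Payload};
      Neg[i] = 1;
      return true;
    }
    if (Src[i].Opcode == OPC_FABS && i < 2) { // SRC2 has no ABS slot
      Src[i] = Val{OPC_VALUE, Src[i].Payload};
      Abs[i] = 1;
      return true;
    }
  }
  return false;
}

int main() {
  std::vector<Val> Src = {{OPC_FNEG, 7}, {OPC_VALUE, 3}, {OPC_FABS, 9}};
  std::vector<int> Neg(3, 0), Abs(3, 0);
  while (foldOnce(Src, Neg, Abs)) {} // same loop-to-fixpoint shape as Select()
  std::printf("neg: %d %d %d abs: %d %d %d\n",
              Neg[0], Neg[1], Neg[2], Abs[0], Abs[1], Abs[2]);
  return 0;
}
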
3503 @@ -9857,10 +10241,10 @@ index 0000000..bc7df37
3504 +#endif // AMDILNIDEVICE_H
3505 diff --git a/lib/Target/R600/AMDILPeepholeOptimizer.cpp b/lib/Target/R600/AMDILPeepholeOptimizer.cpp
3506 new file mode 100644
3507 -index 0000000..4a748b8
3508 +index 0000000..57317ac
3509 --- /dev/null
3510 +++ b/lib/Target/R600/AMDILPeepholeOptimizer.cpp
3511 -@@ -0,0 +1,1215 @@
3512 +@@ -0,0 +1,1256 @@
3513 +//===-- AMDILPeepholeOptimizer.cpp - AMDGPU Peephole optimizations ---------===//
3514 +//
3515 +// The LLVM Compiler Infrastructure
3516 @@ -10409,14 +10793,51 @@ index 0000000..4a748b8
3517 + lhsMaskOffset = lhsMaskVal ? CountTrailingZeros_32(lhsMaskVal) : lhsShiftVal;
3518 + rhsMaskOffset = rhsMaskVal ? CountTrailingZeros_32(rhsMaskVal) : rhsShiftVal;
3519 + // TODO: Handle the case of A & B | D & ~B(i.e. inverted masks).
3520 ++ if (mDebug) {
3521 ++ dbgs() << "Found pattern: \'((A" << (LHSMask ? " & B)" : ")");
3522 ++ dbgs() << (LHSShift ? " << C)" : ")") << " | ((D" ;
3523 ++ dbgs() << (RHSMask ? " & E)" : ")");
3524 ++ dbgs() << (RHSShift ? " << F)\'\n" : ")\'\n");
3525 ++ dbgs() << "A = LHSSrc\t\tD = RHSSrc \n";
3526 ++ dbgs() << "B = " << lhsMaskVal << "\t\tE = " << rhsMaskVal << "\n";
3527 ++ dbgs() << "C = " << lhsShiftVal << "\t\tF = " << rhsShiftVal << "\n";
3528 ++ dbgs() << "width(B) = " << lhsMaskWidth;
3529 ++ dbgs() << "\twidth(E) = " << rhsMaskWidth << "\n";
3530 ++ dbgs() << "offset(B) = " << lhsMaskOffset;
3531 ++ dbgs() << "\toffset(E) = " << rhsMaskOffset << "\n";
3532 ++ dbgs() << "Constraints: \n";
3533 ++ dbgs() << "\t(1) B ^ E == 0\n";
3534 ++ dbgs() << "\t(2-LHS) B is a mask\n";
3535 ++ dbgs() << "\t(2-RHS) E is a mask\n";
3536 ++ dbgs() << "\t(3-LHS) (offset(B)) >= (width(E) + offset(E))\n";
3537 ++ dbgs() << "\t(3-RHS) (offset(E)) >= (width(B) + offset(B))\n";
3538 ++ }
3539 + if ((lhsMaskVal || rhsMaskVal) && !(lhsMaskVal ^ rhsMaskVal)) {
3540 ++ if (mDebug) {
3541 ++ dbgs() << lhsMaskVal << " ^ " << rhsMaskVal;
3542 ++ dbgs() << " = " << (lhsMaskVal ^ rhsMaskVal) << "\n";
3543 ++ dbgs() << "Failed constraint 1!\n";
3544 ++ }
3545 + return false;
3546 + }
3547 ++ if (mDebug) {
3548 ++ dbgs() << "LHS = " << lhsMaskOffset << "";
3549 ++ dbgs() << " >= (" << rhsMaskWidth << " + " << rhsMaskOffset << ") = ";
3550 ++ dbgs() << (lhsMaskOffset >= (rhsMaskWidth + rhsMaskOffset));
3551 ++ dbgs() << "\nRHS = " << rhsMaskOffset << "";
3552 ++ dbgs() << " >= (" << lhsMaskWidth << " + " << lhsMaskOffset << ") = ";
3553 ++ dbgs() << (rhsMaskOffset >= (lhsMaskWidth + lhsMaskOffset));
3554 ++ dbgs() << "\n";
3555 ++ }
3556 + if (lhsMaskOffset >= (rhsMaskWidth + rhsMaskOffset)) {
3557 + offset = ConstantInt::get(aType, lhsMaskOffset, false);
3558 + width = ConstantInt::get(aType, lhsMaskWidth, false);
3559 + RHSSrc = RHS;
3560 + if (!isMask_32(lhsMaskVal) && !isShiftedMask_32(lhsMaskVal)) {
3561 ++ if (mDebug) {
3562 ++ dbgs() << "Value is not a Mask: " << lhsMaskVal << "\n";
3563 ++ dbgs() << "Failed constraint 2!\n";
3564 ++ }
3565 + return false;
3566 + }
3567 + if (!LHSShift) {
3568 @@ -10435,6 +10856,10 @@ index 0000000..4a748b8
3569 + LHSSrc = RHSSrc;
3570 + RHSSrc = LHS;
3571 + if (!isMask_32(rhsMaskVal) && !isShiftedMask_32(rhsMaskVal)) {
3572 ++ if (mDebug) {
3573 ++ dbgs() << "Non-Mask: " << rhsMaskVal << "\n";
3574 ++ dbgs() << "Failed constraint 2!\n";
3575 ++ }
3576 + return false;
3577 + }
3578 + if (!RHSShift) {
3579 @@ -11287,10 +11712,10 @@ index 0000000..5b2cb25
3580 +#endif // AMDILSIDEVICE_H
3581 diff --git a/lib/Target/R600/CMakeLists.txt b/lib/Target/R600/CMakeLists.txt
3582 new file mode 100644
3583 -index 0000000..ce0b56b
3584 +index 0000000..8ef9f8c
3585 --- /dev/null
3586 +++ b/lib/Target/R600/CMakeLists.txt
3587 -@@ -0,0 +1,55 @@
3588 +@@ -0,0 +1,56 @@
3589 +set(LLVM_TARGET_DEFINITIONS AMDGPU.td)
3590 +
3591 +tablegen(LLVM AMDGPUGenRegisterInfo.inc -gen-register-info)
3592 @@ -11304,7 +11729,7 @@ index 0000000..ce0b56b
3593 +tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer)
3594 +add_public_tablegen_target(AMDGPUCommonTableGen)
3595 +
3596 -+add_llvm_target(R600CodeGen
3597 ++add_llvm_target(AMDGPUCodeGen
3598 + AMDIL7XXDevice.cpp
3599 + AMDILCFGStructurizer.cpp
3600 + AMDILDevice.cpp
3601 @@ -11318,9 +11743,9 @@ index 0000000..ce0b56b
3602 + AMDILPeepholeOptimizer.cpp
3603 + AMDILSIDevice.cpp
3604 + AMDGPUAsmPrinter.cpp
3605 ++ AMDGPUIndirectAddressing.cpp
3606 + AMDGPUMCInstLower.cpp
3607 + AMDGPUSubtarget.cpp
3608 -+ AMDGPUStructurizeCFG.cpp
3609 + AMDGPUTargetMachine.cpp
3610 + AMDGPUISelLowering.cpp
3611 + AMDGPUConvertToISA.cpp
3612 @@ -11329,9 +11754,9 @@ index 0000000..ce0b56b
3613 + R600ExpandSpecialInstrs.cpp
3614 + R600InstrInfo.cpp
3615 + R600ISelLowering.cpp
3616 ++ R600LowerConstCopy.cpp
3617 + R600MachineFunctionInfo.cpp
3618 + R600RegisterInfo.cpp
3619 -+ SIAnnotateControlFlow.cpp
3620 + SIAssignInterpRegs.cpp
3621 + SIInstrInfo.cpp
3622 + SIISelLowering.cpp
3623 @@ -11339,6 +11764,7 @@ index 0000000..ce0b56b
3624 + SILowerControlFlow.cpp
3625 + SIMachineFunctionInfo.cpp
3626 + SIRegisterInfo.cpp
3627 ++ SIFixSGPRLiveness.cpp
3628 + )
3629 +
3630 +add_dependencies(LLVMR600CodeGen intrinsics_gen)
3631 @@ -11348,10 +11774,10 @@ index 0000000..ce0b56b
3632 +add_subdirectory(MCTargetDesc)
3633 diff --git a/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp
3634 new file mode 100644
3635 -index 0000000..e6c550b
3636 +index 0000000..d6450a0
3637 --- /dev/null
3638 +++ b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp
3639 -@@ -0,0 +1,132 @@
3640 +@@ -0,0 +1,168 @@
3641 +//===-- AMDGPUInstPrinter.cpp - AMDGPU MC Inst -> ASM ---------------------===//
3642 +//
3643 +// The LLVM Compiler Infrastructure
3644 @@ -11394,6 +11820,21 @@ index 0000000..e6c550b
3645 + }
3646 +}
3647 +
3648 ++void AMDGPUInstPrinter::printInterpSlot(const MCInst *MI, unsigned OpNum,
3649 ++ raw_ostream &O) {
3650 ++ unsigned Imm = MI->getOperand(OpNum).getImm();
3651 ++
3652 ++ if (Imm == 2) {
3653 ++ O << "P0";
3654 ++ } else if (Imm == 1) {
3655 ++ O << "P20";
3656 ++ } else if (Imm == 0) {
3657 ++ O << "P10";
3658 ++ } else {
3659 ++ assert(!"Invalid interpolation parameter slot");
3660 ++ }
3661 ++}
3662 ++
3663 +void AMDGPUInstPrinter::printMemOperand(const MCInst *MI, unsigned OpNo,
3664 + raw_ostream &O) {
3665 + printOperand(MI, OpNo, O);
3666 @@ -11459,10 +11900,7 @@ index 0000000..e6c550b
3667 +
3668 +void AMDGPUInstPrinter::printRel(const MCInst *MI, unsigned OpNo,
3669 + raw_ostream &O) {
3670 -+ const MCOperand &Op = MI->getOperand(OpNo);
3671 -+ if (Op.getImm() != 0) {
3672 -+ O << " + " << Op.getImm();
3673 -+ }
3674 ++ printIfSet(MI, OpNo, O, "+");
3675 +}
3676 +
3677 +void AMDGPUInstPrinter::printUpdateExecMask(const MCInst *MI, unsigned OpNo,
3678 @@ -11483,13 +11921,37 @@ index 0000000..e6c550b
3679 + }
3680 +}
3681 +
3682 ++void AMDGPUInstPrinter::printSel(const MCInst *MI, unsigned OpNo,
3683 ++ raw_ostream &O) {
3684 ++ const char * chans = "XYZW";
3685 ++ int sel = MI->getOperand(OpNo).getImm();
3686 ++
3687 ++ int chan = sel & 3;
3688 ++ sel >>= 2;
3689 ++
3690 ++ if (sel >= 512) {
3691 ++ sel -= 512;
3692 ++ int cb = sel >> 12;
3693 ++ sel &= 4095;
3694 ++ O << cb << "[" << sel << "]";
3695 ++ } else if (sel >= 448) {
3696 ++ sel -= 448;
3697 ++ O << sel;
3698 ++ } else if (sel >= 0){
3699 ++ O << sel;
3700 ++ }
3701 ++
3702 ++ if (sel >= 0)
3703 ++ O << "." << chans[chan];
3704 ++}
3705 ++
3706 +#include "AMDGPUGenAsmWriter.inc"
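
printSel() above unpacks a packed select immediate: the low two bits pick the channel, values of 512 and up address a kcache constant buffer, and 448..511 are rebased plain selects. A self-contained sketch of the same decoding, assuming the identical bit layout (not part of the patch):

#include <cstdio>

// Decodes a packed select operand the same way printSel() above does.
void decodeSel(int sel) {
  const char Chans[] = "XYZW";
  int Chan = sel & 3; // low two bits pick the channel
  sel >>= 2;
  if (sel >= 512) { // kcache access: print cb[index].chan
    sel -= 512;
    std::printf("%d[%d].%c\n", sel >> 12, sel & 4095, Chans[Chan]);
  } else {
    if (sel >= 448) // rebased range
      sel -= 448;
    std::printf("%d.%c\n", sel, Chans[Chan]);
  }
}

int main() {
  decodeSel(((512 + (1 << 12) + 5) << 2) | 2); // prints "1[5].Z"
  decodeSel(3);                                // prints "0.W"
  return 0;
}
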
3707 diff --git a/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h
3708 new file mode 100644
3709 -index 0000000..96e0e46
3710 +index 0000000..767a708
3711 --- /dev/null
3712 +++ b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h
3713 -@@ -0,0 +1,52 @@
3714 +@@ -0,0 +1,54 @@
3715 +//===-- AMDGPUInstPrinter.h - AMDGPU MC Inst -> ASM interface ---*- C++ -*-===//
3716 +//
3717 +// The LLVM Compiler Infrastructure
3718 @@ -11525,6 +11987,7 @@ index 0000000..96e0e46
3719 +
3720 +private:
3721 + void printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);
3722 ++ void printInterpSlot(const MCInst *MI, unsigned OpNum, raw_ostream &O);
3723 + void printMemOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);
3724 + void printIfSet(const MCInst *MI, unsigned OpNo, raw_ostream &O, StringRef Asm);
3725 + void printAbs(const MCInst *MI, unsigned OpNo, raw_ostream &O);
3726 @@ -11537,6 +12000,7 @@ index 0000000..96e0e46
3727 + void printUpdateExecMask(const MCInst *MI, unsigned OpNo, raw_ostream &O);
3728 + void printUpdatePred(const MCInst *MI, unsigned OpNo, raw_ostream &O);
3729 + void printWrite(const MCInst *MI, unsigned OpNo, raw_ostream &O);
3730 ++ void printSel(const MCInst *MI, unsigned OpNo, raw_ostream &O);
3731 +};
3732 +
3733 +} // End namespace llvm
3734 @@ -11544,7 +12008,7 @@ index 0000000..96e0e46
3735 +#endif // AMDGPUINSTRPRINTER_H
3736 diff --git a/lib/Target/R600/InstPrinter/CMakeLists.txt b/lib/Target/R600/InstPrinter/CMakeLists.txt
3737 new file mode 100644
3738 -index 0000000..069c55b
3739 +index 0000000..6776337
3740 --- /dev/null
3741 +++ b/lib/Target/R600/InstPrinter/CMakeLists.txt
3742 @@ -0,0 +1,7 @@
3743 @@ -11554,7 +12018,7 @@ index 0000000..069c55b
3744 + AMDGPUInstPrinter.cpp
3745 + )
3746 +
3747 -+add_dependencies(LLVMR600AsmPrinter AMDGPUCommonTableGen)
3748 ++add_dependencies(LLVMR600AsmPrinter R600CommonTableGen)
3749 diff --git a/lib/Target/R600/InstPrinter/LLVMBuild.txt b/lib/Target/R600/InstPrinter/LLVMBuild.txt
3750 new file mode 100644
3751 index 0000000..ec0be89
3752 @@ -11869,10 +12333,10 @@ index 0000000..3ad0fa6
3753 +#endif // AMDGPUMCASMINFO_H
3754 diff --git a/lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h b/lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h
3755 new file mode 100644
3756 -index 0000000..9d0d6cf
3757 +index 0000000..8721f80
3758 --- /dev/null
3759 +++ b/lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h
3760 -@@ -0,0 +1,60 @@
3761 +@@ -0,0 +1,49 @@
3762 +//===-- AMDGPUCodeEmitter.h - AMDGPU Code Emitter interface -----------------===//
3763 +//
3764 +// The LLVM Compiler Infrastructure
3765 @@ -11917,17 +12381,6 @@ index 0000000..9d0d6cf
3766 + SmallVectorImpl<MCFixup> &Fixups) const {
3767 + return 0;
3768 + }
3769 -+ virtual uint64_t VOPPostEncode(const MCInst &MI, uint64_t Value) const {
3770 -+ return Value;
3771 -+ }
3772 -+ virtual uint64_t i32LiteralEncode(const MCInst &MI, unsigned OpNo,
3773 -+ SmallVectorImpl<MCFixup> &Fixups) const {
3774 -+ return 0;
3775 -+ }
3776 -+ virtual uint32_t SMRDmemriEncode(const MCInst &MI, unsigned OpNo,
3777 -+ SmallVectorImpl<MCFixup> &Fixups) const {
3778 -+ return 0;
3779 -+ }
3780 +};
3781 +
3782 +} // End namespace llvm
3783 @@ -12182,10 +12635,10 @@ index 0000000..8894a76
3784 +include $(LEVEL)/Makefile.common
3785 diff --git a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
3786 new file mode 100644
3787 -index 0000000..dc91924
3788 +index 0000000..115fe8d
3789 --- /dev/null
3790 +++ b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
3791 -@@ -0,0 +1,575 @@
3792 +@@ -0,0 +1,582 @@
3793 +//===- R600MCCodeEmitter.cpp - Code Emitter for R600->Cayman GPU families -===//
3794 +//
3795 +// The LLVM Compiler Infrastructure
3796 @@ -12252,8 +12705,8 @@ index 0000000..dc91924
3797 + void EmitALUInstr(const MCInst &MI, SmallVectorImpl<MCFixup> &Fixups,
3798 + raw_ostream &OS) const;
3799 + void EmitSrc(const MCInst &MI, unsigned OpIdx, raw_ostream &OS) const;
3800 -+ void EmitSrcISA(const MCInst &MI, unsigned OpIdx, uint64_t &Value,
3801 -+ raw_ostream &OS) const;
3802 ++ void EmitSrcISA(const MCInst &MI, unsigned RegOpIdx, unsigned SelOpIdx,
3803 ++ raw_ostream &OS) const;
3804 + void EmitDst(const MCInst &MI, raw_ostream &OS) const;
3805 + void EmitTexInstr(const MCInst &MI, SmallVectorImpl<MCFixup> &Fixups,
3806 + raw_ostream &OS) const;
3807 @@ -12350,9 +12803,12 @@ index 0000000..dc91924
3808 + case AMDGPU::VTX_READ_PARAM_8_eg:
3809 + case AMDGPU::VTX_READ_PARAM_16_eg:
3810 + case AMDGPU::VTX_READ_PARAM_32_eg:
3811 ++ case AMDGPU::VTX_READ_PARAM_128_eg:
3812 + case AMDGPU::VTX_READ_GLOBAL_8_eg:
3813 + case AMDGPU::VTX_READ_GLOBAL_32_eg:
3814 -+ case AMDGPU::VTX_READ_GLOBAL_128_eg: {
3815 ++ case AMDGPU::VTX_READ_GLOBAL_128_eg:
3816 ++ case AMDGPU::TEX_VTX_CONSTBUF:
3817 ++ case AMDGPU::TEX_VTX_TEXBUF : {
3818 + uint64_t InstWord01 = getBinaryCodeForInstr(MI, Fixups);
3819 + uint32_t InstWord2 = MI.getOperand(2).getImm(); // Offset
3820 +
3821 @@ -12382,7 +12838,6 @@ index 0000000..dc91924
3822 + SmallVectorImpl<MCFixup> &Fixups,
3823 + raw_ostream &OS) const {
3824 + const MCInstrDesc &MCDesc = MCII.get(MI.getOpcode());
3825 -+ unsigned NumOperands = MI.getNumOperands();
3826 +
3827 + // Emit instruction type
3828 + EmitByte(INSTR_ALU, OS);
3829 @@ -12398,19 +12853,21 @@ index 0000000..dc91924
3830 + InstWord01 |= ISAOpCode << 1;
3831 + }
3832 +
3833 -+ unsigned SrcIdx = 0;
3834 -+ for (unsigned int OpIdx = 1; OpIdx < NumOperands; ++OpIdx) {
3835 -+ if (MI.getOperand(OpIdx).isImm() || MI.getOperand(OpIdx).isFPImm() ||
3836 -+ OpIdx == (unsigned)MCDesc.findFirstPredOperandIdx()) {
3837 -+ continue;
3838 -+ }
3839 -+ EmitSrcISA(MI, OpIdx, InstWord01, OS);
3840 -+ SrcIdx++;
3841 -+ }
3842 ++ unsigned SrcNum = MCDesc.TSFlags & R600_InstFlag::OP3 ? 3 :
3843 ++ MCDesc.TSFlags & R600_InstFlag::OP2 ? 2 : 1;
3844 ++
3845 ++ EmitByte(SrcNum, OS);
3846 ++
3847 ++ const unsigned SrcOps[3][2] = {
3848 ++ {R600Operands::SRC0, R600Operands::SRC0_SEL},
3849 ++ {R600Operands::SRC1, R600Operands::SRC1_SEL},
3850 ++ {R600Operands::SRC2, R600Operands::SRC2_SEL}
3851 ++ };
3852 +
3853 -+ // Emit zeros for unused sources
3854 -+ for ( ; SrcIdx < 3; SrcIdx++) {
3855 -+ EmitNullBytes(SRC_BYTE_COUNT - 6, OS);
3856 ++ for (unsigned SrcIdx = 0; SrcIdx < SrcNum; ++SrcIdx) {
3857 ++ unsigned RegOpIdx = R600Operands::ALUOpTable[SrcNum-1][SrcOps[SrcIdx][0]];
3858 ++ unsigned SelOpIdx = R600Operands::ALUOpTable[SrcNum-1][SrcOps[SrcIdx][1]];
3859 ++ EmitSrcISA(MI, RegOpIdx, SelOpIdx, OS);
3860 + }
3861 +
3862 + Emit(InstWord01, OS);
3863 @@ -12481,34 +12938,37 @@ index 0000000..dc91924
3864 +
3865 +}
3866 +
3867 -+void R600MCCodeEmitter::EmitSrcISA(const MCInst &MI, unsigned OpIdx,
3868 -+ uint64_t &Value, raw_ostream &OS) const {
3869 -+ const MCOperand &MO = MI.getOperand(OpIdx);
3870 ++void R600MCCodeEmitter::EmitSrcISA(const MCInst &MI, unsigned RegOpIdx,
3871 ++ unsigned SelOpIdx, raw_ostream &OS) const {
3872 ++ const MCOperand &RegMO = MI.getOperand(RegOpIdx);
3873 ++ const MCOperand &SelMO = MI.getOperand(SelOpIdx);
3874 ++
3875 + union {
3876 + float f;
3877 + uint32_t i;
3878 + } InlineConstant;
3879 + InlineConstant.i = 0;
3880 -+ // Emit the source select (2 bytes). For GPRs, this is the register index.
3881 -+ // For other potential instruction operands, (e.g. constant registers) the
3882 -+ // value of the source select is defined in the r600isa docs.
3883 -+ if (MO.isReg()) {
3884 -+ unsigned Reg = MO.getReg();
3885 -+ if (AMDGPUMCRegisterClasses[AMDGPU::R600_CReg32RegClassID].contains(Reg)) {
3886 -+ EmitByte(1, OS);
3887 -+ } else {
3888 -+ EmitByte(0, OS);
3889 -+ }
3890 ++ // Emit source type (1 byte) and source select (4 bytes). For GPRs type is 0
3891 ++ // and select is 0 (GPR index is encoded in the instr encoding). For constants
3892 ++ // type is 1 and select is the original const select passed from the driver.
3893 ++ unsigned Reg = RegMO.getReg();
3894 ++ if (Reg == AMDGPU::ALU_CONST) {
3895 ++ EmitByte(1, OS);
3896 ++ uint32_t Sel = SelMO.getImm();
3897 ++ Emit(Sel, OS);
3898 ++ } else {
3899 ++ EmitByte(0, OS);
3900 ++ Emit((uint32_t)0, OS);
3901 ++ }
3902 +
3903 -+ if (Reg == AMDGPU::ALU_LITERAL_X) {
3904 -+ unsigned ImmOpIndex = MI.getNumOperands() - 1;
3905 -+ MCOperand ImmOp = MI.getOperand(ImmOpIndex);
3906 -+ if (ImmOp.isFPImm()) {
3907 -+ InlineConstant.f = ImmOp.getFPImm();
3908 -+ } else {
3909 -+ assert(ImmOp.isImm());
3910 -+ InlineConstant.i = ImmOp.getImm();
3911 -+ }
3912 ++ if (Reg == AMDGPU::ALU_LITERAL_X) {
3913 ++ unsigned ImmOpIndex = MI.getNumOperands() - 1;
3914 ++ MCOperand ImmOp = MI.getOperand(ImmOpIndex);
3915 ++ if (ImmOp.isFPImm()) {
3916 ++ InlineConstant.f = ImmOp.getFPImm();
3917 ++ } else {
3918 ++ assert(ImmOp.isImm());
3919 ++ InlineConstant.i = ImmOp.getImm();
3920 + }
3921 + }
3922 +
3923 @@ -12763,10 +13223,10 @@ index 0000000..dc91924
3924 +#include "AMDGPUGenMCCodeEmitter.inc"
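
Per EmitSrcISA() above, each ALU source is serialized as one type byte (0 = GPR, 1 = constant) followed by a 32-bit select that is meaningful only for constants; a GPR's index travels in the main instruction words instead. An illustrative sketch of that byte layout (hypothetical names, not part of the patch):

#include <cstdint>
#include <cstdio>
#include <vector>

// Serializes one ALU source the way EmitSrcISA() above does: a type
// byte, then a little-endian 32-bit select (zero for GPRs).
void emitSrc(std::vector<uint8_t> &OS, bool IsConst, uint32_t Sel) {
  OS.push_back(IsConst ? 1 : 0);
  uint32_t Word = IsConst ? Sel : 0;
  for (int i = 0; i < 4; ++i)
    OS.push_back(uint8_t(Word >> (8 * i)));
}

int main() {
  std::vector<uint8_t> OS;
  emitSrc(OS, false, 0);    // SRC0: a GPR
  emitSrc(OS, true, 0x123); // SRC1: an ALU_CONST with select 0x123
  for (size_t i = 0; i < OS.size(); ++i)
    std::printf("%02x ", OS[i]);
  std::printf("\n");
  return 0;
}
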
3925 diff --git a/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp b/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp
3926 new file mode 100644
3927 -index 0000000..c47dc99
3928 +index 0000000..6dfbbe8
3929 --- /dev/null
3930 +++ b/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp
3931 -@@ -0,0 +1,298 @@
3932 +@@ -0,0 +1,235 @@
3933 +//===-- SIMCCodeEmitter.cpp - SI Code Emitter -------------------------------===//
3934 +//
3935 +// The LLVM Compiler Infrastructure
3936 @@ -12793,38 +13253,16 @@ index 0000000..c47dc99
3937 +#include "llvm/MC/MCFixup.h"
3938 +#include "llvm/Support/raw_ostream.h"
3939 +
3940 -+#define VGPR_BIT(src_idx) (1ULL << (9 * src_idx - 1))
3941 -+#define SI_INSTR_FLAGS_ENCODING_MASK 0xf
3942 -+
3943 -+// These must be kept in sync with SIInstructions.td and also the
3944 -+// InstrEncodingInfo array in SIInstrInfo.cpp.
3945 -+//
3946 -+// NOTE: This enum is only used to identify the encoding type within LLVM,
3947 -+// the actual encoding type that is part of the instruction format is different
3948 -+namespace SIInstrEncodingType {
3949 -+ enum Encoding {
3950 -+ EXP = 0,
3951 -+ LDS = 1,
3952 -+ MIMG = 2,
3953 -+ MTBUF = 3,
3954 -+ MUBUF = 4,
3955 -+ SMRD = 5,
3956 -+ SOP1 = 6,
3957 -+ SOP2 = 7,
3958 -+ SOPC = 8,
3959 -+ SOPK = 9,
3960 -+ SOPP = 10,
3961 -+ VINTRP = 11,
3962 -+ VOP1 = 12,
3963 -+ VOP2 = 13,
3964 -+ VOP3 = 14,
3965 -+ VOPC = 15
3966 -+ };
3967 -+}
3968 -+
3969 +using namespace llvm;
3970 +
3971 +namespace {
3972 ++
3973 ++/// \brief Helper type used in encoding
3974 ++typedef union {
3975 ++ int32_t I;
3976 ++ float F;
3977 ++} IntFloatUnion;
3978 ++
3979 +class SIMCCodeEmitter : public AMDGPUMCCodeEmitter {
3980 + SIMCCodeEmitter(const SIMCCodeEmitter &); // DO NOT IMPLEMENT
3981 + void operator=(const SIMCCodeEmitter &); // DO NOT IMPLEMENT
3982 @@ -12833,6 +13271,15 @@ index 0000000..c47dc99
3983 + const MCSubtargetInfo &STI;
3984 + MCContext &Ctx;
3985 +
3986 ++ /// \brief Encode a sequence of registers with the correct alignment.
3987 ++ unsigned GPRAlign(const MCInst &MI, unsigned OpNo, unsigned shift) const;
3988 ++
3989 ++ /// \brief Can this operand also contain immediate values?
3990 ++ bool isSrcOperand(const MCInstrDesc &Desc, unsigned OpNo) const;
3991 ++
3992 ++ /// \brief Encode an fp or int literal
3993 ++ uint32_t getLitEncoding(const MCOperand &MO) const;
3994 ++
3995 +public:
3996 + SIMCCodeEmitter(const MCInstrInfo &mcii, const MCRegisterInfo &mri,
3997 + const MCSubtargetInfo &sti, MCContext &ctx)
3998 @@ -12848,11 +13295,6 @@ index 0000000..c47dc99
3999 + virtual uint64_t getMachineOpValue(const MCInst &MI, const MCOperand &MO,
4000 + SmallVectorImpl<MCFixup> &Fixups) const;
4001 +
4002 -+public:
4003 -+
4004 -+ /// \brief Encode a sequence of registers with the correct alignment.
4005 -+ unsigned GPRAlign(const MCInst &MI, unsigned OpNo, unsigned shift) const;
4006 -+
4007 + /// \brief Encoding for when 2 consecutive registers are used
4008 + virtual unsigned GPR2AlignEncode(const MCInst &MI, unsigned OpNo,
4009 + SmallVectorImpl<MCFixup> &Fixup) const;
4010 @@ -12860,73 +13302,142 @@ index 0000000..c47dc99
4011 + /// \brief Encoding for when 4 consecutive registers are used
4012 + virtual unsigned GPR4AlignEncode(const MCInst &MI, unsigned OpNo,
4013 + SmallVectorImpl<MCFixup> &Fixup) const;
4014 ++};
4015 +
4016 -+ /// \brief Encoding for SMRD indexed loads
4017 -+ virtual uint32_t SMRDmemriEncode(const MCInst &MI, unsigned OpNo,
4018 -+ SmallVectorImpl<MCFixup> &Fixup) const;
4019 ++} // End anonymous namespace
4020 ++
4021 ++MCCodeEmitter *llvm::createSIMCCodeEmitter(const MCInstrInfo &MCII,
4022 ++ const MCRegisterInfo &MRI,
4023 ++ const MCSubtargetInfo &STI,
4024 ++ MCContext &Ctx) {
4025 ++ return new SIMCCodeEmitter(MCII, MRI, STI, Ctx);
4026 ++}
4027 +
4028 -+ /// \brief Post-Encoder method for VOP instructions
4029 -+ virtual uint64_t VOPPostEncode(const MCInst &MI, uint64_t Value) const;
4030 ++bool SIMCCodeEmitter::isSrcOperand(const MCInstrDesc &Desc,
4031 ++ unsigned OpNo) const {
4032 +
4033 -+private:
4034 ++ unsigned RegClass = Desc.OpInfo[OpNo].RegClass;
4035 ++ return (AMDGPU::SSrc_32RegClassID == RegClass) ||
4036 ++ (AMDGPU::SSrc_64RegClassID == RegClass) ||
4037 ++ (AMDGPU::VSrc_32RegClassID == RegClass) ||
4038 ++ (AMDGPU::VSrc_64RegClassID == RegClass);
4039 ++}
4040 +
4041 -+ /// \returns this SIInstrEncodingType for this instruction.
4042 -+ unsigned getEncodingType(const MCInst &MI) const;
4043 ++uint32_t SIMCCodeEmitter::getLitEncoding(const MCOperand &MO) const {
4044 +
4045 -+ /// \brief Get then size in bytes of this instructions encoding.
4046 -+ unsigned getEncodingBytes(const MCInst &MI) const;
4047 ++ IntFloatUnion Imm;
4048 ++ if (MO.isImm())
4049 ++ Imm.I = MO.getImm();
4050 ++ else if (MO.isFPImm())
4051 ++ Imm.F = MO.getFPImm();
4052 ++ else
4053 ++ return ~0;
4054 +
4055 -+ /// \returns the hardware encoding for a register
4056 -+ unsigned getRegBinaryCode(unsigned reg) const;
4057 ++ if (Imm.I >= 0 && Imm.I <= 64)
4058 ++ return 128 + Imm.I;
4059 +
4060 -+ /// \brief Generated function that returns the hardware encoding for
4061 -+ /// a register
4062 -+ unsigned getHWRegNum(unsigned reg) const;
4063 ++ if (Imm.I >= -16 && Imm.I <= -1)
4064 ++ return 192 + abs(Imm.I);
4065 +
4066 -+};
4067 ++ if (Imm.F == 0.5f)
4068 ++ return 240;
4069 +
4070 -+} // End anonymous namespace
4071 ++ if (Imm.F == -0.5f)
4072 ++ return 241;
4073 +
4074 -+MCCodeEmitter *llvm::createSIMCCodeEmitter(const MCInstrInfo &MCII,
4075 -+ const MCRegisterInfo &MRI,
4076 -+ const MCSubtargetInfo &STI,
4077 -+ MCContext &Ctx) {
4078 -+ return new SIMCCodeEmitter(MCII, MRI, STI, Ctx);
4079 ++ if (Imm.F == 1.0f)
4080 ++ return 242;
4081 ++
4082 ++ if (Imm.F == -1.0f)
4083 ++ return 243;
4084 ++
4085 ++ if (Imm.F == 2.0f)
4086 ++ return 244;
4087 ++
4088 ++ if (Imm.F == -2.0f)
4089 ++ return 245;
4090 ++
4091 ++ if (Imm.F == 4.0f)
4092 ++ return 246;
4093 ++
4094 ++ if (Imm.F == -4.0f)
4095 ++ return 247;
4096 ++
4097 ++ return 255;
4098 +}
4099 +
4100 +void SIMCCodeEmitter::EncodeInstruction(const MCInst &MI, raw_ostream &OS,
4101 + SmallVectorImpl<MCFixup> &Fixups) const {
4102 ++
4103 + uint64_t Encoding = getBinaryCodeForInstr(MI, Fixups);
4104 -+ unsigned bytes = getEncodingBytes(MI);
4105 ++ const MCInstrDesc &Desc = MCII.get(MI.getOpcode());
4106 ++ unsigned bytes = Desc.getSize();
4107 ++
4108 + for (unsigned i = 0; i < bytes; i++) {
4109 + OS.write((uint8_t) ((Encoding >> (8 * i)) & 0xff));
4110 + }
4111 ++
4112 ++ if (bytes > 4)
4113 ++ return;
4114 ++
4115 ++ // Check for additional literals in SRC0/1/2 (Op 1/2/3)
4116 ++ for (unsigned i = 0, e = MI.getNumOperands(); i < e; ++i) {
4117 ++
4118 ++ // Check if this operand should be encoded as [SV]Src
4119 ++ if (!isSrcOperand(Desc, i))
4120 ++ continue;
4121 ++
4122 ++ // Is this operand a literal immediate?
4123 ++ const MCOperand &Op = MI.getOperand(i);
4124 ++ if (getLitEncoding(Op) != 255)
4125 ++ continue;
4126 ++
4127 ++ // Yes! Encode it
4128 ++ IntFloatUnion Imm;
4129 ++ if (Op.isImm())
4130 ++ Imm.I = Op.getImm();
4131 ++ else
4132 ++ Imm.F = Op.getFPImm();
4133 ++
4134 ++ for (unsigned j = 0; j < 4; j++) {
4135 ++ OS.write((uint8_t) ((Imm.I >> (8 * j)) & 0xff));
4136 ++ }
4137 ++
4138 ++ // Only one literal value allowed
4139 ++ break;
4140 ++ }
4141 +}
4142 +
4143 +uint64_t SIMCCodeEmitter::getMachineOpValue(const MCInst &MI,
4144 + const MCOperand &MO,
4145 + SmallVectorImpl<MCFixup> &Fixups) const {
4146 -+ if (MO.isReg()) {
4147 -+ return getRegBinaryCode(MO.getReg());
4148 -+ } else if (MO.isImm()) {
4149 -+ return MO.getImm();
4150 -+ } else if (MO.isFPImm()) {
4151 -+ // XXX: Not all instructions can use inline literals
4152 -+ // XXX: We should make sure this is a 32-bit constant
4153 -+ union {
4154 -+ float F;
4155 -+ uint32_t I;
4156 -+ } Imm;
4157 -+ Imm.F = MO.getFPImm();
4158 -+ return Imm.I;
4159 -+ } else if (MO.isExpr()) {
4160 ++ if (MO.isReg())
4161 ++ return MRI.getEncodingValue(MO.getReg());
4162 ++
4163 ++ if (MO.isExpr()) {
4164 + const MCExpr *Expr = MO.getExpr();
4165 + MCFixupKind Kind = MCFixupKind(FK_PCRel_4);
4166 + Fixups.push_back(MCFixup::Create(0, Expr, Kind, MI.getLoc()));
4167 + return 0;
4168 -+ } else{
4169 -+ llvm_unreachable("Encoding of this operand type is not supported yet.");
4170 + }
4171 ++
4172 ++ // Figure out the operand number, needed for isSrcOperand check
4173 ++ unsigned OpNo = 0;
4174 ++ for (unsigned e = MI.getNumOperands(); OpNo < e; ++OpNo) {
4175 ++ if (&MO == &MI.getOperand(OpNo))
4176 ++ break;
4177 ++ }
4178 ++
4179 ++ const MCInstrDesc &Desc = MCII.get(MI.getOpcode());
4180 ++ if (isSrcOperand(Desc, OpNo)) {
4181 ++ uint32_t Enc = getLitEncoding(MO);
4182 ++ if (Enc != ~0U && (Enc != 255 || Desc.getSize() == 4))
4183 ++ return Enc;
4184 ++
4185 ++ } else if (MO.isImm())
4186 ++ return MO.getImm();
4187 ++
4188 ++ llvm_unreachable("Encoding of this operand type is not supported yet.");
4189 + return 0;
4190 +}
4191 +
4192 @@ -12936,10 +13447,10 @@ index 0000000..c47dc99
4193 +
4194 +unsigned SIMCCodeEmitter::GPRAlign(const MCInst &MI, unsigned OpNo,
4195 + unsigned shift) const {
4196 -+ unsigned regCode = getRegBinaryCode(MI.getOperand(OpNo).getReg());
4197 -+ return regCode >> shift;
4198 -+ return 0;
4199 ++ unsigned regCode = MRI.getEncodingValue(MI.getOperand(OpNo).getReg());
4200 ++ return (regCode & 0xff) >> shift;
4201 +}
4202 ++
4203 +unsigned SIMCCodeEmitter::GPR2AlignEncode(const MCInst &MI,
4204 + unsigned OpNo ,
4205 + SmallVectorImpl<MCFixup> &Fixup) const {
4206 @@ -12951,120 +13462,6 @@ index 0000000..c47dc99
4207 + SmallVectorImpl<MCFixup> &Fixup) const {
4208 + return GPRAlign(MI, OpNo, 2);
4209 +}
4210 -+
4211 -+#define SMRD_OFFSET_MASK 0xff
4212 -+#define SMRD_IMM_SHIFT 8
4213 -+#define SMRD_SBASE_MASK 0x3f
4214 -+#define SMRD_SBASE_SHIFT 9
4215 -+/// This function is responsibe for encoding the offset
4216 -+/// and the base ptr for SMRD instructions it should return a bit string in
4217 -+/// this format:
4218 -+///
4219 -+/// OFFSET = bits{7-0}
4220 -+/// IMM = bits{8}
4221 -+/// SBASE = bits{14-9}
4222 -+///
4223 -+uint32_t SIMCCodeEmitter::SMRDmemriEncode(const MCInst &MI, unsigned OpNo,
4224 -+ SmallVectorImpl<MCFixup> &Fixup) const {
4225 -+ uint32_t Encoding;
4226 -+
4227 -+ const MCOperand &OffsetOp = MI.getOperand(OpNo + 1);
4228 -+
4229 -+ //XXX: Use this function for SMRD loads with register offsets
4230 -+ assert(OffsetOp.isImm());
4231 -+
4232 -+ Encoding =
4233 -+ (getMachineOpValue(MI, OffsetOp, Fixup) & SMRD_OFFSET_MASK)
4234 -+ | (1 << SMRD_IMM_SHIFT) //XXX If the Offset is a register we shouldn't set this bit
4235 -+ | ((GPR2AlignEncode(MI, OpNo, Fixup) & SMRD_SBASE_MASK) << SMRD_SBASE_SHIFT)
4236 -+ ;
4237 -+
4238 -+ return Encoding;
4239 -+}
4240 -+
4241 -+//===----------------------------------------------------------------------===//
4242 -+// Post Encoder Callbacks
4243 -+//===----------------------------------------------------------------------===//
4244 -+
4245 -+uint64_t SIMCCodeEmitter::VOPPostEncode(const MCInst &MI, uint64_t Value) const{
4246 -+ unsigned encodingType = getEncodingType(MI);
4247 -+ unsigned numSrcOps;
4248 -+ unsigned vgprBitOffset;
4249 -+
4250 -+ if (encodingType == SIInstrEncodingType::VOP3) {
4251 -+ numSrcOps = 3;
4252 -+ vgprBitOffset = 32;
4253 -+ } else {
4254 -+ numSrcOps = 1;
4255 -+ vgprBitOffset = 0;
4256 -+ }
4257 -+
4258 -+ // Add one to skip over the destination reg operand.
4259 -+ for (unsigned opIdx = 1; opIdx < numSrcOps + 1; opIdx++) {
4260 -+ const MCOperand &MO = MI.getOperand(opIdx);
4261 -+ if (MO.isReg()) {
4262 -+ unsigned reg = MI.getOperand(opIdx).getReg();
4263 -+ if (AMDGPUMCRegisterClasses[AMDGPU::VReg_32RegClassID].contains(reg) ||
4264 -+ AMDGPUMCRegisterClasses[AMDGPU::VReg_64RegClassID].contains(reg)) {
4265 -+ Value |= (VGPR_BIT(opIdx)) << vgprBitOffset;
4266 -+ }
4267 -+ } else if (MO.isFPImm()) {
4268 -+ union {
4269 -+ float f;
4270 -+ uint32_t i;
4271 -+ } Imm;
4272 -+ // XXX: Not all instructions can use inline literals
4273 -+ // XXX: We should make sure this is a 32-bit constant
4274 -+ Imm.f = MO.getFPImm();
4275 -+ Value |= ((uint64_t)Imm.i) << 32;
4276 -+ }
4277 -+ }
4278 -+ return Value;
4279 -+}
4280 -+
4281 -+//===----------------------------------------------------------------------===//
4282 -+// Encoding helper functions
4283 -+//===----------------------------------------------------------------------===//
4284 -+
4285 -+unsigned SIMCCodeEmitter::getEncodingType(const MCInst &MI) const {
4286 -+ return MCII.get(MI.getOpcode()).TSFlags & SI_INSTR_FLAGS_ENCODING_MASK;
4287 -+}
4288 -+
4289 -+unsigned SIMCCodeEmitter::getEncodingBytes(const MCInst &MI) const {
4290 -+
4291 -+ // These instructions aren't real instructions with an encoding type, so
4292 -+ // we need to manually specify their size.
4293 -+ switch (MI.getOpcode()) {
4294 -+ default: break;
4295 -+ case AMDGPU::SI_LOAD_LITERAL_I32:
4296 -+ case AMDGPU::SI_LOAD_LITERAL_F32:
4297 -+ return 4;
4298 -+ }
4299 -+
4300 -+ unsigned encoding_type = getEncodingType(MI);
4301 -+ switch (encoding_type) {
4302 -+ case SIInstrEncodingType::EXP:
4303 -+ case SIInstrEncodingType::LDS:
4304 -+ case SIInstrEncodingType::MUBUF:
4305 -+ case SIInstrEncodingType::MTBUF:
4306 -+ case SIInstrEncodingType::MIMG:
4307 -+ case SIInstrEncodingType::VOP3:
4308 -+ return 8;
4309 -+ default:
4310 -+ return 4;
4311 -+ }
4312 -+}
4313 -+
4314 -+
4315 -+unsigned SIMCCodeEmitter::getRegBinaryCode(unsigned reg) const {
4316 -+ switch (reg) {
4317 -+ case AMDGPU::M0: return 124;
4318 -+ case AMDGPU::SREG_LIT_0: return 128;
4319 -+ case AMDGPU::SI_LITERAL_CONSTANT: return 255;
4320 -+ default: return MRI.getEncodingValue(reg);
4321 -+ }
4322 -+}
4323 -+
4324 diff --git a/lib/Target/R600/Makefile b/lib/Target/R600/Makefile
4325 new file mode 100644
4326 index 0000000..1b3ebbe
4327 @@ -13096,10 +13493,10 @@ index 0000000..1b3ebbe
4328 +include $(LEVEL)/Makefile.common
4329 diff --git a/lib/Target/R600/Processors.td b/lib/Target/R600/Processors.td
4330 new file mode 100644
4331 -index 0000000..3dc1ecd
4332 +index 0000000..868810c
4333 --- /dev/null
4334 +++ b/lib/Target/R600/Processors.td
4335 -@@ -0,0 +1,29 @@
4336 +@@ -0,0 +1,30 @@
4337 +//===-- Processors.td - TODO: Add brief description -------===//
4338 +//
4339 +// The LLVM Compiler Infrastructure
4340 @@ -13115,6 +13512,7 @@ index 0000000..3dc1ecd
4341 +
4342 +class Proc<string Name, ProcessorItineraries itin, list<SubtargetFeature> Features>
4343 +: Processor<Name, itin, Features>;
4344 ++def : Proc<"", R600_EG_Itin, [FeatureR600ALUInst]>;
4345 +def : Proc<"r600", R600_EG_Itin, [FeatureR600ALUInst]>;
4346 +def : Proc<"rv710", R600_EG_Itin, []>;
4347 +def : Proc<"rv730", R600_EG_Itin, []>;
4348 @@ -13131,10 +13529,10 @@ index 0000000..3dc1ecd
4349 +
4350 diff --git a/lib/Target/R600/R600Defines.h b/lib/Target/R600/R600Defines.h
4351 new file mode 100644
4352 -index 0000000..7dea8e4
4353 +index 0000000..16cfcf5
4354 --- /dev/null
4355 +++ b/lib/Target/R600/R600Defines.h
4356 -@@ -0,0 +1,79 @@
4357 +@@ -0,0 +1,97 @@
4358 +//===-- R600Defines.h - R600 Helper Macros ----------------------*- C++ -*-===//
4359 +//
4360 +// The LLVM Compiler Infrastructure
4361 @@ -13186,6 +13584,9 @@ index 0000000..7dea8e4
4362 +#define HW_REG_MASK 0x1ff
4363 +#define HW_CHAN_SHIFT 9
4364 +
4365 ++#define GET_REG_CHAN(reg) ((reg) >> HW_CHAN_SHIFT)
4366 ++#define GET_REG_INDEX(reg) ((reg) & HW_REG_MASK)
4367 ++
4368 +namespace R600Operands {
4369 + enum Ops {
4370 + DST,
4371 @@ -13199,27 +13600,42 @@ index 0000000..7dea8e4
4372 + SRC0_NEG,
4373 + SRC0_REL,
4374 + SRC0_ABS,
4375 ++ SRC0_SEL,
4376 + SRC1,
4377 + SRC1_NEG,
4378 + SRC1_REL,
4379 + SRC1_ABS,
4380 ++ SRC1_SEL,
4381 + SRC2,
4382 + SRC2_NEG,
4383 + SRC2_REL,
4384 ++ SRC2_SEL,
4385 + LAST,
4386 + PRED_SEL,
4387 + IMM,
4388 + COUNT
4389 + };
4390 ++
4391 ++ const static int ALUOpTable[3][R600Operands::COUNT] = {
4392 ++// W C S S S S S S S S S S S
4393 ++// R O D L S R R R R S R R R R S R R R L P
4394 ++// D U I M R A R C C C C R C C C C R C C C A R I
4395 ++// S E U T O E M C 0 0 0 0 C 1 1 1 1 C 2 2 2 S E M
4396 ++// T M P E D L P 0 N R A S 1 N R A S 2 N R S T D M
4397 ++ {0,-1,-1, 1, 2, 3, 4, 5, 6, 7, 8, 9,-1,-1,-1,-1,-1,-1,-1,-1,-1,10,11,12},
4398 ++ {0, 1, 2, 3, 4 ,5 ,6 ,7, 8, 9,10,11,12,13,14,15,16,-1,-1,-1,-1,17,18,19},
4399 ++ {0,-1,-1,-1,-1, 1, 2, 3, 4, 5,-1, 6, 7, 8, 9,-1,10,11,12,13,14,15,16,17}
4400 ++ };
4401 ++
4402 +}
4403 +
4404 +#endif // R600DEFINES_H_
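
R600Operands::ALUOpTable above maps each abstract operand role to a concrete operand index per encoding class (one row per OP1/OP2/OP3 source count, -1 = role absent); the code emitter indexes it as ALUOpTable[SrcNum-1][Role]. A trimmed standalone sketch of the lookup, keeping only the DST/SRC0/SRC0_NEG/SRC0_SEL columns with values copied from the table above (other names hypothetical):

#include <cstdio>

// Row = encoding class, column = operand role, value = MachineInstr
// operand index, -1 = role not present in that class.
enum Role { DST = 0, SRC0, SRC0_NEG, SRC0_SEL, ROLE_COUNT };

static const int Table[3][ROLE_COUNT] = {
  {0, 5, 6, 9},  // OP1: one source
  {0, 7, 8, 11}, // OP2: two sources
  {0, 3, 4, 6},  // OP3: three sources
};

int operandIdx(int SrcNum, Role R) {
  return Table[SrcNum - 1][R];
}

int main() {
  // Where does SRC0_SEL live for a two-source ALU instruction?
  std::printf("OP2 SRC0_SEL operand index: %d\n", operandIdx(2, SRC0_SEL));
  return 0;
}
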
4405 diff --git a/lib/Target/R600/R600ExpandSpecialInstrs.cpp b/lib/Target/R600/R600ExpandSpecialInstrs.cpp
4406 new file mode 100644
4407 -index 0000000..b6e62b7
4408 +index 0000000..c00c349
4409 --- /dev/null
4410 +++ b/lib/Target/R600/R600ExpandSpecialInstrs.cpp
4411 -@@ -0,0 +1,334 @@
4412 +@@ -0,0 +1,290 @@
4413 +//===-- R600ExpandSpecialInstrs.cpp - Expand special instructions ---------===//
4414 +//
4415 +// The LLVM Compiler Infrastructure
4416 @@ -13277,118 +13693,6 @@ index 0000000..b6e62b7
4417 + return new R600ExpandSpecialInstrsPass(TM);
4418 +}
4419 +
4420 -+bool R600ExpandSpecialInstrsPass::ExpandInputPerspective(MachineInstr &MI) {
4421 -+ const R600RegisterInfo &TRI = TII->getRegisterInfo();
4422 -+ if (MI.getOpcode() != AMDGPU::input_perspective)
4423 -+ return false;
4424 -+
4425 -+ MachineBasicBlock::iterator I = &MI;
4426 -+ unsigned DstReg = MI.getOperand(0).getReg();
4427 -+ R600MachineFunctionInfo *MFI = MI.getParent()->getParent()
4428 -+ ->getInfo<R600MachineFunctionInfo>();
4429 -+ unsigned IJIndexBase;
4430 -+
4431 -+ // In Evergreen ISA doc section 8.3.2 :
4432 -+ // We need to interpolate XY and ZW in two different instruction groups.
4433 -+ // An INTERP_* must occupy all 4 slots of an instruction group.
4434 -+ // Output of INTERP_XY is written in X,Y slots
4435 -+ // Output of INTERP_ZW is written in Z,W slots
4436 -+ //
4437 -+ // Thus interpolation requires the following sequences :
4438 -+ //
4439 -+ // AnyGPR.x = INTERP_ZW; (Write Masked Out)
4440 -+ // AnyGPR.y = INTERP_ZW; (Write Masked Out)
4441 -+ // DstGPR.z = INTERP_ZW;
4442 -+ // DstGPR.w = INTERP_ZW; (End of first IG)
4443 -+ // DstGPR.x = INTERP_XY;
4444 -+ // DstGPR.y = INTERP_XY;
4445 -+ // AnyGPR.z = INTERP_XY; (Write Masked Out)
4446 -+ // AnyGPR.w = INTERP_XY; (Write Masked Out) (End of second IG)
4447 -+ //
4448 -+ switch (MI.getOperand(1).getImm()) {
4449 -+ case 0:
4450 -+ IJIndexBase = MFI->GetIJPerspectiveIndex();
4451 -+ break;
4452 -+ case 1:
4453 -+ IJIndexBase = MFI->GetIJLinearIndex();
4454 -+ break;
4455 -+ default:
4456 -+ assert(0 && "Unknow ij index");
4457 -+ }
4458 -+
4459 -+ for (unsigned i = 0; i < 8; i++) {
4460 -+ unsigned IJIndex = AMDGPU::R600_TReg32RegClass.getRegister(
4461 -+ 2 * IJIndexBase + ((i + 1) % 2));
4462 -+ unsigned ReadReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
4463 -+ MI.getOperand(2).getImm());
4464 -+
4465 -+
4466 -+ unsigned Sel = AMDGPU::sel_x;
4467 -+ switch (i % 4) {
4468 -+ case 0:Sel = AMDGPU::sel_x;break;
4469 -+ case 1:Sel = AMDGPU::sel_y;break;
4470 -+ case 2:Sel = AMDGPU::sel_z;break;
4471 -+ case 3:Sel = AMDGPU::sel_w;break;
4472 -+ default:break;
4473 -+ }
4474 -+
4475 -+ unsigned Res = TRI.getSubReg(DstReg, Sel);
4476 -+
4477 -+ unsigned Opcode = (i < 4)?AMDGPU::INTERP_ZW:AMDGPU::INTERP_XY;
4478 -+
4479 -+ MachineBasicBlock &MBB = *(MI.getParent());
4480 -+ MachineInstr *NewMI =
4481 -+ TII->buildDefaultInstruction(MBB, I, Opcode, Res, IJIndex, ReadReg);
4482 -+
4483 -+ if (!(i> 1 && i < 6)) {
4484 -+ TII->addFlag(NewMI, 0, MO_FLAG_MASK);
4485 -+ }
4486 -+
4487 -+ if (i % 4 != 3)
4488 -+ TII->addFlag(NewMI, 0, MO_FLAG_NOT_LAST);
4489 -+ }
4490 -+
4491 -+ MI.eraseFromParent();
4492 -+
4493 -+ return true;
4494 -+}
4495 -+
4496 -+bool R600ExpandSpecialInstrsPass::ExpandInputConstant(MachineInstr &MI) {
4497 -+ const R600RegisterInfo &TRI = TII->getRegisterInfo();
4498 -+ if (MI.getOpcode() != AMDGPU::input_constant)
4499 -+ return false;
4500 -+
4501 -+ MachineBasicBlock::iterator I = &MI;
4502 -+ unsigned DstReg = MI.getOperand(0).getReg();
4503 -+
4504 -+ for (unsigned i = 0; i < 4; i++) {
4505 -+ unsigned ReadReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
4506 -+ MI.getOperand(1).getImm());
4507 -+
4508 -+ unsigned Sel = AMDGPU::sel_x;
4509 -+ switch (i % 4) {
4510 -+ case 0:Sel = AMDGPU::sel_x;break;
4511 -+ case 1:Sel = AMDGPU::sel_y;break;
4512 -+ case 2:Sel = AMDGPU::sel_z;break;
4513 -+ case 3:Sel = AMDGPU::sel_w;break;
4514 -+ default:break;
4515 -+ }
4516 -+
4517 -+ unsigned Res = TRI.getSubReg(DstReg, Sel);
4518 -+
4519 -+ MachineBasicBlock &MBB = *(MI.getParent());
4520 -+ MachineInstr *NewMI = TII->buildDefaultInstruction(
4521 -+ MBB, I, AMDGPU::INTERP_LOAD_P0, Res, ReadReg);
4522 -+
4523 -+ if (i % 4 != 3)
4524 -+ TII->addFlag(NewMI, 0, MO_FLAG_NOT_LAST);
4525 -+ }
4526 -+
4527 -+ MI.eraseFromParent();
4528 -+
4529 -+ return true;
4530 -+}
4531 -+
4532 +bool R600ExpandSpecialInstrsPass::runOnMachineFunction(MachineFunction &MF) {
4533 +
4534 + const R600RegisterInfo &TRI = TII->getRegisterInfo();
4535 @@ -13422,7 +13726,7 @@ index 0000000..b6e62b7
4536 + MI.eraseFromParent();
4537 + continue;
4538 + }
4539 -+ case AMDGPU::BREAK:
4540 ++ case AMDGPU::BREAK: {
4541 + MachineInstr *PredSet = TII->buildDefaultInstruction(MBB, I,
4542 + AMDGPU::PRED_SETE_INT,
4543 + AMDGPU::PREDICATE_BIT,
4544 @@ -13436,12 +13740,81 @@ index 0000000..b6e62b7
4545 + .addReg(AMDGPU::PREDICATE_BIT);
4546 + MI.eraseFromParent();
4547 + continue;
4548 -+ }
4549 ++ }
4550 +
4551 -+ if (ExpandInputPerspective(MI))
4552 -+ continue;
4553 -+ if (ExpandInputConstant(MI))
4554 -+ continue;
4555 ++ case AMDGPU::INTERP_PAIR_XY: {
4556 ++ MachineInstr *BMI;
4557 ++ unsigned PReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
4558 ++ MI.getOperand(2).getImm());
4559 ++
4560 ++ for (unsigned Chan = 0; Chan < 4; ++Chan) {
4561 ++ unsigned DstReg;
4562 ++
4563 ++ if (Chan < 2)
4564 ++ DstReg = MI.getOperand(Chan).getReg();
4565 ++ else
4566 ++ DstReg = Chan == 2 ? AMDGPU::T0_Z : AMDGPU::T0_W;
4567 ++
4568 ++ BMI = TII->buildDefaultInstruction(MBB, I, AMDGPU::INTERP_XY,
4569 ++ DstReg, MI.getOperand(3 + (Chan % 2)).getReg(), PReg);
4570 ++
4571 ++ BMI->setIsInsideBundle(Chan > 0);
4572 ++ if (Chan >= 2)
4573 ++ TII->addFlag(BMI, 0, MO_FLAG_MASK);
4574 ++ if (Chan != 3)
4575 ++ TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST);
4576 ++ }
4577 ++
4578 ++ MI.eraseFromParent();
4579 ++ continue;
4580 ++ }
4581 ++
4582 ++ case AMDGPU::INTERP_PAIR_ZW: {
4583 ++ MachineInstr *BMI;
4584 ++ unsigned PReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
4585 ++ MI.getOperand(2).getImm());
4586 ++
4587 ++ for (unsigned Chan = 0; Chan < 4; ++Chan) {
4588 ++ unsigned DstReg;
4589 ++
4590 ++ if (Chan < 2)
4591 ++ DstReg = Chan == 0 ? AMDGPU::T0_X : AMDGPU::T0_Y;
4592 ++ else
4593 ++ DstReg = MI.getOperand(Chan-2).getReg();
4594 ++
4595 ++ BMI = TII->buildDefaultInstruction(MBB, I, AMDGPU::INTERP_ZW,
4596 ++ DstReg, MI.getOperand(3 + (Chan % 2)).getReg(), PReg);
4597 ++
4598 ++ BMI->setIsInsideBundle(Chan > 0);
4599 ++ if (Chan < 2)
4600 ++ TII->addFlag(BMI, 0, MO_FLAG_MASK);
4601 ++ if (Chan != 3)
4602 ++ TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST);
4603 ++ }
4604 ++
4605 ++ MI.eraseFromParent();
4606 ++ continue;
4607 ++ }
4608 ++
4609 ++ case AMDGPU::INTERP_VEC_LOAD: {
4610 ++ const R600RegisterInfo &TRI = TII->getRegisterInfo();
4611 ++ MachineInstr *BMI;
4612 ++ unsigned PReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
4613 ++ MI.getOperand(1).getImm());
4614 ++ unsigned DstReg = MI.getOperand(0).getReg();
4615 ++
4616 ++ for (unsigned Chan = 0; Chan < 4; ++Chan) {
4617 ++ BMI = TII->buildDefaultInstruction(MBB, I, AMDGPU::INTERP_LOAD_P0,
4618 ++ TRI.getSubReg(DstReg, TRI.getSubRegFromChannel(Chan)), PReg);
4619 ++ BMI->setIsInsideBundle(Chan > 0);
4620 ++ if (Chan != 3)
4621 ++ TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST);
4622 ++ }
4623 ++
4624 ++ MI.eraseFromParent();
4625 ++ continue;
4626 ++ }
4627 ++ }
4628 +
4629 + bool IsReduction = TII->isReductionOp(MI.getOpcode());
4630 + bool IsVector = TII->isVector(MI);
4631 @@ -13540,8 +13913,7 @@ index 0000000..b6e62b7
4632 + MachineInstr *NewMI =
4633 + TII->buildDefaultInstruction(MBB, I, Opcode, DstReg, Src0, Src1);
4634 +
4635 -+ if (Chan != 0)
4636 -+ NewMI->bundleWithPred();
4637 ++ NewMI->setIsInsideBundle(Chan != 0);
4638 + if (Mask) {
4639 + TII->addFlag(NewMI, 0, MO_FLAG_MASK);
4640 + }
4641 @@ -13556,10 +13928,10 @@ index 0000000..b6e62b7
4642 +}
4643 diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
4644 new file mode 100644
4645 -index 0000000..d6b9d90
4646 +index 0000000..9c38522
4647 --- /dev/null
4648 +++ b/lib/Target/R600/R600ISelLowering.cpp
4649 -@@ -0,0 +1,909 @@
4650 +@@ -0,0 +1,1195 @@
4651 +//===-- R600ISelLowering.cpp - R600 DAG Lowering Implementation -----------===//
4652 +//
4653 +// The LLVM Compiler Infrastructure
4654 @@ -13580,6 +13952,7 @@ index 0000000..d6b9d90
4655 +#include "R600MachineFunctionInfo.h"
4656 +#include "llvm/Argument.h"
4657 +#include "llvm/Function.h"
4658 ++#include "llvm/CodeGen/MachineFrameInfo.h"
4659 +#include "llvm/CodeGen/MachineInstrBuilder.h"
4660 +#include "llvm/CodeGen/MachineRegisterInfo.h"
4661 +#include "llvm/CodeGen/SelectionDAG.h"
4662 @@ -13633,10 +14006,27 @@ index 0000000..d6b9d90
4663 + setOperationAction(ISD::SELECT, MVT::i32, Custom);
4664 + setOperationAction(ISD::SELECT, MVT::f32, Custom);
4665 +
4666 ++ // Legalize loads and stores to the private address space.
4667 ++ setOperationAction(ISD::LOAD, MVT::i32, Custom);
4668 ++ setOperationAction(ISD::LOAD, MVT::v2i32, Custom);
4669 ++ setOperationAction(ISD::LOAD, MVT::v4i32, Custom);
4670 ++ setLoadExtAction(ISD::EXTLOAD, MVT::v4i8, Custom);
4671 ++ setLoadExtAction(ISD::EXTLOAD, MVT::i8, Custom);
4672 ++ setLoadExtAction(ISD::ZEXTLOAD, MVT::i8, Custom);
4673 ++ setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i8, Custom);
4674 ++ setOperationAction(ISD::STORE, MVT::i8, Custom);
4675 + setOperationAction(ISD::STORE, MVT::i32, Custom);
4676 ++ setOperationAction(ISD::STORE, MVT::v2i32, Custom);
4677 + setOperationAction(ISD::STORE, MVT::v4i32, Custom);
4678 +
4679 ++ setOperationAction(ISD::LOAD, MVT::i32, Custom);
4680 ++ setOperationAction(ISD::LOAD, MVT::v4i32, Custom);
4681 ++ setOperationAction(ISD::FrameIndex, MVT::i32, Custom);
4682 ++
4683 + setTargetDAGCombine(ISD::FP_ROUND);
4684 ++ setTargetDAGCombine(ISD::FP_TO_SINT);
4685 ++ setTargetDAGCombine(ISD::EXTRACT_VECTOR_ELT);
4686 ++ setTargetDAGCombine(ISD::SELECT_CC);
4687 +
4688 + setSchedulingPreference(Sched::VLIW);
4689 +}
4690 @@ -13677,15 +14067,6 @@ index 0000000..d6b9d90
4691 + break;
4692 + }
4693 +
4694 -+ case AMDGPU::R600_LOAD_CONST: {
4695 -+ int64_t RegIndex = MI->getOperand(1).getImm();
4696 -+ unsigned ConstantReg = AMDGPU::R600_CReg32RegClass.getRegister(RegIndex);
4697 -+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::COPY))
4698 -+ .addOperand(MI->getOperand(0))
4699 -+ .addReg(ConstantReg);
4700 -+ break;
4701 -+ }
4702 -+
4703 + case AMDGPU::MASK_WRITE: {
4704 + unsigned maskedRegister = MI->getOperand(0).getReg();
4705 + assert(TargetRegisterInfo::isVirtualRegister(maskedRegister));
4706 @@ -13716,18 +14097,6 @@ index 0000000..d6b9d90
4707 + break;
4708 + }
4709 +
4710 -+ case AMDGPU::RESERVE_REG: {
4711 -+ R600MachineFunctionInfo * MFI = MF->getInfo<R600MachineFunctionInfo>();
4712 -+ int64_t ReservedIndex = MI->getOperand(0).getImm();
4713 -+ unsigned ReservedReg =
4714 -+ AMDGPU::R600_TReg32RegClass.getRegister(ReservedIndex);
4715 -+ MFI->ReservedRegs.push_back(ReservedReg);
4716 -+ unsigned SuperReg =
4717 -+ AMDGPU::R600_Reg128RegClass.getRegister(ReservedIndex / 4);
4718 -+ MFI->ReservedRegs.push_back(SuperReg);
4719 -+ break;
4720 -+ }
4721 -+
4722 + case AMDGPU::TXD: {
4723 + unsigned T0 = MRI.createVirtualRegister(&AMDGPU::R600_Reg128RegClass);
4724 + unsigned T1 = MRI.createVirtualRegister(&AMDGPU::R600_Reg128RegClass);
4725 @@ -13812,33 +14181,26 @@ index 0000000..d6b9d90
4726 + break;
4727 + }
4728 +
4729 -+ case AMDGPU::input_perspective: {
4730 -+ R600MachineFunctionInfo *MFI = MF->getInfo<R600MachineFunctionInfo>();
4731 -+
4732 -+ // XXX Be more fine about register reservation
4733 -+ for (unsigned i = 0; i < 4; i ++) {
4734 -+ unsigned ReservedReg = AMDGPU::R600_TReg32RegClass.getRegister(i);
4735 -+ MFI->ReservedRegs.push_back(ReservedReg);
4736 -+ }
4737 -+
4738 -+ switch (MI->getOperand(1).getImm()) {
4739 -+ case 0:// Perspective
4740 -+ MFI->HasPerspectiveInterpolation = true;
4741 -+ break;
4742 -+ case 1:// Linear
4743 -+ MFI->HasLinearInterpolation = true;
4744 -+ break;
4745 -+ default:
4746 -+ assert(0 && "Unknow ij index");
4747 -+ }
4748 -+
4749 -+ return BB;
4750 -+ }
4751 -+
4752 + case AMDGPU::EG_ExportSwz:
4753 + case AMDGPU::R600_ExportSwz: {
4754 ++    // The instruction is left unmodified if it's not the last one of its type.
4755 ++ bool isLastInstructionOfItsType = true;
4756 ++ unsigned InstExportType = MI->getOperand(1).getImm();
4757 ++ for (MachineBasicBlock::iterator NextExportInst = llvm::next(I),
4758 ++ EndBlock = BB->end(); NextExportInst != EndBlock;
4759 ++ NextExportInst = llvm::next(NextExportInst)) {
4760 ++ if (NextExportInst->getOpcode() == AMDGPU::EG_ExportSwz ||
4761 ++ NextExportInst->getOpcode() == AMDGPU::R600_ExportSwz) {
4762 ++ unsigned CurrentInstExportType = NextExportInst->getOperand(1)
4763 ++ .getImm();
4764 ++ if (CurrentInstExportType == InstExportType) {
4765 ++ isLastInstructionOfItsType = false;
4766 ++ break;
4767 ++ }
4768 ++ }
4769 ++ }
4770 + bool EOP = (llvm::next(I)->getOpcode() == AMDGPU::RETURN)? 1 : 0;
4771 -+ if (!EOP)
4772 ++ if (!EOP && !isLastInstructionOfItsType)
4773 + return BB;
4774 + unsigned CfInst = (MI->getOpcode() == AMDGPU::EG_ExportSwz)? 84 : 40;
4775 + BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI->getOpcode()))
4776 @@ -13850,7 +14212,7 @@ index 0000000..d6b9d90
4777 + .addOperand(MI->getOperand(5))
4778 + .addOperand(MI->getOperand(6))
4779 + .addImm(CfInst)
4780 -+ .addImm(1);
4781 ++ .addImm(EOP);
4782 + break;
4783 + }
4784 + }
4785 @@ -13926,7 +14288,9 @@ index 0000000..d6b9d90
4786 + case ISD::SELECT: return LowerSELECT(Op, DAG);
4787 + case ISD::SETCC: return LowerSETCC(Op, DAG);
4788 + case ISD::STORE: return LowerSTORE(Op, DAG);
4789 ++ case ISD::LOAD: return LowerLOAD(Op, DAG);
4790 + case ISD::FPOW: return LowerFPOW(Op, DAG);
4791 ++ case ISD::FrameIndex: return LowerFrameIndex(Op, DAG);
4792 + case ISD::INTRINSIC_VOID: {
4793 + SDValue Chain = Op.getOperand(0);
4794 + unsigned IntrinsicID =
4795 @@ -13953,39 +14317,7 @@ index 0000000..d6b9d90
4796 + Chain);
4797 +
4798 + }
4799 -+ case AMDGPUIntrinsic::R600_store_stream_output : {
4800 -+ MachineFunction &MF = DAG.getMachineFunction();
4801 -+ R600MachineFunctionInfo *MFI = MF.getInfo<R600MachineFunctionInfo>();
4802 -+ int64_t RegIndex = cast<ConstantSDNode>(Op.getOperand(3))->getZExtValue();
4803 -+ int64_t BufIndex = cast<ConstantSDNode>(Op.getOperand(4))->getZExtValue();
4804 -+
4805 -+ SDNode **OutputsMap = MFI->StreamOutputs[BufIndex];
4806 -+ unsigned Inst;
4807 -+ switch (cast<ConstantSDNode>(Op.getOperand(4))->getZExtValue() ) {
4808 -+ // STREAM3
4809 -+ case 3:
4810 -+ Inst = 4;
4811 -+ break;
4812 -+ // STREAM2
4813 -+ case 2:
4814 -+ Inst = 3;
4815 -+ break;
4816 -+ // STREAM1
4817 -+ case 1:
4818 -+ Inst = 2;
4819 -+ break;
4820 -+ // STREAM0
4821 -+ case 0:
4822 -+ Inst = 1;
4823 -+ break;
4824 -+ default:
4825 -+ assert(0 && "Wrong buffer id for stream outputs !");
4826 -+ }
4827 +
4828 -+ return InsertScalarToRegisterExport(DAG, Op.getDebugLoc(), OutputsMap,
4829 -+ RegIndex / 4, RegIndex % 4, Inst, 0, Op.getOperand(2),
4830 -+ Chain);
4831 -+ }
4832 + // default for switch(IntrinsicID)
4833 + default: break;
4834 + }
4835 @@ -14004,38 +14336,35 @@ index 0000000..d6b9d90
4836 + unsigned Reg = AMDGPU::R600_TReg32RegClass.getRegister(RegIndex);
4837 + return CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass, Reg, VT);
4838 + }
4839 -+ case AMDGPUIntrinsic::R600_load_input_perspective: {
4840 -+ int slot = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
4841 -+ if (slot < 0)
4842 -+ return DAG.getUNDEF(MVT::f32);
4843 -+ SDValue FullVector = DAG.getNode(
4844 -+ AMDGPUISD::INTERP,
4845 -+ DL, MVT::v4f32,
4846 -+ DAG.getConstant(0, MVT::i32), DAG.getConstant(slot / 4 , MVT::i32));
4847 -+ return DAG.getNode(ISD::EXTRACT_VECTOR_ELT,
4848 -+ DL, VT, FullVector, DAG.getConstant(slot % 4, MVT::i32));
4849 -+ }
4850 -+ case AMDGPUIntrinsic::R600_load_input_linear: {
4851 -+ int slot = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
4852 -+ if (slot < 0)
4853 -+ return DAG.getUNDEF(MVT::f32);
4854 -+ SDValue FullVector = DAG.getNode(
4855 -+ AMDGPUISD::INTERP,
4856 -+ DL, MVT::v4f32,
4857 -+ DAG.getConstant(1, MVT::i32), DAG.getConstant(slot / 4 , MVT::i32));
4858 -+ return DAG.getNode(ISD::EXTRACT_VECTOR_ELT,
4859 -+ DL, VT, FullVector, DAG.getConstant(slot % 4, MVT::i32));
4860 -+ }
4861 -+ case AMDGPUIntrinsic::R600_load_input_constant: {
4862 ++
4863 ++ case AMDGPUIntrinsic::R600_interp_input: {
4864 + int slot = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
4865 -+ if (slot < 0)
4866 -+ return DAG.getUNDEF(MVT::f32);
4867 -+ SDValue FullVector = DAG.getNode(
4868 -+ AMDGPUISD::INTERP_P0,
4869 -+ DL, MVT::v4f32,
4870 -+ DAG.getConstant(slot / 4 , MVT::i32));
4871 -+ return DAG.getNode(ISD::EXTRACT_VECTOR_ELT,
4872 -+ DL, VT, FullVector, DAG.getConstant(slot % 4, MVT::i32));
4873 ++ int ijb = cast<ConstantSDNode>(Op.getOperand(2))->getSExtValue();
4874 ++ MachineSDNode *interp;
4875 ++ if (ijb < 0) {
4876 ++ interp = DAG.getMachineNode(AMDGPU::INTERP_VEC_LOAD, DL,
4877 ++ MVT::v4f32, DAG.getTargetConstant(slot / 4 , MVT::i32));
4878 ++ return DAG.getTargetExtractSubreg(
4879 ++ TII->getRegisterInfo().getSubRegFromChannel(slot % 4),
4880 ++ DL, MVT::f32, SDValue(interp, 0));
4881 ++ }
4882 ++
4883 ++ if (slot % 4 < 2)
4884 ++ interp = DAG.getMachineNode(AMDGPU::INTERP_PAIR_XY, DL,
4885 ++ MVT::f32, MVT::f32, DAG.getTargetConstant(slot / 4 , MVT::i32),
4886 ++ CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
4887 ++ AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb + 1), MVT::f32),
4888 ++ CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
4889 ++ AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb), MVT::f32));
4890 ++ else
4891 ++ interp = DAG.getMachineNode(AMDGPU::INTERP_PAIR_ZW, DL,
4892 ++ MVT::f32, MVT::f32, DAG.getTargetConstant(slot / 4 , MVT::i32),
4893 ++ CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
4894 ++ AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb + 1), MVT::f32),
4895 ++ CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
4896 ++ AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb), MVT::f32));
4897 ++
4898 ++ return SDValue(interp, slot % 2);
4899 + }
4900 +
4901 + case r600_read_ngroups_x:
4902 @@ -14089,6 +14418,20 @@ index 0000000..d6b9d90
4903 + switch (N->getOpcode()) {
4904 + default: return;
4905 + case ISD::FP_TO_UINT: Results.push_back(LowerFPTOUINT(N->getOperand(0), DAG));
4906 ++ return;
4907 ++ case ISD::LOAD: {
4908 ++ SDNode *Node = LowerLOAD(SDValue(N, 0), DAG).getNode();
4909 ++ Results.push_back(SDValue(Node, 0));
4910 ++ Results.push_back(SDValue(Node, 1));
4911 ++    // XXX: LLVM does not seem to replace the Chain value inside the
4912 ++    // CustomWidenLowerNode function.
4913 ++ DAG.ReplaceAllUsesOfValueWith(SDValue(N,1), SDValue(Node, 1));
4914 ++ return;
4915 ++ }
4916 ++ case ISD::STORE:
4917 ++ SDNode *Node = LowerSTORE(SDValue(N, 0), DAG).getNode();
4918 ++ Results.push_back(SDValue(Node, 0));
4919 ++ return;
4920 + }
4921 +}
4922 +
4923 @@ -14156,6 +14499,20 @@ index 0000000..d6b9d90
4924 + false, false, false, 0);
4925 +}
4926 +
4927 ++SDValue R600TargetLowering::LowerFrameIndex(SDValue Op, SelectionDAG &DAG) const {
4928 ++
4929 ++ MachineFunction &MF = DAG.getMachineFunction();
4930 ++ const AMDGPUFrameLowering *TFL =
4931 ++ static_cast<const AMDGPUFrameLowering*>(getTargetMachine().getFrameLowering());
4932 ++
4933 ++ FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(Op);
4934 ++ assert(FIN);
4935 ++
4936 ++ unsigned FrameIndex = FIN->getIndex();
4937 ++ unsigned Offset = TFL->getFrameIndexOffset(MF, FrameIndex);
4938 ++ return DAG.getConstant(Offset * 4 * TFL->getStackWidth(MF), MVT::i32);
4939 ++}
4940 ++
4941 +SDValue R600TargetLowering::LowerROTL(SDValue Op, SelectionDAG &DAG) const {
4942 + DebugLoc DL = Op.getDebugLoc();
4943 + EVT VT = Op.getValueType();
4944 @@ -14242,9 +14599,12 @@ index 0000000..d6b9d90
4945 + }
4946 +
4947 + // Try to lower to a SET* instruction:
4948 -+ // We need all the operands of SELECT_CC to have the same value type, so if
4949 -+ // necessary we need to change True and False to be the same type as LHS and
4950 -+ // RHS, and then convert the result of the select_cc back to the correct type.
4951 ++ //
4952 ++ // CompareVT == MVT::f32 and VT == MVT::i32 is supported by the hardware,
4953 ++ // but for the other case where CompareVT != VT, all operands of
4954 ++ // SELECT_CC need to have the same value type, so we need to change True and
4955 ++ // False to be the same type as LHS and RHS, and then convert the result of
4956 ++ // the select_cc back to the correct type.
4957 +
4958 + // Move hardware True/False values to the correct operand.
4959 + if (isHWTrueValue(False) && isHWFalseValue(True)) {
4960 @@ -14254,32 +14614,17 @@ index 0000000..d6b9d90
4961 + }
4962 +
4963 + if (isHWTrueValue(True) && isHWFalseValue(False)) {
4964 -+ if (CompareVT != VT) {
4965 -+ if (VT == MVT::f32 && CompareVT == MVT::i32) {
4966 -+ SDValue Boolean = DAG.getNode(ISD::SELECT_CC, DL, CompareVT,
4967 -+ LHS, RHS,
4968 -+ DAG.getConstant(-1, MVT::i32),
4969 -+ DAG.getConstant(0, MVT::i32),
4970 -+ CC);
4971 -+ // Convert integer values of true (-1) and false (0) to fp values of
4972 -+ // true (1.0f) and false (0.0f).
4973 -+ SDValue LSB = DAG.getNode(ISD::AND, DL, MVT::i32, Boolean,
4974 -+ DAG.getConstant(1, MVT::i32));
4975 -+ return DAG.getNode(ISD::UINT_TO_FP, DL, VT, LSB);
4976 -+ } else if (VT == MVT::i32 && CompareVT == MVT::f32) {
4977 -+ SDValue BoolAsFlt = DAG.getNode(ISD::SELECT_CC, DL, CompareVT,
4978 -+ LHS, RHS,
4979 -+ DAG.getConstantFP(1.0f, MVT::f32),
4980 -+ DAG.getConstantFP(0.0f, MVT::f32),
4981 -+ CC);
4982 -+ // Convert fp values of true (1.0f) and false (0.0f) to integer values
4983 -+ // of true (-1) and false (0).
4984 -+ SDValue Neg = DAG.getNode(ISD::FNEG, DL, MVT::f32, BoolAsFlt);
4985 -+ return DAG.getNode(ISD::FP_TO_SINT, DL, VT, Neg);
4986 -+ } else {
4987 -+ // I don't think there will be any other type pairings.
4988 -+ assert(!"Unhandled operand type parings in SELECT_CC");
4989 -+ }
4990 ++ if (CompareVT != VT && VT == MVT::f32 && CompareVT == MVT::i32) {
4991 ++ SDValue Boolean = DAG.getNode(ISD::SELECT_CC, DL, CompareVT,
4992 ++ LHS, RHS,
4993 ++ DAG.getConstant(-1, MVT::i32),
4994 ++ DAG.getConstant(0, MVT::i32),
4995 ++ CC);
4996 ++ // Convert integer values of true (-1) and false (0) to fp values of
4997 ++ // true (1.0f) and false (0.0f).
4998 ++ SDValue LSB = DAG.getNode(ISD::AND, DL, MVT::i32, Boolean,
4999 ++ DAG.getConstant(1, MVT::i32));
5000 ++ return DAG.getNode(ISD::UINT_TO_FP, DL, VT, LSB);
5001 + } else {
5002 + // This SELECT_CC is already legal.
5003 + return DAG.getNode(ISD::SELECT_CC, DL, VT, LHS, RHS, True, False, CC);
5004 @@ -14370,6 +14715,61 @@ index 0000000..d6b9d90
5005 + return Cond;
5006 +}
5007 +
5008 ++/// LLVM generates byte-addressed pointers. For indirect addressing, we need to
5009 ++/// convert these pointers to a register index. Each register holds
5010 ++/// 16 bytes (4 x 32-bit sub-registers), but we need to take into account the
5011 ++/// \p StackWidth, which tells us how many of the 4 sub-registers will be used
5012 ++/// for indirect addressing.
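++/// For example, with \p StackWidth 1 each register index spans 4 bytes, so a
++/// byte pointer of 24 becomes register index 24 >> 2 = 6; widths 2 and 4
++/// shift by 3 and 4 respectively.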
5013 ++SDValue R600TargetLowering::stackPtrToRegIndex(SDValue Ptr,
5014 ++ unsigned StackWidth,
5015 ++ SelectionDAG &DAG) const {
5016 ++ unsigned SRLPad;
5017 ++ switch(StackWidth) {
5018 ++ case 1:
5019 ++ SRLPad = 2;
5020 ++ break;
5021 ++ case 2:
5022 ++ SRLPad = 3;
5023 ++ break;
5024 ++ case 4:
5025 ++ SRLPad = 4;
5026 ++ break;
5027 ++ default: llvm_unreachable("Invalid stack width");
5028 ++ }
5029 ++
5030 ++ return DAG.getNode(ISD::SRL, Ptr.getDebugLoc(), Ptr.getValueType(), Ptr,
5031 ++ DAG.getConstant(SRLPad, MVT::i32));
5032 ++}
5033 ++
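++// Maps a vector element index to a channel and a pointer increment. For
++// example, with StackWidth 2, element 2 wraps around to channel 0 of the
++// next register (PtrIncr = 1).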
5034 ++void R600TargetLowering::getStackAddress(unsigned StackWidth,
5035 ++ unsigned ElemIdx,
5036 ++ unsigned &Channel,
5037 ++ unsigned &PtrIncr) const {
5038 ++ switch (StackWidth) {
5039 ++ default:
5040 ++ case 1:
5041 ++ Channel = 0;
5042 ++ if (ElemIdx > 0) {
5043 ++ PtrIncr = 1;
5044 ++ } else {
5045 ++ PtrIncr = 0;
5046 ++ }
5047 ++ break;
5048 ++ case 2:
5049 ++ Channel = ElemIdx % 2;
5050 ++ if (ElemIdx == 2) {
5051 ++ PtrIncr = 1;
5052 ++ } else {
5053 ++ PtrIncr = 0;
5054 ++ }
5055 ++ break;
5056 ++ case 4:
5057 ++ Channel = ElemIdx;
5058 ++ PtrIncr = 0;
5059 ++ break;
5060 ++ }
5061 ++}
5062 ++
5063 +SDValue R600TargetLowering::LowerSTORE(SDValue Op, SelectionDAG &DAG) const {
5064 + DebugLoc DL = Op.getDebugLoc();
5065 + StoreSDNode *StoreNode = cast<StoreSDNode>(Op);
5066 @@ -14391,23 +14791,202 @@ index 0000000..d6b9d90
5067 + }
5068 + return Chain;
5069 + }
5070 -+ return SDValue();
5071 -+}
5072 +
5073 ++ EVT ValueVT = Value.getValueType();
5074 +
5075 -+SDValue R600TargetLowering::LowerFPOW(SDValue Op,
5076 -+ SelectionDAG &DAG) const {
5077 -+ DebugLoc DL = Op.getDebugLoc();
5078 -+ EVT VT = Op.getValueType();
5079 -+ SDValue LogBase = DAG.getNode(ISD::FLOG2, DL, VT, Op.getOperand(0));
5080 -+ SDValue MulLogBase = DAG.getNode(ISD::FMUL, DL, VT, Op.getOperand(1), LogBase);
5081 -+ return DAG.getNode(ISD::FEXP2, DL, VT, MulLogBase);
5082 ++ if (StoreNode->getAddressSpace() != AMDGPUAS::PRIVATE_ADDRESS) {
5083 ++ return SDValue();
5084 ++ }
5085 ++
5086 ++ // Lowering for indirect addressing
5087 ++
5088 ++ const MachineFunction &MF = DAG.getMachineFunction();
5089 ++ const AMDGPUFrameLowering *TFL = static_cast<const AMDGPUFrameLowering*>(
5090 ++ getTargetMachine().getFrameLowering());
5091 ++ unsigned StackWidth = TFL->getStackWidth(MF);
5092 ++
5093 ++ Ptr = stackPtrToRegIndex(Ptr, StackWidth, DAG);
5094 ++
5095 ++ if (ValueVT.isVector()) {
5096 ++ unsigned NumElemVT = ValueVT.getVectorNumElements();
5097 ++ EVT ElemVT = ValueVT.getVectorElementType();
5098 ++ SDValue Stores[4];
5099 ++
5100 ++ assert(NumElemVT >= StackWidth && "Stack width cannot be greater than "
5101 ++                                      "vector width in store");
5102 ++
5103 ++ for (unsigned i = 0; i < NumElemVT; ++i) {
5104 ++ unsigned Channel, PtrIncr;
5105 ++ getStackAddress(StackWidth, i, Channel, PtrIncr);
5106 ++ Ptr = DAG.getNode(ISD::ADD, DL, MVT::i32, Ptr,
5107 ++ DAG.getConstant(PtrIncr, MVT::i32));
5108 ++ SDValue Elem = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ElemVT,
5109 ++ Value, DAG.getConstant(i, MVT::i32));
5110 ++
5111 ++ Stores[i] = DAG.getNode(AMDGPUISD::REGISTER_STORE, DL, MVT::Other,
5112 ++ Chain, Elem, Ptr,
5113 ++ DAG.getTargetConstant(Channel, MVT::i32));
5114 ++ }
5115 ++ Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Stores, NumElemVT);
5116 ++ } else {
5117 ++ if (ValueVT == MVT::i8) {
5118 ++ Value = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, Value);
5119 ++ }
5120 ++ Chain = DAG.getNode(AMDGPUISD::REGISTER_STORE, DL, MVT::Other, Chain, Value, Ptr,
5121 ++ DAG.getTargetConstant(0, MVT::i32)); // Channel
5122 ++ }
5123 ++
5124 ++ return Chain;
5125 +}
5126 +
5127 -+/// XXX Only kernel functions are supported, so we can assume for now that
5128 -+/// every function is a kernel function, but in the future we should use
5129 -+/// separate calling conventions for kernel and non-kernel functions.
5130 -+SDValue R600TargetLowering::LowerFormalArguments(
5131 ++// Returns the constant buffer base position: 512 + (kc_bank << 12).
5132 ++static int
5133 ++ConstantAddressBlock(unsigned AddressSpace) {
5134 ++ switch (AddressSpace) {
5135 ++ case AMDGPUAS::CONSTANT_BUFFER_0:
5136 ++ return 512;
5137 ++ case AMDGPUAS::CONSTANT_BUFFER_1:
5138 ++ return 512 + 4096;
5139 ++ case AMDGPUAS::CONSTANT_BUFFER_2:
5140 ++ return 512 + 4096 * 2;
5141 ++ case AMDGPUAS::CONSTANT_BUFFER_3:
5142 ++ return 512 + 4096 * 3;
5143 ++ case AMDGPUAS::CONSTANT_BUFFER_4:
5144 ++ return 512 + 4096 * 4;
5145 ++ case AMDGPUAS::CONSTANT_BUFFER_5:
5146 ++ return 512 + 4096 * 5;
5147 ++ case AMDGPUAS::CONSTANT_BUFFER_6:
5148 ++ return 512 + 4096 * 6;
5149 ++ case AMDGPUAS::CONSTANT_BUFFER_7:
5150 ++ return 512 + 4096 * 7;
5151 ++ case AMDGPUAS::CONSTANT_BUFFER_8:
5152 ++ return 512 + 4096 * 8;
5153 ++ case AMDGPUAS::CONSTANT_BUFFER_9:
5154 ++ return 512 + 4096 * 9;
5155 ++ case AMDGPUAS::CONSTANT_BUFFER_10:
5156 ++ return 512 + 4096 * 10;
5157 ++ case AMDGPUAS::CONSTANT_BUFFER_11:
5158 ++ return 512 + 4096 * 11;
5159 ++ case AMDGPUAS::CONSTANT_BUFFER_12:
5160 ++ return 512 + 4096 * 12;
5161 ++ case AMDGPUAS::CONSTANT_BUFFER_13:
5162 ++ return 512 + 4096 * 13;
5163 ++ case AMDGPUAS::CONSTANT_BUFFER_14:
5164 ++ return 512 + 4096 * 14;
5165 ++ case AMDGPUAS::CONSTANT_BUFFER_15:
5166 ++ return 512 + 4096 * 15;
5167 ++ default:
5168 ++ return -1;
5169 ++ }
5170 ++}
5171 ++
5172 ++SDValue R600TargetLowering::LowerLOAD(SDValue Op, SelectionDAG &DAG) const
5173 ++{
5174 ++ EVT VT = Op.getValueType();
5175 ++ DebugLoc DL = Op.getDebugLoc();
5176 ++ LoadSDNode *LoadNode = cast<LoadSDNode>(Op);
5177 ++ SDValue Chain = Op.getOperand(0);
5178 ++ SDValue Ptr = Op.getOperand(1);
5179 ++ SDValue LoweredLoad;
5180 ++
5181 ++ int ConstantBlock = ConstantAddressBlock(LoadNode->getAddressSpace());
5182 ++ if (ConstantBlock > -1) {
5183 ++ SDValue Result;
5184 ++ if (dyn_cast<ConstantExpr>(LoadNode->getSrcValue()) ||
5185 ++ dyn_cast<Constant>(LoadNode->getSrcValue())) {
5186 ++ SDValue Slots[4];
5187 ++ for (unsigned i = 0; i < 4; i++) {
5188 ++ // We want Const position encoded with the following formula :
5189 ++ // (((512 + (kc_bank << 12) + const_index) << 2) + chan)
5190 ++ // const_index is Ptr computed by llvm using an alignment of 16.
5191 ++        // Thus we add ((512 + (kc_bank << 12)) + chan) * 4 here and
5192 ++ // then div by 4 at the ISel step
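++        // e.g. (illustrative values) kc_bank 0, const_index 1, chan 2:
++        // ((512 + 1) << 2) + 2 = 2054.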
5193 ++ SDValue NewPtr = DAG.getNode(ISD::ADD, DL, Ptr.getValueType(), Ptr,
5194 ++ DAG.getConstant(4 * i + ConstantBlock * 16, MVT::i32));
5195 ++ Slots[i] = DAG.getNode(AMDGPUISD::CONST_ADDRESS, DL, MVT::i32, NewPtr);
5196 ++ }
5197 ++ Result = DAG.getNode(ISD::BUILD_VECTOR, DL, MVT::v4i32, Slots, 4);
5198 ++ } else {
5199 ++      // A non-constant Ptr can't be folded; keep it as a v4f32 load.
5200 ++ Result = DAG.getNode(AMDGPUISD::CONST_ADDRESS, DL, MVT::v4i32,
5201 ++ DAG.getNode(ISD::SRL, DL, MVT::i32, Ptr, DAG.getConstant(4, MVT::i32))
5202 ++ );
5203 ++ }
5204 ++
5205 ++ if (!VT.isVector()) {
5206 ++ Result = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i32, Result,
5207 ++ DAG.getConstant(0, MVT::i32));
5208 ++ }
5209 ++
5210 ++ SDValue MergedValues[2] = {
5211 ++ Result,
5212 ++ Chain
5213 ++ };
5214 ++ return DAG.getMergeValues(MergedValues, 2, DL);
5215 ++ }
5216 ++
5217 ++ if (LoadNode->getAddressSpace() != AMDGPUAS::PRIVATE_ADDRESS) {
5218 ++ return SDValue();
5219 ++ }
5220 ++
5221 ++ // Lowering for indirect addressing
5222 ++ const MachineFunction &MF = DAG.getMachineFunction();
5223 ++ const AMDGPUFrameLowering *TFL = static_cast<const AMDGPUFrameLowering*>(
5224 ++ getTargetMachine().getFrameLowering());
5225 ++ unsigned StackWidth = TFL->getStackWidth(MF);
5226 ++
5227 ++ Ptr = stackPtrToRegIndex(Ptr, StackWidth, DAG);
5228 ++
5229 ++ if (VT.isVector()) {
5230 ++ unsigned NumElemVT = VT.getVectorNumElements();
5231 ++ EVT ElemVT = VT.getVectorElementType();
5232 ++ SDValue Loads[4];
5233 ++
5234 ++ assert(NumElemVT >= StackWidth && "Stack width cannot be greater than "
5235 ++ "vector width in load");
5236 ++
5237 ++ for (unsigned i = 0; i < NumElemVT; ++i) {
5238 ++ unsigned Channel, PtrIncr;
5239 ++ getStackAddress(StackWidth, i, Channel, PtrIncr);
5240 ++ Ptr = DAG.getNode(ISD::ADD, DL, MVT::i32, Ptr,
5241 ++ DAG.getConstant(PtrIncr, MVT::i32));
5242 ++ Loads[i] = DAG.getNode(AMDGPUISD::REGISTER_LOAD, DL, ElemVT,
5243 ++ Chain, Ptr,
5244 ++ DAG.getTargetConstant(Channel, MVT::i32),
5245 ++ Op.getOperand(2));
5246 ++ }
5247 ++ for (unsigned i = NumElemVT; i < 4; ++i) {
5248 ++ Loads[i] = DAG.getUNDEF(ElemVT);
5249 ++ }
5250 ++ EVT TargetVT = EVT::getVectorVT(*DAG.getContext(), ElemVT, 4);
5251 ++ LoweredLoad = DAG.getNode(ISD::BUILD_VECTOR, DL, TargetVT, Loads, 4);
5252 ++ } else {
5253 ++ LoweredLoad = DAG.getNode(AMDGPUISD::REGISTER_LOAD, DL, VT,
5254 ++ Chain, Ptr,
5255 ++ DAG.getTargetConstant(0, MVT::i32), // Channel
5256 ++ Op.getOperand(2));
5257 ++ }
5258 ++
5259 ++ SDValue Ops[2];
5260 ++ Ops[0] = LoweredLoad;
5261 ++ Ops[1] = Chain;
5262 ++
5263 ++ return DAG.getMergeValues(Ops, 2, DL);
5264 ++}
5265 ++
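++// pow is expanded using the identity x^y = 2^(y * log2 x); e.g. pow(8.0, 2.0)
++// becomes exp2(2.0 * 3.0) = 64.0.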
5266 ++SDValue R600TargetLowering::LowerFPOW(SDValue Op,
5267 ++ SelectionDAG &DAG) const {
5268 ++ DebugLoc DL = Op.getDebugLoc();
5269 ++ EVT VT = Op.getValueType();
5270 ++ SDValue LogBase = DAG.getNode(ISD::FLOG2, DL, VT, Op.getOperand(0));
5271 ++ SDValue MulLogBase = DAG.getNode(ISD::FMUL, DL, VT, Op.getOperand(1), LogBase);
5272 ++ return DAG.getNode(ISD::FEXP2, DL, VT, MulLogBase);
5273 ++}
5274 ++
5275 ++/// XXX Only kernel functions are supported, so we can assume for now that
5276 ++/// every function is a kernel function, but in the future we should use
5277 ++/// separate calling conventions for kernel and non-kernel functions.
5278 ++SDValue R600TargetLowering::LowerFormalArguments(
5279 + SDValue Chain,
5280 + CallingConv::ID CallConv,
5281 + bool isVarArg,
5282 @@ -14435,7 +15014,7 @@ index 0000000..d6b9d90
5283 + AMDGPUAS::PARAM_I_ADDRESS);
5284 + SDValue Arg = DAG.getExtLoad(ISD::ZEXTLOAD, DL, VT, DAG.getRoot(),
5285 + DAG.getConstant(ParamOffsetBytes, MVT::i32),
5286 -+ MachinePointerInfo(new Argument(PtrTy)),
5287 ++ MachinePointerInfo(UndefValue::get(PtrTy)),
5288 + ArgVT, false, false, ArgBytes);
5289 + InVals.push_back(Arg);
5290 + ParamOffsetBytes += ArgBytes;
5291 @@ -14466,15 +15045,94 @@ index 0000000..d6b9d90
5292 + }
5293 + break;
5294 + }
5295 ++
5296 ++ // (i32 fp_to_sint (fneg (select_cc f32, f32, 1.0, 0.0 cc))) ->
5297 ++ // (i32 select_cc f32, f32, -1, 0 cc)
5298 ++ //
5299 ++ // Mesa's GLSL frontend generates the above pattern a lot and we can lower
5300 ++ // this to one of the SET*_DX10 instructions.
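++  // The fold is exact: fneg turns the select's 1.0f/0.0f results into -1.0f
++  // and -0.0f, which fp_to_sint maps to -1 and 0, the SET*_DX10 outputs.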
5301 ++ case ISD::FP_TO_SINT: {
5302 ++ SDValue FNeg = N->getOperand(0);
5303 ++ if (FNeg.getOpcode() != ISD::FNEG) {
5304 ++ return SDValue();
5305 ++ }
5306 ++ SDValue SelectCC = FNeg.getOperand(0);
5307 ++ if (SelectCC.getOpcode() != ISD::SELECT_CC ||
5308 ++ SelectCC.getOperand(0).getValueType() != MVT::f32 || // LHS
5309 ++ SelectCC.getOperand(2).getValueType() != MVT::f32 || // True
5310 ++ !isHWTrueValue(SelectCC.getOperand(2)) ||
5311 ++ !isHWFalseValue(SelectCC.getOperand(3))) {
5312 ++ return SDValue();
5313 ++ }
5314 ++
5315 ++ return DAG.getNode(ISD::SELECT_CC, N->getDebugLoc(), N->getValueType(0),
5316 ++ SelectCC.getOperand(0), // LHS
5317 ++ SelectCC.getOperand(1), // RHS
5318 ++ DAG.getConstant(-1, MVT::i32), // True
5319 ++                           DAG.getConstant(0, MVT::i32), // False
5320 ++ SelectCC.getOperand(4)); // CC
5321 ++
5322 ++ break;
5323 ++ }
5324 ++  // Extract_vec (Build_vector) generated by custom lowering
5325 ++  // also needs a custom combine
5326 ++ case ISD::EXTRACT_VECTOR_ELT: {
5327 ++ SDValue Arg = N->getOperand(0);
5328 ++ if (Arg.getOpcode() == ISD::BUILD_VECTOR) {
5329 ++ if (ConstantSDNode *Const = dyn_cast<ConstantSDNode>(N->getOperand(1))) {
5330 ++ unsigned Element = Const->getZExtValue();
5331 ++ return Arg->getOperand(Element);
5332 ++ }
5333 ++ }
5334 ++ if (Arg.getOpcode() == ISD::BITCAST &&
5335 ++ Arg.getOperand(0).getOpcode() == ISD::BUILD_VECTOR) {
5336 ++ if (ConstantSDNode *Const = dyn_cast<ConstantSDNode>(N->getOperand(1))) {
5337 ++ unsigned Element = Const->getZExtValue();
5338 ++ return DAG.getNode(ISD::BITCAST, N->getDebugLoc(), N->getVTList(),
5339 ++ Arg->getOperand(0).getOperand(Element));
5340 ++ }
5341 ++ }
5342 ++ }
5343 ++
5344 ++ case ISD::SELECT_CC: {
5345 ++ // fold selectcc (selectcc x, y, a, b, cc), b, a, b, seteq ->
5346 ++ // selectcc x, y, a, b, inv(cc)
5347 ++ SDValue LHS = N->getOperand(0);
5348 ++ if (LHS.getOpcode() != ISD::SELECT_CC) {
5349 ++ return SDValue();
5350 ++ }
5351 ++
5352 ++ SDValue RHS = N->getOperand(1);
5353 ++ SDValue True = N->getOperand(2);
5354 ++ SDValue False = N->getOperand(3);
5355 ++
5356 ++ if (LHS.getOperand(2).getNode() != True.getNode() ||
5357 ++ LHS.getOperand(3).getNode() != False.getNode() ||
5358 ++ RHS.getNode() != False.getNode() ||
5359 ++ cast<CondCodeSDNode>(N->getOperand(4))->get() != ISD::SETEQ) {
5360 ++ return SDValue();
5361 ++ }
5362 ++
5363 ++ ISD::CondCode CCOpcode = cast<CondCodeSDNode>(LHS->getOperand(4))->get();
5364 ++ CCOpcode = ISD::getSetCCInverse(
5365 ++ CCOpcode, LHS.getOperand(0).getValueType().isInteger());
5366 ++ return DAG.getSelectCC(N->getDebugLoc(),
5367 ++ LHS.getOperand(0),
5368 ++ LHS.getOperand(1),
5369 ++ LHS.getOperand(2),
5370 ++ LHS.getOperand(3),
5371 ++ CCOpcode);
5372 ++
5373 ++ }
5374 + }
5375 + return SDValue();
5376 +}
5377 diff --git a/lib/Target/R600/R600ISelLowering.h b/lib/Target/R600/R600ISelLowering.h
5378 new file mode 100644
5379 -index 0000000..2b954da
5380 +index 0000000..afa3897
5381 --- /dev/null
5382 +++ b/lib/Target/R600/R600ISelLowering.h
5383 -@@ -0,0 +1,72 @@
5384 +@@ -0,0 +1,78 @@
5385 +//===-- R600ISelLowering.h - R600 DAG Lowering Interface -*- C++ -*--------===//
5386 +//
5387 +// The LLVM Compiler Infrastructure
5388 @@ -14540,7 +15198,13 @@ index 0000000..2b954da
5389 + SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
5390 + SDValue LowerFPTOUINT(SDValue Op, SelectionDAG &DAG) const;
5391 + SDValue LowerFPOW(SDValue Op, SelectionDAG &DAG) const;
5392 -+
5393 ++ SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
5394 ++ SDValue LowerFrameIndex(SDValue Op, SelectionDAG &DAG) const;
5395 ++
5396 ++ SDValue stackPtrToRegIndex(SDValue Ptr, unsigned StackWidth,
5397 ++ SelectionDAG &DAG) const;
5398 ++ void getStackAddress(unsigned StackWidth, unsigned ElemIdx,
5399 ++ unsigned &Channel, unsigned &PtrIncr) const;
5400 + bool isZero(SDValue Op) const;
5401 +};
5402 +
5403 @@ -14549,10 +15213,10 @@ index 0000000..2b954da
5404 +#endif // R600ISELLOWERING_H
5405 diff --git a/lib/Target/R600/R600InstrInfo.cpp b/lib/Target/R600/R600InstrInfo.cpp
5406 new file mode 100644
5407 -index 0000000..70ed41aba
5408 +index 0000000..31671ea
5409 --- /dev/null
5410 +++ b/lib/Target/R600/R600InstrInfo.cpp
5411 -@@ -0,0 +1,665 @@
5412 +@@ -0,0 +1,776 @@
5413 +//===-- R600InstrInfo.cpp - R600 Instruction Information ------------------===//
5414 +//
5415 +// The LLVM Compiler Infrastructure
5416 @@ -14571,8 +15235,12 @@ index 0000000..70ed41aba
5417 +#include "AMDGPUTargetMachine.h"
5418 +#include "AMDGPUSubtarget.h"
5419 +#include "R600Defines.h"
5420 ++#include "R600MachineFunctionInfo.h"
5421 +#include "R600RegisterInfo.h"
5422 +#include "llvm/CodeGen/MachineInstrBuilder.h"
5423 ++#include "llvm/CodeGen/MachineFrameInfo.h"
5424 ++#include "llvm/CodeGen/MachineRegisterInfo.h"
5425 ++#include "llvm/Instructions.h"
5426 +
5427 +#define GET_INSTRINFO_CTOR
5428 +#include "AMDGPUGenDFAPacketizer.inc"
5429 @@ -14627,11 +15295,10 @@ index 0000000..70ed41aba
5430 +MachineInstr * R600InstrInfo::getMovImmInstr(MachineFunction *MF,
5431 + unsigned DstReg, int64_t Imm) const {
5432 + MachineInstr * MI = MF->CreateMachineInstr(get(AMDGPU::MOV), DebugLoc());
5433 -+ MachineInstrBuilder MIB(*MF, MI);
5434 -+ MIB.addReg(DstReg, RegState::Define);
5435 -+ MIB.addReg(AMDGPU::ALU_LITERAL_X);
5436 -+ MIB.addImm(Imm);
5437 -+ MIB.addReg(0); // PREDICATE_BIT
5438 ++ MachineInstrBuilder(MI).addReg(DstReg, RegState::Define);
5439 ++ MachineInstrBuilder(MI).addReg(AMDGPU::ALU_LITERAL_X);
5440 ++ MachineInstrBuilder(MI).addImm(Imm);
5441 ++ MachineInstrBuilder(MI).addReg(0); // PREDICATE_BIT
5442 +
5443 + return MI;
5444 +}
5445 @@ -14659,7 +15326,6 @@ index 0000000..70ed41aba
5446 + switch (Opcode) {
5447 + default: return false;
5448 + case AMDGPU::RETURN:
5449 -+ case AMDGPU::RESERVE_REG:
5450 + return true;
5451 + }
5452 +}
5453 @@ -15005,8 +15671,7 @@ index 0000000..70ed41aba
5454 + if (PIdx != -1) {
5455 + MachineOperand &PMO = MI->getOperand(PIdx);
5456 + PMO.setReg(Pred[2].getReg());
5457 -+ MachineInstrBuilder MIB(*MI->getParent()->getParent(), MI);
5458 -+ MIB.addReg(AMDGPU::PREDICATE_BIT, RegState::Implicit);
5459 ++ MachineInstrBuilder(MI).addReg(AMDGPU::PREDICATE_BIT, RegState::Implicit);
5460 + return true;
5461 + }
5462 +
5463 @@ -15021,6 +15686,124 @@ index 0000000..70ed41aba
5464 + return 2;
5465 +}
5466 +
5467 ++int R600InstrInfo::getIndirectIndexBegin(const MachineFunction &MF) const {
5468 ++ const MachineRegisterInfo &MRI = MF.getRegInfo();
5469 ++ const MachineFrameInfo *MFI = MF.getFrameInfo();
5470 ++ int Offset = 0;
5471 ++
5472 ++ if (MFI->getNumObjects() == 0) {
5473 ++ return -1;
5474 ++ }
5475 ++
5476 ++ if (MRI.livein_empty()) {
5477 ++ return 0;
5478 ++ }
5479 ++
5480 ++ for (MachineRegisterInfo::livein_iterator LI = MRI.livein_begin(),
5481 ++ LE = MRI.livein_end();
5482 ++ LI != LE; ++LI) {
5483 ++ Offset = std::max(Offset,
5484 ++ GET_REG_INDEX(RI.getEncodingValue(LI->first)));
5485 ++ }
5486 ++
5487 ++ return Offset + 1;
5488 ++}
5489 ++
5490 ++int R600InstrInfo::getIndirectIndexEnd(const MachineFunction &MF) const {
5491 ++ int Offset = 0;
5492 ++ const MachineFrameInfo *MFI = MF.getFrameInfo();
5493 ++
5494 ++ // Variable sized objects are not supported
5495 ++ assert(!MFI->hasVarSizedObjects());
5496 ++
5497 ++ if (MFI->getNumObjects() == 0) {
5498 ++ return -1;
5499 ++ }
5500 ++
5501 ++ Offset = TM.getFrameLowering()->getFrameIndexOffset(MF, -1);
5502 ++
5503 ++ return getIndirectIndexBegin(MF) + Offset;
5504 ++}
5505 ++
5506 ++std::vector<unsigned> R600InstrInfo::getIndirectReservedRegs(
5507 ++ const MachineFunction &MF) const {
5508 ++ const AMDGPUFrameLowering *TFL =
5509 ++ static_cast<const AMDGPUFrameLowering*>(TM.getFrameLowering());
5510 ++ std::vector<unsigned> Regs;
5511 ++
5512 ++ unsigned StackWidth = TFL->getStackWidth(MF);
5513 ++ int End = getIndirectIndexEnd(MF);
5514 ++
5515 ++ if (End == -1) {
5516 ++ return Regs;
5517 ++ }
5518 ++
5519 ++ for (int Index = getIndirectIndexBegin(MF); Index <= End; ++Index) {
5520 ++ unsigned SuperReg = AMDGPU::R600_Reg128RegClass.getRegister(Index);
5521 ++ Regs.push_back(SuperReg);
5522 ++ for (unsigned Chan = 0; Chan < StackWidth; ++Chan) {
5523 ++ unsigned Reg = AMDGPU::R600_TReg32RegClass.getRegister((4 * Index) + Chan);
5524 ++ Regs.push_back(Reg);
5525 ++ }
5526 ++ }
5527 ++ return Regs;
5528 ++}
5529 ++
5530 ++unsigned R600InstrInfo::calculateIndirectAddress(unsigned RegIndex,
5531 ++ unsigned Channel) const {
5532 ++ // XXX: Remove when we support a stack width > 2
5533 ++ assert(Channel == 0);
5534 ++ return RegIndex;
5535 ++}
5536 ++
5537 ++const TargetRegisterClass * R600InstrInfo::getIndirectAddrStoreRegClass(
5538 ++ unsigned SourceReg) const {
5539 ++ return &AMDGPU::R600_TReg32RegClass;
5540 ++}
5541 ++
5542 ++const TargetRegisterClass *R600InstrInfo::getIndirectAddrLoadRegClass() const {
5543 ++ return &AMDGPU::TRegMemRegClass;
5544 ++}
5545 ++
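++// Indirect writes expand to a MOVA_INT that loads the offset into AR_X,
++// followed by a MOV with DST_REL set, so the destination address is taken
++// relative to AR_X.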
5546 ++MachineInstrBuilder R600InstrInfo::buildIndirectWrite(MachineBasicBlock *MBB,
5547 ++ MachineBasicBlock::iterator I,
5548 ++ unsigned ValueReg, unsigned Address,
5549 ++ unsigned OffsetReg) const {
5550 ++ unsigned AddrReg = AMDGPU::R600_AddrRegClass.getRegister(Address);
5551 ++ MachineInstr *MOVA = buildDefaultInstruction(*MBB, I, AMDGPU::MOVA_INT_eg,
5552 ++ AMDGPU::AR_X, OffsetReg);
5553 ++ setImmOperand(MOVA, R600Operands::WRITE, 0);
5554 ++
5555 ++ MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, AMDGPU::MOV,
5556 ++ AddrReg, ValueReg)
5557 ++ .addReg(AMDGPU::AR_X, RegState::Implicit);
5558 ++ setImmOperand(Mov, R600Operands::DST_REL, 1);
5559 ++ return Mov;
5560 ++}
5561 ++
5562 ++MachineInstrBuilder R600InstrInfo::buildIndirectRead(MachineBasicBlock *MBB,
5563 ++ MachineBasicBlock::iterator I,
5564 ++ unsigned ValueReg, unsigned Address,
5565 ++ unsigned OffsetReg) const {
5566 ++ unsigned AddrReg = AMDGPU::R600_AddrRegClass.getRegister(Address);
5567 ++ MachineInstr *MOVA = buildDefaultInstruction(*MBB, I, AMDGPU::MOVA_INT_eg,
5568 ++ AMDGPU::AR_X,
5569 ++ OffsetReg);
5570 ++ setImmOperand(MOVA, R600Operands::WRITE, 0);
5571 ++ MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, AMDGPU::MOV,
5572 ++ ValueReg,
5573 ++ AddrReg)
5574 ++ .addReg(AMDGPU::AR_X, RegState::Implicit);
5575 ++ setImmOperand(Mov, R600Operands::SRC0_REL, 1);
5576 ++
5577 ++ return Mov;
5578 ++}
5579 ++
5580 ++const TargetRegisterClass *R600InstrInfo::getSuperIndirectRegClass() const {
5581 ++ return &AMDGPU::IndirectRegRegClass;
5582 ++}
5583 ++
5584 ++
5585 +MachineInstrBuilder R600InstrInfo::buildDefaultInstruction(MachineBasicBlock &MBB,
5586 + MachineBasicBlock::iterator I,
5587 + unsigned Opcode,
5588 @@ -15041,13 +15824,15 @@ index 0000000..70ed41aba
5589 + .addReg(Src0Reg) // $src0
5590 + .addImm(0) // $src0_neg
5591 + .addImm(0) // $src0_rel
5592 -+ .addImm(0); // $src0_abs
5593 ++ .addImm(0) // $src0_abs
5594 ++ .addImm(-1); // $src0_sel
5595 +
5596 + if (Src1Reg) {
5597 + MIB.addReg(Src1Reg) // $src1
5598 + .addImm(0) // $src1_neg
5599 + .addImm(0) // $src1_rel
5600 -+ .addImm(0); // $src1_abs
5601 ++ .addImm(0) // $src1_abs
5602 ++ .addImm(-1); // $src1_sel
5603 + }
5604 +
5605 + //XXX: The r600g finalizer expects this to be 1, once we've moved the
5606 @@ -15076,16 +15861,6 @@ index 0000000..70ed41aba
5607 +
5608 +int R600InstrInfo::getOperandIdx(unsigned Opcode,
5609 + R600Operands::Ops Op) const {
5610 -+ const static int OpTable[3][R600Operands::COUNT] = {
5611 -+// W C S S S S S S S S
5612 -+// R O D L S R R R S R R R S R R L P
5613 -+// D U I M R A R C C C C C C C R C C A R I
5614 -+// S E U T O E M C 0 0 0 C 1 1 1 C 2 2 S E M
5615 -+// T M P E D L P 0 N R A 1 N R A 2 N R T D M
5616 -+ {0,-1,-1, 1, 2, 3, 4, 5, 6, 7, 8,-1,-1,-1,-1,-1,-1,-1, 9,10,11},
5617 -+ {0, 1, 2, 3, 4 ,5 ,6 ,7, 8, 9,10,11,12,-1,-1,-1,13,14,15,16,17},
5618 -+ {0,-1,-1,-1,-1, 1, 2, 3, 4, 5,-1, 6, 7, 8,-1, 9,10,11,12,13,14}
5619 -+ };
5620 + unsigned TargetFlags = get(Opcode).TSFlags;
5621 + unsigned OpTableIdx;
5622 +
5623 @@ -15111,7 +15886,7 @@ index 0000000..70ed41aba
5624 + OpTableIdx = 2;
5625 + }
5626 +
5627 -+ return OpTable[OpTableIdx][Op];
5628 ++ return R600Operands::ALUOpTable[OpTableIdx][Op];
5629 +}
5630 +
5631 +void R600InstrInfo::setImmOperand(MachineInstr *MI, R600Operands::Ops Op,
5632 @@ -15220,10 +15995,10 @@ index 0000000..70ed41aba
5633 +}
5634 diff --git a/lib/Target/R600/R600InstrInfo.h b/lib/Target/R600/R600InstrInfo.h
5635 new file mode 100644
5636 -index 0000000..6bb0ca9
5637 +index 0000000..278fad1
5638 --- /dev/null
5639 +++ b/lib/Target/R600/R600InstrInfo.h
5640 -@@ -0,0 +1,169 @@
5641 +@@ -0,0 +1,201 @@
5642 +//===-- R600InstrInfo.h - R600 Instruction Info Interface -------*- C++ -*-===//
5643 +//
5644 +// The LLVM Compiler Infrastructure
5645 @@ -15340,6 +16115,38 @@ index 0000000..6bb0ca9
5646 + virtual int getInstrLatency(const InstrItineraryData *ItinData,
5647 + SDNode *Node) const { return 1;}
5648 +
5649 ++  /// \returns a list of all the registers that may be accessed using indirect
5650 ++ /// addressing.
5651 ++ std::vector<unsigned> getIndirectReservedRegs(const MachineFunction &MF) const;
5652 ++
5653 ++ virtual int getIndirectIndexBegin(const MachineFunction &MF) const;
5654 ++
5655 ++ virtual int getIndirectIndexEnd(const MachineFunction &MF) const;
5656 ++
5657 ++
5658 ++ virtual unsigned calculateIndirectAddress(unsigned RegIndex,
5659 ++ unsigned Channel) const;
5660 ++
5661 ++ virtual const TargetRegisterClass *getIndirectAddrStoreRegClass(
5662 ++ unsigned SourceReg) const;
5663 ++
5664 ++ virtual const TargetRegisterClass *getIndirectAddrLoadRegClass() const;
5665 ++
5666 ++ virtual MachineInstrBuilder buildIndirectWrite(MachineBasicBlock *MBB,
5667 ++ MachineBasicBlock::iterator I,
5668 ++ unsigned ValueReg, unsigned Address,
5669 ++ unsigned OffsetReg) const;
5670 ++
5671 ++ virtual MachineInstrBuilder buildIndirectRead(MachineBasicBlock *MBB,
5672 ++ MachineBasicBlock::iterator I,
5673 ++ unsigned ValueReg, unsigned Address,
5674 ++ unsigned OffsetReg) const;
5675 ++
5676 ++ virtual const TargetRegisterClass *getSuperIndirectRegClass() const;
5677 ++
5678 ++
5679 ++  /// buildDefaultInstruction - This function returns a MachineInstr with
5680 ++ /// all the instruction modifiers initialized to their default values.
5681 + /// You can use this function to avoid manually specifying each instruction
5682 + /// modifier operand when building a new instruction.
5683 + ///
5684 @@ -15395,10 +16202,10 @@ index 0000000..6bb0ca9
5685 +#endif // R600INSTRINFO_H_
5686 diff --git a/lib/Target/R600/R600Instructions.td b/lib/Target/R600/R600Instructions.td
5687 new file mode 100644
5688 -index 0000000..64bab18
5689 +index 0000000..409da07
5690 --- /dev/null
5691 +++ b/lib/Target/R600/R600Instructions.td
5692 -@@ -0,0 +1,1724 @@
5693 +@@ -0,0 +1,1976 @@
5694 +//===-- R600Instructions.td - R600 Instruction defs -------*- tablegen -*-===//
5695 +//
5696 +// The LLVM Compiler Infrastructure
5697 @@ -15471,6 +16278,11 @@ index 0000000..64bab18
5698 + let PrintMethod = PM;
5699 +}
5700 +
5701 ++// src_sel for ALU src operands, see also ALU_CONST, ALU_PARAM registers
5702 ++def SEL : OperandWithDefaultOps <i32, (ops (i32 -1))> {
5703 ++ let PrintMethod = "printSel";
5704 ++}
5705 ++
5706 +def LITERAL : InstFlag<"printLiteral">;
5707 +
5708 +def WRITE : InstFlag <"printWrite", 1>;
5709 @@ -15487,9 +16299,16 @@ index 0000000..64bab18
5710 +// default to 0.
5711 +def LAST : InstFlag<"printLast", 1>;
5712 +
5713 ++def FRAMEri : Operand<iPTR> {
5714 ++ let MIOperandInfo = (ops R600_Reg32:$ptr, i32imm:$index);
5715 ++}
5716 ++
5717 +def ADDRParam : ComplexPattern<i32, 2, "SelectADDRParam", [], []>;
5718 +def ADDRDWord : ComplexPattern<i32, 1, "SelectADDRDWord", [], []>;
5719 +def ADDRVTX_READ : ComplexPattern<i32, 2, "SelectADDRVTX_READ", [], []>;
5720 ++def ADDRGA_CONST_OFFSET : ComplexPattern<i32, 1, "SelectGlobalValueConstantOffset", [], []>;
5721 ++def ADDRGA_VAR_OFFSET : ComplexPattern<i32, 2, "SelectGlobalValueVariableOffset", [], []>;
5722 ++def ADDRIndirect : ComplexPattern<iPTR, 2, "SelectADDRIndirect", [], []>;
5723 +
5724 +class R600ALU_Word0 {
5725 + field bits<32> Word0;
5726 @@ -15574,6 +16393,55 @@ index 0000000..64bab18
5727 + let Word1{17-13} = alu_inst;
5728 +}
5729 +
5730 ++class VTX_WORD0 {
5731 ++ field bits<32> Word0;
5732 ++ bits<7> SRC_GPR;
5733 ++ bits<5> VC_INST;
5734 ++ bits<2> FETCH_TYPE;
5735 ++ bits<1> FETCH_WHOLE_QUAD;
5736 ++ bits<8> BUFFER_ID;
5737 ++ bits<1> SRC_REL;
5738 ++ bits<2> SRC_SEL_X;
5739 ++ bits<6> MEGA_FETCH_COUNT;
5740 ++
5741 ++ let Word0{4-0} = VC_INST;
5742 ++ let Word0{6-5} = FETCH_TYPE;
5743 ++ let Word0{7} = FETCH_WHOLE_QUAD;
5744 ++ let Word0{15-8} = BUFFER_ID;
5745 ++ let Word0{22-16} = SRC_GPR;
5746 ++ let Word0{23} = SRC_REL;
5747 ++ let Word0{25-24} = SRC_SEL_X;
5748 ++ let Word0{31-26} = MEGA_FETCH_COUNT;
5749 ++}
5750 ++
5751 ++class VTX_WORD1_GPR {
5752 ++ field bits<32> Word1;
5753 ++ bits<7> DST_GPR;
5754 ++ bits<1> DST_REL;
5755 ++ bits<3> DST_SEL_X;
5756 ++ bits<3> DST_SEL_Y;
5757 ++ bits<3> DST_SEL_Z;
5758 ++ bits<3> DST_SEL_W;
5759 ++ bits<1> USE_CONST_FIELDS;
5760 ++ bits<6> DATA_FORMAT;
5761 ++ bits<2> NUM_FORMAT_ALL;
5762 ++ bits<1> FORMAT_COMP_ALL;
5763 ++ bits<1> SRF_MODE_ALL;
5764 ++
5765 ++ let Word1{6-0} = DST_GPR;
5766 ++ let Word1{7} = DST_REL;
5767 ++ let Word1{8} = 0; // Reserved
5768 ++ let Word1{11-9} = DST_SEL_X;
5769 ++ let Word1{14-12} = DST_SEL_Y;
5770 ++ let Word1{17-15} = DST_SEL_Z;
5771 ++ let Word1{20-18} = DST_SEL_W;
5772 ++ let Word1{21} = USE_CONST_FIELDS;
5773 ++ let Word1{27-22} = DATA_FORMAT;
5774 ++ let Word1{29-28} = NUM_FORMAT_ALL;
5775 ++ let Word1{30} = FORMAT_COMP_ALL;
5776 ++ let Word1{31} = SRF_MODE_ALL;
5777 ++}
5778 ++
5779 +/*
5780 +XXX: R600 subtarget uses a slightly different encoding than the other
5781 +subtargets. We currently handle this in R600MCCodeEmitter, but we may
5782 @@ -15615,11 +16483,11 @@ index 0000000..64bab18
5783 + InstR600 <0,
5784 + (outs R600_Reg32:$dst),
5785 + (ins WRITE:$write, OMOD:$omod, REL:$dst_rel, CLAMP:$clamp,
5786 -+ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs,
5787 ++ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs, SEL:$src0_sel,
5788 + LAST:$last, R600_Pred:$pred_sel, LITERAL:$literal),
5789 + !strconcat(opName,
5790 + "$clamp $dst$write$dst_rel$omod, "
5791 -+ "$src0_neg$src0_abs$src0$src0_abs$src0_rel, "
5792 ++ "$src0_neg$src0_abs$src0$src0_sel$src0_abs$src0_rel, "
5793 + "$literal $pred_sel$last"),
5794 + pattern,
5795 + itin>,
5796 @@ -15655,13 +16523,13 @@ index 0000000..64bab18
5797 + (outs R600_Reg32:$dst),
5798 + (ins UEM:$update_exec_mask, UP:$update_pred, WRITE:$write,
5799 + OMOD:$omod, REL:$dst_rel, CLAMP:$clamp,
5800 -+ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs,
5801 -+ R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, ABS:$src1_abs,
5802 ++ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs, SEL:$src0_sel,
5803 ++ R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, ABS:$src1_abs, SEL:$src1_sel,
5804 + LAST:$last, R600_Pred:$pred_sel, LITERAL:$literal),
5805 + !strconcat(opName,
5806 + "$clamp $update_exec_mask$update_pred$dst$write$dst_rel$omod, "
5807 -+ "$src0_neg$src0_abs$src0$src0_abs$src0_rel, "
5808 -+ "$src1_neg$src1_abs$src1$src1_abs$src1_rel, "
5809 ++ "$src0_neg$src0_abs$src0$src0_sel$src0_abs$src0_rel, "
5810 ++ "$src1_neg$src1_abs$src1$src1_sel$src1_abs$src1_rel, "
5811 + "$literal $pred_sel$last"),
5812 + pattern,
5813 + itin>,
5814 @@ -15692,14 +16560,14 @@ index 0000000..64bab18
5815 + InstR600 <0,
5816 + (outs R600_Reg32:$dst),
5817 + (ins REL:$dst_rel, CLAMP:$clamp,
5818 -+ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel,
5819 -+ R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel,
5820 -+ R600_Reg32:$src2, NEG:$src2_neg, REL:$src2_rel,
5821 ++ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, SEL:$src0_sel,
5822 ++ R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, SEL:$src1_sel,
5823 ++ R600_Reg32:$src2, NEG:$src2_neg, REL:$src2_rel, SEL:$src2_sel,
5824 + LAST:$last, R600_Pred:$pred_sel, LITERAL:$literal),
5825 + !strconcat(opName, "$clamp $dst$dst_rel, "
5826 -+ "$src0_neg$src0$src0_rel, "
5827 -+ "$src1_neg$src1$src1_rel, "
5828 -+ "$src2_neg$src2$src2_rel, "
5829 ++ "$src0_neg$src0$src0_sel$src0_rel, "
5830 ++ "$src1_neg$src1$src1_sel$src1_rel, "
5831 ++ "$src2_neg$src2$src2_sel$src2_rel, "
5832 + "$literal $pred_sel$last"),
5833 + pattern,
5834 + itin>,
5835 @@ -15743,6 +16611,27 @@ index 0000000..64bab18
5836 + }]
5837 +>;
5838 +
5839 ++def TEX_RECT : PatLeaf<
5840 ++ (imm),
5841 ++ [{uint32_t TType = (uint32_t)N->getZExtValue();
5842 ++ return TType == 5;
5843 ++ }]
5844 ++>;
5845 ++
5846 ++def TEX_ARRAY : PatLeaf<
5847 ++ (imm),
5848 ++ [{uint32_t TType = (uint32_t)N->getZExtValue();
5849 ++ return TType == 9 || TType == 10 || TType == 15 || TType == 16;
5850 ++ }]
5851 ++>;
5852 ++
5853 ++def TEX_SHADOW_ARRAY : PatLeaf<
5854 ++ (imm),
5855 ++ [{uint32_t TType = (uint32_t)N->getZExtValue();
5856 ++ return TType == 11 || TType == 12 || TType == 17;
5857 ++ }]
5858 ++>;
5859 ++
5860 +class EG_CF_RAT <bits <8> cf_inst, bits <6> rat_inst, bits<4> rat_id, dag outs,
5861 + dag ins, string asm, list<dag> pattern> :
5862 + InstR600ISA <outs, ins, asm, pattern> {
5863 @@ -15815,32 +16704,35 @@ index 0000000..64bab18
5864 + "Subtarget.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX">;
5865 +
5866 +//===----------------------------------------------------------------------===//
5867 -+// Interpolation Instructions
5868 ++// R600 SDNodes
5869 +//===----------------------------------------------------------------------===//
5870 +
5871 -+def INTERP: SDNode<"AMDGPUISD::INTERP",
5872 -+ SDTypeProfile<1, 2, [SDTCisFP<0>, SDTCisInt<1>, SDTCisInt<2>]>
5873 -+ >;
5874 -+
5875 -+def INTERP_P0: SDNode<"AMDGPUISD::INTERP_P0",
5876 -+ SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisInt<1>]>
5877 -+ >;
5878 ++def INTERP_PAIR_XY : AMDGPUShaderInst <
5879 ++ (outs R600_TReg32_X:$dst0, R600_TReg32_Y:$dst1),
5880 ++ (ins i32imm:$src0, R600_Reg32:$src1, R600_Reg32:$src2),
5881 ++ "INTERP_PAIR_XY $src0 $src1 $src2 : $dst0 dst1",
5882 ++ []>;
5883 ++
5884 ++def INTERP_PAIR_ZW : AMDGPUShaderInst <
5885 ++ (outs R600_TReg32_Z:$dst0, R600_TReg32_W:$dst1),
5886 ++ (ins i32imm:$src0, R600_Reg32:$src1, R600_Reg32:$src2),
5887 ++ "INTERP_PAIR_ZW $src0 $src1 $src2 : $dst0 dst1",
5888 ++ []>;
5889 ++
5890 ++def CONST_ADDRESS: SDNode<"AMDGPUISD::CONST_ADDRESS",
5891 ++ SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisPtrTy<1>]>,
5892 ++ [SDNPMayLoad]
5893 ++>;
5894 +
5895 -+let usesCustomInserter = 1 in {
5896 -+def input_perspective : AMDGPUShaderInst <
5897 -+ (outs R600_Reg128:$dst),
5898 -+ (ins i32imm:$src0, i32imm:$src1),
5899 -+ "input_perspective $src0 $src1 : dst",
5900 -+ [(set R600_Reg128:$dst, (INTERP (i32 imm:$src0), (i32 imm:$src1)))]>;
5901 -+} // End usesCustomInserter = 1
5902 ++//===----------------------------------------------------------------------===//
5903 ++// Interpolation Instructions
5904 ++//===----------------------------------------------------------------------===//
5905 +
5906 -+def input_constant : AMDGPUShaderInst <
5907 ++def INTERP_VEC_LOAD : AMDGPUShaderInst <
5908 + (outs R600_Reg128:$dst),
5909 -+ (ins i32imm:$src),
5910 -+ "input_perspective $src : dst",
5911 -+ [(set R600_Reg128:$dst, (INTERP_P0 (i32 imm:$src)))]>;
5912 -+
5913 -+
5914 ++ (ins i32imm:$src0),
5915 ++ "INTERP_LOAD $src0 : $dst",
5916 ++ []>;
5917 +
5918 +def INTERP_XY : R600_2OP <0xD6, "INTERP_XY", []> {
5919 + let bank_swizzle = 5;
5920 @@ -15908,19 +16800,24 @@ index 0000000..64bab18
5921 +multiclass ExportPattern<Instruction ExportInst, bits<8> cf_inst> {
5922 + def : Pat<(int_R600_store_pixel_depth R600_Reg32:$reg),
5923 + (ExportInst
5924 -+ (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sel_x),
5925 ++ (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sub0),
5926 + 0, 61, 0, 7, 7, 7, cf_inst, 0)
5927 + >;
5928 +
5929 + def : Pat<(int_R600_store_pixel_stencil R600_Reg32:$reg),
5930 + (ExportInst
5931 -+ (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sel_x),
5932 ++ (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sub0),
5933 + 0, 61, 7, 0, 7, 7, cf_inst, 0)
5934 + >;
5935 +
5936 -+ def : Pat<(int_R600_store_pixel_dummy),
5937 ++ def : Pat<(int_R600_store_dummy (i32 imm:$type)),
5938 ++ (ExportInst
5939 ++ (v4f32 (IMPLICIT_DEF)), imm:$type, 0, 7, 7, 7, 7, cf_inst, 0)
5940 ++ >;
5941 ++
5942 ++ def : Pat<(int_R600_store_dummy 1),
5943 + (ExportInst
5944 -+ (v4f32 (IMPLICIT_DEF)), 0, 0, 7, 7, 7, 7, cf_inst, 0)
5945 ++ (v4f32 (IMPLICIT_DEF)), 1, 60, 7, 7, 7, 7, cf_inst, 0)
5946 + >;
5947 +
5948 + def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 0),
5949 @@ -15928,29 +16825,40 @@ index 0000000..64bab18
5950 + (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
5951 + 0, 1, 2, 3, cf_inst, 0)
5952 + >;
5953 ++ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 1),
5954 ++ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm)),
5955 ++ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
5956 ++ 0, 1, 2, 3, cf_inst, 0)
5957 ++ >;
5958 ++
5959 ++ def : Pat<(int_R600_store_swizzle (v4f32 R600_Reg128:$src), imm:$arraybase,
5960 ++ imm:$type),
5961 ++ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
5962 ++ 0, 1, 2, 3, cf_inst, 0)
5963 ++ >;
5964 +}
5965 +
5966 +multiclass SteamOutputExportPattern<Instruction ExportInst,
5967 + bits<8> buf0inst, bits<8> buf1inst, bits<8> buf2inst, bits<8> buf3inst> {
5968 +// Stream0
5969 -+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 1),
5970 -+ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)),
5971 -+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
5972 ++ def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src),
5973 ++ (i32 imm:$arraybase), (i32 0), (i32 imm:$mask)),
5974 ++ (ExportInst R600_Reg128:$src, 0, imm:$arraybase,
5975 + 4095, imm:$mask, buf0inst, 0)>;
5976 +// Stream1
5977 -+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 2),
5978 -+ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)),
5979 -+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
5980 ++ def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src),
5981 ++ (i32 imm:$arraybase), (i32 1), (i32 imm:$mask)),
5982 ++ (ExportInst R600_Reg128:$src, 0, imm:$arraybase,
5983 + 4095, imm:$mask, buf1inst, 0)>;
5984 +// Stream2
5985 -+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 3),
5986 -+ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)),
5987 -+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
5988 ++ def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src),
5989 ++ (i32 imm:$arraybase), (i32 2), (i32 imm:$mask)),
5990 ++ (ExportInst R600_Reg128:$src, 0, imm:$arraybase,
5991 + 4095, imm:$mask, buf2inst, 0)>;
5992 +// Stream3
5993 -+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 4),
5994 -+ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)),
5995 -+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
5996 ++ def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src),
5997 ++ (i32 imm:$arraybase), (i32 3), (i32 imm:$mask)),
5998 ++ (ExportInst R600_Reg128:$src, 0, imm:$arraybase,
5999 + 4095, imm:$mask, buf3inst, 0)>;
6000 +}
6001 +
6002 @@ -16025,6 +16933,34 @@ index 0000000..64bab18
6003 + COND_NE))]
6004 +>;
6005 +
6006 ++def SETE_DX10 : R600_2OP <
6007 ++ 0xC, "SETE_DX10",
6008 ++ [(set R600_Reg32:$dst,
6009 ++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0),
6010 ++ COND_EQ))]
6011 ++>;
6012 ++
6013 ++def SETGT_DX10 : R600_2OP <
6014 ++ 0xD, "SETGT_DX10",
6015 ++ [(set R600_Reg32:$dst,
6016 ++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0),
6017 ++ COND_GT))]
6018 ++>;
6019 ++
6020 ++def SETGE_DX10 : R600_2OP <
6021 ++ 0xE, "SETGE_DX10",
6022 ++ [(set R600_Reg32:$dst,
6023 ++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0),
6024 ++ COND_GE))]
6025 ++>;
6026 ++
6027 ++def SETNE_DX10 : R600_2OP <
6028 ++ 0xF, "SETNE_DX10",
6029 ++ [(set R600_Reg32:$dst,
6030 ++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0),
6031 ++ COND_NE))]
6032 ++>;
6033 ++
6034 +def FRACT : R600_1OP_Helper <0x10, "FRACT", AMDGPUfract>;
6035 +def TRUNC : R600_1OP_Helper <0x11, "TRUNC", int_AMDGPU_trunc>;
6036 +def CEIL : R600_1OP_Helper <0x12, "CEIL", fceil>;
6037 @@ -16085,7 +17021,7 @@ index 0000000..64bab18
6038 +>;
6039 +
6040 +def SETGT_INT : R600_2OP <
6041 -+ 0x3B, "SGT_INT",
6042 ++ 0x3B, "SETGT_INT",
6043 + [(set (i32 R600_Reg32:$dst),
6044 + (selectcc (i32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETGT))]
6045 +>;
6046 @@ -16539,6 +17475,10 @@ index 0000000..64bab18
6047 + defm DOT4_eg : DOT4_Common<0xBE>;
6048 + defm CUBE_eg : CUBE_Common<0xC0>;
6049 +
6050 ++let hasSideEffects = 1 in {
6051 ++ def MOVA_INT_eg : R600_1OP <0xCC, "MOVA_INT", []>;
6052 ++}
6053 ++
6054 + def TGSI_LIT_Z_eg : TGSI_LIT_Z_Common<MUL_LIT_eg, LOG_CLAMPED_eg, EXP_IEEE_eg>;
6055 +
6056 + def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> {
6057 @@ -16629,37 +17569,30 @@ index 0000000..64bab18
6058 +>;
6059 +
6060 +class VTX_READ_eg <string name, bits<8> buffer_id, dag outs, list<dag> pattern>
6061 -+ : InstR600ISA <outs, (ins MEMxi:$ptr), name#" $dst, $ptr", pattern> {
6062 -+
6063 -+ // Operands
6064 -+ bits<7> DST_GPR;
6065 -+ bits<7> SRC_GPR;
6066 ++ : InstR600ISA <outs, (ins MEMxi:$ptr), name#" $dst, $ptr", pattern>,
6067 ++ VTX_WORD1_GPR, VTX_WORD0 {
6068 +
6069 + // Static fields
6070 -+ bits<5> VC_INST = 0;
6071 -+ bits<2> FETCH_TYPE = 2;
6072 -+ bits<1> FETCH_WHOLE_QUAD = 0;
6073 -+ bits<8> BUFFER_ID = buffer_id;
6074 -+ bits<1> SRC_REL = 0;
6075 ++ let VC_INST = 0;
6076 ++ let FETCH_TYPE = 2;
6077 ++ let FETCH_WHOLE_QUAD = 0;
6078 ++ let BUFFER_ID = buffer_id;
6079 ++ let SRC_REL = 0;
6080 + // XXX: We can infer this field based on the SRC_GPR. This would allow us
6081 + // to store vertex addresses in any channel, not just X.
6082 -+ bits<2> SRC_SEL_X = 0;
6083 -+ bits<6> MEGA_FETCH_COUNT;
6084 -+ bits<1> DST_REL = 0;
6085 -+ bits<3> DST_SEL_X;
6086 -+ bits<3> DST_SEL_Y;
6087 -+ bits<3> DST_SEL_Z;
6088 -+ bits<3> DST_SEL_W;
6089 ++ let SRC_SEL_X = 0;
6090 ++ let DST_REL = 0;
6091 + // The docs say that if this bit is set, then DATA_FORMAT, NUM_FORMAT_ALL,
6092 + // FORMAT_COMP_ALL, SRF_MODE_ALL, and ENDIAN_SWAP fields will be ignored,
6093 + // however, based on my testing if USE_CONST_FIELDS is set, then all
6094 + // these fields need to be set to 0.
6095 -+ bits<1> USE_CONST_FIELDS = 0;
6096 -+ bits<6> DATA_FORMAT;
6097 -+ bits<2> NUM_FORMAT_ALL = 1;
6098 -+ bits<1> FORMAT_COMP_ALL = 0;
6099 -+ bits<1> SRF_MODE_ALL = 0;
6100 ++ let USE_CONST_FIELDS = 0;
6101 ++ let NUM_FORMAT_ALL = 1;
6102 ++ let FORMAT_COMP_ALL = 0;
6103 ++ let SRF_MODE_ALL = 0;
6104 +
6105 ++ let Inst{31-0} = Word0;
6106 ++ let Inst{63-32} = Word1;
6107 + // LLVM can only encode 64-bit instructions, so these fields are manually
6108 + // encoded in R600CodeEmitter
6109 + //
6110 @@ -16670,29 +17603,7 @@ index 0000000..64bab18
6111 + // bits<1> ALT_CONST = 0;
6112 + // bits<2> BUFFER_INDEX_MODE = 0;
6113 +
6114 -+ // VTX_WORD0
6115 -+ let Inst{4-0} = VC_INST;
6116 -+ let Inst{6-5} = FETCH_TYPE;
6117 -+ let Inst{7} = FETCH_WHOLE_QUAD;
6118 -+ let Inst{15-8} = BUFFER_ID;
6119 -+ let Inst{22-16} = SRC_GPR;
6120 -+ let Inst{23} = SRC_REL;
6121 -+ let Inst{25-24} = SRC_SEL_X;
6122 -+ let Inst{31-26} = MEGA_FETCH_COUNT;
6123 -+
6124 -+ // VTX_WORD1_GPR
6125 -+ let Inst{38-32} = DST_GPR;
6126 -+ let Inst{39} = DST_REL;
6127 -+ let Inst{40} = 0; // Reserved
6128 -+ let Inst{43-41} = DST_SEL_X;
6129 -+ let Inst{46-44} = DST_SEL_Y;
6130 -+ let Inst{49-47} = DST_SEL_Z;
6131 -+ let Inst{52-50} = DST_SEL_W;
6132 -+ let Inst{53} = USE_CONST_FIELDS;
6133 -+ let Inst{59-54} = DATA_FORMAT;
6134 -+ let Inst{61-60} = NUM_FORMAT_ALL;
6135 -+ let Inst{62} = FORMAT_COMP_ALL;
6136 -+ let Inst{63} = SRF_MODE_ALL;
6137 ++
6138 +
6139 + // VTX_WORD2 (LLVM can only encode 64-bit instructions, so WORD2 encoding
6140 + // is done in R600CodeEmitter
6141 @@ -16788,6 +17699,10 @@ index 0000000..64bab18
6142 + [(set (i32 R600_TReg32_X:$dst), (load_param ADDRVTX_READ:$ptr))]
6143 +>;
6144 +
6145 ++def VTX_READ_PARAM_128_eg : VTX_READ_128_eg <0,
6146 ++ [(set (v4i32 R600_Reg128:$dst), (load_param ADDRVTX_READ:$ptr))]
6147 ++>;
6148 ++
6149 +//===----------------------------------------------------------------------===//
6150 +// VTX Read from global memory space
6151 +//===----------------------------------------------------------------------===//
6152 @@ -16818,6 +17733,12 @@ index 0000000..64bab18
6153 +
6154 +}
6155 +
6156 ++//===----------------------------------------------------------------------===//
6157 ++// Register loads and stores - for indirect addressing
6158 ++//===----------------------------------------------------------------------===//
6159 ++
6160 ++defm R600_ : RegisterLoadStore <R600_Reg32, FRAMEri, ADDRIndirect>;
6161 ++
6162 +let Predicates = [isCayman] in {
6163 +
6164 +let isVector = 1 in {
6165 @@ -16877,6 +17798,7 @@ index 0000000..64bab18
6166 + (ins R600_Reg32:$src0, i32imm:$src1, i32imm:$flags),
6167 + "", [], NullALU> {
6168 + let FlagOperandIdx = 3;
6169 ++ let isTerminator = 1;
6170 +}
6171 +
6172 +let isTerminator = 1, isBranch = 1, isBarrier = 1 in {
6173 @@ -16903,19 +17825,6 @@ index 0000000..64bab18
6174 +
6175 +} // End mayLoad = 0, mayStore = 0, hasSideEffects = 1
6176 +
6177 -+def R600_LOAD_CONST : AMDGPUShaderInst <
6178 -+ (outs R600_Reg32:$dst),
6179 -+ (ins i32imm:$src0),
6180 -+ "R600_LOAD_CONST $dst, $src0",
6181 -+ [(set R600_Reg32:$dst, (int_AMDGPU_load_const imm:$src0))]
6182 -+>;
6183 -+
6184 -+def RESERVE_REG : AMDGPUShaderInst <
6185 -+ (outs),
6186 -+ (ins i32imm:$src),
6187 -+ "RESERVE_REG $src",
6188 -+ [(int_AMDGPU_reserve_reg imm:$src)]
6189 -+>;
6190 +
6191 +def TXD: AMDGPUShaderInst <
6192 + (outs R600_Reg128:$dst),
6193 @@ -16946,22 +17855,148 @@ index 0000000..64bab18
6194 + "RETURN", [(IL_retflag)]>;
6195 +}
6196 +
6197 -+//===--------------------------------------------------------------------===//
6198 -+// Instructions support
6199 -+//===--------------------------------------------------------------------===//
6200 -+//===---------------------------------------------------------------------===//
6201 -+// Custom Inserter for Branches and returns, this eventually will be a
6202 -+// seperate pass
6203 -+//===---------------------------------------------------------------------===//
6204 -+let isTerminator = 1, usesCustomInserter = 1, isBranch = 1, isBarrier = 1 in {
6205 -+ def BRANCH : ILFormat<(outs), (ins brtarget:$target),
6206 -+ "; Pseudo unconditional branch instruction",
6207 -+ [(br bb:$target)]>;
6208 -+ defm BRANCH_COND : BranchConditional<IL_brcond>;
6209 -+}
6210 +
6211 -+//===---------------------------------------------------------------------===//
6212 -+// Flow and Program control Instructions
6213 ++//===----------------------------------------------------------------------===//
6214 ++// Constant Buffer Addressing Support
6215 ++//===----------------------------------------------------------------------===//
6216 ++
6217 ++let isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU" in {
6218 ++def CONST_COPY : Instruction {
6219 ++ let OutOperandList = (outs R600_Reg32:$dst);
6220 ++ let InOperandList = (ins i32imm:$src);
6221 ++ let Pattern = [(set R600_Reg32:$dst, (CONST_ADDRESS ADDRGA_CONST_OFFSET:$src))];
6222 ++ let AsmString = "CONST_COPY";
6223 ++ let neverHasSideEffects = 1;
6224 ++ let isAsCheapAsAMove = 1;
6225 ++ let Itinerary = NullALU;
6226 ++}
6227 ++} // end isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU"
6228 ++
6229 ++def TEX_VTX_CONSTBUF :
6230 ++ InstR600ISA <(outs R600_Reg128:$dst), (ins MEMxi:$ptr), "VTX_READ_eg $dst, $ptr",
6231 ++ [(set R600_Reg128:$dst, (CONST_ADDRESS ADDRGA_VAR_OFFSET:$ptr))]>,
6232 ++ VTX_WORD1_GPR, VTX_WORD0 {
6233 ++
6234 ++ let VC_INST = 0;
6235 ++ let FETCH_TYPE = 2;
6236 ++ let FETCH_WHOLE_QUAD = 0;
6237 ++ let BUFFER_ID = 0;
6238 ++ let SRC_REL = 0;
6239 ++ let SRC_SEL_X = 0;
6240 ++ let DST_REL = 0;
6241 ++ let USE_CONST_FIELDS = 0;
6242 ++ let NUM_FORMAT_ALL = 2;
6243 ++ let FORMAT_COMP_ALL = 1;
6244 ++ let SRF_MODE_ALL = 1;
6245 ++ let MEGA_FETCH_COUNT = 16;
6246 ++ let DST_SEL_X = 0;
6247 ++ let DST_SEL_Y = 1;
6248 ++ let DST_SEL_Z = 2;
6249 ++ let DST_SEL_W = 3;
6250 ++ let DATA_FORMAT = 35;
6251 ++
6252 ++ let Inst{31-0} = Word0;
6253 ++ let Inst{63-32} = Word1;
6254 ++
6255 ++// LLVM can only encode 64-bit instructions, so these fields are manually
6256 ++// encoded in R600CodeEmitter
6257 ++//
6258 ++// bits<16> OFFSET;
6259 ++// bits<2> ENDIAN_SWAP = 0;
6260 ++// bits<1> CONST_BUF_NO_STRIDE = 0;
6261 ++// bits<1> MEGA_FETCH = 0;
6262 ++// bits<1> ALT_CONST = 0;
6263 ++// bits<2> BUFFER_INDEX_MODE = 0;
6264 ++
6265 ++
6266 ++
6267 ++// VTX_WORD2 (LLVM can only encode 64-bit instructions, so WORD2 encoding
6268 ++// is done in R600CodeEmitter
6269 ++//
6270 ++// Inst{79-64} = OFFSET;
6271 ++// Inst{81-80} = ENDIAN_SWAP;
6272 ++// Inst{82} = CONST_BUF_NO_STRIDE;
6273 ++// Inst{83} = MEGA_FETCH;
6274 ++// Inst{84} = ALT_CONST;
6275 ++// Inst{86-85} = BUFFER_INDEX_MODE;
6276 ++// Inst{95-86} = 0; Reserved
6277 ++
6278 ++// VTX_WORD3 (Padding)
6279 ++//
6280 ++// Inst{127-96} = 0;
6281 ++}
6282 ++
6283 ++def TEX_VTX_TEXBUF:
6284 ++ InstR600ISA <(outs R600_Reg128:$dst), (ins MEMxi:$ptr, i32imm:$BUFFER_ID), "TEX_VTX_EXPLICIT_READ $dst, $ptr",
6285 ++ [(set R600_Reg128:$dst, (int_R600_load_texbuf ADDRGA_VAR_OFFSET:$ptr, imm:$BUFFER_ID))]>,
6286 ++VTX_WORD1_GPR, VTX_WORD0 {
6287 ++
6288 ++let VC_INST = 0;
6289 ++let FETCH_TYPE = 2;
6290 ++let FETCH_WHOLE_QUAD = 0;
6291 ++let SRC_REL = 0;
6292 ++let SRC_SEL_X = 0;
6293 ++let DST_REL = 0;
6294 ++let USE_CONST_FIELDS = 1;
6295 ++let NUM_FORMAT_ALL = 0;
6296 ++let FORMAT_COMP_ALL = 0;
6297 ++let SRF_MODE_ALL = 1;
6298 ++let MEGA_FETCH_COUNT = 16;
6299 ++let DST_SEL_X = 0;
6300 ++let DST_SEL_Y = 1;
6301 ++let DST_SEL_Z = 2;
6302 ++let DST_SEL_W = 3;
6303 ++let DATA_FORMAT = 0;
6304 ++
6305 ++let Inst{31-0} = Word0;
6306 ++let Inst{63-32} = Word1;
6307 ++
6308 ++// LLVM can only encode 64-bit instructions, so these fields are manually
6309 ++// encoded in R600CodeEmitter
6310 ++//
6311 ++// bits<16> OFFSET;
6312 ++// bits<2> ENDIAN_SWAP = 0;
6313 ++// bits<1> CONST_BUF_NO_STRIDE = 0;
6314 ++// bits<1> MEGA_FETCH = 0;
6315 ++// bits<1> ALT_CONST = 0;
6316 ++// bits<2> BUFFER_INDEX_MODE = 0;
6317 ++
6318 ++
6319 ++
6320 ++// VTX_WORD2 (LLVM can only encode 64-bit instructions, so WORD2 encoding
6321 ++// is done in R600CodeEmitter
6322 ++//
6323 ++// Inst{79-64} = OFFSET;
6324 ++// Inst{81-80} = ENDIAN_SWAP;
6325 ++// Inst{82} = CONST_BUF_NO_STRIDE;
6326 ++// Inst{83} = MEGA_FETCH;
6327 ++// Inst{84} = ALT_CONST;
6328 ++// Inst{86-85} = BUFFER_INDEX_MODE;
6329 ++// Inst{95-86} = 0; Reserved
6330 ++
6331 ++// VTX_WORD3 (Padding)
6332 ++//
6333 ++// Inst{127-96} = 0;
6334 ++}
6335 ++
6336 ++
6337 ++
6338 ++//===--------------------------------------------------------------------===//
6339 ++// Instructions support
6340 ++//===--------------------------------------------------------------------===//
6341 ++//===---------------------------------------------------------------------===//
6342 ++// Custom inserter for branches and returns; this will eventually become a
6343 ++// separate pass
6344 ++//===---------------------------------------------------------------------===//
6345 ++let isTerminator = 1, usesCustomInserter = 1, isBranch = 1, isBarrier = 1 in {
6346 ++ def BRANCH : ILFormat<(outs), (ins brtarget:$target),
6347 ++ "; Pseudo unconditional branch instruction",
6348 ++ [(br bb:$target)]>;
6349 ++ defm BRANCH_COND : BranchConditional<IL_brcond>;
6350 ++}
6351 ++
6352 ++//===---------------------------------------------------------------------===//
6353 ++// Flow and Program control Instructions
6354 +//===---------------------------------------------------------------------===//
6355 +let isTerminator=1 in {
6356 + def SWITCH : ILFormat< (outs), (ins GPRI32:$src),
6357 @@ -17045,6 +18080,18 @@ index 0000000..64bab18
6358 + (SGE R600_Reg32:$src1, R600_Reg32:$src0)
6359 +>;
6360 +
6361 ++// SETGT_DX10 reverse args
6362 ++def : Pat <
6363 ++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, COND_LT),
6364 ++ (SETGT_DX10 R600_Reg32:$src1, R600_Reg32:$src0)
6365 ++>;
6366 ++
6367 ++// SETGE_DX10 reverse args
6368 ++def : Pat <
6369 ++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, COND_LE),
6370 ++ (SETGE_DX10 R600_Reg32:$src1, R600_Reg32:$src0)
6371 ++>;
6372 ++
6373 +// SETGT_INT reverse args
6374 +def : Pat <
6375 + (selectcc (i32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETLT),
6376 @@ -17083,31 +18130,43 @@ index 0000000..64bab18
6377 + (SETE R600_Reg32:$src0, R600_Reg32:$src1)
6378 +>;
6379 +
6380 ++//SETE_DX10 - 'true if ordered'
6381 ++def : Pat <
6382 ++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETO),
6383 ++ (SETE_DX10 R600_Reg32:$src0, R600_Reg32:$src1)
6384 ++>;
6385 ++
6386 +//SNE - 'true if unordered'
6387 +def : Pat <
6388 + (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, FP_ONE, FP_ZERO, SETUO),
6389 + (SNE R600_Reg32:$src0, R600_Reg32:$src1)
6390 +>;
6391 +
6392 -+def : Extract_Element <f32, v4f32, R600_Reg128, 0, sel_x>;
6393 -+def : Extract_Element <f32, v4f32, R600_Reg128, 1, sel_y>;
6394 -+def : Extract_Element <f32, v4f32, R600_Reg128, 2, sel_z>;
6395 -+def : Extract_Element <f32, v4f32, R600_Reg128, 3, sel_w>;
6396 ++//SETNE_DX10 - 'true if unordered'
6397 ++def : Pat <
6398 ++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETUO),
6399 ++ (SETNE_DX10 R600_Reg32:$src0, R600_Reg32:$src1)
6400 ++>;
6401 +
6402 -+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 0, sel_x>;
6403 -+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 1, sel_y>;
6404 -+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 2, sel_z>;
6405 -+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 3, sel_w>;
6406 ++def : Extract_Element <f32, v4f32, R600_Reg128, 0, sub0>;
6407 ++def : Extract_Element <f32, v4f32, R600_Reg128, 1, sub1>;
6408 ++def : Extract_Element <f32, v4f32, R600_Reg128, 2, sub2>;
6409 ++def : Extract_Element <f32, v4f32, R600_Reg128, 3, sub3>;
6410 +
6411 -+def : Extract_Element <i32, v4i32, R600_Reg128, 0, sel_x>;
6412 -+def : Extract_Element <i32, v4i32, R600_Reg128, 1, sel_y>;
6413 -+def : Extract_Element <i32, v4i32, R600_Reg128, 2, sel_z>;
6414 -+def : Extract_Element <i32, v4i32, R600_Reg128, 3, sel_w>;
6415 ++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 0, sub0>;
6416 ++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 1, sub1>;
6417 ++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 2, sub2>;
6418 ++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 3, sub3>;
6419 +
6420 -+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 0, sel_x>;
6421 -+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 1, sel_y>;
6422 -+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 2, sel_z>;
6423 -+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 3, sel_w>;
6424 ++def : Extract_Element <i32, v4i32, R600_Reg128, 0, sub0>;
6425 ++def : Extract_Element <i32, v4i32, R600_Reg128, 1, sub1>;
6426 ++def : Extract_Element <i32, v4i32, R600_Reg128, 2, sub2>;
6427 ++def : Extract_Element <i32, v4i32, R600_Reg128, 3, sub3>;
6428 ++
6429 ++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 0, sub0>;
6430 ++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 1, sub1>;
6431 ++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 2, sub2>;
6432 ++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 3, sub3>;
6433 +
6434 +def : Vector_Build <v4f32, R600_Reg128, f32, R600_Reg32>;
6435 +def : Vector_Build <v4i32, R600_Reg128, i32, R600_Reg32>;
6436 @@ -17125,10 +18184,10 @@ index 0000000..64bab18
6437 +} // End isR600toCayman Predicate
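
The *_DX10 comparisons introduced above return an all-ones integer mask (-1) or 0 rather than the 1.0f/0.0f of the plain SET* ops, which is what D3D10-style shader code expects. A scalar sketch of SETGT_DX10's semantics, mirroring its selectcc pattern (illustrative only, not part of the patch):

    #include <cstdint>

    // SETGT_DX10: float compare, integer mask result.
    static int32_t setgt_dx10(float A, float B) {
      return A > B ? -1 : 0; // -1 == 0xFFFFFFFF, directly usable as a bitmask
    }
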
6438 diff --git a/lib/Target/R600/R600Intrinsics.td b/lib/Target/R600/R600Intrinsics.td
6439 new file mode 100644
6440 -index 0000000..3825bc4
6441 +index 0000000..6046f0d
6442 --- /dev/null
6443 +++ b/lib/Target/R600/R600Intrinsics.td
6444 -@@ -0,0 +1,32 @@
6445 +@@ -0,0 +1,57 @@
6446 +//===-- R600Intrinsics.td - R600 Intrinsic defs --------*- tablegen -*-----===//
6447 +//
6448 +// The LLVM Compiler Infrastructure
6449 @@ -17143,30 +18202,283 @@ index 0000000..3825bc4
6450 +//===----------------------------------------------------------------------===//
6451 +
6452 +let TargetPrefix = "R600", isTarget = 1 in {
6453 -+ def int_R600_load_input : Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem]>;
6454 -+ def int_R600_load_input_perspective :
6455 -+ Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrReadMem]>;
6456 -+ def int_R600_load_input_constant :
6457 -+ Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrReadMem]>;
6458 -+ def int_R600_load_input_linear :
6459 -+ Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrReadMem]>;
6460 ++ def int_R600_load_input :
6461 ++ Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem]>;
6462 ++ def int_R600_interp_input :
6463 ++ Intrinsic<[llvm_float_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem]>;
6464 ++ def int_R600_load_texbuf :
6465 ++ Intrinsic<[llvm_v4f32_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem]>;
6466 ++ def int_R600_store_swizzle :
6467 ++ Intrinsic<[], [llvm_v4f32_ty, llvm_i32_ty, llvm_i32_ty], []>;
6468 ++
6469 + def int_R600_store_stream_output :
6470 -+ Intrinsic<[], [llvm_float_ty, llvm_i32_ty, llvm_i32_ty], []>;
6471 ++ Intrinsic<[], [llvm_v4f32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty], []>;
6472 + def int_R600_store_pixel_color :
6473 + Intrinsic<[], [llvm_float_ty, llvm_i32_ty], []>;
6474 + def int_R600_store_pixel_depth :
6475 + Intrinsic<[], [llvm_float_ty], []>;
6476 + def int_R600_store_pixel_stencil :
6477 + Intrinsic<[], [llvm_float_ty], []>;
6478 -+ def int_R600_store_pixel_dummy :
6479 -+ Intrinsic<[], [], []>;
6480 ++ def int_R600_store_dummy :
6481 ++ Intrinsic<[], [llvm_i32_ty], []>;
6482 ++}
6483 ++let TargetPrefix = "r600", isTarget = 1 in {
6484 ++
6485 ++class R600ReadPreloadRegisterIntrinsic<string name>
6486 ++ : Intrinsic<[llvm_i32_ty], [], [IntrNoMem]>,
6487 ++ GCCBuiltin<name>;
6488 ++
6489 ++multiclass R600ReadPreloadRegisterIntrinsic_xyz<string prefix> {
6490 ++ def _x : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_x")>;
6491 ++ def _y : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_y")>;
6492 ++ def _z : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_z")>;
6493 ++}
6494 ++
6495 ++defm int_r600_read_global_size : R600ReadPreloadRegisterIntrinsic_xyz <
6496 ++ "__builtin_r600_read_global_size">;
6497 ++defm int_r600_read_local_size : R600ReadPreloadRegisterIntrinsic_xyz <
6498 ++ "__builtin_r600_read_local_size">;
6499 ++defm int_r600_read_ngroups : R600ReadPreloadRegisterIntrinsic_xyz <
6500 ++ "__builtin_r600_read_ngroups">;
6501 ++defm int_r600_read_tgid : R600ReadPreloadRegisterIntrinsic_xyz <
6502 ++ "__builtin_r600_read_tgid">;
6503 ++defm int_r600_read_tidig : R600ReadPreloadRegisterIntrinsic_xyz <
6504 ++ "__builtin_r600_read_tidig">;
6505 ++}
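
The r600 read-preload intrinsics above surface as the listed GCCBuiltins, with the multiclass appending _x/_y/_z per channel. A rough usage sketch, assuming a compiler that exposes these builtins; the helper name and the OpenCL-style id decomposition are illustrative, not part of the patch:

    // Hypothetical helper: global work-item id on the X axis from the
    // preloaded group id, group width and in-group thread id.
    static unsigned global_id_x() {
      return __builtin_r600_read_tgid_x() * __builtin_r600_read_local_size_x() +
             __builtin_r600_read_tidig_x();
    }
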
6506 +diff --git a/lib/Target/R600/R600LowerConstCopy.cpp b/lib/Target/R600/R600LowerConstCopy.cpp
6507 +new file mode 100644
6508 +index 0000000..c8c27a8
6509 +--- /dev/null
6510 ++++ b/lib/Target/R600/R600LowerConstCopy.cpp
6511 +@@ -0,0 +1,222 @@
6512 ++//===-- R600LowerConstCopy.cpp - Propagate ConstCopy / lower them to MOV---===//
6513 ++//
6514 ++// The LLVM Compiler Infrastructure
6515 ++//
6516 ++// This file is distributed under the University of Illinois Open Source
6517 ++// License. See LICENSE.TXT for details.
6518 ++//
6519 ++//===----------------------------------------------------------------------===//
6520 ++//
6521 ++/// \file
6522 ++/// This pass is intended to handle the remaining ConstCopy pseudo MachineInstrs.
6523 ++/// ISel will fold each constant buffer read into a scalar ALU instruction.
6524 ++/// However, it cannot fold them into vector instructions like DOT4 or Cube;
6525 ++/// ISel emits ConstCopy instead. This pass (executed after ExpandingSpecialInstr)
6526 ++/// tries to fold them where possible, and replaces them with a MOV otherwise.
6527 ++//
6528 ++//===----------------------------------------------------------------------===//
6529 ++
6530 ++#include "AMDGPU.h"
6531 ++#include "llvm/CodeGen/MachineFunction.h"
6532 ++#include "llvm/CodeGen/MachineFunctionPass.h"
6533 ++#include "R600InstrInfo.h"
6534 ++#include "llvm/GlobalValue.h"
6535 ++#include "llvm/CodeGen/MachineInstrBuilder.h"
6536 ++
6537 ++namespace llvm {
6538 ++
6539 ++class R600LowerConstCopy : public MachineFunctionPass {
6540 ++private:
6541 ++ static char ID;
6542 ++ const R600InstrInfo *TII;
6543 ++
6544 ++ struct ConstPairs {
6545 ++ unsigned XYPair;
6546 ++ unsigned ZWPair;
6547 ++ };
6548 ++
6549 ++ bool canFoldInBundle(ConstPairs &UsedConst, unsigned ReadConst) const;
6550 ++public:
6551 ++ R600LowerConstCopy(TargetMachine &tm);
6552 ++ virtual bool runOnMachineFunction(MachineFunction &MF);
6553 ++
6554 ++ const char *getPassName() const { return "R600 Eliminate Symbolic Operand"; }
6555 ++};
6556 ++
6557 ++char R600LowerConstCopy::ID = 0;
6558 ++
6559 ++R600LowerConstCopy::R600LowerConstCopy(TargetMachine &tm) :
6560 ++ MachineFunctionPass(ID),
6561 ++ TII (static_cast<const R600InstrInfo *>(tm.getInstrInfo()))
6562 ++{
6563 ++}
6564 ++
6565 ++bool R600LowerConstCopy::canFoldInBundle(ConstPairs &UsedConst,
6566 ++ unsigned ReadConst) const {
6567 ++ unsigned ReadConstChan = ReadConst & 3;
6568 ++ unsigned ReadConstIndex = ReadConst & (~3);
6569 ++ if (ReadConstChan < 2) {
6570 ++ if (!UsedConst.XYPair) {
6571 ++ UsedConst.XYPair = ReadConstIndex;
6572 ++ }
6573 ++ return UsedConst.XYPair == ReadConstIndex;
6574 ++ } else {
6575 ++ if (!UsedConst.ZWPair) {
6576 ++ UsedConst.ZWPair = ReadConstIndex;
6577 ++ }
6578 ++ return UsedConst.ZWPair == ReadConstIndex;
6579 ++ }
6580 ++}
6581 ++
6582 ++static bool isControlFlow(const MachineInstr &MI) {
6583 ++ return (MI.getOpcode() == AMDGPU::IF_PREDICATE_SET) ||
6584 ++ (MI.getOpcode() == AMDGPU::ENDIF) ||
6585 ++ (MI.getOpcode() == AMDGPU::ELSE) ||
6586 ++ (MI.getOpcode() == AMDGPU::WHILELOOP) ||
6587 ++ (MI.getOpcode() == AMDGPU::BREAK);
6588 ++}
6589 ++
6590 ++bool R600LowerConstCopy::runOnMachineFunction(MachineFunction &MF) {
6591 ++
6592 ++ for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
6593 ++ BB != BB_E; ++BB) {
6594 ++ MachineBasicBlock &MBB = *BB;
6595 ++ DenseMap<unsigned, MachineInstr *> RegToConstIndex;
6596 ++ for (MachineBasicBlock::instr_iterator I = MBB.instr_begin(),
6597 ++ E = MBB.instr_end(); I != E;) {
6598 ++
6599 ++ if (I->getOpcode() == AMDGPU::CONST_COPY) {
6600 ++ MachineInstr &MI = *I;
6601 ++ I = llvm::next(I);
6602 ++ unsigned DstReg = MI.getOperand(0).getReg();
6603 ++ DenseMap<unsigned, MachineInstr *>::iterator SrcMI =
6604 ++ RegToConstIndex.find(DstReg);
6605 ++ if (SrcMI != RegToConstIndex.end()) {
6606 ++ SrcMI->second->eraseFromParent();
6607 ++ RegToConstIndex.erase(SrcMI);
6608 ++ }
6609 ++ MachineInstr *NewMI =
6610 ++ TII->buildDefaultInstruction(MBB, &MI, AMDGPU::MOV,
6611 ++ MI.getOperand(0).getReg(), AMDGPU::ALU_CONST);
6612 ++ TII->setImmOperand(NewMI, R600Operands::SRC0_SEL,
6613 ++ MI.getOperand(1).getImm());
6614 ++ RegToConstIndex[DstReg] = NewMI;
6615 ++ MI.eraseFromParent();
6616 ++ continue;
6617 ++ }
6618 ++
6619 ++ std::vector<unsigned> Defs;
6620 ++ // We treat all instructions as bundled because the algorithm that handles
6621 ++ // const read port limitations inside an IG is still valid for single
6622 ++ // instructions.
6623 ++ std::vector<MachineInstr *> Bundle;
6624 ++
6625 ++ if (I->isBundle()) {
6626 ++ unsigned BundleSize = I->getBundleSize();
6627 ++ for (unsigned i = 0; i < BundleSize; i++) {
6628 ++ I = llvm::next(I);
6629 ++ Bundle.push_back(I);
6630 ++ }
6631 ++ } else if (TII->isALUInstr(I->getOpcode())){
6632 ++ Bundle.push_back(I);
6633 ++ } else if (isControlFlow(*I)) {
6634 ++ RegToConstIndex.clear();
6635 ++ I = llvm::next(I);
6636 ++ continue;
6637 ++ } else {
6638 ++ MachineInstr &MI = *I;
6639 ++ for (MachineInstr::mop_iterator MOp = MI.operands_begin(),
6640 ++ MOpE = MI.operands_end(); MOp != MOpE; ++MOp) {
6641 ++ MachineOperand &MO = *MOp;
6642 ++ if (!MO.isReg())
6643 ++ continue;
6644 ++ if (MO.isDef()) {
6645 ++ Defs.push_back(MO.getReg());
6646 ++ } else {
6647 ++ // Either a TEX or an export instruction; prevent erasing the def of a
6648 ++ // used operand.
6649 ++ RegToConstIndex.erase(MO.getReg());
6650 ++ for (MCSubRegIterator SR(MO.getReg(), &TII->getRegisterInfo());
6651 ++ SR.isValid(); ++SR) {
6652 ++ RegToConstIndex.erase(*SR);
6653 ++ }
6654 ++ }
6655 ++ }
6656 ++ }
6657 ++
6658 ++
6659 ++ R600Operands::Ops OpTable[3][2] = {
6660 ++ {R600Operands::SRC0, R600Operands::SRC0_SEL},
6661 ++ {R600Operands::SRC1, R600Operands::SRC1_SEL},
6662 ++ {R600Operands::SRC2, R600Operands::SRC2_SEL},
6663 ++ };
6664 ++
6665 ++ for(std::vector<MachineInstr *>::iterator It = Bundle.begin(),
6666 ++ ItE = Bundle.end(); It != ItE; ++It) {
6667 ++ MachineInstr *MI = *It;
6668 ++ if (TII->isPredicated(MI)) {
6669 ++ // We don't want to erase the previous assignment
6670 ++ RegToConstIndex.erase(MI->getOperand(0).getReg());
6671 ++ } else {
6672 ++ int WriteIDX = TII->getOperandIdx(MI->getOpcode(), R600Operands::WRITE);
6673 ++ if (WriteIDX < 0 || MI->getOperand(WriteIDX).getImm())
6674 ++ Defs.push_back(MI->getOperand(0).getReg());
6675 ++ }
6676 ++ }
6677 ++
6678 ++ ConstPairs CP = {0,0};
6679 ++ for (unsigned SrcOp = 0; SrcOp < 3; SrcOp++) {
6680 ++ for(std::vector<MachineInstr *>::iterator It = Bundle.begin(),
6681 ++ ItE = Bundle.end(); It != ItE; ++It) {
6682 ++ MachineInstr *MI = *It;
6683 ++ int SrcIdx = TII->getOperandIdx(MI->getOpcode(), OpTable[SrcOp][0]);
6684 ++ if (SrcIdx < 0)
6685 ++ continue;
6686 ++ MachineOperand &MO = MI->getOperand(SrcIdx);
6687 ++ DenseMap<unsigned, MachineInstr *>::iterator SrcMI =
6688 ++ RegToConstIndex.find(MO.getReg());
6689 ++ if (SrcMI != RegToConstIndex.end()) {
6690 ++ MachineInstr *CstMov = SrcMI->second;
6691 ++ int ConstMovSel =
6692 ++ TII->getOperandIdx(CstMov->getOpcode(), R600Operands::SRC0_SEL);
6693 ++ unsigned ConstIndex = CstMov->getOperand(ConstMovSel).getImm();
6694 ++ if (MI->isInsideBundle() && canFoldInBundle(CP, ConstIndex)) {
6695 ++ TII->setImmOperand(MI, OpTable[SrcOp][1], ConstIndex);
6696 ++ MI->getOperand(SrcIdx).setReg(AMDGPU::ALU_CONST);
6697 ++ } else {
6698 ++ RegToConstIndex.erase(SrcMI);
6699 ++ }
6700 ++ }
6701 ++ }
6702 ++ }
6703 ++
6704 ++ for (std::vector<unsigned>::iterator It = Defs.begin(), ItE = Defs.end();
6705 ++ It != ItE; ++It) {
6706 ++ DenseMap<unsigned, MachineInstr *>::iterator SrcMI =
6707 ++ RegToConstIndex.find(*It);
6708 ++ if (SrcMI != RegToConstIndex.end()) {
6709 ++ SrcMI->second->eraseFromParent();
6710 ++ RegToConstIndex.erase(SrcMI);
6711 ++ }
6712 ++ }
6713 ++ I = llvm::next(I);
6714 ++ }
6715 ++
6716 ++ if (MBB.succ_empty()) {
6717 ++ for (DenseMap<unsigned, MachineInstr *>::iterator
6718 ++ DI = RegToConstIndex.begin(), DE = RegToConstIndex.end();
6719 ++ DI != DE; ++DI) {
6720 ++ DI->second->eraseFromParent();
6721 ++ }
6722 ++ }
6723 ++ }
6724 ++ return false;
6725 ++}
6726 ++
6727 ++FunctionPass *createR600LowerConstCopy(TargetMachine &tm) {
6728 ++ return new R600LowerConstCopy(tm);
6729 ++}
6730 ++
6731 +}
6732 ++
6733 ++
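
The pairing rule in canFoldInBundle() is the heart of this pass: a const index selects a 4-aligned slot (index & ~3), channels 0-1 must share the XY fetch pair and channels 2-3 the ZW pair, so a bundle can only fold reads that agree on those two slots. A standalone sketch of the same bookkeeping, with a hypothetical test driver (not part of the patch):

    #include <cassert>

    struct ConstPairs { unsigned XYPair = 0, ZWPair = 0; };

    // Same rule as R600LowerConstCopy::canFoldInBundle above.
    static bool canFoldInBundle(ConstPairs &Used, unsigned ReadConst) {
      unsigned Chan = ReadConst & 3, Index = ReadConst & ~3u;
      unsigned &Slot = (Chan < 2) ? Used.XYPair : Used.ZWPair;
      if (!Slot)
        Slot = Index; // first read through this pair claims the slot
      return Slot == Index;
    }

    int main() {
      ConstPairs CP;
      assert(canFoldInBundle(CP, 12));  // slot 12, chan 0: claims XY
      assert(canFoldInBundle(CP, 13));  // slot 12, chan 1: same pair, folds
      assert(!canFoldInBundle(CP, 17)); // slot 16, chan 1: XY already taken
      assert(canFoldInBundle(CP, 18));  // slot 16, chan 2: claims ZW
    }
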
6734 diff --git a/lib/Target/R600/R600MachineFunctionInfo.cpp b/lib/Target/R600/R600MachineFunctionInfo.cpp
6735 new file mode 100644
6736 -index 0000000..4eb5efa
6737 +index 0000000..40aec83
6738 --- /dev/null
6739 +++ b/lib/Target/R600/R600MachineFunctionInfo.cpp
6740 -@@ -0,0 +1,34 @@
6741 +@@ -0,0 +1,18 @@
6742 +//===-- R600MachineFunctionInfo.cpp - R600 Machine Function Info-*- C++ -*-===//
6743 +//
6744 +// The LLVM Compiler Infrastructure
6745 @@ -17182,31 +18494,15 @@ index 0000000..4eb5efa
6746 +using namespace llvm;
6747 +
6748 +R600MachineFunctionInfo::R600MachineFunctionInfo(const MachineFunction &MF)
6749 -+ : MachineFunctionInfo(),
6750 -+ HasLinearInterpolation(false),
6751 -+ HasPerspectiveInterpolation(false) {
6752 ++ : MachineFunctionInfo() {
6753 + memset(Outputs, 0, sizeof(Outputs));
6754 -+ memset(StreamOutputs, 0, sizeof(StreamOutputs));
6755 + }
6756 -+
6757 -+unsigned R600MachineFunctionInfo::GetIJPerspectiveIndex() const {
6758 -+ assert(HasPerspectiveInterpolation);
6759 -+ return 0;
6760 -+}
6761 -+
6762 -+unsigned R600MachineFunctionInfo::GetIJLinearIndex() const {
6763 -+ assert(HasLinearInterpolation);
6764 -+ if (HasPerspectiveInterpolation)
6765 -+ return 1;
6766 -+ else
6767 -+ return 0;
6768 -+}
6769 diff --git a/lib/Target/R600/R600MachineFunctionInfo.h b/lib/Target/R600/R600MachineFunctionInfo.h
6770 new file mode 100644
6771 -index 0000000..e97fb5b
6772 +index 0000000..41e4894
6773 --- /dev/null
6774 +++ b/lib/Target/R600/R600MachineFunctionInfo.h
6775 -@@ -0,0 +1,39 @@
6776 +@@ -0,0 +1,33 @@
6777 +//===-- R600MachineFunctionInfo.h - R600 Machine Function Info ----*- C++ -*-=//
6778 +//
6779 +// The LLVM Compiler Infrastructure
6780 @@ -17222,6 +18518,7 @@ index 0000000..e97fb5b
6781 +#ifndef R600MACHINEFUNCTIONINFO_H
6782 +#define R600MACHINEFUNCTIONINFO_H
6783 +
6784 ++#include "llvm/ADT/BitVector.h"
6785 +#include "llvm/CodeGen/MachineFunction.h"
6786 +#include "llvm/CodeGen/SelectionDAG.h"
6787 +#include <vector>
6788 @@ -17232,15 +18529,8 @@ index 0000000..e97fb5b
6789 +
6790 +public:
6791 + R600MachineFunctionInfo(const MachineFunction &MF);
6792 -+ std::vector<unsigned> ReservedRegs;
6793 ++ std::vector<unsigned> IndirectRegs;
6794 + SDNode *Outputs[16];
6795 -+ SDNode *StreamOutputs[64][4];
6796 -+ bool HasLinearInterpolation;
6797 -+ bool HasPerspectiveInterpolation;
6798 -+
6799 -+ unsigned GetIJLinearIndex() const;
6800 -+ unsigned GetIJPerspectiveIndex() const;
6801 -+
6802 +};
6803 +
6804 +} // End llvm namespace
6805 @@ -17248,10 +18538,10 @@ index 0000000..e97fb5b
6806 +#endif //R600MACHINEFUNCTIONINFO_H
6807 diff --git a/lib/Target/R600/R600RegisterInfo.cpp b/lib/Target/R600/R600RegisterInfo.cpp
6808 new file mode 100644
6809 -index 0000000..a39f83d
6810 +index 0000000..bbd7995
6811 --- /dev/null
6812 +++ b/lib/Target/R600/R600RegisterInfo.cpp
6813 -@@ -0,0 +1,89 @@
6814 +@@ -0,0 +1,99 @@
6815 +//===-- R600RegisterInfo.cpp - R600 Register Information ------------------===//
6816 +//
6817 +// The LLVM Compiler Infrastructure
6818 @@ -17269,6 +18559,7 @@ index 0000000..a39f83d
6819 +#include "R600RegisterInfo.h"
6820 +#include "AMDGPUTargetMachine.h"
6821 +#include "R600Defines.h"
6822 ++#include "R600InstrInfo.h"
6823 +#include "R600MachineFunctionInfo.h"
6824 +
6825 +using namespace llvm;
6826 @@ -17282,7 +18573,6 @@ index 0000000..a39f83d
6827 +
6828 +BitVector R600RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
6829 + BitVector Reserved(getNumRegs());
6830 -+ const R600MachineFunctionInfo * MFI = MF.getInfo<R600MachineFunctionInfo>();
6831 +
6832 + Reserved.set(AMDGPU::ZERO);
6833 + Reserved.set(AMDGPU::HALF);
6834 @@ -17292,21 +18582,30 @@ index 0000000..a39f83d
6835 + Reserved.set(AMDGPU::NEG_ONE);
6836 + Reserved.set(AMDGPU::PV_X);
6837 + Reserved.set(AMDGPU::ALU_LITERAL_X);
6838 ++ Reserved.set(AMDGPU::ALU_CONST);
6839 + Reserved.set(AMDGPU::PREDICATE_BIT);
6840 + Reserved.set(AMDGPU::PRED_SEL_OFF);
6841 + Reserved.set(AMDGPU::PRED_SEL_ZERO);
6842 + Reserved.set(AMDGPU::PRED_SEL_ONE);
6843 +
6844 -+ for (TargetRegisterClass::iterator I = AMDGPU::R600_CReg32RegClass.begin(),
6845 -+ E = AMDGPU::R600_CReg32RegClass.end(); I != E; ++I) {
6846 ++ for (TargetRegisterClass::iterator I = AMDGPU::R600_AddrRegClass.begin(),
6847 ++ E = AMDGPU::R600_AddrRegClass.end(); I != E; ++I) {
6848 + Reserved.set(*I);
6849 + }
6850 +
6851 -+ for (std::vector<unsigned>::const_iterator I = MFI->ReservedRegs.begin(),
6852 -+ E = MFI->ReservedRegs.end(); I != E; ++I) {
6853 ++ for (TargetRegisterClass::iterator I = AMDGPU::TRegMemRegClass.begin(),
6854 ++ E = AMDGPU::TRegMemRegClass.end();
6855 ++ I != E; ++I) {
6856 + Reserved.set(*I);
6857 + }
6858 +
6859 ++ const R600InstrInfo *RII = static_cast<const R600InstrInfo*>(&TII);
6860 ++ std::vector<unsigned> IndirectRegs = RII->getIndirectReservedRegs(MF);
6861 ++ for (std::vector<unsigned>::iterator I = IndirectRegs.begin(),
6862 ++ E = IndirectRegs.end();
6863 ++ I != E; ++I) {
6864 ++ Reserved.set(*I);
6865 ++ }
6866 + return Reserved;
6867 +}
6868 +
6869 @@ -17335,12 +18634,13 @@ index 0000000..a39f83d
6870 +unsigned R600RegisterInfo::getSubRegFromChannel(unsigned Channel) const {
6871 + switch (Channel) {
6872 + default: assert(!"Invalid channel index"); return 0;
6873 -+ case 0: return AMDGPU::sel_x;
6874 -+ case 1: return AMDGPU::sel_y;
6875 -+ case 2: return AMDGPU::sel_z;
6876 -+ case 3: return AMDGPU::sel_w;
6877 ++ case 0: return AMDGPU::sub0;
6878 ++ case 1: return AMDGPU::sub1;
6879 ++ case 2: return AMDGPU::sub2;
6880 ++ case 3: return AMDGPU::sub3;
6881 + }
6882 +}
6883 ++
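
With the sel_* indices replaced by the generic sub0..sub3 above, channel N of a 128-bit register is simply its Nth 32-bit sub-register. A minimal stand-in for the mapping; the real enumerator values are tablegen-generated, and their contiguity is an assumption of this sketch:

    #include <cassert>

    enum SubRegIdx { sub0 = 1, sub1, sub2, sub3 }; // illustrative values only

    static unsigned getSubRegFromChannel(unsigned Channel) {
      assert(Channel < 4 && "Invalid channel index");
      return sub0 + Channel;
    }
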
6884 diff --git a/lib/Target/R600/R600RegisterInfo.h b/lib/Target/R600/R600RegisterInfo.h
6885 new file mode 100644
6886 index 0000000..c170ccb
6887 @@ -17404,10 +18704,10 @@ index 0000000..c170ccb
6888 +#endif // AMDIDSAREGISTERINFO_H_
6889 diff --git a/lib/Target/R600/R600RegisterInfo.td b/lib/Target/R600/R600RegisterInfo.td
6890 new file mode 100644
6891 -index 0000000..d3d6d25
6892 +index 0000000..a7d847a
6893 --- /dev/null
6894 +++ b/lib/Target/R600/R600RegisterInfo.td
6895 -@@ -0,0 +1,107 @@
6896 +@@ -0,0 +1,146 @@
6897 +
6898 +class R600Reg <string name, bits<16> encoding> : Register<name> {
6899 + let Namespace = "AMDGPU";
6900 @@ -17429,7 +18729,7 @@ index 0000000..d3d6d25
6901 +class R600Reg_128<string n, list<Register> subregs, bits<16> encoding> :
6902 + RegisterWithSubRegs<n, subregs> {
6903 + let Namespace = "AMDGPU";
6904 -+ let SubRegIndices = [sel_x, sel_y, sel_z, sel_w];
6905 ++ let SubRegIndices = [sub0, sub1, sub2, sub3];
6906 + let HWEncoding = encoding;
6907 +}
6908 +
6909 @@ -17438,9 +18738,11 @@ index 0000000..d3d6d25
6910 + // 32-bit Temporary Registers
6911 + def T#Index#_#Chan : R600RegWithChan <"T"#Index#"."#Chan, Index, Chan>;
6912 +
6913 -+ // 32-bit Constant Registers (There are more than 128, this the number
6914 -+ // that is currently supported.
6915 -+ def C#Index#_#Chan : R600RegWithChan <"C"#Index#"."#Chan, Index, Chan>;
6916 ++ // Indirect addressing offset registers
6917 ++ def Addr#Index#_#Chan : R600RegWithChan <"T("#Index#" + AR.x)."#Chan,
6918 ++ Index, Chan>;
6919 ++ def TRegMem#Index#_#Chan : R600RegWithChan <"T"#Index#"."#Chan, Index,
6920 ++ Chan>;
6921 + }
6922 + // 128-bit Temporary Registers
6923 + def T#Index#_XYZW : R600Reg_128 <"T"#Index#".XYZW",
6924 @@ -17471,19 +18773,25 @@ index 0000000..d3d6d25
6925 +def PRED_SEL_OFF: R600Reg<"Pred_sel_off", 0>;
6926 +def PRED_SEL_ZERO : R600Reg<"Pred_sel_zero", 2>;
6927 +def PRED_SEL_ONE : R600Reg<"Pred_sel_one", 3>;
6928 ++def AR_X : R600Reg<"AR.x", 0>;
6929 +
6930 +def R600_ArrayBase : RegisterClass <"AMDGPU", [f32, i32], 32,
6931 + (add (sequence "ArrayBase%u", 448, 464))>;
6932 ++// special registers for ALU src operands
6933 ++// const buffer reference, SRCx_SEL contains index
6934 ++def ALU_CONST : R600Reg<"CBuf", 0>;
6935 ++// interpolation param reference, SRCx_SEL contains index
6936 ++def ALU_PARAM : R600Reg<"Param", 0>;
6937 ++
6938 ++let isAllocatable = 0 in {
6939 ++
6940 ++// XXX: Only use the X channel, until we support wider stack widths
6941 ++def R600_Addr : RegisterClass <"AMDGPU", [i32], 127, (add (sequence "Addr%u_X", 0, 127))>;
6942 +
6943 -+def R600_CReg32 : RegisterClass <"AMDGPU", [f32, i32], 32,
6944 -+ (add (interleave
6945 -+ (interleave (sequence "C%u_X", 0, 127),
6946 -+ (sequence "C%u_Z", 0, 127)),
6947 -+ (interleave (sequence "C%u_Y", 0, 127),
6948 -+ (sequence "C%u_W", 0, 127))))>;
6949 ++} // End isAllocatable = 0
6950 +
6951 +def R600_TReg32_X : RegisterClass <"AMDGPU", [f32, i32], 32,
6952 -+ (add (sequence "T%u_X", 0, 127))>;
6953 ++ (add (sequence "T%u_X", 0, 127), AR_X)>;
6954 +
6955 +def R600_TReg32_Y : RegisterClass <"AMDGPU", [f32, i32], 32,
6956 + (add (sequence "T%u_Y", 0, 127))>;
6957 @@ -17495,15 +18803,16 @@ index 0000000..d3d6d25
6958 + (add (sequence "T%u_W", 0, 127))>;
6959 +
6960 +def R600_TReg32 : RegisterClass <"AMDGPU", [f32, i32], 32,
6961 -+ (add (interleave
6962 -+ (interleave R600_TReg32_X, R600_TReg32_Z),
6963 -+ (interleave R600_TReg32_Y, R600_TReg32_W)))>;
6964 ++ (interleave R600_TReg32_X, R600_TReg32_Y,
6965 ++ R600_TReg32_Z, R600_TReg32_W)>;
6966 +
6967 +def R600_Reg32 : RegisterClass <"AMDGPU", [f32, i32], 32, (add
6968 + R600_TReg32,
6969 -+ R600_CReg32,
6970 + R600_ArrayBase,
6971 -+ ZERO, HALF, ONE, ONE_INT, PV_X, ALU_LITERAL_X, NEG_ONE, NEG_HALF)>;
6972 ++ R600_Addr,
6973 ++ ZERO, HALF, ONE, ONE_INT, PV_X, ALU_LITERAL_X, NEG_ONE, NEG_HALF,
6974 ++ ALU_CONST, ALU_PARAM
6975 ++ )>;
6976 +
6977 +def R600_Predicate : RegisterClass <"AMDGPU", [i32], 32, (add
6978 + PRED_SEL_OFF, PRED_SEL_ZERO, PRED_SEL_ONE)>;
6979 @@ -17515,6 +18824,36 @@ index 0000000..d3d6d25
6980 + (add (sequence "T%u_XYZW", 0, 127))> {
6981 + let CopyCost = -1;
6982 +}
6983 ++
6984 ++//===----------------------------------------------------------------------===//
6985 ++// Register classes for indirect addressing
6986 ++//===----------------------------------------------------------------------===//
6987 ++
6988 ++// Super register for all the Indirect Registers. This register class is used
6989 ++// by the REG_SEQUENCE instruction to specify the registers to use for direct
6990 ++// reads / writes which may be written / read by an indirect address.
6991 ++class IndirectSuper<string n, list<Register> subregs> :
6992 ++ RegisterWithSubRegs<n, subregs> {
6993 ++ let Namespace = "AMDGPU";
6994 ++ let SubRegIndices =
6995 ++ [sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7,
6996 ++ sub8, sub9, sub10, sub11, sub12, sub13, sub14, sub15];
6997 ++}
6998 ++
6999 ++def IndirectSuperReg : IndirectSuper<"Indirect",
7000 ++ [TRegMem0_X, TRegMem1_X, TRegMem2_X, TRegMem3_X, TRegMem4_X, TRegMem5_X,
7001 ++ TRegMem6_X, TRegMem7_X, TRegMem8_X, TRegMem9_X, TRegMem10_X, TRegMem11_X,
7002 ++ TRegMem12_X, TRegMem13_X, TRegMem14_X, TRegMem15_X]
7003 ++>;
7004 ++
7005 ++def IndirectReg : RegisterClass<"AMDGPU", [f32, i32], 32, (add IndirectSuperReg)>;
7006 ++
7007 ++// This register class defines the registers that are the storage units for
7008 ++// the "Indirect Addressing" pseudo memory space.
7009 ++// XXX: Only use the X channel, until we support wider stack widths
7010 ++def TRegMem : RegisterClass<"AMDGPU", [f32, i32], 32,
7011 ++ (add (sequence "TRegMem%u_X", 0, 16))
7012 ++>;
7013 diff --git a/lib/Target/R600/R600Schedule.td b/lib/Target/R600/R600Schedule.td
7014 new file mode 100644
7015 index 0000000..7ede181
7016 @@ -18053,10 +19392,10 @@ index 0000000..832e44d
7017 +}
7018 diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
7019 new file mode 100644
7020 -index 0000000..cd6e0e9
7021 +index 0000000..694c045
7022 --- /dev/null
7023 +++ b/lib/Target/R600/SIISelLowering.cpp
7024 -@@ -0,0 +1,512 @@
7025 +@@ -0,0 +1,399 @@
7026 +//===-- SIISelLowering.cpp - SI DAG Lowering Implementation ---------------===//
7027 +//
7028 +// The LLVM Compiler Infrastructure
7029 @@ -18090,16 +19429,16 @@ index 0000000..cd6e0e9
7030 + addRegisterClass(MVT::f32, &AMDGPU::VReg_32RegClass);
7031 + addRegisterClass(MVT::i32, &AMDGPU::VReg_32RegClass);
7032 + addRegisterClass(MVT::i64, &AMDGPU::SReg_64RegClass);
7033 -+ addRegisterClass(MVT::i1, &AMDGPU::SCCRegRegClass);
7034 -+ addRegisterClass(MVT::i1, &AMDGPU::VCCRegRegClass);
7035 ++ addRegisterClass(MVT::i1, &AMDGPU::SReg_64RegClass);
7036 +
7037 -+ addRegisterClass(MVT::v4i32, &AMDGPU::SReg_128RegClass);
7038 -+ addRegisterClass(MVT::v8i32, &AMDGPU::SReg_256RegClass);
7039 ++ addRegisterClass(MVT::v1i32, &AMDGPU::VReg_32RegClass);
7040 ++ addRegisterClass(MVT::v2i32, &AMDGPU::VReg_64RegClass);
7041 ++ addRegisterClass(MVT::v4i32, &AMDGPU::VReg_128RegClass);
7042 ++ addRegisterClass(MVT::v8i32, &AMDGPU::VReg_256RegClass);
7043 ++ addRegisterClass(MVT::v16i32, &AMDGPU::VReg_512RegClass);
7044 +
7045 + computeRegisterProperties();
7046 +
7047 -+ setOperationAction(ISD::AND, MVT::i1, Custom);
7048 -+
7049 + setOperationAction(ISD::ADD, MVT::i64, Legal);
7050 + setOperationAction(ISD::ADD, MVT::i32, Legal);
7051 +
7052 @@ -18125,23 +19464,16 @@ index 0000000..cd6e0e9
7053 + MachineRegisterInfo & MRI = BB->getParent()->getRegInfo();
7054 + MachineBasicBlock::iterator I = MI;
7055 +
7056 -+ if (TII->get(MI->getOpcode()).TSFlags & SIInstrFlags::NEED_WAIT) {
7057 -+ AppendS_WAITCNT(MI, *BB, llvm::next(I));
7058 -+ return BB;
7059 -+ }
7060 -+
7061 + switch (MI->getOpcode()) {
7062 + default:
7063 + return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB);
7064 + case AMDGPU::BRANCH: return BB;
7065 + case AMDGPU::CLAMP_SI:
7066 -+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_MOV_B32_e64))
7067 ++ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
7068 + .addOperand(MI->getOperand(0))
7069 + .addOperand(MI->getOperand(1))
7070 -+ // VSRC1-2 are unused, but we still need to fill all the
7071 -+ // operand slots, so we just reuse the VSRC0 operand
7072 -+ .addOperand(MI->getOperand(1))
7073 -+ .addOperand(MI->getOperand(1))
7074 ++ .addImm(0x80) // SRC1
7075 ++ .addImm(0x80) // SRC2
7076 + .addImm(0) // ABS
7077 + .addImm(1) // CLAMP
7078 + .addImm(0) // OMOD
7079 @@ -18150,13 +19482,11 @@ index 0000000..cd6e0e9
7080 + break;
7081 +
7082 + case AMDGPU::FABS_SI:
7083 -+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_MOV_B32_e64))
7084 ++ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
7085 + .addOperand(MI->getOperand(0))
7086 + .addOperand(MI->getOperand(1))
7087 -+ // VSRC1-2 are unused, but we still need to fill all the
7088 -+ // operand slots, so we just reuse the VSRC0 operand
7089 -+ .addOperand(MI->getOperand(1))
7090 -+ .addOperand(MI->getOperand(1))
7091 ++ .addImm(0x80) // SRC1
7092 ++ .addImm(0x80) // SRC2
7093 + .addImm(1) // ABS
7094 + .addImm(0) // CLAMP
7095 + .addImm(0) // OMOD
7096 @@ -18165,13 +19495,11 @@ index 0000000..cd6e0e9
7097 + break;
7098 +
7099 + case AMDGPU::FNEG_SI:
7100 -+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_MOV_B32_e64))
7101 ++ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
7102 + .addOperand(MI->getOperand(0))
7103 + .addOperand(MI->getOperand(1))
7104 -+ // VSRC1-2 are unused, but we still need to fill all the
7105 -+ // operand slots, so we just reuse the VSRC0 operand
7106 -+ .addOperand(MI->getOperand(1))
7107 -+ .addOperand(MI->getOperand(1))
7108 ++ .addImm(0x80) // SRC1
7109 ++ .addImm(0x80) // SRC2
7110 + .addImm(0) // ABS
7111 + .addImm(0) // CLAMP
7112 + .addImm(0) // OMOD
7113 @@ -18187,29 +19515,13 @@ index 0000000..cd6e0e9
7114 + case AMDGPU::SI_INTERP:
7115 + LowerSI_INTERP(MI, *BB, I, MRI);
7116 + break;
7117 -+ case AMDGPU::SI_INTERP_CONST:
7118 -+ LowerSI_INTERP_CONST(MI, *BB, I, MRI);
7119 -+ break;
7120 -+ case AMDGPU::SI_KIL:
7121 -+ LowerSI_KIL(MI, *BB, I, MRI);
7122 -+ break;
7123 + case AMDGPU::SI_WQM:
7124 + LowerSI_WQM(MI, *BB, I, MRI);
7125 + break;
7126 -+ case AMDGPU::SI_V_CNDLT:
7127 -+ LowerSI_V_CNDLT(MI, *BB, I, MRI);
7128 -+ break;
7129 + }
7130 + return BB;
7131 +}
7132 +
7133 -+void SITargetLowering::AppendS_WAITCNT(MachineInstr *MI, MachineBasicBlock &BB,
7134 -+ MachineBasicBlock::iterator I) const {
7135 -+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_WAITCNT))
7136 -+ .addImm(0);
7137 -+}
7138 -+
7139 -+
7140 +void SITargetLowering::LowerSI_WQM(MachineInstr *MI, MachineBasicBlock &BB,
7141 + MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const {
7142 + BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_WQM_B64), AMDGPU::EXEC)
7143 @@ -18249,57 +19561,6 @@ index 0000000..cd6e0e9
7144 + MI->eraseFromParent();
7145 +}
7146 +
7147 -+void SITargetLowering::LowerSI_INTERP_CONST(MachineInstr *MI,
7148 -+ MachineBasicBlock &BB, MachineBasicBlock::iterator I,
7149 -+ MachineRegisterInfo &MRI) const {
7150 -+ MachineOperand dst = MI->getOperand(0);
7151 -+ MachineOperand attr_chan = MI->getOperand(1);
7152 -+ MachineOperand attr = MI->getOperand(2);
7153 -+ MachineOperand params = MI->getOperand(3);
7154 -+ unsigned M0 = MRI.createVirtualRegister(&AMDGPU::M0RegRegClass);
7155 -+
7156 -+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_MOV_B32), M0)
7157 -+ .addOperand(params);
7158 -+
7159 -+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_INTERP_MOV_F32))
7160 -+ .addOperand(dst)
7161 -+ .addOperand(attr_chan)
7162 -+ .addOperand(attr)
7163 -+ .addReg(M0);
7164 -+
7165 -+ MI->eraseFromParent();
7166 -+}
7167 -+
7168 -+void SITargetLowering::LowerSI_KIL(MachineInstr *MI, MachineBasicBlock &BB,
7169 -+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const {
7170 -+ // Clear this pixel from the exec mask if the operand is negative
7171 -+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_CMPX_LE_F32_e32),
7172 -+ AMDGPU::VCC)
7173 -+ .addReg(AMDGPU::SREG_LIT_0)
7174 -+ .addOperand(MI->getOperand(0));
7175 -+
7176 -+ MI->eraseFromParent();
7177 -+}
7178 -+
7179 -+void SITargetLowering::LowerSI_V_CNDLT(MachineInstr *MI, MachineBasicBlock &BB,
7180 -+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const {
7181 -+ unsigned VCC = MRI.createVirtualRegister(&AMDGPU::SReg_64RegClass);
7182 -+
7183 -+ BuildMI(BB, I, BB.findDebugLoc(I),
7184 -+ TII->get(AMDGPU::V_CMP_GT_F32_e32),
7185 -+ VCC)
7186 -+ .addReg(AMDGPU::SREG_LIT_0)
7187 -+ .addOperand(MI->getOperand(1));
7188 -+
7189 -+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_CNDMASK_B32_e32))
7190 -+ .addOperand(MI->getOperand(0))
7191 -+ .addOperand(MI->getOperand(3))
7192 -+ .addOperand(MI->getOperand(2))
7193 -+ .addReg(VCC);
7194 -+
7195 -+ MI->eraseFromParent();
7196 -+}
7197 -+
7198 +EVT SITargetLowering::getSetCCResultType(EVT VT) const {
7199 + return MVT::i1;
7200 +}
7201 @@ -18314,7 +19575,6 @@ index 0000000..cd6e0e9
7202 + case ISD::BRCOND: return LowerBRCOND(Op, DAG);
7203 + case ISD::LOAD: return LowerLOAD(Op, DAG);
7204 + case ISD::SELECT_CC: return LowerSELECT_CC(Op, DAG);
7205 -+ case ISD::AND: return Loweri1ContextSwitch(Op, DAG, ISD::AND);
7206 + case ISD::INTRINSIC_WO_CHAIN: {
7207 + unsigned IntrinsicID =
7208 + cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
7209 @@ -18331,30 +19591,6 @@ index 0000000..cd6e0e9
7210 + return SDValue();
7211 +}
7212 +
7213 -+/// \brief The function is for lowering i1 operations on the
7214 -+/// VCC register.
7215 -+///
7216 -+/// In the VALU context, VCC is a one bit register, but in the
7217 -+/// SALU context the VCC is a 64-bit register (1-bit per thread). Since only
7218 -+/// the SALU can perform operations on the VCC register, we need to promote
7219 -+/// the operand types from i1 to i64 in order for tablegen to be able to match
7220 -+/// this operation to the correct SALU instruction. We do this promotion by
7221 -+/// wrapping the operands in a CopyToReg node.
7222 -+///
7223 -+SDValue SITargetLowering::Loweri1ContextSwitch(SDValue Op,
7224 -+ SelectionDAG &DAG,
7225 -+ unsigned VCCNode) const {
7226 -+ DebugLoc DL = Op.getDebugLoc();
7227 -+
7228 -+ SDValue OpNode = DAG.getNode(VCCNode, DL, MVT::i64,
7229 -+ DAG.getNode(SIISD::VCC_BITCAST, DL, MVT::i64,
7230 -+ Op.getOperand(0)),
7231 -+ DAG.getNode(SIISD::VCC_BITCAST, DL, MVT::i64,
7232 -+ Op.getOperand(1)));
7233 -+
7234 -+ return DAG.getNode(SIISD::VCC_BITCAST, DL, MVT::i1, OpNode);
7235 -+}
7236 -+
7237 +/// \brief Helper function for LowerBRCOND
7238 +static SDNode *findUser(SDValue Value, unsigned Opcode) {
7239 +
7240 @@ -18559,22 +19795,12 @@ index 0000000..cd6e0e9
7241 + }
7242 + return SDValue();
7243 +}
7244 -+
7245 -+#define NODE_NAME_CASE(node) case SIISD::node: return #node;
7246 -+
7247 -+const char* SITargetLowering::getTargetNodeName(unsigned Opcode) const {
7248 -+ switch (Opcode) {
7249 -+ default: return AMDGPUTargetLowering::getTargetNodeName(Opcode);
7250 -+ NODE_NAME_CASE(VCC_AND)
7251 -+ NODE_NAME_CASE(VCC_BITCAST)
7252 -+ }
7253 -+}
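
The rewritten inserters above funnel CLAMP_SI, FABS_SI and FNEG_SI through V_ADD_F32_e64 with both spare sources set to 0x80, assumed here to be SI's inline-constant encoding of zero, so each pseudo becomes "x + 0.0" with one VOP3 modifier toggled. A rough functional model, simplified so ABS/NEG act on src0 only:

    #include <algorithm>
    #include <cmath>

    static float v_add_f32_e64(float Src0, float Src1, bool Abs, bool Clamp,
                               bool NegSrc0) {
      if (Abs)     Src0 = std::fabs(Src0); // ABS modifier, applied before NEG
      if (NegSrc0) Src0 = -Src0;           // NEG modifier
      float R = Src0 + Src1;
      return Clamp ? std::min(std::max(R, 0.0f), 1.0f) : R; // CLAMP to [0,1]
    }

    // CLAMP_SI x -> v_add_f32_e64(x, 0.0f, false, true,  false)
    // FABS_SI  x -> v_add_f32_e64(x, 0.0f, true,  false, false)
    // FNEG_SI  x -> v_add_f32_e64(x, 0.0f, false, false, true)
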
7254 diff --git a/lib/Target/R600/SIISelLowering.h b/lib/Target/R600/SIISelLowering.h
7255 new file mode 100644
7256 -index 0000000..c088112
7257 +index 0000000..5d048f8
7258 --- /dev/null
7259 +++ b/lib/Target/R600/SIISelLowering.h
7260 -@@ -0,0 +1,62 @@
7261 +@@ -0,0 +1,48 @@
7262 +//===-- SIISelLowering.h - SI DAG Lowering Interface ------------*- C++ -*-===//
7263 +//
7264 +// The LLVM Compiler Infrastructure
7265 @@ -18600,26 +19826,13 @@ index 0000000..c088112
7266 +class SITargetLowering : public AMDGPUTargetLowering {
7267 + const SIInstrInfo * TII;
7268 +
7269 -+ /// Memory reads and writes are syncronized using the S_WAITCNT instruction.
7270 -+ /// This function takes the most conservative approach and inserts an
7271 -+ /// S_WAITCNT instruction after every read and write.
7272 -+ void AppendS_WAITCNT(MachineInstr *MI, MachineBasicBlock &BB,
7273 -+ MachineBasicBlock::iterator I) const;
7274 + void LowerMOV_IMM(MachineInstr *MI, MachineBasicBlock &BB,
7275 + MachineBasicBlock::iterator I, unsigned Opocde) const;
7276 + void LowerSI_INTERP(MachineInstr *MI, MachineBasicBlock &BB,
7277 + MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
7278 -+ void LowerSI_INTERP_CONST(MachineInstr *MI, MachineBasicBlock &BB,
7279 -+ MachineBasicBlock::iterator I, MachineRegisterInfo &MRI) const;
7280 -+ void LowerSI_KIL(MachineInstr *MI, MachineBasicBlock &BB,
7281 -+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
7282 + void LowerSI_WQM(MachineInstr *MI, MachineBasicBlock &BB,
7283 + MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
7284 -+ void LowerSI_V_CNDLT(MachineInstr *MI, MachineBasicBlock &BB,
7285 -+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
7286 +
7287 -+ SDValue Loweri1ContextSwitch(SDValue Op, SelectionDAG &DAG,
7288 -+ unsigned VCCNode) const;
7289 + SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
7290 + SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
7291 + SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;
7292 @@ -18631,18 +19844,376 @@ index 0000000..c088112
7293 + virtual EVT getSetCCResultType(EVT VT) const;
7294 + virtual SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const;
7295 + virtual SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const;
7296 -+ virtual const char* getTargetNodeName(unsigned Opcode) const;
7297 +};
7298 +
7299 +} // End namespace llvm
7300 +
7301 +#endif //SIISELLOWERING_H
7302 +diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp
7303 +new file mode 100644
7304 +index 0000000..24fc929
7305 +--- /dev/null
7306 ++++ b/lib/Target/R600/SIInsertWaits.cpp
7307 +@@ -0,0 +1,353 @@
7308 ++//===-- SIInsertWaits.cpp - Insert wait instructions for memory ops -------===//
7309 ++//
7310 ++// The LLVM Compiler Infrastructure
7311 ++//
7312 ++// This file is distributed under the University of Illinois Open Source
7313 ++// License. See LICENSE.TXT for details.
7314 ++//
7315 ++//===----------------------------------------------------------------------===//
7316 ++//
7317 ++/// \file
7318 ++/// \brief Insert wait instructions for memory reads and writes.
7319 ++///
7320 ++/// Memory reads and writes are issued asynchronously, so we need to insert
7321 ++/// S_WAITCNT instructions when we want to access any of their results or
7322 ++/// overwrite any register that's used asynchronously.
7323 ++//
7324 ++//===----------------------------------------------------------------------===//
7325 ++
7326 ++#include "AMDGPU.h"
7327 ++#include "SIInstrInfo.h"
7328 ++#include "SIMachineFunctionInfo.h"
7329 ++#include "llvm/CodeGen/MachineFunction.h"
7330 ++#include "llvm/CodeGen/MachineFunctionPass.h"
7331 ++#include "llvm/CodeGen/MachineInstrBuilder.h"
7332 ++#include "llvm/CodeGen/MachineRegisterInfo.h"
7333 ++
7334 ++using namespace llvm;
7335 ++
7336 ++namespace {
7337 ++
7338 ++/// \brief One variable for each of the hardware counters
7339 ++typedef union {
7340 ++ struct {
7341 ++ unsigned VM;
7342 ++ unsigned EXP;
7343 ++ unsigned LGKM;
7344 ++ } Named;
7345 ++ unsigned Array[3];
7346 ++
7347 ++} Counters;
7348 ++
7349 ++typedef Counters RegCounters[512];
7350 ++typedef std::pair<unsigned, unsigned> RegInterval;
7351 ++
7352 ++class SIInsertWaits : public MachineFunctionPass {
7353 ++
7354 ++private:
7355 ++ static char ID;
7356 ++ const SIInstrInfo *TII;
7357 ++ const SIRegisterInfo &TRI;
7358 ++ const MachineRegisterInfo *MRI;
7359 ++
7360 ++ /// \brief Constant hardware limits
7361 ++ static const Counters WaitCounts;
7362 ++
7363 ++ /// \brief Constant zero value
7364 ++ static const Counters ZeroCounts;
7365 ++
7366 ++ /// \brief Counter values we have already waited on.
7367 ++ Counters WaitedOn;
7368 ++
7369 ++ /// \brief Counter values for last instruction issued.
7370 ++ Counters LastIssued;
7371 ++
7372 ++ /// \brief Registers used by async instructions.
7373 ++ RegCounters UsedRegs;
7374 ++
7375 ++ /// \brief Registers defined by async instructions.
7376 ++ RegCounters DefinedRegs;
7377 ++
7378 ++ /// \brief Different export instruction types seen since last wait.
7379 ++ unsigned ExpInstrTypesSeen;
7380 ++
7381 ++ /// \brief Get increment/decrement amount for this instruction.
7382 ++ Counters getHwCounts(MachineInstr &MI);
7383 ++
7384 ++ /// \brief Is operand relevant for async execution?
7385 ++ bool isOpRelevant(MachineOperand &Op);
7386 ++
7387 ++ /// \brief Get the register interval an operand affects.
7388 ++ RegInterval getRegInterval(MachineOperand &Op);
7389 ++
7390 ++ /// \brief Handle an instruction's async components
7391 ++ void pushInstruction(MachineInstr &MI);
7392 ++
7393 ++ /// \brief Insert the actual wait instruction
7394 ++ bool insertWait(MachineBasicBlock &MBB,
7395 ++ MachineBasicBlock::iterator I,
7396 ++ const Counters &Counts);
7397 ++
7398 ++ /// \brief Resolve all operand dependencies to counter requirements
7399 ++ Counters handleOperands(MachineInstr &MI);
7400 ++
7401 ++public:
7402 ++ SIInsertWaits(TargetMachine &tm) :
7403 ++ MachineFunctionPass(ID),
7404 ++ TII(static_cast<const SIInstrInfo*>(tm.getInstrInfo())),
7405 ++ TRI(TII->getRegisterInfo()) { }
7406 ++
7407 ++ virtual bool runOnMachineFunction(MachineFunction &MF);
7408 ++
7409 ++ const char *getPassName() const {
7410 ++ return "SI insert wait instructions";
7411 ++ }
7412 ++
7413 ++};
7414 ++
7415 ++} // End anonymous namespace
7416 ++
7417 ++char SIInsertWaits::ID = 0;
7418 ++
7419 ++const Counters SIInsertWaits::WaitCounts = { { 15, 7, 7 } };
7420 ++const Counters SIInsertWaits::ZeroCounts = { { 0, 0, 0 } };
7421 ++
7422 ++FunctionPass *llvm::createSIInsertWaits(TargetMachine &tm) {
7423 ++ return new SIInsertWaits(tm);
7424 ++}
7425 ++
7426 ++Counters SIInsertWaits::getHwCounts(MachineInstr &MI) {
7427 ++
7428 ++ uint64_t TSFlags = TII->get(MI.getOpcode()).TSFlags;
7429 ++ Counters Result;
7430 ++
7431 ++ Result.Named.VM = !!(TSFlags & SIInstrFlags::VM_CNT);
7432 ++
7433 ++ // Only consider stores or EXP for EXP_CNT
7434 ++ Result.Named.EXP = !!(TSFlags & SIInstrFlags::EXP_CNT &&
7435 ++ (MI.getOpcode() == AMDGPU::EXP || !MI.getDesc().mayStore()));
7436 ++
7437 ++ // LGKM may use larger values
7438 ++ if (TSFlags & SIInstrFlags::LGKM_CNT) {
7439 ++
7440 ++ MachineOperand &Op = MI.getOperand(0);
7441 ++ assert(Op.isReg() && "First LGKM operand must be a register!");
7442 ++
7443 ++ unsigned Reg = Op.getReg();
7444 ++ unsigned Size = TRI.getMinimalPhysRegClass(Reg)->getSize();
7445 ++ Result.Named.LGKM = Size > 4 ? 2 : 1;
7446 ++
7447 ++ } else {
7448 ++ Result.Named.LGKM = 0;
7449 ++ }
7450 ++
7451 ++ return Result;
7452 ++}
7453 ++
7454 ++bool SIInsertWaits::isOpRelevant(MachineOperand &Op) {
7455 ++
7456 ++ // Constants are always irrelevant
7457 ++ if (!Op.isReg())
7458 ++ return false;
7459 ++
7460 ++ // Defines are always relevant
7461 ++ if (Op.isDef())
7462 ++ return true;
7463 ++
7464 ++ // For exports all registers are relevant
7465 ++ MachineInstr &MI = *Op.getParent();
7466 ++ if (MI.getOpcode() == AMDGPU::EXP)
7467 ++ return true;
7468 ++
7469 ++ // For stores the stored value is also relevant
7470 ++ if (!MI.getDesc().mayStore())
7471 ++ return false;
7472 ++
7473 ++ for (MachineInstr::mop_iterator I = MI.operands_begin(),
7474 ++ E = MI.operands_end(); I != E; ++I) {
7475 ++
7476 ++ if (I->isReg() && I->isUse())
7477 ++ return Op.isIdenticalTo(*I);
7478 ++ }
7479 ++
7480 ++ return false;
7481 ++}
7482 ++
7483 ++RegInterval SIInsertWaits::getRegInterval(MachineOperand &Op) {
7484 ++
7485 ++ if (!Op.isReg())
7486 ++ return std::make_pair(0, 0);
7487 ++
7488 ++ unsigned Reg = Op.getReg();
7489 ++ unsigned Size = TRI.getMinimalPhysRegClass(Reg)->getSize();
7490 ++
7491 ++ assert(Size >= 4);
7492 ++
7493 ++ RegInterval Result;
7494 ++ Result.first = TRI.getEncodingValue(Reg);
7495 ++ Result.second = Result.first + Size / 4;
7496 ++
7497 ++ return Result;
7498 ++}
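[A minimal sketch (not part of the patch) of the interval arithmetic getRegInterval performs above: a register's hardware encoding names its first 32-bit slot, and its size in bytes divided by four gives how many consecutive slots it spans. The encoding value below is hypothetical.]

  #include <cassert>
  #include <utility>

  typedef std::pair<unsigned, unsigned> RegInterval; // half-open [first, second)

  int main() {
    // A 128-bit register encoded at slot 8 spans slots 8, 9, 10 and 11.
    unsigned Encoding = 8, SizeInBytes = 16;
    RegInterval I(Encoding, Encoding + SizeInBytes / 4);
    assert(I.first == 8 && I.second == 12);
    return 0;
  }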
7499 ++
7500 ++void SIInsertWaits::pushInstruction(MachineInstr &MI) {
7501 ++
7502 ++ // Get the hardware counter increments and sum them up
7503 ++ Counters Increment = getHwCounts(MI);
7504 ++ unsigned Sum = 0;
7505 ++
7506 ++ for (unsigned i = 0; i < 3; ++i) {
7507 ++ LastIssued.Array[i] += Increment.Array[i];
7508 ++ Sum += Increment.Array[i];
7509 ++ }
7510 ++
7511 ++ // If we don't increase anything, then that's it
7512 ++ if (Sum == 0)
7513 ++ return;
7514 ++
7515 ++ // Remember which export instructions we have seen
7516 ++ if (Increment.Named.EXP) {
7517 ++ ExpInstrTypesSeen |= MI.getOpcode() == AMDGPU::EXP ? 1 : 2;
7518 ++ }
7519 ++
7520 ++ for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
7521 ++
7522 ++ MachineOperand &Op = MI.getOperand(i);
7523 ++ if (!isOpRelevant(Op))
7524 ++ continue;
7525 ++
7526 ++ RegInterval Interval = getRegInterval(Op);
7527 ++ for (unsigned j = Interval.first; j < Interval.second; ++j) {
7528 ++
7529 ++ // Remember which registers we define
7530 ++ if (Op.isDef())
7531 ++ DefinedRegs[j] = LastIssued;
7532 ++
7533 ++ // ...and which ones we are using
7534 ++ if (Op.isUse())
7535 ++ UsedRegs[j] = LastIssued;
7536 ++ }
7537 ++ }
7538 ++}
7539 ++
7540 ++bool SIInsertWaits::insertWait(MachineBasicBlock &MBB,
7541 ++ MachineBasicBlock::iterator I,
7542 ++ const Counters &Required) {
7543 ++
7544 ++ // End of program? No need to wait on anything
7545 ++ if (I != MBB.end() && I->getOpcode() == AMDGPU::S_ENDPGM)
7546 ++ return false;
7547 ++
7548 ++ // Figure out if the async instructions execute in order
7549 ++ bool Ordered[3];
7550 ++
7551 ++ // VM_CNT is always ordered
7552 ++ Ordered[0] = true;
7553 ++
7554 ++ // EXP_CNT is unordered if we have both EXP & VM-writes
7555 ++ Ordered[1] = ExpInstrTypesSeen == 3;
7556 ++
7557 ++ // LGKM_CNT is handled as always unordered. TODO: Handle LDS and GDS
7558 ++ Ordered[2] = false;
7559 ++
7560 ++ // The values we are going to put into the S_WAITCNT instruction
7561 ++ Counters Counts = WaitCounts;
7562 ++
7563 ++ // Do we really need to wait?
7564 ++ bool NeedWait = false;
7565 ++
7566 ++ for (unsigned i = 0; i < 3; ++i) {
7567 ++
7568 ++ if (Required.Array[i] <= WaitedOn.Array[i])
7569 ++ continue;
7570 ++
7571 ++ NeedWait = true;
7572 ++
7573 ++ if (Ordered[i]) {
7574 ++ unsigned Value = LastIssued.Array[i] - Required.Array[i];
7575 ++
7576 ++ // Adjust the value to the real hardware possibilities
7577 ++ Counts.Array[i] = std::min(Value, WaitCounts.Array[i]);
7578 ++
7579 ++ } else
7580 ++ Counts.Array[i] = 0;
7581 ++
7582 ++ // Remember what we have waited on
7583 ++ WaitedOn.Array[i] = LastIssued.Array[i] - Counts.Array[i];
7584 ++ }
7585 ++
7586 ++ if (!NeedWait)
7587 ++ return false;
7588 ++
7589 ++ // Reset EXP_CNT instruction types
7590 ++ if (Counts.Named.EXP == 0)
7591 ++ ExpInstrTypesSeen = 0;
7592 ++
7593 ++ // Build the wait instruction
7594 ++ BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))
7595 ++ .addImm((Counts.Named.VM & 0xF) |
7596 ++ ((Counts.Named.EXP & 0x7) << 4) |
7597 ++ ((Counts.Named.LGKM & 0x7) << 8));
7598 ++
7599 ++ return true;
7600 ++}
7601 ++
7602 ++/// \brief Helper function for handleOperands
7603 ++static void increaseCounters(Counters &Dst, const Counters &Src) {
7604 ++
7605 ++ for (unsigned i = 0; i < 3; ++i)
7606 ++ Dst.Array[i] = std::max(Dst.Array[i], Src.Array[i]);
7607 ++}
7608 ++
7609 ++Counters SIInsertWaits::handleOperands(MachineInstr &MI) {
7610 ++
7611 ++ Counters Result = ZeroCounts;
7612 ++
7613 ++ // For each register affected by this instruction, merge in
7614 ++ // the counter values recorded when it was last used or defined
7615 ++ for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
7616 ++
7617 ++ MachineOperand &Op = MI.getOperand(i);
7618 ++ RegInterval Interval = getRegInterval(Op);
7619 ++ for (unsigned j = Interval.first; j < Interval.second; ++j) {
7620 ++
7621 ++ if (Op.isDef())
7622 ++ increaseCounters(Result, UsedRegs[j]);
7623 ++
7624 ++ if (Op.isUse())
7625 ++ increaseCounters(Result, DefinedRegs[j]);
7626 ++ }
7627 ++ }
7628 ++
7629 ++ return Result;
7630 ++}
7631 ++
7632 ++bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {
7633 ++
7634 ++ bool Changes = false;
7635 ++
7636 ++ MRI = &MF.getRegInfo();
7637 ++
7638 ++ WaitedOn = ZeroCounts;
7639 ++ LastIssued = ZeroCounts;
7640 ++
7641 ++ memset(&UsedRegs, 0, sizeof(UsedRegs));
7642 ++ memset(&DefinedRegs, 0, sizeof(DefinedRegs));
7643 ++
7644 ++ for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
7645 ++ BI != BE; ++BI) {
7646 ++
7647 ++ MachineBasicBlock &MBB = *BI;
7648 ++ for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
7649 ++ I != E; ++I) {
7650 ++
7651 ++ Changes |= insertWait(MBB, I, handleOperands(*I));
7652 ++ pushInstruction(*I);
7653 ++ }
7654 ++
7655 ++ // Wait for everything at the end of the MBB
7656 ++ Changes |= insertWait(MBB, MBB.getFirstTerminator(), LastIssued);
7657 ++ }
7658 ++
7659 ++ return Changes;
7660 ++}
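[A reading aid for insertWait above, a sketch rather than patch code: the three counters are packed into one S_WAITCNT immediate with VM in bits 3:0, EXP in bits 6:4 and LGKM in bits 10:8, which is also why the hardware limits in WaitCounts are { 15, 7, 7 }. The helper name below is made up; the field layout is taken only from the shifts in the BuildMI call.]

  #include <cassert>
  #include <cstdint>

  // Mirrors the immediate built by BuildMI(... S_WAITCNT) above.
  static uint32_t packWaitcnt(uint32_t VM, uint32_t EXP, uint32_t LGKM) {
    return (VM & 0xF) | ((EXP & 0x7) << 4) | ((LGKM & 0x7) << 8);
  }

  int main() {
    // "Wait until the VM counter drains to 0; don't wait on EXP or LGKM."
    assert(packWaitcnt(0, 7, 7) == 0x7F0);
    return 0;
  }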
7661 diff --git a/lib/Target/R600/SIInstrFormats.td b/lib/Target/R600/SIInstrFormats.td
7662 new file mode 100644
7663 -index 0000000..aea3b5a
7664 +index 0000000..40e37aa
7665 --- /dev/null
7666 +++ b/lib/Target/R600/SIInstrFormats.td
7667 -@@ -0,0 +1,146 @@
7668 +@@ -0,0 +1,188 @@
7669 +//===-- SIInstrFormats.td - SI Instruction Formats ------------------------===//
7670 +//
7671 +// The LLVM Compiler Infrastructure
7672 @@ -18666,40 +20237,23 @@ index 0000000..aea3b5a
7673 +//
7674 +//===----------------------------------------------------------------------===//
7675 +
7676 -+class VOP3b_2IN <bits<9> op, string opName, RegisterClass dstClass,
7677 -+ RegisterClass src0Class, RegisterClass src1Class,
7678 -+ list<dag> pattern>
7679 -+ : VOP3b <op, (outs dstClass:$vdst),
7680 -+ (ins src0Class:$src0, src1Class:$src1, InstFlag:$src2, InstFlag:$sdst,
7681 -+ InstFlag:$omod, InstFlag:$neg),
7682 -+ opName, pattern
7683 -+>;
7684 -+
7685 -+
7686 -+class VOP3_1_32 <bits<9> op, string opName, list<dag> pattern>
7687 -+ : VOP3b_2IN <op, opName, SReg_1, AllReg_32, VReg_32, pattern>;
7688 -+
7689 +class VOP3_32 <bits<9> op, string opName, list<dag> pattern>
7690 -+ : VOP3 <op, (outs VReg_32:$dst), (ins AllReg_32:$src0, VReg_32:$src1, VReg_32:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>;
7691 ++ : VOP3 <op, (outs VReg_32:$dst), (ins VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>;
7692 +
7693 +class VOP3_64 <bits<9> op, string opName, list<dag> pattern>
7694 -+ : VOP3 <op, (outs VReg_64:$dst), (ins AllReg_64:$src0, VReg_64:$src1, VReg_64:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>;
7695 -+
7696 ++ : VOP3 <op, (outs VReg_64:$dst), (ins VSrc_64:$src0, VReg_64:$src1, VReg_64:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>;
7697 +
7698 +class SOP1_32 <bits<8> op, string opName, list<dag> pattern>
7699 -+ : SOP1 <op, (outs SReg_32:$dst), (ins SReg_32:$src0), opName, pattern>;
7700 ++ : SOP1 <op, (outs SReg_32:$dst), (ins SSrc_32:$src0), opName, pattern>;
7701 +
7702 +class SOP1_64 <bits<8> op, string opName, list<dag> pattern>
7703 -+ : SOP1 <op, (outs SReg_64:$dst), (ins SReg_64:$src0), opName, pattern>;
7704 ++ : SOP1 <op, (outs SReg_64:$dst), (ins SSrc_64:$src0), opName, pattern>;
7705 +
7706 +class SOP2_32 <bits<7> op, string opName, list<dag> pattern>
7707 -+ : SOP2 <op, (outs SReg_32:$dst), (ins SReg_32:$src0, SReg_32:$src1), opName, pattern>;
7708 ++ : SOP2 <op, (outs SReg_32:$dst), (ins SSrc_32:$src0, SSrc_32:$src1), opName, pattern>;
7709 +
7710 +class SOP2_64 <bits<7> op, string opName, list<dag> pattern>
7711 -+ : SOP2 <op, (outs SReg_64:$dst), (ins SReg_64:$src0, SReg_64:$src1), opName, pattern>;
7712 -+
7713 -+class SOP2_VCC <bits<7> op, string opName, list<dag> pattern>
7714 -+ : SOP2 <op, (outs SReg_1:$vcc), (ins SReg_64:$src0, SReg_64:$src1), opName, pattern>;
7715 ++ : SOP2 <op, (outs SReg_64:$dst), (ins SSrc_64:$src0, SSrc_64:$src1), opName, pattern>;
7716 +
7717 +class VOP1_Helper <bits<8> op, RegisterClass vrc, RegisterClass arc,
7718 + string opName, list<dag> pattern> :
7719 @@ -18708,7 +20262,7 @@ index 0000000..aea3b5a
7720 + >;
7721 +
7722 +multiclass VOP1_32 <bits<8> op, string opName, list<dag> pattern> {
7723 -+ def _e32: VOP1_Helper <op, VReg_32, AllReg_32, opName, pattern>;
7724 ++ def _e32: VOP1_Helper <op, VReg_32, VSrc_32, opName, pattern>;
7725 + def _e64 : VOP3_32 <{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
7726 + opName, []
7727 + >;
7728 @@ -18716,7 +20270,7 @@ index 0000000..aea3b5a
7729 +
7730 +multiclass VOP1_64 <bits<8> op, string opName, list<dag> pattern> {
7731 +
7732 -+ def _e32 : VOP1_Helper <op, VReg_64, AllReg_64, opName, pattern>;
7733 ++ def _e32 : VOP1_Helper <op, VReg_64, VSrc_64, opName, pattern>;
7734 +
7735 + def _e64 : VOP3_64 <
7736 + {1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
7737 @@ -18732,7 +20286,7 @@ index 0000000..aea3b5a
7738 +
7739 +multiclass VOP2_32 <bits<6> op, string opName, list<dag> pattern> {
7740 +
7741 -+ def _e32 : VOP2_Helper <op, VReg_32, AllReg_32, opName, pattern>;
7742 ++ def _e32 : VOP2_Helper <op, VReg_32, VSrc_32, opName, pattern>;
7743 +
7744 + def _e64 : VOP3_32 <{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
7745 + opName, []
7746 @@ -18740,7 +20294,7 @@ index 0000000..aea3b5a
7747 +}
7748 +
7749 +multiclass VOP2_64 <bits<6> op, string opName, list<dag> pattern> {
7750 -+ def _e32: VOP2_Helper <op, VReg_64, AllReg_64, opName, pattern>;
7751 ++ def _e32: VOP2_Helper <op, VReg_64, VSrc_64, opName, pattern>;
7752 +
7753 + def _e64 : VOP3_64 <
7754 + {1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
7755 @@ -18754,47 +20308,106 @@ index 0000000..aea3b5a
7756 +class SOPK_64 <bits<5> op, string opName, list<dag> pattern>
7757 + : SOPK <op, (outs SReg_64:$dst), (ins i16imm:$src0), opName, pattern>;
7758 +
7759 -+class VOPC_Helper <bits<8> op, RegisterClass vrc, RegisterClass arc,
7760 -+ string opName, list<dag> pattern> :
7761 -+ VOPC <
7762 -+ op, (ins arc:$src0, vrc:$src1), opName, pattern
7763 -+ >;
7764 ++multiclass VOPC_Helper <bits<8> op, RegisterClass vrc, RegisterClass arc,
7765 ++ string opName, list<dag> pattern> {
7766 +
7767 -+multiclass VOPC_32 <bits<9> op, string opName, list<dag> pattern> {
7768 ++ def _e32 : VOPC <op, (ins arc:$src0, vrc:$src1), opName, pattern>;
7769 ++ def _e64 : VOP3 <
7770 ++ {0, op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
7771 ++ (outs SReg_64:$dst),
7772 ++ (ins arc:$src0, vrc:$src1,
7773 ++ InstFlag:$abs, InstFlag:$clamp,
7774 ++ InstFlag:$omod, InstFlag:$neg),
7775 ++ opName, pattern
7776 ++ > {
7777 ++ let SRC2 = 0x80;
7778 ++ }
7779 ++}
7780 +
7781 -+ def _e32 : VOPC_Helper <
7782 -+ {op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
7783 -+ VReg_32, AllReg_32, opName, pattern
7784 -+ >;
7785 ++multiclass VOPC_32 <bits<8> op, string opName, list<dag> pattern>
7786 ++ : VOPC_Helper <op, VReg_32, VSrc_32, opName, pattern>;
7787 +
7788 -+ def _e64 : VOP3_1_32 <
7789 -+ op,
7790 -+ opName, pattern
7791 -+ >;
7792 ++multiclass VOPC_64 <bits<8> op, string opName, list<dag> pattern>
7793 ++ : VOPC_Helper <op, VReg_64, VSrc_64, opName, pattern>;
7794 ++
7795 ++class SOPC_32 <bits<7> op, string opName, list<dag> pattern>
7796 ++ : SOPC <op, (outs SCCReg:$dst), (ins SSrc_32:$src0, SSrc_32:$src1), opName, pattern>;
7797 ++
7798 ++class SOPC_64 <bits<7> op, string opName, list<dag> pattern>
7799 ++ : SOPC <op, (outs SCCReg:$dst), (ins SSrc_64:$src0, SSrc_64:$src1), opName, pattern>;
7800 ++
7801 ++class MIMG_Load_Helper <bits<7> op, string asm> : MIMG <
7802 ++ op,
7803 ++ (outs VReg_128:$vdata),
7804 ++ (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128,
7805 ++ i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_32:$vaddr,
7806 ++ GPR4Align<SReg_256>:$srsrc, GPR4Align<SReg_128>:$ssamp),
7807 ++ asm,
7808 ++ []> {
7809 ++ let mayLoad = 1;
7810 ++ let mayStore = 0;
7811 +}
7812 +
7813 -+multiclass VOPC_64 <bits<8> op, string opName, list<dag> pattern> {
7814 ++class MTBUF_Store_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
7815 ++ op,
7816 ++ (outs),
7817 ++ (ins regClass:$vdata, i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc,
7818 ++ i1imm:$addr64, i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr,
7819 ++ GPR4Align<SReg_128>:$srsrc, i1imm:$slc, i1imm:$tfe, SSrc_32:$soffset),
7820 ++ asm,
7821 ++ []> {
7822 ++ let mayStore = 1;
7823 ++ let mayLoad = 0;
7824 ++}
7825 +
7826 -+ def _e32 : VOPC_Helper <op, VReg_64, AllReg_64, opName, pattern>;
7827 ++class MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass> : MUBUF <
7828 ++ op,
7829 ++ (outs regClass:$dst),
7830 ++ (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
7831 ++ i1imm:$lds, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc, i1imm:$slc,
7832 ++ i1imm:$tfe, SSrc_32:$soffset),
7833 ++ asm,
7834 ++ []> {
7835 ++ let mayLoad = 1;
7836 ++ let mayStore = 0;
7837 ++}
7838 +
7839 -+ def _e64 : VOP3_64 <
7840 -+ {0, op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
7841 -+ opName, []
7842 -+ >;
7843 ++class MTBUF_Load_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
7844 ++ op,
7845 ++ (outs regClass:$dst),
7846 ++ (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
7847 ++ i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc,
7848 ++ i1imm:$slc, i1imm:$tfe, SSrc_32:$soffset),
7849 ++ asm,
7850 ++ []> {
7851 ++ let mayLoad = 1;
7852 ++ let mayStore = 0;
7853 +}
7854 +
7855 -+class SOPC_32 <bits<7> op, string opName, list<dag> pattern>
7856 -+ : SOPC <op, (outs SCCReg:$dst), (ins SReg_32:$src0, SReg_32:$src1), opName, pattern>;
7857 ++multiclass SMRD_Helper <bits<5> op, string asm, RegisterClass dstClass> {
7858 ++ def _IMM : SMRD <
7859 ++ op, 1,
7860 ++ (outs dstClass:$dst),
7861 ++ (ins GPR2Align<SReg_64>:$sbase, i32imm:$offset),
7862 ++ asm,
7863 ++ []
7864 ++ >;
7865 +
7866 -+class SOPC_64 <bits<7> op, string opName, list<dag> pattern>
7867 -+ : SOPC <op, (outs SCCReg:$dst), (ins SReg_64:$src0, SReg_64:$src1), opName, pattern>;
7868 ++ def _SGPR : SMRD <
7869 ++ op, 0,
7870 ++ (outs dstClass:$dst),
7871 ++ (ins GPR2Align<SReg_64>:$sbase, SReg_32:$soff),
7872 ++ asm,
7873 ++ []
7874 ++ >;
7875 ++}
7876 +
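[One pattern in the formats above is worth spelling out: each VOP1/VOP2 multiclass emits the same instruction twice, a compact 32-bit _e32 form and a VOP3 _e64 form whose 9-bit opcode is the _e32 opcode under a fixed prefix ({1, 1, op} for VOP1, {1, 0, 0, op} for VOP2). A sketch of that mapping, derived only from the bit lists above; the function names are hypothetical.]

  #include <cassert>

  static unsigned vop1ToVop3(unsigned op) { return 0x180 | (op & 0x7F); }
  static unsigned vop2ToVop3(unsigned op) { return 0x100 | (op & 0x3F); }

  int main() {
    // V_CNDMASK_B32 is VOP2 opcode 0x0, so its e64 twin is VOP3 opcode
    // 0x100 -- matching V_CNDMASK_B32_e64 in SIInstructions.td below.
    assert(vop2ToVop3(0x0) == 0x100);
    assert(vop1ToVop3(0x2a) == 0x1aa); // e.g. V_RCP_F32
    return 0;
  }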
7877 diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
7878 new file mode 100644
7879 -index 0000000..adcffa8
7880 +index 0000000..1c4b3cf
7881 --- /dev/null
7882 +++ b/lib/Target/R600/SIInstrInfo.cpp
7883 -@@ -0,0 +1,90 @@
7884 +@@ -0,0 +1,143 @@
7885 +//===-- SIInstrInfo.cpp - SI Instruction Information ---------------------===//
7886 +//
7887 +// The LLVM Compiler Infrastructure
7888 @@ -18839,7 +20452,15 @@ index 0000000..adcffa8
7889 + // never be necessary.
7890 + assert(DestReg != AMDGPU::SCC && SrcReg != AMDGPU::SCC);
7891 +
7892 -+ if (AMDGPU::SReg_64RegClass.contains(DestReg)) {
7893 ++ if (AMDGPU::VReg_64RegClass.contains(DestReg)) {
7894 ++ assert(AMDGPU::VReg_64RegClass.contains(SrcReg) ||
7895 ++ AMDGPU::SReg_64RegClass.contains(SrcReg));
7896 ++ BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), RI.getSubReg(DestReg, AMDGPU::sub0))
7897 ++ .addReg(RI.getSubReg(SrcReg, AMDGPU::sub0), getKillRegState(KillSrc))
7898 ++ .addReg(DestReg, RegState::Define | RegState::Implicit);
7899 ++ BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), RI.getSubReg(DestReg, AMDGPU::sub1))
7900 ++ .addReg(RI.getSubReg(SrcReg, AMDGPU::sub1), getKillRegState(KillSrc));
7901 ++ } else if (AMDGPU::SReg_64RegClass.contains(DestReg)) {
7902 + assert(AMDGPU::SReg_64RegClass.contains(SrcReg));
7903 + BuildMI(MBB, MI, DL, get(AMDGPU::S_MOV_B64), DestReg)
7904 + .addReg(SrcReg, getKillRegState(KillSrc));
7905 @@ -18858,8 +20479,8 @@ index 0000000..adcffa8
7906 +
7907 +MachineInstr * SIInstrInfo::getMovImmInstr(MachineFunction *MF, unsigned DstReg,
7908 + int64_t Imm) const {
7909 -+ MachineInstr * MI = MF->CreateMachineInstr(get(AMDGPU::V_MOV_IMM_I32), DebugLoc());
7910 -+ MachineInstrBuilder MIB(*MF, MI);
7911 ++ MachineInstr * MI = MF->CreateMachineInstr(get(AMDGPU::V_MOV_B32_e32), DebugLoc());
7912 ++ MachineInstrBuilder MIB(MI);
7913 + MIB.addReg(DstReg, RegState::Define);
7914 + MIB.addImm(Imm);
7915 +
7916 @@ -18874,9 +20495,6 @@ index 0000000..adcffa8
7917 + case AMDGPU::S_MOV_B64:
7918 + case AMDGPU::V_MOV_B32_e32:
7919 + case AMDGPU::V_MOV_B32_e64:
7920 -+ case AMDGPU::V_MOV_IMM_F32:
7921 -+ case AMDGPU::V_MOV_IMM_I32:
7922 -+ case AMDGPU::S_MOV_IMM_I32:
7923 + return true;
7924 + }
7925 +}
7926 @@ -18885,12 +20503,60 @@ index 0000000..adcffa8
7927 +SIInstrInfo::isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const {
7928 + return RC != &AMDGPU::EXECRegRegClass;
7929 +}
7930 ++
7931 ++//===----------------------------------------------------------------------===//
7932 ++// Indirect addressing callbacks
7933 ++//===----------------------------------------------------------------------===//
7934 ++
7935 ++unsigned SIInstrInfo::calculateIndirectAddress(unsigned RegIndex,
7936 ++ unsigned Channel) const {
7937 ++ assert(Channel == 0);
7938 ++ return RegIndex;
7939 ++}
7940 ++
7941 ++
7942 ++int SIInstrInfo::getIndirectIndexBegin(const MachineFunction &MF) const {
7943 ++ llvm_unreachable("Unimplemented");
7944 ++}
7945 ++
7946 ++int SIInstrInfo::getIndirectIndexEnd(const MachineFunction &MF) const {
7947 ++ llvm_unreachable("Unimplemented");
7948 ++}
7949 ++
7950 ++const TargetRegisterClass *SIInstrInfo::getIndirectAddrStoreRegClass(
7951 ++ unsigned SourceReg) const {
7952 ++ llvm_unreachable("Unimplemented");
7953 ++}
7954 ++
7955 ++const TargetRegisterClass *SIInstrInfo::getIndirectAddrLoadRegClass() const {
7956 ++ llvm_unreachable("Unimplemented");
7957 ++}
7958 ++
7959 ++MachineInstrBuilder SIInstrInfo::buildIndirectWrite(
7960 ++ MachineBasicBlock *MBB,
7961 ++ MachineBasicBlock::iterator I,
7962 ++ unsigned ValueReg,
7963 ++ unsigned Address, unsigned OffsetReg) const {
7964 ++ llvm_unreachable("Unimplemented");
7965 ++}
7966 ++
7967 ++MachineInstrBuilder SIInstrInfo::buildIndirectRead(
7968 ++ MachineBasicBlock *MBB,
7969 ++ MachineBasicBlock::iterator I,
7970 ++ unsigned ValueReg,
7971 ++ unsigned Address, unsigned OffsetReg) const {
7972 ++ llvm_unreachable("Unimplemented");
7973 ++}
7974 ++
7975 ++const TargetRegisterClass *SIInstrInfo::getSuperIndirectRegClass() const {
7976 ++ llvm_unreachable("Unimplemented");
7977 ++}
7978 diff --git a/lib/Target/R600/SIInstrInfo.h b/lib/Target/R600/SIInstrInfo.h
7979 new file mode 100644
7980 -index 0000000..631f6c0
7981 +index 0000000..a65f7b6
7982 --- /dev/null
7983 +++ b/lib/Target/R600/SIInstrInfo.h
7984 -@@ -0,0 +1,62 @@
7985 +@@ -0,0 +1,84 @@
7986 +//===-- SIInstrInfo.h - SI Instruction Info Interface ---------------------===//
7987 +//
7988 +// The LLVM Compiler Infrastructure
7989 @@ -18928,12 +20594,6 @@ index 0000000..631f6c0
7990 + unsigned DestReg, unsigned SrcReg,
7991 + bool KillSrc) const;
7992 +
7993 -+ /// \returns the encoding type of this instruction.
7994 -+ unsigned getEncodingType(const MachineInstr &MI) const;
7995 -+
7996 -+ /// \returns the size of this instructions encoding in number of bytes.
7997 -+ unsigned getEncodingBytes(const MachineInstr &MI) const;
7998 -+
7999 + virtual MachineInstr * getMovImmInstr(MachineFunction *MF, unsigned DstReg,
8000 + int64_t Imm) const;
8001 +
8002 @@ -18941,6 +20601,32 @@ index 0000000..631f6c0
8003 + virtual bool isMov(unsigned Opcode) const;
8004 +
8005 + virtual bool isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const;
8006 ++
8007 ++ virtual int getIndirectIndexBegin(const MachineFunction &MF) const;
8008 ++
8009 ++ virtual int getIndirectIndexEnd(const MachineFunction &MF) const;
8010 ++
8011 ++ virtual unsigned calculateIndirectAddress(unsigned RegIndex,
8012 ++ unsigned Channel) const;
8013 ++
8014 ++ virtual const TargetRegisterClass *getIndirectAddrStoreRegClass(
8015 ++ unsigned SourceReg) const;
8016 ++
8017 ++ virtual const TargetRegisterClass *getIndirectAddrLoadRegClass() const;
8018 ++
8019 ++ virtual MachineInstrBuilder buildIndirectWrite(MachineBasicBlock *MBB,
8020 ++ MachineBasicBlock::iterator I,
8021 ++ unsigned ValueReg,
8022 ++ unsigned Address,
8023 ++ unsigned OffsetReg) const;
8024 ++
8025 ++ virtual MachineInstrBuilder buildIndirectRead(MachineBasicBlock *MBB,
8026 ++ MachineBasicBlock::iterator I,
8027 ++ unsigned ValueReg,
8028 ++ unsigned Address,
8029 ++ unsigned OffsetReg) const;
8030 ++
8031 ++ virtual const TargetRegisterClass *getSuperIndirectRegClass() const;
8032 + };
8033 +
8034 +} // End namespace llvm
8035 @@ -18948,17 +20634,19 @@ index 0000000..631f6c0
8036 +namespace SIInstrFlags {
8037 + enum Flags {
8038 + // First 4 bits are the instruction encoding
8039 -+ NEED_WAIT = 1 << 4
8040 ++ VM_CNT = 1 << 0,
8041 ++ EXP_CNT = 1 << 1,
8042 ++ LGKM_CNT = 1 << 2
8043 + };
8044 +}
8045 +
8046 +#endif //SIINSTRINFO_H
8047 diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
8048 new file mode 100644
8049 -index 0000000..873a451
8050 +index 0000000..8c4e5af
8051 --- /dev/null
8052 +++ b/lib/Target/R600/SIInstrInfo.td
8053 -@@ -0,0 +1,589 @@
8054 +@@ -0,0 +1,465 @@
8055 +//===-- SIInstrInfo.td - SI Instruction Encodings ---------*- tablegen -*--===//
8056 +//
8057 +// The LLVM Compiler Infrastructure
8058 @@ -18969,57 +20657,66 @@ index 0000000..873a451
8059 +//===----------------------------------------------------------------------===//
8060 +
8061 +//===----------------------------------------------------------------------===//
8062 -+// SI DAG Profiles
8063 -+//===----------------------------------------------------------------------===//
8064 -+def SDTVCCBinaryOp : SDTypeProfile<1, 2, [
8065 -+ SDTCisInt<0>, SDTCisInt<1>, SDTCisSameAs<1, 2>
8066 -+]>;
8067 -+
8068 -+//===----------------------------------------------------------------------===//
8069 +// SI DAG Nodes
8070 +//===----------------------------------------------------------------------===//
8071 +
8072 -+// and operation on 64-bit wide vcc
8073 -+def SIsreg1_and : SDNode<"SIISD::VCC_AND", SDTVCCBinaryOp,
8074 -+ [SDNPCommutative, SDNPAssociative]
8075 ++// SMRD takes a 64bit memory address and can only add an 32bit offset
8076 ++def SIadd64bit32bit : SDNode<"ISD::ADD",
8077 ++ SDTypeProfile<1, 2, [SDTCisSameAs<0, 1>, SDTCisVT<0, i64>, SDTCisVT<2, i32>]>
8078 +>;
8079 +
8080 -+// Special bitcast node for sharing VCC register between VALU and SALU
8081 -+def SIsreg1_bitcast : SDNode<"SIISD::VCC_BITCAST",
8082 -+ SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisInt<1>]>
8083 -+>;
8084 ++// Transformation function: extract the lower 32 bits of a 64-bit immediate
8085 ++def LO32 : SDNodeXForm<imm, [{
8086 ++ return CurDAG->getTargetConstant(N->getZExtValue() & 0xffffffff, MVT::i32);
8087 ++}]>;
8088 +
8089 -+// and operation on 64-bit wide vcc
8090 -+def SIvcc_and : SDNode<"SIISD::VCC_AND", SDTVCCBinaryOp,
8091 -+ [SDNPCommutative, SDNPAssociative]
8092 ++// Transformation function: extract the upper 32 bits of a 64-bit immediate
8093 ++def HI32 : SDNodeXForm<imm, [{
8094 ++ return CurDAG->getTargetConstant(N->getZExtValue() >> 32, MVT::i32);
8095 ++}]>;
8096 ++
8097 ++def IMM8bitDWORD : ImmLeaf <
8098 ++ i32, [{
8099 ++ return (Imm & ~0x3FC) == 0;
8100 ++ }], SDNodeXForm<imm, [{
8101 ++ return CurDAG->getTargetConstant(
8102 ++ N->getZExtValue() >> 2, MVT::i32);
8103 ++ }]>
8104 +>;
8105 +
8106 -+// Special bitcast node for sharing VCC register between VALU and SALU
8107 -+def SIvcc_bitcast : SDNode<"SIISD::VCC_BITCAST",
8108 -+ SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisInt<1>]>
8109 ++def IMM12bit : ImmLeaf <
8110 ++ i16,
8111 ++ [{return isUInt<12>(Imm);}]
8112 +>;
8113 +
8114 ++class InlineImm <ValueType vt> : ImmLeaf <vt, [{
8115 ++ return -16 <= Imm && Imm <= 64;
8116 ++}]>;
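[Taken together, the immediate helpers above split a 64-bit constant into halves (LO32/HI32), accept 4-byte-aligned byte offsets that fit in 8 bits once divided by four (IMM8bitDWORD), and match the -16..64 range the hardware can encode inline (InlineImm). A standalone sketch of the same checks, with illustrative values only:]

  #include <cassert>
  #include <cstdint>

  static uint32_t lo32(uint64_t Imm) { return Imm & 0xffffffff; }
  static uint32_t hi32(uint64_t Imm) { return Imm >> 32; }
  static bool isImm8bitDWORD(uint32_t Imm) { return (Imm & ~0x3FCu) == 0; }
  static bool isInlineImm(int64_t Imm) { return -16 <= Imm && Imm <= 64; }

  int main() {
    assert(lo32(0x123456789abcdef0ull) == 0x9abcdef0);
    assert(hi32(0x123456789abcdef0ull) == 0x12345678);
    assert(isImm8bitDWORD(0x100) && (0x100 >> 2) == 0x40); // stored in dwords
    assert(isInlineImm(64) && !isInlineImm(65));
    return 0;
  }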
8117 ++
8118 +class InstSI <dag outs, dag ins, string asm, list<dag> pattern> :
8119 + AMDGPUInst<outs, ins, asm, pattern> {
8120 +
8121 -+ field bits<4> EncodingType = 0;
8122 -+ field bits<1> NeedWait = 0;
8123 -+
8124 -+ let TSFlags{3-0} = EncodingType;
8125 -+ let TSFlags{4} = NeedWait;
8126 ++ field bits<1> VM_CNT = 0;
8127 ++ field bits<1> EXP_CNT = 0;
8128 ++ field bits<1> LGKM_CNT = 0;
8129 +
8130 ++ let TSFlags{0} = VM_CNT;
8131 ++ let TSFlags{1} = EXP_CNT;
8132 ++ let TSFlags{2} = LGKM_CNT;
8133 +}
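[These three bits are the producer side of the SIInsertWaits pass shown earlier: TSFlags{0}, {1} and {2} land exactly where SIInstrFlags::VM_CNT, EXP_CNT and LGKM_CNT in SIInstrInfo.h expect them. A minimal sketch of the round trip, not patch code; the MUBUF example follows the class defined later in this file, which sets VM_CNT = 1 and EXP_CNT = 1.]

  #include <cassert>
  #include <cstdint>

  enum Flags { VM_CNT = 1 << 0, EXP_CNT = 1 << 1, LGKM_CNT = 1 << 2 };

  int main() {
    // What a MUBUF load would record, and what getHwCounts reads back:
    uint64_t TSFlags = VM_CNT | EXP_CNT;
    assert((TSFlags & VM_CNT) && (TSFlags & EXP_CNT) && !(TSFlags & LGKM_CNT));
    return 0;
  }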
8134 +
8135 +class Enc32 <dag outs, dag ins, string asm, list<dag> pattern> :
8136 + InstSI <outs, ins, asm, pattern> {
8137 +
8138 + field bits<32> Inst;
8139 ++ let Size = 4;
8140 +}
8141 +
8142 +class Enc64 <dag outs, dag ins, string asm, list<dag> pattern> :
8143 + InstSI <outs, ins, asm, pattern> {
8144 +
8145 + field bits<64> Inst;
8146 ++ let Size = 8;
8147 +}
8148 +
8149 +class SIOperand <ValueType vt, dag opInfo>: Operand <vt> {
8150 @@ -19027,49 +20724,16 @@ index 0000000..873a451
8151 + let MIOperandInfo = opInfo;
8152 +}
8153 +
8154 -+def IMM16bit : ImmLeaf <
8155 -+ i16,
8156 -+ [{return isInt<16>(Imm);}]
8157 -+>;
8158 -+
8159 -+def IMM8bit : ImmLeaf <
8160 -+ i32,
8161 -+ [{return (int32_t)Imm >= 0 && (int32_t)Imm <= 0xff;}]
8162 -+>;
8163 -+
8164 -+def IMM12bit : ImmLeaf <
8165 -+ i16,
8166 -+ [{return (int16_t)Imm >= 0 && (int16_t)Imm <= 0xfff;}]
8167 -+>;
8168 -+
8169 -+def IMM32bitIn64bit : ImmLeaf <
8170 -+ i64,
8171 -+ [{return isInt<32>(Imm);}]
8172 -+>;
8173 -+
8174 +class GPR4Align <RegisterClass rc> : Operand <vAny> {
8175 + let EncoderMethod = "GPR4AlignEncode";
8176 + let MIOperandInfo = (ops rc:$reg);
8177 +}
8178 +
8179 -+class GPR2Align <RegisterClass rc, ValueType vt> : Operand <vt> {
8180 ++class GPR2Align <RegisterClass rc> : Operand <iPTR> {
8181 + let EncoderMethod = "GPR2AlignEncode";
8182 + let MIOperandInfo = (ops rc:$reg);
8183 +}
8184 +
8185 -+def SMRDmemrr : Operand<iPTR> {
8186 -+ let MIOperandInfo = (ops SReg_64, SReg_32);
8187 -+ let EncoderMethod = "GPR2AlignEncode";
8188 -+}
8189 -+
8190 -+def SMRDmemri : Operand<iPTR> {
8191 -+ let MIOperandInfo = (ops SReg_64, i32imm);
8192 -+ let EncoderMethod = "SMRDmemriEncode";
8193 -+}
8194 -+
8195 -+def ADDR_Reg : ComplexPattern<i64, 2, "SelectADDRReg", [], []>;
8196 -+def ADDR_Offset8 : ComplexPattern<i64, 2, "SelectADDR8BitOffset", [], []>;
8197 -+
8198 +let Uses = [EXEC] in {
8199 +
8200 +def EXP : Enc64<
8201 @@ -19099,10 +20763,8 @@ index 0000000..873a451
8202 + let Inst{47-40} = VSRC1;
8203 + let Inst{55-48} = VSRC2;
8204 + let Inst{63-56} = VSRC3;
8205 -+ let EncodingType = 0; //SIInstrEncodingType::EXP
8206 +
8207 -+ let NeedWait = 1;
8208 -+ let usesCustomInserter = 1;
8209 ++ let EXP_CNT = 1;
8210 +}
8211 +
8212 +class MIMG <bits<7> op, dag outs, dag ins, string asm, list<dag> pattern> :
8213 @@ -19136,10 +20798,8 @@ index 0000000..873a451
8214 + let Inst{52-48} = SRSRC;
8215 + let Inst{57-53} = SSAMP;
8216 +
8217 -+ let EncodingType = 2; //SIInstrEncodingType::MIMG
8218 -+
8219 -+ let NeedWait = 1;
8220 -+ let usesCustomInserter = 1;
8221 ++ let VM_CNT = 1;
8222 ++ let EXP_CNT = 1;
8223 +}
8224 +
8225 +class MTBUF <bits<3> op, dag outs, dag ins, string asm, list<dag> pattern> :
8226 @@ -19174,10 +20834,10 @@ index 0000000..873a451
8227 + let Inst{54} = SLC;
8228 + let Inst{55} = TFE;
8229 + let Inst{63-56} = SOFFSET;
8230 -+ let EncodingType = 3; //SIInstrEncodingType::MTBUF
8231 +
8232 -+ let NeedWait = 1;
8233 -+ let usesCustomInserter = 1;
8234 ++ let VM_CNT = 1;
8235 ++ let EXP_CNT = 1;
8236 ++
8237 + let neverHasSideEffects = 1;
8238 +}
8239 +
8240 @@ -19211,34 +20871,30 @@ index 0000000..873a451
8241 + let Inst{54} = SLC;
8242 + let Inst{55} = TFE;
8243 + let Inst{63-56} = SOFFSET;
8244 -+ let EncodingType = 4; //SIInstrEncodingType::MUBUF
8245 +
8246 -+ let NeedWait = 1;
8247 -+ let usesCustomInserter = 1;
8248 ++ let VM_CNT = 1;
8249 ++ let EXP_CNT = 1;
8250 ++
8251 + let neverHasSideEffects = 1;
8252 +}
8253 +
8254 +} // End Uses = [EXEC]
8255 +
8256 -+class SMRD <bits<5> op, dag outs, dag ins, string asm, list<dag> pattern> :
8257 -+ Enc32<outs, ins, asm, pattern> {
8258 ++class SMRD <bits<5> op, bits<1> imm, dag outs, dag ins, string asm,
8259 ++ list<dag> pattern> : Enc32<outs, ins, asm, pattern> {
8260 +
8261 + bits<7> SDST;
8262 -+ bits<15> PTR;
8263 -+ bits<8> OFFSET = PTR{7-0};
8264 -+ bits<1> IMM = PTR{8};
8265 -+ bits<6> SBASE = PTR{14-9};
8266 ++ bits<6> SBASE;
8267 ++ bits<8> OFFSET;
8268 +
8269 + let Inst{7-0} = OFFSET;
8270 -+ let Inst{8} = IMM;
8271 ++ let Inst{8} = imm;
8272 + let Inst{14-9} = SBASE;
8273 + let Inst{21-15} = SDST;
8274 + let Inst{26-22} = op;
8275 + let Inst{31-27} = 0x18; //encoding
8276 -+ let EncodingType = 5; //SIInstrEncodingType::SMRD
8277 +
8278 -+ let NeedWait = 1;
8279 -+ let usesCustomInserter = 1;
8280 ++ let LGKM_CNT = 1;
8281 +}
8282 +
8283 +class SOP1 <bits<8> op, dag outs, dag ins, string asm, list<dag> pattern> :
8284 @@ -19251,7 +20907,6 @@ index 0000000..873a451
8285 + let Inst{15-8} = op;
8286 + let Inst{22-16} = SDST;
8287 + let Inst{31-23} = 0x17d; //encoding;
8288 -+ let EncodingType = 6; //SIInstrEncodingType::SOP1
8289 +
8290 + let mayLoad = 0;
8291 + let mayStore = 0;
8292 @@ -19270,7 +20925,6 @@ index 0000000..873a451
8293 + let Inst{22-16} = SDST;
8294 + let Inst{29-23} = op;
8295 + let Inst{31-30} = 0x2; // encoding
8296 -+ let EncodingType = 7; // SIInstrEncodingType::SOP2
8297 +
8298 + let mayLoad = 0;
8299 + let mayStore = 0;
8300 @@ -19287,7 +20941,6 @@ index 0000000..873a451
8301 + let Inst{15-8} = SSRC1;
8302 + let Inst{22-16} = op;
8303 + let Inst{31-23} = 0x17e;
8304 -+ let EncodingType = 8; // SIInstrEncodingType::SOPC
8305 +
8306 + let DisableEncoding = "$dst";
8307 + let mayLoad = 0;
8308 @@ -19305,7 +20958,6 @@ index 0000000..873a451
8309 + let Inst{22-16} = SDST;
8310 + let Inst{27-23} = op;
8311 + let Inst{31-28} = 0xb; //encoding
8312 -+ let EncodingType = 9; // SIInstrEncodingType::SOPK
8313 +
8314 + let mayLoad = 0;
8315 + let mayStore = 0;
8316 @@ -19323,7 +20975,6 @@ index 0000000..873a451
8317 + let Inst{15-0} = SIMM16;
8318 + let Inst{22-16} = op;
8319 + let Inst{31-23} = 0x17f; // encoding
8320 -+ let EncodingType = 10; // SIInstrEncodingType::SOPP
8321 +
8322 + let mayLoad = 0;
8323 + let mayStore = 0;
8324 @@ -19346,7 +20997,6 @@ index 0000000..873a451
8325 + let Inst{17-16} = op;
8326 + let Inst{25-18} = VDST;
8327 + let Inst{31-26} = 0x32; // encoding
8328 -+ let EncodingType = 11; // SIInstrEncodingType::VINTRP
8329 +
8330 + let neverHasSideEffects = 1;
8331 + let mayLoad = 1;
8332 @@ -19364,9 +21014,6 @@ index 0000000..873a451
8333 + let Inst{24-17} = VDST;
8334 + let Inst{31-25} = 0x3f; //encoding
8335 +
8336 -+ let EncodingType = 12; // SIInstrEncodingType::VOP1
8337 -+ let PostEncoderMethod = "VOPPostEncode";
8338 -+
8339 + let mayLoad = 0;
8340 + let mayStore = 0;
8341 + let hasSideEffects = 0;
8342 @@ -19385,9 +21032,6 @@ index 0000000..873a451
8343 + let Inst{30-25} = op;
8344 + let Inst{31} = 0x0; //encoding
8345 +
8346 -+ let EncodingType = 13; // SIInstrEncodingType::VOP2
8347 -+ let PostEncoderMethod = "VOPPostEncode";
8348 -+
8349 + let mayLoad = 0;
8350 + let mayStore = 0;
8351 + let hasSideEffects = 0;
8352 @@ -19416,9 +21060,6 @@ index 0000000..873a451
8353 + let Inst{60-59} = OMOD;
8354 + let Inst{63-61} = NEG;
8355 +
8356 -+ let EncodingType = 14; // SIInstrEncodingType::VOP3
8357 -+ let PostEncoderMethod = "VOPPostEncode";
8358 -+
8359 + let mayLoad = 0;
8360 + let mayStore = 0;
8361 + let hasSideEffects = 0;
8362 @@ -19433,127 +21074,50 @@ index 0000000..873a451
8363 + bits<9> SRC2;
8364 + bits<7> SDST;
8365 + bits<2> OMOD;
8366 -+ bits<3> NEG;
8367 -+
8368 -+ let Inst{7-0} = VDST;
8369 -+ let Inst{14-8} = SDST;
8370 -+ let Inst{25-17} = op;
8371 -+ let Inst{31-26} = 0x34; //encoding
8372 -+ let Inst{40-32} = SRC0;
8373 -+ let Inst{49-41} = SRC1;
8374 -+ let Inst{58-50} = SRC2;
8375 -+ let Inst{60-59} = OMOD;
8376 -+ let Inst{63-61} = NEG;
8377 -+
8378 -+ let EncodingType = 14; // SIInstrEncodingType::VOP3
8379 -+ let PostEncoderMethod = "VOPPostEncode";
8380 -+
8381 -+ let mayLoad = 0;
8382 -+ let mayStore = 0;
8383 -+ let hasSideEffects = 0;
8384 -+}
8385 -+
8386 -+class VOPC <bits<8> op, dag ins, string asm, list<dag> pattern> :
8387 -+ Enc32 <(outs VCCReg:$dst), ins, asm, pattern> {
8388 -+
8389 -+ bits<9> SRC0;
8390 -+ bits<8> VSRC1;
8391 -+
8392 -+ let Inst{8-0} = SRC0;
8393 -+ let Inst{16-9} = VSRC1;
8394 -+ let Inst{24-17} = op;
8395 -+ let Inst{31-25} = 0x3e;
8396 -+
8397 -+ let EncodingType = 15; //SIInstrEncodingType::VOPC
8398 -+ let PostEncoderMethod = "VOPPostEncode";
8399 -+ let DisableEncoding = "$dst";
8400 -+ let mayLoad = 0;
8401 -+ let mayStore = 0;
8402 -+ let hasSideEffects = 0;
8403 -+}
8404 -+
8405 -+} // End Uses = [EXEC]
8406 -+
8407 -+class MIMG_Load_Helper <bits<7> op, string asm> : MIMG <
8408 -+ op,
8409 -+ (outs VReg_128:$vdata),
8410 -+ (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128,
8411 -+ i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_128:$vaddr,
8412 -+ GPR4Align<SReg_256>:$srsrc, GPR4Align<SReg_128>:$ssamp),
8413 -+ asm,
8414 -+ []> {
8415 -+ let mayLoad = 1;
8416 -+ let mayStore = 0;
8417 -+}
8418 -+
8419 -+class MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass> : MUBUF <
8420 -+ op,
8421 -+ (outs regClass:$dst),
8422 -+ (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
8423 -+ i1imm:$lds, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc, i1imm:$slc,
8424 -+ i1imm:$tfe, SReg_32:$soffset),
8425 -+ asm,
8426 -+ []> {
8427 -+ let mayLoad = 1;
8428 -+ let mayStore = 0;
8429 -+}
8430 ++ bits<3> NEG;
8431 +
8432 -+class MTBUF_Load_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
8433 -+ op,
8434 -+ (outs regClass:$dst),
8435 -+ (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
8436 -+ i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc,
8437 -+ i1imm:$slc, i1imm:$tfe, SReg_32:$soffset),
8438 -+ asm,
8439 -+ []> {
8440 -+ let mayLoad = 1;
8441 -+ let mayStore = 0;
8442 -+}
8443 ++ let Inst{7-0} = VDST;
8444 ++ let Inst{14-8} = SDST;
8445 ++ let Inst{25-17} = op;
8446 ++ let Inst{31-26} = 0x34; //encoding
8447 ++ let Inst{40-32} = SRC0;
8448 ++ let Inst{49-41} = SRC1;
8449 ++ let Inst{58-50} = SRC2;
8450 ++ let Inst{60-59} = OMOD;
8451 ++ let Inst{63-61} = NEG;
8452 +
8453 -+class MTBUF_Store_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
8454 -+ op,
8455 -+ (outs),
8456 -+ (ins regClass:$vdata, i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc,
8457 -+ i1imm:$addr64, i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr,
8458 -+ GPR4Align<SReg_128>:$srsrc, i1imm:$slc, i1imm:$tfe, SReg_32:$soffset),
8459 -+ asm,
8460 -+ []> {
8461 -+ let mayStore = 1;
8462 + let mayLoad = 0;
8463 ++ let mayStore = 0;
8464 ++ let hasSideEffects = 0;
8465 +}
8466 +
8467 -+multiclass SMRD_Helper <bits<5> op, string asm, RegisterClass dstClass,
8468 -+ ValueType vt> {
8469 -+ def _IMM : SMRD <
8470 -+ op,
8471 -+ (outs dstClass:$dst),
8472 -+ (ins SMRDmemri:$src0),
8473 -+ asm,
8474 -+ [(set (vt dstClass:$dst), (constant_load ADDR_Offset8:$src0))]
8475 -+ >;
8476 ++class VOPC <bits<8> op, dag ins, string asm, list<dag> pattern> :
8477 ++ Enc32 <(outs VCCReg:$dst), ins, asm, pattern> {
8478 +
8479 -+ def _SGPR : SMRD <
8480 -+ op,
8481 -+ (outs dstClass:$dst),
8482 -+ (ins SMRDmemrr:$src0),
8483 -+ asm,
8484 -+ [(set (vt dstClass:$dst), (constant_load ADDR_Reg:$src0))]
8485 -+ >;
8486 -+}
8487 ++ bits<9> SRC0;
8488 ++ bits<8> VSRC1;
8489 +
8490 -+multiclass SMRD_32 <bits<5> op, string asm, RegisterClass dstClass> {
8491 -+ defm _F32 : SMRD_Helper <op, asm, dstClass, f32>;
8492 -+ defm _I32 : SMRD_Helper <op, asm, dstClass, i32>;
8493 ++ let Inst{8-0} = SRC0;
8494 ++ let Inst{16-9} = VSRC1;
8495 ++ let Inst{24-17} = op;
8496 ++ let Inst{31-25} = 0x3e;
8497 ++
8498 ++ let DisableEncoding = "$dst";
8499 ++ let mayLoad = 0;
8500 ++ let mayStore = 0;
8501 ++ let hasSideEffects = 0;
8502 +}
8503 +
8504 ++} // End Uses = [EXEC]
8505 ++
8506 +include "SIInstrFormats.td"
8507 +include "SIInstructions.td"
8508 diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
8509 new file mode 100644
8510 -index 0000000..005be96
8511 +index 0000000..3a9822a
8512 --- /dev/null
8513 +++ b/lib/Target/R600/SIInstructions.td
8514 -@@ -0,0 +1,1351 @@
8515 +@@ -0,0 +1,1462 @@
8516 +//===-- SIInstructions.td - SI Instruction Definitions --------------------===//
8517 +//
8518 +// The LLVM Compiler Infrastructure
8519 @@ -19567,6 +21131,17 @@ index 0000000..005be96
8520 +// that are not yet supported remain commented out.
8521 +//===----------------------------------------------------------------------===//
8522 +
8523 ++class InterpSlots {
8524 ++int P0 = 2;
8525 ++int P10 = 0;
8526 ++int P20 = 1;
8527 ++}
8528 ++def INTERP : InterpSlots;
8529 ++
8530 ++def InterpSlot : Operand<i32> {
8531 ++ let PrintMethod = "printInterpSlot";
8532 ++}
8533 ++
8534 +def isSI : Predicate<"Subtarget.device()"
8535 + "->getGeneration() == AMDGPUDeviceInfo::HD7XXX">;
8536 +
8537 @@ -19675,33 +21250,33 @@ index 0000000..005be96
8538 +defm V_CMP_F_F32 : VOPC_32 <0x00000000, "V_CMP_F_F32", []>;
8539 +defm V_CMP_LT_F32 : VOPC_32 <0x00000001, "V_CMP_LT_F32", []>;
8540 +def : Pat <
8541 -+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_LT)),
8542 -+ (V_CMP_LT_F32_e64 AllReg_32:$src0, VReg_32:$src1)
8543 ++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_LT)),
8544 ++ (V_CMP_LT_F32_e64 VSrc_32:$src0, VReg_32:$src1)
8545 +>;
8546 +defm V_CMP_EQ_F32 : VOPC_32 <0x00000002, "V_CMP_EQ_F32", []>;
8547 +def : Pat <
8548 -+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_EQ)),
8549 -+ (V_CMP_EQ_F32_e64 AllReg_32:$src0, VReg_32:$src1)
8550 ++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_EQ)),
8551 ++ (V_CMP_EQ_F32_e64 VSrc_32:$src0, VReg_32:$src1)
8552 +>;
8553 +defm V_CMP_LE_F32 : VOPC_32 <0x00000003, "V_CMP_LE_F32", []>;
8554 +def : Pat <
8555 -+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_LE)),
8556 -+ (V_CMP_LE_F32_e64 AllReg_32:$src0, VReg_32:$src1)
8557 ++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_LE)),
8558 ++ (V_CMP_LE_F32_e64 VSrc_32:$src0, VReg_32:$src1)
8559 +>;
8560 +defm V_CMP_GT_F32 : VOPC_32 <0x00000004, "V_CMP_GT_F32", []>;
8561 +def : Pat <
8562 -+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_GT)),
8563 -+ (V_CMP_GT_F32_e64 AllReg_32:$src0, VReg_32:$src1)
8564 ++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_GT)),
8565 ++ (V_CMP_GT_F32_e64 VSrc_32:$src0, VReg_32:$src1)
8566 +>;
8567 +defm V_CMP_LG_F32 : VOPC_32 <0x00000005, "V_CMP_LG_F32", []>;
8568 +def : Pat <
8569 -+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_NE)),
8570 -+ (V_CMP_LG_F32_e64 AllReg_32:$src0, VReg_32:$src1)
8571 ++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_NE)),
8572 ++ (V_CMP_LG_F32_e64 VSrc_32:$src0, VReg_32:$src1)
8573 +>;
8574 +defm V_CMP_GE_F32 : VOPC_32 <0x00000006, "V_CMP_GE_F32", []>;
8575 +def : Pat <
8576 -+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_GE)),
8577 -+ (V_CMP_GE_F32_e64 AllReg_32:$src0, VReg_32:$src1)
8578 ++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_GE)),
8579 ++ (V_CMP_GE_F32_e64 VSrc_32:$src0, VReg_32:$src1)
8580 +>;
8581 +defm V_CMP_O_F32 : VOPC_32 <0x00000007, "V_CMP_O_F32", []>;
8582 +defm V_CMP_U_F32 : VOPC_32 <0x00000008, "V_CMP_U_F32", []>;
8583 @@ -19711,8 +21286,8 @@ index 0000000..005be96
8584 +defm V_CMP_NLE_F32 : VOPC_32 <0x0000000c, "V_CMP_NLE_F32", []>;
8585 +defm V_CMP_NEQ_F32 : VOPC_32 <0x0000000d, "V_CMP_NEQ_F32", []>;
8586 +def : Pat <
8587 -+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_NE)),
8588 -+ (V_CMP_NEQ_F32_e64 AllReg_32:$src0, VReg_32:$src1)
8589 ++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_NE)),
8590 ++ (V_CMP_NEQ_F32_e64 VSrc_32:$src0, VReg_32:$src1)
8591 +>;
8592 +defm V_CMP_NLT_F32 : VOPC_32 <0x0000000e, "V_CMP_NLT_F32", []>;
8593 +defm V_CMP_TRU_F32 : VOPC_32 <0x0000000f, "V_CMP_TRU_F32", []>;
8594 @@ -19845,33 +21420,33 @@ index 0000000..005be96
8595 +defm V_CMP_F_I32 : VOPC_32 <0x00000080, "V_CMP_F_I32", []>;
8596 +defm V_CMP_LT_I32 : VOPC_32 <0x00000081, "V_CMP_LT_I32", []>;
8597 +def : Pat <
8598 -+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_LT)),
8599 -+ (V_CMP_LT_I32_e64 AllReg_32:$src0, VReg_32:$src1)
8600 ++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_LT)),
8601 ++ (V_CMP_LT_I32_e64 VSrc_32:$src0, VReg_32:$src1)
8602 +>;
8603 +defm V_CMP_EQ_I32 : VOPC_32 <0x00000082, "V_CMP_EQ_I32", []>;
8604 +def : Pat <
8605 -+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_EQ)),
8606 -+ (V_CMP_EQ_I32_e64 AllReg_32:$src0, VReg_32:$src1)
8607 ++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_EQ)),
8608 ++ (V_CMP_EQ_I32_e64 VSrc_32:$src0, VReg_32:$src1)
8609 +>;
8610 +defm V_CMP_LE_I32 : VOPC_32 <0x00000083, "V_CMP_LE_I32", []>;
8611 +def : Pat <
8612 -+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_LE)),
8613 -+ (V_CMP_LE_I32_e64 AllReg_32:$src0, VReg_32:$src1)
8614 ++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_LE)),
8615 ++ (V_CMP_LE_I32_e64 VSrc_32:$src0, VReg_32:$src1)
8616 +>;
8617 +defm V_CMP_GT_I32 : VOPC_32 <0x00000084, "V_CMP_GT_I32", []>;
8618 +def : Pat <
8619 -+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_GT)),
8620 -+ (V_CMP_GT_I32_e64 AllReg_32:$src0, VReg_32:$src1)
8621 ++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_GT)),
8622 ++ (V_CMP_GT_I32_e64 VSrc_32:$src0, VReg_32:$src1)
8623 +>;
8624 +defm V_CMP_NE_I32 : VOPC_32 <0x00000085, "V_CMP_NE_I32", []>;
8625 +def : Pat <
8626 -+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_NE)),
8627 -+ (V_CMP_NE_I32_e64 AllReg_32:$src0, VReg_32:$src1)
8628 ++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_NE)),
8629 ++ (V_CMP_NE_I32_e64 VSrc_32:$src0, VReg_32:$src1)
8630 +>;
8631 +defm V_CMP_GE_I32 : VOPC_32 <0x00000086, "V_CMP_GE_I32", []>;
8632 +def : Pat <
8633 -+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_GE)),
8634 -+ (V_CMP_GE_I32_e64 AllReg_32:$src0, VReg_32:$src1)
8635 ++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_GE)),
8636 ++ (V_CMP_GE_I32_e64 VSrc_32:$src0, VReg_32:$src1)
8637 +>;
8638 +defm V_CMP_T_I32 : VOPC_32 <0x00000087, "V_CMP_T_I32", []>;
8639 +
8640 @@ -20017,11 +21592,13 @@ index 0000000..005be96
8641 +//def TBUFFER_STORE_FORMAT_XYZ : MTBUF_ <0x00000006, "TBUFFER_STORE_FORMAT_XYZ", []>;
8642 +//def TBUFFER_STORE_FORMAT_XYZW : MTBUF_ <0x00000007, "TBUFFER_STORE_FORMAT_XYZW", []>;
8643 +
8644 -+defm S_LOAD_DWORD : SMRD_32 <0x00000000, "S_LOAD_DWORD", SReg_32>;
8645 ++let mayLoad = 1 in {
8646 ++
8647 ++defm S_LOAD_DWORD : SMRD_Helper <0x00000000, "S_LOAD_DWORD", SReg_32>;
8648 +
8649 +//def S_LOAD_DWORDX2 : SMRD_DWORDX2 <0x00000001, "S_LOAD_DWORDX2", []>;
8650 -+defm S_LOAD_DWORDX4 : SMRD_Helper <0x00000002, "S_LOAD_DWORDX4", SReg_128, v4i32>;
8651 -+defm S_LOAD_DWORDX8 : SMRD_Helper <0x00000003, "S_LOAD_DWORDX8", SReg_256, v8i32>;
8652 ++defm S_LOAD_DWORDX4 : SMRD_Helper <0x00000002, "S_LOAD_DWORDX4", SReg_128>;
8653 ++defm S_LOAD_DWORDX8 : SMRD_Helper <0x00000003, "S_LOAD_DWORDX8", SReg_256>;
8654 +//def S_LOAD_DWORDX16 : SMRD_DWORDX16 <0x00000004, "S_LOAD_DWORDX16", []>;
8655 +//def S_BUFFER_LOAD_DWORD : SMRD_ <0x00000008, "S_BUFFER_LOAD_DWORD", []>;
8656 +//def S_BUFFER_LOAD_DWORDX2 : SMRD_DWORDX2 <0x00000009, "S_BUFFER_LOAD_DWORDX2", []>;
8657 @@ -20029,6 +21606,8 @@ index 0000000..005be96
8658 +//def S_BUFFER_LOAD_DWORDX8 : SMRD_DWORDX8 <0x0000000b, "S_BUFFER_LOAD_DWORDX8", []>;
8659 +//def S_BUFFER_LOAD_DWORDX16 : SMRD_DWORDX16 <0x0000000c, "S_BUFFER_LOAD_DWORDX16", []>;
8660 +
8661 ++} // mayLoad = 1
8662 ++
8663 +//def S_MEMTIME : SMRD_ <0x0000001e, "S_MEMTIME", []>;
8664 +//def S_DCACHE_INV : SMRD_ <0x0000001f, "S_DCACHE_INV", []>;
8665 +//def IMAGE_LOAD : MIMG_NoPattern_ <"IMAGE_LOAD", 0x00000000>;
8666 @@ -20067,12 +21646,12 @@ index 0000000..005be96
8667 +def IMAGE_SAMPLE_B : MIMG_Load_Helper <0x00000025, "IMAGE_SAMPLE_B">;
8668 +//def IMAGE_SAMPLE_B_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_B_CL", 0x00000026>;
8669 +//def IMAGE_SAMPLE_LZ : MIMG_NoPattern_ <"IMAGE_SAMPLE_LZ", 0x00000027>;
8670 -+//def IMAGE_SAMPLE_C : MIMG_NoPattern_ <"IMAGE_SAMPLE_C", 0x00000028>;
8671 ++def IMAGE_SAMPLE_C : MIMG_Load_Helper <0x00000028, "IMAGE_SAMPLE_C">;
8672 +//def IMAGE_SAMPLE_C_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_CL", 0x00000029>;
8673 +//def IMAGE_SAMPLE_C_D : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D", 0x0000002a>;
8674 +//def IMAGE_SAMPLE_C_D_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D_CL", 0x0000002b>;
8675 -+//def IMAGE_SAMPLE_C_L : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_L", 0x0000002c>;
8676 -+//def IMAGE_SAMPLE_C_B : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_B", 0x0000002d>;
8677 ++def IMAGE_SAMPLE_C_L : MIMG_Load_Helper <0x0000002c, "IMAGE_SAMPLE_C_L">;
8678 ++def IMAGE_SAMPLE_C_B : MIMG_Load_Helper <0x0000002d, "IMAGE_SAMPLE_C_B">;
8679 +//def IMAGE_SAMPLE_C_B_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_B_CL", 0x0000002e>;
8680 +//def IMAGE_SAMPLE_C_LZ : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_LZ", 0x0000002f>;
8681 +//def IMAGE_SAMPLE_O : MIMG_NoPattern_ <"IMAGE_SAMPLE_O", 0x00000030>;
8682 @@ -20135,12 +21714,12 @@ index 0000000..005be96
8683 +//defm V_CVT_I32_F64 : VOP1_32 <0x00000003, "V_CVT_I32_F64", []>;
8684 +//defm V_CVT_F64_I32 : VOP1_64 <0x00000004, "V_CVT_F64_I32", []>;
8685 +defm V_CVT_F32_I32 : VOP1_32 <0x00000005, "V_CVT_F32_I32",
8686 -+ [(set VReg_32:$dst, (sint_to_fp AllReg_32:$src0))]
8687 ++ [(set VReg_32:$dst, (sint_to_fp VSrc_32:$src0))]
8688 +>;
8689 +//defm V_CVT_F32_U32 : VOP1_32 <0x00000006, "V_CVT_F32_U32", []>;
8690 +//defm V_CVT_U32_F32 : VOP1_32 <0x00000007, "V_CVT_U32_F32", []>;
8691 +defm V_CVT_I32_F32 : VOP1_32 <0x00000008, "V_CVT_I32_F32",
8692 -+ [(set VReg_32:$dst, (fp_to_sint AllReg_32:$src0))]
8693 ++ [(set (i32 VReg_32:$dst), (fp_to_sint VSrc_32:$src0))]
8694 +>;
8695 +defm V_MOV_FED_B32 : VOP1_32 <0x00000009, "V_MOV_FED_B32", []>;
8696 +////def V_CVT_F16_F32 : VOP1_F16 <0x0000000a, "V_CVT_F16_F32", []>;
8697 @@ -20157,31 +21736,35 @@ index 0000000..005be96
8698 +//defm V_CVT_U32_F64 : VOP1_32 <0x00000015, "V_CVT_U32_F64", []>;
8699 +//defm V_CVT_F64_U32 : VOP1_64 <0x00000016, "V_CVT_F64_U32", []>;
8700 +defm V_FRACT_F32 : VOP1_32 <0x00000020, "V_FRACT_F32",
8701 -+ [(set VReg_32:$dst, (AMDGPUfract AllReg_32:$src0))]
8702 ++ [(set VReg_32:$dst, (AMDGPUfract VSrc_32:$src0))]
8703 +>;
8704 +defm V_TRUNC_F32 : VOP1_32 <0x00000021, "V_TRUNC_F32", []>;
8705 -+defm V_CEIL_F32 : VOP1_32 <0x00000022, "V_CEIL_F32", []>;
8706 ++defm V_CEIL_F32 : VOP1_32 <0x00000022, "V_CEIL_F32",
8707 ++ [(set VReg_32:$dst, (fceil VSrc_32:$src0))]
8708 ++>;
8709 +defm V_RNDNE_F32 : VOP1_32 <0x00000023, "V_RNDNE_F32",
8710 -+ [(set VReg_32:$dst, (frint AllReg_32:$src0))]
8711 ++ [(set VReg_32:$dst, (frint VSrc_32:$src0))]
8712 +>;
8713 +defm V_FLOOR_F32 : VOP1_32 <0x00000024, "V_FLOOR_F32",
8714 -+ [(set VReg_32:$dst, (ffloor AllReg_32:$src0))]
8715 ++ [(set VReg_32:$dst, (ffloor VSrc_32:$src0))]
8716 +>;
8717 +defm V_EXP_F32 : VOP1_32 <0x00000025, "V_EXP_F32",
8718 -+ [(set VReg_32:$dst, (fexp2 AllReg_32:$src0))]
8719 ++ [(set VReg_32:$dst, (fexp2 VSrc_32:$src0))]
8720 +>;
8721 +defm V_LOG_CLAMP_F32 : VOP1_32 <0x00000026, "V_LOG_CLAMP_F32", []>;
8722 -+defm V_LOG_F32 : VOP1_32 <0x00000027, "V_LOG_F32", []>;
8723 ++defm V_LOG_F32 : VOP1_32 <0x00000027, "V_LOG_F32",
8724 ++ [(set VReg_32:$dst, (flog2 VSrc_32:$src0))]
8725 ++>;
8726 +defm V_RCP_CLAMP_F32 : VOP1_32 <0x00000028, "V_RCP_CLAMP_F32", []>;
8727 +defm V_RCP_LEGACY_F32 : VOP1_32 <0x00000029, "V_RCP_LEGACY_F32", []>;
8728 +defm V_RCP_F32 : VOP1_32 <0x0000002a, "V_RCP_F32",
8729 -+ [(set VReg_32:$dst, (fdiv FP_ONE, AllReg_32:$src0))]
8730 ++ [(set VReg_32:$dst, (fdiv FP_ONE, VSrc_32:$src0))]
8731 +>;
8732 +defm V_RCP_IFLAG_F32 : VOP1_32 <0x0000002b, "V_RCP_IFLAG_F32", []>;
8733 +defm V_RSQ_CLAMP_F32 : VOP1_32 <0x0000002c, "V_RSQ_CLAMP_F32", []>;
8734 +defm V_RSQ_LEGACY_F32 : VOP1_32 <
8735 + 0x0000002d, "V_RSQ_LEGACY_F32",
8736 -+ [(set VReg_32:$dst, (int_AMDGPU_rsq AllReg_32:$src0))]
8737 ++ [(set VReg_32:$dst, (int_AMDGPU_rsq VSrc_32:$src0))]
8738 +>;
8739 +defm V_RSQ_F32 : VOP1_32 <0x0000002e, "V_RSQ_F32", []>;
8740 +defm V_RCP_F64 : VOP1_64 <0x0000002f, "V_RCP_F64", []>;
8741 @@ -20231,10 +21814,9 @@ index 0000000..005be96
8742 +def V_INTERP_MOV_F32 : VINTRP <
8743 + 0x00000002,
8744 + (outs VReg_32:$dst),
8745 -+ (ins i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
8746 -+ "V_INTERP_MOV_F32",
8747 ++ (ins InterpSlot:$src0, i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
8748 ++ "V_INTERP_MOV_F32 $dst, $src0, $attr_chan, $attr",
8749 + []> {
8750 -+ let VSRC = 0;
8751 + let DisableEncoding = "$m0";
8752 +}
8753 +
8754 @@ -20314,22 +21896,22 @@ index 0000000..005be96
8755 +//def S_TTRACEDATA : SOPP_ <0x00000016, "S_TTRACEDATA", []>;
8756 +
8757 +def V_CNDMASK_B32_e32 : VOP2 <0x00000000, (outs VReg_32:$dst),
8758 -+ (ins AllReg_32:$src0, VReg_32:$src1, VCCReg:$vcc), "V_CNDMASK_B32_e32",
8759 ++ (ins VSrc_32:$src0, VReg_32:$src1, VCCReg:$vcc), "V_CNDMASK_B32_e32",
8760 + []
8761 +>{
8762 + let DisableEncoding = "$vcc";
8763 +}
8764 +
8765 +def V_CNDMASK_B32_e64 : VOP3 <0x00000100, (outs VReg_32:$dst),
8766 -+ (ins VReg_32:$src0, VReg_32:$src1, SReg_1:$src2, InstFlag:$abs, InstFlag:$clamp, InstFlag:$omod, InstFlag:$neg),
8767 ++ (ins VReg_32:$src0, VReg_32:$src1, SReg_64:$src2, InstFlag:$abs, InstFlag:$clamp, InstFlag:$omod, InstFlag:$neg),
8768 + "V_CNDMASK_B32_e64",
8769 -+ [(set (i32 VReg_32:$dst), (select SReg_1:$src2, VReg_32:$src1, VReg_32:$src0))]
8770 ++ [(set (i32 VReg_32:$dst), (select (i1 SReg_64:$src2), VReg_32:$src1, VReg_32:$src0))]
8771 +>;
8772 +
8773 +//f32 pattern for V_CNDMASK_B32_e64
8774 +def : Pat <
8775 -+ (f32 (select SReg_1:$src2, VReg_32:$src1, VReg_32:$src0)),
8776 -+ (V_CNDMASK_B32_e64 VReg_32:$src0, VReg_32:$src1, SReg_1:$src2)
8777 ++ (f32 (select (i1 SReg_64:$src2), VReg_32:$src1, VReg_32:$src0)),
8778 ++ (V_CNDMASK_B32_e64 VReg_32:$src0, VReg_32:$src1, SReg_64:$src2)
8779 +>;
8780 +
8781 +defm V_READLANE_B32 : VOP2_32 <0x00000001, "V_READLANE_B32", []>;
8782 @@ -20337,35 +21919,35 @@ index 0000000..005be96
8783 +
8784 +defm V_ADD_F32 : VOP2_32 <0x00000003, "V_ADD_F32", []>;
8785 +def : Pat <
8786 -+ (f32 (fadd AllReg_32:$src0, VReg_32:$src1)),
8787 -+ (V_ADD_F32_e32 AllReg_32:$src0, VReg_32:$src1)
8788 ++ (f32 (fadd VSrc_32:$src0, VReg_32:$src1)),
8789 ++ (V_ADD_F32_e32 VSrc_32:$src0, VReg_32:$src1)
8790 +>;
8791 +
8792 +defm V_SUB_F32 : VOP2_32 <0x00000004, "V_SUB_F32", []>;
8793 +def : Pat <
8794 -+ (f32 (fsub AllReg_32:$src0, VReg_32:$src1)),
8795 -+ (V_SUB_F32_e32 AllReg_32:$src0, VReg_32:$src1)
8796 ++ (f32 (fsub VSrc_32:$src0, VReg_32:$src1)),
8797 ++ (V_SUB_F32_e32 VSrc_32:$src0, VReg_32:$src1)
8798 +>;
8799 +defm V_SUBREV_F32 : VOP2_32 <0x00000005, "V_SUBREV_F32", []>;
8800 +defm V_MAC_LEGACY_F32 : VOP2_32 <0x00000006, "V_MAC_LEGACY_F32", []>;
8801 +defm V_MUL_LEGACY_F32 : VOP2_32 <
8802 + 0x00000007, "V_MUL_LEGACY_F32",
8803 -+ [(set VReg_32:$dst, (int_AMDGPU_mul AllReg_32:$src0, VReg_32:$src1))]
8804 ++ [(set VReg_32:$dst, (int_AMDGPU_mul VSrc_32:$src0, VReg_32:$src1))]
8805 +>;
8806 +
8807 +defm V_MUL_F32 : VOP2_32 <0x00000008, "V_MUL_F32",
8808 -+ [(set VReg_32:$dst, (fmul AllReg_32:$src0, VReg_32:$src1))]
8809 ++ [(set VReg_32:$dst, (fmul VSrc_32:$src0, VReg_32:$src1))]
8810 +>;
8811 +//defm V_MUL_I32_I24 : VOP2_32 <0x00000009, "V_MUL_I32_I24", []>;
8812 +//defm V_MUL_HI_I32_I24 : VOP2_32 <0x0000000a, "V_MUL_HI_I32_I24", []>;
8813 +//defm V_MUL_U32_U24 : VOP2_32 <0x0000000b, "V_MUL_U32_U24", []>;
8814 +//defm V_MUL_HI_U32_U24 : VOP2_32 <0x0000000c, "V_MUL_HI_U32_U24", []>;
8815 +defm V_MIN_LEGACY_F32 : VOP2_32 <0x0000000d, "V_MIN_LEGACY_F32",
8816 -+ [(set VReg_32:$dst, (AMDGPUfmin AllReg_32:$src0, VReg_32:$src1))]
8817 ++ [(set VReg_32:$dst, (AMDGPUfmin VSrc_32:$src0, VReg_32:$src1))]
8818 +>;
8819 +
8820 +defm V_MAX_LEGACY_F32 : VOP2_32 <0x0000000e, "V_MAX_LEGACY_F32",
8821 -+ [(set VReg_32:$dst, (AMDGPUfmax AllReg_32:$src0, VReg_32:$src1))]
8822 ++ [(set VReg_32:$dst, (AMDGPUfmax VSrc_32:$src0, VReg_32:$src1))]
8823 +>;
8824 +defm V_MIN_F32 : VOP2_32 <0x0000000f, "V_MIN_F32", []>;
8825 +defm V_MAX_F32 : VOP2_32 <0x00000010, "V_MAX_F32", []>;
8826 @@ -20380,13 +21962,13 @@ index 0000000..005be96
8827 +defm V_LSHL_B32 : VOP2_32 <0x00000019, "V_LSHL_B32", []>;
8828 +defm V_LSHLREV_B32 : VOP2_32 <0x0000001a, "V_LSHLREV_B32", []>;
8829 +defm V_AND_B32 : VOP2_32 <0x0000001b, "V_AND_B32",
8830 -+ [(set VReg_32:$dst, (and AllReg_32:$src0, VReg_32:$src1))]
8831 ++ [(set VReg_32:$dst, (and VSrc_32:$src0, VReg_32:$src1))]
8832 +>;
8833 +defm V_OR_B32 : VOP2_32 <0x0000001c, "V_OR_B32",
8834 -+ [(set VReg_32:$dst, (or AllReg_32:$src0, VReg_32:$src1))]
8835 ++ [(set VReg_32:$dst, (or VSrc_32:$src0, VReg_32:$src1))]
8836 +>;
8837 +defm V_XOR_B32 : VOP2_32 <0x0000001d, "V_XOR_B32",
8838 -+ [(set VReg_32:$dst, (xor AllReg_32:$src0, VReg_32:$src1))]
8839 ++ [(set VReg_32:$dst, (xor VSrc_32:$src0, VReg_32:$src1))]
8840 +>;
8841 +defm V_BFM_B32 : VOP2_32 <0x0000001e, "V_BFM_B32", []>;
8842 +defm V_MAC_F32 : VOP2_32 <0x0000001f, "V_MAC_F32", []>;
8843 @@ -20397,10 +21979,10 @@ index 0000000..005be96
8844 +//defm V_MBCNT_HI_U32_B32 : VOP2_32 <0x00000024, "V_MBCNT_HI_U32_B32", []>;
8845 +let Defs = [VCC] in { // Carry-out goes to VCC
8846 +defm V_ADD_I32 : VOP2_32 <0x00000025, "V_ADD_I32",
8847 -+ [(set VReg_32:$dst, (add (i32 AllReg_32:$src0), (i32 VReg_32:$src1)))]
8848 ++ [(set VReg_32:$dst, (add (i32 VSrc_32:$src0), (i32 VReg_32:$src1)))]
8849 +>;
8850 +defm V_SUB_I32 : VOP2_32 <0x00000026, "V_SUB_I32",
8851 -+ [(set VReg_32:$dst, (sub (i32 AllReg_32:$src0), (i32 VReg_32:$src1)))]
8852 ++ [(set VReg_32:$dst, (sub (i32 VSrc_32:$src0), (i32 VReg_32:$src1)))]
8853 +>;
8854 +} // End Defs = [VCC]
8855 +defm V_SUBREV_I32 : VOP2_32 <0x00000027, "V_SUBREV_I32", []>;
8856 @@ -20412,7 +21994,7 @@ index 0000000..005be96
8857 +////def V_CVT_PKNORM_I16_F32 : VOP2_I16 <0x0000002d, "V_CVT_PKNORM_I16_F32", []>;
8858 +////def V_CVT_PKNORM_U16_F32 : VOP2_U16 <0x0000002e, "V_CVT_PKNORM_U16_F32", []>;
8859 +defm V_CVT_PKRTZ_F16_F32 : VOP2_32 <0x0000002f, "V_CVT_PKRTZ_F16_F32",
8860 -+ [(set VReg_32:$dst, (int_SI_packf16 AllReg_32:$src0, VReg_32:$src1))]
8861 ++ [(set VReg_32:$dst, (int_SI_packf16 VSrc_32:$src0, VReg_32:$src1))]
8862 +>;
8863 +////def V_CVT_PK_U16_U32 : VOP2_U16 <0x00000030, "V_CVT_PK_U16_U32", []>;
8864 +////def V_CVT_PK_I16_I32 : VOP2_I16 <0x00000031, "V_CVT_PK_I16_I32", []>;
8865 @@ -20482,6 +22064,10 @@ index 0000000..005be96
8866 +def V_MUL_LO_U32 : VOP3_32 <0x00000169, "V_MUL_LO_U32", []>;
8867 +def V_MUL_HI_U32 : VOP3_32 <0x0000016a, "V_MUL_HI_U32", []>;
8868 +def V_MUL_LO_I32 : VOP3_32 <0x0000016b, "V_MUL_LO_I32", []>;
8869 ++def : Pat <
8870 ++ (mul VSrc_32:$src0, VReg_32:$src1),
8871 ++ (V_MUL_LO_I32 VSrc_32:$src0, VReg_32:$src1, (IMPLICIT_DEF), 0, 0, 0, 0)
8872 ++>;
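The new pattern above routes plain 32-bit integer multiplies to V_MUL_LO_I32 (with default VOP3 modifiers). A hypothetical input that would exercise it:

define void @mul_i32(i32 addrspace(1)* %out, i32 %a, i32 %b) {
entry:
  %p = mul i32 %a, %b                         ; selected as V_MUL_LO_I32
  store i32 %p, i32 addrspace(1)* %out
  ret void
}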
8873 +def V_MUL_HI_I32 : VOP3_32 <0x0000016c, "V_MUL_HI_I32", []>;
8874 +def V_DIV_SCALE_F32 : VOP3_32 <0x0000016d, "V_DIV_SCALE_F32", []>;
8875 +def V_DIV_SCALE_F64 : VOP3_64 <0x0000016e, "V_DIV_SCALE_F64", []>;
8876 @@ -20519,13 +22105,20 @@ index 0000000..005be96
8877 +def S_AND_B32 : SOP2_32 <0x0000000e, "S_AND_B32", []>;
8878 +
8879 +def S_AND_B64 : SOP2_64 <0x0000000f, "S_AND_B64",
8880 -+ [(set SReg_64:$dst, (and SReg_64:$src0, SReg_64:$src1))]
8881 ++ [(set SReg_64:$dst, (i64 (and SSrc_64:$src0, SSrc_64:$src1)))]
8882 +>;
8883 -+def S_AND_VCC : SOP2_VCC <0x0000000f, "S_AND_B64",
8884 -+ [(set SReg_1:$vcc, (SIvcc_and SReg_64:$src0, SReg_64:$src1))]
8885 ++
8886 ++def : Pat <
8887 ++ (i1 (and SSrc_64:$src0, SSrc_64:$src1)),
8888 ++ (S_AND_B64 SSrc_64:$src0, SSrc_64:$src1)
8889 +>;
8890 ++
8891 +def S_OR_B32 : SOP2_32 <0x00000010, "S_OR_B32", []>;
8892 +def S_OR_B64 : SOP2_64 <0x00000011, "S_OR_B64", []>;
8893 ++def : Pat <
8894 ++ (i1 (or SSrc_64:$src0, SSrc_64:$src1)),
8895 ++ (S_OR_B64 SSrc_64:$src0, SSrc_64:$src1)
8896 ++>;
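With i1 values held in 64-bit SGPRs, logical combinations of conditions now map straight onto S_AND_B64/S_OR_B64 through the two patterns above. A sketch (names illustrative):

define void @and_cond(float addrspace(1)* %out, float %a, float %b, i32 %c) {
entry:
  %c0 = fcmp ogt float %a, 0.000000e+00
  %c1 = icmp sgt i32 %c, 0
  %cond = and i1 %c0, %c1                     ; selected as S_AND_B64 on the condition masks
  %sel = select i1 %cond, float %a, float %b
  store float %sel, float addrspace(1)* %out
  ret void
}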
8897 +def S_XOR_B32 : SOP2_32 <0x00000012, "S_XOR_B32", []>;
8898 +def S_XOR_B64 : SOP2_64 <0x00000013, "S_XOR_B64", []>;
8899 +def S_ANDN2_B32 : SOP2_32 <0x00000014, "S_ANDN2_B32", []>;
8900 @@ -20554,48 +22147,6 @@ index 0000000..005be96
8901 +//def S_CBRANCH_G_FORK : SOP2_ <0x0000002b, "S_CBRANCH_G_FORK", []>;
8902 +def S_ABSDIFF_I32 : SOP2_32 <0x0000002c, "S_ABSDIFF_I32", []>;
8903 +
8904 -+class V_MOV_IMM <Operand immType, SDNode immNode> : InstSI <
8905 -+ (outs VReg_32:$dst),
8906 -+ (ins immType:$src0),
8907 -+ "V_MOV_IMM",
8908 -+ [(set VReg_32:$dst, (immNode:$src0))]
8909 -+>;
8910 -+
8911 -+let isCodeGenOnly = 1, isPseudo = 1 in {
8912 -+
8913 -+def V_MOV_IMM_I32 : V_MOV_IMM<i32imm, imm>;
8914 -+def V_MOV_IMM_F32 : V_MOV_IMM<f32imm, fpimm>;
8915 -+
8916 -+def S_MOV_IMM_I32 : InstSI <
8917 -+ (outs SReg_32:$dst),
8918 -+ (ins i32imm:$src0),
8919 -+ "S_MOV_IMM_I32",
8920 -+ [(set SReg_32:$dst, (imm:$src0))]
8921 -+>;
8922 -+
8923 -+// i64 immediates aren't really supported in hardware, but LLVM will use the i64
8924 -+// type for indices on load and store instructions. The pattern for
8925 -+// S_MOV_IMM_I64 will only match i64 immediates that can fit into 32-bits,
8926 -+// which the hardware can handle.
8927 -+def S_MOV_IMM_I64 : InstSI <
8928 -+ (outs SReg_64:$dst),
8929 -+ (ins i64imm:$src0),
8930 -+ "S_MOV_IMM_I64 $dst, $src0",
8931 -+ [(set SReg_64:$dst, (IMM32bitIn64bit:$src0))]
8932 -+>;
8933 -+
8934 -+} // End isCodeGenOnly, isPseudo = 1
8935 -+
8936 -+class SI_LOAD_LITERAL<Operand ImmType> :
8937 -+ Enc32 <(outs), (ins ImmType:$imm), "LOAD_LITERAL $imm", []> {
8938 -+
8939 -+ bits<32> imm;
8940 -+ let Inst{31-0} = imm;
8941 -+}
8942 -+
8943 -+def SI_LOAD_LITERAL_I32 : SI_LOAD_LITERAL<i32imm>;
8944 -+def SI_LOAD_LITERAL_F32 : SI_LOAD_LITERAL<f32imm>;
8945 -+
8946 +let isCodeGenOnly = 1, isPseudo = 1 in {
8947 +
8948 +def SET_M0 : InstSI <
8949 @@ -20614,13 +22165,6 @@ index 0000000..005be96
8950 +
8951 +let usesCustomInserter = 1 in {
8952 +
8953 -+def SI_V_CNDLT : InstSI <
8954 -+ (outs VReg_32:$dst),
8955 -+ (ins VReg_32:$src0, VReg_32:$src1, VReg_32:$src2),
8956 -+ "SI_V_CNDLT $dst, $src0, $src1, $src2",
8957 -+ [(set VReg_32:$dst, (int_AMDGPU_cndlt VReg_32:$src0, VReg_32:$src1, VReg_32:$src2))]
8958 -+>;
8959 -+
8960 +def SI_INTERP : InstSI <
8961 + (outs VReg_32:$dst),
8962 + (ins VReg_32:$i, VReg_32:$j, i32imm:$attr_chan, i32imm:$attr, SReg_32:$params),
8963 @@ -20628,21 +22172,6 @@ index 0000000..005be96
8964 + []
8965 +>;
8966 +
8967 -+def SI_INTERP_CONST : InstSI <
8968 -+ (outs VReg_32:$dst),
8969 -+ (ins i32imm:$attr_chan, i32imm:$attr, SReg_32:$params),
8970 -+ "SI_INTERP_CONST $dst, $attr_chan, $attr, $params",
8971 -+ [(set VReg_32:$dst, (int_SI_fs_interp_constant imm:$attr_chan,
8972 -+ imm:$attr, SReg_32:$params))]
8973 -+>;
8974 -+
8975 -+def SI_KIL : InstSI <
8976 -+ (outs),
8977 -+ (ins VReg_32:$src),
8978 -+ "SI_KIL $src",
8979 -+ [(int_AMDGPU_kill VReg_32:$src)]
8980 -+>;
8981 -+
8982 +def SI_WQM : InstSI <
8983 + (outs),
8984 + (ins),
8985 @@ -20662,9 +22191,9 @@ index 0000000..005be96
8986 +
8987 +def SI_IF : InstSI <
8988 + (outs SReg_64:$dst),
8989 -+ (ins SReg_1:$vcc, brtarget:$target),
8990 ++ (ins SReg_64:$vcc, brtarget:$target),
8991 + "SI_IF",
8992 -+ [(set SReg_64:$dst, (int_SI_if SReg_1:$vcc, bb:$target))]
8993 ++ [(set SReg_64:$dst, (int_SI_if SReg_64:$vcc, bb:$target))]
8994 +>;
8995 +
8996 +def SI_ELSE : InstSI <
8997 @@ -20694,9 +22223,9 @@ index 0000000..005be96
8998 +
8999 +def SI_IF_BREAK : InstSI <
9000 + (outs SReg_64:$dst),
9001 -+ (ins SReg_1:$vcc, SReg_64:$src),
9002 ++ (ins SReg_64:$vcc, SReg_64:$src),
9003 + "SI_IF_BREAK",
9004 -+ [(set SReg_64:$dst, (int_SI_if_break SReg_1:$vcc, SReg_64:$src))]
9005 ++ [(set SReg_64:$dst, (int_SI_if_break SReg_64:$vcc, SReg_64:$src))]
9006 +>;
9007 +
9008 +def SI_ELSE_BREAK : InstSI <
9009 @@ -20713,18 +22242,35 @@ index 0000000..005be96
9010 + [(int_SI_end_cf SReg_64:$saved)]
9011 +>;
9012 +
9013 ++def SI_KILL : InstSI <
9014 ++ (outs),
9015 ++ (ins VReg_32:$src),
9016 ++ "SI_KIL $src",
9017 ++ [(int_AMDGPU_kill VReg_32:$src)]
9018 ++>;
9019 ++
9020 +} // end mayLoad = 1, mayStore = 1, hasSideEffects = 1
9021 + // Uses = [EXEC], Defs = [EXEC]
9022 +
9023 +} // end IsCodeGenOnly, isPseudo
9024 +
9025 ++def : Pat<
9026 ++ (int_AMDGPU_cndlt VReg_32:$src0, VReg_32:$src1, VReg_32:$src2),
9027 ++ (V_CNDMASK_B32_e64 VReg_32:$src2, VReg_32:$src1, (V_CMP_GT_F32_e64 0, VReg_32:$src0))
9028 ++>;
9029 ++
9030 ++def : Pat <
9031 ++ (int_AMDGPU_kilp),
9032 ++ (SI_KILL (V_MOV_B32_e32 0xbf800000))
9033 ++>;
9034 ++
9035 +/* int_SI_vs_load_input */
9036 +def : Pat<
9037 + (int_SI_vs_load_input SReg_128:$tlst, IMM12bit:$attr_offset,
9038 + VReg_32:$buf_idx_vgpr),
9039 + (BUFFER_LOAD_FORMAT_XYZW imm:$attr_offset, 0, 1, 0, 0, 0,
9040 + VReg_32:$buf_idx_vgpr, SReg_128:$tlst,
9041 -+ 0, 0, (i32 SREG_LIT_0))
9042 ++ 0, 0, 0)
9043 +>;
9044 +
9045 +/* int_SI_export */
9046 @@ -20735,43 +22281,105 @@ index 0000000..005be96
9047 + VReg_32:$src0, VReg_32:$src1, VReg_32:$src2, VReg_32:$src3)
9048 +>;
9049 +
9050 -+/* int_SI_sample */
9051 ++
9052 ++/* int_SI_sample for simple 1D texture lookup */
9053 +def : Pat <
9054 -+ (int_SI_sample imm:$writemask, VReg_128:$coord, SReg_256:$rsrc, SReg_128:$sampler),
9055 -+ (IMAGE_SAMPLE imm:$writemask, 0, 0, 0, 0, 0, 0, 0, VReg_128:$coord,
9056 ++ (int_SI_sample imm:$writemask, (v1i32 VReg_32:$addr),
9057 ++ SReg_256:$rsrc, SReg_128:$sampler, imm),
9058 ++ (IMAGE_SAMPLE imm:$writemask, 0, 0, 0, 0, 0, 0, 0,
9059 ++ (i32 (COPY_TO_REGCLASS VReg_32:$addr, VReg_32)),
9060 + SReg_256:$rsrc, SReg_128:$sampler)
9061 +>;
9062 +
9063 -+/* int_SI_sample_lod */
9064 -+def : Pat <
9065 -+ (int_SI_sample_lod imm:$writemask, VReg_128:$coord, SReg_256:$rsrc, SReg_128:$sampler),
9066 -+ (IMAGE_SAMPLE_L imm:$writemask, 0, 0, 0, 0, 0, 0, 0, VReg_128:$coord,
9067 -+ SReg_256:$rsrc, SReg_128:$sampler)
9068 ++class SamplePattern<Intrinsic name, MIMG opcode, RegisterClass addr_class,
9069 ++ ValueType addr_type> : Pat <
9070 ++ (name imm:$writemask, (addr_type addr_class:$addr),
9071 ++ SReg_256:$rsrc, SReg_128:$sampler, imm),
9072 ++ (opcode imm:$writemask, 0, 0, 0, 0, 0, 0, 0,
9073 ++ (EXTRACT_SUBREG addr_class:$addr, sub0),
9074 ++ SReg_256:$rsrc, SReg_128:$sampler)
9075 +>;
9076 +
9077 -+/* int_SI_sample_bias */
9078 -+def : Pat <
9079 -+ (int_SI_sample_bias imm:$writemask, VReg_128:$coord, SReg_256:$rsrc, SReg_128:$sampler),
9080 -+ (IMAGE_SAMPLE_B imm:$writemask, 0, 0, 0, 0, 0, 0, 0, VReg_128:$coord,
9081 -+ SReg_256:$rsrc, SReg_128:$sampler)
9082 ++class SampleRectPattern<Intrinsic name, MIMG opcode, RegisterClass addr_class,
9083 ++ ValueType addr_type> : Pat <
9084 ++ (name imm:$writemask, (addr_type addr_class:$addr),
9085 ++ SReg_256:$rsrc, SReg_128:$sampler, TEX_RECT),
9086 ++ (opcode imm:$writemask, 1, 0, 0, 0, 0, 0, 0,
9087 ++ (EXTRACT_SUBREG addr_class:$addr, sub0),
9088 ++ SReg_256:$rsrc, SReg_128:$sampler)
9089 ++>;
9090 ++
9091 ++class SampleArrayPattern<Intrinsic name, MIMG opcode, RegisterClass addr_class,
9092 ++ ValueType addr_type> : Pat <
9093 ++ (name imm:$writemask, (addr_type addr_class:$addr),
9094 ++ SReg_256:$rsrc, SReg_128:$sampler, TEX_ARRAY),
9095 ++ (opcode imm:$writemask, 0, 0, 1, 0, 0, 0, 0,
9096 ++ (EXTRACT_SUBREG addr_class:$addr, sub0),
9097 ++ SReg_256:$rsrc, SReg_128:$sampler)
9098 ++>;
9099 ++
9100 ++class SampleShadowPattern<Intrinsic name, MIMG opcode,
9101 ++ RegisterClass addr_class, ValueType addr_type> : Pat <
9102 ++ (name imm:$writemask, (addr_type addr_class:$addr),
9103 ++ SReg_256:$rsrc, SReg_128:$sampler, TEX_SHADOW),
9104 ++ (opcode imm:$writemask, 0, 0, 0, 0, 0, 0, 0,
9105 ++ (EXTRACT_SUBREG addr_class:$addr, sub0),
9106 ++ SReg_256:$rsrc, SReg_128:$sampler)
9107 ++>;
9108 ++
9109 ++class SampleShadowArrayPattern<Intrinsic name, MIMG opcode,
9110 ++ RegisterClass addr_class, ValueType addr_type> : Pat <
9111 ++ (name imm:$writemask, (addr_type addr_class:$addr),
9112 ++ SReg_256:$rsrc, SReg_128:$sampler, TEX_SHADOW_ARRAY),
9113 ++ (opcode imm:$writemask, 0, 0, 1, 0, 0, 0, 0,
9114 ++ (EXTRACT_SUBREG addr_class:$addr, sub0),
9115 ++ SReg_256:$rsrc, SReg_128:$sampler)
9116 +>;
9117 +
9118 ++/* int_SI_sample* for texture lookups consuming more address parameters */
9119 ++multiclass SamplePatterns<RegisterClass addr_class, ValueType addr_type> {
9120 ++ def : SamplePattern <int_SI_sample, IMAGE_SAMPLE, addr_class, addr_type>;
9121 ++ def : SampleRectPattern <int_SI_sample, IMAGE_SAMPLE, addr_class, addr_type>;
9122 ++ def : SampleArrayPattern <int_SI_sample, IMAGE_SAMPLE, addr_class, addr_type>;
9123 ++ def : SampleShadowPattern <int_SI_sample, IMAGE_SAMPLE_C, addr_class, addr_type>;
9124 ++ def : SampleShadowArrayPattern <int_SI_sample, IMAGE_SAMPLE_C, addr_class, addr_type>;
9125 ++
9126 ++ def : SamplePattern <int_SI_samplel, IMAGE_SAMPLE_L, addr_class, addr_type>;
9127 ++ def : SampleArrayPattern <int_SI_samplel, IMAGE_SAMPLE_L, addr_class, addr_type>;
9128 ++ def : SampleShadowPattern <int_SI_samplel, IMAGE_SAMPLE_C_L, addr_class, addr_type>;
9129 ++ def : SampleShadowArrayPattern <int_SI_samplel, IMAGE_SAMPLE_C_L, addr_class, addr_type>;
9130 ++
9131 ++ def : SamplePattern <int_SI_sampleb, IMAGE_SAMPLE_B, addr_class, addr_type>;
9132 ++ def : SampleArrayPattern <int_SI_sampleb, IMAGE_SAMPLE_B, addr_class, addr_type>;
9133 ++ def : SampleShadowPattern <int_SI_sampleb, IMAGE_SAMPLE_C_B, addr_class, addr_type>;
9134 ++ def : SampleShadowArrayPattern <int_SI_sampleb, IMAGE_SAMPLE_C_B, addr_class, addr_type>;
9135 ++}
9136 ++
9137 ++defm : SamplePatterns<VReg_64, v2i32>;
9138 ++defm : SamplePatterns<VReg_128, v4i32>;
9139 ++defm : SamplePatterns<VReg_256, v8i32>;
9140 ++defm : SamplePatterns<VReg_512, v16i32>;
9141 ++
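Assuming the usual name mangling for the overloaded llvm_anyvector_ty address operand (an assumption; the exact suffix is not spelled out in this patch), a 2D lookup with a v4i32 address would look roughly like:

define void @sample_2d(<4 x float> addrspace(1)* %out, <8 x i32> %rsrc, <4 x i32> %sampler, <4 x i32> %addr) {
entry:
  ; the trailing i32 is the new texture-type operand: 0 for a plain lookup,
  ; TEX_RECT/TEX_ARRAY/TEX_SHADOW* values select the other patterns above
  %tex = call <4 x float> @llvm.SI.sample.v4i32(i32 15, <4 x i32> %addr, <8 x i32> %rsrc, <4 x i32> %sampler, i32 0)
  store <4 x float> %tex, <4 x float> addrspace(1)* %out
  ret void
}

declare <4 x float> @llvm.SI.sample.v4i32(i32, <4 x i32>, <8 x i32>, <4 x i32>, i32) readonly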
9142 +def CLAMP_SI : CLAMP<VReg_32>;
9143 +def FABS_SI : FABS<VReg_32>;
9144 +def FNEG_SI : FNEG<VReg_32>;
9145 +
9146 -+def : Extract_Element <f32, v4f32, VReg_128, 0, sel_x>;
9147 -+def : Extract_Element <f32, v4f32, VReg_128, 1, sel_y>;
9148 -+def : Extract_Element <f32, v4f32, VReg_128, 2, sel_z>;
9149 -+def : Extract_Element <f32, v4f32, VReg_128, 3, sel_w>;
9150 ++def : Extract_Element <f32, v4f32, VReg_128, 0, sub0>;
9151 ++def : Extract_Element <f32, v4f32, VReg_128, 1, sub1>;
9152 ++def : Extract_Element <f32, v4f32, VReg_128, 2, sub2>;
9153 ++def : Extract_Element <f32, v4f32, VReg_128, 3, sub3>;
9154 +
9155 -+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 4, sel_x>;
9156 -+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 5, sel_y>;
9157 -+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 6, sel_z>;
9158 -+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 7, sel_w>;
9159 ++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 4, sub0>;
9160 ++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 5, sub1>;
9161 ++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 6, sub2>;
9162 ++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 7, sub3>;
9163 +
9164 ++def : Vector1_Build <v1i32, VReg_32, i32, VReg_32>;
9165 ++def : Vector2_Build <v2i32, VReg_64, i32, VReg_32>;
9166 +def : Vector_Build <v4f32, VReg_128, f32, VReg_32>;
9167 -+def : Vector_Build <v4i32, SReg_128, i32, SReg_32>;
9168 ++def : Vector_Build <v4i32, VReg_128, i32, VReg_32>;
9169 ++def : Vector8_Build <v8i32, VReg_256, i32, VReg_32>;
9170 ++def : Vector16_Build <v16i32, VReg_512, i32, VReg_32>;
9171 +
9172 +def : BitConvert <i32, f32, SReg_32>;
9173 +def : BitConvert <i32, f32, VReg_32>;
9174 @@ -20779,24 +22387,46 @@ index 0000000..005be96
9175 +def : BitConvert <f32, i32, SReg_32>;
9176 +def : BitConvert <f32, i32, VReg_32>;
9177 +
9178 ++/********** ================== **********/
9179 ++/********** Immediate Patterns **********/
9180 ++/********** ================== **********/
9181 ++
9182 ++def : Pat <
9183 ++ (i1 imm:$imm),
9184 ++ (S_MOV_B64 imm:$imm)
9185 ++>;
9186 ++
9187 ++def : Pat <
9188 ++ (i32 imm:$imm),
9189 ++ (V_MOV_B32_e32 imm:$imm)
9190 ++>;
9191 ++
9192 ++def : Pat <
9193 ++ (f32 fpimm:$imm),
9194 ++ (V_MOV_B32_e32 fpimm:$imm)
9195 ++>;
9196 ++
9197 +def : Pat <
9198 -+ (i64 (SIsreg1_bitcast SReg_1:$vcc)),
9199 -+ (S_MOV_B64 (COPY_TO_REGCLASS SReg_1:$vcc, SReg_64))
9200 ++ (i32 imm:$imm),
9201 ++ (S_MOV_B32 imm:$imm)
9202 +>;
9203 +
9204 +def : Pat <
9205 -+ (i1 (SIsreg1_bitcast SReg_64:$vcc)),
9206 -+ (COPY_TO_REGCLASS SReg_64:$vcc, SReg_1)
9207 ++ (f32 fpimm:$imm),
9208 ++ (S_MOV_B32 fpimm:$imm)
9209 +>;
9210 +
9211 +def : Pat <
9212 -+ (i64 (SIvcc_bitcast VCCReg:$vcc)),
9213 -+ (S_MOV_B64 (COPY_TO_REGCLASS VCCReg:$vcc, SReg_64))
9214 ++ (i64 InlineImm<i64>:$imm),
9215 ++ (S_MOV_B64 InlineImm<i64>:$imm)
9216 +>;
9217 +
9218 ++// i64 immediates aren't supported in hardware, so split them into two 32-bit values
9219 +def : Pat <
9220 -+ (i1 (SIvcc_bitcast SReg_64:$vcc)),
9221 -+ (COPY_TO_REGCLASS SReg_64:$vcc, VCCReg)
9222 ++ (i64 imm:$imm),
9223 ++ (INSERT_SUBREG (INSERT_SUBREG (i64 (IMPLICIT_DEF)),
9224 ++ (S_MOV_B32 (i32 (LO32 imm:$imm))), sub0),
9225 ++ (S_MOV_B32 (i32 (HI32 imm:$imm))), sub1)
9226 +>;
9227 +
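A 64-bit constant outside the inline-immediate range is therefore materialized as two S_MOV_B32s stitched together with INSERT_SUBREG; e.g. (sketch):

define void @imm64(i64 addrspace(1)* %out) {
entry:
  ; 0x1234567890abcdef: not an inline immediate, so LO32 and HI32 are moved separately
  store i64 1311768467294899695, i64 addrspace(1)* %out
  ret void
}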
9228 +/********** ===================== **********/
9229 @@ -20804,6 +22434,12 @@ index 0000000..005be96
9230 +/********** ===================== **********/
9231 +
9232 +def : Pat <
9233 ++ (int_SI_fs_interp_constant imm:$attr_chan, imm:$attr, SReg_32:$params),
9234 ++ (V_INTERP_MOV_F32 INTERP.P0, imm:$attr_chan, imm:$attr,
9235 ++ (S_MOV_B32 SReg_32:$params))
9236 ++>;
9237 ++
9238 ++def : Pat <
9239 + (int_SI_fs_interp_linear_center imm:$attr_chan, imm:$attr, SReg_32:$params),
9240 + (SI_INTERP (f32 LINEAR_CENTER_I), (f32 LINEAR_CENTER_J), imm:$attr_chan,
9241 + imm:$attr, SReg_32:$params)
9242 @@ -20861,56 +22497,95 @@ index 0000000..005be96
9243 +def : POW_Common <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_F32_e32, VReg_32>;
9244 +
9245 +def : Pat <
9246 -+ (int_AMDGPU_div AllReg_32:$src0, AllReg_32:$src1),
9247 -+ (V_MUL_LEGACY_F32_e32 AllReg_32:$src0, (V_RCP_LEGACY_F32_e32 AllReg_32:$src1))
9248 ++ (int_AMDGPU_div VSrc_32:$src0, VSrc_32:$src1),
9249 ++ (V_MUL_LEGACY_F32_e32 VSrc_32:$src0, (V_RCP_LEGACY_F32_e32 VSrc_32:$src1))
9250 +>;
9251 +
9252 +def : Pat<
9253 -+ (fdiv AllReg_32:$src0, AllReg_32:$src1),
9254 -+ (V_MUL_F32_e32 AllReg_32:$src0, (V_RCP_F32_e32 AllReg_32:$src1))
9255 ++ (fdiv VSrc_32:$src0, VSrc_32:$src1),
9256 ++ (V_MUL_F32_e32 VSrc_32:$src0, (V_RCP_F32_e32 VSrc_32:$src1))
9257 +>;
9258 +
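fdiv is thus lowered to a reciprocal approximation plus a multiply, trading precision for speed; a hypothetical input:

define void @fdiv_f32(float addrspace(1)* %out, float %a, float %b) {
entry:
  %q = fdiv float %a, %b                      ; becomes V_MUL_F32(%a, V_RCP_F32(%b))
  store float %q, float addrspace(1)* %out
  ret void
}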
9259 +def : Pat <
9260 -+ (int_AMDGPU_kilp),
9261 -+ (SI_KIL (V_MOV_IMM_I32 0xbf800000))
9262 ++ (fcos VSrc_32:$src0),
9263 ++ (V_COS_F32_e32 (V_MUL_F32_e32 VSrc_32:$src0, (V_MOV_B32_e32 CONST.TWO_PI_INV)))
9264 ++>;
9265 ++
9266 ++def : Pat <
9267 ++ (fsin VSrc_32:$src0),
9268 ++ (V_SIN_F32_e32 (V_MUL_F32_e32 VSrc_32:$src0, (V_MOV_B32_e32 CONST.TWO_PI_INV)))
9269 +>;
9270 +
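V_SIN_F32/V_COS_F32 expect their argument already scaled by 1/(2*pi), hence the pre-multiply with CONST.TWO_PI_INV in both patterns. A sketch using the generic sin intrinsic:

define void @sin_f32(float addrspace(1)* %out, float %x) {
entry:
  %s = call float @llvm.sin.f32(float %x)     ; scaled by 1/(2*pi) via V_MUL_F32, then V_SIN_F32
  store float %s, float addrspace(1)* %out
  ret void
}

declare float @llvm.sin.f32(float) readnone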
9271 +def : Pat <
9272 + (int_AMDGPU_cube VReg_128:$src),
9273 + (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)),
9274 -+ (V_CUBETC_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x),
9275 -+ (EXTRACT_SUBREG VReg_128:$src, sel_y),
9276 -+ (EXTRACT_SUBREG VReg_128:$src, sel_z),
9277 -+ 0, 0, 0, 0), sel_x),
9278 -+ (V_CUBESC_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x),
9279 -+ (EXTRACT_SUBREG VReg_128:$src, sel_y),
9280 -+ (EXTRACT_SUBREG VReg_128:$src, sel_z),
9281 -+ 0, 0, 0, 0), sel_y),
9282 -+ (V_CUBEMA_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x),
9283 -+ (EXTRACT_SUBREG VReg_128:$src, sel_y),
9284 -+ (EXTRACT_SUBREG VReg_128:$src, sel_z),
9285 -+ 0, 0, 0, 0), sel_z),
9286 -+ (V_CUBEID_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x),
9287 -+ (EXTRACT_SUBREG VReg_128:$src, sel_y),
9288 -+ (EXTRACT_SUBREG VReg_128:$src, sel_z),
9289 -+ 0, 0, 0, 0), sel_w)
9290 ++ (V_CUBETC_F32 (EXTRACT_SUBREG VReg_128:$src, sub0),
9291 ++ (EXTRACT_SUBREG VReg_128:$src, sub1),
9292 ++ (EXTRACT_SUBREG VReg_128:$src, sub2),
9293 ++ 0, 0, 0, 0), sub0),
9294 ++ (V_CUBESC_F32 (EXTRACT_SUBREG VReg_128:$src, sub0),
9295 ++ (EXTRACT_SUBREG VReg_128:$src, sub1),
9296 ++ (EXTRACT_SUBREG VReg_128:$src, sub2),
9297 ++ 0, 0, 0, 0), sub1),
9298 ++ (V_CUBEMA_F32 (EXTRACT_SUBREG VReg_128:$src, sub0),
9299 ++ (EXTRACT_SUBREG VReg_128:$src, sub1),
9300 ++ (EXTRACT_SUBREG VReg_128:$src, sub2),
9301 ++ 0, 0, 0, 0), sub2),
9302 ++ (V_CUBEID_F32 (EXTRACT_SUBREG VReg_128:$src, sub0),
9303 ++ (EXTRACT_SUBREG VReg_128:$src, sub1),
9304 ++ (EXTRACT_SUBREG VReg_128:$src, sub2),
9305 ++ 0, 0, 0, 0), sub3)
9306 ++>;
9307 ++
9308 ++def : Pat <
9309 ++ (i32 (sext (i1 SReg_64:$src0))),
9310 ++ (V_CNDMASK_B32_e64 (i32 0), (i32 -1), SReg_64:$src0)
9311 +>;
9312 +
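Sign-extending a condition yields the usual all-ones/zero mask via V_CNDMASK_B32_e64, per the pattern above; e.g. (sketch):

define void @sext_bool(i32 addrspace(1)* %out, i32 %a, i32 %b) {
entry:
  %cmp = icmp eq i32 %a, %b
  %ext = sext i1 %cmp to i32                  ; V_CNDMASK_B32_e64 0, -1, <condition>
  store i32 %ext, i32 addrspace(1)* %out
  ret void
}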
9313 +/********** ================== **********/
9314 +/********** VOP3 Patterns **********/
9315 +/********** ================== **********/
9316 +
9317 -+def : Pat <(f32 (IL_mad AllReg_32:$src0, VReg_32:$src1, VReg_32:$src2)),
9318 -+ (V_MAD_LEGACY_F32 AllReg_32:$src0, VReg_32:$src1, VReg_32:$src2,
9319 ++def : Pat <(f32 (IL_mad VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2)),
9320 ++ (V_MAD_LEGACY_F32 VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2,
9321 + 0, 0, 0, 0)>;
9322 +
9323 ++/********** ================== **********/
9324 ++/********** SMRD Patterns **********/
9325 ++/********** ================== **********/
9326 ++
9327 ++multiclass SMRD_Pattern <SMRD Instr_IMM, SMRD Instr_SGPR, ValueType vt> {
9328 ++ // 1. Offset as an 8-bit DWORD immediate
9329 ++ def : Pat <
9330 ++ (constant_load (SIadd64bit32bit SReg_64:$sbase, IMM8bitDWORD:$offset)),
9331 ++ (vt (Instr_IMM SReg_64:$sbase, IMM8bitDWORD:$offset))
9332 ++ >;
9333 ++
9334 ++ // 2. Offset loaded into a 32-bit SGPR
9335 ++ def : Pat <
9336 ++ (constant_load (SIadd64bit32bit SReg_64:$sbase, imm:$offset)),
9337 ++ (vt (Instr_SGPR SReg_64:$sbase, (S_MOV_B32 imm:$offset)))
9338 ++ >;
9339 ++
9340 ++ // 3. No offset at all
9341 ++ def : Pat <
9342 ++ (constant_load SReg_64:$sbase),
9343 ++ (vt (Instr_IMM SReg_64:$sbase, 0))
9344 ++ >;
9345 ++}
9346 ++
9347 ++defm : SMRD_Pattern <S_LOAD_DWORD_IMM, S_LOAD_DWORD_SGPR, f32>;
9348 ++defm : SMRD_Pattern <S_LOAD_DWORD_IMM, S_LOAD_DWORD_SGPR, i32>;
9349 ++defm : SMRD_Pattern <S_LOAD_DWORDX4_IMM, S_LOAD_DWORDX4_SGPR, v4i32>;
9350 ++defm : SMRD_Pattern <S_LOAD_DWORDX8_IMM, S_LOAD_DWORDX8_SGPR, v8i32>;
9351 ++
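Assuming address space 2 is the constant address space here (an assumption; the constant_load fragment is defined elsewhere in the backend), a scalar load that hits case 1 of the multiclass would be:

define void @smrd_load(float addrspace(1)* %out, float addrspace(2)* %cbuf) {
entry:
  %ptr = getelementptr float addrspace(2)* %cbuf, i32 16
  %val = load float addrspace(2)* %ptr        ; 64-byte offset = 16 dwords, fits the 8-bit DWORD immediate
  store float %val, float addrspace(1)* %out
  ret void
}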
9352 +} // End isSI predicate
9353 diff --git a/lib/Target/R600/SIIntrinsics.td b/lib/Target/R600/SIIntrinsics.td
9354 new file mode 100644
9355 -index 0000000..c322fef
9356 +index 0000000..611b9c4
9357 --- /dev/null
9358 +++ b/lib/Target/R600/SIIntrinsics.td
9359 -@@ -0,0 +1,52 @@
9360 +@@ -0,0 +1,54 @@
9361 +//===-- SIIntrinsics.td - SI Intrinsic defs ----------------*- tablegen -*-===//
9362 +//
9363 +// The LLVM Compiler Infrastructure
9364 @@ -20935,9 +22610,11 @@ index 0000000..c322fef
9365 + def int_SI_vs_load_input : Intrinsic <[llvm_v4f32_ty], [llvm_v4i32_ty, llvm_i16_ty, llvm_i32_ty], [IntrReadMem]> ;
9366 + def int_SI_wqm : Intrinsic <[], [], []>;
9367 +
9368 -+ def int_SI_sample : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_v4f32_ty, llvm_v8i32_ty, llvm_v4i32_ty], [IntrReadMem]>;
9369 -+ def int_SI_sample_bias : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_v4f32_ty, llvm_v8i32_ty, llvm_v4i32_ty], [IntrReadMem]>;
9370 -+ def int_SI_sample_lod : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_v4f32_ty, llvm_v8i32_ty, llvm_v4i32_ty], [IntrReadMem]>;
9371 ++ class Sample : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_anyvector_ty, llvm_v8i32_ty, llvm_v4i32_ty, llvm_i32_ty], [IntrReadMem]>;
9372 ++
9373 ++ def int_SI_sample : Sample;
9374 ++ def int_SI_sampleb : Sample;
9375 ++ def int_SI_samplel : Sample;
9376 +
9377 + /* Interpolation Intrinsics */
9378 +
9379 @@ -20965,10 +22642,10 @@ index 0000000..c322fef
9380 +}
9381 diff --git a/lib/Target/R600/SILowerControlFlow.cpp b/lib/Target/R600/SILowerControlFlow.cpp
9382 new file mode 100644
9383 -index 0000000..3fbe653
9384 +index 0000000..2007d30
9385 --- /dev/null
9386 +++ b/lib/Target/R600/SILowerControlFlow.cpp
9387 -@@ -0,0 +1,331 @@
9388 +@@ -0,0 +1,372 @@
9389 +//===-- SILowerControlFlow.cpp - Use predicates for control flow ----------===//
9390 +//
9391 +// The LLVM Compiler Infrastructure
9392 @@ -21039,7 +22716,10 @@ index 0000000..3fbe653
9393 + static char ID;
9394 + const TargetInstrInfo *TII;
9395 +
9396 -+ void Skip(MachineInstr &MI, MachineOperand &To);
9397 ++ bool shouldSkip(MachineBasicBlock *From, MachineBasicBlock *To);
9398 ++
9399 ++ void Skip(MachineInstr &From, MachineOperand &To);
9400 ++ void SkipIfDead(MachineInstr &MI);
9401 +
9402 + void If(MachineInstr &MI);
9403 + void Else(MachineInstr &MI);
9404 @@ -21049,6 +22729,7 @@ index 0000000..3fbe653
9405 + void Loop(MachineInstr &MI);
9406 + void EndCf(MachineInstr &MI);
9407 +
9408 ++ void Kill(MachineInstr &MI);
9409 + void Branch(MachineInstr &MI);
9410 +
9411 +public:
9412 @@ -21071,22 +22752,29 @@ index 0000000..3fbe653
9413 + return new SILowerControlFlowPass(tm);
9414 +}
9415 +
9416 -+void SILowerControlFlowPass::Skip(MachineInstr &From, MachineOperand &To) {
9417 ++bool SILowerControlFlowPass::shouldSkip(MachineBasicBlock *From,
9418 ++ MachineBasicBlock *To) {
9419 ++
9420 + unsigned NumInstr = 0;
9421 +
9422 -+ for (MachineBasicBlock *MBB = *From.getParent()->succ_begin();
9423 -+ NumInstr < SkipThreshold && MBB != To.getMBB() && !MBB->succ_empty();
9424 ++ for (MachineBasicBlock *MBB = From; MBB != To && !MBB->succ_empty();
9425 + MBB = *MBB->succ_begin()) {
9426 +
9427 + for (MachineBasicBlock::iterator I = MBB->begin(), E = MBB->end();
9428 + NumInstr < SkipThreshold && I != E; ++I) {
9429 +
9430 + if (I->isBundle() || !I->isBundled())
9431 -+ ++NumInstr;
9432 ++ if (++NumInstr >= SkipThreshold)
9433 ++ return true;
9434 + }
9435 + }
9436 +
9437 -+ if (NumInstr < SkipThreshold)
9438 ++ return false;
9439 ++}
9440 ++
9441 ++void SILowerControlFlowPass::Skip(MachineInstr &From, MachineOperand &To) {
9442 ++
9443 ++ if (!shouldSkip(*From.getParent()->succ_begin(), To.getMBB()))
9444 + return;
9445 +
9446 + DebugLoc DL = From.getDebugLoc();
9447 @@ -21095,6 +22783,38 @@ index 0000000..3fbe653
9448 + .addReg(AMDGPU::EXEC);
9449 +}
9450 +
9451 ++void SILowerControlFlowPass::SkipIfDead(MachineInstr &MI) {
9452 ++
9453 ++ MachineBasicBlock &MBB = *MI.getParent();
9454 ++ DebugLoc DL = MI.getDebugLoc();
9455 ++
9456 ++ if (!shouldSkip(&MBB, &MBB.getParent()->back()))
9457 ++ return;
9458 ++
9459 ++ MachineBasicBlock::iterator Insert = &MI;
9460 ++ ++Insert;
9461 ++
9462 ++ // If the exec mask is non-zero, skip the next two instructions
9463 ++ BuildMI(MBB, Insert, DL, TII->get(AMDGPU::S_CBRANCH_EXECNZ))
9464 ++ .addImm(3)
9465 ++ .addReg(AMDGPU::EXEC);
9466 ++
9467 ++ // Exec mask is zero: Export to NULL target...
9468 ++ BuildMI(MBB, Insert, DL, TII->get(AMDGPU::EXP))
9469 ++ .addImm(0)
9470 ++ .addImm(0x09) // V_008DFC_SQ_EXP_NULL
9471 ++ .addImm(0)
9472 ++ .addImm(1)
9473 ++ .addImm(1)
9474 ++ .addReg(AMDGPU::VGPR0)
9475 ++ .addReg(AMDGPU::VGPR0)
9476 ++ .addReg(AMDGPU::VGPR0)
9477 ++ .addReg(AMDGPU::VGPR0);
9478 ++
9479 ++ // ... and terminate wavefront
9480 ++ BuildMI(MBB, Insert, DL, TII->get(AMDGPU::S_ENDPGM));
9481 ++}
9482 ++
9483 +void SILowerControlFlowPass::If(MachineInstr &MI) {
9484 + MachineBasicBlock &MBB = *MI.getParent();
9485 + DebugLoc DL = MI.getDebugLoc();
9486 @@ -21213,8 +22933,28 @@ index 0000000..3fbe653
9487 + assert(0);
9488 +}
9489 +
9490 ++void SILowerControlFlowPass::Kill(MachineInstr &MI) {
9491 ++
9492 ++ MachineBasicBlock &MBB = *MI.getParent();
9493 ++ DebugLoc DL = MI.getDebugLoc();
9494 ++
9495 ++ // Kill is only allowed in pixel shaders
9496 ++ MachineFunction &MF = *MBB.getParent();
9497 ++ SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
9498 ++ assert(Info->ShaderType == ShaderType::PIXEL);
9499 ++
9500 ++ // Clear this pixel from the exec mask if the operand is negative
9501 ++ BuildMI(MBB, &MI, DL, TII->get(AMDGPU::V_CMPX_LE_F32_e32), AMDGPU::VCC)
9502 ++ .addImm(0)
9503 ++ .addOperand(MI.getOperand(0));
9504 ++
9505 ++ MI.eraseFromParent();
9506 ++}
9507 ++
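At the IR level this is driven by int_AMDGPU_kill; assuming the conventional llvm.AMDGPU.kill spelling (an assumption, the IR name is not shown in this patch), a pixel-shader discard looks like:

define void @discard(float %x) {
entry:
  call void @llvm.AMDGPU.kill(float %x)       ; V_CMPX_LE_F32 0, %x clears EXEC lanes where %x < 0
  ret void
}

declare void @llvm.AMDGPU.kill(float)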
9508 +bool SILowerControlFlowPass::runOnMachineFunction(MachineFunction &MF) {
9509 -+ bool HaveCf = false;
9510 ++
9511 ++ bool HaveKill = false;
9512 ++ unsigned Depth = 0;
9513 +
9514 + for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
9515 + BI != BE; ++BI) {
9516 @@ -21228,6 +22968,7 @@ index 0000000..3fbe653
9517 + switch (MI.getOpcode()) {
9518 + default: break;
9519 + case AMDGPU::SI_IF:
9520 ++ ++Depth;
9521 + If(MI);
9522 + break;
9523 +
9524 @@ -21248,171 +22989,34 @@ index 0000000..3fbe653
9525 + break;
9526 +
9527 + case AMDGPU::SI_LOOP:
9528 ++ ++Depth;
9529 + Loop(MI);
9530 + break;
9531 +
9532 -+ case AMDGPU::SI_END_CF:
9533 -+ HaveCf = true;
9534 -+ EndCf(MI);
9535 -+ break;
9536 -+
9537 -+ case AMDGPU::S_BRANCH:
9538 -+ Branch(MI);
9539 -+ break;
9540 -+ }
9541 -+ }
9542 -+ }
9543 -+
9544 -+ // TODO: What is this good for?
9545 -+ unsigned ShaderType = MF.getInfo<SIMachineFunctionInfo>()->ShaderType;
9546 -+ if (HaveCf && ShaderType == ShaderType::PIXEL) {
9547 -+ for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
9548 -+ BI != BE; ++BI) {
9549 -+
9550 -+ MachineBasicBlock &MBB = *BI;
9551 -+ if (MBB.succ_empty()) {
9552 -+
9553 -+ MachineInstr &MI = *MBB.getFirstNonPHI();
9554 -+ DebugLoc DL = MI.getDebugLoc();
9555 -+
9556 -+ // If the exec mask is non-zero, skip the next two instructions
9557 -+ BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_CBRANCH_EXECNZ))
9558 -+ .addImm(3)
9559 -+ .addReg(AMDGPU::EXEC);
9560 -+
9561 -+ // Exec mask is zero: Export to NULL target...
9562 -+ BuildMI(MBB, &MI, DL, TII->get(AMDGPU::EXP))
9563 -+ .addImm(0)
9564 -+ .addImm(0x09) // V_008DFC_SQ_EXP_NULL
9565 -+ .addImm(0)
9566 -+ .addImm(1)
9567 -+ .addImm(1)
9568 -+ .addReg(AMDGPU::SREG_LIT_0)
9569 -+ .addReg(AMDGPU::SREG_LIT_0)
9570 -+ .addReg(AMDGPU::SREG_LIT_0)
9571 -+ .addReg(AMDGPU::SREG_LIT_0);
9572 -+
9573 -+ // ... and terminate wavefront
9574 -+ BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_ENDPGM));
9575 -+ }
9576 -+ }
9577 -+ }
9578 -+
9579 -+ return true;
9580 -+}
9581 -diff --git a/lib/Target/R600/SILowerLiteralConstants.cpp b/lib/Target/R600/SILowerLiteralConstants.cpp
9582 -new file mode 100644
9583 -index 0000000..c0411e9
9584 ---- /dev/null
9585 -+++ b/lib/Target/R600/SILowerLiteralConstants.cpp
9586 -@@ -0,0 +1,108 @@
9587 -+//===-- SILowerLiteralConstants.cpp - Lower intrs using literal constants--===//
9588 -+//
9589 -+// The LLVM Compiler Infrastructure
9590 -+//
9591 -+// This file is distributed under the University of Illinois Open Source
9592 -+// License. See LICENSE.TXT for details.
9593 -+//
9594 -+//===----------------------------------------------------------------------===//
9595 -+//
9596 -+/// \file
9597 -+/// \brief This pass performs the following transformation on instructions with
9598 -+/// literal constants:
9599 -+///
9600 -+/// %VGPR0 = V_MOV_IMM_I32 1
9601 -+///
9602 -+/// becomes:
9603 -+///
9604 -+/// BUNDLE
9605 -+/// * %VGPR = V_MOV_B32_32 SI_LITERAL_CONSTANT
9606 -+/// * SI_LOAD_LITERAL 1
9607 -+///
9608 -+/// The resulting sequence matches exactly how the hardware handles immediate
9609 -+/// operands, so this transformation greatly simplifies the code generator.
9610 -+///
9611 -+/// Only the *_MOV_IMM_* support immediate operands at the moment, but when
9612 -+/// support for immediate operands is added to other instructions, they
9613 -+/// will be lowered here as well.
9614 -+//===----------------------------------------------------------------------===//
9615 -+
9616 -+#include "AMDGPU.h"
9617 -+#include "llvm/CodeGen/MachineFunction.h"
9618 -+#include "llvm/CodeGen/MachineFunctionPass.h"
9619 -+#include "llvm/CodeGen/MachineInstrBuilder.h"
9620 -+#include "llvm/CodeGen/MachineInstrBundle.h"
9621 -+
9622 -+using namespace llvm;
9623 -+
9624 -+namespace {
9625 -+
9626 -+class SILowerLiteralConstantsPass : public MachineFunctionPass {
9627 -+
9628 -+private:
9629 -+ static char ID;
9630 -+ const TargetInstrInfo *TII;
9631 -+
9632 -+public:
9633 -+ SILowerLiteralConstantsPass(TargetMachine &tm) :
9634 -+ MachineFunctionPass(ID), TII(tm.getInstrInfo()) { }
9635 -+
9636 -+ virtual bool runOnMachineFunction(MachineFunction &MF);
9637 -+
9638 -+ const char *getPassName() const {
9639 -+ return "SI Lower literal constants pass";
9640 -+ }
9641 -+};
9642 -+
9643 -+} // End anonymous namespace
9644 -+
9645 -+char SILowerLiteralConstantsPass::ID = 0;
9646 -+
9647 -+FunctionPass *llvm::createSILowerLiteralConstantsPass(TargetMachine &tm) {
9648 -+ return new SILowerLiteralConstantsPass(tm);
9649 -+}
9650 -+
9651 -+bool SILowerLiteralConstantsPass::runOnMachineFunction(MachineFunction &MF) {
9652 -+ for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
9653 -+ BB != BB_E; ++BB) {
9654 -+ MachineBasicBlock &MBB = *BB;
9655 -+ for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I);
9656 -+ I != MBB.end(); I = Next) {
9657 -+ Next = llvm::next(I);
9658 -+ MachineInstr &MI = *I;
9659 -+ switch (MI.getOpcode()) {
9660 -+ default: break;
9661 -+ case AMDGPU::S_MOV_IMM_I32:
9662 -+ case AMDGPU::S_MOV_IMM_I64:
9663 -+ case AMDGPU::V_MOV_IMM_F32:
9664 -+ case AMDGPU::V_MOV_IMM_I32: {
9665 -+ unsigned MovOpcode;
9666 -+ unsigned LoadLiteralOpcode;
9667 -+ MachineOperand LiteralOp = MI.getOperand(1);
9668 -+ if (AMDGPU::VReg_32RegClass.contains(MI.getOperand(0).getReg())) {
9669 -+ MovOpcode = AMDGPU::V_MOV_B32_e32;
9670 -+ } else {
9671 -+ MovOpcode = AMDGPU::S_MOV_B32;
9672 -+ }
9673 -+ if (LiteralOp.isImm()) {
9674 -+ LoadLiteralOpcode = AMDGPU::SI_LOAD_LITERAL_I32;
9675 -+ } else {
9676 -+ LoadLiteralOpcode = AMDGPU::SI_LOAD_LITERAL_F32;
9677 ++ case AMDGPU::SI_END_CF:
9678 ++ if (--Depth == 0 && HaveKill) {
9679 ++ SkipIfDead(MI);
9680 ++ HaveKill = false;
9681 + }
9682 -+ MIBundleBuilder Bundle(MBB, I);
9683 -+ Bundle
9684 -+ .append(BuildMI(MF, MBB.findDebugLoc(I), TII->get(MovOpcode),
9685 -+ MI.getOperand(0).getReg())
9686 -+ .addReg(AMDGPU::SI_LITERAL_CONSTANT))
9687 -+ .append(BuildMI(MF, MBB.findDebugLoc(I),
9688 -+ TII->get(LoadLiteralOpcode))
9689 -+ .addOperand(MI.getOperand(1)));
9690 -+ llvm::finalizeBundle(MBB, Bundle.begin());
9691 -+ MI.eraseFromParent();
9692 ++ EndCf(MI);
9693 ++ break;
9694 ++
9695 ++ case AMDGPU::SI_KILL:
9696 ++ if (Depth == 0)
9697 ++ SkipIfDead(MI);
9698 ++ else
9699 ++ HaveKill = true;
9700 ++ Kill(MI);
9701 ++ break;
9702 ++
9703 ++ case AMDGPU::S_BRANCH:
9704 ++ Branch(MI);
9705 + break;
9706 -+ }
9707 + }
9708 + }
9709 + }
9710 -+ return false;
9711 ++
9712 ++ return true;
9713 +}
9714 diff --git a/lib/Target/R600/SIMachineFunctionInfo.cpp b/lib/Target/R600/SIMachineFunctionInfo.cpp
9715 new file mode 100644
9716 @@ -21589,24 +23193,10 @@ index 0000000..40171e4
9717 +#endif // SIREGISTERINFO_H_
9718 diff --git a/lib/Target/R600/SIRegisterInfo.td b/lib/Target/R600/SIRegisterInfo.td
9719 new file mode 100644
9720 -index 0000000..c3f1361
9721 +index 0000000..ab36b87
9722 --- /dev/null
9723 +++ b/lib/Target/R600/SIRegisterInfo.td
9724 -@@ -0,0 +1,167 @@
9725 -+
9726 -+let Namespace = "AMDGPU" in {
9727 -+ def low : SubRegIndex;
9728 -+ def high : SubRegIndex;
9729 -+
9730 -+ def sub0 : SubRegIndex;
9731 -+ def sub1 : SubRegIndex;
9732 -+ def sub2 : SubRegIndex;
9733 -+ def sub3 : SubRegIndex;
9734 -+ def sub4 : SubRegIndex;
9735 -+ def sub5 : SubRegIndex;
9736 -+ def sub6 : SubRegIndex;
9737 -+ def sub7 : SubRegIndex;
9738 -+}
9739 +@@ -0,0 +1,190 @@
9740 +
9741 +class SIReg <string n, bits<16> encoding = 0> : Register<n> {
9742 + let Namespace = "AMDGPU";
9743 @@ -21615,13 +23205,15 @@ index 0000000..c3f1361
9744 +
9745 +class SI_64 <string n, list<Register> subregs, bits<16> encoding> : RegisterWithSubRegs<n, subregs> {
9746 + let Namespace = "AMDGPU";
9747 -+ let SubRegIndices = [low, high];
9748 ++ let SubRegIndices = [sub0, sub1];
9749 + let HWEncoding = encoding;
9750 +}
9751 +
9752 +class SGPR_32 <bits<16> num, string name> : SIReg<name, num>;
9753 +
9754 -+class VGPR_32 <bits<16> num, string name> : SIReg<name, num>;
9755 ++class VGPR_32 <bits<16> num, string name> : SIReg<name, num> {
9756 ++ let HWEncoding{8} = 1;
9757 ++}
9758 +
9759 +// Special Registers
9760 +def VCC : SIReg<"VCC", 106>;
9761 @@ -21629,8 +23221,6 @@ index 0000000..c3f1361
9762 +def EXEC_HI : SIReg <"EXEC HI", 127>;
9763 +def EXEC : SI_64<"EXEC", [EXEC_LO, EXEC_HI], 126>;
9764 +def SCC : SIReg<"SCC", 253>;
9765 -+def SREG_LIT_0 : SIReg <"S LIT 0", 128>;
9766 -+def SI_LITERAL_CONSTANT : SIReg<"LITERAL CONSTANT", 255>;
9767 +def M0 : SIReg <"M0", 124>;
9768 +
9769 +//Interpolation registers
9770 @@ -21668,12 +23258,12 @@ index 0000000..c3f1361
9771 + (add (sequence "SGPR%u", 0, 101))>;
9772 +
9773 +// SGPR 64-bit registers
9774 -+def SGPR_64 : RegisterTuples<[low, high],
9775 ++def SGPR_64 : RegisterTuples<[sub0, sub1],
9776 + [(add (decimate SGPR_32, 2)),
9777 + (add(decimate (rotl SGPR_32, 1), 2))]>;
9778 +
9779 +// SGPR 128-bit registers
9780 -+def SGPR_128 : RegisterTuples<[sel_x, sel_y, sel_z, sel_w],
9781 ++def SGPR_128 : RegisterTuples<[sub0, sub1, sub2, sub3],
9782 + [(add (decimate SGPR_32, 4)),
9783 + (add (decimate (rotl SGPR_32, 1), 4)),
9784 + (add (decimate (rotl SGPR_32, 2), 4)),
9785 @@ -21699,32 +23289,61 @@ index 0000000..c3f1361
9786 + (add (sequence "VGPR%u", 0, 255))>;
9787 +
9788 +// VGPR 64-bit registers
9789 -+def VGPR_64 : RegisterTuples<[low, high],
9790 ++def VGPR_64 : RegisterTuples<[sub0, sub1],
9791 + [(add VGPR_32),
9792 + (add (rotl VGPR_32, 1))]>;
9793 +
9794 +// VGPR 128-bit registers
9795 -+def VGPR_128 : RegisterTuples<[sel_x, sel_y, sel_z, sel_w],
9796 ++def VGPR_128 : RegisterTuples<[sub0, sub1, sub2, sub3],
9797 + [(add VGPR_32),
9798 + (add (rotl VGPR_32, 1)),
9799 + (add (rotl VGPR_32, 2)),
9800 + (add (rotl VGPR_32, 3))]>;
9801 +
9802 ++// VGPR 256-bit registers
9803 ++def VGPR_256 : RegisterTuples<[sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7],
9804 ++ [(add VGPR_32),
9805 ++ (add (rotl VGPR_32, 1)),
9806 ++ (add (rotl VGPR_32, 2)),
9807 ++ (add (rotl VGPR_32, 3)),
9808 ++ (add (rotl VGPR_32, 4)),
9809 ++ (add (rotl VGPR_32, 5)),
9810 ++ (add (rotl VGPR_32, 6)),
9811 ++ (add (rotl VGPR_32, 7))]>;
9812 ++
9813 ++// VGPR 512-bit registers
9814 ++def VGPR_512 : RegisterTuples<[sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7,
9815 ++ sub8, sub9, sub10, sub11, sub12, sub13, sub14, sub15],
9816 ++ [(add VGPR_32),
9817 ++ (add (rotl VGPR_32, 1)),
9818 ++ (add (rotl VGPR_32, 2)),
9819 ++ (add (rotl VGPR_32, 3)),
9820 ++ (add (rotl VGPR_32, 4)),
9821 ++ (add (rotl VGPR_32, 5)),
9822 ++ (add (rotl VGPR_32, 6)),
9823 ++ (add (rotl VGPR_32, 7)),
9824 ++ (add (rotl VGPR_32, 8)),
9825 ++ (add (rotl VGPR_32, 9)),
9826 ++ (add (rotl VGPR_32, 10)),
9827 ++ (add (rotl VGPR_32, 11)),
9828 ++ (add (rotl VGPR_32, 12)),
9829 ++ (add (rotl VGPR_32, 13)),
9830 ++ (add (rotl VGPR_32, 14)),
9831 ++ (add (rotl VGPR_32, 15))]>;
9832 ++
9833 +// Register class for all scalar registers (SGPRs + Special Registers)
9834 +def SReg_32 : RegisterClass<"AMDGPU", [f32, i32], 32,
9835 -+ (add SGPR_32, SREG_LIT_0, M0, EXEC_LO, EXEC_HI)
9836 ++ (add SGPR_32, M0, EXEC_LO, EXEC_HI)
9837 +>;
9838 +
9839 -+def SReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add SGPR_64, VCC, EXEC)>;
9840 -+
9841 -+def SReg_1 : RegisterClass<"AMDGPU", [i1], 1, (add VCC, SGPR_64, EXEC)>;
9842 ++def SReg_64 : RegisterClass<"AMDGPU", [i1, i64], 64, (add SGPR_64, VCC, EXEC)>;
9843 +
9844 +def SReg_128 : RegisterClass<"AMDGPU", [v4f32, v4i32], 128, (add SGPR_128)>;
9845 +
9846 +def SReg_256 : RegisterClass<"AMDGPU", [v8i32], 256, (add SGPR_256)>;
9847 +
9848 +// Register class for all vector registers (VGPRs + Interpolation Registers)
9849 -+def VReg_32 : RegisterClass<"AMDGPU", [f32, i32], 32,
9850 ++def VReg_32 : RegisterClass<"AMDGPU", [f32, i32, v1i32], 32,
9851 + (add VGPR_32,
9852 + PERSP_SAMPLE_I, PERSP_SAMPLE_J,
9853 + PERSP_CENTER_I, PERSP_CENTER_J,
9854 @@ -21745,14 +23364,22 @@ index 0000000..c3f1361
9855 + )
9856 +>;
9857 +
9858 -+def VReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add VGPR_64)>;
9859 ++def VReg_64 : RegisterClass<"AMDGPU", [i64, v2i32], 64, (add VGPR_64)>;
9860 ++
9861 ++def VReg_128 : RegisterClass<"AMDGPU", [v4f32, v4i32], 128, (add VGPR_128)>;
9862 ++
9863 ++def VReg_256 : RegisterClass<"AMDGPU", [v8i32], 256, (add VGPR_256)>;
9864 +
9865 -+def VReg_128 : RegisterClass<"AMDGPU", [v4f32], 128, (add VGPR_128)>;
9866 ++def VReg_512 : RegisterClass<"AMDGPU", [v16i32], 512, (add VGPR_512)>;
9867 +
9868 -+// AllReg_* - A set of all scalar and vector registers of a given width.
9869 -+def AllReg_32 : RegisterClass<"AMDGPU", [f32, i32], 32, (add VReg_32, SReg_32)>;
9870 ++// [SV]Src_* operands can have either an immediate or an register
9871 ++def SSrc_32 : RegisterClass<"AMDGPU", [i32, f32], 32, (add SReg_32)>;
9872 +
9873 -+def AllReg_64 : RegisterClass<"AMDGPU", [f64, i64], 64, (add SReg_64, VReg_64)>;
9874 ++def SSrc_64 : RegisterClass<"AMDGPU", [i1, i64], 64, (add SReg_64)>;
9875 ++
9876 ++def VSrc_32 : RegisterClass<"AMDGPU", [i32, f32], 32, (add VReg_32, SReg_32)>;
9877 ++
9878 ++def VSrc_64 : RegisterClass<"AMDGPU", [i64], 64, (add SReg_64, VReg_64)>;
9879 +
9880 +// Special register classes for predicates and the M0 register
9881 +def SCCReg : RegisterClass<"AMDGPU", [i1], 1, (add SCC)>;
9882 @@ -21876,6 +23503,30 @@ index 0000000..b8ac4e7
9883 +CPPFLAGS = -I$(PROJ_OBJ_DIR)/.. -I$(PROJ_SRC_DIR)/..
9884 +
9885 +include $(LEVEL)/Makefile.common
9886 +diff --git a/test/CodeGen/R600/128bit-kernel-args.ll b/test/CodeGen/R600/128bit-kernel-args.ll
9887 +new file mode 100644
9888 +index 0000000..114f9e7
9889 +--- /dev/null
9890 ++++ b/test/CodeGen/R600/128bit-kernel-args.ll
9891 +@@ -0,0 +1,18 @@
9892 ++;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
9893 ++
9894 ++; CHECK: @v4i32_kernel_arg
9895 ++; CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
9896 ++
9897 ++define void @v4i32_kernel_arg(<4 x i32> addrspace(1)* %out, <4 x i32> %in) {
9898 ++entry:
9899 ++ store <4 x i32> %in, <4 x i32> addrspace(1)* %out
9900 ++ ret void
9901 ++}
9902 ++
9903 ++; CHECK: @v4f32_kernel_arg
9904 ++; CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
9905 ++define void @v4f32_kernel_args(<4 x float> addrspace(1)* %out, <4 x float> %in) {
9906 ++entry:
9907 ++ store <4 x float> %in, <4 x float> addrspace(1)* %out
9908 ++ ret void
9909 ++}
9910 diff --git a/test/CodeGen/R600/add.v4i32.ll b/test/CodeGen/R600/add.v4i32.ll
9911 new file mode 100644
9912 index 0000000..ac4a874
9913 @@ -21918,6 +23569,82 @@ index 0000000..662085e
9914 + store <4 x i32> %result, <4 x i32> addrspace(1)* %out
9915 + ret void
9916 +}
9917 +diff --git a/test/CodeGen/R600/dagcombiner-bug-illegal-vec4-int-to-fp.ll b/test/CodeGen/R600/dagcombiner-bug-illegal-vec4-int-to-fp.ll
9918 +new file mode 100644
9919 +index 0000000..fd958b3
9920 +--- /dev/null
9921 ++++ b/test/CodeGen/R600/dagcombiner-bug-illegal-vec4-int-to-fp.ll
9922 +@@ -0,0 +1,36 @@
9923 ++;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
9924 ++
9925 ++; This test is for a bug in
9926 ++; DAGCombiner::reduceBuildVecConvertToConvertBuildVec() where
9927 ++; the wrong type was being passed to
9928 ++; TargetLowering::getOperationAction() when checking the legality of
9929 ++; ISD::UINT_TO_FP and ISD::SINT_TO_FP opcodes.
9930 ++
9931 ++
9932 ++; CHECK: @sint
9933 ++; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
9934 ++
9935 ++define void @sint(<4 x float> addrspace(1)* %out, i32 addrspace(1)* %in) {
9936 ++entry:
9937 ++ %ptr = getelementptr i32 addrspace(1)* %in, i32 1
9938 ++ %sint = load i32 addrspace(1) * %in
9939 ++ %conv = sitofp i32 %sint to float
9940 ++ %0 = insertelement <4 x float> undef, float %conv, i32 0
9941 ++ %splat = shufflevector <4 x float> %0, <4 x float> undef, <4 x i32> zeroinitializer
9942 ++ store <4 x float> %splat, <4 x float> addrspace(1)* %out
9943 ++ ret void
9944 ++}
9945 ++
9946 ++;CHECK: @uint
9947 ++;CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
9948 ++
9949 ++define void @uint(<4 x float> addrspace(1)* %out, i32 addrspace(1)* %in) {
9950 ++entry:
9951 ++ %ptr = getelementptr i32 addrspace(1)* %in, i32 1
9952 ++ %uint = load i32 addrspace(1) * %in
9953 ++ %conv = uitofp i32 %uint to float
9954 ++ %0 = insertelement <4 x float> undef, float %conv, i32 0
9955 ++ %splat = shufflevector <4 x float> %0, <4 x float> undef, <4 x i32> zeroinitializer
9956 ++ store <4 x float> %splat, <4 x float> addrspace(1)* %out
9957 ++ ret void
9958 ++}
9959 +diff --git a/test/CodeGen/R600/disconnected-predset-break-bug.ll b/test/CodeGen/R600/disconnected-predset-break-bug.ll
9960 +new file mode 100644
9961 +index 0000000..a586742
9962 +--- /dev/null
9963 ++++ b/test/CodeGen/R600/disconnected-predset-break-bug.ll
9964 +@@ -0,0 +1,28 @@
9965 ++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
9966 ++
9967 ++; PRED_SET* instructions must be tied to any instruction that uses their
9968 ++; result. This tests that there are no instructions between the PRED_SET*
9969 ++; and the PREDICATE_BREAK in this loop.
9970 ++
9971 ++; CHECK: @loop_ge
9972 ++; CHECK: WHILE
9973 ++; CHECK: PRED_SET
9974 ++; CHECK-NEXT: PREDICATED_BREAK
9975 ++define void @loop_ge(i32 addrspace(1)* nocapture %out, i32 %iterations) nounwind {
9976 ++entry:
9977 ++ %cmp5 = icmp sgt i32 %iterations, 0
9978 ++ br i1 %cmp5, label %for.body, label %for.end
9979 ++
9980 ++for.body: ; preds = %for.body, %entry
9981 ++ %i.07.in = phi i32 [ %i.07, %for.body ], [ %iterations, %entry ]
9982 ++ %ai.06 = phi i32 [ %add, %for.body ], [ 0, %entry ]
9983 ++ %i.07 = add nsw i32 %i.07.in, -1
9984 ++ %arrayidx = getelementptr inbounds i32 addrspace(1)* %out, i32 %ai.06
9985 ++ store i32 %i.07, i32 addrspace(1)* %arrayidx, align 4
9986 ++ %add = add nsw i32 %ai.06, 1
9987 ++ %exitcond = icmp eq i32 %add, %iterations
9988 ++ br i1 %exitcond, label %for.end, label %for.body
9989 ++
9990 ++for.end: ; preds = %for.body, %entry
9991 ++ ret void
9992 ++}
9993 diff --git a/test/CodeGen/R600/fabs.ll b/test/CodeGen/R600/fabs.ll
9994 new file mode 100644
9995 index 0000000..0407533
9996 @@ -22027,15 +23754,13 @@ index 0000000..5c981ef
9997 +}
9998 diff --git a/test/CodeGen/R600/fcmp.ll b/test/CodeGen/R600/fcmp.ll
9999 new file mode 100644
10000 -index 0000000..1dcd07c
10001 +index 0000000..89f5e9e
10002 --- /dev/null
10003 +++ b/test/CodeGen/R600/fcmp.ll
10004 -@@ -0,0 +1,16 @@
10005 +@@ -0,0 +1,14 @@
10006 +;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
10007 +
10008 -+;CHECK: SETE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10009 -+;CHECK: MOV T{{[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}}
10010 -+;CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10011 ++;CHECK: SETE_DX10 T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10012 +
10013 +define void @test(i32 addrspace(1)* %out, float addrspace(1)* %in) {
10014 +entry:
10015 @@ -22183,14 +23908,13 @@ index 0000000..6d44a0c
10016 +}
10017 diff --git a/test/CodeGen/R600/fsub.ll b/test/CodeGen/R600/fsub.ll
10018 new file mode 100644
10019 -index 0000000..0ec1c37
10020 +index 0000000..591aa52
10021 --- /dev/null
10022 +++ b/test/CodeGen/R600/fsub.ll
10023 -@@ -0,0 +1,17 @@
10024 +@@ -0,0 +1,16 @@
10025 +;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
10026 +
10027 -+; CHECK: MOV T{{[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}}
10028 -+; CHECK: ADD T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10029 ++; CHECK: ADD T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}}
10030 +
10031 +define void @test() {
10032 + %r0 = call float @llvm.R600.load.input(i32 0)
10033 @@ -22266,6 +23990,64 @@ index 0000000..aad44d9
10034 + store i32 %value, i32 addrspace(1)* %out
10035 + ret void
10036 +}
10037 +diff --git a/test/CodeGen/R600/kcache-fold.ll b/test/CodeGen/R600/kcache-fold.ll
10038 +new file mode 100644
10039 +index 0000000..382f78c
10040 +--- /dev/null
10041 ++++ b/test/CodeGen/R600/kcache-fold.ll
10042 +@@ -0,0 +1,52 @@
10043 ++;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
10044 ++
10045 ++; CHECK: MOV T{{[0-9]+\.[XYZW], CBuf0\[[0-9]+\]\.[XYZW]}}
10046 ++
10047 ++define void @main() {
10048 ++main_body:
10049 ++ %0 = load <4 x float> addrspace(9)* null
10050 ++ %1 = extractelement <4 x float> %0, i32 0
10051 ++ %2 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1)
10052 ++ %3 = extractelement <4 x float> %2, i32 0
10053 ++ %4 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2)
10054 ++ %5 = extractelement <4 x float> %4, i32 0
10055 ++ %6 = fcmp ult float %1, 0.000000e+00
10056 ++ %7 = select i1 %6, float %3, float %5
10057 ++ %8 = load <4 x float> addrspace(9)* null
10058 ++ %9 = extractelement <4 x float> %8, i32 1
10059 ++ %10 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1)
10060 ++ %11 = extractelement <4 x float> %10, i32 1
10061 ++ %12 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2)
10062 ++ %13 = extractelement <4 x float> %12, i32 1
10063 ++ %14 = fcmp ult float %9, 0.000000e+00
10064 ++ %15 = select i1 %14, float %11, float %13
10065 ++ %16 = load <4 x float> addrspace(9)* null
10066 ++ %17 = extractelement <4 x float> %16, i32 2
10067 ++ %18 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1)
10068 ++ %19 = extractelement <4 x float> %18, i32 2
10069 ++ %20 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2)
10070 ++ %21 = extractelement <4 x float> %20, i32 2
10071 ++ %22 = fcmp ult float %17, 0.000000e+00
10072 ++ %23 = select i1 %22, float %19, float %21
10073 ++ %24 = load <4 x float> addrspace(9)* null
10074 ++ %25 = extractelement <4 x float> %24, i32 3
10075 ++ %26 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1)
10076 ++ %27 = extractelement <4 x float> %26, i32 3
10077 ++ %28 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2)
10078 ++ %29 = extractelement <4 x float> %28, i32 3
10079 ++ %30 = fcmp ult float %25, 0.000000e+00
10080 ++ %31 = select i1 %30, float %27, float %29
10081 ++ %32 = call float @llvm.AMDIL.clamp.(float %7, float 0.000000e+00, float 1.000000e+00)
10082 ++ %33 = call float @llvm.AMDIL.clamp.(float %15, float 0.000000e+00, float 1.000000e+00)
10083 ++ %34 = call float @llvm.AMDIL.clamp.(float %23, float 0.000000e+00, float 1.000000e+00)
10084 ++ %35 = call float @llvm.AMDIL.clamp.(float %31, float 0.000000e+00, float 1.000000e+00)
10085 ++ %36 = insertelement <4 x float> undef, float %32, i32 0
10086 ++ %37 = insertelement <4 x float> %36, float %33, i32 1
10087 ++ %38 = insertelement <4 x float> %37, float %34, i32 2
10088 ++ %39 = insertelement <4 x float> %38, float %35, i32 3
10089 ++ call void @llvm.R600.store.swizzle(<4 x float> %39, i32 0, i32 0)
10090 ++ ret void
10091 ++}
10092 ++
10093 ++declare float @llvm.AMDIL.clamp.(float, float, float) readnone
10094 ++declare void @llvm.R600.store.swizzle(<4 x float>, i32, i32)
10095 diff --git a/test/CodeGen/R600/lit.local.cfg b/test/CodeGen/R600/lit.local.cfg
10096 new file mode 100644
10097 index 0000000..36ee493
10098 @@ -22287,10 +24069,10 @@ index 0000000..36ee493
10099 +
10100 diff --git a/test/CodeGen/R600/literals.ll b/test/CodeGen/R600/literals.ll
10101 new file mode 100644
10102 -index 0000000..4c731b2
10103 +index 0000000..be62342
10104 --- /dev/null
10105 +++ b/test/CodeGen/R600/literals.ll
10106 -@@ -0,0 +1,30 @@
10107 +@@ -0,0 +1,32 @@
10108 +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
10109 +
10110 +; Test using an integer literal constant.
10111 @@ -22299,6 +24081,7 @@ index 0000000..4c731b2
10112 +; or
10113 +; ADD_INT literal.x REG, 5
10114 +
10115 ++; CHECK: @i32_literal
10116 +; CHECK: ADD_INT {{[A-Z0-9,. ]*}}literal.x,{{[A-Z0-9,. ]*}} 5
10117 +define void @i32_literal(i32 addrspace(1)* %out, i32 %in) {
10118 +entry:
10119 @@ -22313,6 +24096,7 @@ index 0000000..4c731b2
10120 +; or
10121 +; ADD literal.x REG, 5.0
10122 +
10123 ++; CHECK: @float_literal
10124 +; CHECK: ADD {{[A-Z0-9,. ]*}}literal.x,{{[A-Z0-9,. ]*}} {{[0-9]+}}(5.0
10125 +define void @float_literal(float addrspace(1)* %out, float %in) {
10126 +entry:
10127 @@ -22366,6 +24150,35 @@ index 0000000..fac957f
10128 +declare void @llvm.AMDGPU.store.output(float, i32)
10129 +
10130 +declare float @llvm.AMDGPU.trunc(float ) readnone
10131 +diff --git a/test/CodeGen/R600/llvm.SI.fs.interp.constant.ll b/test/CodeGen/R600/llvm.SI.fs.interp.constant.ll
10132 +new file mode 100644
10133 +index 0000000..0c19f14
10134 +--- /dev/null
10135 ++++ b/test/CodeGen/R600/llvm.SI.fs.interp.constant.ll
10136 +@@ -0,0 +1,23 @@
10137 ++;RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s
10138 ++
10139 ++;CHECK: S_MOV_B32
10140 ++;CHECK-NEXT: V_INTERP_MOV_F32
10141 ++
10142 ++define void @main() {
10143 ++main_body:
10144 ++ call void @llvm.AMDGPU.shader.type(i32 0)
10145 ++ %0 = load i32 addrspace(8)* inttoptr (i32 6 to i32 addrspace(8)*)
10146 ++ %1 = call float @llvm.SI.fs.interp.constant(i32 0, i32 0, i32 %0)
10147 ++ %2 = call i32 @llvm.SI.packf16(float %1, float %1)
10148 ++ %3 = bitcast i32 %2 to float
10149 ++ call void @llvm.SI.export(i32 15, i32 1, i32 1, i32 0, i32 1, float %3, float %3, float %3, float %3)
10150 ++ ret void
10151 ++}
10152 ++
10153 ++declare void @llvm.AMDGPU.shader.type(i32)
10154 ++
10155 ++declare float @llvm.SI.fs.interp.constant(i32, i32, i32) readonly
10156 ++
10157 ++declare i32 @llvm.SI.packf16(float, float) readnone
10158 ++
10159 ++declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)
10160 diff --git a/test/CodeGen/R600/llvm.cos.ll b/test/CodeGen/R600/llvm.cos.ll
10161 new file mode 100644
10162 index 0000000..dc120bf
10163 @@ -22466,6 +24279,112 @@ index 0000000..b070dcd
10164 + store i32 %2, i32 addrspace(1)* %out
10165 + ret void
10166 +}
10167 +diff --git a/test/CodeGen/R600/predicates.ll b/test/CodeGen/R600/predicates.ll
10168 +new file mode 100644
10169 +index 0000000..18895a4
10170 +--- /dev/null
10171 ++++ b/test/CodeGen/R600/predicates.ll
10172 +@@ -0,0 +1,100 @@
10173 ++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
10174 ++
10175 ++; These tests make sure the compiler is optimizing branches using predicates
10176 ++; when it is legal to do so.
10177 ++
10178 ++; CHECK: @simple_if
10179 ++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
10180 ++; CHECK: LSHL T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
10181 ++define void @simple_if(i32 addrspace(1)* %out, i32 %in) {
10182 ++entry:
10183 ++ %0 = icmp sgt i32 %in, 0
10184 ++ br i1 %0, label %IF, label %ENDIF
10185 ++
10186 ++IF:
10187 ++ %1 = shl i32 %in, 1
10188 ++ br label %ENDIF
10189 ++
10190 ++ENDIF:
10191 ++ %2 = phi i32 [ %in, %entry ], [ %1, %IF ]
10192 ++ store i32 %2, i32 addrspace(1)* %out
10193 ++ ret void
10194 ++}
10195 ++
10196 ++; CHECK: @simple_if_else
10197 ++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
10198 ++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
10199 ++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
10200 ++define void @simple_if_else(i32 addrspace(1)* %out, i32 %in) {
10201 ++entry:
10202 ++ %0 = icmp sgt i32 %in, 0
10203 ++ br i1 %0, label %IF, label %ELSE
10204 ++
10205 ++IF:
10206 ++ %1 = shl i32 %in, 1
10207 ++ br label %ENDIF
10208 ++
10209 ++ELSE:
10210 ++ %2 = lshr i32 %in, 1
10211 ++ br label %ENDIF
10212 ++
10213 ++ENDIF:
10214 ++ %3 = phi i32 [ %1, %IF ], [ %2, %ELSE ]
10215 ++ store i32 %3, i32 addrspace(1)* %out
10216 ++ ret void
10217 ++}
10218 ++
10219 ++; CHECK: @nested_if
10220 ++; CHECK: IF_PREDICATE_SET
10221 ++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
10222 ++; CHECK: LSHL T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
10223 ++; CHECK: ENDIF
10224 ++define void @nested_if(i32 addrspace(1)* %out, i32 %in) {
10225 ++entry:
10226 ++ %0 = icmp sgt i32 %in, 0
10227 ++ br i1 %0, label %IF0, label %ENDIF
10228 ++
10229 ++IF0:
10230 ++ %1 = add i32 %in, 10
10231 ++ %2 = icmp sgt i32 %1, 0
10232 ++ br i1 %2, label %IF1, label %ENDIF
10233 ++
10234 ++IF1:
10235 ++ %3 = shl i32 %1, 1
10236 ++ br label %ENDIF
10237 ++
10238 ++ENDIF:
10239 ++ %4 = phi i32 [%in, %entry], [%1, %IF0], [%3, %IF1]
10240 ++ store i32 %4, i32 addrspace(1)* %out
10241 ++ ret void
10242 ++}
10243 ++
10244 ++; CHECK: @nested_if_else
10245 ++; CHECK: IF_PREDICATE_SET
10246 ++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
10247 ++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
10248 ++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
10249 ++; CHECK: ENDIF
10250 ++define void @nested_if_else(i32 addrspace(1)* %out, i32 %in) {
10251 ++entry:
10252 ++ %0 = icmp sgt i32 %in, 0
10253 ++ br i1 %0, label %IF0, label %ENDIF
10254 ++
10255 ++IF0:
10256 ++ %1 = add i32 %in, 10
10257 ++ %2 = icmp sgt i32 %1, 0
10258 ++ br i1 %2, label %IF1, label %ELSE1
10259 ++
10260 ++IF1:
10261 ++ %3 = shl i32 %1, 1
10262 ++ br label %ENDIF
10263 ++
10264 ++ELSE1:
10265 ++ %4 = lshr i32 %in, 1
10266 ++ br label %ENDIF
10267 ++
10268 ++ENDIF:
10269 ++ %5 = phi i32 [%in, %entry], [%3, %IF1], [%4, %ELSE1]
10270 ++ store i32 %5, i32 addrspace(1)* %out
10271 ++ ret void
10272 ++}
10273 diff --git a/test/CodeGen/R600/reciprocal.ll b/test/CodeGen/R600/reciprocal.ll
10274 new file mode 100644
10275 index 0000000..6838c1a
10276 @@ -22517,7 +24436,7 @@ index 0000000..3556fac
10277 +}
10278 diff --git a/test/CodeGen/R600/selectcc-icmp-select-float.ll b/test/CodeGen/R600/selectcc-icmp-select-float.ll
10279 new file mode 100644
10280 -index 0000000..f65a300
10281 +index 0000000..359ca1e
10282 --- /dev/null
10283 +++ b/test/CodeGen/R600/selectcc-icmp-select-float.ll
10284 @@ -0,0 +1,15 @@
10285 @@ -22525,7 +24444,7 @@ index 0000000..f65a300
10286 +
10287 +; Note that additional optimizations may cause this SGT to be replaced with a
10288 +; CND* instruction.
10289 -+; CHECK: SGT_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], literal.x, -1}}
10290 ++; CHECK: SETGT_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], literal.x, -1}}
10291 +; Test a selectcc with i32 LHS/RHS and float True/False
10292 +
10293 +define void @test(float addrspace(1)* %out, i32 addrspace(1)* %in) {
10294 @@ -22570,6 +24489,149 @@ index 0000000..b38078e
10295 + store i32 %3, i32 addrspace(1)* %out
10296 + ret void
10297 +}
10298 +diff --git a/test/CodeGen/R600/set-dx10.ll b/test/CodeGen/R600/set-dx10.ll
10299 +new file mode 100644
10300 +index 0000000..54febcf
10301 +--- /dev/null
10302 ++++ b/test/CodeGen/R600/set-dx10.ll
10303 +@@ -0,0 +1,137 @@
10304 ++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
10305 ++
10306 ++; These tests check that floating-point comparisons whose results are used by
10307 ++; select to store integer true (-1) and false (0) values are lowered to one of
10308 ++; the SET*DX10 instructions.
10309 ++
10310 ++; CHECK: @fcmp_une_select_fptosi
10311 ++; CHECK: SETNE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
10312 ++define void @fcmp_une_select_fptosi(i32 addrspace(1)* %out, float %in) {
10313 ++entry:
10314 ++ %0 = fcmp une float %in, 5.0
10315 ++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
10316 ++ %2 = fsub float -0.000000e+00, %1
10317 ++ %3 = fptosi float %2 to i32
10318 ++ store i32 %3, i32 addrspace(1)* %out
10319 ++ ret void
10320 ++}
10321 ++
10322 ++; CHECK: @fcmp_une_select_i32
10323 ++; CHECK: SETNE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
10324 ++define void @fcmp_une_select_i32(i32 addrspace(1)* %out, float %in) {
10325 ++entry:
10326 ++ %0 = fcmp une float %in, 5.0
10327 ++ %1 = select i1 %0, i32 -1, i32 0
10328 ++ store i32 %1, i32 addrspace(1)* %out
10329 ++ ret void
10330 ++}
10331 ++
10332 ++; CHECK: @fcmp_ueq_select_fptosi
10333 ++; CHECK: SETE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
10334 ++define void @fcmp_ueq_select_fptosi(i32 addrspace(1)* %out, float %in) {
10335 ++entry:
10336 ++ %0 = fcmp ueq float %in, 5.0
10337 ++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
10338 ++ %2 = fsub float -0.000000e+00, %1
10339 ++ %3 = fptosi float %2 to i32
10340 ++ store i32 %3, i32 addrspace(1)* %out
10341 ++ ret void
10342 ++}
10343 ++
10344 ++; CHECK: @fcmp_ueq_select_i32
10345 ++; CHECK: SETE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
10346 ++define void @fcmp_ueq_select_i32(i32 addrspace(1)* %out, float %in) {
10347 ++entry:
10348 ++ %0 = fcmp ueq float %in, 5.0
10349 ++ %1 = select i1 %0, i32 -1, i32 0
10350 ++ store i32 %1, i32 addrspace(1)* %out
10351 ++ ret void
10352 ++}
10353 ++
10354 ++; CHECK: @fcmp_ugt_select_fptosi
10355 ++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
10356 ++define void @fcmp_ugt_select_fptosi(i32 addrspace(1)* %out, float %in) {
10357 ++entry:
10358 ++ %0 = fcmp ugt float %in, 5.0
10359 ++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
10360 ++ %2 = fsub float -0.000000e+00, %1
10361 ++ %3 = fptosi float %2 to i32
10362 ++ store i32 %3, i32 addrspace(1)* %out
10363 ++ ret void
10364 ++}
10365 ++
10366 ++; CHECK: @fcmp_ugt_select_i32
10367 ++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
10368 ++define void @fcmp_ugt_select_i32(i32 addrspace(1)* %out, float %in) {
10369 ++entry:
10370 ++ %0 = fcmp ugt float %in, 5.0
10371 ++ %1 = select i1 %0, i32 -1, i32 0
10372 ++ store i32 %1, i32 addrspace(1)* %out
10373 ++ ret void
10374 ++}
10375 ++
10376 ++; CHECK: @fcmp_uge_select_fptosi
10377 ++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
10378 ++define void @fcmp_uge_select_fptosi(i32 addrspace(1)* %out, float %in) {
10379 ++entry:
10380 ++ %0 = fcmp uge float %in, 5.0
10381 ++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
10382 ++ %2 = fsub float -0.000000e+00, %1
10383 ++ %3 = fptosi float %2 to i32
10384 ++ store i32 %3, i32 addrspace(1)* %out
10385 ++ ret void
10386 ++}
10387 ++
10388 ++; CHECK: @fcmp_uge_select_i32
10389 ++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
10390 ++define void @fcmp_uge_select_i32(i32 addrspace(1)* %out, float %in) {
10391 ++entry:
10392 ++ %0 = fcmp uge float %in, 5.0
10393 ++ %1 = select i1 %0, i32 -1, i32 0
10394 ++ store i32 %1, i32 addrspace(1)* %out
10395 ++ ret void
10396 ++}
10397 ++
10398 ++; CHECK: @fcmp_ule_select_fptosi
10399 ++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
10400 ++define void @fcmp_ule_select_fptosi(i32 addrspace(1)* %out, float %in) {
10401 ++entry:
10402 ++ %0 = fcmp ule float %in, 5.0
10403 ++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
10404 ++ %2 = fsub float -0.000000e+00, %1
10405 ++ %3 = fptosi float %2 to i32
10406 ++ store i32 %3, i32 addrspace(1)* %out
10407 ++ ret void
10408 ++}
10409 ++
10410 ++; CHECK: @fcmp_ule_select_i32
10411 ++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
10412 ++define void @fcmp_ule_select_i32(i32 addrspace(1)* %out, float %in) {
10413 ++entry:
10414 ++ %0 = fcmp ule float %in, 5.0
10415 ++ %1 = select i1 %0, i32 -1, i32 0
10416 ++ store i32 %1, i32 addrspace(1)* %out
10417 ++ ret void
10418 ++}
10419 ++
10420 ++; CHECK: @fcmp_ult_select_fptosi
10421 ++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
10422 ++define void @fcmp_ult_select_fptosi(i32 addrspace(1)* %out, float %in) {
10423 ++entry:
10424 ++ %0 = fcmp ult float %in, 5.0
10425 ++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
10426 ++ %2 = fsub float -0.000000e+00, %1
10427 ++ %3 = fptosi float %2 to i32
10428 ++ store i32 %3, i32 addrspace(1)* %out
10429 ++ ret void
10430 ++}
10431 ++
10432 ++; CHECK: @fcmp_ult_select_i32
10433 ++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
10434 ++define void @fcmp_ult_select_i32(i32 addrspace(1)* %out, float %in) {
10435 ++entry:
10436 ++ %0 = fcmp ult float %in, 5.0
10437 ++ %1 = select i1 %0, i32 -1, i32 0
10438 ++ store i32 %1, i32 addrspace(1)* %out
10439 ++ ret void
10440 ++}
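
What makes the DX10 variants a win: SET*_DX10 writes an all-ones (-1) or all-zeros (0) integer mask directly, so the IR pattern of an fcmp followed by "select i1 %c, i32 -1, i32 0" collapses into a single instruction. A small C sketch of the equivalence, illustrative only: the real ugt/ueq predicates are unordered (also true for NaN inputs), which the plain C comparisons below do not model.

#include <stdint.h>
#include <stdio.h>

/* The two-step pattern the IR spells out: compare, then select -1/0. */
static int32_t cmp_then_select(float in) {
    int cond = in > 5.0f;
    return cond ? -1 : 0;
}

/* What SETGT_DX10 computes in one step: an all-ones/all-zeros mask. */
static int32_t setgt_dx10(float a, float b) {
    return (a > b) ? -1 : 0;   /* -1 == all bits set */
}

int main(void) {
    printf("%d %d\n", cmp_then_select(6.0f), setgt_dx10(6.0f, 5.0f)); /* -1 -1 */
    printf("%d %d\n", cmp_then_select(4.0f), setgt_dx10(4.0f, 5.0f)); /*  0  0 */
    return 0;
}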
10441 diff --git a/test/CodeGen/R600/setcc.v4i32.ll b/test/CodeGen/R600/setcc.v4i32.ll
10442 new file mode 100644
10443 index 0000000..0752f2e
10444 @@ -22590,12 +24652,13 @@ index 0000000..0752f2e
10445 +}
10446 diff --git a/test/CodeGen/R600/short-args.ll b/test/CodeGen/R600/short-args.ll
10447 new file mode 100644
10448 -index 0000000..1070250
10449 +index 0000000..b69e327
10450 --- /dev/null
10451 +++ b/test/CodeGen/R600/short-args.ll
10452 -@@ -0,0 +1,37 @@
10453 +@@ -0,0 +1,41 @@
10454 +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
10455 +
10456 ++; CHECK: @i8_arg
10457 +; CHECK: VTX_READ_8 T{{[0-9]+\.X, T[0-9]+\.X}}
10458 +
10459 +define void @i8_arg(i32 addrspace(1)* nocapture %out, i8 %in) nounwind {
10460 @@ -22605,6 +24668,7 @@ index 0000000..1070250
10461 + ret void
10462 +}
10463 +
10464 ++; CHECK: @i8_zext_arg
10465 +; CHECK: VTX_READ_8 T{{[0-9]+\.X, T[0-9]+\.X}}
10466 +
10467 +define void @i8_zext_arg(i32 addrspace(1)* nocapture %out, i8 zeroext %in) nounwind {
10468 @@ -22614,6 +24678,7 @@ index 0000000..1070250
10469 + ret void
10470 +}
10471 +
10472 ++; CHECK: @i16_arg
10473 +; CHECK: VTX_READ_16 T{{[0-9]+\.X, T[0-9]+\.X}}
10474 +
10475 +define void @i16_arg(i32 addrspace(1)* nocapture %out, i16 %in) nounwind {
10476 @@ -22623,6 +24688,7 @@ index 0000000..1070250
10477 + ret void
10478 +}
10479 +
10480 ++; CHECK: @i16_zext_arg
10481 +; CHECK: VTX_READ_16 T{{[0-9]+\.X, T[0-9]+\.X}}
10482 +
10483 +define void @i16_zext_arg(i32 addrspace(1)* nocapture %out, i16 zeroext %in) nounwind {
10484 @@ -22682,6 +24748,95 @@ index 0000000..47657a6
10485 + store <4 x i32> %result, <4 x i32> addrspace(1)* %out
10486 + ret void
10487 +}
10488 +diff --git a/test/CodeGen/R600/unsupported-cc.ll b/test/CodeGen/R600/unsupported-cc.ll
10489 +new file mode 100644
10490 +index 0000000..b48c591
10491 +--- /dev/null
10492 ++++ b/test/CodeGen/R600/unsupported-cc.ll
10493 +@@ -0,0 +1,83 @@
10494 ++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
10495 ++
10496 ++; These tests are for condition codes that are not supported by the hardware.
10497 ++
10498 ++; CHECK: @slt
10499 ++; CHECK: SETGT_INT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 5(7.006492e-45)
10500 ++define void @slt(i32 addrspace(1)* %out, i32 %in) {
10501 ++entry:
10502 ++ %0 = icmp slt i32 %in, 5
10503 ++ %1 = select i1 %0, i32 -1, i32 0
10504 ++ store i32 %1, i32 addrspace(1)* %out
10505 ++ ret void
10506 ++}
10507 ++
10508 ++; CHECK: @ult_i32
10509 ++; CHECK: SETGT_UINT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 5(7.006492e-45)
10510 ++define void @ult_i32(i32 addrspace(1)* %out, i32 %in) {
10511 ++entry:
10512 ++ %0 = icmp ult i32 %in, 5
10513 ++ %1 = select i1 %0, i32 -1, i32 0
10514 ++ store i32 %1, i32 addrspace(1)* %out
10515 ++ ret void
10516 ++}
10517 ++
10518 ++; CHECK: @ult_float
10519 ++; CHECK: SETGT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
10520 ++define void @ult_float(float addrspace(1)* %out, float %in) {
10521 ++entry:
10522 ++ %0 = fcmp ult float %in, 5.0
10523 ++ %1 = select i1 %0, float 1.0, float 0.0
10524 ++ store float %1, float addrspace(1)* %out
10525 ++ ret void
10526 ++}
10527 ++
10528 ++; CHECK: @olt
10529 ++; CHECK: SETGT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
10530 ++define void @olt(float addrspace(1)* %out, float %in) {
10531 ++entry:
10532 ++ %0 = fcmp olt float %in, 5.0
10533 ++ %1 = select i1 %0, float 1.0, float 0.0
10534 ++ store float %1, float addrspace(1)* %out
10535 ++ ret void
10536 ++}
10537 ++
10538 ++; CHECK: @sle
10539 ++; CHECK: SETGT_INT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 6(8.407791e-45)
10540 ++define void @sle(i32 addrspace(1)* %out, i32 %in) {
10541 ++entry:
10542 ++ %0 = icmp sle i32 %in, 5
10543 ++ %1 = select i1 %0, i32 -1, i32 0
10544 ++ store i32 %1, i32 addrspace(1)* %out
10545 ++ ret void
10546 ++}
10547 ++
10548 ++; CHECK: @ule_i32
10549 ++; CHECK: SETGT_UINT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 6(8.407791e-45)
10550 ++define void @ule_i32(i32 addrspace(1)* %out, i32 %in) {
10551 ++entry:
10552 ++ %0 = icmp ule i32 %in, 5
10553 ++ %1 = select i1 %0, i32 -1, i32 0
10554 ++ store i32 %1, i32 addrspace(1)* %out
10555 ++ ret void
10556 ++}
10557 ++
10558 ++; CHECK: @ule_float
10559 ++; CHECK: SETGE T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
10560 ++define void @ule_float(float addrspace(1)* %out, float %in) {
10561 ++entry:
10562 ++ %0 = fcmp ule float %in, 5.0
10563 ++ %1 = select i1 %0, float 1.0, float 0.0
10564 ++ store float %1, float addrspace(1)* %out
10565 ++ ret void
10566 ++}
10567 ++
10568 ++; CHECK: @ole
10569 ++; CHECK: SETGE T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
10570 ++define void @ole(float addrspace(1)* %out, float %in) {
10571 ++entry:
10572 ++ %0 = fcmp ole float %in, 5.0
10573 ++ %1 = select i1 %0, float 1.0, float 0.0
10574 ++ store float %1, float addrspace(1)* %out
10575 ++ ret void
10576 ++}
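
Two things in the CHECK lines above are worth unpacking. First, the hardware has no SETLT/SETLE, so the expected output swaps the operands and uses SETGT/SETGE instead: x < 5 becomes 5 > x, and for integers x <= 5 becomes 6 > x. Second, R600 literals are raw 32-bit words, and the assembly printer annotates each one as integer(float-reinterpretation), which is where 1084227584(5.000000e+00) and 5(7.006492e-45) come from. A short C sketch of that decoding:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a raw 32-bit literal as a float, as the R600 assembly
 * printer does when annotating literal.x operands. */
static float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);   /* safe type-punning */
    return f;
}

int main(void) {
    /* 0x40A00000 is IEEE-754 single-precision 5.0 */
    printf("%u -> %e\n", 1084227584u, bits_to_float(1084227584u));
    /* the integer 5 reinterpreted is a tiny denormal */
    printf("%u -> %e\n", 5u, bits_to_float(5u));
    return 0;
}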
10577 diff --git a/test/CodeGen/R600/urem.v4i32.ll b/test/CodeGen/R600/urem.v4i32.ll
10578 new file mode 100644
10579 index 0000000..2e7388c
10580 @@ -22705,15 +24860,13 @@ index 0000000..2e7388c
10581 +}
10582 diff --git a/test/CodeGen/R600/vec4-expand.ll b/test/CodeGen/R600/vec4-expand.ll
10583 new file mode 100644
10584 -index 0000000..47cbf82
10585 +index 0000000..8f62bc6
10586 --- /dev/null
10587 +++ b/test/CodeGen/R600/vec4-expand.ll
10588 -@@ -0,0 +1,52 @@
10589 -+; There are bugs in the DAGCombiner that prevent this test from passing.
10590 -+; XFAIL: *
10591 -+
10592 +@@ -0,0 +1,53 @@
10593 +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
10594 +
10595 ++; CHECK: @fp_to_sint
10596 +; CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10597 +; CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10598 +; CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10599 @@ -22726,6 +24879,7 @@ index 0000000..47cbf82
10600 + ret void
10601 +}
10602 +
10603 ++; CHECK: @fp_to_uint
10604 +; CHECK: FLT_TO_UINT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10605 +; CHECK: FLT_TO_UINT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10606 +; CHECK: FLT_TO_UINT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10607 @@ -22738,6 +24892,7 @@ index 0000000..47cbf82
10608 + ret void
10609 +}
10610 +
10611 ++; CHECK: @sint_to_fp
10612 +; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10613 +; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10614 +; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10615 @@ -22750,6 +24905,7 @@ index 0000000..47cbf82
10616 + ret void
10617 +}
10618 +
10619 ++; CHECK: @uint_to_fp
10620 +; CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10621 +; CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10622 +; CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
10623 @@ -22804,6 +24960,15 @@ index 0000000..62cdcf5
10624 +declare <4 x float> @llvm.SI.vs.load.input(<4 x i32>, i32, i32)
10625 +
10626 +declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)
10627 ---
10628 -1.8.0.2
10629 -
10630 +diff --git a/test/CodeGen/X86/cvtv2f32.ll b/test/CodeGen/X86/cvtv2f32.ll
10631 +index 466b096..d11bb9e 100644
10632 +--- a/test/CodeGen/X86/cvtv2f32.ll
10633 ++++ b/test/CodeGen/X86/cvtv2f32.ll
10634 +@@ -1,3 +1,7 @@
10635 ++; A bug fix in the DAGCombiner made this test fail, so it is marked XFAIL
10636 ++; until this can be investigated further.
10637 ++; XFAIL: *
10638 ++
10639 + ; RUN: llc < %s -mtriple=i686-linux-pc -mcpu=corei7 | FileCheck %s
10640 +
10641 + define <2 x float> @foo(i32 %x, i32 %y, <2 x float> %v) {
10642
10643 diff --git a/sys-devel/llvm/llvm-3.2.ebuild b/sys-devel/llvm/llvm-3.2.ebuild
10644 index 7171bfc..ceb16bb 100644
10645 --- a/sys-devel/llvm/llvm-3.2.ebuild
10646 +++ b/sys-devel/llvm/llvm-3.2.ebuild
10647 @@ -1,33 +1,38 @@
10648 -# Copyright 1999-2012 Gentoo Foundation
10649 +# Copyright 1999-2013 Gentoo Foundation
10650 # Distributed under the terms of the GNU General Public License v2
10651 -# $Header: /var/cvsroot/gentoo-x86/sys-devel/llvm/llvm-3.2.ebuild,v 1.1 2012/12/21 09:18:12 voyageur Exp $
10652 +# $Header: /var/cvsroot/gentoo-x86/sys-devel/llvm/llvm-3.2.ebuild,v 1.6 2013/02/27 06:02:15 zmedico Exp $
10653
10654 EAPI=5
10655 -PYTHON_DEPEND="2"
10656 -inherit eutils flag-o-matic multilib toolchain-funcs python pax-utils
10657 +
10658 +# pypy gives around 1700 unresolved tests due to the open file limit
10659 +# being exceeded; its GC probably does not close files fast enough.
10660 +PYTHON_COMPAT=( python{2_5,2_6,2_7} )
10661 +
10662 +inherit eutils flag-o-matic multilib python-any-r1 toolchain-funcs pax-utils
10663
10664 DESCRIPTION="Low Level Virtual Machine"
10665 HOMEPAGE="http://llvm.org/"
10666 -SRC_URI="http://llvm.org/releases/${PV}/${P}.src.tar.gz"
10667 +SRC_URI="http://llvm.org/releases/${PV}/${P}.src.tar.gz
10668 + !doc? ( http://dev.gentoo.org/~voyageur/distfiles/${P}-manpages.tar.bz2 )"
10669
10670 LICENSE="UoI-NCSA"
10671 SLOT="0"
10672 -KEYWORDS="~amd64 ~arm ~ppc ~x86 ~amd64-fbsd ~x86-fbsd ~x64-freebsd ~amd64-linux ~x86-linux ~ppc-macos ~x64-macos"
10673 +KEYWORDS="~amd64 ~arm ~ppc ~x86 ~amd64-fbsd ~x86-fbsd ~x64-freebsd ~amd64-linux ~arm-linux ~x86-linux ~ppc-macos ~x64-macos"
10674 IUSE="debug doc gold +libffi multitarget ocaml test udis86 vim-syntax"
10675
10676 DEPEND="dev-lang/perl
10677 - dev-python/sphinx
10678 >=sys-devel/make-3.79
10679 >=sys-devel/flex-2.5.4
10680 >=sys-devel/bison-1.875d
10681 || ( >=sys-devel/gcc-3.0 >=sys-devel/gcc-apple-4.2.1 )
10682 || ( >=sys-devel/binutils-2.18 >=sys-devel/binutils-apple-3.2.3 )
10683 + doc? ( dev-python/sphinx )
10684 gold? ( >=sys-devel/binutils-2.22[cxx] )
10685 libffi? ( virtual/pkgconfig
10686 virtual/libffi )
10687 ocaml? ( dev-lang/ocaml )
10688 - udis86? ( amd64? ( dev-libs/udis86[pic] )
10689 - !amd64? ( dev-libs/udis86 ) )"
10690 + udis86? ( dev-libs/udis86[pic(+)] )
10691 + ${PYTHON_DEPS}"
10692 RDEPEND="dev-lang/perl
10693 libffi? ( virtual/libffi )
10694 vim-syntax? ( || ( app-editors/vim app-editors/gvim ) )"
10695 @@ -36,8 +41,7 @@ S=${WORKDIR}/${P}.src
10696
10697 pkg_setup() {
10698 # Required for test and build
10699 - python_set_active_version 2
10700 - python_pkg_setup
10701 + python-any-r1_pkg_setup
10702
10703 # need to check if the active compiler is ok
10704
10705 @@ -64,12 +68,12 @@ pkg_setup() {
10706
10707 if [[ ${CHOST} == x86_64-* && ${broken_gcc_amd64} == *" ${version} "* ]];
10708 then
10709 - elog "Your version of gcc is known to miscompile llvm in amd64"
10710 - elog "architectures. Check"
10711 - elog "http://www.llvm.org/docs/GettingStarted.html for possible"
10712 - elog "solutions."
10713 + elog "Your version of gcc is known to miscompile llvm in amd64"
10714 + elog "architectures. Check"
10715 + elog "http://www.llvm.org/docs/GettingStarted.html for possible"
10716 + elog "solutions."
10717 die "Your currently active version of gcc is known to miscompile llvm"
10718 - fi
10719 + fi
10720 }
10721
10722 src_prepare() {
10723 @@ -96,12 +100,9 @@ src_prepare() {
10724 sed -e "/NO_INSTALL = 1/s/^/#/" -i utils/FileCheck/Makefile \
10725 || die "FileCheck Makefile sed failed"
10726
10727 - # Specify python version
10728 - python_convert_shebangs -r 2 test/Scripts
10729 -
10730 epatch "${FILESDIR}"/${PN}-3.2-nodoctargz.patch
10731 epatch "${FILESDIR}"/${PN}-3.0-PPC_macro.patch
10732 - epatch "${FILESDIR}"/0001-Add-R600-backend.patch
10733 + epatch "${FILESDIR}"/R600-Mesa-9.1.patch
10734
10735 # User patches
10736 epatch_user
10737 @@ -150,20 +151,28 @@ src_configure() {
10738 src_compile() {
10739 emake VERBOSE=1 KEEP_SYMBOLS=1 REQUIRES_RTTI=1
10740
10741 - emake -C docs -f Makefile.sphinx man
10742 - use doc && emake -C docs -f Makefile.sphinx html
10743 + if use doc; then
10744 + emake -C docs -f Makefile.sphinx man html
10745 + fi
10747
10748 pax-mark m Release/bin/lli
10749 if use test; then
10750 pax-mark m unittests/ExecutionEngine/JIT/Release/JITTests
10751 + pax-mark m unittests/ExecutionEngine/MCJIT/Release/MCJITTests
10752 + pax-mark m unittests/Support/Release/SupportTests
10753 fi
10754 }
10755
10756 src_install() {
10757 emake KEEP_SYMBOLS=1 DESTDIR="${D}" install
10758
10759 - doman docs/_build/man/*.1
10760 - use doc && dohtml -r docs/_build/html/
10761 + if use doc; then
10762 + doman docs/_build/man/*.1
10763 + dohtml -r docs/_build/html/
10764 + else
10765 + doman "${WORKDIR}"/${P}-manpages/*.1
10766 + fi
10767
10768 if use vim-syntax; then
10769 insinto /usr/share/vim/vimfiles/syntax
10770
10771 diff --git a/sys-devel/llvm/metadata.xml b/sys-devel/llvm/metadata.xml
10772 index e5a362b..38e16d8 100644
10773 --- a/sys-devel/llvm/metadata.xml
10774 +++ b/sys-devel/llvm/metadata.xml
10775 @@ -16,7 +16,6 @@
10776 4. LLVM does not imply things that you would expect from a high-level virtual machine. It does not require garbage collection or run-time code generation (In fact, LLVM makes a great static compiler!). Note that optional LLVM components can be used to build high-level virtual machines and other systems that need these services.</longdescription>
10777 <use>
10778 <flag name='gold'>Build the gold linker plugin</flag>
10779 - <flag name='llvm-gcc'>Build LLVM with <pkg>sys-devel/llvm-gcc</pkg></flag>
10780 <flag name='multitarget'>Build all host targets (default: host only)</flag>
10781 <flag name='udis86'>Enable support for <pkg>dev-libs/udis86</pkg> disassembler library</flag>
10782 </use>