LLVM 23.0.0git
AMDGPURegisterBankInfo.cpp
//===- AMDGPURegisterBankInfo.cpp -------------------------------*- C++ -*-==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
/// \file
/// This file implements the targeting of the RegisterBankInfo class for
/// AMDGPU.
///
/// \par
///
/// AMDGPU has unique register bank constraints that require special high level
/// strategies to deal with. There are two main true physical register banks:
/// VGPR (vector) and SGPR (scalar). Additionally, the VCC register bank is a
/// sort of pseudo-register bank needed to represent SGPRs used in a vector
/// boolean context. There is also the AGPR bank, which is a special purpose
/// physical register bank present on some subtargets.
///
/// Copying from VGPR to SGPR is generally illegal, unless the value is known to
/// be uniform. It is generally not valid to legalize operands by inserting
/// copies as on other targets. Operations which require uniform, SGPR operands
/// generally require scalarization by repeatedly executing the instruction,
/// activating each set of lanes using a unique set of input values. This is
/// referred to as a waterfall loop.
///
/// \par Booleans
///
/// Booleans (s1 values) require special consideration. A vector compare result
/// is naturally a bitmask with one bit per lane, in a 32-bit or 64-bit
/// register. These are represented with the VCC bank. During selection, we need
/// to be able to unambiguously go back from a register class to a register
/// bank. To distinguish whether an SGPR should use the SGPR or VCC register
/// bank, we need to know the use context type. An SGPR s1 value always means a
/// VCC bank value; any other SGPR value uses the SGPR bank. A scalar compare
/// sets SCC, which is a 1-bit unaddressable register. This will need to be
/// copied to a 32-bit virtual register. Taken together, this means we need to
/// adjust the type of boolean operations to be regbank legal. All SALU booleans
/// need to be widened to 32 bits, and all VALU booleans need to be s1 values.
///
/// A noteworthy exception to the s1-means-vcc rule is for legalization artifact
/// casts. G_TRUNC s1 results, and G_SEXT/G_ZEXT/G_ANYEXT sources are never vcc
/// bank. A non-boolean source (such as a truncate from a 1-bit load from
/// memory) will require a copy to the VCC bank which will require clearing the
/// high bits and inserting a compare.
///
/// \par Constant bus restriction
///
/// VALU instructions have a limitation known as the constant bus
/// restriction. Most VALU instructions can use SGPR operands, but may read at
/// most 1 SGPR or constant literal value (the limit is 2 in gfx10 for most
/// instructions). This is one unique SGPR, so the same SGPR may be used for
/// multiple operands. From a register bank perspective, any combination of
/// operands should be legal as an SGPR, but this is contextually dependent on
/// the SGPR operands all being the same register. It is therefore optimal to
/// choose the SGPR with the most uses to minimize the number of copies.
///
/// We avoid trying to solve this problem in RegBankSelect. Any VALU G_*
/// operation should have its source operands all mapped to VGPRs (except for
/// VCC), inserting copies from any SGPR operands. This is the most trivial
/// legal mapping. Anything beyond the simplest 1:1 instruction selection would
/// be too complicated to solve here. Every optimization pattern or instruction
/// selected to multiple outputs would have to enforce this rule, and there
/// would be additional complexity in tracking this rule for every G_*
/// operation. By forcing all inputs to VGPRs, it also simplifies the task of
/// picking the optimal operand combination from a post-isel optimization pass.
///
//===----------------------------------------------------------------------===//
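// Illustrative sketch, not part of this file: the constant bus restriction
// described in the file header can be modelled in a few lines of standalone
// C++. The Operand struct and the helpers countUniqueSGPRs,
// violatesConstantBus and pickSGPRToKeep are hypothetical names invented for
// this example; they are not LLVM APIs. The sketch only counts distinct SGPRs
// and picks the most-used one, which is the heuristic the header comment
// calls optimal.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical operand model: a register name plus a flag saying whether the
// register lives in the SGPR bank.
struct Operand {
  std::string Reg;
  bool IsSGPR;
};

// Count how often each SGPR is read by the operands of one instruction.
static std::map<std::string, unsigned>
countSGPRUses(const std::vector<Operand> &Ops) {
  std::map<std::string, unsigned> Uses;
  for (const Operand &Op : Ops)
    if (Op.IsSGPR)
      ++Uses[Op.Reg];
  return Uses;
}

// Number of distinct SGPRs read. Reusing the same SGPR counts once.
unsigned countUniqueSGPRs(const std::vector<Operand> &Ops) {
  return static_cast<unsigned>(countSGPRUses(Ops).size());
}

// True if the instruction reads more distinct SGPRs than Limit allows.
bool violatesConstantBus(const std::vector<Operand> &Ops, unsigned Limit) {
  return countUniqueSGPRs(Ops) > Limit;
}

// Return the SGPR with the most uses (ties go to the lexicographically
// smallest name), or an empty string if no operand is an SGPR.
std::string pickSGPRToKeep(const std::vector<Operand> &Ops) {
  std::string Best;
  unsigned BestUses = 0;
  for (const auto &Entry : countSGPRUses(Ops)) {
    if (Entry.second > BestUses) {
      Best = Entry.first;
      BestUses = Entry.second;
    }
  }
  return Best;
}
```

// For operands {s0, s1, s0}, two distinct SGPRs are read, so a limit of 1 is
// violated while a limit of 2 is not; keeping s0 leaves only s1 to be copied
// to a VGPR.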
70
72
73#include "AMDGPU.h"
75#include "AMDGPUInstrInfo.h"
76#include "AMDGPULaneMaskUtils.h"
77#include "GCNSubtarget.h"
79#include "SIRegisterInfo.h"
85#include "llvm/IR/IntrinsicsAMDGPU.h"
86
87#define GET_TARGET_REGBANK_IMPL
88#include "AMDGPUGenRegisterBank.inc"
89
90// This file will be TableGen'ed at some point.
91#include "AMDGPUGenRegisterBankInfo.def"
92
93using namespace llvm;
94using namespace MIPatternMatch;
95
96namespace {
97
98// Observer to apply a register bank to new registers created by LegalizerHelper.
99class ApplyRegBankMapping final : public GISelChangeObserver {
100private:
102 const AMDGPURegisterBankInfo &RBI;
104 const RegisterBank *NewBank;
106
107public:
108 ApplyRegBankMapping(MachineIRBuilder &B, const AMDGPURegisterBankInfo &RBI_,
109 MachineRegisterInfo &MRI_, const RegisterBank *RB)
110 : B(B), RBI(RBI_), MRI(MRI_), NewBank(RB) {
111 assert(!B.isObservingChanges());
112 B.setChangeObserver(*this);
113 }
114
115 ~ApplyRegBankMapping() override {
116 for (MachineInstr *MI : NewInsts)
117 applyBank(*MI);
118
119 B.stopObservingChanges();
120 }
121
122 /// Set any registers that don't have a set register class or bank to SALU.
123 void applyBank(MachineInstr &MI) {
124 const unsigned Opc = MI.getOpcode();
125 if (Opc == AMDGPU::G_ANYEXT || Opc == AMDGPU::G_ZEXT ||
126 Opc == AMDGPU::G_SEXT) {
127 // LegalizerHelper wants to use the basic legalization artifacts when
128 // widening etc. We don't handle selection with vcc in artifact sources,
129 // so we need to use a select instead to handle these properly.
130 Register DstReg = MI.getOperand(0).getReg();
131 Register SrcReg = MI.getOperand(1).getReg();
132 const RegisterBank *SrcBank = RBI.getRegBank(SrcReg, MRI, *RBI.TRI);
133 if (SrcBank == &AMDGPU::VCCRegBank) {
134 const LLT S32 = LLT::scalar(32);
135 assert(MRI.getType(SrcReg) == LLT::scalar(1));
136 assert(MRI.getType(DstReg) == S32);
137 assert(NewBank == &AMDGPU::VGPRRegBank);
138
139 // Replace the extension with a select, which really uses the boolean
140 // source.
141 B.setInsertPt(*MI.getParent(), MI);
142
143 auto True = B.buildConstant(S32, Opc == AMDGPU::G_SEXT ? -1 : 1);
144 auto False = B.buildConstant(S32, 0);
145 B.buildSelect(DstReg, SrcReg, True, False);
146 MRI.setRegBank(True.getReg(0), *NewBank);
147 MRI.setRegBank(False.getReg(0), *NewBank);
148 MI.eraseFromParent();
149 }
150
151 assert(!MRI.getRegClassOrRegBank(DstReg));
152 MRI.setRegBank(DstReg, *NewBank);
153 return;
154 }
155
156#ifndef NDEBUG
157 if (Opc == AMDGPU::G_TRUNC) {
158 Register DstReg = MI.getOperand(0).getReg();
159 const RegisterBank *DstBank = RBI.getRegBank(DstReg, MRI, *RBI.TRI);
160 assert(DstBank != &AMDGPU::VCCRegBank);
161 }
162#endif
163
164 for (MachineOperand &Op : MI.operands()) {
165 if (!Op.isReg())
166 continue;
167
168 // We may see physical registers if building a real MI
169 Register Reg = Op.getReg();
170 if (Reg.isPhysical() || MRI.getRegClassOrRegBank(Reg))
171 continue;
172
173 const RegisterBank *RB = NewBank;
174 if (MRI.getType(Reg) == LLT::scalar(1)) {
175 assert(NewBank == &AMDGPU::VGPRRegBank &&
176 "s1 operands should only be used for vector bools");
177 assert((MI.getOpcode() != AMDGPU::G_TRUNC &&
178 MI.getOpcode() != AMDGPU::G_ANYEXT) &&
179 "not expecting legalization artifacts here");
180 RB = &AMDGPU::VCCRegBank;
181 }
182
183 MRI.setRegBank(Reg, *RB);
184 }
185 }
186
187 void erasingInstr(MachineInstr &MI) override {}
188
189 void createdInstr(MachineInstr &MI) override {
190 // At this point, the instruction was just inserted and has no operands.
191 NewInsts.push_back(&MI);
192 }
193
194 void changingInstr(MachineInstr &MI) override {}
195 void changedInstr(MachineInstr &MI) override {
196 // FIXME: In principle we should probably add the instruction to NewInsts,
197 // but the way the LegalizerHelper uses the observer, we will always see the
198 // registers we need to set the regbank on also referenced in a new
199 // instruction.
200 }
201};
202
203} // anonymous namespace
204
206 : Subtarget(ST), TRI(Subtarget.getRegisterInfo()),
207 TII(Subtarget.getInstrInfo()) {
208
209 // HACK: Until this is fully tablegen'd.
210 static llvm::once_flag InitializeRegisterBankFlag;
211
212 static auto InitializeRegisterBankOnce = [this]() {
213 assert(&getRegBank(AMDGPU::SGPRRegBankID) == &AMDGPU::SGPRRegBank &&
214 &getRegBank(AMDGPU::VGPRRegBankID) == &AMDGPU::VGPRRegBank &&
215 &getRegBank(AMDGPU::AGPRRegBankID) == &AMDGPU::AGPRRegBank);
216 (void)this;
217 };
218
219 llvm::call_once(InitializeRegisterBankFlag, InitializeRegisterBankOnce);
220}
221
222static bool isVectorRegisterBank(const RegisterBank &Bank) {
223 unsigned BankID = Bank.getID();
224 return BankID == AMDGPU::VGPRRegBankID || BankID == AMDGPU::AGPRRegBankID;
225}
226
228 return RB != &AMDGPU::SGPRRegBank;
229}
230
232 const RegisterBank &Src,
233 TypeSize Size) const {
234 // TODO: Should there be a UniformVGPRRegBank which can use readfirstlane?
235 if (Dst.getID() == AMDGPU::SGPRRegBankID &&
236 (isVectorRegisterBank(Src) || Src.getID() == AMDGPU::VCCRegBankID)) {
237 return std::numeric_limits<unsigned>::max();
238 }
239
240 // Bool values are tricky, because the meaning is based on context. The SCC
241 // and VCC banks are for the natural scalar and vector conditions produced by
242 // a compare.
243 //
244 // Legalization doesn't know about the necessary context, so an s1 use may
245 // have been a truncate from an arbitrary value, in which case a copy (lowered
246 // as a compare with 0) needs to be inserted.
247 if (Size == 1 &&
248 (Dst.getID() == AMDGPU::SGPRRegBankID) &&
249 (isVectorRegisterBank(Src) ||
250 Src.getID() == AMDGPU::SGPRRegBankID ||
251 Src.getID() == AMDGPU::VCCRegBankID))
252 return std::numeric_limits<unsigned>::max();
253
254 // There is no direct copy between AGPRs.
255 if (Dst.getID() == AMDGPU::AGPRRegBankID &&
256 Src.getID() == AMDGPU::AGPRRegBankID)
257 return 4;
258
259 return RegisterBankInfo::copyCost(Dst, Src, Size);
260}
261
263 const ValueMapping &ValMapping,
264 const RegisterBank *CurBank) const {
265 // Check if this is a breakdown for G_LOAD to move the pointer from SGPR to
266 // VGPR.
267 // FIXME: Is there a better way to do this?
268 if (ValMapping.NumBreakDowns >= 2 || ValMapping.BreakDown[0].Length >= 64)
269 return 10; // This is expensive.
270
271 assert(ValMapping.NumBreakDowns == 2 &&
272 ValMapping.BreakDown[0].Length == 32 &&
273 ValMapping.BreakDown[0].StartIdx == 0 &&
274 ValMapping.BreakDown[1].Length == 32 &&
275 ValMapping.BreakDown[1].StartIdx == 32 &&
276 ValMapping.BreakDown[0].RegBank == ValMapping.BreakDown[1].RegBank);
277
278 // 32-bit extract of a 64-bit value is just access of a subregister, so free.
279 // TODO: Cost of 0 hits assert, though it's not clear it's what we really
280 // want.
281
282 // TODO: 32-bit insert to a 64-bit SGPR may incur a non-free copy due to SGPR
283 // alignment restrictions, but this probably isn't important.
284 return 1;
285}
286
287const RegisterBank &
289 LLT Ty) const {
290 // We promote real scalar booleans to SReg_32. Any SGPR using s1 is really a
291 // VCC-like use.
292 if (TRI->isSGPRClass(&RC)) {
293 // FIXME: This probably came from a copy from a physical register, which
294 // should be inferable from the copied to-type. We don't have many boolean
295 // physical register constraints so just assume a normal SGPR for now.
296 if (!Ty.isValid())
297 return AMDGPU::SGPRRegBank;
298
299 return Ty == LLT::scalar(1) ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
300 }
301
302 return TRI->isAGPRClass(&RC) ? AMDGPU::AGPRRegBank : AMDGPU::VGPRRegBank;
303}
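// Illustrative sketch, not part of this file: the decision made above (an
// SGPR-class register with type s1 is treated as VCC, any other SGPR-class
// register as SGPR, and AGPR classes as AGPR, with VGPR as the fallback) can
// be restated as a small standalone function. The enums Bank and RegClassKind
// and the function bankFor are hypothetical stand-ins for the real LLVM
// types, and a TypeBits value of 0 is an assumption used here to model an
// invalid LLT.

```cpp
#include <cassert>

// Hypothetical stand-ins for the AMDGPU register banks and class kinds.
enum class Bank { SGPR, VGPR, AGPR, VCC };
enum class RegClassKind { Scalar, Vector, Accumulator };

// TypeBits == 0 models an invalid (unknown) type in this sketch.
Bank bankFor(RegClassKind RC, unsigned TypeBits) {
  if (RC == RegClassKind::Scalar) {
    // Unknown type: assume a normal SGPR.
    if (TypeBits == 0)
      return Bank::SGPR;
    // A 1-bit value in a scalar register is really a lane mask (VCC-like).
    return TypeBits == 1 ? Bank::VCC : Bank::SGPR;
  }
  return RC == RegClassKind::Accumulator ? Bank::AGPR : Bank::VGPR;
}
```

// Note that the type only matters for scalar classes: a 1-bit value in a
// vector class still maps to the VGPR bank in this model.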
304
305template <unsigned NumOps>
308 const MachineInstr &MI, const MachineRegisterInfo &MRI,
309 const std::array<unsigned, NumOps> RegSrcOpIdx,
310 ArrayRef<OpRegBankEntry<NumOps>> Table) const {
311
312 InstructionMappings AltMappings;
313
314 SmallVector<const ValueMapping *, 10> Operands(MI.getNumOperands());
315
316 unsigned Sizes[NumOps];
317 for (unsigned I = 0; I < NumOps; ++I) {
318 Register Reg = MI.getOperand(RegSrcOpIdx[I]).getReg();
319 Sizes[I] = getSizeInBits(Reg, MRI, *TRI);
320 }
321
322 for (unsigned I = 0, E = MI.getNumExplicitDefs(); I != E; ++I) {
323 unsigned SizeI = getSizeInBits(MI.getOperand(I).getReg(), MRI, *TRI);
324 Operands[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SizeI);
325 }
326
327 // getInstrMapping's default mapping uses ID 1, so start at 2.
328 unsigned MappingID = 2;
329 for (const auto &Entry : Table) {
330 for (unsigned I = 0; I < NumOps; ++I) {
331 int OpIdx = RegSrcOpIdx[I];
332 Operands[OpIdx] = AMDGPU::getValueMapping(Entry.RegBanks[I], Sizes[I]);
333 }
334
335 AltMappings.push_back(&getInstructionMapping(MappingID++, Entry.Cost,
336 getOperandsMapping(Operands),
337 Operands.size()));
338 }
339
340 return AltMappings;
341}
342
345 const MachineInstr &MI, const MachineRegisterInfo &MRI) const {
347 case Intrinsic::amdgcn_readlane: {
348 static const OpRegBankEntry<3> Table[2] = {
349 // Perfectly legal.
350 { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },
351
352 // Need a readfirstlane for the index.
353 { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
354 };
355
356 const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
357 return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, Table);
358 }
359 case Intrinsic::amdgcn_writelane: {
360 static const OpRegBankEntry<4> Table[4] = {
361 // Perfectly legal.
362 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },
363
364 // Need readfirstlane of first op
365 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },
366
367 // Need readfirstlane of second op
368 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },
369
370 // Need readfirstlane of both ops
371 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 3 }
372 };
373
374 // rsrc, voffset, offset
375 const std::array<unsigned, 4> RegSrcOpIdx = { { 0, 2, 3, 4 } };
376 return addMappingFromTable<4>(MI, MRI, RegSrcOpIdx, Table);
377 }
378 default:
380 }
381}
382
385 const MachineInstr &MI, const MachineRegisterInfo &MRI) const {
386
388 case Intrinsic::amdgcn_s_buffer_load: {
389 static const OpRegBankEntry<2> Table[4] = {
390 // Perfectly legal.
391 { { AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },
392
393 // Only need 1 register in loop
394 { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 300 },
395
396 // Have to waterfall the resource.
397 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1000 },
398
399 // Have to waterfall the resource, and the offset.
400 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 1500 }
401 };
402
403 // rsrc, offset
404 const std::array<unsigned, 2> RegSrcOpIdx = { { 2, 3 } };
405 return addMappingFromTable<2>(MI, MRI, RegSrcOpIdx, Table);
406 }
407 case Intrinsic::amdgcn_ds_ordered_add:
408 case Intrinsic::amdgcn_ds_ordered_swap: {
409 // VGPR = M0, VGPR
410 static const OpRegBankEntry<3> Table[2] = {
411 // Perfectly legal.
412 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },
413
414 // Need a readfirstlane for m0
415 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
416 };
417
418 const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
419 return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, Table);
420 }
421 case Intrinsic::amdgcn_s_sendmsg:
422 case Intrinsic::amdgcn_s_sendmsghalt: {
423 // FIXME: Should have no register for immediate
424 static const OpRegBankEntry<1> Table[2] = {
425 // Perfectly legal.
426 { { AMDGPU::SGPRRegBankID }, 1 },
427
428 // Need readlane
429 { { AMDGPU::VGPRRegBankID }, 3 }
430 };
431
432 const std::array<unsigned, 1> RegSrcOpIdx = { { 2 } };
433 return addMappingFromTable<1>(MI, MRI, RegSrcOpIdx, Table);
434 }
435 default:
437 }
438}
439
440// FIXME: Returns uniform if there's no source value information. This is
441// probably wrong.
443 if (!MI.hasOneMemOperand())
444 return false;
445
446 const MachineMemOperand *MMO = *MI.memoperands_begin();
447 const unsigned AS = MMO->getAddrSpace();
448 const bool IsConst = AS == AMDGPUAS::CONSTANT_ADDRESS ||
450 const unsigned MemSize = 8 * MMO->getSize().getValue();
451
452 // Require 4-byte alignment.
453 return (MMO->getAlign() >= Align(4) ||
454 (Subtarget.hasScalarSubwordLoads() &&
455 ((MemSize == 16 && MMO->getAlign() >= Align(2)) ||
456 (MemSize == 8 && MMO->getAlign() >= Align(1))))) &&
457 // Can't do a scalar atomic load.
458 !MMO->isAtomic() &&
459 // Don't use scalar loads for volatile accesses to non-constant address
460 // spaces.
461 (IsConst || !MMO->isVolatile()) &&
462 // Memory must be known constant, or not written before this load.
463 (IsConst || MMO->isInvariant() || (MMO->getFlags() & MONoClobber)) &&
465}
466
469 const MachineInstr &MI) const {
470
471 const MachineFunction &MF = *MI.getMF();
472 const MachineRegisterInfo &MRI = MF.getRegInfo();
473
474
475 InstructionMappings AltMappings;
476 switch (MI.getOpcode()) {
477 case TargetOpcode::G_CONSTANT:
478 case TargetOpcode::G_IMPLICIT_DEF: {
479 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
480 if (Size == 1) {
481 static const OpRegBankEntry<1> Table[3] = {
482 { { AMDGPU::VGPRRegBankID }, 1 },
483 { { AMDGPU::SGPRRegBankID }, 1 },
484 { { AMDGPU::VCCRegBankID }, 1 }
485 };
486
487 return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
488 }
489
490 [[fallthrough]];
491 }
492 case TargetOpcode::G_FCONSTANT:
493 case TargetOpcode::G_FRAME_INDEX:
494 case TargetOpcode::G_GLOBAL_VALUE: {
495 static const OpRegBankEntry<1> Table[2] = {
496 { { AMDGPU::VGPRRegBankID }, 1 },
497 { { AMDGPU::SGPRRegBankID }, 1 }
498 };
499
500 return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
501 }
502 case TargetOpcode::G_AND:
503 case TargetOpcode::G_OR:
504 case TargetOpcode::G_XOR: {
505 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
506
507 if (Size == 1) {
508 // s_{and|or|xor}_b32 set scc when the result of the 32-bit op is not 0.
509 const InstructionMapping &SCCMapping = getInstructionMapping(
510 1, 1, getOperandsMapping(
511 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
512 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
513 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32)}),
514 3); // Num Operands
515 AltMappings.push_back(&SCCMapping);
516
517 const InstructionMapping &VCCMapping0 = getInstructionMapping(
518 2, 1, getOperandsMapping(
519 {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
520 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
521 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size)}),
522 3); // Num Operands
523 AltMappings.push_back(&VCCMapping0);
524 return AltMappings;
525 }
526
527 if (Size != 64)
528 break;
529
530 const InstructionMapping &SSMapping = getInstructionMapping(
531 1, 1, getOperandsMapping(
532 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
533 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
534 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
535 3); // Num Operands
536 AltMappings.push_back(&SSMapping);
537
538 const InstructionMapping &VVMapping = getInstructionMapping(
539 2, 2, getOperandsMapping(
540 {AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
541 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
542 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
543 3); // Num Operands
544 AltMappings.push_back(&VVMapping);
545 break;
546 }
547 case TargetOpcode::G_LOAD:
548 case TargetOpcode::G_ZEXTLOAD:
549 case TargetOpcode::G_SEXTLOAD: {
550 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
551 LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
552 unsigned PtrSize = PtrTy.getSizeInBits();
553 unsigned AS = PtrTy.getAddressSpace();
554
558 const InstructionMapping &SSMapping = getInstructionMapping(
559 1, 1, getOperandsMapping(
560 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
561 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize)}),
562 2); // Num Operands
563 AltMappings.push_back(&SSMapping);
564 }
565
566 const InstructionMapping &VVMapping = getInstructionMapping(
567 2, 1,
569 {AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
570 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize)}),
571 2); // Num Operands
572 AltMappings.push_back(&VVMapping);
573
574 // It may be possible to have a vgpr = load sgpr mapping here, because
575 // the mubuf instructions support this kind of load, but probably for only
576 // gfx7 and older. However, the addressing mode matching in the instruction
577 // selector should be able to do a better job of detecting and selecting
578 // these kinds of loads from the vgpr = load vgpr mapping.
579
580 return AltMappings;
581
582 }
583 case TargetOpcode::G_SELECT: {
584 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
585 const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
586 getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
587 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
588 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
589 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
590 4); // Num Operands
591 AltMappings.push_back(&SSMapping);
592
593 const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
594 getOperandsMapping({AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
595 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
596 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
597 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
598 4); // Num Operands
599 AltMappings.push_back(&VVMapping);
600
601 return AltMappings;
602 }
603 case TargetOpcode::G_UADDE:
604 case TargetOpcode::G_USUBE:
605 case TargetOpcode::G_SADDE:
606 case TargetOpcode::G_SSUBE: {
607 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
608 const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
610 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
611 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
612 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
613 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
614 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1)}),
615 5); // Num Operands
616 AltMappings.push_back(&SSMapping);
617
618 const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
619 getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
620 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
621 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
622 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
623 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1)}),
624 5); // Num Operands
625 AltMappings.push_back(&VVMapping);
626 return AltMappings;
627 }
628 case AMDGPU::G_BRCOND: {
629 assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
630
631 // TODO: Change type to 32 for scalar
633 1, 1, getOperandsMapping(
634 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1), nullptr}),
635 2); // Num Operands
636 AltMappings.push_back(&SMapping);
637
639 1, 1, getOperandsMapping(
640 {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1), nullptr }),
641 2); // Num Operands
642 AltMappings.push_back(&VMapping);
643 return AltMappings;
644 }
645 case AMDGPU::G_INTRINSIC:
646 case AMDGPU::G_INTRINSIC_CONVERGENT:
648 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
649 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS:
651 default:
652 break;
653 }
655}
656
660 LLT HalfTy,
661 Register Reg) const {
662 assert(HalfTy.getSizeInBits() == 32);
663 MachineRegisterInfo *MRI = B.getMRI();
664 Register LoLHS = MRI->createGenericVirtualRegister(HalfTy);
665 Register HiLHS = MRI->createGenericVirtualRegister(HalfTy);
666 const RegisterBank *Bank = getRegBank(Reg, *MRI, *TRI);
667 MRI->setRegBank(LoLHS, *Bank);
668 MRI->setRegBank(HiLHS, *Bank);
669
670 Regs.push_back(LoLHS);
671 Regs.push_back(HiLHS);
672
673 B.buildInstr(AMDGPU::G_UNMERGE_VALUES)
674 .addDef(LoLHS)
675 .addDef(HiLHS)
676 .addUse(Reg);
677}
678
679/// Replace the current type each register in \p Regs has with \p NewTy
681 LLT NewTy) {
682 for (Register Reg : Regs) {
683 assert(MRI.getType(Reg).getSizeInBits() == NewTy.getSizeInBits());
684 MRI.setType(Reg, NewTy);
685 }
686}
687
689 if (Ty.isVector()) {
690 assert(Ty.getElementCount().isKnownMultipleOf(2));
691 return LLT::scalarOrVector(Ty.getElementCount().divideCoefficientBy(2),
692 Ty.getElementType());
693 }
694
695 assert(Ty.getScalarSizeInBits() % 2 == 0);
696 return LLT::scalar(Ty.getScalarSizeInBits() / 2);
697}
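// Illustrative sketch, not part of this file: the halving rule above (halve
// the element count of a vector, collapsing to a scalar when one element
// remains, otherwise halve the scalar bit width) written against a toy type.
// ToyTy and halfSized are hypothetical names for this example only; NumElts
// == 0 is an assumption used here to mark a scalar.

```cpp
#include <cassert>

// Toy type: NumElts == 0 means scalar, otherwise a vector of NumElts elements
// that are each Bits wide.
struct ToyTy {
  unsigned NumElts;
  unsigned Bits;
};

ToyTy halfSized(ToyTy Ty) {
  if (Ty.NumElts != 0) {
    assert(Ty.NumElts % 2 == 0 && "vector must have an even element count");
    unsigned Half = Ty.NumElts / 2;
    // A one-element result is represented as a scalar of the element type.
    return Half == 1 ? ToyTy{0, Ty.Bits} : ToyTy{Half, Ty.Bits};
  }
  assert(Ty.Bits % 2 == 0 && "scalar must have an even bit width");
  return ToyTy{0, Ty.Bits / 2};
}
```

// Under this model a 4 x 16-bit vector halves to 2 x 16-bit, a 2 x 32-bit
// vector halves to a 32-bit scalar, and a 64-bit scalar halves to 32 bits.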
698
699// Build one or more V_READFIRSTLANE_B32 instructions to move the given vector
700// source value into a scalar register.
703 Register Src) const {
704 LLT Ty = MRI.getType(Src);
705 const RegisterBank *Bank = getRegBank(Src, MRI, *TRI);
706
707 if (Bank == &AMDGPU::SGPRRegBank)
708 return Src;
709
710 unsigned Bits = Ty.getSizeInBits();
711 assert(Bits % 32 == 0);
712
713 if (Bank != &AMDGPU::VGPRRegBank) {
714 // We need to copy from AGPR to VGPR
715 Src = B.buildCopy(Ty, Src).getReg(0);
716 MRI.setRegBank(Src, AMDGPU::VGPRRegBank);
717 }
718
719 LLT S32 = LLT::scalar(32);
720 unsigned NumParts = Bits / 32;
723
724 if (Bits == 32) {
725 SrcParts.push_back(Src);
726 } else {
727 auto Unmerge = B.buildUnmerge(S32, Src);
728 for (unsigned i = 0; i < NumParts; ++i)
729 SrcParts.push_back(Unmerge.getReg(i));
730 }
731
732 for (unsigned i = 0; i < NumParts; ++i) {
733 Register SrcPart = SrcParts[i];
734 Register DstPart = MRI.createVirtualRegister(&AMDGPU::SReg_32_XM0RegClass);
735 MRI.setType(DstPart, NumParts == 1 ? Ty : S32);
736
737 const TargetRegisterClass *Constrained =
738 constrainGenericRegister(SrcPart, AMDGPU::VGPR_32RegClass, MRI);
739 (void)Constrained;
740 assert(Constrained && "Failed to constrain readfirstlane src reg");
741
742 B.buildInstr(AMDGPU::V_READFIRSTLANE_B32, {DstPart}, {SrcPart});
743
744 DstParts.push_back(DstPart);
745 }
746
747 if (Bits == 32)
748 return DstParts[0];
749
750 Register Dst = B.buildMergeLikeInstr(Ty, DstParts).getReg(0);
751 MRI.setRegBank(Dst, AMDGPU::SGPRRegBank);
752 return Dst;
753}
754
/// Legalize instruction \p MI where operands in \p OpIndices must be SGPRs. If
/// any of the required SGPR operands are VGPRs, perform a waterfall loop to
/// execute the instruction for each unique combination of values in all lanes
/// in the wave. The block will be split such that the rest of the instructions
/// are moved to a new block.
///
/// Essentially performs this loop:
///
/// Save Execution Mask
/// For (Lane : Wavefront) {
///   Enable Lane, Disable all other lanes
///   SGPR = read SGPR value for current lane from VGPR
///   VGPRResult[Lane] = use_op SGPR
/// }
/// Restore Execution Mask
///
/// There is additional complexity in comparing values to identify the unique
/// values used.
775 SmallSet<Register, 4> &SGPROperandRegs) const {
776 // Track use registers which have already been expanded with a readfirstlane
777 // sequence. This may have multiple uses if moving a sequence.
778 DenseMap<Register, Register> WaterfalledRegMap;
779
780 MachineBasicBlock &MBB = B.getMBB();
781 MachineFunction *MF = &B.getMF();
782
783 const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass();
784 const AMDGPU::LaneMaskConstants &LMC =
786
787#ifndef NDEBUG
788 const int OrigRangeSize = std::distance(Range.begin(), Range.end());
789#endif
790
791 MachineRegisterInfo &MRI = *B.getMRI();
792 Register SaveExecReg = MRI.createVirtualRegister(WaveRC);
793 Register InitSaveExecReg = MRI.createVirtualRegister(WaveRC);
794
795 // Don't bother using generic instructions/registers for the exec mask.
796 B.buildInstr(TargetOpcode::IMPLICIT_DEF)
797 .addDef(InitSaveExecReg);
798
799 Register PhiExec = MRI.createVirtualRegister(WaveRC);
800 Register NewExec = MRI.createVirtualRegister(WaveRC);
801
802 // To insert the loop we need to split the block. Move everything before this
803 // point to a new block, and insert a new empty block before this instruction.
806 MachineBasicBlock *RemainderBB = MF->CreateMachineBasicBlock();
807 MachineBasicBlock *RestoreExecBB = MF->CreateMachineBasicBlock();
809 ++MBBI;
810 MF->insert(MBBI, LoopBB);
811 MF->insert(MBBI, BodyBB);
812 MF->insert(MBBI, RestoreExecBB);
813 MF->insert(MBBI, RemainderBB);
814
815 LoopBB->addSuccessor(BodyBB);
816 BodyBB->addSuccessor(RestoreExecBB);
817 BodyBB->addSuccessor(LoopBB);
818
819 // Move the rest of the block into a new block.
821 RemainderBB->splice(RemainderBB->begin(), &MBB, Range.end(), MBB.end());
822
823 MBB.addSuccessor(LoopBB);
824 RestoreExecBB->addSuccessor(RemainderBB);
825
826 B.setInsertPt(*LoopBB, LoopBB->end());
827
828 B.buildInstr(TargetOpcode::PHI)
829 .addDef(PhiExec)
830 .addReg(InitSaveExecReg)
831 .addMBB(&MBB)
832 .addReg(NewExec)
833 .addMBB(BodyBB);
834
835 const DebugLoc &DL = B.getDL();
836
837 MachineInstr &FirstInst = *Range.begin();
838
839 // Move the instruction into the loop body. Note we moved everything after
840 // Range.end() already into a new block, so Range.end() is no longer valid.
841 BodyBB->splice(BodyBB->end(), &MBB, Range.begin(), MBB.end());
842
843 // Figure out the iterator range after splicing the instructions.
844 MachineBasicBlock::iterator NewBegin = FirstInst.getIterator();
845 auto NewEnd = BodyBB->end();
846
847 B.setMBB(*LoopBB);
848
849 LLT S1 = LLT::scalar(1);
850 Register CondReg;
851
852 assert(std::distance(NewBegin, NewEnd) == OrigRangeSize);
853
854 for (MachineInstr &MI : make_range(NewBegin, NewEnd)) {
855 for (MachineOperand &Op : MI.all_uses()) {
856 Register OldReg = Op.getReg();
857 if (!SGPROperandRegs.count(OldReg))
858 continue;
859
860 // See if we already processed this register in another instruction in the
861 // sequence.
862 auto OldVal = WaterfalledRegMap.find(OldReg);
863 if (OldVal != WaterfalledRegMap.end()) {
864 Op.setReg(OldVal->second);
865 continue;
866 }
867
868 Register OpReg = Op.getReg();
869 LLT OpTy = MRI.getType(OpReg);
870
871 const RegisterBank *OpBank = getRegBank(OpReg, MRI, *TRI);
872 if (OpBank != &AMDGPU::VGPRRegBank) {
873 // Insert copy from AGPR to VGPR before the loop.
874 B.setMBB(MBB);
875 OpReg = B.buildCopy(OpTy, OpReg).getReg(0);
876 MRI.setRegBank(OpReg, AMDGPU::VGPRRegBank);
877 B.setMBB(*LoopBB);
878 }
879
880 Register CurrentLaneReg = buildReadFirstLane(B, MRI, OpReg);
881
882 // Build the comparison(s).
883 unsigned OpSize = OpTy.getSizeInBits();
884 bool Is64 = OpSize % 64 == 0;
885 unsigned PartSize = Is64 ? 64 : 32;
886 LLT PartTy = LLT::scalar(PartSize);
887 unsigned NumParts = OpSize / PartSize;
889 SmallVector<Register, 8> CurrentLaneParts;
890
891 if (NumParts == 1) {
892 OpParts.push_back(OpReg);
893 CurrentLaneParts.push_back(CurrentLaneReg);
894 } else {
895 auto UnmergeOp = B.buildUnmerge(PartTy, OpReg);
896 auto UnmergeCurrentLane = B.buildUnmerge(PartTy, CurrentLaneReg);
897 for (unsigned i = 0; i < NumParts; ++i) {
898 OpParts.push_back(UnmergeOp.getReg(i));
899 CurrentLaneParts.push_back(UnmergeCurrentLane.getReg(i));
900 MRI.setRegBank(OpParts[i], AMDGPU::VGPRRegBank);
901 MRI.setRegBank(CurrentLaneParts[i], AMDGPU::SGPRRegBank);
902 }
903 }
904
905 for (unsigned i = 0; i < NumParts; ++i) {
906 auto CmpReg = B.buildICmp(CmpInst::ICMP_EQ, S1, CurrentLaneParts[i],
907 OpParts[i]).getReg(0);
908 MRI.setRegBank(CmpReg, AMDGPU::VCCRegBank);
909
910 if (!CondReg) {
911 CondReg = CmpReg;
912 } else {
913 CondReg = B.buildAnd(S1, CondReg, CmpReg).getReg(0);
914 MRI.setRegBank(CondReg, AMDGPU::VCCRegBank);
915 }
916 }
917
918 Op.setReg(CurrentLaneReg);
919
920 // Make sure we don't re-process this register again.
921 WaterfalledRegMap.insert(std::pair(OldReg, Op.getReg()));
922 }
923 }
924
925 // The ballot becomes a no-op during instruction selection.
926 CondReg = B.buildIntrinsic(Intrinsic::amdgcn_ballot,
927 {LLT::scalar(Subtarget.isWave32() ? 32 : 64)})
928 .addReg(CondReg)
929 .getReg(0);
930 MRI.setRegClass(CondReg, WaveRC);
931
932 // Update EXEC, save the original EXEC value to NewExec.
933 B.buildInstr(LMC.AndSaveExecOpc)
934 .addDef(NewExec)
935 .addReg(CondReg, RegState::Kill);
936
937 MRI.setSimpleHint(NewExec, CondReg);
938
939 B.setInsertPt(*BodyBB, BodyBB->end());
940
941 // Update EXEC, switch all done bits to 0 and all todo bits to 1.
942 B.buildInstr(LMC.XorTermOpc)
943 .addDef(LMC.ExecReg)
944 .addReg(LMC.ExecReg)
945 .addReg(NewExec);
946
947 // XXX - s_xor_b64 sets scc to 1 if the result is nonzero, so can we use
948 // s_cbranch_scc0?
949
950 // Loop back to V_READFIRSTLANE_B32 if there are still variants to cover.
951 B.buildInstr(AMDGPU::SI_WATERFALL_LOOP).addMBB(LoopBB);
952
953 // Save the EXEC mask before the loop.
954 BuildMI(MBB, MBB.end(), DL, TII->get(LMC.MovOpc), SaveExecReg)
955 .addReg(LMC.ExecReg);
956
957 // Restore the EXEC mask after the loop.
958 B.setMBB(*RestoreExecBB);
959 B.buildInstr(LMC.MovTermOpc).addDef(LMC.ExecReg).addReg(SaveExecReg);
960
961 // Set the insert point after the original instruction, so any new
962 // instructions will be in the remainder.
963 B.setInsertPt(*RemainderBB, RemainderBB->begin());
964
965 return true;
966}
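
// Illustrative sketch (not part of this file): a host-side model of the control
// flow that the waterfall loop above emits. It assumes at most 64 lanes and a
// single 32-bit operand; simulateWaterfall, LaneValues and Exec are
// hypothetical names. Each trip reads the value of the first active lane,
// compares it against every active lane, runs the body for the matching
// lanes, and then clears them from the mask.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the uniform value used by each trip of the modeled loop.
// LaneValues[i] is the divergent operand held by lane i; Exec is the
// active-lane mask on entry. Hypothetical helper, not an LLVM API.
std::vector<uint32_t> simulateWaterfall(const std::vector<uint32_t> &LaneValues,
                                        uint64_t Exec) {
  assert(LaneValues.size() <= 64 && "model supports at most 64 lanes");
  // Ignore mask bits for lanes that do not exist in this model.
  if (LaneValues.size() < 64)
    Exec &= (uint64_t(1) << LaneValues.size()) - 1;

  std::vector<uint32_t> Trips;
  while (Exec != 0) {
    // Models readfirstlane: take the value from the lowest active lane.
    unsigned First = 0;
    while (((Exec >> First) & 1) == 0)
      ++First;
    const uint32_t Current = LaneValues[First];

    // Models the per-lane compare: one mask bit per matching active lane.
    uint64_t Match = 0;
    for (unsigned Lane = 0; Lane < LaneValues.size(); ++Lane)
      if (((Exec >> Lane) & 1) && LaneValues[Lane] == Current)
        Match |= uint64_t(1) << Lane;

    // The body would run here with only the Match lanes enabled.
    Trips.push_back(Current);

    // Models the xor on EXEC: lanes that just ran are done.
    Exec ^= Match;
  }
  return Trips;
}
```

// With six lanes holding {7, 7, 3, 7, 3, 9} and all lanes active, the model
// takes three trips, one per distinct value, in first-active-lane order.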
967
968// Return any unique registers used by \p MI at \p OpIndices that need to be
969// handled in a waterfall loop. Returns these registers in \p
970// SGPROperandRegs. Returns true if there are any operands to handle and a
971// waterfall loop is necessary.
972bool AMDGPURegisterBankInfo::collectWaterfallOperands(
973 SmallSet<Register, 4> &SGPROperandRegs, MachineInstr &MI,
974 MachineRegisterInfo &MRI, ArrayRef<unsigned> OpIndices) const {
975 for (unsigned Op : OpIndices) {
976 assert(MI.getOperand(Op).isUse());
977 Register Reg = MI.getOperand(Op).getReg();
978 const RegisterBank *OpBank = getRegBank(Reg, MRI, *TRI);
979 if (OpBank->getID() != AMDGPU::SGPRRegBankID)
980 SGPROperandRegs.insert(Reg);
981 }
982
983 // No operands need to be replaced, so no need to loop.
984 return !SGPROperandRegs.empty();
985}
986
989 // Use a set to avoid extra readfirstlanes in the case where multiple operands
990 // are the same register.
991 SmallSet<Register, 4> SGPROperandRegs;
992
993 if (!collectWaterfallOperands(SGPROperandRegs, MI, *B.getMRI(), OpIndices))
994 return false;
995
996 MachineBasicBlock::iterator I = MI.getIterator();
997 return executeInWaterfallLoop(B, make_range(I, std::next(I)),
998 SGPROperandRegs);
999}
1000
1001// Legalize an operand that must be an SGPR by inserting a readfirstlane.
1003 MachineIRBuilder &B, MachineInstr &MI, unsigned OpIdx) const {
1004 Register Reg = MI.getOperand(OpIdx).getReg();
1005 MachineRegisterInfo &MRI = *B.getMRI();
1006 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
1007 if (Bank == &AMDGPU::SGPRRegBank)
1008 return;
1009
1010 Reg = buildReadFirstLane(B, MRI, Reg);
1011 MI.getOperand(OpIdx).setReg(Reg);
1012}
1013
1014/// Split \p Ty into 2 pieces. The first will have \p FirstSize bits, and the
1015/// rest will be in the remainder.
1016static std::pair<LLT, LLT> splitUnequalType(LLT Ty, unsigned FirstSize) {
1017 unsigned TotalSize = Ty.getSizeInBits();
1018 if (!Ty.isVector())
1019 return {LLT::scalar(FirstSize), LLT::scalar(TotalSize - FirstSize)};
1020
1021 LLT EltTy = Ty.getElementType();
1022 unsigned EltSize = EltTy.getSizeInBits();
1023 assert(FirstSize % EltSize == 0);
1024
1025 unsigned FirstPartNumElts = FirstSize / EltSize;
1026 unsigned RemainderElts = (TotalSize - FirstSize) / EltSize;
1027
1028 return {LLT::scalarOrVector(ElementCount::getFixed(FirstPartNumElts), EltTy),
1029 LLT::scalarOrVector(ElementCount::getFixed(RemainderElts), EltTy)};
1030}
1031
1033 if (!Ty.isVector())
1034 return LLT::scalar(128);
1035
1036 LLT EltTy = Ty.getElementType();
1037 assert(128 % EltTy.getSizeInBits() == 0);
1038 return LLT::fixed_vector(128 / EltTy.getSizeInBits(), EltTy);
1039}
1040
1044 MachineInstr &MI) const {
1045 MachineRegisterInfo &MRI = *B.getMRI();
1046 Register DstReg = MI.getOperand(0).getReg();
1047 const LLT LoadTy = MRI.getType(DstReg);
1048 unsigned LoadSize = LoadTy.getSizeInBits();
1049 MachineMemOperand *MMO = *MI.memoperands_begin();
1050 const unsigned MaxNonSmrdLoadSize = 128;
1051
1052 const RegisterBank *DstBank =
1053 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1054 if (DstBank == &AMDGPU::SGPRRegBank) {
1055 // There are some special cases that we need to look at for 32-bit and
1056 // 96-bit SGPR loads; otherwise we have nothing to do.
1057 if (LoadSize != 32 && (LoadSize != 96 || Subtarget.hasScalarDwordx3Loads()))
1058 return false;
1059
1060 const unsigned MemSize = 8 * MMO->getSize().getValue();
1061 // Scalar loads of size 8 or 16 bit with proper alignment may be widened to
1062 // 32 bit. Check to see if we need to widen the memory access, 8 or 16 bit
1063 // scalar loads should have a load size of 32 but memory access size of less
1064 // than 32.
1065 if (LoadSize == 32 &&
1066 (MemSize == 32 || LoadTy.isVector() || !isScalarLoadLegal(MI)))
1067 return false;
1068
1069 if (LoadSize == 32 &&
1070 ((MemSize == 8 && MMO->getAlign() >= Align(1)) ||
1071 (MemSize == 16 && MMO->getAlign() >= Align(2))) &&
1073 Subtarget.getGeneration() >= AMDGPUSubtarget::GFX12)
1074 return false;
1075
1076 Register PtrReg = MI.getOperand(1).getReg();
1077
1078 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
1079
1080 if (LoadSize == 32) {
1081 // This is an extending load from a sub-dword size. Widen the memory
1082 // access size to 4 bytes and clear the extra high bits appropriately.
1083 const LLT S32 = LLT::scalar(32);
1084 if (MI.getOpcode() == AMDGPU::G_SEXTLOAD) {
1085 // Must extend the sign bit into higher bits for a G_SEXTLOAD
1086 auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1087 B.buildSExtInReg(MI.getOperand(0), WideLoad, MemSize);
1088 } else if (MI.getOpcode() == AMDGPU::G_ZEXTLOAD) {
1089 // Must extend zero into higher bits with an AND for a G_ZEXTLOAD
1090 auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1091 B.buildZExtInReg(MI.getOperand(0), WideLoad, MemSize);
1092 } else
1093 // We do not need to touch the higher bits for regular loads.
1094 B.buildLoadFromOffset(MI.getOperand(0), PtrReg, *MMO, 0);
1095 } else {
1096 // 96-bit loads are only available for vector loads. We need to split this
1097 // into a 64-bit part, and 32 (unless we can widen to a 128-bit load).
1098 if (MMO->getAlign() < Align(16)) {
1099 LegalizerHelper Helper(B.getMF(), ApplyBank, B);
1100 LLT Part64, Part32;
1101 std::tie(Part64, Part32) = splitUnequalType(LoadTy, 64);
1102 if (Helper.reduceLoadStoreWidth(cast<GAnyLoad>(MI), 0, Part64) !=
1103 LegalizerHelper::Legalized)
1104 return false;
1105 return true;
1106 }
1107 LLT WiderTy = widen96To128(LoadTy);
1108 auto WideLoad = B.buildLoadFromOffset(WiderTy, PtrReg, *MMO, 0);
1109 if (WiderTy.isScalar()) {
1110 B.buildTrunc(MI.getOperand(0), WideLoad);
1111 } else {
1112 B.buildDeleteTrailingVectorElements(MI.getOperand(0).getReg(),
1113 WideLoad);
1114 }
1115 }
1116
1117 MI.eraseFromParent();
1118 return true;
1119 }
1120
1121 // 128-bit loads are supported for all instruction types.
1122 if (LoadSize <= MaxNonSmrdLoadSize)
1123 return false;
1124
1125 SmallVector<Register, 1> SrcRegs(OpdMapper.getVRegs(1));
1126
1127 if (SrcRegs.empty())
1128 SrcRegs.push_back(MI.getOperand(1).getReg());
1129
1130 // RegBankSelect only emits scalar types, so we need to reset the pointer
1131 // operand to a pointer type.
1132 Register BasePtrReg = SrcRegs[0];
1133 LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
1134 MRI.setType(BasePtrReg, PtrTy);
1135
1136 // The following loads were not split enough during legalization because
1137 // it was not clear whether they are smem or vmem loads.
1140 assert(LoadSize % MaxNonSmrdLoadSize == 0);
1141 unsigned NumSplitParts = LoadTy.getSizeInBits() / MaxNonSmrdLoadSize;
1142 const LLT LoadSplitTy = LoadTy.divide(NumSplitParts);
1143 ApplyRegBankMapping O(B, *this, MRI, &AMDGPU::VGPRRegBank);
1144 LegalizerHelper Helper(B.getMF(), O, B);
1145 if (LoadTy.isVector()) {
1146 if (Helper.fewerElementsVector(MI, 0, LoadSplitTy) !=
1147 LegalizerHelper::Legalized)
1148 return false;
1149 } else {
1150 if (Helper.narrowScalar(MI, 0, LoadSplitTy) != LegalizerHelper::Legalized)
1151 return false;
1152 }
1153 }
1154
1155 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
1156 return true;
1157}
1158
1162 MachineInstr &MI) const {
1163 MachineRegisterInfo &MRI = *B.getMRI();
1164 const MachineFunction &MF = B.getMF();
1165 const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
1166 const auto &TFI = *ST.getFrameLowering();
1167
1168 // Guard in case the stack growth direction ever changes with scratch
1169 // instructions.
1170 assert(TFI.getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp &&
1171 "Stack grows upwards for AMDGPU");
1172
1173 Register Dst = MI.getOperand(0).getReg();
1174 Register AllocSize = MI.getOperand(1).getReg();
1175 Align Alignment = assumeAligned(MI.getOperand(2).getImm());
1176
1177 // When using flat-scratch, the stack offset is unscaled.
1178 const bool HasFlatScratch = ST.hasFlatScratchEnabled();
1179 const unsigned WavefrontSizeLog2 = ST.getWavefrontSizeLog2();
1180
1181 const RegisterBank *SizeBank = getRegBank(AllocSize, MRI, *TRI);
1182
1183 if (SizeBank != &AMDGPU::SGPRRegBank) {
1184 auto WaveReduction =
1185 B.buildIntrinsic(Intrinsic::amdgcn_wave_reduce_umax, {LLT::scalar(32)})
1186 .addUse(AllocSize)
1187 .addImm(0);
1188 AllocSize = WaveReduction.getReg(0);
1189 }
1190
1191 LLT PtrTy = MRI.getType(Dst);
1192 LLT IntPtrTy = LLT::scalar(PtrTy.getSizeInBits());
1193
1195 Register SPReg = Info->getStackPtrOffsetReg();
1196 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
1197
1198 Register ScaledSize = AllocSize;
1199 if (!HasFlatScratch) {
1200 auto WaveSize = B.buildConstant(LLT::scalar(32), WavefrontSizeLog2);
1201 ScaledSize = B.buildShl(IntPtrTy, AllocSize, WaveSize).getReg(0);
1202 }
1203
1204 auto OldSP = B.buildCopy(PtrTy, SPReg);
1205 if (Alignment > TFI.getStackAlign()) {
1206 const uint64_t ScaledAlignment =
1207 HasFlatScratch ? Alignment.value()
1208 : (Alignment.value() << WavefrontSizeLog2);
1209 const uint64_t StackAlignMask = ScaledAlignment - 1;
1210 auto Tmp1 = B.buildPtrAdd(PtrTy, OldSP,
1211 B.buildConstant(LLT::scalar(32), StackAlignMask));
1212 B.buildMaskLowPtrBits(Dst, Tmp1,
1213 (HasFlatScratch
1214 ? Log2(Alignment)
1215 : Log2(Alignment) + WavefrontSizeLog2));
1216 } else {
1217 B.buildCopy(Dst, OldSP);
1218 }
1219 auto PtrAdd = B.buildPtrAdd(PtrTy, Dst, ScaledSize);
1220 B.buildCopy(SPReg, PtrAdd);
1221 MI.eraseFromParent();
1222 return true;
1223}
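
// Illustrative sketch (not part of this file): the pointer arithmetic of the
// dynamic stack allocation above, on plain integers. It assumes a
// power-of-two alignment; bumpStack, AllocaResult and every parameter name
// are hypothetical. Without flat scratch, both the size and the alignment are
// scaled by the wavefront size, since the stack pointer covers the whole wave.

```cpp
#include <cassert>
#include <cstdint>

struct AllocaResult {
  uint64_t Ptr;   // address handed back for the allocation
  uint64_t NewSP; // stack pointer after the allocation
};

// Hypothetical helper modeling the upward-growing stack bump.
AllocaResult bumpStack(uint64_t SP, uint64_t Size, uint64_t Alignment,
                       uint64_t StackAlign, unsigned WaveSizeLog2,
                       bool HasFlatScratch) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "alignment must be a power of two");

  // Per-lane sizes are scaled to per-wave sizes unless flat scratch is used.
  const uint64_t ScaledSize = HasFlatScratch ? Size : (Size << WaveSizeLog2);

  uint64_t Ptr = SP;
  if (Alignment > StackAlign) {
    const uint64_t ScaledAlign =
        HasFlatScratch ? Alignment : (Alignment << WaveSizeLog2);
    // Round up: add (align - 1), then clear the low bits.
    Ptr = (SP + ScaledAlign - 1) & ~(ScaledAlign - 1);
  }
  return {Ptr, Ptr + ScaledSize};
}
```

// For SP = 0x1040, an 8-byte allocation aligned to 32 on a wave64 target
// without flat scratch, the pointer rounds up to 0x1800 and SP moves to 0x1A00.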
1224
1228 int RsrcIdx) const {
1229 const int NumDefs = MI.getNumExplicitDefs();
1230
1231 // The reported argument index is relative to the IR intrinsic call arguments,
1232 // so we need to shift by the number of defs and the intrinsic ID.
1233 RsrcIdx += NumDefs + 1;
1234
1235 // Insert copies to VGPR arguments.
1236 applyDefaultMapping(OpdMapper);
1237
1238 // Fixup any SGPR arguments.
1239 SmallVector<unsigned, 4> SGPRIndexes;
1240 for (int I = NumDefs, NumOps = MI.getNumOperands(); I != NumOps; ++I) {
1241 if (!MI.getOperand(I).isReg())
1242 continue;
1243
1244 // If this intrinsic has a sampler, it immediately follows rsrc.
1245 if (I == RsrcIdx || I == RsrcIdx + 1)
1246 SGPRIndexes.push_back(I);
1247 }
1248
1249 executeInWaterfallLoop(B, MI, SGPRIndexes);
1250 return true;
1251}
1252
1253// Analyze a combined offset from an llvm.amdgcn.s.buffer intrinsic and store
1254// the three offsets in \p VOffsetReg, \p SOffsetReg and \p InstOffsetVal.
1256 MachineIRBuilder &B, Register CombinedOffset, Register &VOffsetReg,
1257 Register &SOffsetReg, int64_t &InstOffsetVal, Align Alignment) const {
1258 const LLT S32 = LLT::scalar(32);
1259 MachineRegisterInfo *MRI = B.getMRI();
1260
1261 if (std::optional<int64_t> Imm =
1262 getIConstantVRegSExtVal(CombinedOffset, *MRI)) {
1263 uint32_t SOffset, ImmOffset;
1264 if (TII->splitMUBUFOffset(*Imm, SOffset, ImmOffset, Alignment)) {
1265 VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1266 SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1267 InstOffsetVal = ImmOffset;
1268
1269 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1270 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1271 return SOffset + ImmOffset;
1272 }
1273 }
1274
1275 const bool CheckNUW = Subtarget.hasGFX1250Insts();
1276 Register Base;
1277 unsigned Offset;
1278
1279 std::tie(Base, Offset) =
1280 AMDGPU::getBaseWithConstantOffset(*MRI, CombinedOffset,
1281 /*KnownBits=*/nullptr,
1282 /*CheckNUW=*/CheckNUW);
1283
1284 uint32_t SOffset, ImmOffset;
1285 if (static_cast<int32_t>(Offset) > 0 &&
1286 TII->splitMUBUFOffset(Offset, SOffset, ImmOffset, Alignment)) {
1287 if (getRegBank(Base, *MRI, *TRI) == &AMDGPU::VGPRRegBank) {
1288 VOffsetReg = Base;
1289 SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1290 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1291 InstOffsetVal = ImmOffset;
1292 return 0; // XXX - Why is this 0?
1293 }
1294
1295 // If we have SGPR base, we can use it for soffset.
1296 if (SOffset == 0) {
1297 VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1298 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1299 SOffsetReg = Base;
1300 InstOffsetVal = ImmOffset;
1301 return 0; // XXX - Why is this 0?
1302 }
1303 }
1304
1305 // Handle the variable sgpr + vgpr case.
1306 MachineInstr *Add = getOpcodeDef(AMDGPU::G_ADD, CombinedOffset, *MRI);
1307 if (Add && static_cast<int32_t>(Offset) >= 0 &&
1308 (!CheckNUW || Add->getFlag(MachineInstr::NoUWrap))) {
1309 Register Src0 = getSrcRegIgnoringCopies(Add->getOperand(1).getReg(), *MRI);
1310 Register Src1 = getSrcRegIgnoringCopies(Add->getOperand(2).getReg(), *MRI);
1311
1312 const RegisterBank *Src0Bank = getRegBank(Src0, *MRI, *TRI);
1313 const RegisterBank *Src1Bank = getRegBank(Src1, *MRI, *TRI);
1314
1315 if (Src0Bank == &AMDGPU::VGPRRegBank && Src1Bank == &AMDGPU::SGPRRegBank) {
1316 VOffsetReg = Src0;
1317 SOffsetReg = Src1;
1318 return 0;
1319 }
1320
1321 if (Src0Bank == &AMDGPU::SGPRRegBank && Src1Bank == &AMDGPU::VGPRRegBank) {
1322 VOffsetReg = Src1;
1323 SOffsetReg = Src0;
1324 return 0;
1325 }
1326 }
1327
1328 // Ensure we have a VGPR for the combined offset. This could be an issue if we
1329 // have an SGPR offset and a VGPR resource.
1330 if (getRegBank(CombinedOffset, *MRI, *TRI) == &AMDGPU::VGPRRegBank) {
1331 VOffsetReg = CombinedOffset;
1332 } else {
1333 VOffsetReg = B.buildCopy(S32, CombinedOffset).getReg(0);
1334 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1335 }
1336
1337 SOffsetReg = B.buildConstant(S32, 0).getReg(0);
1338 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1339 return 0;
1340}
1341
1343 switch (Opc) {
1344 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
1345 return AMDGPU::G_AMDGPU_BUFFER_LOAD;
1346 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
1347 return AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE;
1348 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
1349 return AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE;
1350 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
1351 return AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT;
1352 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT:
1353 return AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT;
1354 default:
1355 break;
1356 }
1357 llvm_unreachable("Unexpected s_buffer_load opcode");
1358}
1359
1361 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
1362 MachineInstr &MI = OpdMapper.getMI();
1363 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1364
1365 const LLT S32 = LLT::scalar(32);
1366 Register Dst = MI.getOperand(0).getReg();
1367 LLT Ty = MRI.getType(Dst);
1368
1369 const RegisterBank *RSrcBank =
1370 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
1371 const RegisterBank *OffsetBank =
1372 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
1373 if (RSrcBank == &AMDGPU::SGPRRegBank &&
1374 OffsetBank == &AMDGPU::SGPRRegBank)
1375 return true; // Legal mapping
1376
1377 // FIXME: 96-bit case was widened during legalize. We need to narrow it back
1378 // here but don't have an MMO.
1379
1380 unsigned LoadSize = Ty.getSizeInBits();
1381 int NumLoads = 1;
1382 if (LoadSize == 256 || LoadSize == 512) {
1383 NumLoads = LoadSize / 128;
1384 Ty = Ty.divide(NumLoads);
1385 }
1386
1387 // Use the alignment to ensure that the required offsets will fit into the
1388 // immediate offsets.
1389 const Align Alignment = NumLoads > 1 ? Align(16 * NumLoads) : Align(1);
1390
1391 MachineFunction &MF = B.getMF();
1392
1393 Register SOffset;
1394 Register VOffset;
1395 int64_t ImmOffset = 0;
1396
1397 unsigned MMOOffset = setBufferOffsets(B, MI.getOperand(2).getReg(), VOffset,
1398 SOffset, ImmOffset, Alignment);
1399
1400 // TODO: 96-bit loads were widened to 128-bit results. Shrink the result if we
1401 // can, but we need to track an MMO for that.
1402 const unsigned MemSize = (Ty.getSizeInBits() + 7) / 8;
1403 const Align MemAlign(4); // FIXME: ABI type alignment?
1408 MemSize, MemAlign);
1409 if (MMOOffset != 0)
1410 BaseMMO = MF.getMachineMemOperand(BaseMMO, MMOOffset, MemSize);
1411
1412 // If only the offset is divergent, emit a MUBUF buffer load instead. We can
1413 // assume that the buffer is unswizzled.
1414
1415 Register RSrc = MI.getOperand(1).getReg();
1416 Register VIndex = B.buildConstant(S32, 0).getReg(0);
1417 B.getMRI()->setRegBank(VIndex, AMDGPU::VGPRRegBank);
1418
1419 SmallVector<Register, 4> LoadParts(NumLoads);
1420
1421 MachineBasicBlock::iterator MII = MI.getIterator();
1422 MachineInstrSpan Span(MII, &B.getMBB());
1423
1424 for (int i = 0; i < NumLoads; ++i) {
1425 if (NumLoads == 1) {
1426 LoadParts[i] = Dst;
1427 } else {
1428 LoadParts[i] = MRI.createGenericVirtualRegister(Ty);
1429 MRI.setRegBank(LoadParts[i], AMDGPU::VGPRRegBank);
1430 }
1431
1432 if (i != 0)
1433 BaseMMO = MF.getMachineMemOperand(BaseMMO, 16, MemSize);
1434
1435 B.buildInstr(getSBufferLoadCorrespondingBufferLoadOpcode(MI.getOpcode()))
1436 .addDef(LoadParts[i]) // vdata
1437 .addUse(RSrc) // rsrc
1438 .addUse(VIndex) // vindex
1439 .addUse(VOffset) // voffset
1440 .addUse(SOffset) // soffset
1441 .addImm(ImmOffset + 16 * i) // offset(imm)
1442 .addImm(0) // cachepolicy, swizzled buffer(imm)
1443 .addImm(0) // idxen(imm)
1444 .addMemOperand(BaseMMO);
1445 }
1446
1447 // TODO: If only the resource is a VGPR, it may be better to execute the
1448 // scalar load in the waterfall loop if the resource is expected to frequently
1449 // be dynamically uniform.
1450 if (RSrcBank != &AMDGPU::SGPRRegBank) {
1451 // Remove the original instruction to avoid potentially confusing the
1452 // waterfall loop logic.
1453 B.setInstr(*Span.begin());
1454 MI.eraseFromParent();
1455
1456 SmallSet<Register, 4> OpsToWaterfall;
1457
1458 OpsToWaterfall.insert(RSrc);
1459 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
1460 OpsToWaterfall);
1461 }
1462
1463 if (NumLoads != 1) {
1464 if (Ty.isVector())
1465 B.buildConcatVectors(Dst, LoadParts);
1466 else
1467 B.buildMergeLikeInstr(Dst, LoadParts);
1468 }
1469
1470 // We removed the instruction earlier with a waterfall loop.
1471 if (RSrcBank == &AMDGPU::SGPRRegBank)
1472 MI.eraseFromParent();
1473
1474 return true;
1475}
1476
1478 const OperandsMapper &OpdMapper,
1479 bool Signed) const {
1480 MachineInstr &MI = OpdMapper.getMI();
1481 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1482
1483 // Insert basic copies
1484 applyDefaultMapping(OpdMapper);
1485
1486 Register DstReg = MI.getOperand(0).getReg();
1487 LLT Ty = MRI.getType(DstReg);
1488
1489 const LLT S32 = LLT::scalar(32);
1490
1491 unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
1492 Register SrcReg = MI.getOperand(FirstOpnd).getReg();
1493 Register OffsetReg = MI.getOperand(FirstOpnd + 1).getReg();
1494 Register WidthReg = MI.getOperand(FirstOpnd + 2).getReg();
1495
1496 const RegisterBank *DstBank =
1497 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1498 if (DstBank == &AMDGPU::VGPRRegBank) {
1499 if (Ty == S32)
1500 return true;
1501
1502 // There are no 64-bit VGPR bitfield extract instructions, so the operation
1503 // is expanded to a sequence of instructions that implement the operation.
1504 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
1505
1506 const LLT S64 = LLT::scalar(64);
1507 // Shift the source operand so that extracted bits start at bit 0.
1508 auto ShiftOffset = Signed ? B.buildAShr(S64, SrcReg, OffsetReg)
1509 : B.buildLShr(S64, SrcReg, OffsetReg);
1510 auto UnmergeSOffset = B.buildUnmerge({S32, S32}, ShiftOffset);
1511
1512 // A 64-bit bitfield extract uses the 32-bit bitfield extract instructions
1513 // if the width is a constant.
1514 if (auto ConstWidth = getIConstantVRegValWithLookThrough(WidthReg, MRI)) {
1515 // Use the 32-bit bitfield extract instruction if the width is a constant.
1516 // Depending on the width size, use either the low or high 32-bits.
1517 auto Zero = B.buildConstant(S32, 0);
1518 auto WidthImm = ConstWidth->Value.getZExtValue();
1519 if (WidthImm <= 32) {
1520 // Use bitfield extract on the lower 32-bit source, and then sign-extend
1521 // or clear the upper 32-bits.
1522 auto Extract =
1523 Signed ? B.buildSbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg)
1524 : B.buildUbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg);
1525 auto Extend =
1526 Signed ? B.buildAShr(S32, Extract, B.buildConstant(S32, 31)) : Zero;
1527 B.buildMergeLikeInstr(DstReg, {Extract, Extend});
1528 } else {
1529 // Use bitfield extract on upper 32-bit source, and combine with lower
1530 // 32-bit source.
1531 auto UpperWidth = B.buildConstant(S32, WidthImm - 32);
1532 auto Extract =
1533 Signed
1534 ? B.buildSbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth)
1535 : B.buildUbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth);
1536 B.buildMergeLikeInstr(DstReg, {UnmergeSOffset.getReg(0), Extract});
1537 }
1538 MI.eraseFromParent();
1539 return true;
1540 }
1541
1542 // Expand to Src >> Offset << (64 - Width) >> (64 - Width) using 64-bit
1543 // operations.
1544 auto ExtShift = B.buildSub(S32, B.buildConstant(S32, 64), WidthReg);
1545 auto SignBit = B.buildShl(S64, ShiftOffset, ExtShift);
1546 if (Signed)
1547 B.buildAShr(S64, SignBit, ExtShift);
1548 else
1549 B.buildLShr(S64, SignBit, ExtShift);
1550 MI.eraseFromParent();
1551 return true;
1552 }
1553
1554 // The scalar form packs the offset and width in a single operand.
1555
1556 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
1557
1558 // Ensure the high bits are clear to insert the offset.
1559 auto OffsetMask = B.buildConstant(S32, maskTrailingOnes<unsigned>(6));
1560 auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
1561
1562 // Zeros out the low bits, so don't bother clamping the input value.
1563 auto ShiftWidth = B.buildShl(S32, WidthReg, B.buildConstant(S32, 16));
1564
1565 // Transformation function, pack the offset and width of a BFE into
1566 // the format expected by the S_BFE_I32 / S_BFE_U32. In the second
1567 // source, bits [5:0] contain the offset and bits [22:16] the width.
1568 auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth);
1569
1570 // TODO: It might be worth using a pseudo here to avoid scc clobber and
1571 // register class constraints.
1572 unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
1573 (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
1574
1575 auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
1576 constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this);
1577
1578 MI.eraseFromParent();
1579 return true;
1580}
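
// Illustrative sketch (not part of this file): the two integer transforms used
// by the bitfield extract lowering above. packBFEOperand mirrors the scalar
// packing (offset in bits [5:0], width starting at bit 16), and extractBits64
// mirrors the unsigned shift expansion Src >> Offset << (64 - Width) >>
// (64 - Width). Both names are hypothetical, and the sketch assumes
// Offset < 64 and 1 <= Width <= 64.

```cpp
#include <cassert>
#include <cstdint>

// Pack offset and width into the single second source operand of a scalar
// BFE. The offset is clamped to six bits; the width is shifted into place
// without clamping, as in the lowering above.
uint32_t packBFEOperand(uint32_t Offset, uint32_t Width) {
  return (Offset & 0x3Fu) | (Width << 16);
}

// Unsigned 64-bit bitfield extract built from three shifts.
uint64_t extractBits64(uint64_t Src, unsigned Offset, unsigned Width) {
  assert(Offset < 64 && Width >= 1 && Width <= 64);
  const unsigned ExtShift = 64 - Width;
  // Move the field to bit 0, push it to the top, then bring it back down so
  // that everything above the field is cleared.
  return ((Src >> Offset) << ExtShift) >> ExtShift;
}
```

// For example, extracting 8 bits at offset 8 from 0xABCD1234 yields 0x12, and
// packing offset 4 with width 8 yields 0x00080004.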
1581
1583 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
1584 MachineInstr &MI = OpdMapper.getMI();
1585 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1586
1587 // Insert basic copies.
1588 applyDefaultMapping(OpdMapper);
1589
1590 Register Dst0 = MI.getOperand(0).getReg();
1591 Register Dst1 = MI.getOperand(1).getReg();
1592 Register Src0 = MI.getOperand(2).getReg();
1593 Register Src1 = MI.getOperand(3).getReg();
1594 Register Src2 = MI.getOperand(4).getReg();
1595
1596 if (MRI.getRegBankOrNull(Src0) == &AMDGPU::VGPRRegBank)
1597 return true;
1598
1599 bool IsUnsigned = MI.getOpcode() == AMDGPU::G_AMDGPU_MAD_U64_U32;
1600 LLT S1 = LLT::scalar(1);
1601 LLT S32 = LLT::scalar(32);
1602
1603 bool DstOnValu = MRI.getRegBankOrNull(Src2) == &AMDGPU::VGPRRegBank;
1604 bool Accumulate = true;
1605
1606 if (!DstOnValu) {
1607 if (mi_match(Src2, MRI, m_ZeroInt()))
1608 Accumulate = false;
1609 }
1610
1611 // Keep the multiplication on the SALU.
1612 Register DstHi;
1613 Register DstLo = B.buildMul(S32, Src0, Src1).getReg(0);
1614 bool MulHiInVgpr = false;
1615
1616 MRI.setRegBank(DstLo, AMDGPU::SGPRRegBank);
1617
1618 if (Subtarget.hasSMulHi()) {
1619 DstHi = IsUnsigned ? B.buildUMulH(S32, Src0, Src1).getReg(0)
1620 : B.buildSMulH(S32, Src0, Src1).getReg(0);
1621 MRI.setRegBank(DstHi, AMDGPU::SGPRRegBank);
1622 } else {
1623 Register VSrc0 = B.buildCopy(S32, Src0).getReg(0);
1624 Register VSrc1 = B.buildCopy(S32, Src1).getReg(0);
1625
1626 MRI.setRegBank(VSrc0, AMDGPU::VGPRRegBank);
1627 MRI.setRegBank(VSrc1, AMDGPU::VGPRRegBank);
1628
1629 DstHi = IsUnsigned ? B.buildUMulH(S32, VSrc0, VSrc1).getReg(0)
1630 : B.buildSMulH(S32, VSrc0, VSrc1).getReg(0);
1631 MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1632
1633 if (!DstOnValu) {
1634 DstHi = buildReadFirstLane(B, MRI, DstHi);
1635 } else {
1636 MulHiInVgpr = true;
1637 }
1638 }
1639
1640 // Accumulate and produce the "carry-out" bit.
1641 //
1642 // The "carry-out" is defined as bit 64 of the result when computed as a
1643 // big integer. For unsigned multiply-add, this matches the usual definition
1644 // of carry-out. For signed multiply-add, bit 64 is the sign bit of the
1645 // result, which is determined as:
1646 // sign(Src0 * Src1) + sign(Src2) + carry-out from unsigned 64-bit add
1647 LLT CarryType = DstOnValu ? S1 : S32;
1648 const RegisterBank &CarryBank =
1649 DstOnValu ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
1650 const RegisterBank &DstBank =
1651 DstOnValu ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank;
1652 Register Carry;
1653 Register Zero;
1654
1655 if (!IsUnsigned) {
1656 Zero = B.buildConstant(S32, 0).getReg(0);
1657 MRI.setRegBank(Zero,
1658 MulHiInVgpr ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank);
1659
1660 Carry = B.buildICmp(CmpInst::ICMP_SLT, MulHiInVgpr ? S1 : S32, DstHi, Zero)
1661 .getReg(0);
1662 MRI.setRegBank(Carry, MulHiInVgpr ? AMDGPU::VCCRegBank
1663 : AMDGPU::SGPRRegBank);
1664
1665 if (DstOnValu && !MulHiInVgpr) {
1666 Carry = B.buildTrunc(S1, Carry).getReg(0);
1667 MRI.setRegBank(Carry, AMDGPU::VCCRegBank);
1668 }
1669 }
1670
1671 if (Accumulate) {
1672 if (DstOnValu) {
1673 DstLo = B.buildCopy(S32, DstLo).getReg(0);
1674 DstHi = B.buildCopy(S32, DstHi).getReg(0);
1675 MRI.setRegBank(DstLo, AMDGPU::VGPRRegBank);
1676 MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1677 }
1678
1679 auto Unmerge = B.buildUnmerge(S32, Src2);
1680 Register Src2Lo = Unmerge.getReg(0);
1681 Register Src2Hi = Unmerge.getReg(1);
1682 MRI.setRegBank(Src2Lo, DstBank);
1683 MRI.setRegBank(Src2Hi, DstBank);
1684
1685 if (!IsUnsigned) {
1686 auto Src2Sign = B.buildICmp(CmpInst::ICMP_SLT, CarryType, Src2Hi, Zero);
1687 MRI.setRegBank(Src2Sign.getReg(0), CarryBank);
1688
1689 Carry = B.buildXor(CarryType, Carry, Src2Sign).getReg(0);
1690 MRI.setRegBank(Carry, CarryBank);
1691 }
1692
1693 auto AddLo = B.buildUAddo(S32, CarryType, DstLo, Src2Lo);
1694 DstLo = AddLo.getReg(0);
1695 Register CarryLo = AddLo.getReg(1);
1696 MRI.setRegBank(DstLo, DstBank);
1697 MRI.setRegBank(CarryLo, CarryBank);
1698
1699 auto AddHi = B.buildUAdde(S32, CarryType, DstHi, Src2Hi, CarryLo);
1700 DstHi = AddHi.getReg(0);
1701 MRI.setRegBank(DstHi, DstBank);
1702
1703 Register CarryHi = AddHi.getReg(1);
1704 MRI.setRegBank(CarryHi, CarryBank);
1705
1706 if (IsUnsigned) {
1707 Carry = CarryHi;
1708 } else {
1709 Carry = B.buildXor(CarryType, Carry, CarryHi).getReg(0);
1710 MRI.setRegBank(Carry, CarryBank);
1711 }
1712 } else {
1713 if (IsUnsigned) {
1714 Carry = B.buildConstant(CarryType, 0).getReg(0);
1715 MRI.setRegBank(Carry, CarryBank);
1716 }
1717 }
1718
1719 B.buildMergeLikeInstr(Dst0, {DstLo, DstHi});
1720
1721 if (DstOnValu) {
1722 B.buildCopy(Dst1, Carry);
1723 } else {
1724 B.buildTrunc(Dst1, Carry);
1725 }
1726
1727 MI.eraseFromParent();
1728 return true;
1729}
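
// Illustrative sketch (not part of this file): the unsigned multiply-add above,
// modeled on host integers. It follows the same decomposition: a 32-bit
// low product, a 32-bit high product, then an add of the low halves followed
// by an add-with-carry of the high halves. madU64U32 and MadResult are
// hypothetical names; the signed variant is not modeled here.

```cpp
#include <cstdint>

struct MadResult {
  uint64_t Value; // low 64 bits of A * B + C
  bool Carry;     // bit 64 of the full result
};

MadResult madU64U32(uint32_t A, uint32_t B, uint64_t C) {
  const uint64_t Prod = uint64_t(A) * uint64_t(B);
  const uint32_t MulLo = uint32_t(Prod);       // models the 32-bit multiply
  const uint32_t MulHi = uint32_t(Prod >> 32); // models the high multiply

  const uint32_t Src2Lo = uint32_t(C);
  const uint32_t Src2Hi = uint32_t(C >> 32);

  // Add the low halves and detect the carry-out (unsigned add overflow).
  const uint32_t DstLo = MulLo + Src2Lo;
  const bool CarryLo = DstLo < MulLo;

  // Add the high halves plus the incoming carry.
  const uint64_t WideHi = uint64_t(MulHi) + Src2Hi + (CarryLo ? 1u : 0u);
  const uint32_t DstHi = uint32_t(WideHi);
  const bool CarryHi = (WideHi >> 32) != 0;

  return {(uint64_t(DstHi) << 32) | DstLo, CarryHi};
}
```

// The largest inputs, 0xFFFFFFFF * 0xFFFFFFFF + 0xFFFFFFFFFFFFFFFF, give
// 0xFFFFFFFE00000000 with the carry set.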
1730
1731// Return a suitable opcode for extending the operands of Opc when widening.
1732static unsigned getExtendOp(unsigned Opc) {
1733 switch (Opc) {
1734 case TargetOpcode::G_ASHR:
1735 case TargetOpcode::G_SMIN:
1736 case TargetOpcode::G_SMAX:
1737 return TargetOpcode::G_SEXT;
1738 case TargetOpcode::G_LSHR:
1739 case TargetOpcode::G_UMIN:
1740 case TargetOpcode::G_UMAX:
1741 return TargetOpcode::G_ZEXT;
1742 default:
1743 return TargetOpcode::G_ANYEXT;
1744 }
1745}
1746
1747// Emit a legalized extension from <2 x s16> to 2 32-bit components, avoiding
1748// any illegal vector extend or unmerge operations.
1749static std::pair<Register, Register>
1750unpackV2S16ToS32(MachineIRBuilder &B, Register Src, unsigned ExtOpcode) {
1751 const LLT S32 = LLT::scalar(32);
1752 auto Bitcast = B.buildBitcast(S32, Src);
1753
1754 if (ExtOpcode == TargetOpcode::G_SEXT) {
1755 auto ExtLo = B.buildSExtInReg(S32, Bitcast, 16);
1756 auto ShiftHi = B.buildAShr(S32, Bitcast, B.buildConstant(S32, 16));
1757 return std::pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1758 }
1759
1760 auto ShiftHi = B.buildLShr(S32, Bitcast, B.buildConstant(S32, 16));
1761 if (ExtOpcode == TargetOpcode::G_ZEXT) {
1762 auto ExtLo = B.buildAnd(S32, Bitcast, B.buildConstant(S32, 0xffff));
1763 return std::pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1764 }
1765
1766 assert(ExtOpcode == TargetOpcode::G_ANYEXT);
1767 return std::pair(Bitcast.getReg(0), ShiftHi.getReg(0));
1768}
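
// Illustrative sketch (not part of this file): what unpackV2S16ToS32 computes,
// on a packed 32-bit integer. The enum ExtKind and the function unpackV2S16
// are hypothetical names. For the any-extend case the low result keeps the
// original high half in its upper bits, matching the bare bitcast above.

```cpp
#include <cstdint>
#include <utility>

enum class ExtKind { Sign, Zero, Any };

// Sign-extend the low 16 bits of X to 32 bits using only unsigned arithmetic.
static uint32_t signExtend16(uint32_t X) {
  return ((X & 0xFFFFu) ^ 0x8000u) - 0x8000u;
}

// Returns {low element, high element} of a packed <2 x s16> value.
std::pair<uint32_t, uint32_t> unpackV2S16(uint32_t Packed, ExtKind Kind) {
  switch (Kind) {
  case ExtKind::Sign:
    // sext_inreg for the low half, arithmetic shift for the high half.
    return {signExtend16(Packed), signExtend16(Packed >> 16)};
  case ExtKind::Zero:
    // Mask for the low half, logical shift for the high half.
    return {Packed & 0xFFFFu, Packed >> 16};
  case ExtKind::Any:
    break;
  }
  // Any-extend: the upper bits of the low element are unspecified, so the
  // packed value is reused as is.
  return {Packed, Packed >> 16};
}
```

// For Packed = 0x8001FFFE, sign extension yields {0xFFFFFFFE, 0xFFFF8001} and
// zero extension yields {0x0000FFFE, 0x00008001}.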
1769
1770// For cases where only a single copy is inserted for matching register banks,
1771// replace the register in the instruction operand.
1773 const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper, unsigned OpIdx) {
1774 SmallVector<unsigned, 1> SrcReg(OpdMapper.getVRegs(OpIdx));
1775 if (!SrcReg.empty()) {
1776 assert(SrcReg.size() == 1);
1777 OpdMapper.getMI().getOperand(OpIdx).setReg(SrcReg[0]);
1778 return true;
1779 }
1780
1781 return false;
1782}
1783
1784/// Handle register layout difference for f16 images for some subtargets.
1787 Register Reg) const {
1788 if (!Subtarget.hasUnpackedD16VMem())
1789 return Reg;
1790
1791 const LLT S16 = LLT::scalar(16);
1792 LLT StoreVT = MRI.getType(Reg);
1793 if (!StoreVT.isVector() || StoreVT.getElementType() != S16)
1794 return Reg;
1795
1796 auto Unmerge = B.buildUnmerge(S16, Reg);
1797
1798
1799 SmallVector<Register, 4> WideRegs;
1800 for (int I = 0, E = Unmerge->getNumOperands() - 1; I != E; ++I)
1801 WideRegs.push_back(Unmerge.getReg(I));
1802
1803 const LLT S32 = LLT::scalar(32);
1804 int NumElts = StoreVT.getNumElements();
1805
1806 return B.buildMergeLikeInstr(LLT::fixed_vector(NumElts, S32), WideRegs)
1807 .getReg(0);
1808}
1809
1810static std::pair<Register, unsigned>
1812 int64_t Const;
1813 if (mi_match(Reg, MRI, m_ICst(Const)))
1814 return std::pair(Register(), Const);
1815
1816 Register Base;
1817 if (mi_match(Reg, MRI, m_GAdd(m_Reg(Base), m_ICst(Const))))
1818 return std::pair(Base, Const);
1819
1820 // TODO: Handle G_OR used for add case
1821 return std::pair(Reg, 0);
1822}
1823
1824std::pair<Register, unsigned>
1826 Register OrigOffset) const {
1827 const unsigned MaxImm = SIInstrInfo::getMaxMUBUFImmOffset(Subtarget);
1828 Register BaseReg;
1829 unsigned ImmOffset;
1830 const LLT S32 = LLT::scalar(32);
1831
1832 // TODO: Use AMDGPU::getBaseWithConstantOffset() instead.
1833 std::tie(BaseReg, ImmOffset) = getBaseWithConstantOffset(*B.getMRI(),
1834 OrigOffset);
1835
1836 unsigned C1 = 0;
1837 if (ImmOffset != 0) {
1838 // If the immediate value is too big for the immoffset field, put only bits
1839 // that would normally fit in the immoffset field. The remaining value that
1840 // is copied/added for the voffset field is a large power of 2, and it
1841 // stands more chance of being CSEd with the copy/add for another similar
1842 // load/store.
1843 // However, do not do that rounding down if that is a negative
1844 // number, as it appears to be illegal to have a negative offset in the
1845 // vgpr, even if adding the immediate offset makes it positive.
1846 unsigned Overflow = ImmOffset & ~MaxImm;
1847 ImmOffset -= Overflow;
1848 if (static_cast<int32_t>(Overflow) < 0) {
1849 Overflow += ImmOffset;
1850 ImmOffset = 0;
1851 }
1852
1853 C1 = ImmOffset;
1854 if (Overflow != 0) {
1855 if (!BaseReg)
1856 BaseReg = B.buildConstant(S32, Overflow).getReg(0);
1857 else {
1858 auto OverflowVal = B.buildConstant(S32, Overflow);
1859 BaseReg = B.buildAdd(S32, BaseReg, OverflowVal).getReg(0);
1860 }
1861 }
1862 }
1863
1864 if (!BaseReg)
1865 BaseReg = B.buildConstant(S32, 0).getReg(0);
1866
1867 return {BaseReg, C1};
1868}
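// Illustrative sketch, not part of this file: a standalone model of the
// immediate/overflow split performed above. The name splitOffsetSketch and the
// plain-integer interface are assumptions made for this example, and MaxImm is
// assumed to have the form 2^n - 1 (for instance 4095). It only mirrors the
// arithmetic; it does not build any MIR.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper. Returns {Overflow, Imm}, where Overflow is the part of
// Offset that would have to be added to the base register and Imm is the part
// that would go into the immediate offset field.
inline std::pair<uint32_t, uint32_t> splitOffsetSketch(uint32_t Offset,
                                                       uint32_t MaxImm) {
  // Keep only the bits that fit in the immediate field.
  uint32_t Overflow = Offset & ~MaxImm;
  uint32_t Imm = Offset - Overflow;

  // If the register part would be negative when read as a signed 32-bit
  // value, fold everything back into the register part instead.
  if (static_cast<int32_t>(Overflow) < 0) {
    Overflow += Imm;
    Imm = 0;
  }
  return {Overflow, Imm};
}
```

// With MaxImm == 4095, an offset of 5000 would split into a register part of
// 4096 and an immediate of 904, while an offset with the sign bit set stays
// entirely in the register part.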
1869
1871 Register SrcReg) const {
1872 MachineRegisterInfo &MRI = *B.getMRI();
1873 LLT SrcTy = MRI.getType(SrcReg);
1874 if (SrcTy.getSizeInBits() == 32) {
1875 // Use a v_mov_b32 here to make the exec dependency explicit.
1876 B.buildInstr(AMDGPU::V_MOV_B32_e32)
1877 .addDef(DstReg)
1878 .addUse(SrcReg);
1879 return constrainGenericRegister(DstReg, AMDGPU::VGPR_32RegClass, MRI) &&
1880 constrainGenericRegister(SrcReg, AMDGPU::SReg_32RegClass, MRI);
1881 }
1882
1883 Register TmpReg0 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
1884 Register TmpReg1 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
1885
1886 B.buildInstr(AMDGPU::V_MOV_B32_e32)
1887 .addDef(TmpReg0)
1888 .addUse(SrcReg, {}, AMDGPU::sub0);
1889 B.buildInstr(AMDGPU::V_MOV_B32_e32)
1890 .addDef(TmpReg1)
1891 .addUse(SrcReg, {}, AMDGPU::sub1);
1892 B.buildInstr(AMDGPU::REG_SEQUENCE)
1893 .addDef(DstReg)
1894 .addUse(TmpReg0)
1895 .addImm(AMDGPU::sub0)
1896 .addUse(TmpReg1)
1897 .addImm(AMDGPU::sub1);
1898
1899 return constrainGenericRegister(SrcReg, AMDGPU::SReg_64RegClass, MRI) &&
1900 constrainGenericRegister(DstReg, AMDGPU::VReg_64RegClass, MRI);
1901}
1902
1903/// Utility function for pushing dynamic vector indexes with a constant offset
1904/// into waterfall loops.
1906 MachineInstr &IdxUseInstr,
1907 unsigned OpIdx,
1908 unsigned ConstOffset) {
1909 MachineRegisterInfo &MRI = *B.getMRI();
1910 const LLT S32 = LLT::scalar(32);
1911 Register WaterfallIdx = IdxUseInstr.getOperand(OpIdx).getReg();
1912 B.setInsertPt(*IdxUseInstr.getParent(), IdxUseInstr.getIterator());
1913
1914 auto MaterializedOffset = B.buildConstant(S32, ConstOffset);
1915
1916 auto Add = B.buildAdd(S32, WaterfallIdx, MaterializedOffset);
1917 MRI.setRegBank(MaterializedOffset.getReg(0), AMDGPU::SGPRRegBank);
1918 MRI.setRegBank(Add.getReg(0), AMDGPU::SGPRRegBank);
1919 IdxUseInstr.getOperand(OpIdx).setReg(Add.getReg(0));
1920}
1921
1922/// Implement extending a 32-bit value to a 64-bit value. \p Lo32Reg is the
1923/// original 32-bit source value (to be inserted in the low part of the combined
1924/// 64-bit result), and \p Hi32Reg is the high half of the combined 64-bit
1925/// value.
1927 Register Hi32Reg, Register Lo32Reg,
1928 unsigned ExtOpc,
1929 const RegisterBank &RegBank,
1930 bool IsBooleanSrc = false) {
1931 if (ExtOpc == AMDGPU::G_ZEXT) {
1932 B.buildConstant(Hi32Reg, 0);
1933 } else if (ExtOpc == AMDGPU::G_SEXT) {
1934 if (IsBooleanSrc) {
1935 // If we know the original source was an s1, the high half is the same as
1936 // the low.
1937 B.buildCopy(Hi32Reg, Lo32Reg);
1938 } else {
1939 // Replicate sign bit from 32-bit extended part.
1940 auto ShiftAmt = B.buildConstant(LLT::scalar(32), 31);
1941 B.getMRI()->setRegBank(ShiftAmt.getReg(0), RegBank);
1942 B.buildAShr(Hi32Reg, Lo32Reg, ShiftAmt);
1943 }
1944 } else {
1945 assert(ExtOpc == AMDGPU::G_ANYEXT && "not an integer extension");
1946 B.buildUndef(Hi32Reg);
1947 }
1948}
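// Illustrative sketch, not part of this file: the high-half rule implemented
// above, modelled on plain integers. ExtKindSketch and extend32To64Sketch are
// hypothetical names introduced for this example. The any-extend case is left
// out because its high half is undefined.

```cpp
#include <cassert>
#include <cstdint>

enum class ExtKindSketch { ZExt, SExt };

// Hypothetical helper: combine a 32-bit low half with the high half that the
// chosen extension kind implies.
inline uint64_t extend32To64Sketch(uint32_t Lo, ExtKindSketch Kind) {
  uint32_t Hi = 0; // Zero extension: the high half is a constant 0.
  if (Kind == ExtKindSketch::SExt) {
    // Sign extension: replicate bit 31 of the low half into every high bit,
    // which is what an arithmetic shift right by 31 produces.
    Hi = (Lo >> 31) ? 0xFFFFFFFFu : 0u;
  }
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}
```

// For a sign-extended boolean the low half is already 0 or all ones, so the
// computed high half equals the low half. That is why a plain copy is enough
// in the boolean case.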
1949
1950bool AMDGPURegisterBankInfo::foldExtractEltToCmpSelect(
1951 MachineIRBuilder &B, MachineInstr &MI,
1952 const OperandsMapper &OpdMapper) const {
1953 MachineRegisterInfo &MRI = *B.getMRI();
1954
1955 Register VecReg = MI.getOperand(1).getReg();
1956 Register Idx = MI.getOperand(2).getReg();
1957
1958 const RegisterBank &IdxBank =
1959 *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
1960
1961 bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;
1962
1963 LLT VecTy = MRI.getType(VecReg);
1964 unsigned EltSize = VecTy.getScalarSizeInBits();
1965 unsigned NumElem = VecTy.getNumElements();
1966
1967 if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
1968 IsDivergentIdx, &Subtarget))
1969 return false;
1970
1971 LLT S32 = LLT::scalar(32);
1972
1973 const RegisterBank &DstBank =
1974 *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1975 const RegisterBank &SrcBank =
1976 *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
1977
1978 const RegisterBank &CCBank =
1979 (DstBank == AMDGPU::SGPRRegBank &&
1980 SrcBank == AMDGPU::SGPRRegBank &&
1981 IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
1982 : AMDGPU::VCCRegBank;
1983 LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);
1984
1985 if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
1986 Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
1987 MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
1988 }
1989
1990 LLT EltTy = VecTy.getScalarType();
1991 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
1992 unsigned NumLanes = DstRegs.size();
1993 if (!NumLanes)
1994 NumLanes = 1;
1995 else
1996 EltTy = MRI.getType(DstRegs[0]);
1997
1998 auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
1999 SmallVector<Register, 2> Res(NumLanes);
2000 for (unsigned L = 0; L < NumLanes; ++L)
2001 Res[L] = UnmergeToEltTy.getReg(L);
2002
2003 for (unsigned I = 1; I < NumElem; ++I) {
2004 auto IC = B.buildConstant(S32, I);
2005 MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
2006 auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
2007 MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);
2008
2009 for (unsigned L = 0; L < NumLanes; ++L) {
2010 auto S = B.buildSelect(EltTy, Cmp,
2011 UnmergeToEltTy.getReg(I * NumLanes + L), Res[L]);
2012
2013 for (unsigned N : { 0, 2, 3 })
2014 MRI.setRegBank(S->getOperand(N).getReg(), DstBank);
2015
2016 Res[L] = S->getOperand(0).getReg();
2017 }
2018 }
2019
2020 for (unsigned L = 0; L < NumLanes; ++L) {
2021 Register DstReg = (NumLanes == 1) ? MI.getOperand(0).getReg() : DstRegs[L];
2022 B.buildCopy(DstReg, Res[L]);
2023 MRI.setRegBank(DstReg, DstBank);
2024 }
2025
2026 MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
2027 MI.eraseFromParent();
2028
2029 return true;
2030}
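// Illustrative sketch, not part of this file: the compare/select chain built
// above, modelled on a std::array. extractViaCmpSelectSketch is a hypothetical
// name, and the sketch assumes one 32-bit value per element (NumLanes == 1).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pick Vec[Idx] without indexing dynamically, using one
// equality compare and one select per element.
template <std::size_t N>
uint32_t extractViaCmpSelectSketch(const std::array<uint32_t, N> &Vec,
                                   uint32_t Idx) {
  // Start from element 0, as the expansion does.
  uint32_t Res = Vec[0];
  for (std::size_t I = 1; I < N; ++I) {
    // Res = select(Idx == I, Vec[I], Res)
    Res = (Idx == I) ? Vec[I] : Res;
  }
  return Res;
}
```

// In this model an out-of-range index simply leaves element 0 selected. That
// is a property of the sketch, not a guarantee made by the code above.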
2031
2032// Insert a cross regbank copy for a register if it already has a bank that
2033// differs from the one we want to set.
2034static Register constrainRegToBank(MachineRegisterInfo &MRI,
2035 MachineIRBuilder &B, Register &Reg,
2036 const RegisterBank &Bank) {
2037 const RegisterBank *CurrBank = MRI.getRegBankOrNull(Reg);
2038 if (CurrBank && *CurrBank != Bank) {
2039 Register Copy = B.buildCopy(MRI.getType(Reg), Reg).getReg(0);
2040 MRI.setRegBank(Copy, Bank);
2041 return Copy;
2042 }
2043
2044 MRI.setRegBank(Reg, Bank);
2045 return Reg;
2046}
2047
2048bool AMDGPURegisterBankInfo::foldInsertEltToCmpSelect(
2049 MachineIRBuilder &B, MachineInstr &MI,
2050 const OperandsMapper &OpdMapper) const {
2051
2052 MachineRegisterInfo &MRI = *B.getMRI();
2053 Register VecReg = MI.getOperand(1).getReg();
2054 Register Idx = MI.getOperand(3).getReg();
2055
2056 const RegisterBank &IdxBank =
2057 *OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;
2058
2059 bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;
2060
2061 LLT VecTy = MRI.getType(VecReg);
2062 unsigned EltSize = VecTy.getScalarSizeInBits();
2063 unsigned NumElem = VecTy.getNumElements();
2064
2065 if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
2066 IsDivergentIdx, &Subtarget))
2067 return false;
2068
2069 LLT S32 = LLT::scalar(32);
2070
2071 const RegisterBank &DstBank =
2072 *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2073 const RegisterBank &SrcBank =
2074 *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2075 const RegisterBank &InsBank =
2076 *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2077
2078 const RegisterBank &CCBank =
2079 (DstBank == AMDGPU::SGPRRegBank &&
2080 SrcBank == AMDGPU::SGPRRegBank &&
2081 InsBank == AMDGPU::SGPRRegBank &&
2082 IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
2083 : AMDGPU::VCCRegBank;
2084 LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);
2085
2086 if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
2087 Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
2088 MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
2089 }
2090
2091 LLT EltTy = VecTy.getScalarType();
2092 SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
2093 unsigned NumLanes = InsRegs.size();
2094 if (!NumLanes) {
2095 NumLanes = 1;
2096 InsRegs.push_back(MI.getOperand(2).getReg());
2097 } else {
2098 EltTy = MRI.getType(InsRegs[0]);
2099 }
2100
2101 auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
2102 SmallVector<Register, 16> Ops(NumElem * NumLanes);
2103
2104 for (unsigned I = 0; I < NumElem; ++I) {
2105 auto IC = B.buildConstant(S32, I);
2106 MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
2107 auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
2108 MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);
2109
2110 for (unsigned L = 0; L < NumLanes; ++L) {
2111 Register Op0 = constrainRegToBank(MRI, B, InsRegs[L], DstBank);
2112 Register Op1 = UnmergeToEltTy.getReg(I * NumLanes + L);
2113 Op1 = constrainRegToBank(MRI, B, Op1, DstBank);
2114
2115 Register Select = B.buildSelect(EltTy, Cmp, Op0, Op1).getReg(0);
2116 MRI.setRegBank(Select, DstBank);
2117
2118 Ops[I * NumLanes + L] = Select;
2119 }
2120 }
2121
2122 LLT MergeTy = LLT::fixed_vector(Ops.size(), EltTy);
2123 if (MergeTy == MRI.getType(MI.getOperand(0).getReg())) {
2124 B.buildBuildVector(MI.getOperand(0), Ops);
2125 } else {
2126 auto Vec = B.buildBuildVector(MergeTy, Ops);
2127 MRI.setRegBank(Vec->getOperand(0).getReg(), DstBank);
2128 B.buildBitcast(MI.getOperand(0).getReg(), Vec);
2129 }
2130
2131 MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
2132 MI.eraseFromParent();
2133
2134 return true;
2135}
2136
2137// Break s_mul_u64 into 32-bit vector operations.
2138void AMDGPURegisterBankInfo::applyMappingSMULU64(
2139 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
2140 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2141 SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
2142 SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2143
2144 // All inputs are SGPRs, nothing special to do.
2145 if (DefRegs.empty()) {
2146 assert(Src0Regs.empty() && Src1Regs.empty());
2147 applyDefaultMapping(OpdMapper);
2148 return;
2149 }
2150
2151 assert(DefRegs.size() == 2);
2152 assert(Src0Regs.size() == Src1Regs.size() &&
2153 (Src0Regs.empty() || Src0Regs.size() == 2));
2154
2155 MachineRegisterInfo &MRI = OpdMapper.getMRI();
2156 MachineInstr &MI = OpdMapper.getMI();
2157 Register DstReg = MI.getOperand(0).getReg();
2158 LLT HalfTy = LLT::scalar(32);
2159
2160 // Depending on where the source registers came from, the generic code may
2161 // have decided to split the inputs already or not. If not, we still need to
2162 // extract the values.
2163
2164 if (Src0Regs.empty())
2165 split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
2166 else
2167 setRegsToType(MRI, Src0Regs, HalfTy);
2168
2169 if (Src1Regs.empty())
2170 split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2171 else
2172 setRegsToType(MRI, Src1Regs, HalfTy);
2173
2174 setRegsToType(MRI, DefRegs, HalfTy);
2175
2176 // The multiplication is done as follows:
2177 //
2178 // Op1H Op1L
2179 // * Op0H Op0L
2180 // --------------------
2181 // Op1H*Op0L Op1L*Op0L
2182 // + Op1H*Op0H Op1L*Op0H
2183 // -----------------------------------------
2184 // (Op1H*Op0L + Op1L*Op0H + carry) Op1L*Op0L
2185 //
2186 // We drop Op1H*Op0H because it only contributes to bits 64 and above, which
2187 // do not fit in the 64-bit result.
2188 // The low 32-bit value is Op1L*Op0L.
2189 // The high 32-bit value is Op1H*Op0L + Op1L*Op0H + carry (from
2190 // Op1L*Op0L).
2191
2192 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
2193
2194 Register Hi = B.buildUMulH(HalfTy, Src0Regs[0], Src1Regs[0]).getReg(0);
2195 Register MulLoHi = B.buildMul(HalfTy, Src0Regs[0], Src1Regs[1]).getReg(0);
2196 Register Add = B.buildAdd(HalfTy, Hi, MulLoHi).getReg(0);
2197 Register MulHiLo = B.buildMul(HalfTy, Src0Regs[1], Src1Regs[0]).getReg(0);
2198 B.buildAdd(DefRegs[1], Add, MulHiLo);
2199 B.buildMul(DefRegs[0], Src0Regs[0], Src1Regs[0]);
2200
2201 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2202 MI.eraseFromParent();
2203}
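// Illustrative sketch, not part of this file: the 32-bit decomposition used
// above, checked on plain integers. mul64ViaHalvesSketch is a hypothetical
// name. The three products and the carry correspond to the G_MUL and G_UMULH
// instructions emitted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: compute the low 64 bits of A * B using only 32-bit
// multiplies, a 32x32 high-half multiply, and 32-bit adds.
inline uint64_t mul64ViaHalvesSketch(uint64_t A, uint64_t B) {
  const uint32_t ALo = static_cast<uint32_t>(A);
  const uint32_t AHi = static_cast<uint32_t>(A >> 32);
  const uint32_t BLo = static_cast<uint32_t>(B);
  const uint32_t BHi = static_cast<uint32_t>(B >> 32);

  // Low half: truncating 32-bit multiply of the low parts.
  const uint32_t Lo = ALo * BLo;

  // Carry into the high half: upper 32 bits of the full low product (umulh).
  const uint32_t Carry =
      static_cast<uint32_t>((static_cast<uint64_t>(ALo) * BLo) >> 32);

  // High half: carry plus the two cross products, all modulo 2^32.
  const uint32_t Hi = Carry + ALo * BHi + AHi * BLo;

  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}
```

// The AHi * BHi product never appears because it only affects bits 64 and up.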
2204
2206 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
2207 MachineInstr &MI = OpdMapper.getMI();
2208 B.setInstrAndDebugLoc(MI);
2209 unsigned Opc = MI.getOpcode();
2210 MachineRegisterInfo &MRI = OpdMapper.getMRI();
2211 switch (Opc) {
2212 case AMDGPU::G_CONSTANT:
2213 case AMDGPU::G_IMPLICIT_DEF: {
2214 Register DstReg = MI.getOperand(0).getReg();
2215 LLT DstTy = MRI.getType(DstReg);
2216 if (DstTy != LLT::scalar(1))
2217 break;
2218
2219 const RegisterBank *DstBank =
2220 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2221 if (DstBank == &AMDGPU::VCCRegBank)
2222 break;
2223 SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
2224 if (DefRegs.empty())
2225 DefRegs.push_back(DstReg);
2226
2227 B.setInsertPt(*MI.getParent(), ++MI.getIterator());
2228
2229 Register NewDstReg = MRI.createGenericVirtualRegister(LLT::scalar(32));
2230 LLVMContext &Ctx = B.getMF().getFunction().getContext();
2231
2232 MI.getOperand(0).setReg(NewDstReg);
2233 if (Opc != AMDGPU::G_IMPLICIT_DEF) {
2234 uint64_t ConstVal = MI.getOperand(1).getCImm()->getZExtValue();
2235 MI.getOperand(1).setCImm(
2236 ConstantInt::get(IntegerType::getInt32Ty(Ctx), ConstVal));
2237 }
2238
2239 MRI.setRegBank(NewDstReg, *DstBank);
2240 B.buildTrunc(DefRegs[0], NewDstReg);
2241 return;
2242 }
2243 case AMDGPU::G_PHI: {
2244 Register DstReg = MI.getOperand(0).getReg();
2245 LLT DstTy = MRI.getType(DstReg);
2246 if (DstTy != LLT::scalar(1))
2247 break;
2248
2249 const LLT S32 = LLT::scalar(32);
2250 const RegisterBank *DstBank =
2251 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2252 if (DstBank == &AMDGPU::VCCRegBank) {
2253 applyDefaultMapping(OpdMapper);
2254 // The standard handling only considers the result register bank for
2255 // phis. For VCC, blindly inserting a copy when the phi is lowered will
2256 // produce an invalid copy. We can only copy with some kind of compare to
2257 // get a vector boolean result. Insert a register bank copy that will be
2258 // correctly lowered to a compare.
2259 for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
2260 Register SrcReg = MI.getOperand(I).getReg();
2261 const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);
2262
2263 if (SrcBank != &AMDGPU::VCCRegBank) {
2264 MachineBasicBlock *SrcMBB = MI.getOperand(I + 1).getMBB();
2265 B.setInsertPt(*SrcMBB, SrcMBB->getFirstTerminator());
2266
2267 auto Copy = B.buildCopy(LLT::scalar(1), SrcReg);
2268 MRI.setRegBank(Copy.getReg(0), AMDGPU::VCCRegBank);
2269 MI.getOperand(I).setReg(Copy.getReg(0));
2270 }
2271 }
2272
2273 return;
2274 }
2275
2276 // Phi handling is strange and only considers the bank of the destination.
2277 substituteSimpleCopyRegs(OpdMapper, 0);
2278
2279 // Promote SGPR/VGPR booleans to s32
2280 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
2281 B.setInsertPt(B.getMBB(), MI);
2282 LegalizerHelper Helper(B.getMF(), ApplyBank, B);
2283
2284 if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2285 llvm_unreachable("widen scalar should have succeeded");
2286
2287 return;
2288 }
2289 case AMDGPU::G_FCMP:
2290 if (!Subtarget.hasSALUFloatInsts())
2291 break;
2292 [[fallthrough]];
2293 case AMDGPU::G_ICMP:
2294 case AMDGPU::G_UADDO:
2295 case AMDGPU::G_USUBO:
2296 case AMDGPU::G_UADDE:
2297 case AMDGPU::G_SADDE:
2298 case AMDGPU::G_USUBE:
2299 case AMDGPU::G_SSUBE: {
2300 unsigned BoolDstOp =
2301 (Opc == AMDGPU::G_ICMP || Opc == AMDGPU::G_FCMP) ? 0 : 1;
2302 Register DstReg = MI.getOperand(BoolDstOp).getReg();
2303
2304 const RegisterBank *DstBank =
2305 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2306 if (DstBank != &AMDGPU::SGPRRegBank)
2307 break;
2308
2309 const bool HasCarryIn = MI.getNumOperands() == 5;
2310
2311 // If this is a scalar compare, promote the result to s32, as the selection
2312 // will end up using a copy to a 32-bit vreg.
2313 const LLT S32 = LLT::scalar(32);
2314 Register NewDstReg = MRI.createGenericVirtualRegister(S32);
2315 MRI.setRegBank(NewDstReg, AMDGPU::SGPRRegBank);
2316 MI.getOperand(BoolDstOp).setReg(NewDstReg);
2317
2318 if (HasCarryIn) {
2319 Register NewSrcReg = MRI.createGenericVirtualRegister(S32);
2320 MRI.setRegBank(NewSrcReg, AMDGPU::SGPRRegBank);
2321 B.buildZExt(NewSrcReg, MI.getOperand(4).getReg());
2322 MI.getOperand(4).setReg(NewSrcReg);
2323 }
2324
2325 MachineBasicBlock *MBB = MI.getParent();
2326 B.setInsertPt(*MBB, std::next(MI.getIterator()));
2327
2328 // If we had a constrained VCC result register, a copy was inserted to VCC
2329 // from SGPR.
2330 SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
2331 if (DefRegs.empty())
2332 DefRegs.push_back(DstReg);
2333 B.buildTrunc(DefRegs[0], NewDstReg);
2334 return;
2335 }
2336 case AMDGPU::G_SELECT: {
2337 Register DstReg = MI.getOperand(0).getReg();
2338 LLT DstTy = MRI.getType(DstReg);
2339
2340 SmallVector<Register, 1> CondRegs(OpdMapper.getVRegs(1));
2341 if (CondRegs.empty())
2342 CondRegs.push_back(MI.getOperand(1).getReg());
2343 else {
2344 assert(CondRegs.size() == 1);
2345 }
2346
2347 const RegisterBank *CondBank = getRegBank(CondRegs[0], MRI, *TRI);
2348 if (CondBank == &AMDGPU::SGPRRegBank) {
2349 const LLT S32 = LLT::scalar(32);
2350 Register NewCondReg = MRI.createGenericVirtualRegister(S32);
2351 MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);
2352
2353 MI.getOperand(1).setReg(NewCondReg);
2354 B.buildZExt(NewCondReg, CondRegs[0]);
2355 }
2356
2357 if (DstTy.getSizeInBits() != 64)
2358 break;
2359
2360 LLT HalfTy = getHalfSizedType(DstTy);
2361
2362 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2363 SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2364 SmallVector<Register, 2> Src2Regs(OpdMapper.getVRegs(3));
2365
2366 // All inputs are SGPRs, nothing special to do.
2367 if (DefRegs.empty()) {
2368 assert(Src1Regs.empty() && Src2Regs.empty());
2369 break;
2370 }
2371
2372 if (Src1Regs.empty())
2373 split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2374 else {
2375 setRegsToType(MRI, Src1Regs, HalfTy);
2376 }
2377
2378 if (Src2Regs.empty())
2379 split64BitValueForMapping(B, Src2Regs, HalfTy, MI.getOperand(3).getReg());
2380 else
2381 setRegsToType(MRI, Src2Regs, HalfTy);
2382
2383 setRegsToType(MRI, DefRegs, HalfTy);
2384
2385 auto Flags = MI.getFlags();
2386 B.buildSelect(DefRegs[0], CondRegs[0], Src1Regs[0], Src2Regs[0], Flags);
2387 B.buildSelect(DefRegs[1], CondRegs[0], Src1Regs[1], Src2Regs[1], Flags);
2388
2389 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2390 MI.eraseFromParent();
2391 return;
2392 }
2393 case AMDGPU::G_BRCOND: {
2394 Register CondReg = MI.getOperand(0).getReg();
2395 // FIXME: Should use legalizer helper, but should change bool ext type.
2396 const RegisterBank *CondBank =
2397 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2398
2399 if (CondBank == &AMDGPU::SGPRRegBank) {
2400 const LLT S32 = LLT::scalar(32);
2401 Register NewCondReg = MRI.createGenericVirtualRegister(S32);
2402 MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);
2403
2404 MI.getOperand(0).setReg(NewCondReg);
2405 B.buildZExt(NewCondReg, CondReg);
2406 return;
2407 }
2408
2409 break;
2410 }
2411 case AMDGPU::G_AND:
2412 case AMDGPU::G_OR:
2413 case AMDGPU::G_XOR: {
2414 // 64-bit and/or/xor are only available on the SALU, so split into 2 32-bit
2415 // ops if there is a VGPR input.
2416 Register DstReg = MI.getOperand(0).getReg();
2417 LLT DstTy = MRI.getType(DstReg);
2418
2419 const RegisterBank *DstBank =
2420 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2421
2422 if (DstTy.getSizeInBits() == 1) {
2423 if (DstBank == &AMDGPU::VCCRegBank)
2424 break;
2425
2426 MachineFunction *MF = MI.getMF();
2427 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
2428 LegalizerHelper Helper(*MF, ApplyBank, B);
2429
2430 if (Helper.widenScalar(MI, 0, LLT::scalar(32)) !=
2431 LegalizerHelper::Legalized)
2432 llvm_unreachable("widen scalar should have succeeded");
2433 return;
2434 }
2435
2436 if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) {
2437 const LLT S32 = LLT::scalar(32);
2438 MachineBasicBlock *MBB = MI.getParent();
2439 MachineFunction *MF = MBB->getParent();
2440 ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
2441 LegalizerHelper Helper(*MF, ApplySALU, B);
2442 // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening
2443 // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1
2444 // as "not".
2445 if (MI.getOpcode() == AMDGPU::G_XOR &&
2446 mi_match(MI.getOperand(2).getReg(), MRI, m_SpecificICstOrSplat(-1))) {
2447 Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
2448 Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_SEXT);
2449 Helper.widenScalarDst(MI, S32);
2450 } else {
2451 if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2452 llvm_unreachable("widen scalar should have succeeded");
2453 }
2454 return;
2455 }
2456
2457 if (DstTy.getSizeInBits() != 64)
2458 break;
2459
2460 LLT HalfTy = getHalfSizedType(DstTy);
2461 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2462 SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
2463 SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2464
2465 // All inputs are SGPRs, nothing special to do.
2466 if (DefRegs.empty()) {
2467 assert(Src0Regs.empty() && Src1Regs.empty());
2468 break;
2469 }
2470
2471 assert(DefRegs.size() == 2);
2472 assert(Src0Regs.size() == Src1Regs.size() &&
2473 (Src0Regs.empty() || Src0Regs.size() == 2));
2474
2475 // Depending on where the source registers came from, the generic code may
2476 // have decided to split the inputs already or not. If not, we still need to
2477 // extract the values.
2478
2479 if (Src0Regs.empty())
2480 split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
2481 else
2482 setRegsToType(MRI, Src0Regs, HalfTy);
2483
2484 if (Src1Regs.empty())
2485 split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2486 else
2487 setRegsToType(MRI, Src1Regs, HalfTy);
2488
2489 setRegsToType(MRI, DefRegs, HalfTy);
2490
2491 auto Flags = MI.getFlags();
2492 B.buildInstr(Opc, {DefRegs[0]}, {Src0Regs[0], Src1Regs[0]}, Flags);
2493 B.buildInstr(Opc, {DefRegs[1]}, {Src0Regs[1], Src1Regs[1]}, Flags);
2494
2495 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2496 MI.eraseFromParent();
2497 return;
2498 }
2499 case AMDGPU::G_ABS: {
2500 Register SrcReg = MI.getOperand(1).getReg();
2501 const RegisterBank *SrcBank = MRI.getRegBankOrNull(SrcReg);
2502
2503 // There is no VALU abs instruction so we need to replace it with a sub and
2504 // max combination.
2505 if (SrcBank && SrcBank == &AMDGPU::VGPRRegBank) {
2506 MachineFunction *MF = MI.getMF();
2507 ApplyRegBankMapping Apply(B, *this, MRI, &AMDGPU::VGPRRegBank);
2508 LegalizerHelper Helper(*MF, Apply, B);
2509
2510 if (Helper.lowerAbsToMaxNeg(MI) != LegalizerHelper::Legalized)
2511 llvm_unreachable("lowerAbsToMaxNeg should have succeeded");
2512 return;
2513 }
2514 [[fallthrough]];
2515 }
2516 case AMDGPU::G_ADD:
2517 case AMDGPU::G_SUB:
2518 case AMDGPU::G_MUL:
2519 case AMDGPU::G_SHL:
2520 case AMDGPU::G_LSHR:
2521 case AMDGPU::G_ASHR:
2522 case AMDGPU::G_SMIN:
2523 case AMDGPU::G_SMAX:
2524 case AMDGPU::G_UMIN:
2525 case AMDGPU::G_UMAX: {
2526 Register DstReg = MI.getOperand(0).getReg();
2527 LLT DstTy = MRI.getType(DstReg);
2528
2529 // Special case for s_mul_u64. There is not a vector equivalent of
2530 // s_mul_u64. Hence, we have to break down s_mul_u64 into 32-bit vector
2531 // multiplications.
2532 if (!Subtarget.hasVMulU64Inst() && Opc == AMDGPU::G_MUL &&
2533 DstTy.getSizeInBits() == 64) {
2534 applyMappingSMULU64(B, OpdMapper);
2535 return;
2536 }
2537
2538 // 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
2539 // Packed 16-bit operations need to be scalarized and promoted.
2540 if (DstTy != LLT::scalar(16) && DstTy != LLT::fixed_vector(2, 16))
2541 break;
2542
2543 const RegisterBank *DstBank =
2544 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2545 if (DstBank == &AMDGPU::VGPRRegBank)
2546 break;
2547
2548 const LLT S32 = LLT::scalar(32);
2549 MachineBasicBlock *MBB = MI.getParent();
2550 MachineFunction *MF = MBB->getParent();
2551 ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
2552
2553 if (DstTy.isVector() && Opc == AMDGPU::G_ABS) {
2554 Register WideSrcLo, WideSrcHi;
2555
2556 std::tie(WideSrcLo, WideSrcHi) =
2557 unpackV2S16ToS32(B, MI.getOperand(1).getReg(), TargetOpcode::G_SEXT);
2558 auto Lo = B.buildInstr(AMDGPU::G_ABS, {S32}, {WideSrcLo});
2559 auto Hi = B.buildInstr(AMDGPU::G_ABS, {S32}, {WideSrcHi});
2560 B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
2561 MI.eraseFromParent();
2562 return;
2563 }
2564
2565 if (DstTy.isVector()) {
2566 Register WideSrc0Lo, WideSrc0Hi;
2567 Register WideSrc1Lo, WideSrc1Hi;
2568
2569 unsigned ExtendOp = getExtendOp(MI.getOpcode());
2570 std::tie(WideSrc0Lo, WideSrc0Hi)
2571 = unpackV2S16ToS32(B, MI.getOperand(1).getReg(), ExtendOp);
2572 std::tie(WideSrc1Lo, WideSrc1Hi)
2573 = unpackV2S16ToS32(B, MI.getOperand(2).getReg(), ExtendOp);
2574 auto Lo = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Lo, WideSrc1Lo});
2575 auto Hi = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Hi, WideSrc1Hi});
2576 B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
2577 MI.eraseFromParent();
2578 } else {
2579 LegalizerHelper Helper(*MF, ApplySALU, B);
2580
2581 if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2582 llvm_unreachable("widen scalar should have succeeded");
2583
2584 // FIXME: s16 shift amounts should be legal.
2585 if (Opc == AMDGPU::G_SHL || Opc == AMDGPU::G_LSHR ||
2586 Opc == AMDGPU::G_ASHR) {
2587 B.setInsertPt(*MBB, MI.getIterator());
2588 if (Helper.widenScalar(MI, 1, S32) != LegalizerHelper::Legalized)
2589 llvm_unreachable("widen scalar should have succeeded");
2590 }
2591 }
2592
2593 return;
2594 }
2595 case AMDGPU::G_AMDGPU_S_MUL_I64_I32:
2596 case AMDGPU::G_AMDGPU_S_MUL_U64_U32: {
2597 // This is a special case for s_mul_u64. We use
2598 // G_AMDGPU_S_MUL_I64_I32 opcode to represent an s_mul_u64 operation
2599 // where the 33 higher bits are sign-extended and
2600 // G_AMDGPU_S_MUL_U64_U32 opcode to represent an s_mul_u64 operation
2601 // where the 32 higher bits are zero-extended. In case scalar registers are
2602 // selected, both opcodes are lowered as s_mul_u64. If the vector registers
2603 // are selected, then G_AMDGPU_S_MUL_I64_I32 and
2604 // G_AMDGPU_S_MUL_U64_U32 are lowered with a vector mad instruction.
2605
2606 // Insert basic copies.
2607 applyDefaultMapping(OpdMapper);
2608
2609 Register DstReg = MI.getOperand(0).getReg();
2610 Register SrcReg0 = MI.getOperand(1).getReg();
2611 Register SrcReg1 = MI.getOperand(2).getReg();
2612 const LLT S32 = LLT::scalar(32);
2613 const LLT S64 = LLT::scalar(64);
2614 assert(MRI.getType(DstReg) == S64 && "This is a special case for s_mul_u64 "
2615 "that handles only 64-bit operands.");
2616 const RegisterBank *DstBank =
2617 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2618
2619 // Replace G_AMDGPU_S_MUL_I64_I32 and G_AMDGPU_S_MUL_U64_U32
2620 // with s_mul_u64 operation.
2621 if (DstBank == &AMDGPU::SGPRRegBank) {
2622 MI.setDesc(TII->get(AMDGPU::S_MUL_U64));
2623 MRI.setRegClass(DstReg, &AMDGPU::SGPR_64RegClass);
2624 MRI.setRegClass(SrcReg0, &AMDGPU::SGPR_64RegClass);
2625 MRI.setRegClass(SrcReg1, &AMDGPU::SGPR_64RegClass);
2626 return;
2627 }
2628
2629 // Replace G_AMDGPU_S_MUL_I64_I32 and G_AMDGPU_S_MUL_U64_U32
2630 // with a vector mad.
2631 assert(MRI.getRegBankOrNull(DstReg) == &AMDGPU::VGPRRegBank &&
2632 "The destination operand should be in vector registers.");
2633
2634 // Extract the lower subregister from the first operand.
2635 Register Op0L = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
2636 MRI.setRegClass(Op0L, &AMDGPU::VGPR_32RegClass);
2637 MRI.setType(Op0L, S32);
2638 B.buildTrunc(Op0L, SrcReg0);
2639
2640 // Extract the lower subregister from the second operand.
2641 Register Op1L = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
2642 MRI.setRegClass(Op1L, &AMDGPU::VGPR_32RegClass);
2643 MRI.setType(Op1L, S32);
2644 B.buildTrunc(Op1L, SrcReg1);
2645
2646 unsigned NewOpc = Opc == AMDGPU::G_AMDGPU_S_MUL_U64_U32
2647 ? AMDGPU::G_AMDGPU_MAD_U64_U32
2648 : AMDGPU::G_AMDGPU_MAD_I64_I32;
2649
2651 Register Zero64 = B.buildConstant(S64, 0).getReg(0);
2652 MRI.setRegClass(Zero64, &AMDGPU::VReg_64RegClass);
2653 Register CarryOut = MRI.createVirtualRegister(&AMDGPU::VReg_64RegClass);
2654 MRI.setRegClass(CarryOut, &AMDGPU::VReg_64RegClass);
2655 B.buildInstr(NewOpc, {DstReg, CarryOut}, {Op0L, Op1L, Zero64});
2656 MI.eraseFromParent();
2657 return;
2658 }
2659 case AMDGPU::G_SEXT_INREG: {
2660 SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2661 if (SrcRegs.empty())
2662 break; // Nothing to repair
2663
2664 const LLT S32 = LLT::scalar(32);
2665 ApplyRegBankMapping O(B, *this, MRI, &AMDGPU::VGPRRegBank);
2666
2667 // Don't use LegalizerHelper's narrowScalar. It produces unwanted G_SEXTs
2668 // we would need to further expand, and doesn't let us directly set the
2669 // result registers.
2670 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2671
2672 int Amt = MI.getOperand(2).getImm();
2673 if (Amt <= 32) {
2674 // Downstream users have expectations for the high bit behavior, so freeze
2675 // incoming undefined bits.
2676 if (Amt == 32) {
2677 // The low bits are unchanged.
2678 B.buildFreeze(DstRegs[0], SrcRegs[0]);
2679 } else {
2680 auto Freeze = B.buildFreeze(S32, SrcRegs[0]);
2681 // Extend in the low bits and propagate the sign bit to the high half.
2682 B.buildSExtInReg(DstRegs[0], Freeze, Amt);
2683 }
2684
2685 B.buildAShr(DstRegs[1], DstRegs[0], B.buildConstant(S32, 31));
2686 } else {
2687 // The low bits are unchanged, and extend in the high bits.
2688 // No freeze required
2689 B.buildCopy(DstRegs[0], SrcRegs[0]);
2690 B.buildSExtInReg(DstRegs[1], DstRegs[0], Amt - 32);
2691 }
2692
2693 Register DstReg = MI.getOperand(0).getReg();
2694 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2695 MI.eraseFromParent();
2696 return;
2697 }
2698 case AMDGPU::G_CTPOP:
2699 case AMDGPU::G_BITREVERSE: {
2700 const RegisterBank *DstBank =
2701 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2702 if (DstBank == &AMDGPU::SGPRRegBank)
2703 break;
2704
2705 Register SrcReg = MI.getOperand(1).getReg();
2706 const LLT S32 = LLT::scalar(32);
2707 LLT Ty = MRI.getType(SrcReg);
2708 if (Ty == S32)
2709 break;
2710
2711 ApplyRegBankMapping ApplyVALU(B, *this, MRI, &AMDGPU::VGPRRegBank);
2712
2713 MachineFunction &MF = B.getMF();
2714 LegalizerHelper Helper(MF, ApplyVALU, B);
2715
2716 if (Helper.narrowScalar(MI, 1, S32) != LegalizerHelper::Legalized)
2717 llvm_unreachable("narrowScalar should have succeeded");
2718 return;
2719 }
2720 case AMDGPU::G_AMDGPU_FFBH_U32:
2721 case AMDGPU::G_AMDGPU_FFBL_B32:
2722 case AMDGPU::G_CTLZ_ZERO_POISON:
2723 case AMDGPU::G_CTTZ_ZERO_POISON: {
2724 const RegisterBank *DstBank =
2725 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2726 if (DstBank == &AMDGPU::SGPRRegBank)
2727 break;
2728
2729 Register SrcReg = MI.getOperand(1).getReg();
2730 const LLT S32 = LLT::scalar(32);
2731 LLT Ty = MRI.getType(SrcReg);
2732 if (Ty == S32)
2733 break;
2734
2735 // We can narrow this more efficiently than Helper can by using ffbh/ffbl
2736 // which return -1 when the input is zero:
2737 // (ctlz_zero_poison hi:lo) -> (umin (ffbh hi), (add (ffbh lo), 32))
2738 // (cttz_zero_poison hi:lo) -> (umin (add (ffbl hi), 32), (ffbl lo))
2739 // (ffbh hi:lo) -> (umin (ffbh hi), (uaddsat (ffbh lo), 32))
2740 // (ffbl hi:lo) -> (umin (uaddsat (ffbl hi), 32), (ffbl lo))
2741 ApplyRegBankMapping ApplyVALU(B, *this, MRI, &AMDGPU::VGPRRegBank);
2742 SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2743 unsigned NewOpc = Opc == AMDGPU::G_CTLZ_ZERO_POISON
2744 ? (unsigned)AMDGPU::G_AMDGPU_FFBH_U32
2745 : Opc == AMDGPU::G_CTTZ_ZERO_POISON
2746 ? (unsigned)AMDGPU::G_AMDGPU_FFBL_B32
2747 : Opc;
2748 unsigned Idx = NewOpc == AMDGPU::G_AMDGPU_FFBH_U32;
2749 auto X = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx]});
2750 auto Y = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx ^ 1]});
2751 unsigned AddOpc =
2752 Opc == AMDGPU::G_CTLZ_ZERO_POISON || Opc == AMDGPU::G_CTTZ_ZERO_POISON
2753 ? AMDGPU::G_ADD
2754 : AMDGPU::G_UADDSAT;
2755 Y = B.buildInstr(AddOpc, {S32}, {Y, B.buildConstant(S32, 32)});
2756 Register DstReg = MI.getOperand(0).getReg();
2757 B.buildUMin(DstReg, X, Y);
2758 MI.eraseFromParent();
2759 return;
2760 }
2761 case AMDGPU::G_SEXT:
2762 case AMDGPU::G_ZEXT:
2763 case AMDGPU::G_ANYEXT: {
2764 Register SrcReg = MI.getOperand(1).getReg();
2765 LLT SrcTy = MRI.getType(SrcReg);
2766 const bool Signed = Opc == AMDGPU::G_SEXT;
2767
2768 assert(OpdMapper.getVRegs(1).empty());
2769
2770 const RegisterBank *SrcBank =
2771 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2772
2773 Register DstReg = MI.getOperand(0).getReg();
2774 LLT DstTy = MRI.getType(DstReg);
2775 if (DstTy.isScalar() &&
2776 SrcBank != &AMDGPU::SGPRRegBank &&
2777 SrcBank != &AMDGPU::VCCRegBank &&
2778 // FIXME: Should handle any type that rounds to s64 when irregular
2779 // breakdowns are supported.
2780 DstTy.getSizeInBits() == 64 &&
2781 SrcTy.getSizeInBits() <= 32) {
2782 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2783
2784 // Extend to 32-bit, and then extend the low half.
2785 if (Signed) {
2786 // TODO: Should really be buildSExtOrCopy
2787 B.buildSExtOrTrunc(DefRegs[0], SrcReg);
2788 } else if (Opc == AMDGPU::G_ZEXT) {
2789 B.buildZExtOrTrunc(DefRegs[0], SrcReg);
2790 } else {
2791 B.buildAnyExtOrTrunc(DefRegs[0], SrcReg);
2792 }
2793
2794 extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank);
2795 MRI.setRegBank(DstReg, *SrcBank);
2796 MI.eraseFromParent();
2797 return;
2798 }
2799
2800 if (SrcTy != LLT::scalar(1))
2801 return;
2802
2803 // It is not legal to have a legalization artifact with a VCC source. Rather
2804 // than introducing a copy, insert the select we would have to select the
2805 // copy to.
2806 if (SrcBank == &AMDGPU::VCCRegBank) {
2807 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2808
2809 const RegisterBank *DstBank = &AMDGPU::VGPRRegBank;
2810
2811 unsigned DstSize = DstTy.getSizeInBits();
2812 // 64-bit select is SGPR only
2813 const bool UseSel64 = DstSize > 32 &&
2814 SrcBank->getID() == AMDGPU::SGPRRegBankID;
2815
2816 // TODO: Should s16 select be legal?
2817 LLT SelType = UseSel64 ? LLT::scalar(64) : LLT::scalar(32);
2818 auto True = B.buildConstant(SelType, Signed ? -1 : 1);
2819 auto False = B.buildConstant(SelType, 0);
2820
2821 MRI.setRegBank(True.getReg(0), *DstBank);
2822 MRI.setRegBank(False.getReg(0), *DstBank);
2823 MRI.setRegBank(DstReg, *DstBank);
2824
2825 if (DstSize > 32) {
2826 B.buildSelect(DefRegs[0], SrcReg, True, False);
2827 extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank, true);
2828 } else if (DstSize < 32) {
2829 auto Sel = B.buildSelect(SelType, SrcReg, True, False);
2830 MRI.setRegBank(Sel.getReg(0), *DstBank);
2831 B.buildTrunc(DstReg, Sel);
2832 } else {
2833 B.buildSelect(DstReg, SrcReg, True, False);
2834 }
2835
2836 MI.eraseFromParent();
2837 return;
2838 }
2839
2840 break;
2841 }
2842 case AMDGPU::G_EXTRACT_VECTOR_ELT: {
2843 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2844
2845 assert(OpdMapper.getVRegs(1).empty() && OpdMapper.getVRegs(2).empty());
2846
2847 Register DstReg = MI.getOperand(0).getReg();
2848 Register SrcReg = MI.getOperand(1).getReg();
2849
2850 const LLT S32 = LLT::scalar(32);
2851 LLT DstTy = MRI.getType(DstReg);
2852 LLT SrcTy = MRI.getType(SrcReg);
2853
2854 if (foldExtractEltToCmpSelect(B, MI, OpdMapper))
2855 return;
2856
2857 const ValueMapping &DstMapping
2858 = OpdMapper.getInstrMapping().getOperandMapping(0);
2859 const RegisterBank *DstBank = DstMapping.BreakDown[0].RegBank;
2860 const RegisterBank *SrcBank =
2861 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2862 const RegisterBank *IdxBank =
2863 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2864
2865 Register BaseIdxReg;
2866 unsigned ConstOffset;
2867 std::tie(BaseIdxReg, ConstOffset) =
2868 AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(2).getReg());
2869
2870 // See if the index is an add of a constant which will be foldable by moving
2871 // the base register of the index later if this is going to be executed in a
2872 // waterfall loop. This is essentially to reassociate the add of a constant
2873 // with the readfirstlane.
2874 bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
2875 ConstOffset > 0 &&
2876 ConstOffset < SrcTy.getNumElements();
2877
2878 // Move the base register. We'll re-insert the add later.
2879 if (ShouldMoveIndexIntoLoop)
2880 MI.getOperand(2).setReg(BaseIdxReg);
2881
2882 // If this is a VGPR result only because the index was a VGPR result, the
2883 // actual indexing will be done on the SGPR source vector, which will
2884 // produce a scalar result. We need to copy to the VGPR result inside the
2885 // waterfall loop.
2886 const bool NeedCopyToVGPR = DstBank == &AMDGPU::VGPRRegBank &&
2887 SrcBank == &AMDGPU::SGPRRegBank;
2888 if (DstRegs.empty()) {
2889 applyDefaultMapping(OpdMapper);
2890
2892
2893 if (NeedCopyToVGPR) {
2894 // We don't want a phi for this temporary reg.
2895 Register TmpReg = MRI.createGenericVirtualRegister(DstTy);
2896 MRI.setRegBank(TmpReg, AMDGPU::SGPRRegBank);
2897 MI.getOperand(0).setReg(TmpReg);
2898 B.setInsertPt(*MI.getParent(), ++MI.getIterator());
2899
2900 // Use a v_mov_b32 here to make the exec dependency explicit.
2901 buildVCopy(B, DstReg, TmpReg);
2902 }
2903
2904 // Re-insert the constant offset add inside the waterfall loop.
2905 if (ShouldMoveIndexIntoLoop)
2906 reinsertVectorIndexAdd(B, MI, 2, ConstOffset);
2907
2908 return;
2909 }
2910
2911 assert(DstTy.getSizeInBits() == 64);
2912
2913 LLT Vec32 = LLT::fixed_vector(2 * SrcTy.getNumElements(), 32);
2914
2915 auto CastSrc = B.buildBitcast(Vec32, SrcReg);
2916 auto One = B.buildConstant(S32, 1);
2917
2918 MachineBasicBlock::iterator MII = MI.getIterator();
2919
2920 // Split the vector index into 32-bit pieces. Prepare to move all of the
2921 // new instructions into a waterfall loop if necessary.
2922 //
2923 // Don't put the bitcast or constant in the loop.
2924 MachineInstrSpan Span(MII, &B.getMBB());
2925
2926 // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
2927 auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
2928 auto IdxHi = B.buildAdd(S32, IdxLo, One);
2929
2930 auto Extract0 = B.buildExtractVectorElement(DstRegs[0], CastSrc, IdxLo);
2931 auto Extract1 = B.buildExtractVectorElement(DstRegs[1], CastSrc, IdxHi);
2932
2933 MRI.setRegBank(DstReg, *DstBank);
2934 MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
2935 MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
2936 MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
2937 MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
2938
2939 SmallSet<Register, 4> OpsToWaterfall;
2940 if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 2 })) {
2941 MI.eraseFromParent();
2942 return;
2943 }
2944
2945 // Remove the original instruction to avoid potentially confusing the
2946 // waterfall loop logic.
2947 B.setInstr(*Span.begin());
2948 MI.eraseFromParent();
2949 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
2950 OpsToWaterfall);
2951
2952 if (NeedCopyToVGPR) {
2953 MachineBasicBlock *LoopBB = Extract1->getParent();
2956 MRI.setRegBank(TmpReg0, AMDGPU::SGPRRegBank);
2957 MRI.setRegBank(TmpReg1, AMDGPU::SGPRRegBank);
2958
2959 Extract0->getOperand(0).setReg(TmpReg0);
2960 Extract1->getOperand(0).setReg(TmpReg1);
2961
2962 B.setInsertPt(*LoopBB, ++Extract1->getIterator());
2963
2964 buildVCopy(B, DstRegs[0], TmpReg0);
2965 buildVCopy(B, DstRegs[1], TmpReg1);
2966 }
2967
2968 if (ShouldMoveIndexIntoLoop)
2969 reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
2970
2971 return;
2972 }
2973 case AMDGPU::G_INSERT_VECTOR_ELT: {
2974 SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
2975
2976 Register DstReg = MI.getOperand(0).getReg();
2977 LLT VecTy = MRI.getType(DstReg);
2978
2979 assert(OpdMapper.getVRegs(0).empty());
2980 assert(OpdMapper.getVRegs(3).empty());
2981
2982 if (substituteSimpleCopyRegs(OpdMapper, 1))
2983 MRI.setType(MI.getOperand(1).getReg(), VecTy);
2984
2985 if (foldInsertEltToCmpSelect(B, MI, OpdMapper))
2986 return;
2987
2988 const RegisterBank *IdxBank =
2989 OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;
2990
2991 Register SrcReg = MI.getOperand(1).getReg();
2992 Register InsReg = MI.getOperand(2).getReg();
2993 LLT InsTy = MRI.getType(InsReg);
2994 (void)InsTy;
2995
2996 Register BaseIdxReg;
2997 unsigned ConstOffset;
2998 std::tie(BaseIdxReg, ConstOffset) =
2999 AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(3).getReg());
3000
3001 // See if the index is an add of a constant which will be foldable by moving
3002 // the base register of the index later if this is going to be executed in a
3003 // waterfall loop. This is essentially to reassociate the add of a constant
3004 // with the readfirstlane.
3005 bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
3006 ConstOffset > 0 &&
3007 ConstOffset < VecTy.getNumElements();
3008
3009 // Move the base register. We'll re-insert the add later.
3010 if (ShouldMoveIndexIntoLoop)
3011 MI.getOperand(3).setReg(BaseIdxReg);
3012
3013
3014 if (InsRegs.empty()) {
3016
3017 // Re-insert the constant offset add inside the waterfall loop.
3018 if (ShouldMoveIndexIntoLoop) {
3019 reinsertVectorIndexAdd(B, MI, 3, ConstOffset);
3020 }
3021
3022 return;
3023 }
3024
3025 assert(InsTy.getSizeInBits() == 64);
3026
3027 const LLT S32 = LLT::scalar(32);
3028 LLT Vec32 = LLT::fixed_vector(2 * VecTy.getNumElements(), 32);
3029
3030 auto CastSrc = B.buildBitcast(Vec32, SrcReg);
3031 auto One = B.buildConstant(S32, 1);
3032
3033 // Split the vector index into 32-bit pieces. Prepare to move all of the
3034 // new instructions into a waterfall loop if necessary.
3035 //
3036 // Don't put the bitcast or constant in the loop.
3038
3039 // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
3040 auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
3041 auto IdxHi = B.buildAdd(S32, IdxLo, One);
3042
3043 auto InsLo = B.buildInsertVectorElement(Vec32, CastSrc, InsRegs[0], IdxLo);
3044 auto InsHi = B.buildInsertVectorElement(Vec32, InsLo, InsRegs[1], IdxHi);
3045
3046 const RegisterBank *DstBank =
3047 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
3048 const RegisterBank *SrcBank =
3049 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
3050 const RegisterBank *InsSrcBank =
3051 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
3052
3053 MRI.setRegBank(InsReg, *InsSrcBank);
3054 MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
3055 MRI.setRegBank(InsLo.getReg(0), *DstBank);
3056 MRI.setRegBank(InsHi.getReg(0), *DstBank);
3057 MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
3058 MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
3059 MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
3060
3061
3062 SmallSet<Register, 4> OpsToWaterfall;
3063 if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 3 })) {
3064 B.setInsertPt(B.getMBB(), MI);
3065 B.buildBitcast(DstReg, InsHi);
3066 MI.eraseFromParent();
3067 return;
3068 }
3069
3070 B.setInstr(*Span.begin());
3071 MI.eraseFromParent();
3072
3073 // Figure out the point after the waterfall loop before mangling the control
3074 // flow.
3075 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
3076 OpsToWaterfall);
3077
3078 // The insertion point is now right after the original instruction.
3079 //
3080 // Keep the bitcast to the original vector type out of the loop. Doing this
3081 // saves an extra phi we don't need inside the loop.
3082 B.buildBitcast(DstReg, InsHi);
3083
3084 // Re-insert the constant offset add inside the waterfall loop.
3085 if (ShouldMoveIndexIntoLoop)
3086 reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
3087
3088 return;
3089 }
3090 case AMDGPU::G_AMDGPU_BUFFER_LOAD:
3091 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
3092 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
3093 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
3094 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
3095 case AMDGPU::G_AMDGPU_BUFFER_LOAD_TFE:
3096 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT_TFE:
3097 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT_TFE:
3098 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE_TFE:
3099 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE_TFE:
3100 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
3101 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
3102 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
3103 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
3104 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
3105 case AMDGPU::G_AMDGPU_BUFFER_STORE:
3106 case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
3107 case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
3108 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
3109 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16:
3110 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
3111 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16: {
3112 applyDefaultMapping(OpdMapper);
3113 executeInWaterfallLoop(B, MI, {1, 4});
3114 return;
3115 }
3116 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
3117 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
3118 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
3119 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
3120 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
3121 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
3122 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
3123 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
3124 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
3125 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
3126 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
3127 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
3128 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB_CLAMP_U32:
3129 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_COND_SUB_U32:
3130 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
3131 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
3132 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
3133 applyDefaultMapping(OpdMapper);
3134 executeInWaterfallLoop(B, MI, {2, 5});
3135 return;
3136 }
3137 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
3138 applyDefaultMapping(OpdMapper);
3139 executeInWaterfallLoop(B, MI, {3, 6});
3140 return;
3141 }
3142 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
3143 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
3144 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
3145 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
3146 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT: {
3147 applyMappingSBufferLoad(B, OpdMapper);
3148 return;
3149 }
3150 case AMDGPU::G_AMDGPU_S_BUFFER_PREFETCH:
3153 return;
3154 case AMDGPU::G_INTRINSIC:
3155 case AMDGPU::G_INTRINSIC_CONVERGENT: {
3156 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
3157 case Intrinsic::amdgcn_readlane: {
3158 substituteSimpleCopyRegs(OpdMapper, 2);
3159
3160 assert(OpdMapper.getVRegs(0).empty());
3161 assert(OpdMapper.getVRegs(3).empty());
3162
3163 // Make sure the index is an SGPR. It doesn't make sense to run this in a
3164 // waterfall loop, so assume it's a uniform value.
3165 constrainOpWithReadfirstlane(B, MI, 3); // Index
3166 return;
3167 }
3168 case Intrinsic::amdgcn_writelane: {
3169 assert(OpdMapper.getVRegs(0).empty());
3170 assert(OpdMapper.getVRegs(2).empty());
3171 assert(OpdMapper.getVRegs(3).empty());
3172
3173 substituteSimpleCopyRegs(OpdMapper, 4); // VGPR input val
3174 constrainOpWithReadfirstlane(B, MI, 2); // Source value
3175 constrainOpWithReadfirstlane(B, MI, 3); // Index
3176 return;
3177 }
3178 case Intrinsic::amdgcn_interp_p1:
3179 case Intrinsic::amdgcn_interp_p2:
3180 case Intrinsic::amdgcn_interp_mov:
3181 case Intrinsic::amdgcn_interp_p1_f16:
3182 case Intrinsic::amdgcn_interp_p2_f16:
3183 case Intrinsic::amdgcn_lds_param_load: {
3184 applyDefaultMapping(OpdMapper);
3185
3186 // Readfirstlane for the m0 value, which is always the last operand.
3187 // FIXME: Should this be a waterfall loop instead?
3188 constrainOpWithReadfirstlane(B, MI, MI.getNumOperands() - 1); // Index
3189 return;
3190 }
3191 case Intrinsic::amdgcn_interp_inreg_p10:
3192 case Intrinsic::amdgcn_interp_inreg_p2:
3193 case Intrinsic::amdgcn_interp_inreg_p10_f16:
3194 case Intrinsic::amdgcn_interp_inreg_p2_f16:
3195 case Intrinsic::amdgcn_interp_p10_rtz_f16:
3196 case Intrinsic::amdgcn_interp_p2_rtz_f16:
3197 case Intrinsic::amdgcn_permlane16_swap:
3198 case Intrinsic::amdgcn_permlane32_swap:
3199 applyDefaultMapping(OpdMapper);
3200 return;
3201 case Intrinsic::amdgcn_permlane16:
3202 case Intrinsic::amdgcn_permlanex16: {
3203 // Doing a waterfall loop over these wouldn't make any sense.
3204 substituteSimpleCopyRegs(OpdMapper, 2);
3205 substituteSimpleCopyRegs(OpdMapper, 3);
3208 return;
3209 }
3210 case Intrinsic::amdgcn_permlane_bcast:
3211 case Intrinsic::amdgcn_permlane_up:
3212 case Intrinsic::amdgcn_permlane_down:
3213 case Intrinsic::amdgcn_permlane_xor:
3214 // Doing a waterfall loop over these wouldn't make any sense.
3217 return;
3218 case Intrinsic::amdgcn_permlane_idx_gen: {
3220 return;
3221 }
3222 case Intrinsic::amdgcn_sbfe:
3223 applyMappingBFE(B, OpdMapper, true);
3224 return;
3225 case Intrinsic::amdgcn_ubfe:
3226 applyMappingBFE(B, OpdMapper, false);
3227 return;
3228 case Intrinsic::amdgcn_inverse_ballot:
3229 case Intrinsic::amdgcn_s_bitreplicate:
3230 case Intrinsic::amdgcn_s_quadmask:
3231 case Intrinsic::amdgcn_s_wqm:
3232 applyDefaultMapping(OpdMapper);
3233 constrainOpWithReadfirstlane(B, MI, 2); // Mask
3234 return;
3235 case Intrinsic::amdgcn_ballot:
3236 // Use default handling and insert copy to vcc source.
3237 break;
3238 }
3239 break;
3240 }
3241 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
3242 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
3243 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_NORET:
3244 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
3245 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
3246 const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3248 assert(RSrcIntrin && RSrcIntrin->IsImage);
3249 // Non-images can have complications from operands that allow both SGPR
3250 // and VGPR. For now it's too complicated to figure out the final opcode
3251 // to derive the register bank from the MCInstrDesc.
3252 applyMappingImage(B, MI, OpdMapper, RSrcIntrin->RsrcArg);
3253 return;
3254 }
3255 case AMDGPU::G_AMDGPU_BVH_INTERSECT_RAY:
3256 case AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY:
3257 case AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY: {
3258 bool IsDualOrBVH8 =
3259 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY ||
3260 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY;
3261 unsigned NumMods = IsDualOrBVH8 ? 0 : 1; // Has A16 modifier
3262 unsigned LastRegOpIdx = MI.getNumExplicitOperands() - 1 - NumMods;
3263 applyDefaultMapping(OpdMapper);
3264 executeInWaterfallLoop(B, MI, {LastRegOpIdx});
3265 return;
3266 }
3267 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
3268 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS: {
3269 auto IntrID = cast<GIntrinsic>(MI).getIntrinsicID();
3270 switch (IntrID) {
3271 case Intrinsic::amdgcn_ds_ordered_add:
3272 case Intrinsic::amdgcn_ds_ordered_swap: {
3273 // This is only allowed to execute with 1 lane, so readfirstlane is safe.
3274 assert(OpdMapper.getVRegs(0).empty());
3275 substituteSimpleCopyRegs(OpdMapper, 3);
3277 return;
3278 }
3279 case Intrinsic::amdgcn_ds_gws_init:
3280 case Intrinsic::amdgcn_ds_gws_barrier:
3281 case Intrinsic::amdgcn_ds_gws_sema_br: {
3282 // Only the first lane executes, so readfirstlane is safe.
3283 substituteSimpleCopyRegs(OpdMapper, 1);
3285 return;
3286 }
3287 case Intrinsic::amdgcn_ds_gws_sema_v:
3288 case Intrinsic::amdgcn_ds_gws_sema_p:
3289 case Intrinsic::amdgcn_ds_gws_sema_release_all: {
3290 // Only the first lane executes, so readfirstlane is safe.
3292 return;
3293 }
3294 case Intrinsic::amdgcn_ds_append:
3295 case Intrinsic::amdgcn_ds_consume: {
3297 return;
3298 }
3299 case Intrinsic::amdgcn_s_alloc_vgpr:
3301 return;
3302 case Intrinsic::amdgcn_s_sendmsg:
3303 case Intrinsic::amdgcn_s_sendmsghalt: {
3304 // FIXME: Should this use a waterfall loop?
3306 return;
3307 }
3308 case Intrinsic::amdgcn_s_setreg: {
3310 return;
3311 }
3312 case Intrinsic::amdgcn_s_ttracedata:
3314 return;
3315 case Intrinsic::amdgcn_raw_buffer_load_lds:
3316 case Intrinsic::amdgcn_raw_buffer_load_async_lds:
3317 case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
3318 case Intrinsic::amdgcn_raw_ptr_buffer_load_async_lds: {
3319 applyDefaultMapping(OpdMapper);
3320 constrainOpWithReadfirstlane(B, MI, 1); // rsrc
3322 constrainOpWithReadfirstlane(B, MI, 5); // soffset
3323 return;
3324 }
3325 case Intrinsic::amdgcn_struct_buffer_load_lds:
3326 case Intrinsic::amdgcn_struct_buffer_load_async_lds:
3327 case Intrinsic::amdgcn_struct_ptr_buffer_load_lds:
3328 case Intrinsic::amdgcn_struct_ptr_buffer_load_async_lds: {
3329 applyDefaultMapping(OpdMapper);
3330 constrainOpWithReadfirstlane(B, MI, 1); // rsrc
3332 constrainOpWithReadfirstlane(B, MI, 6); // soffset
3333 return;
3334 }
3335 case Intrinsic::amdgcn_cluster_load_async_to_lds_b8:
3336 case Intrinsic::amdgcn_cluster_load_async_to_lds_b32:
3337 case Intrinsic::amdgcn_cluster_load_async_to_lds_b64:
3338 case Intrinsic::amdgcn_cluster_load_async_to_lds_b128: {
3339 applyDefaultMapping(OpdMapper);
3341 return;
3342 }
3343 case Intrinsic::amdgcn_load_to_lds:
3344 case Intrinsic::amdgcn_load_async_to_lds:
3345 case Intrinsic::amdgcn_global_load_lds:
3346 case Intrinsic::amdgcn_global_load_async_lds: {
3347 applyDefaultMapping(OpdMapper);
3349 return;
3350 }
3351 case Intrinsic::amdgcn_lds_direct_load: {
3352 applyDefaultMapping(OpdMapper);
3353 // Readfirstlane for the m0 value, which is always the last operand.
3354 constrainOpWithReadfirstlane(B, MI, MI.getNumOperands() - 1); // Index
3355 return;
3356 }
3357 case Intrinsic::amdgcn_exp_row:
3358 applyDefaultMapping(OpdMapper);
3360 return;
3361 case Intrinsic::amdgcn_cluster_load_b32:
3362 case Intrinsic::amdgcn_cluster_load_b64:
3363 case Intrinsic::amdgcn_cluster_load_b128: {
3364 applyDefaultMapping(OpdMapper);
3366 return;
3367 }
3368 case Intrinsic::amdgcn_s_sleep_var:
3369 assert(OpdMapper.getVRegs(1).empty());
3371 return;
3372 case Intrinsic::amdgcn_s_barrier_join:
3373 case Intrinsic::amdgcn_s_wakeup_barrier:
3375 return;
3376 case Intrinsic::amdgcn_s_barrier_init:
3377 case Intrinsic::amdgcn_s_barrier_signal_var:
3380 return;
3381 case Intrinsic::amdgcn_s_get_barrier_state:
3382 case Intrinsic::amdgcn_s_get_named_barrier_state: {
3384 return;
3385 }
3386 case Intrinsic::amdgcn_s_prefetch_data:
3387 case Intrinsic::amdgcn_s_prefetch_inst: {
3388 Register PtrReg = MI.getOperand(1).getReg();
3389 unsigned AS = MRI.getType(PtrReg).getAddressSpace();
3393 } else
3394 MI.eraseFromParent();
3395 return;
3396 }
3397 case Intrinsic::amdgcn_tensor_load_to_lds:
3398 case Intrinsic::amdgcn_tensor_store_from_lds: {
3404 return;
3405 }
3406 default: {
3407 if (const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3409 // Non-images can have complications from operands that allow both SGPR
3410 // and VGPR. For now it's too complicated to figure out the final opcode
3411 // to derive the register bank from the MCInstrDesc.
3412 if (RSrcIntrin->IsImage) {
3413 applyMappingImage(B, MI, OpdMapper, RSrcIntrin->RsrcArg);
3414 return;
3415 }
3416 }
3417
3418 break;
3419 }
3420 }
3421 break;
3422 }
3423 case AMDGPU::G_SI_CALL: {
3424 // Use a set to avoid extra readfirstlanes in the case where multiple
3425 // operands are the same register.
3426 SmallSet<Register, 4> SGPROperandRegs;
3427
3428 if (!collectWaterfallOperands(SGPROperandRegs, MI, MRI, {1}))
3429 break;
3430
3431 // Move all copies to physical SGPRs that are used by the call instruction
3432 // into the loop block. Search backwards for these copies until reaching
3433 // the ADJCALLSTACKUP.
3434 unsigned FrameSetupOpcode = AMDGPU::ADJCALLSTACKUP;
3435 unsigned FrameDestroyOpcode = AMDGPU::ADJCALLSTACKDOWN;
3436
3437 // Move all non-copies before the copies, so that a complete range can be
3438 // moved into the waterfall loop.
3439 SmallVector<MachineInstr *, 4> NonCopyInstrs;
3440 // Count of NonCopyInstrs found until the current LastCopy.
3441 unsigned NonCopyInstrsLen = 0;
3443 MachineBasicBlock::iterator LastCopy = Start;
3444 MachineBasicBlock *MBB = MI.getParent();
3445 const SIMachineFunctionInfo *Info =
3446 MBB->getParent()->getInfo<SIMachineFunctionInfo>();
3447 while (Start->getOpcode() != FrameSetupOpcode) {
3448 --Start;
3449 bool IsCopy = false;
3450 if (Start->getOpcode() == AMDGPU::COPY) {
3451 auto &Dst = Start->getOperand(0);
3452 if (Dst.isReg()) {
3453 Register Reg = Dst.getReg();
3454 if (Reg.isPhysical() && MI.readsRegister(Reg, TRI)) {
3455 IsCopy = true;
3456 } else {
3457 // Also move the copy from the scratch rsrc descriptor into the loop
3458 // to allow it to be optimized away.
3459 auto &Src = Start->getOperand(1);
3460 if (Src.isReg()) {
3461 Reg = Src.getReg();
3462 IsCopy = Info->getScratchRSrcReg() == Reg;
3463 }
3464 }
3465 }
3466 }
3467
3468 if (IsCopy) {
3469 LastCopy = Start;
3470 NonCopyInstrsLen = NonCopyInstrs.size();
3471 } else {
3472 NonCopyInstrs.push_back(&*Start);
3473 }
3474 }
3475 NonCopyInstrs.resize(NonCopyInstrsLen);
3476
3477 for (auto *NonCopy : reverse(NonCopyInstrs)) {
3478 MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3479 }
3480 Start = LastCopy;
3481
3482 // Do the same for copies after the call, searching forwards until the
3482 // ADJCALLSTACKDOWN.
3483 NonCopyInstrs.clear();
3484 NonCopyInstrsLen = 0;
3486 LastCopy = End;
3487 while (End->getOpcode() != FrameDestroyOpcode) {
3488 ++End;
3489 bool IsCopy = false;
3490 if (End->getOpcode() == AMDGPU::COPY) {
3491 auto &Src = End->getOperand(1);
3492 if (Src.isReg()) {
3493 Register Reg = Src.getReg();
3494 IsCopy = Reg.isPhysical() && MI.modifiesRegister(Reg, TRI);
3495 }
3496 }
3497
3498 if (IsCopy) {
3499 LastCopy = End;
3500 NonCopyInstrsLen = NonCopyInstrs.size();
3501 } else {
3502 NonCopyInstrs.push_back(&*End);
3503 }
3504 }
3505 NonCopyInstrs.resize(NonCopyInstrsLen);
3506
3507 End = LastCopy;
3508 ++LastCopy;
3509 for (auto *NonCopy : reverse(NonCopyInstrs)) {
3510 MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3511 }
3512
3513 ++End;
3514 B.setInsertPt(B.getMBB(), Start);
3515 executeInWaterfallLoop(B, make_range(Start, End), SGPROperandRegs);
3516 break;
3517 }
3518 case AMDGPU::G_AMDGPU_FLAT_LOAD_MONITOR:
3519 case AMDGPU::G_AMDGPU_GLOBAL_LOAD_MONITOR:
3520 case AMDGPU::G_LOAD:
3521 case AMDGPU::G_ZEXTLOAD:
3522 case AMDGPU::G_SEXTLOAD: {
3523 if (applyMappingLoad(B, OpdMapper, MI))
3524 return;
3525 break;
3526 }
3527 case AMDGPU::G_DYN_STACKALLOC:
3528 applyMappingDynStackAlloc(B, OpdMapper, MI);
3529 return;
3530 case AMDGPU::G_STACKRESTORE: {
3531 applyDefaultMapping(OpdMapper);
3533 return;
3534 }
3535 case AMDGPU::G_SBFX:
3536 applyMappingBFE(B, OpdMapper, /*Signed*/ true);
3537 return;
3538 case AMDGPU::G_UBFX:
3539 applyMappingBFE(B, OpdMapper, /*Signed*/ false);
3540 return;
3541 case AMDGPU::G_AMDGPU_MAD_U64_U32:
3542 case AMDGPU::G_AMDGPU_MAD_I64_I32:
3543 applyMappingMAD_64_32(B, OpdMapper);
3544 return;
3545 case AMDGPU::G_PREFETCH: {
3546 if (!Subtarget.hasSafeSmemPrefetch() && !Subtarget.hasVmemPrefInsts()) {
3547 MI.eraseFromParent();
3548 return;
3549 }
3550 Register PtrReg = MI.getOperand(0).getReg();
3551 unsigned PtrBank = getRegBankID(PtrReg, MRI, AMDGPU::SGPRRegBankID);
3552 if (PtrBank == AMDGPU::VGPRRegBankID &&
3553 (!Subtarget.hasVmemPrefInsts() || !MI.getOperand(3).getImm())) {
3554 // Cannot do I$ prefetch with divergent pointer.
3555 MI.eraseFromParent();
3556 return;
3557 }
3558 unsigned AS = MRI.getType(PtrReg).getAddressSpace();
3561 (!Subtarget.hasSafeSmemPrefetch() &&
3563 !MI.getOperand(3).getImm() /* I$ prefetch */))) {
3564 MI.eraseFromParent();
3565 return;
3566 }
3567 applyDefaultMapping(OpdMapper);
3568 return;
3569 }
3570 default:
3571 break;
3572 }
3573
3574 return applyDefaultMapping(OpdMapper);
3575}
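
// Illustration (not part of the LLVM sources): the 64-bit narrowing rules
// listed in the G_AMDGPU_FFBH_U32 / G_CTLZ_ZERO_POISON case above can be
// checked with a small host-side model. The sketch below assumes that the
// 32-bit ffbh/ffbl primitives return the leading/trailing zero count and
// 0xFFFFFFFF for a zero input, as the comment in that case states. All
// names (modelFfbh32, modelFfbl32, uaddsat32, ...) are hypothetical helpers
// invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 32-bit ffbh: leading zero count, or -1 for zero.
inline uint32_t modelFfbh32(uint32_t V) {
  if (V == 0)
    return UINT32_MAX;
  uint32_t N = 0;
  while ((V & 0x80000000u) == 0) {
    V <<= 1;
    ++N;
  }
  return N;
}

// Hypothetical model of a 32-bit ffbl: trailing zero count, or -1 for zero.
inline uint32_t modelFfbl32(uint32_t V) {
  if (V == 0)
    return UINT32_MAX;
  uint32_t N = 0;
  while ((V & 1u) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Unsigned saturating add, standing in for G_UADDSAT.
inline uint32_t uaddsat32(uint32_t A, uint32_t B) {
  uint32_t Sum = A + B;
  return Sum < A ? UINT32_MAX : Sum;
}

// (ffbh hi:lo) -> (umin (ffbh hi), (uaddsat (ffbh lo), 32))
inline uint32_t modelFfbh64(uint64_t V) {
  uint32_t Hi = static_cast<uint32_t>(V >> 32);
  uint32_t Lo = static_cast<uint32_t>(V);
  return std::min<uint32_t>(modelFfbh32(Hi), uaddsat32(modelFfbh32(Lo), 32));
}

// (ffbl hi:lo) -> (umin (uaddsat (ffbl hi), 32), (ffbl lo))
inline uint32_t modelFfbl64(uint64_t V) {
  uint32_t Hi = static_cast<uint32_t>(V >> 32);
  uint32_t Lo = static_cast<uint32_t>(V);
  return std::min<uint32_t>(uaddsat32(modelFfbl32(Hi), 32), modelFfbl32(Lo));
}

// (ctlz_zero_poison hi:lo) -> (umin (ffbh hi), (add (ffbh lo), 32))
// A plain wrapping add is enough here because a zero input is poison.
inline uint32_t modelCtlz64ZeroPoison(uint64_t V) {
  uint32_t Hi = static_cast<uint32_t>(V >> 32);
  uint32_t Lo = static_cast<uint32_t>(V);
  return std::min<uint32_t>(modelFfbh32(Hi), modelFfbh32(Lo) + 32u);
}
```

// The saturating add matters only for the ffbh/ffbl forms: it keeps the -1
// "input was zero" marker from wrapping around to 31 when both halves are
// zero. The *_ZERO_POISON forms never need a defined result for zero, so a
// wrapping add suffices.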
3576
3577// vgpr, sgpr -> vgpr
3578// vgpr, agpr -> vgpr
3579// agpr, agpr -> agpr
3580// agpr, sgpr -> vgpr
3581static unsigned regBankUnion(unsigned RB0, unsigned RB1) {
3582 if (RB0 == AMDGPU::InvalidRegBankID)
3583 return RB1;
3584 if (RB1 == AMDGPU::InvalidRegBankID)
3585 return RB0;
3586
3587 if (RB0 == AMDGPU::SGPRRegBankID && RB1 == AMDGPU::SGPRRegBankID)
3588 return AMDGPU::SGPRRegBankID;
3589
3590 if (RB0 == AMDGPU::AGPRRegBankID && RB1 == AMDGPU::AGPRRegBankID)
3591 return AMDGPU::AGPRRegBankID;
3592
3593 return AMDGPU::VGPRRegBankID;
3594}
3595
3596static unsigned regBankBoolUnion(unsigned RB0, unsigned RB1) {
3597 if (RB0 == AMDGPU::InvalidRegBankID)
3598 return RB1;
3599 if (RB1 == AMDGPU::InvalidRegBankID)
3600 return RB0;
3601
3602 // vcc, vcc -> vcc
3603 // vcc, sgpr -> vcc
3604 // vcc, vgpr -> vcc
3605 if (RB0 == AMDGPU::VCCRegBankID || RB1 == AMDGPU::VCCRegBankID)
3606 return AMDGPU::VCCRegBankID;
3607
3608 // No vcc operand: fall back to the generic union (e.g. sgpr, vgpr -> vgpr)
3609 return regBankUnion(RB0, RB1);
3610}
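
// Illustration (not part of the LLVM sources): the two union helpers above
// behave like a small join operation over bank IDs. The sketch below mirrors
// their control flow using a stand-in enum; ModelBank and the MB_* values
// are hypothetical and do not correspond to the real AMDGPU::*RegBankID
// numbering.

```cpp
#include <cassert>

// Hypothetical stand-in for the AMDGPU register bank IDs.
enum ModelBank : unsigned { MB_Invalid, MB_SGPR, MB_VGPR, MB_AGPR, MB_VCC };

// Mirrors regBankUnion: an invalid bank is the identity, two matching
// SGPR or AGPR inputs keep their bank, and every other mix becomes VGPR.
inline unsigned modelBankUnion(unsigned A, unsigned B) {
  if (A == MB_Invalid)
    return B;
  if (B == MB_Invalid)
    return A;
  if (A == MB_SGPR && B == MB_SGPR)
    return MB_SGPR;
  if (A == MB_AGPR && B == MB_AGPR)
    return MB_AGPR;
  return MB_VGPR;
}

// Mirrors regBankBoolUnion: VCC absorbs any other valid bank, otherwise
// the generic union applies.
inline unsigned modelBankBoolUnion(unsigned A, unsigned B) {
  if (A == MB_Invalid)
    return B;
  if (B == MB_Invalid)
    return A;
  if (A == MB_VCC || B == MB_VCC)
    return MB_VCC;
  return modelBankUnion(A, B);
}
```

// Read as a table: modelBankUnion(MB_AGPR, MB_SGPR) gives MB_VGPR, while
// modelBankBoolUnion(MB_VCC, MB_SGPR) gives MB_VCC.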
3611
3613 const MachineInstr &MI) const {
3614 unsigned RegBank = AMDGPU::InvalidRegBankID;
3615
3616 for (const MachineOperand &MO : MI.operands()) {
3617 if (!MO.isReg())
3618 continue;
3619 Register Reg = MO.getReg();
3620 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3621 RegBank = regBankUnion(RegBank, Bank->getID());
3622 if (RegBank == AMDGPU::VGPRRegBankID)
3623 break;
3624 }
3625 }
3626
3627 return RegBank;
3628}
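
// Illustration (not part of the LLVM sources): the loop above folds the bank
// union over all register operands and stops as soon as the result is VGPR,
// since no further operand can change it. A minimal host-side sketch of that
// fold follows, assuming the operand banks have already been collected into
// a vector (the real code skips operands with no assigned bank). FoldBank,
// foldUnion, and foldBanks are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the bank IDs used by the fold.
enum FoldBank : unsigned { FB_Invalid, FB_SGPR, FB_VGPR, FB_AGPR };

// Same shape as the generic bank union: invalid is the identity, matching
// SGPR or AGPR inputs are preserved, everything else becomes VGPR.
inline unsigned foldUnion(unsigned A, unsigned B) {
  if (A == FB_Invalid)
    return B;
  if (B == FB_Invalid)
    return A;
  if (A == FB_SGPR && B == FB_SGPR)
    return FB_SGPR;
  if (A == FB_AGPR && B == FB_AGPR)
    return FB_AGPR;
  return FB_VGPR;
}

// Folds the union over the operand banks. Visited reports how many
// operands were inspected before the early exit.
inline unsigned foldBanks(const std::vector<unsigned> &Banks,
                          unsigned &Visited) {
  unsigned Result = FB_Invalid;
  Visited = 0;
  for (unsigned Bank : Banks) {
    ++Visited;
    Result = foldUnion(Result, Bank);
    if (Result == FB_VGPR)
      break; // VGPR absorbs every later operand.
  }
  return Result;
}
```

// With banks {SGPR, VGPR, AGPR} the fold stops after the second operand;
// with an empty operand list it returns the invalid bank unchanged.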
3629
3631 const MachineFunction &MF = *MI.getMF();
3632 const MachineRegisterInfo &MRI = MF.getRegInfo();
3633 for (const MachineOperand &MO : MI.operands()) {
3634 if (!MO.isReg())
3635 continue;
3636 Register Reg = MO.getReg();
3637 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3638 if (Bank->getID() != AMDGPU::SGPRRegBankID)
3639 return false;
3640 }
3641 }
3642 return true;
3643}
3644
3647 const MachineFunction &MF = *MI.getMF();
3648 const MachineRegisterInfo &MRI = MF.getRegInfo();
3649 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3650
3651 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3652 const MachineOperand &SrcOp = MI.getOperand(i);
3653 if (!SrcOp.isReg())
3654 continue;
3655
3656 unsigned Size = getSizeInBits(SrcOp.getReg(), MRI, *TRI);
3657 OpdsMapping[i] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3658 }
3659 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3660 MI.getNumOperands());
3661}
3662
3665 const MachineFunction &MF = *MI.getMF();
3666 const MachineRegisterInfo &MRI = MF.getRegInfo();
3667 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3668
3669 // Even though we technically could use SGPRs, this would require knowledge of
3670 // the constant bus restriction. Force all sources to VGPR (except for VCC).
3671 //
3672 // TODO: Unary ops are trivially OK, so accept SGPRs?
3673 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3674 const MachineOperand &Src = MI.getOperand(i);
3675 if (!Src.isReg())
3676 continue;
3677
3678 unsigned Size = getSizeInBits(Src.getReg(), MRI, *TRI);
3679 unsigned BankID = Size == 1 ? AMDGPU::VCCRegBankID : AMDGPU::VGPRRegBankID;
3680 OpdsMapping[i] = AMDGPU::getValueMapping(BankID, Size);
3681 }
3682
3683 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3684 MI.getNumOperands());
3685}
3686
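3687const RegisterBankInfo::InstructionMapping &
3688AMDGPURegisterBankInfo::getDefaultMappingAllVGPR(const MachineInstr &MI) const {
3689 const MachineFunction &MF = *MI.getMF();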
3690 const MachineRegisterInfo &MRI = MF.getRegInfo();
3691 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3692
3693 for (unsigned I = 0, E = MI.getNumOperands(); I != E; ++I) {
3694 const MachineOperand &Op = MI.getOperand(I);
3695 if (!Op.isReg())
3696 continue;
3697
3698 unsigned Size = getSizeInBits(Op.getReg(), MRI, *TRI);
3699 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3700 }
3701
3702 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3703 MI.getNumOperands());
3704}
3705
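3706const RegisterBankInfo::InstructionMapping &
3707AMDGPURegisterBankInfo::getImageMapping(const MachineRegisterInfo &MRI,
3708 const MachineInstr &MI,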
3709 int RsrcIdx) const {
3710 // The reported argument index is relative to the IR intrinsic call arguments,
3711 // so we need to shift by the number of defs and the intrinsic ID.
3712 RsrcIdx += MI.getNumExplicitDefs() + 1;
3713
3714 const int NumOps = MI.getNumOperands();
3715 SmallVector<const ValueMapping *, 8> OpdsMapping(NumOps);
3716
3717 // TODO: Should packed/unpacked D16 difference be reported here as part of
3718 // the value mapping?
3719 for (int I = 0; I != NumOps; ++I) {
3720 if (!MI.getOperand(I).isReg())
3721 continue;
3722
3723 Register OpReg = MI.getOperand(I).getReg();
3724 // We replace some dead address operands with $noreg
3725 if (!OpReg)
3726 continue;
3727
3728 unsigned Size = getSizeInBits(OpReg, MRI, *TRI);
3729
3730 // FIXME: Probably need a new intrinsic register bank searchable table to
3731 // handle arbitrary intrinsics easily.
3732 //
3733 // If this has a sampler, it immediately follows rsrc.
3734 const bool MustBeSGPR = I == RsrcIdx || I == RsrcIdx + 1;
3735
3736 if (MustBeSGPR) {
3737 // This must be an SGPR, so we must report whatever it is as legal.
3738 unsigned NewBank = getRegBankID(OpReg, MRI, AMDGPU::SGPRRegBankID);
3739 OpdsMapping[I] = AMDGPU::getValueMapping(NewBank, Size);
3740 } else {
3741 // Some operands must be VGPR, and these are easy to copy to.
3742 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3743 }
3744 }
3745
3746 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping), NumOps);
3747}
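
// Illustrative sketch, not part of the LLVM sources: the index arithmetic used
// above to decide which machine operands must be SGPRs. mustBeSGPR and its
// parameter names are hypothetical; only the "+ defs + 1" shift and the
// "rsrc or the operand right after it" rule come from the code above.
```cpp
#include <cassert>

// Returns true if machine operand OpIdx is the rsrc operand or the operand
// immediately after it (where a sampler sits, if the intrinsic has one).
// IntrinsicRsrcArg is the rsrc position among the IR call arguments.
inline bool mustBeSGPR(int OpIdx, int IntrinsicRsrcArg, int NumExplicitDefs) {
  // Machine operands start with the explicit defs, followed by the intrinsic
  // ID, so the IR argument index is shifted by both.
  const int RsrcIdx = IntrinsicRsrcArg + NumExplicitDefs + 1;
  return OpIdx == RsrcIdx || OpIdx == RsrcIdx + 1;
}
```
// With one def and rsrc as IR argument 2, machine operands 4 and 5 are the
// ones reported with their current bank; all other register operands are
// mapped to VGPR.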
3748
3749/// Return the mapping for a pointer argument.
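3750const RegisterBankInfo::ValueMapping *
3751AMDGPURegisterBankInfo::getValueMappingForPtr(const MachineRegisterInfo &MRI,
3752 Register PtrReg) const {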
3753 LLT PtrTy = MRI.getType(PtrReg);
3754 unsigned Size = PtrTy.getSizeInBits();
3755 if (Subtarget.useFlatForGlobal() ||
3756 !AMDGPU::isFlatGlobalAddrSpace(PtrTy.getAddressSpace()))
3757 return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3758
3759 // If we're using MUBUF instructions for global memory, an SGPR base register
3760 // is possible. Otherwise this needs to be a VGPR.
3761 const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
3762 return AMDGPU::getValueMapping(PtrBank->getID(), Size);
3763}
3764
3765const RegisterBankInfo::InstructionMapping &
3766AMDGPURegisterBankInfo::getInstrMappingForLoad(const MachineInstr &MI) const {
3767
3768 const MachineFunction &MF = *MI.getMF();
3769 const MachineRegisterInfo &MRI = MF.getRegInfo();
3770 SmallVector<const ValueMapping*, 2> OpdsMapping(2);
3771 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3772 Register PtrReg = MI.getOperand(1).getReg();
3773 LLT PtrTy = MRI.getType(PtrReg);
3774 unsigned AS = PtrTy.getAddressSpace();
3775 unsigned PtrSize = PtrTy.getSizeInBits();
3776
3777 const ValueMapping *ValMapping;
3778 const ValueMapping *PtrMapping;
3779
3780 const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
3781
3782 if (PtrBank == &AMDGPU::SGPRRegBank && AMDGPU::isFlatGlobalAddrSpace(AS)) {
3783 if (isScalarLoadLegal(MI)) {
3784 // We have a uniform instruction so we want to use an SMRD load
3785 ValMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3786 PtrMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize);
3787 } else {
3788 ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3789
3790 // If we're using MUBUF instructions for global memory, an SGPR base
3791 // register is possible. Otherwise this needs to be a VGPR.
3792 unsigned PtrBankID = Subtarget.useFlatForGlobal() ?
3793 AMDGPU::VGPRRegBankID : AMDGPU::SGPRRegBankID;
3794
3795 PtrMapping = AMDGPU::getValueMapping(PtrBankID, PtrSize);
3796 }
3797 } else {
3798 ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3799 PtrMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize);
3800 }
3801
3802 OpdsMapping[0] = ValMapping;
3803 OpdsMapping[1] = PtrMapping;
3804 const RegisterBankInfo::InstructionMapping &Mapping = getInstructionMapping(
3805 1, 1, getOperandsMapping(OpdsMapping), MI.getNumOperands());
3806 return Mapping;
3807
3808 // FIXME: Do we want to add a mapping for FLAT load, or should we just
3809 // handle that during instruction selection?
3810}
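
// Illustrative sketch, not part of the LLVM sources: the bank choice made by
// the load mapping above, reduced to a pure function. The four boolean inputs
// and the names LoadMapping and pickLoadMapping are hypothetical summaries of
// the queries in the real code (pointer bank, address space, isScalarLoadLegal
// and Subtarget.useFlatForGlobal).
```cpp
#include <cassert>

enum class Bank { SGPR, VGPR };

struct LoadMapping {
  Bank Value;   // bank for the loaded value (operand 0)
  Bank Pointer; // bank for the address (operand 1)
};

inline LoadMapping pickLoadMapping(bool PtrIsSGPR, bool IsFlatGlobalAS,
                                   bool ScalarLoadLegal,
                                   bool UseFlatForGlobal) {
  if (PtrIsSGPR && IsFlatGlobalAS) {
    if (ScalarLoadLegal)
      return {Bank::SGPR, Bank::SGPR}; // uniform load: keep it scalar
    // Vector load: an SGPR base is only kept when flat is not used for global.
    return {Bank::VGPR, UseFlatForGlobal ? Bank::VGPR : Bank::SGPR};
  }
  // Divergent pointer or a non flat/global address space: all VGPR.
  return {Bank::VGPR, Bank::VGPR};
}
```
// The value is only ever placed in an SGPR when the pointer is uniform and a
// scalar load is legal; every other path loads into a VGPR.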
3811
3812unsigned
3813AMDGPURegisterBankInfo::getRegBankID(Register Reg,
3814 const MachineRegisterInfo &MRI,
3815 unsigned Default) const {
3816 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
3817 return Bank ? Bank->getID() : Default;
3818}
3819
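3820const RegisterBankInfo::ValueMapping *
3821AMDGPURegisterBankInfo::getSGPROpMapping(Register Reg,
3822 const MachineRegisterInfo &MRI,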
3823 const TargetRegisterInfo &TRI) const {
3824 // Lie and claim anything is legal, even though this needs to be an SGPR.
3825 // applyMapping will have to deal with it as a waterfall loop.
3826 unsigned Bank = getRegBankID(Reg, MRI, AMDGPU::SGPRRegBankID);
3827 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3828 return AMDGPU::getValueMapping(Bank, Size);
3829}
3830
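3831const RegisterBankInfo::ValueMapping *
3832AMDGPURegisterBankInfo::getVGPROpMapping(Register Reg,
3833 const MachineRegisterInfo &MRI,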
3834 const TargetRegisterInfo &TRI) const {
3835 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3836 return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3837}
3838
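3839const RegisterBankInfo::ValueMapping *
3840AMDGPURegisterBankInfo::getAGPROpMapping(Register Reg,
3841 const MachineRegisterInfo &MRI,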
3842 const TargetRegisterInfo &TRI) const {
3843 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3844 return AMDGPU::getValueMapping(AMDGPU::AGPRRegBankID, Size);
3845}
3846
3847///
3848/// This function must return a legal mapping, because
3849/// AMDGPURegisterBankInfo::getInstrAlternativeMappings() is not called
3850/// in RegBankSelect::Mode::Fast. Any mapping that would cause a
3851/// VGPR to SGPR copy to be generated is illegal.
3852///
3853// Operands that must be SGPRs must accept potentially divergent VGPRs as
3854// legal. These will be dealt with in applyMappingImpl.
3855//
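3856const RegisterBankInfo::InstructionMapping &
3857AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
3858 const MachineFunction &MF = *MI.getMF();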
3859 const MachineRegisterInfo &MRI = MF.getRegInfo();
3860
3861 if (MI.isCopy() || MI.getOpcode() == AMDGPU::G_FREEZE) {
3862 Register DstReg = MI.getOperand(0).getReg();
3863 Register SrcReg = MI.getOperand(1).getReg();
3864
3865 // The default logic bothers to analyze impossible alternative mappings. We
3866 // want the most straightforward mapping, so just directly handle this.
3867 const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI);
3868 const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);
3869
3870 // For COPY between a physical reg and an s1, there is no type associated so
3871 // we need to take the virtual register's type as a hint on how to interpret
3872 // s1 values.
3873 unsigned Size;
3874 if (!SrcReg.isVirtual() && !DstBank &&
3875 MRI.getType(DstReg) == LLT::scalar(1)) {
3876 DstBank = &AMDGPU::VCCRegBank;
3877 Size = 1;
3878 } else if (!DstReg.isVirtual() && MRI.getType(SrcReg) == LLT::scalar(1)) {
3879 DstBank = &AMDGPU::VCCRegBank;
3880 Size = 1;
3881 } else {
3882 Size = getSizeInBits(DstReg, MRI, *TRI);
3883 }
3884
3885 if (!DstBank)
3886 DstBank = SrcBank;
3887 else if (!SrcBank)
3888 SrcBank = DstBank;
3889
3890 if (MI.getOpcode() != AMDGPU::G_FREEZE &&
3891 cannotCopy(*DstBank, *SrcBank, TypeSize::getFixed(Size)))
3892 return getInvalidInstructionMapping();
3893
3894 const ValueMapping &ValMap = getValueMapping(0, Size, *DstBank);
3895 unsigned OpdsMappingSize = MI.isCopy() ? 1 : 2;
3896 SmallVector<const ValueMapping *, 1> OpdsMapping(OpdsMappingSize);
3897 OpdsMapping[0] = &ValMap;
3898 if (MI.getOpcode() == AMDGPU::G_FREEZE)
3899 OpdsMapping[1] = &ValMap;
3900
3901 return getInstructionMapping(
3902 1, /*Cost*/ 1,
3903 /*OperandsMapping*/ getOperandsMapping(OpdsMapping), OpdsMappingSize);
3904 }
3905
3906 if (MI.isRegSequence()) {
3907 // If any input is a VGPR, the result must be a VGPR. The default handling
3908 // assumes any copy between banks is legal.
3909 unsigned BankID = AMDGPU::SGPRRegBankID;
3910
3911 for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
3912 auto OpBank = getRegBankID(MI.getOperand(I).getReg(), MRI);
3913 // It doesn't make sense to use vcc or scc banks here, so just ignore
3914 // them.
3915 if (OpBank != AMDGPU::SGPRRegBankID) {
3916 BankID = AMDGPU::VGPRRegBankID;
3917 break;
3918 }
3919 }
3920 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3921
3922 const ValueMapping &ValMap = getValueMapping(0, Size, getRegBank(BankID));
3923 return getInstructionMapping(
3924 1, /*Cost*/ 1,
3925 /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
3926 }
3927
3928 // The default handling is broken and doesn't handle illegal VGPR->SGPR copies
3929 // properly.
3930 //
3931 // TODO: There are additional exec masking dependencies to analyze.
3932 if (auto *PHI = dyn_cast<GPhi>(&MI)) {
3933 unsigned ResultBank = AMDGPU::InvalidRegBankID;
3934 Register DstReg = PHI->getReg(0);
3935
3936 // Sometimes the result may have already been assigned a bank.
3937 if (const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI))
3938 ResultBank = DstBank->getID();
3939
3940 for (unsigned I = 0; I < PHI->getNumIncomingValues(); ++I) {
3941 Register Reg = PHI->getIncomingValue(I);
3942 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
3943
3944 // FIXME: Assuming VGPR for any undetermined inputs.
3945 if (!Bank || Bank->getID() == AMDGPU::VGPRRegBankID) {
3946 ResultBank = AMDGPU::VGPRRegBankID;
3947 break;
3948 }
3949
3950 // FIXME: Need to promote SGPR case to s32
3951 unsigned OpBank = Bank->getID();
3952 ResultBank = regBankBoolUnion(ResultBank, OpBank);
3953 }
3954
3955 assert(ResultBank != AMDGPU::InvalidRegBankID);
3956
3957 unsigned Size = MRI.getType(DstReg).getSizeInBits();
3958
3959 const ValueMapping &ValMap =
3960 getValueMapping(0, Size, getRegBank(ResultBank));
3961 return getInstructionMapping(
3962 1, /*Cost*/ 1,
3963 /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
3964 }
3965
3966 const RegisterBankInfo::InstructionMapping &Mapping = getInstrMappingImpl(MI);
3967 if (Mapping.isValid())
3968 return Mapping;
3969
3970 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3971
3972 switch (MI.getOpcode()) {
3973 default:
3974 return getInvalidInstructionMapping();
3975
3976 case AMDGPU::G_AND:
3977 case AMDGPU::G_OR:
3978 case AMDGPU::G_XOR:
3979 case AMDGPU::G_MUL: {
3980 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3981 if (Size == 1) {
3982 const RegisterBank *DstBank
3983 = getRegBank(MI.getOperand(0).getReg(), MRI, *TRI);
3984
3985 unsigned TargetBankID = AMDGPU::InvalidRegBankID;
3986 unsigned BankLHS = AMDGPU::InvalidRegBankID;
3987 unsigned BankRHS = AMDGPU::InvalidRegBankID;
3988 if (DstBank) {
3989 TargetBankID = DstBank->getID();
3990 if (DstBank == &AMDGPU::VCCRegBank) {
3991 TargetBankID = AMDGPU::VCCRegBankID;
3992 BankLHS = AMDGPU::VCCRegBankID;
3993 BankRHS = AMDGPU::VCCRegBankID;
3994 } else {
3995 BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
3996 AMDGPU::SGPRRegBankID);
3997 BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
3998 AMDGPU::SGPRRegBankID);
3999 }
4000 } else {
4001 BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
4002 AMDGPU::VCCRegBankID);
4003 BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
4004 AMDGPU::VCCRegBankID);
4005
4006 // Both inputs should be true booleans to produce a boolean result.
4007 if (BankLHS == AMDGPU::VGPRRegBankID || BankRHS == AMDGPU::VGPRRegBankID) {
4008 TargetBankID = AMDGPU::VGPRRegBankID;
4009 } else if (BankLHS == AMDGPU::VCCRegBankID || BankRHS == AMDGPU::VCCRegBankID) {
4010 TargetBankID = AMDGPU::VCCRegBankID;
4011 BankLHS = AMDGPU::VCCRegBankID;
4012 BankRHS = AMDGPU::VCCRegBankID;
4013 } else if (BankLHS == AMDGPU::SGPRRegBankID && BankRHS == AMDGPU::SGPRRegBankID) {
4014 TargetBankID = AMDGPU::SGPRRegBankID;
4015 }
4016 }
4017
4018 OpdsMapping[0] = AMDGPU::getValueMapping(TargetBankID, Size);
4019 OpdsMapping[1] = AMDGPU::getValueMapping(BankLHS, Size);
4020 OpdsMapping[2] = AMDGPU::getValueMapping(BankRHS, Size);
4021 break;
4022 }
4023
4024 if (Size == 64) {
4025
4026 if (isSALUMapping(MI)) {
4027 OpdsMapping[0] = getValueMappingSGPR64Only(AMDGPU::SGPRRegBankID, Size);
4028 OpdsMapping[1] = OpdsMapping[2] = OpdsMapping[0];
4029 } else {
4030 if (MI.getOpcode() == AMDGPU::G_MUL && Subtarget.hasVMulU64Inst())
4031 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4032 else
4033 OpdsMapping[0] =
4034 getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size);
4035 unsigned Bank1 = getRegBankID(MI.getOperand(1).getReg(), MRI /*, DefaultBankID*/);
4036 OpdsMapping[1] = AMDGPU::getValueMapping(Bank1, Size);
4037
4038 unsigned Bank2 = getRegBankID(MI.getOperand(2).getReg(), MRI /*, DefaultBankID*/);
4039 OpdsMapping[2] = AMDGPU::getValueMapping(Bank2, Size);
4040 }
4041
4042 break;
4043 }
4044
4045 [[fallthrough]];
4046 }
4047 case AMDGPU::G_PTR_ADD:
4048 case AMDGPU::G_PTRMASK:
4049 case AMDGPU::G_ADD:
4050 case AMDGPU::G_SUB:
4051 case AMDGPU::G_SHL:
4052 case AMDGPU::G_LSHR:
4053 case AMDGPU::G_ASHR:
4054 case AMDGPU::G_UADDO:
4055 case AMDGPU::G_USUBO:
4056 case AMDGPU::G_UADDE:
4057 case AMDGPU::G_SADDE:
4058 case AMDGPU::G_USUBE:
4059 case AMDGPU::G_SSUBE:
4060 case AMDGPU::G_ABS:
4061 case AMDGPU::G_SHUFFLE_VECTOR:
4062 case AMDGPU::G_SBFX:
4063 case AMDGPU::G_UBFX:
4064 case AMDGPU::G_AMDGPU_S_MUL_I64_I32:
4065 case AMDGPU::G_AMDGPU_S_MUL_U64_U32:
4066 if (isSALUMapping(MI))
4067 return getDefaultMappingSOP(MI);
4068 return getDefaultMappingVOP(MI);
4069 case AMDGPU::G_SMIN:
4070 case AMDGPU::G_SMAX:
4071 case AMDGPU::G_UMIN:
4072 case AMDGPU::G_UMAX:
4073 if (isSALUMapping(MI)) {
4074 // There are no scalar 64-bit min and max instructions; use the vector
4074 // instruction instead.
4075 if (MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 64 &&
4076 Subtarget.hasMinMaxI64Insts())
4077 return getDefaultMappingVOP(MI);
4078 return getDefaultMappingSOP(MI);
4079 }
4080 return getDefaultMappingVOP(MI);
4081 case AMDGPU::G_FADD:
4082 case AMDGPU::G_FSUB:
4083 case AMDGPU::G_FMUL:
4084 case AMDGPU::G_FMA:
4085 case AMDGPU::G_FFLOOR:
4086 case AMDGPU::G_FCEIL:
4087 case AMDGPU::G_INTRINSIC_ROUNDEVEN:
4088 case AMDGPU::G_FMINNUM:
4089 case AMDGPU::G_FMAXNUM:
4090 case AMDGPU::G_FMINIMUMNUM:
4091 case AMDGPU::G_FMAXIMUMNUM:
4092 case AMDGPU::G_INTRINSIC_TRUNC:
4093 case AMDGPU::G_STRICT_FADD:
4094 case AMDGPU::G_STRICT_FSUB:
4095 case AMDGPU::G_STRICT_FMUL:
4096 case AMDGPU::G_STRICT_FMA: {
4097 LLT Ty = MRI.getType(MI.getOperand(0).getReg());
4098 unsigned Size = Ty.getSizeInBits();
4099 if (Subtarget.hasSALUFloatInsts() && Ty.isScalar() &&
4100 (Size == 32 || Size == 16) && isSALUMapping(MI))
4101 return getDefaultMappingSOP(MI);
4102 return getDefaultMappingVOP(MI);
4103 }
4104 case AMDGPU::G_FMINIMUM:
4105 case AMDGPU::G_FMAXIMUM: {
4106 LLT Ty = MRI.getType(MI.getOperand(0).getReg());
4107 unsigned Size = Ty.getSizeInBits();
4108 if (Subtarget.hasSALUMinimumMaximumInsts() && Ty.isScalar() &&
4109 (Size == 32 || Size == 16) && isSALUMapping(MI))
4110 return getDefaultMappingSOP(MI);
4111 return getDefaultMappingVOP(MI);
4112 }
4113 case AMDGPU::G_FPTOSI:
4114 case AMDGPU::G_FPTOUI:
4115 case AMDGPU::G_FPTOSI_SAT:
4116 case AMDGPU::G_FPTOUI_SAT:
4117 case AMDGPU::G_SITOFP:
4118 case AMDGPU::G_UITOFP: {
4119 unsigned SizeDst = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4120 unsigned SizeSrc = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4121 if (Subtarget.hasSALUFloatInsts() && SizeDst == 32 && SizeSrc == 32 &&
4122 isSALUMapping(MI))
4123 return getDefaultMappingSOP(MI);
4124 return getDefaultMappingVOP(MI);
4125 }
4126 case AMDGPU::G_FPTRUNC:
4127 case AMDGPU::G_FPEXT: {
4128 unsigned SizeDst = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4129 unsigned SizeSrc = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4130 if (Subtarget.hasSALUFloatInsts() && SizeDst != 64 && SizeSrc != 64 &&
4131 isSALUMapping(MI))
4132 return getDefaultMappingSOP(MI);
4133 return getDefaultMappingVOP(MI);
4134 }
4135 case AMDGPU::G_FSQRT:
4136 case AMDGPU::G_FEXP2:
4137 case AMDGPU::G_FLOG2: {
4138 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4139 if (Subtarget.hasPseudoScalarTrans() && (Size == 16 || Size == 32) &&
4140 isSALUMapping(MI))
4141 return getDefaultMappingSOP(MI);
4142 return getDefaultMappingVOP(MI);
4143 }
4144 case AMDGPU::G_SADDSAT: // FIXME: Could lower sat ops for SALU
4145 case AMDGPU::G_SSUBSAT:
4146 case AMDGPU::G_UADDSAT:
4147 case AMDGPU::G_USUBSAT:
4148 case AMDGPU::G_FMAD:
4149 case AMDGPU::G_FLDEXP:
4150 case AMDGPU::G_FMINNUM_IEEE:
4151 case AMDGPU::G_FMAXNUM_IEEE:
4152 case AMDGPU::G_FCANONICALIZE:
4153 case AMDGPU::G_STRICT_FLDEXP:
4154 case AMDGPU::G_BSWAP: // TODO: Somehow expand for scalar?
4155 case AMDGPU::G_FSHR: // TODO: Expand for scalar
4156 case AMDGPU::G_AMDGPU_FMIN_LEGACY:
4157 case AMDGPU::G_AMDGPU_FMAX_LEGACY:
4158 case AMDGPU::G_AMDGPU_RCP_IFLAG:
4159 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE0:
4160 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE1:
4161 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE2:
4162 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE3:
4163 case AMDGPU::G_AMDGPU_CVT_PK_I16_I32:
4164 case AMDGPU::G_AMDGPU_SMED3:
4165 case AMDGPU::G_AMDGPU_FMED3:
4166 return getDefaultMappingVOP(MI);
4167 case AMDGPU::G_UMULH:
4168 case AMDGPU::G_SMULH: {
4169 if (Subtarget.hasScalarMulHiInsts() && isSALUMapping(MI))
4170 return getDefaultMappingSOP(MI);
4171 return getDefaultMappingVOP(MI);
4172 }
4173 case AMDGPU::G_AMDGPU_MAD_U64_U32:
4174 case AMDGPU::G_AMDGPU_MAD_I64_I32: {
4175 // Three possible mappings:
4176 //
4177 // - Default SOP
4178 // - Default VOP
4179 // - Scalar multiply: src0 and src1 are SGPRs, the rest is VOP.
4180 //
4181 // This allows instruction selection to keep the multiplication part of the
4182 // instruction on the SALU.
4183 bool AllSalu = true;
4184 bool MulSalu = true;
4185 for (unsigned i = 0; i < 5; ++i) {
4186 Register Reg = MI.getOperand(i).getReg();
4187 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
4188 if (Bank->getID() != AMDGPU::SGPRRegBankID) {
4189 AllSalu = false;
4190 if (i == 2 || i == 3) {
4191 MulSalu = false;
4192 break;
4193 }
4194 }
4195 }
4196 }
4197
4198 if (AllSalu)
4199 return getDefaultMappingSOP(MI);
4200
4201 // If the multiply-add is full-rate in VALU, use that even if the
4202 // multiplication part is scalar. Accumulating separately on the VALU would
4203 // take two instructions.
4204 if (!MulSalu || Subtarget.hasFullRate64Ops())
4205 return getDefaultMappingVOP(MI);
4206
4207 // Keep the multiplication on the SALU, then accumulate on the VALU.
4208 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
4209 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4210 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4211 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4212 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
4213 break;
4214 }
4215 case AMDGPU::G_IMPLICIT_DEF: {
4216 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4217 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4218 break;
4219 }
4220 case AMDGPU::G_FCONSTANT:
4221 case AMDGPU::G_CONSTANT:
4222 case AMDGPU::G_GLOBAL_VALUE:
4223 case AMDGPU::G_FRAME_INDEX:
4224 case AMDGPU::G_BLOCK_ADDR:
4225 case AMDGPU::G_READSTEADYCOUNTER:
4226 case AMDGPU::G_READCYCLECOUNTER: {
4227 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4228 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4229 break;
4230 }
4231 case AMDGPU::G_DYN_STACKALLOC: {
4232 // Result is always uniform, and a wave reduction is needed for the source.
4233 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4234 unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4235 OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, 32);
4236 break;
4237 }
4238 case AMDGPU::G_AMDGPU_WAVE_ADDRESS: {
4239 // This case is weird because we expect a physical register in the source,
4240 // but need to set a bank anyway.
4241 //
4242 // TODO: We could select the result to SGPR or VGPR
4243 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4244 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4245 break;
4246 }
4247 case AMDGPU::G_INSERT: {
4248 unsigned BankID = getMappingType(MRI, MI);
4249 unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4250 unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4251 unsigned EltSize = getSizeInBits(MI.getOperand(2).getReg(), MRI, *TRI);
4252 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
4253 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
4254 OpdsMapping[2] = AMDGPU::getValueMapping(BankID, EltSize);
4255 OpdsMapping[3] = nullptr;
4256 break;
4257 }
4258 case AMDGPU::G_EXTRACT: {
4259 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4260 unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4261 unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4262 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
4263 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
4264 OpdsMapping[2] = nullptr;
4265 break;
4266 }
4267 case AMDGPU::G_BUILD_VECTOR:
4268 case AMDGPU::G_BUILD_VECTOR_TRUNC: {
4269 LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
4270 if (DstTy == LLT::fixed_vector(2, 16)) {
4271 unsigned DstSize = DstTy.getSizeInBits();
4272 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4273 unsigned Src0BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4274 unsigned Src1BankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
4275 unsigned DstBankID = regBankUnion(Src0BankID, Src1BankID);
4276
4277 OpdsMapping[0] = AMDGPU::getValueMapping(DstBankID, DstSize);
4278 OpdsMapping[1] = AMDGPU::getValueMapping(Src0BankID, SrcSize);
4279 OpdsMapping[2] = AMDGPU::getValueMapping(Src1BankID, SrcSize);
4280 break;
4281 }
4282
4283 [[fallthrough]];
4284 }
4285 case AMDGPU::G_MERGE_VALUES:
4286 case AMDGPU::G_CONCAT_VECTORS: {
4287 unsigned Bank = getMappingType(MRI, MI);
4288 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4289 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4290
4291 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
4292 // Op1 and Dst should use the same register bank.
4293 for (unsigned i = 1, e = MI.getNumOperands(); i != e; ++i)
4294 OpdsMapping[i] = AMDGPU::getValueMapping(Bank, SrcSize);
4295 break;
4296 }
4297 case AMDGPU::G_BITREVERSE:
4298 case AMDGPU::G_BITCAST:
4299 case AMDGPU::G_INTTOPTR:
4300 case AMDGPU::G_PTRTOINT:
4301 case AMDGPU::G_FABS:
4302 case AMDGPU::G_FNEG: {
4303 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4304 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4305 OpdsMapping[0] = OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
4306 break;
4307 }
4308 case AMDGPU::G_AMDGPU_FFBH_U32:
4309 case AMDGPU::G_AMDGPU_FFBL_B32:
4310 case AMDGPU::G_CTLZ_ZERO_POISON:
4311 case AMDGPU::G_CTTZ_ZERO_POISON: {
4312 unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4313 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4314 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
4315 OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(BankID, Size);
4316 break;
4317 }
4318 case AMDGPU::G_CTPOP: {
4319 unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4320 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4321 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
4322
4323 // This should really be getValueMappingSGPR64Only, but allowing the generic
4324 // code to handle the register split just makes using LegalizerHelper more
4325 // difficult.
4326 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
4327 break;
4328 }
4329 case AMDGPU::G_TRUNC: {
4330 Register Dst = MI.getOperand(0).getReg();
4331 Register Src = MI.getOperand(1).getReg();
4332 unsigned Bank = getRegBankID(Src, MRI);
4333 unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
4334 unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
4335 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
4336 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, SrcSize);
4337 break;
4338 }
4339 case AMDGPU::G_ZEXT:
4340 case AMDGPU::G_SEXT:
4341 case AMDGPU::G_ANYEXT:
4342 case AMDGPU::G_SEXT_INREG: {
4343 Register Dst = MI.getOperand(0).getReg();
4344 Register Src = MI.getOperand(1).getReg();
4345 unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
4346 unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
4347
4348 unsigned DstBank;
4349 const RegisterBank *SrcBank = getRegBank(Src, MRI, *TRI);
4350 assert(SrcBank);
4351 switch (SrcBank->getID()) {
4352 case AMDGPU::SGPRRegBankID:
4353 DstBank = AMDGPU::SGPRRegBankID;
4354 break;
4355 default:
4356 DstBank = AMDGPU::VGPRRegBankID;
4357 break;
4358 }
4359
4360 // Scalar extend can use 64-bit BFE, but VGPRs require extending to
4361 // 32-bits, and then to 64.
4362 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(DstBank, DstSize);
4363 OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(SrcBank->getID(),
4364 SrcSize);
4365 break;
4366 }
4367 case AMDGPU::G_IS_FPCLASS: {
4368 Register SrcReg = MI.getOperand(1).getReg();
4369 unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
4370 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4371 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
4372 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4373 break;
4374 }
4375 case AMDGPU::G_STORE: {
4376 assert(MI.getOperand(0).isReg());
4377 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4378
4379 // FIXME: We need to specify a different reg bank once scalar stores are
4380 // supported.
4381 const ValueMapping *ValMapping =
4382 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4383 OpdsMapping[0] = ValMapping;
4384 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
4385 break;
4386 }
4387 case AMDGPU::G_ICMP:
4388 case AMDGPU::G_FCMP: {
4389 unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4390
4391 // See if the result register has already been constrained to vcc, which may
4392 // happen due to control flow intrinsic lowering.
4393 unsigned DstBank = getRegBankID(MI.getOperand(0).getReg(), MRI,
4394 AMDGPU::SGPRRegBankID);
4395 unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI);
4396 unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI);
4397
4398 auto canUseSCCICMP = [&]() {
4399 auto Pred =
4400 static_cast<CmpInst::Predicate>(MI.getOperand(1).getPredicate());
4401 return Size == 32 ||
4402 (Size == 64 &&
4403 (Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) &&
4404 Subtarget.hasScalarCompareEq64());
4405 };
4406 auto canUseSCCFCMP = [&]() {
4407 return Subtarget.hasSALUFloatInsts() && (Size == 32 || Size == 16);
4408 };
4409
4410 bool isICMP = MI.getOpcode() == AMDGPU::G_ICMP;
4411 bool CanUseSCC = DstBank == AMDGPU::SGPRRegBankID &&
4412 Op2Bank == AMDGPU::SGPRRegBankID &&
4413 Op3Bank == AMDGPU::SGPRRegBankID &&
4414 (isICMP ? canUseSCCICMP() : canUseSCCFCMP());
4415
4416 DstBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
4417 unsigned SrcBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4418
4419 // TODO: Use 32-bit for scalar output size.
4420 // SCC results will need to be copied to a 32-bit SGPR virtual register.
4421 const unsigned ResultSize = 1;
4422
4423 OpdsMapping[0] = AMDGPU::getValueMapping(DstBank, ResultSize);
4424 OpdsMapping[1] = nullptr; // Predicate Operand.
4425 OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, Size);
4426 OpdsMapping[3] = AMDGPU::getValueMapping(SrcBank, Size);
4427 break;
4428 }
4429 case AMDGPU::G_EXTRACT_VECTOR_ELT: {
4430 // A VGPR index can be handled with a waterfall loop when indexing an SGPR
4430 // vector.
4431 unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4432 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4433 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4434 unsigned IdxSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4435 unsigned IdxBank = getRegBankID(MI.getOperand(2).getReg(), MRI);
4436 unsigned OutputBankID = regBankUnion(SrcBankID, IdxBank);
4437
4438 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(OutputBankID, DstSize);
4439 OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, SrcSize);
4440
4441 // The index can be in either bank (SGPR or VGPR) if the source vector is a
4441 // VGPR.
4442 OpdsMapping[2] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4443 break;
4444 }
4445 case AMDGPU::G_INSERT_VECTOR_ELT: {
4446 unsigned OutputBankID = isSALUMapping(MI) ?
4447 AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4448
4449 unsigned VecSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4450 unsigned InsertSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4451 unsigned IdxSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
4452 unsigned InsertEltBankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
4453 unsigned IdxBankID = getRegBankID(MI.getOperand(3).getReg(), MRI);
4454
4455 OpdsMapping[0] = AMDGPU::getValueMapping(OutputBankID, VecSize);
4456 OpdsMapping[1] = AMDGPU::getValueMapping(OutputBankID, VecSize);
4457
4458 // This is a weird case, because we need to break down the mapping based on
4459 // the register bank of a different operand.
4460 if (InsertSize == 64 && OutputBankID == AMDGPU::VGPRRegBankID) {
4461 OpdsMapping[2] = AMDGPU::getValueMappingSplit64(InsertEltBankID,
4462 InsertSize);
4463 } else {
4464 assert(InsertSize == 32 || InsertSize == 64);
4465 OpdsMapping[2] = AMDGPU::getValueMapping(InsertEltBankID, InsertSize);
4466 }
4467
4468 // The index can be in either bank (SGPR or VGPR) if the source vector is a
4468 // VGPR.
4469 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBankID, IdxSize);
4470 break;
4471 }
4472 case AMDGPU::G_UNMERGE_VALUES: {
4473 unsigned Bank = getMappingType(MRI, MI);
4474
4475 // Op1 and Dst should use the same register bank.
4476 // FIXME: Shouldn't this be the default? Why do we need to handle this?
4477 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
4478 unsigned Size = getSizeInBits(MI.getOperand(i).getReg(), MRI, *TRI);
4479 OpdsMapping[i] = AMDGPU::getValueMapping(Bank, Size);
4480 }
4481 break;
4482 }
4483 case AMDGPU::G_AMDGPU_BUFFER_LOAD:
4484 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
4485 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
4486 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
4487 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
4488 case AMDGPU::G_AMDGPU_BUFFER_LOAD_TFE:
4489 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE_TFE:
4490 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE_TFE:
4491 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT_TFE:
4492 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT_TFE:
4493 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
4494 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
4495 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
4496 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
4497 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
4498 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
4499 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16:
4500 case AMDGPU::G_AMDGPU_BUFFER_STORE:
4501 case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
4502 case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
4503 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
4504 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16: {
4505 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4506
4507 // rsrc
4508 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4509
4510 // vindex
4511 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4512
4513 // voffset
4514 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4515
4516 // soffset
4517 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4518
4519 // Any remaining operands are immediates and were correctly null
4520 // initialized.
4521 break;
4522 }
4523 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
4524 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
4525 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
4526 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
4527 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
4528 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
4529 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
4530 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
4531 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
4532 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
4533 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
4534 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
4535 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB_CLAMP_U32:
4536 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_COND_SUB_U32:
4537 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
4538 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
4539 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
4540 // vdata_out
4541 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4542
4543 // vdata_in
4544 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4545
4546 // rsrc
4547 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4548
4549 // vindex
4550 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4551
4552 // voffset
4553 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4554
4555 // soffset
4556 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4557
4558 // Any remaining operands are immediates and were correctly null
4559 // initialized.
4560 break;
4561 }
4562 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
4563 // vdata_out
4564 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4565
4566 // vdata_in
4567 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4568
4569 // cmp
4570 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4571
4572 // rsrc
4573 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4574
4575 // vindex
4576 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4577
4578 // voffset
4579 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4580
4581 // soffset
4582 OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
4583
4584 // Any remaining operands are immediates and were correctly null
4585 // initialized.
4586 break;
4587 }
4588 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
4589 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
4590 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
4591 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
4592 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT: {
4593 // Lie and claim everything is legal, even though some need to be
4594 // SGPRs. applyMapping will have to deal with it as a waterfall loop.
4595 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4596 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4597
4598 // We need to convert this to a MUBUF if either the resource or offset is
4599 // VGPR.
4600 unsigned RSrcBank = OpdsMapping[1]->BreakDown[0].RegBank->getID();
4601 unsigned OffsetBank = OpdsMapping[2]->BreakDown[0].RegBank->getID();
4602 unsigned ResultBank = regBankUnion(RSrcBank, OffsetBank);
4603
4604 unsigned Size0 = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4605 OpdsMapping[0] = AMDGPU::getValueMapping(ResultBank, Size0);
4606 break;
4607 }
4608 case AMDGPU::G_AMDGPU_S_BUFFER_PREFETCH:
4609 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4610 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4611 break;
4612 case AMDGPU::G_AMDGPU_SPONENTRY: {
4613 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4614 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4615 break;
4616 }
4617 case AMDGPU::G_INTRINSIC:
4618 case AMDGPU::G_INTRINSIC_CONVERGENT: {
4619 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
4620 default:
4621 return getInvalidInstructionMapping();
4622 case Intrinsic::amdgcn_div_fmas:
4623 case Intrinsic::amdgcn_div_fixup:
4624 case Intrinsic::amdgcn_trig_preop:
4625 case Intrinsic::amdgcn_sin:
4626 case Intrinsic::amdgcn_cos:
4627 case Intrinsic::amdgcn_log_clamp:
4628 case Intrinsic::amdgcn_rcp_legacy:
4629 case Intrinsic::amdgcn_rsq_legacy:
4630 case Intrinsic::amdgcn_rsq_clamp:
4631 case Intrinsic::amdgcn_tanh:
4632 case Intrinsic::amdgcn_fmul_legacy:
4633 case Intrinsic::amdgcn_fma_legacy:
4634 case Intrinsic::amdgcn_frexp_mant:
4635 case Intrinsic::amdgcn_frexp_exp:
4636 case Intrinsic::amdgcn_fract:
4637 case Intrinsic::amdgcn_cvt_pknorm_i16:
4638 case Intrinsic::amdgcn_cvt_pknorm_u16:
4639 case Intrinsic::amdgcn_cvt_pk_i16:
4640 case Intrinsic::amdgcn_cvt_pk_u16:
4641 case Intrinsic::amdgcn_cvt_sr_pk_f16_f32:
4642 case Intrinsic::amdgcn_cvt_sr_pk_bf16_f32:
4643 case Intrinsic::amdgcn_cvt_pk_f16_fp8:
4644 case Intrinsic::amdgcn_cvt_pk_f16_bf8:
4645 case Intrinsic::amdgcn_cvt_pk_fp8_f16:
4646 case Intrinsic::amdgcn_cvt_pk_bf8_f16:
4647 case Intrinsic::amdgcn_cvt_sr_fp8_f16:
4648 case Intrinsic::amdgcn_cvt_sr_bf8_f16:
4649 case Intrinsic::amdgcn_cvt_scale_pk8_f16_fp8:
4650 case Intrinsic::amdgcn_cvt_scale_pk8_bf16_fp8:
4651 case Intrinsic::amdgcn_cvt_scale_pk8_f16_bf8:
4652 case Intrinsic::amdgcn_cvt_scale_pk8_bf16_bf8:
4653 case Intrinsic::amdgcn_cvt_scale_pk8_f16_fp4:
4654 case Intrinsic::amdgcn_cvt_scale_pk8_bf16_fp4:
4655 case Intrinsic::amdgcn_cvt_scale_pk8_f32_fp8:
4656 case Intrinsic::amdgcn_cvt_scale_pk8_f32_bf8:
4657 case Intrinsic::amdgcn_cvt_scale_pk8_f32_fp4:
4658 case Intrinsic::amdgcn_cvt_scale_pk16_f16_fp6:
4659 case Intrinsic::amdgcn_cvt_scale_pk16_bf16_fp6:
4660 case Intrinsic::amdgcn_cvt_scale_pk16_f16_bf6:
4661 case Intrinsic::amdgcn_cvt_scale_pk16_bf16_bf6:
4662 case Intrinsic::amdgcn_cvt_scale_pk16_f32_fp6:
4663 case Intrinsic::amdgcn_cvt_scale_pk16_f32_bf6:
4664 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_bf16:
4665 case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_bf16:
4666 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_f16:
4667 case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_f16:
4668 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_f32:
4669 case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_f32:
4670 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_f32:
4671 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_f16:
4672 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_bf16:
4673 case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_f32:
4674 case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_f32:
4675 case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_f16:
4676 case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_f16:
4677 case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_bf16:
4678 case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_bf16:
4679 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_bf16:
4680 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_bf16:
4681 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_f16:
4682 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_f16:
4683 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_f32:
4684 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_f32:
4685 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_f32:
4686 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_f16:
4687 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_bf16:
4688 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_f32:
4689 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_f32:
4690 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_f16:
4691 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_f16:
4692 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_bf16:
4693 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_bf16:
4694 case Intrinsic::amdgcn_sat_pk4_i4_i8:
4695 case Intrinsic::amdgcn_sat_pk4_u4_u8:
4696 case Intrinsic::amdgcn_fmed3:
4697 case Intrinsic::amdgcn_cubeid:
4698 case Intrinsic::amdgcn_cubema:
4699 case Intrinsic::amdgcn_cubesc:
4700 case Intrinsic::amdgcn_cubetc:
4701 case Intrinsic::amdgcn_sffbh:
4702 case Intrinsic::amdgcn_fmad_ftz:
4703 case Intrinsic::amdgcn_mbcnt_lo:
4704 case Intrinsic::amdgcn_mbcnt_hi:
4705 case Intrinsic::amdgcn_mul_u24:
4706 case Intrinsic::amdgcn_mul_i24:
4707 case Intrinsic::amdgcn_mulhi_u24:
4708 case Intrinsic::amdgcn_mulhi_i24:
4709 case Intrinsic::amdgcn_lerp:
4710 case Intrinsic::amdgcn_sad_u8:
4711 case Intrinsic::amdgcn_msad_u8:
4712 case Intrinsic::amdgcn_sad_hi_u8:
4713 case Intrinsic::amdgcn_sad_u16:
4714 case Intrinsic::amdgcn_qsad_pk_u16_u8:
4715 case Intrinsic::amdgcn_mqsad_pk_u16_u8:
4716 case Intrinsic::amdgcn_mqsad_u32_u8:
4717 case Intrinsic::amdgcn_cvt_pk_u8_f32:
4718 case Intrinsic::amdgcn_alignbyte:
4719 case Intrinsic::amdgcn_perm:
4720 case Intrinsic::amdgcn_prng_b32:
4721 case Intrinsic::amdgcn_fdot2:
4722 case Intrinsic::amdgcn_sdot2:
4723 case Intrinsic::amdgcn_udot2:
4724 case Intrinsic::amdgcn_sdot4:
4725 case Intrinsic::amdgcn_udot4:
4726 case Intrinsic::amdgcn_sdot8:
4727 case Intrinsic::amdgcn_udot8:
4728 case Intrinsic::amdgcn_fdot2_bf16_bf16:
4729 case Intrinsic::amdgcn_fdot2_f16_f16:
4730 case Intrinsic::amdgcn_fdot2_f32_bf16:
4731 case Intrinsic::amdgcn_fdot2c_f32_bf16:
4732 case Intrinsic::amdgcn_sudot4:
4733 case Intrinsic::amdgcn_sudot8:
4734 case Intrinsic::amdgcn_dot4_f32_fp8_bf8:
4735 case Intrinsic::amdgcn_dot4_f32_bf8_fp8:
4736 case Intrinsic::amdgcn_dot4_f32_fp8_fp8:
4737 case Intrinsic::amdgcn_dot4_f32_bf8_bf8:
4738 case Intrinsic::amdgcn_cvt_f32_fp8:
4739 case Intrinsic::amdgcn_cvt_f32_fp8_e5m3:
4740 case Intrinsic::amdgcn_cvt_f32_bf8:
4741 case Intrinsic::amdgcn_cvt_off_f32_i4:
4742 case Intrinsic::amdgcn_cvt_pk_f32_fp8:
4743 case Intrinsic::amdgcn_cvt_pk_f32_bf8:
4744 case Intrinsic::amdgcn_cvt_pk_fp8_f32:
4745 case Intrinsic::amdgcn_cvt_pk_fp8_f32_e5m3:
4746 case Intrinsic::amdgcn_cvt_pk_bf8_f32:
4747 case Intrinsic::amdgcn_cvt_sr_fp8_f32:
4748 case Intrinsic::amdgcn_cvt_sr_fp8_f32_e5m3:
4749 case Intrinsic::amdgcn_cvt_sr_bf8_f32:
4750 case Intrinsic::amdgcn_cvt_sr_bf16_f32:
4751 case Intrinsic::amdgcn_cvt_sr_f16_f32:
4752 case Intrinsic::amdgcn_cvt_f16_fp8:
4753 case Intrinsic::amdgcn_cvt_f16_bf8:
4754 case Intrinsic::amdgcn_cvt_scalef32_pk32_fp6_f16:
4755 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf6_f16:
4756 case Intrinsic::amdgcn_cvt_scalef32_pk32_fp6_bf16:
4757 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf6_bf16:
4758 case Intrinsic::amdgcn_cvt_scalef32_f16_fp8:
4759 case Intrinsic::amdgcn_cvt_scalef32_f16_bf8:
4760 case Intrinsic::amdgcn_cvt_scalef32_f32_fp8:
4761 case Intrinsic::amdgcn_cvt_scalef32_f32_bf8:
4762 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_f32:
4763 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_f32:
4764 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_fp8:
4765 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_bf8:
4766 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_f16:
4767 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_bf16:
4768 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_f16:
4769 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_bf16:
4770 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_fp4:
4771 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_f32:
4772 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_fp4:
4773 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_fp4:
4774 case Intrinsic::amdgcn_cvt_scalef32_pk32_f32_fp6:
4775 case Intrinsic::amdgcn_cvt_scalef32_pk32_f32_bf6:
4776 case Intrinsic::amdgcn_cvt_scalef32_pk32_f16_bf6:
4777 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf16_bf6:
4778 case Intrinsic::amdgcn_cvt_scalef32_pk32_f16_fp6:
4779 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf16_fp6:
4780 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_bf8:
4781 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_bf8:
4782 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_fp8:
4783 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_fp8:
4784 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_f16:
4785 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_bf16:
4786 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_f16:
4787 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_bf16:
4788 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_f32:
4789 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_bf16:
4790 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_f16:
4791 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_f32:
4792 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_bf16:
4793 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_f16:
4794 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_f32:
4795 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_bf16:
4796 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_f16:
4797 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_f32:
4798 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_bf16:
4799 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_f16:
4800 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_f32:
4801 case Intrinsic::amdgcn_ashr_pk_i8_i32:
4802 case Intrinsic::amdgcn_ashr_pk_u8_i32:
4803 case Intrinsic::amdgcn_cvt_scalef32_2xpk16_fp6_f32:
4804 case Intrinsic::amdgcn_cvt_scalef32_2xpk16_bf6_f32:
4805 case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16:
4806 case Intrinsic::amdgcn_wmma_f16_16x16x16_f16:
4807 case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16_tied:
4808 case Intrinsic::amdgcn_wmma_f16_16x16x16_f16_tied:
4809 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf16:
4810 case Intrinsic::amdgcn_wmma_f32_16x16x16_f16:
4811 case Intrinsic::amdgcn_wmma_i32_16x16x16_iu4:
4812 case Intrinsic::amdgcn_wmma_i32_16x16x16_iu8:
4813 case Intrinsic::amdgcn_wmma_f32_16x16x16_fp8_fp8:
4814 case Intrinsic::amdgcn_wmma_f32_16x16x16_fp8_bf8:
4815 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf8_fp8:
4816 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf8_bf8:
4817 case Intrinsic::amdgcn_wmma_i32_16x16x32_iu4:
4818 case Intrinsic::amdgcn_swmmac_f32_16x16x32_f16:
4819 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf16:
4820 case Intrinsic::amdgcn_swmmac_f16_16x16x32_f16:
4821 case Intrinsic::amdgcn_swmmac_bf16_16x16x32_bf16:
4822 case Intrinsic::amdgcn_swmmac_i32_16x16x32_iu8:
4823 case Intrinsic::amdgcn_swmmac_i32_16x16x32_iu4:
4824 case Intrinsic::amdgcn_swmmac_i32_16x16x64_iu4:
4825 case Intrinsic::amdgcn_swmmac_f32_16x16x32_fp8_fp8:
4826 case Intrinsic::amdgcn_swmmac_f32_16x16x32_fp8_bf8:
4827 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf8_fp8:
4828 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf8_bf8:
4829 case Intrinsic::amdgcn_wmma_f32_16x16x4_f32:
4830 case Intrinsic::amdgcn_wmma_f32_16x16x32_bf16:
4831 case Intrinsic::amdgcn_wmma_f32_16x16x32_f16:
4832 case Intrinsic::amdgcn_wmma_f16_16x16x32_f16:
4833 case Intrinsic::amdgcn_wmma_bf16_16x16x32_bf16:
4834 case Intrinsic::amdgcn_wmma_bf16f32_16x16x32_bf16:
4835 case Intrinsic::amdgcn_wmma_f32_16x16x64_fp8_fp8:
4836 case Intrinsic::amdgcn_wmma_f32_16x16x64_fp8_bf8:
4837 case Intrinsic::amdgcn_wmma_f32_16x16x64_bf8_fp8:
4838 case Intrinsic::amdgcn_wmma_f32_16x16x64_bf8_bf8:
4839 case Intrinsic::amdgcn_wmma_f16_16x16x64_fp8_fp8:
4840 case Intrinsic::amdgcn_wmma_f16_16x16x64_fp8_bf8:
4841 case Intrinsic::amdgcn_wmma_f16_16x16x64_bf8_fp8:
4842 case Intrinsic::amdgcn_wmma_f16_16x16x64_bf8_bf8:
4843 case Intrinsic::amdgcn_wmma_f16_16x16x128_fp8_fp8:
4844 case Intrinsic::amdgcn_wmma_f16_16x16x128_fp8_bf8:
4845 case Intrinsic::amdgcn_wmma_f16_16x16x128_bf8_fp8:
4846 case Intrinsic::amdgcn_wmma_f16_16x16x128_bf8_bf8:
4847 case Intrinsic::amdgcn_wmma_f32_16x16x128_fp8_fp8:
4848 case Intrinsic::amdgcn_wmma_f32_16x16x128_fp8_bf8:
4849 case Intrinsic::amdgcn_wmma_f32_16x16x128_bf8_fp8:
4850 case Intrinsic::amdgcn_wmma_f32_16x16x128_bf8_bf8:
4851 case Intrinsic::amdgcn_wmma_i32_16x16x64_iu8:
4852 case Intrinsic::amdgcn_wmma_f32_16x16x128_f8f6f4:
4853 case Intrinsic::amdgcn_wmma_scale_f32_16x16x128_f8f6f4:
4854 case Intrinsic::amdgcn_wmma_scale16_f32_16x16x128_f8f6f4:
4855 case Intrinsic::amdgcn_wmma_f32_32x16x128_f4:
4856 case Intrinsic::amdgcn_wmma_scale_f32_32x16x128_f4:
4857 case Intrinsic::amdgcn_wmma_scale16_f32_32x16x128_f4:
4858 case Intrinsic::amdgcn_swmmac_f16_16x16x64_f16:
4859 case Intrinsic::amdgcn_swmmac_bf16_16x16x64_bf16:
4860 case Intrinsic::amdgcn_swmmac_f32_16x16x64_bf16:
4861 case Intrinsic::amdgcn_swmmac_bf16f32_16x16x64_bf16:
4862 case Intrinsic::amdgcn_swmmac_f32_16x16x64_f16:
4863 case Intrinsic::amdgcn_swmmac_f32_16x16x128_fp8_fp8:
4864 case Intrinsic::amdgcn_swmmac_f32_16x16x128_fp8_bf8:
4865 case Intrinsic::amdgcn_swmmac_f32_16x16x128_bf8_fp8:
4866 case Intrinsic::amdgcn_swmmac_f32_16x16x128_bf8_bf8:
4867 case Intrinsic::amdgcn_swmmac_f16_16x16x128_fp8_fp8:
4868 case Intrinsic::amdgcn_swmmac_f16_16x16x128_fp8_bf8:
4869 case Intrinsic::amdgcn_swmmac_f16_16x16x128_bf8_fp8:
4870 case Intrinsic::amdgcn_swmmac_f16_16x16x128_bf8_bf8:
4871 case Intrinsic::amdgcn_swmmac_i32_16x16x128_iu8:
4872 case Intrinsic::amdgcn_perm_pk16_b4_u4:
4873 case Intrinsic::amdgcn_perm_pk16_b6_u4:
4874 case Intrinsic::amdgcn_perm_pk16_b8_u4:
4875 case Intrinsic::amdgcn_add_max_i32:
4876 case Intrinsic::amdgcn_add_max_u32:
4877 case Intrinsic::amdgcn_add_min_i32:
4878 case Intrinsic::amdgcn_add_min_u32:
4879 case Intrinsic::amdgcn_pk_add_max_i16:
4880 case Intrinsic::amdgcn_pk_add_max_u16:
4881 case Intrinsic::amdgcn_pk_add_min_i16:
4882 case Intrinsic::amdgcn_pk_add_min_u16:
4883 return getDefaultMappingVOP(MI);
4884 case Intrinsic::amdgcn_log:
4885 case Intrinsic::amdgcn_exp2:
4886 case Intrinsic::amdgcn_rcp:
4887 case Intrinsic::amdgcn_rsq:
4888 case Intrinsic::amdgcn_sqrt: {
4889 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4890 if (Subtarget.hasPseudoScalarTrans() && (Size == 16 || Size == 32) &&
4891 isSALUMapping(MI))
4892 return getDefaultMappingSOP(MI);
4893 return getDefaultMappingVOP(MI);
4894 }
4895 case Intrinsic::amdgcn_sbfe:
4896 case Intrinsic::amdgcn_ubfe:
4897 if (isSALUMapping(MI))
4898 return getDefaultMappingSOP(MI);
4899 return getDefaultMappingVOP(MI);
4900 case Intrinsic::amdgcn_ds_swizzle:
4901 case Intrinsic::amdgcn_ds_permute:
4902 case Intrinsic::amdgcn_ds_bpermute:
4903 case Intrinsic::amdgcn_update_dpp:
4904 case Intrinsic::amdgcn_mov_dpp8:
4905 case Intrinsic::amdgcn_mov_dpp:
4906 case Intrinsic::amdgcn_strict_wwm:
4907 case Intrinsic::amdgcn_wwm:
4908 case Intrinsic::amdgcn_strict_wqm:
4909 case Intrinsic::amdgcn_wqm:
4910 case Intrinsic::amdgcn_softwqm:
4911 case Intrinsic::amdgcn_set_inactive:
4912 case Intrinsic::amdgcn_set_inactive_chain_arg:
4913 case Intrinsic::amdgcn_permlane64:
4914 case Intrinsic::amdgcn_ds_bpermute_fi_b32:
4915 return getDefaultMappingAllVGPR(MI);
4916 case Intrinsic::amdgcn_cvt_pkrtz:
4917 if (Subtarget.hasSALUFloatInsts() && isSALUMapping(MI))
4918 return getDefaultMappingSOP(MI);
4919 return getDefaultMappingVOP(MI);
4920 case Intrinsic::amdgcn_kernarg_segment_ptr:
4921 case Intrinsic::amdgcn_s_getpc:
4922 case Intrinsic::amdgcn_groupstaticsize:
4923 case Intrinsic::amdgcn_reloc_constant:
4924 case Intrinsic::returnaddress: {
4925 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4926 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4927 break;
4928 }
4929 case Intrinsic::amdgcn_wqm_vote: {
4930 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4931 OpdsMapping[0] = OpdsMapping[2]
4932 = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size);
4933 break;
4934 }
4935 case Intrinsic::amdgcn_ps_live: {
4936 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4937 break;
4938 }
4939 case Intrinsic::amdgcn_div_scale: {
4940 unsigned Dst0Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4941 unsigned Dst1Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4942 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Dst0Size);
4943 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Dst1Size);
4944
4945 unsigned SrcSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
4946 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4947 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4948 break;
4949 }
4950 case Intrinsic::amdgcn_class: {
4951 Register Src0Reg = MI.getOperand(2).getReg();
4952 Register Src1Reg = MI.getOperand(3).getReg();
4953 unsigned Src0Size = MRI.getType(Src0Reg).getSizeInBits();
4954 unsigned Src1Size = MRI.getType(Src1Reg).getSizeInBits();
4955 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4956 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
4957 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src0Size);
4958 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src1Size);
4959 break;
4960 }
4961 case Intrinsic::amdgcn_icmp:
4962 case Intrinsic::amdgcn_fcmp: {
4963 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4964 // This is not VCCRegBank because this is not used in boolean contexts.
4965 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4966 unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4967 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
4968 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
4969 break;
4970 }
4971 case Intrinsic::amdgcn_readlane: {
4972 // This must be an SGPR, but accept a VGPR.
4973 Register IdxReg = MI.getOperand(3).getReg();
4974 unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
4975 unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
4976 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4977 [[fallthrough]];
4978 }
4979 case Intrinsic::amdgcn_readfirstlane: {
4980 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4981 unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4982 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4983 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4984 break;
4985 }
4986 case Intrinsic::amdgcn_writelane: {
4987 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4988 Register SrcReg = MI.getOperand(2).getReg();
4989 unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
4990 unsigned SrcBank = getRegBankID(SrcReg, MRI, AMDGPU::SGPRRegBankID);
4991 Register IdxReg = MI.getOperand(3).getReg();
4992 unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
4993 unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
4994 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4995
4996 // These 2 must be SGPRs, but accept VGPRs. Readfirstlane will be inserted
4997 // to legalize.
4998 OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, SrcSize);
4999 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
5000 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
5001 break;
5002 }
5003 case Intrinsic::amdgcn_if_break: {
5004 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
5005 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5006 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5007 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5008 break;
5009 }
5010 case Intrinsic::amdgcn_permlane16:
5011 case Intrinsic::amdgcn_permlanex16: {
5012 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
5013 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5014 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5015 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5016 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5017 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5018 break;
5019 }
5020 case Intrinsic::amdgcn_permlane_bcast:
5021 case Intrinsic::amdgcn_permlane_up:
5022 case Intrinsic::amdgcn_permlane_down:
5023 case Intrinsic::amdgcn_permlane_xor: {
5024 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
5025 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5026 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5027 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5028 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5029 break;
5030 }
5031 case Intrinsic::amdgcn_permlane_idx_gen: {
5032 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
5033 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5034 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5035 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5036 break;
5037 }
5038 case Intrinsic::amdgcn_permlane16_var:
5039 case Intrinsic::amdgcn_permlanex16_var: {
5040 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
5041 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5042 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5043 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5044 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5045 break;
5046 }
5047 case Intrinsic::amdgcn_mfma_f32_4x4x1f32:
5048 case Intrinsic::amdgcn_mfma_f32_4x4x4f16:
5049 case Intrinsic::amdgcn_mfma_i32_4x4x4i8:
5050 case Intrinsic::amdgcn_mfma_f32_4x4x2bf16:
5051 case Intrinsic::amdgcn_mfma_f32_16x16x1f32:
5052 case Intrinsic::amdgcn_mfma_f32_16x16x4f32:
5053 case Intrinsic::amdgcn_mfma_f32_16x16x4f16:
5054 case Intrinsic::amdgcn_mfma_f32_16x16x16f16:
5055 case Intrinsic::amdgcn_mfma_i32_16x16x4i8:
5056 case Intrinsic::amdgcn_mfma_i32_16x16x16i8:
5057 case Intrinsic::amdgcn_mfma_f32_16x16x2bf16:
5058 case Intrinsic::amdgcn_mfma_f32_16x16x8bf16:
5059 case Intrinsic::amdgcn_mfma_f32_32x32x1f32:
5060 case Intrinsic::amdgcn_mfma_f32_32x32x2f32:
5061 case Intrinsic::amdgcn_mfma_f32_32x32x4f16:
5062 case Intrinsic::amdgcn_mfma_f32_32x32x8f16:
5063 case Intrinsic::amdgcn_mfma_i32_32x32x4i8:
5064 case Intrinsic::amdgcn_mfma_i32_32x32x8i8:
5065 case Intrinsic::amdgcn_mfma_f32_32x32x2bf16:
5066 case Intrinsic::amdgcn_mfma_f32_32x32x4bf16:
5067 case Intrinsic::amdgcn_mfma_f32_32x32x4bf16_1k:
5068 case Intrinsic::amdgcn_mfma_f32_16x16x4bf16_1k:
5069 case Intrinsic::amdgcn_mfma_f32_4x4x4bf16_1k:
5070 case Intrinsic::amdgcn_mfma_f32_32x32x8bf16_1k:
5071 case Intrinsic::amdgcn_mfma_f32_16x16x16bf16_1k:
5072 case Intrinsic::amdgcn_mfma_f64_16x16x4f64:
5073 case Intrinsic::amdgcn_mfma_f64_4x4x4f64:
5074 case Intrinsic::amdgcn_mfma_i32_16x16x32_i8:
5075 case Intrinsic::amdgcn_mfma_i32_32x32x16_i8:
5076 case Intrinsic::amdgcn_mfma_f32_16x16x8_xf32:
5077 case Intrinsic::amdgcn_mfma_f32_32x32x4_xf32:
5078 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_bf8:
5079 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_fp8:
5080 case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_bf8:
5081 case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_fp8:
5082 case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_bf8:
5083 case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_fp8:
5084 case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_bf8:
5085 case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_fp8:
5086 case Intrinsic::amdgcn_mfma_f32_16x16x32_f16:
5087 case Intrinsic::amdgcn_mfma_f32_32x32x16_f16:
5088 case Intrinsic::amdgcn_mfma_i32_16x16x64_i8:
5089 case Intrinsic::amdgcn_mfma_i32_32x32x32_i8:
5090 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf16: {
5091 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5092 unsigned MinNumRegsRequired = DstSize / 32;
5093
5094 // Default for MAI intrinsics.
5095 // srcC can also be an immediate which can be folded later.
5096 // FIXME: Should we eventually add an alternative mapping with AGPR src
5097 // for srcA/srcB?
5098 //
5099 // vdst, srcA, srcB, srcC
5100 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
5101
5102 bool UseAGPRForm = !Subtarget.hasGFX90AInsts() ||
5103 Info->selectAGPRFormMFMA(MinNumRegsRequired);
5104
5105 OpdsMapping[0] =
5106 UseAGPRForm ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
5107 : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5108 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5109 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5110 OpdsMapping[4] =
5111 UseAGPRForm ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
5112 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5113 break;
5114 }
5115 case Intrinsic::amdgcn_mfma_scale_f32_16x16x128_f8f6f4:
5116 case Intrinsic::amdgcn_mfma_scale_f32_32x32x64_f8f6f4: {
5117 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5118 unsigned MinNumRegsRequired = DstSize / 32;
5119
5120 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
5121 bool UseAGPRForm = Info->selectAGPRFormMFMA(MinNumRegsRequired);
5122
5123 OpdsMapping[0] =
5124 UseAGPRForm ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
5125 : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5126
5127 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5128 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5129 OpdsMapping[4] =
5130 UseAGPRForm ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
5131 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5132
5133 OpdsMapping[8] = getVGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
5134 OpdsMapping[10] = getVGPROpMapping(MI.getOperand(10).getReg(), MRI, *TRI);
5135 break;
5136 }
5137 case Intrinsic::amdgcn_smfmac_f32_16x16x32_f16:
5138 case Intrinsic::amdgcn_smfmac_f32_32x32x16_f16:
5139 case Intrinsic::amdgcn_smfmac_f32_16x16x32_bf16:
5140 case Intrinsic::amdgcn_smfmac_f32_32x32x16_bf16:
5141 case Intrinsic::amdgcn_smfmac_i32_16x16x64_i8:
5142 case Intrinsic::amdgcn_smfmac_i32_32x32x32_i8:
5143 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_bf8:
5144 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_fp8:
5145 case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_bf8:
5146 case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_fp8:
5147 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_bf8:
5148 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_fp8:
5149 case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_bf8:
5150 case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_fp8:
5151 case Intrinsic::amdgcn_smfmac_f32_16x16x64_f16:
5152 case Intrinsic::amdgcn_smfmac_f32_32x32x32_f16:
5153 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf16:
5154 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf16:
5155 case Intrinsic::amdgcn_smfmac_i32_16x16x128_i8:
5156 case Intrinsic::amdgcn_smfmac_i32_32x32x64_i8:
5157 case Intrinsic::amdgcn_smfmac_f32_16x16x128_bf8_bf8:
5158 case Intrinsic::amdgcn_smfmac_f32_16x16x128_bf8_fp8:
5159 case Intrinsic::amdgcn_smfmac_f32_16x16x128_fp8_bf8:
5160 case Intrinsic::amdgcn_smfmac_f32_16x16x128_fp8_fp8:
5161 case Intrinsic::amdgcn_smfmac_f32_32x32x64_bf8_bf8:
5162 case Intrinsic::amdgcn_smfmac_f32_32x32x64_bf8_fp8:
5163 case Intrinsic::amdgcn_smfmac_f32_32x32x64_fp8_bf8:
5164 case Intrinsic::amdgcn_smfmac_f32_32x32x64_fp8_fp8: {
5165 Register DstReg = MI.getOperand(0).getReg();
5166 unsigned DstSize = MRI.getType(DstReg).getSizeInBits();
5167 unsigned MinNumRegsRequired = DstSize / 32;
5168 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
5169 bool UseAGPRForm = Info->selectAGPRFormMFMA(MinNumRegsRequired);
5170
5171 // vdst, srcA, srcB, srcC, idx
5172 OpdsMapping[0] = UseAGPRForm ? getAGPROpMapping(DstReg, MRI, *TRI)
5173 : getVGPROpMapping(DstReg, MRI, *TRI);
5174
5175 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5176 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5177 OpdsMapping[4] =
5178 UseAGPRForm ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
5179 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5180 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5181 break;
5182 }
5183 case Intrinsic::amdgcn_interp_p1:
5184 case Intrinsic::amdgcn_interp_p2:
5185 case Intrinsic::amdgcn_interp_mov:
5186 case Intrinsic::amdgcn_interp_p1_f16:
5187 case Intrinsic::amdgcn_interp_p2_f16:
5188 case Intrinsic::amdgcn_lds_param_load: {
5189 const int M0Idx = MI.getNumOperands() - 1;
5190 Register M0Reg = MI.getOperand(M0Idx).getReg();
5191 unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
5192 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5193
5194 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5195 for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
5196 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5197
5198 // Must be SGPR, but we must take whatever the original bank is and fix it
5199 // later.
5200 OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
5201 break;
5202 }
5203 case Intrinsic::amdgcn_interp_inreg_p10:
5204 case Intrinsic::amdgcn_interp_inreg_p2:
5205 case Intrinsic::amdgcn_interp_inreg_p10_f16:
5206 case Intrinsic::amdgcn_interp_inreg_p2_f16:
5207 case Intrinsic::amdgcn_interp_p10_rtz_f16:
5208 case Intrinsic::amdgcn_interp_p2_rtz_f16: {
5209 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5210 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5211 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5212 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5213 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5214 break;
5215 }
5216 case Intrinsic::amdgcn_permlane16_swap:
5217 case Intrinsic::amdgcn_permlane32_swap: {
5218 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5219 OpdsMapping[0] = OpdsMapping[1] = OpdsMapping[3] = OpdsMapping[4] =
5220 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5221 break;
5222 }
5223 case Intrinsic::amdgcn_ballot: {
5224 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5225 unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5226 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
5227 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, SrcSize);
5228 break;
5229 }
5230 case Intrinsic::amdgcn_inverse_ballot: {
5231 // This must be an SGPR, but accept a VGPR.
5232 Register MaskReg = MI.getOperand(2).getReg();
5233 unsigned MaskSize = MRI.getType(MaskReg).getSizeInBits();
5234 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
5235 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5236 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
5237 break;
5238 }
5239 case Intrinsic::amdgcn_bitop3: {
5240 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
5241 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5242 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5243 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5244 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5245 break;
5246 }
5247 case Intrinsic::amdgcn_s_quadmask:
5248 case Intrinsic::amdgcn_s_wqm: {
5249 Register MaskReg = MI.getOperand(2).getReg();
5250 unsigned MaskSize = MRI.getType(MaskReg).getSizeInBits();
5251 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
5252 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, MaskSize);
5253 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
5254 break;
5255 }
5256 case Intrinsic::amdgcn_wave_reduce_add:
5257 case Intrinsic::amdgcn_wave_reduce_fadd:
5258 case Intrinsic::amdgcn_wave_reduce_sub:
5259 case Intrinsic::amdgcn_wave_reduce_fsub:
5260 case Intrinsic::amdgcn_wave_reduce_min:
5261 case Intrinsic::amdgcn_wave_reduce_umin:
5262 case Intrinsic::amdgcn_wave_reduce_fmin:
5263 case Intrinsic::amdgcn_wave_reduce_max:
5264 case Intrinsic::amdgcn_wave_reduce_umax:
5265 case Intrinsic::amdgcn_wave_reduce_fmax:
5266 case Intrinsic::amdgcn_wave_reduce_and:
5267 case Intrinsic::amdgcn_wave_reduce_or:
5268 case Intrinsic::amdgcn_wave_reduce_xor: {
5269 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5270 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
5271 unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5272 auto regBankID =
5273 isSALUMapping(MI) ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
5274 OpdsMapping[2] = AMDGPU::getValueMapping(regBankID, OpSize);
5275 break;
5276 }
5277 case Intrinsic::amdgcn_s_bitreplicate: {
5278 Register MaskReg = MI.getOperand(2).getReg();
5279 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
5280 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
5281 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, 32);
5282 break;
5283 }
5284 case Intrinsic::amdgcn_wave_shuffle: {
5285 unsigned OpSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5286 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
5287 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
5288 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
5289 break;
5290 }
5291 }
5292 break;
5293 }
5294 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
5295 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
5296 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_NORET:
5297 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
5298 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
5299 auto IntrID = AMDGPU::getIntrinsicID(MI);
5300 const AMDGPU::RsrcIntrinsic *RSrcIntrin = AMDGPU::lookupRsrcIntrinsic(IntrID);
5301 assert(RSrcIntrin && "missing RsrcIntrinsic for image intrinsic");
5302 // Non-images can have complications from operands that allow both SGPR
5303 // and VGPR. For now it's too complicated to figure out the final opcode
5304 // to derive the register bank from the MCInstrDesc.
5305 assert(RSrcIntrin->IsImage);
5306 return getImageMapping(MRI, MI, RSrcIntrin->RsrcArg);
5307 }
5308 case AMDGPU::G_AMDGPU_BVH_INTERSECT_RAY:
5309 case AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY:
5310 case AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY: {
5311 bool IsDualOrBVH8 =
5312 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY ||
5313 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY;
5314 unsigned NumMods = IsDualOrBVH8 ? 0 : 1; // Has A16 modifier
5315 unsigned LastRegOpIdx = MI.getNumExplicitOperands() - 1 - NumMods;
5316 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5317 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5318 if (IsDualOrBVH8) {
5319 OpdsMapping[1] = AMDGPU::getValueMapping(
5320 AMDGPU::VGPRRegBankID,
5321 MRI.getType(MI.getOperand(1).getReg()).getSizeInBits());
5322 OpdsMapping[2] = AMDGPU::getValueMapping(
5323 AMDGPU::VGPRRegBankID,
5324 MRI.getType(MI.getOperand(2).getReg()).getSizeInBits());
5325 }
5326 OpdsMapping[LastRegOpIdx] =
5327 getSGPROpMapping(MI.getOperand(LastRegOpIdx).getReg(), MRI, *TRI);
5328 if (LastRegOpIdx == 3) {
5329 // Sequential form: all operands combined into VGPR256/VGPR512
5330 unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5331 if (Size > 256)
5332 Size = 512;
5333 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5334 } else {
5335 // NSA form
5336 unsigned FirstSrcOpIdx = IsDualOrBVH8 ? 4 : 2;
5337 for (unsigned I = FirstSrcOpIdx; I < LastRegOpIdx; ++I) {
5338 unsigned Size = MRI.getType(MI.getOperand(I).getReg()).getSizeInBits();
5339 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5340 }
5341 }
5342 break;
5343 }
5344 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
5345 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS: {
5346 auto IntrID = cast<GIntrinsic>(MI).getIntrinsicID();
5347 switch (IntrID) {
5348 case Intrinsic::amdgcn_s_getreg:
5349 case Intrinsic::amdgcn_s_memtime:
5350 case Intrinsic::amdgcn_s_memrealtime:
5351 case Intrinsic::amdgcn_s_get_waveid_in_workgroup:
5352 case Intrinsic::amdgcn_s_sendmsg_rtn: {
5353 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5354 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5355 break;
5356 }
5357 case Intrinsic::amdgcn_global_atomic_fmin_num:
5358 case Intrinsic::amdgcn_global_atomic_fmax_num:
5359 case Intrinsic::amdgcn_flat_atomic_fmin_num:
5360 case Intrinsic::amdgcn_flat_atomic_fmax_num:
5361 case Intrinsic::amdgcn_global_atomic_ordered_add_b64:
5362 case Intrinsic::amdgcn_global_load_tr_b64:
5363 case Intrinsic::amdgcn_global_load_tr_b128:
5364 case Intrinsic::amdgcn_global_load_tr4_b64:
5365 case Intrinsic::amdgcn_global_load_tr6_b96:
5366 case Intrinsic::amdgcn_ds_load_tr8_b64:
5367 case Intrinsic::amdgcn_ds_load_tr16_b128:
5368 case Intrinsic::amdgcn_ds_load_tr4_b64:
5369 case Intrinsic::amdgcn_ds_load_tr6_b96:
5370 case Intrinsic::amdgcn_ds_read_tr4_b64:
5371 case Intrinsic::amdgcn_ds_read_tr6_b96:
5372 case Intrinsic::amdgcn_ds_read_tr8_b64:
5373 case Intrinsic::amdgcn_ds_read_tr16_b64:
5374 case Intrinsic::amdgcn_ds_atomic_async_barrier_arrive_b64:
5375 case Intrinsic::amdgcn_ds_atomic_barrier_arrive_rtn_b64:
5376 return getDefaultMappingAllVGPR(MI);
5377 case Intrinsic::amdgcn_ds_ordered_add:
5378 case Intrinsic::amdgcn_ds_ordered_swap: {
5379 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5380 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5381 unsigned M0Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5382 AMDGPU::SGPRRegBankID);
5383 OpdsMapping[2] = AMDGPU::getValueMapping(M0Bank, 32);
5384 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5385 break;
5386 }
5387 case Intrinsic::amdgcn_ds_append:
5388 case Intrinsic::amdgcn_ds_consume: {
5389 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5390 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5391 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5392 break;
5393 }
5394 case Intrinsic::amdgcn_exp_compr:
5395 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5396 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5397 break;
5398 case Intrinsic::amdgcn_exp:
5399 // FIXME: Could we support packed types here?
5400 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5401 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5402 OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5403 OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5404 break;
5405 case Intrinsic::amdgcn_exp_row:
5406 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5407 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5408 OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5409 OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5410 OpdsMapping[8] = getSGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
5411 break;
5412 case Intrinsic::amdgcn_s_alloc_vgpr:
5413 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1);
5414 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
5415 break;
5416 case Intrinsic::amdgcn_s_sendmsg:
5417 case Intrinsic::amdgcn_s_sendmsghalt: {
5418 // This must be an SGPR, but accept a VGPR.
5419 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5420 AMDGPU::SGPRRegBankID);
5421 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5422 break;
5423 }
5424 case Intrinsic::amdgcn_s_setreg: {
5425 // This must be an SGPR, but accept a VGPR.
5426 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5427 AMDGPU::SGPRRegBankID);
5428 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5429 break;
5430 }
5431 case Intrinsic::amdgcn_s_ttracedata: {
5432 // This must be an SGPR, but accept a VGPR.
5433 unsigned Bank =
5434 getRegBankID(MI.getOperand(1).getReg(), MRI, AMDGPU::SGPRRegBankID);
5435 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
5436 break;
5437 }
5438 case Intrinsic::amdgcn_end_cf: {
5439 unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5440 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5441 break;
5442 }
5443 case Intrinsic::amdgcn_else: {
5444 unsigned WaveSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5445 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5446 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
5447 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
5448 break;
5449 }
5450 case Intrinsic::amdgcn_init_whole_wave:
5451 case Intrinsic::amdgcn_live_mask: {
5452 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5453 break;
5454 }
5455 case Intrinsic::amdgcn_wqm_demote:
5456 case Intrinsic::amdgcn_kill: {
5457 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5458 break;
5459 }
5460 case Intrinsic::amdgcn_raw_buffer_load:
5461 case Intrinsic::amdgcn_raw_ptr_buffer_load:
5462 case Intrinsic::amdgcn_raw_atomic_buffer_load:
5463 case Intrinsic::amdgcn_raw_ptr_atomic_buffer_load:
5464 case Intrinsic::amdgcn_raw_tbuffer_load:
5465 case Intrinsic::amdgcn_raw_ptr_tbuffer_load: {
5466 // FIXME: Should make intrinsic ID the last operand of the instruction,
5467 // then this would be the same as store
5468 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5469 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5470 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5471 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5472 break;
5473 }
5474 case Intrinsic::amdgcn_raw_buffer_load_lds:
5475 case Intrinsic::amdgcn_raw_buffer_load_async_lds:
5476 case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
5477 case Intrinsic::amdgcn_raw_ptr_buffer_load_async_lds: {
5478 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5479 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5480 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5481 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5482 break;
5483 }
5484 case Intrinsic::amdgcn_raw_buffer_store:
5485 case Intrinsic::amdgcn_raw_ptr_buffer_store:
5486 case Intrinsic::amdgcn_raw_buffer_store_format:
5487 case Intrinsic::amdgcn_raw_ptr_buffer_store_format:
5488 case Intrinsic::amdgcn_raw_tbuffer_store:
5489 case Intrinsic::amdgcn_raw_ptr_tbuffer_store: {
5490 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5491 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5492 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5493 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5494 break;
5495 }
5496 case Intrinsic::amdgcn_struct_buffer_load:
5497 case Intrinsic::amdgcn_struct_ptr_buffer_load:
5498 case Intrinsic::amdgcn_struct_tbuffer_load:
5499 case Intrinsic::amdgcn_struct_ptr_tbuffer_load:
5500 case Intrinsic::amdgcn_struct_atomic_buffer_load:
5501 case Intrinsic::amdgcn_struct_ptr_atomic_buffer_load: {
5502 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5503 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5504 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5505 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5506 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5507 break;
5508 }
5509 case Intrinsic::amdgcn_struct_buffer_load_lds:
5510 case Intrinsic::amdgcn_struct_buffer_load_async_lds:
5511 case Intrinsic::amdgcn_struct_ptr_buffer_load_lds:
5512 case Intrinsic::amdgcn_struct_ptr_buffer_load_async_lds: {
5513 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5514 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5515 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5516 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5517 OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
5518 break;
5519 }
5520 case Intrinsic::amdgcn_struct_buffer_store:
5521 case Intrinsic::amdgcn_struct_ptr_buffer_store:
5522 case Intrinsic::amdgcn_struct_tbuffer_store:
5523 case Intrinsic::amdgcn_struct_ptr_tbuffer_store: {
5524 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5525 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5526 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5527 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5528 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5529 break;
5530 }
5531 case Intrinsic::amdgcn_init_exec_from_input: {
5532 unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5533 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5534 break;
5535 }
5536 case Intrinsic::amdgcn_ds_gws_init:
5537 case Intrinsic::amdgcn_ds_gws_barrier:
5538 case Intrinsic::amdgcn_ds_gws_sema_br: {
5539 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5540
5541 // This must be an SGPR, but accept a VGPR.
5542 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5543 AMDGPU::SGPRRegBankID);
5544 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5545 break;
5546 }
5547 case Intrinsic::amdgcn_ds_gws_sema_v:
5548 case Intrinsic::amdgcn_ds_gws_sema_p:
5549 case Intrinsic::amdgcn_ds_gws_sema_release_all: {
5550 // This must be an SGPR, but accept a VGPR.
5551 unsigned Bank = getRegBankID(MI.getOperand(1).getReg(), MRI,
5552 AMDGPU::SGPRRegBankID);
5553 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
5554 break;
5555 }
5556 case Intrinsic::amdgcn_cluster_load_b32:
5557 case Intrinsic::amdgcn_cluster_load_b64:
5558 case Intrinsic::amdgcn_cluster_load_b128: {
5559 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5560 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5561 unsigned M0Bank =
5562 getRegBankID(MI.getOperand(4).getReg(), MRI, AMDGPU::SGPRRegBankID);
5563 OpdsMapping[4] = AMDGPU::getValueMapping(M0Bank, 32);
5564 break;
5565 }
5566 case Intrinsic::amdgcn_cluster_load_async_to_lds_b8:
5567 case Intrinsic::amdgcn_cluster_load_async_to_lds_b32:
5568 case Intrinsic::amdgcn_cluster_load_async_to_lds_b64:
5569 case Intrinsic::amdgcn_cluster_load_async_to_lds_b128: {
5570 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5571 // LDS address goes into $vdst (VGPR).
5572 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5573 unsigned M0Bank =
5574 getRegBankID(MI.getOperand(5).getReg(), MRI, AMDGPU::SGPRRegBankID);
5575 OpdsMapping[5] = AMDGPU::getValueMapping(M0Bank, 32);
5576 break;
5577 }
5578 case Intrinsic::amdgcn_global_store_async_from_lds_b8:
5579 case Intrinsic::amdgcn_global_store_async_from_lds_b32:
5580 case Intrinsic::amdgcn_global_store_async_from_lds_b64:
5581 case Intrinsic::amdgcn_global_store_async_from_lds_b128:
5582 case Intrinsic::amdgcn_global_load_async_to_lds_b8:
5583 case Intrinsic::amdgcn_global_load_async_to_lds_b32:
5584 case Intrinsic::amdgcn_global_load_async_to_lds_b64:
5585 case Intrinsic::amdgcn_global_load_async_to_lds_b128: {
5586 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5587 // LDS address goes into $vdst/$vdata (VGPR).
5588 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5589 break;
5590 }
5591 case Intrinsic::amdgcn_load_to_lds:
5592 case Intrinsic::amdgcn_load_async_to_lds:
5593 case Intrinsic::amdgcn_global_load_lds:
5594 case Intrinsic::amdgcn_global_load_async_lds: {
5595 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5596 // LDS address goes into M0 (SGPR).
5597 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5598 break;
5599 }
5600 case Intrinsic::amdgcn_lds_direct_load: {
5601 const int M0Idx = MI.getNumOperands() - 1;
5602 Register M0Reg = MI.getOperand(M0Idx).getReg();
5603 unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
5604 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5605
5606 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5607 for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
5608 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5609
5610 // Must be SGPR, but we must take whatever the original bank is and fix it
5611 // later.
5612 OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
5613 break;
5614 }
5615 case Intrinsic::amdgcn_ds_add_gs_reg_rtn:
5616 case Intrinsic::amdgcn_ds_sub_gs_reg_rtn:
5617 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5618 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5619 break;
5620 case Intrinsic::amdgcn_ds_bvh_stack_rtn:
5621 case Intrinsic::amdgcn_ds_bvh_stack_push4_pop1_rtn:
5622 case Intrinsic::amdgcn_ds_bvh_stack_push8_pop1_rtn:
5623 case Intrinsic::amdgcn_ds_bvh_stack_push8_pop2_rtn: {
5624 OpdsMapping[0] =
5625 getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI); // %vdst
5626 OpdsMapping[1] =
5627 getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI); // %addr
5628 OpdsMapping[3] =
5629 getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI); // %addr
5630 OpdsMapping[4] =
5631 getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI); // %data0
5632 OpdsMapping[5] =
5633 getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI); // %data1
5634 break;
5635 }
5636 case Intrinsic::amdgcn_s_sleep_var:
5637 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5638 break;
5639 case Intrinsic::amdgcn_s_barrier_join:
5640 case Intrinsic::amdgcn_s_wakeup_barrier:
5641 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5642 break;
5643 case Intrinsic::amdgcn_s_barrier_init:
5644 case Intrinsic::amdgcn_s_barrier_signal_var:
5645 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5646 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5647 break;
5648 case Intrinsic::amdgcn_s_barrier_signal_isfirst: {
5649 const unsigned ResultSize = 1;
5650 OpdsMapping[0] =
5651 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, ResultSize);
5652 break;
5653 }
5654 case Intrinsic::amdgcn_s_get_barrier_state:
5655 case Intrinsic::amdgcn_s_get_named_barrier_state: {
5656 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5657 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5658 break;
5659 }
5660 case Intrinsic::amdgcn_pops_exiting_wave_id:
5661 return getDefaultMappingSOP(MI);
5662 case Intrinsic::amdgcn_tensor_load_to_lds:
5663 case Intrinsic::amdgcn_tensor_store_from_lds: {
5664 // Lie and claim everything is legal, even though all operands need to be
5665 // SGPRs. applyMapping will have to deal with it with readfirstlane.
5666 for (unsigned I = 1; I < MI.getNumOperands(); ++I) {
5667 if (MI.getOperand(I).isReg()) {
5668 Register Reg = MI.getOperand(I).getReg();
5669 auto OpBank = getRegBankID(Reg, MRI);
5670 unsigned Size = getSizeInBits(Reg, MRI, *TRI);
5671 OpdsMapping[I] = AMDGPU::getValueMapping(OpBank, Size);
5672 }
5673 }
5674 break;
5675 }
5676 case Intrinsic::amdgcn_s_prefetch_data:
5677 case Intrinsic::amdgcn_s_prefetch_inst: {
5678 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5679 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5680 break;
5681 }
5682 case Intrinsic::amdgcn_flat_prefetch:
5683 case Intrinsic::amdgcn_global_prefetch:
5684 return getDefaultMappingVOP(MI);
5685 default:
5686 return getInvalidInstructionMapping();
5687 }
5688 break;
5689 }
5690 case AMDGPU::G_SELECT: {
5691 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5692 unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5693 AMDGPU::SGPRRegBankID);
5694 unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI,
5695 AMDGPU::SGPRRegBankID);
5696 bool SGPRSrcs = Op2Bank == AMDGPU::SGPRRegBankID &&
5697 Op3Bank == AMDGPU::SGPRRegBankID;
5698
5699 unsigned CondBankDefault = SGPRSrcs ?
5700 AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
5701 unsigned CondBank = getRegBankID(MI.getOperand(1).getReg(), MRI,
5702 CondBankDefault);
5703 if (CondBank == AMDGPU::SGPRRegBankID)
5704 CondBank = SGPRSrcs ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
5705 else if (CondBank == AMDGPU::VGPRRegBankID)
5706 CondBank = AMDGPU::VCCRegBankID;
5707
5708 unsigned Bank = SGPRSrcs && CondBank == AMDGPU::SGPRRegBankID ?
5709 AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
5710
5711 assert(CondBank == AMDGPU::VCCRegBankID || CondBank == AMDGPU::SGPRRegBankID);
5712
5713 // TODO: Should report 32-bit for scalar condition type.
5714 if (Size == 64) {
5715 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5716 OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
5717 OpdsMapping[2] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5718 OpdsMapping[3] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5719 } else {
5720 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, Size);
5721 OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
5722 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, Size);
5723 OpdsMapping[3] = AMDGPU::getValueMapping(Bank, Size);
5724 }
5725
5726 break;
5727 }
5728
5729 case AMDGPU::G_SI_CALL: {
5730 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
5731 // Lie and claim everything is legal, even though some need to be
5732 // SGPRs. applyMapping will have to deal with it as a waterfall loop.
5733 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5734
5735 // Allow anything for implicit arguments
5736 for (unsigned I = 4; I < MI.getNumOperands(); ++I) {
5737 if (MI.getOperand(I).isReg()) {
5738 Register Reg = MI.getOperand(I).getReg();
5739 auto OpBank = getRegBankID(Reg, MRI);
5740 unsigned Size = getSizeInBits(Reg, MRI, *TRI);
5741 OpdsMapping[I] = AMDGPU::getValueMapping(OpBank, Size);
5742 }
5743 }
5744 break;
5745 }
5746 case AMDGPU::G_LOAD:
5747 case AMDGPU::G_ZEXTLOAD:
5748 case AMDGPU::G_SEXTLOAD:
5749 return getInstrMappingForLoad(MI);
5750
5751 case AMDGPU::G_ATOMICRMW_XCHG:
5752 case AMDGPU::G_ATOMICRMW_ADD:
5753 case AMDGPU::G_ATOMICRMW_SUB:
5754 case AMDGPU::G_ATOMICRMW_AND:
5755 case AMDGPU::G_ATOMICRMW_OR:
5756 case AMDGPU::G_ATOMICRMW_XOR:
5757 case AMDGPU::G_ATOMICRMW_MAX:
5758 case AMDGPU::G_ATOMICRMW_MIN:
5759 case AMDGPU::G_ATOMICRMW_UMAX:
5760 case AMDGPU::G_ATOMICRMW_UMIN:
5761 case AMDGPU::G_ATOMICRMW_FADD:
5762 case AMDGPU::G_ATOMICRMW_FMIN:
5763 case AMDGPU::G_ATOMICRMW_FMAX:
5764 case AMDGPU::G_ATOMICRMW_UINC_WRAP:
5765 case AMDGPU::G_ATOMICRMW_UDEC_WRAP:
5766 case AMDGPU::G_ATOMICRMW_USUB_COND:
5767 case AMDGPU::G_ATOMICRMW_USUB_SAT:
5768 case AMDGPU::G_AMDGPU_ATOMIC_CMPXCHG: {
5769 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5770 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
5771 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5772 break;
5773 }
5774 case AMDGPU::G_ATOMIC_CMPXCHG: {
5775 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5776 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
5777 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5778 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5779 break;
5780 }
5781 case AMDGPU::G_BRCOND: {
5782 unsigned Bank = getRegBankID(MI.getOperand(0).getReg(), MRI,
5783 AMDGPU::SGPRRegBankID);
5784 assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
5785 if (Bank != AMDGPU::SGPRRegBankID)
5786 Bank = AMDGPU::VCCRegBankID;
5787
5788 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, 1);
5789 break;
5790 }
5791 case AMDGPU::G_INTRINSIC_FPTRUNC_ROUND:
5792 return getDefaultMappingVOP(MI);
5793 case AMDGPU::G_PREFETCH:
5794 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5795 break;
5796 case AMDGPU::G_AMDGPU_WHOLE_WAVE_FUNC_SETUP:
5797 case AMDGPU::G_AMDGPU_WHOLE_WAVE_FUNC_RETURN:
5798 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5799 break;
5800 case AMDGPU::G_AMDGPU_FLAT_LOAD_MONITOR:
5801 case AMDGPU::G_AMDGPU_GLOBAL_LOAD_MONITOR: {
5802 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
5803 unsigned PtrSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5804 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5805 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize);
5806 break;
5807 }
5808 }
5809
5810 return getInstructionMapping(/*ID*/1, /*Cost*/1,
5811 getOperandsMapping(OpdsMapping),
5812 MI.getNumOperands());
5813}
const ValueMapping * getSGPROpMapping(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
bool applyMappingLoad(MachineIRBuilder &B, const OperandsMapper &OpdMapper, MachineInstr &MI) const
void split64BitValueForMapping(MachineIRBuilder &B, SmallVector< Register, 2 > &Regs, LLT HalfTy, Register Reg) const
Split 64-bit value Reg into two 32-bit halves and populate them into Regs.
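The split described above can be pictured on plain integers. The following is a minimal sketch, not the LLVM implementation: `splitU64Sketch` and `joinU64Sketch` are hypothetical names, and the low-half-first ordering is an assumption made for illustration, since the entry above does not state the order in which the halves are placed into Regs.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (illustration only): split a 64-bit integer into
// {low, high} 32-bit halves. This is the plain-integer analogue of breaking
// a 64-bit register value into two 32-bit pieces.
inline std::pair<uint32_t, uint32_t> splitU64Sketch(uint64_t V) {
  uint32_t Lo = static_cast<uint32_t>(V);        // bits [31:0]
  uint32_t Hi = static_cast<uint32_t>(V >> 32);  // bits [63:32]
  return {Lo, Hi};
}

// Hypothetical inverse (illustration only): rebuild the 64-bit value.
inline uint64_t joinU64Sketch(uint32_t Lo, uint32_t Hi) {
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}
```

Splitting and then joining returns the original value, which is the property a register-level split of this kind has to preserve.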
const ValueMapping * getValueMappingForPtr(const MachineRegisterInfo &MRI, Register Ptr) const
Return the mapping for a pointer argument.
unsigned getMappingType(const MachineRegisterInfo &MRI, const MachineInstr &MI) const
RegisterBankInfo::InstructionMappings getInstrAlternativeMappingsIntrinsic(const MachineInstr &MI, const MachineRegisterInfo &MRI) const
bool isDivergentRegBank(const RegisterBank *RB) const override
Returns true if the register bank is considered divergent.
void constrainOpWithReadfirstlane(MachineIRBuilder &B, MachineInstr &MI, unsigned OpIdx) const
InstructionMappings getInstrAlternativeMappings(const MachineInstr &MI) const override
Get the alternative mappings for MI.
const InstructionMapping & getDefaultMappingSOP(const MachineInstr &MI) const
const InstructionMapping & getDefaultMappingAllVGPR(const MachineInstr &MI) const
const InstructionMapping & getInstrMapping(const MachineInstr &MI) const override
This function must return a legal mapping, because AMDGPURegisterBankInfo::getInstrAlternativeMapping...
unsigned getBreakDownCost(const ValueMapping &ValMapping, const RegisterBank *CurBank=nullptr) const override
Get the cost of using ValMapping to decompose a register.
const ValueMapping * getAGPROpMapping(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
const InstructionMapping & getDefaultMappingVOP(const MachineInstr &MI) const
bool isSALUMapping(const MachineInstr &MI) const
Register buildReadFirstLane(MachineIRBuilder &B, MachineRegisterInfo &MRI, Register Src) const
bool applyMappingSBufferLoad(MachineIRBuilder &B, const OperandsMapper &OpdMapper) const
void applyMappingSMULU64(MachineIRBuilder &B, const OperandsMapper &OpdMapper) const
static const LaneMaskConstants & get(const GCNSubtarget &ST)
Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:40
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition InstrTypes.h:740
@ ICMP_SLT
signed less than
Definition InstrTypes.h:769
@ ICMP_NE
not equal
Definition InstrTypes.h:762
A debug info location.
Definition DebugLoc.h:124
iterator find(const_arg_type_t< KeyT > Val)
Definition DenseMap.h:225
iterator end()
Definition DenseMap.h:143
std::pair< iterator, bool > insert(const std::pair< KeyT, ValueT > &KV)
Definition DenseMap.h:286
static constexpr ElementCount getFixed(ScalarTy MinVal)
Definition TypeSize.h:309
Abstract class that contains various methods for clients to notify about changes.
constexpr unsigned getScalarSizeInBits() const
constexpr bool isScalar() const
LLT getScalarType() const
static constexpr LLT scalar(unsigned SizeInBits)
Get a low-level scalar or aggregate "bag of bits".
constexpr uint16_t getNumElements() const
Returns the number of elements in a vector LLT.
constexpr bool isVector() const
constexpr TypeSize getSizeInBits() const
Returns the total size of the type. Must only be called on sized types.
LLT divide(int Factor) const
Return a type that is Factor times smaller.
constexpr unsigned getAddressSpace() const
static constexpr LLT fixed_vector(unsigned NumElements, unsigned ScalarSizeInBits)
Get a low-level fixed-width vector of some number of elements and element width.
LLT getElementType() const
Returns the vector's element type. Only valid for vector types.
static constexpr LLT scalarOrVector(ElementCount EC, LLT ScalarTy)
This is an important class for using LLVM in a threaded context.
Definition LLVMContext.h:68
LLVM_ABI void widenScalarSrc(MachineInstr &MI, LLT WideTy, unsigned OpIdx, unsigned ExtOpcode)
Legalize a single operand OpIdx of the machine instruction MI as a Use by extending the operand's typ...
LLVM_ABI LegalizeResult lowerAbsToMaxNeg(MachineInstr &MI)
LLVM_ABI LegalizeResult narrowScalar(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy)
Legalize an instruction by reducing the width of the underlying scalar type.
LLVM_ABI LegalizeResult reduceLoadStoreWidth(GLoadStore &MI, unsigned TypeIdx, LLT NarrowTy)
@ Legalized
Instruction has been legalized and the MachineFunction changed.
LLVM_ABI LegalizeResult fewerElementsVector(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy)
Legalize a vector instruction by splitting into multiple components, each acting on the same scalar t...
LLVM_ABI LegalizeResult widenScalar(MachineInstr &MI, unsigned TypeIdx, LLT WideTy)
Legalize an instruction by performing the operation on a wider scalar type (for example a 16-bit addi...
LLVM_ABI void widenScalarDst(MachineInstr &MI, LLT WideTy, unsigned OpIdx=0, unsigned TruncOpcode=TargetOpcode::G_TRUNC)
Legalize a single operand OpIdx of the machine instruction MI as a Def by extending the operand's typ...
TypeSize getValue() const
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
MachineInstrBundleIterator< MachineInstr > iterator
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
Helper class to build MachineInstr.
const MachineInstrBuilder & addReg(Register RegNo, RegState Flags={}, unsigned SubReg=0) const
Add a new virtual register operand.
MachineInstrSpan provides an interface to get an iteration range containing the instruction it was in...
MachineBasicBlock::iterator begin()
MachineBasicBlock::iterator end()
Representation of each machine instruction.
const MachineBasicBlock * getParent() const
const MachineOperand & getOperand(unsigned i) const
A description of a memory reference used in the backend.
LocationSize getSize() const
Return the size in bytes of the memory reference.
unsigned getAddrSpace() const
bool isAtomic() const
Returns true if this operation has an atomic ordering requirement of unordered or higher,...
@ MODereferenceable
The memory access is dereferenceable (i.e., doesn't trap).
@ MOLoad
The memory access reads data.
@ MOInvariant
The memory access always returns the same value (or traps).
Flags getFlags() const
Return the raw flags of the source value.
LLVM_ABI Align getAlign() const
Return the minimum known alignment in bytes of the actual memory reference.
MachineOperand class - Representation of each machine instruction operand.
LLVM_ABI void setReg(Register Reg)
Change the register this operand corresponds to.
Register getReg() const
getReg - Returns the register number.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
const RegClassOrRegBank & getRegClassOrRegBank(Register Reg) const
Return the register bank or register class of Reg.
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLT getType(Register Reg) const
Get the low-level type of Reg or LLT{} if Reg is not a generic (target independent) virtual register.
const RegisterBank * getRegBankOrNull(Register Reg) const
Return the register bank of Reg, or null if Reg has not been assigned a register bank or has been ass...
LLVM_ABI void setRegBank(Register Reg, const RegisterBank &RegBank)
Set the register bank to RegBank for Reg.
LLVM_ABI void setType(Register VReg, LLT Ty)
Set the low-level type of VReg to Ty.
LLVM_ABI void setRegClass(Register Reg, const TargetRegisterClass *RC)
setRegClass - Set the register class of the specified virtual register.
LLVM_ABI Register createGenericVirtualRegister(LLT Ty, StringRef Name="")
Create and return a new generic virtual register with low-level type Ty.
void setSimpleHint(Register VReg, Register PrefReg)
Specify the preferred (target independent) register allocation hint for the specified virtual registe...
Helper class that represents how the value of an instruction may be mapped and what is the related co...
bool isValid() const
Check whether this object is valid.
Helper class used to get/create the virtual registers that will be used to replace the MachineOperand...
const InstructionMapping & getInstrMapping() const
The final mapping of the instruction.
MachineRegisterInfo & getMRI() const
The MachineRegisterInfo we used to realize the mapping.
LLVM_ABI iterator_range< SmallVectorImpl< Register >::const_iterator > getVRegs(unsigned OpIdx, bool ForDebug=false) const
Get all the virtual registers required to map the OpIdx-th operand of the instruction.
virtual InstructionMappings getInstrAlternativeMappings(const MachineInstr &MI) const
Get the alternative mappings for MI.
static const TargetRegisterClass * constrainGenericRegister(Register Reg, const TargetRegisterClass &RC, MachineRegisterInfo &MRI)
Constrain the (possibly generic) virtual register Reg to RC.
const InstructionMapping & getInstructionMapping(unsigned ID, unsigned Cost, const ValueMapping *OperandsMapping, unsigned NumOperands) const
Method to get a uniquely generated InstructionMapping.
static void applyDefaultMapping(const OperandsMapper &OpdMapper)
Helper method to apply something that is like the default mapping.
const ValueMapping & getValueMapping(unsigned StartIdx, unsigned Length, const RegisterBank &RegBank) const
The most common ValueMapping consists of a single PartialMapping.
const InstructionMapping & getInvalidInstructionMapping() const
Method to get a uniquely generated invalid InstructionMapping.
const RegisterBank & getRegBank(unsigned ID)
Get the register bank identified by ID.
const unsigned * Sizes
Hold the sizes of the register banks for all HwModes.
bool cannotCopy(const RegisterBank &Dst, const RegisterBank &Src, TypeSize Size) const
TypeSize getSizeInBits(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
Get the size in bits of Reg.
const ValueMapping * getOperandsMapping(Iterator Begin, Iterator End) const
Get the uniquely generated array of ValueMapping for the elements between Begin and End.
SmallVector< const InstructionMapping *, 4 > InstructionMappings
Convenient type to represent the alternatives for mapping an instruction.
virtual unsigned copyCost(const RegisterBank &A, const RegisterBank &B, TypeSize Size) const
Get the cost of a copy from B to A, or put differently, get the cost of A = COPY B.
const InstructionMapping & getInstrMappingImpl(const MachineInstr &MI) const
Try to get the mapping of MI.
This class implements the register bank concept.
unsigned getID() const
Get the identifier of this register bank.
Wrapper class representing virtual and physical registers.
Definition Register.h:20
constexpr bool isVirtual() const
Return true if the specified register number is in the virtual register namespace.
Definition Register.h:79
static unsigned getMaxMUBUFImmOffset(const GCNSubtarget &ST)
This class keeps track of the SPI_SP_INPUT_ADDR config register, which tells the hardware which inter...
bool selectAGPRFormMFMA(unsigned NumRegs) const
Return true if an MFMA that requires at least NumRegs should select to the AGPR form,...
static bool shouldExpandVectorDynExt(unsigned EltSize, unsigned NumElem, bool IsDivergentIdx, const GCNSubtarget *Subtarget)
Check if EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT (<n x e>, var-idx) should be expanded into a set of cmp...
SmallSet - This maintains a set of unique values, optimizing for the case when the set is small (less...
Definition SmallSet.h:134
size_type count(const T &V) const
count - Return 1 if the element is in the set, 0 otherwise.
Definition SmallSet.h:176
bool empty() const
Definition SmallSet.h:169
std::pair< const_iterator, bool > insert(const T &V)
insert - Insert an element into the set if it isn't already there.
Definition SmallSet.h:184
void resize(size_type N)
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
Register getReg() const
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
static constexpr TypeSize getFixed(ScalarTy ExactSize)
Definition TypeSize.h:343
static LLVM_ABI IntegerType * getInt32Ty(LLVMContext &C)
Definition Type.cpp:309
self_iterator getIterator()
Definition ilist_node.h:123
A range adaptor for a pair of iterators.
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ CONSTANT_ADDRESS_32BIT
Address space for 32-bit constant memory.
@ REGION_ADDRESS
Address space for region memory. (GDS)
@ LOCAL_ADDRESS
Address space for local memory.
@ CONSTANT_ADDRESS
Address space for constant memory (VTX2).
@ PRIVATE_ADDRESS
Address space for private memory.
@ BUFFER_RESOURCE
Address space for 128-bit buffer resources.
bool isFlatGlobalAddrSpace(unsigned AS)
bool isUniformMMO(const MachineMemOperand *MMO)
bool isExtendedGlobalAddrSpace(unsigned AS)
Intrinsic::ID getIntrinsicID(const MachineInstr &I)
Return the intrinsic ID for opcodes with the G_AMDGPU_INTRIN_ prefix.
std::pair< Register, unsigned > getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg, GISelValueTracking *ValueTracking=nullptr, bool CheckNUW=false)
Returns base register and constant offset.
const RsrcIntrinsic * lookupRsrcIntrinsic(unsigned Intr)
operand_type_match m_Reg()
SpecificConstantMatch m_ZeroInt()
Convenience matchers for specific integer values.
ConstantMatch< APInt > m_ICst(APInt &Cst)
BinaryOp_match< LHS, RHS, TargetOpcode::G_ADD, true > m_GAdd(const LHS &L, const RHS &R)
bool mi_match(Reg R, const MachineRegisterInfo &MRI, Pattern &&P)
SpecificConstantOrSplatMatch m_SpecificICstOrSplat(const APInt &RequestedValue)
Matches a RequestedValue constant or a constant splat of RequestedValue.
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:558
LLVM_ABI MachineInstr * getOpcodeDef(unsigned Opcode, Register Reg, const MachineRegisterInfo &MRI)
See if Reg is defined by a single def instruction that is Opcode.
Definition Utils.cpp:653
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
@ Kill
The last use of a register.
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
LLVM_ABI void constrainSelectedInstRegOperands(MachineInstr &I, const TargetInstrInfo &TII, const TargetRegisterInfo &TRI, const RegisterBankInfo &RBI)
Mutate the newly-selected instruction I to constrain its (possibly generic) virtual register operands...
Definition Utils.cpp:156
iterator_range< T > make_range(T x, T y)
Convenience function for iterating over sub-ranges.
LLVM_ABI std::optional< int64_t > getIConstantVRegSExtVal(Register VReg, const MachineRegisterInfo &MRI)
If VReg is defined by a G_CONSTANT whose value fits in int64_t, returns it.
Definition Utils.cpp:314
static const MachineMemOperand::Flags MONoClobber
Mark the MMO of a uniform load if there are no potentially clobbering stores on any path from the sta...
Definition SIInstrInfo.h:44
auto reverse(ContainerTy &&C)
Definition STLExtras.h:407
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:547
@ Add
Sum of integers.
DWARFExpression::Operation Op
void call_once(once_flag &flag, Function &&F, Args &&... ArgList)
Execute the function specified as a parameter once.
Definition Threading.h:86
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:559
LLVM_ABI std::optional< ValueAndVReg > getIConstantVRegValWithLookThrough(Register VReg, const MachineRegisterInfo &MRI, bool LookThroughInstrs=true)
If VReg is defined by a statically evaluable chain of instructions rooted on a G_CONSTANT returns its...
Definition Utils.cpp:433
Align assumeAligned(uint64_t Value)
Treats the value 0 as a 1, so Align is always at least 1.
Definition Alignment.h:100
unsigned Log2(Align A)
Returns the log2 of the alignment.
Definition Alignment.h:197
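The two alignment entries above (treating 0 as 1, and taking the log2 of a power-of-two alignment) can be sketched on raw integers. This is a stand-alone illustration under stated assumptions, not the code in Alignment.h: `assumeAlignedSketch` and `log2AlignSketch` are hypothetical names, and alignments are modelled as plain `uint64_t` values rather than the `Align` type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in (illustration only): map 0 to 1 so the result is
// always a usable alignment of at least 1 byte.
inline uint64_t assumeAlignedSketch(uint64_t Value) {
  uint64_t A = (Value == 0) ? 1 : Value;
  assert((A & (A - 1)) == 0 && "alignment must be a power of two");
  return A;
}

// Hypothetical stand-in (illustration only): log2 of a power-of-two
// alignment, found by counting how far the single set bit is shifted.
inline unsigned log2AlignSketch(uint64_t A) {
  assert(A != 0 && (A & (A - 1)) == 0 && "alignment must be a power of two");
  unsigned Shift = 0;
  while ((A >> Shift) != 1)
    ++Shift;
  return Shift;
}
```

Because a valid alignment is a non-zero power of two, storing only its log2 is a compact encoding, which matches the description of the alignment struct elsewhere in this index.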
LLVM_ABI Register getSrcRegIgnoringCopies(Register Reg, const MachineRegisterInfo &MRI)
Find the source register for Reg, folding away any trivial copies.
Definition Utils.cpp:501
constexpr T maskTrailingOnes(unsigned N)
Create a bitmask with the N right-most bits set to 1, and all other bits set to 0.
Definition MathExtras.h:77
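As a rough illustration of the mask described above, the sketch below builds a value whose N right-most bits are 1. It is a hypothetical re-implementation for reading purposes (`maskTrailingOnesSketch` is not an LLVM name) and assumes N does not exceed the bit width of T.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in (illustration only), not the MathExtras.h code.
// Precondition (assumed): N <= number of bits in T.
template <typename T>
T maskTrailingOnesSketch(unsigned N) {
  static_assert(std::is_unsigned<T>::value, "T must be an unsigned type");
  const unsigned Bits = std::numeric_limits<T>::digits;
  // Shifting by the full width is undefined behaviour, so N == 0 is
  // handled separately instead of shifting by Bits.
  if (N == 0)
    return T(0);
  return static_cast<T>(static_cast<T>(~T(0)) >> (Bits - N));
}
```

For example, `maskTrailingOnesSketch<uint32_t>(4)` yields `0xF`, and passing the full width yields an all-ones value.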
@ Default
The result value is uniform if and only if all operands are uniform.
Definition Uniformity.h:20
#define N
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
This class contains a discriminated union of information about pointers in memory operands,...
unsigned StartIdx
Number of bits at which this partial mapping starts in the original value.
const RegisterBank * RegBank
Register bank where the partial value lives.
unsigned Length
Length of this mapping in bits.
Helper struct that represents how a value is mapped through different register banks.
unsigned NumBreakDowns
Number of partial mappings used to break down this value.
const PartialMapping * BreakDown
How the value is broken down between the different register banks.
The llvm::once_flag structure.
Definition Threading.h:67