Defining the instruction formats and the instruction information – The Target Description-1
By Peggy Johnston / July 28, 2021
An instruction is defined using the TableGen Instruction class. Defining an instruction is a complex task because we have to consider many details. An instruction has a textual representation used by the assembler and the disassembler. It has a name (a mnemonic such as and), and it may have operands. The assembler transforms the textual representation into a binary format, so we must define the layout of that format. For instruction selection, we need to attach a pattern to the instruction. To manage this complexity, we define a class hierarchy. The base classes describe the various instruction formats and are stored in the M88kInstrFormats.td file. The instructions themselves and other definitions required for instruction selection are stored in the M88kInstrInfo.td file.
Let’s begin by defining a class for the instructions of the m88k architecture called InstM88k. We derive this class from the predefined Instruction class. Our new class has five parameters. The outs and ins parameters describe the output and input operands as a list, using the special dag type. The textual representation of the instruction is split into the mnemonic, given in the asm parameter, and the operands, given in the operands parameter. Last, the pattern parameter can hold a pattern used for instruction selection.
We also need to define two new fields:
- The Inst field is used to hold the bit pattern of the instruction. Because the size of an instruction depends on the platform, this field cannot be predefined. All instructions of the m88k architecture are 32 bits wide, and so this field has the bits<32> type.
- The other field is called SoftFail and has the same type as Inst. It holds a bit mask used with an instruction for which the actual encoding can differ from the bits in the Inst field and still be valid. Only a few targets, most notably ARM, make use of this, so we can simply set this field to 0.
The other fields are defined in the superclass, and we only set their values. Simple computations are possible in the TableGen language, and we use this when we create the value for the AsmString field, which holds the full assembler representation. If the operands string is empty, then the AsmString field just has the value of the asm parameter. Otherwise, it is the concatenation of both strings, with a space between them:
class InstM88k<dag outs, dag ins, string asm, string operands,
               list<dag> pattern = []>
  : Instruction {
  bits<32> Inst;
  bits<32> SoftFail = 0;
  let Namespace = "M88k";
  let Size = 4;
  dag OutOperandList = outs;
  dag InOperandList = ins;
  let AsmString = !if(!eq(operands, ""), asm,
                      !strconcat(asm, " ", operands));
  let Pattern = pattern;
  let DecoderNamespace = "M88k";
}
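To make the AsmString computation concrete, here is a minimal Python sketch of what the !if / !strconcat expression evaluates to. It is only a model of the TableGen logic, not part of the backend. The function name asm_string is hypothetical, and rte is used merely as an example of a mnemonic without operands.

```python
def asm_string(asm: str, operands: str) -> str:
    """Model of the AsmString expression in InstM88k.

    Returns the mnemonic alone when there are no operands, otherwise
    the mnemonic and the operand string joined by a single space.
    """
    if operands == "":
        return asm
    return asm + " " + operands


if __name__ == "__main__":
    # Three-register form, as used by the logical instructions.
    print(asm_string("and", "$rd, $rs1, $rs2"))
    # A mnemonic without operands (example only) gets no trailing space.
    print(asm_string("rte", ""))
```

The special case for the empty operand string avoids a trailing space in the assembler text of instructions that take no operands.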
For the instruction encoding, the manufacturer usually groups instructions together, and the instructions of one group have a similar encoding. We can use those groups to systematically create classes defining the instruction formats. For example, all logical operations of the m88k architecture encode the destination register in the bits from 21 to 25 and the first source register in the bits from 16 to 20. Please note the implementation pattern here: we declare the rd and rs1 fields for the values, and we assign those values to the correct bit positions of the Inst field, which we defined previously in the superclass:
class F_L<dag outs, dag ins, string asm, string operands,
          list<dag> pattern = []>
  : InstM88k<outs, ins, asm, operands, pattern> {
  bits<5> rd;
  bits<5> rs1;
  let Inst{25-21} = rd;
  let Inst{20-16} = rs1;
}
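The effect of the two let Inst{...} assignments can be illustrated with a short Python sketch that places two 5-bit register numbers into a 32-bit word. This is only a model of the bit layout described above. The helper names set_bits and encode_f_l are hypothetical and do not exist in LLVM.

```python
def set_bits(word: int, hi: int, lo: int, value: int) -> int:
    """Return word with bits hi..lo (inclusive) replaced by value."""
    width = hi - lo + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"value {value} does not fit into {width} bits")
    mask = ((1 << width) - 1) << lo
    return (word & ~mask) | (value << lo)


def encode_f_l(rd: int, rs1: int, base: int = 0) -> int:
    """Model of class F_L: rd goes to bits 25-21, rs1 to bits 20-16."""
    word = set_bits(base, 25, 21, rd)
    word = set_bits(word, 20, 16, rs1)
    return word


if __name__ == "__main__":
    # Registers r2 (destination) and r3 (first source); all other bits stay 0
    # because F_L itself does not define them.
    print(f"{encode_f_l(2, 3):#010x}")
```

The remaining bits are left untouched here, which mirrors the class hierarchy: F_L fixes only the fields shared by all logical operations, and subclasses fill in the rest.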
There are several groups of logical operations based on this format. One of them is the group of instructions using three registers, which is called triadic addressing mode in the manual:
class F_LR<bits<5> func, bits<1> comp, string asm,
           list<dag> pattern = []>
  : F_L<(outs GPROpnd:$rd), (ins GPROpnd:$rs1, GPROpnd:$rs2),
        !if(comp, !strconcat(asm, ".c"), asm),
        "$rd, $rs1, $rs2", pattern> {
  bits<5> rs2;
  let Inst{31-26} = 0b111101;
  let Inst{15-11} = func;
  let Inst{10} = comp;
  let Inst{9-5} = 0b00000;
  let Inst{4-0} = rs2;
}
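Putting F_L and F_LR together, every bit of the 32-bit word is now assigned. The following Python sketch models the complete layout and the mnemonic selection. It is an illustration under assumptions, not the code that TableGen generates: the function names are hypothetical, and the function code 0b01000 in the usage example is an assumed value for and that is not taken from the text above.

```python
def encode_f_lr(func: int, comp: int, rd: int, rs1: int, rs2: int) -> int:
    """Model of the triadic register format defined by F_L and F_LR."""
    # (high bit, low bit, value) for each field, from bit 31 down to bit 0.
    fields = [
        (31, 26, 0b111101),  # fixed major opcode of this group
        (25, 21, rd),        # destination register (from F_L)
        (20, 16, rs1),       # first source register (from F_L)
        (15, 11, func),      # selects the logical operation
        (10, 10, comp),      # 1 selects the complement (.c) variant
        (9, 5, 0b00000),     # always zero in this format
        (4, 0, rs2),         # second source register
    ]
    word = 0
    for hi, lo, value in fields:
        width = hi - lo + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit into bits {hi}-{lo}")
        word |= value << lo
    return word


def mnemonic(asm: str, comp: int) -> str:
    """Model of !if(comp, !strconcat(asm, ".c"), asm)."""
    return asm + ".c" if comp else asm


if __name__ == "__main__":
    AND_FUNC = 0b01000  # assumed function code, for illustration only
    for comp in (0, 1):
        word = encode_f_lr(AND_FUNC, comp, rd=1, rs1=2, rs2=3)
        print(f"{mnemonic('and', comp)} r1, r2, r3 -> {word:#010x}")
```

The field widths add up to 6 + 5 + 5 + 5 + 1 + 5 + 5 = 32 bits, so the format leaves no bit of Inst undefined.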