Defining the instruction formats and the instruction information – The Target Description-2
By Peggy Johnston / September 8, 2021
Let’s examine the functionality provided by this class in more detail. The func parameter specifies the operation. As a special feature, the second operand can be complemented before the operation, which is indicated by setting the flag comp to 1. The mnemonic is given in the asm parameter, and an instruction selection pattern can be passed.
When initializing the superclass, we can provide more information. The full assembler text template for the and instruction is and $rd, $rs1, $rs2. The operand string is the same for all instructions of this group, so we can define it here. The mnemonic is supplied by the user of this class, but we can append the .c suffix here, which denotes that the second operand should be complemented first. Lastly, we can define the output and input operands. These operands are expressed as directed acyclic graphs, or dags for short. A dag has an operation and a list of arguments. An argument can itself be a dag, which allows the construction of complex graphs. For example, the output operand is (outs GPROpnd:$rd).
The outs operation marks this dag as the output operand list. The only argument, GPROpnd:$rd, consists of a type and a name. It connects several pieces we have already seen. The type is GPROpnd, which is the name of the register operand we defined in the previous section. The name $rd refers to the destination register. We used this name in the operand string earlier, and also as a field name in the F_L superclass. The input operands are defined similarly. The rest of the class initializes the other bits of the Inst field. Please take the time to check that all 32 bits are indeed now assigned.
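The class described above can be sketched as a standalone TableGen file. This is only an illustration, not the class from the previous section: the names InstSketch and LogicRegSketch are hypothetical, the stand-in records outs, ins, and GPROpnd replace what llvm/Target/Target.td and the register definitions normally provide, and the bit positions (including the 0b111101 major opcode and the zeroed bits 9-5) are assumptions about the encoding that you should check against the real class.

```tablegen
// Standalone sketch: stand-in records replace the definitions that
// llvm/Target/Target.td and the register description normally provide.
def outs;
def ins;
def GPROpnd;

// Hypothetical, simplified base class. The real one derives from Instruction.
class InstSketch<dag outOps, dag inOps, string asm, string operands> {
  bits<32> Inst;
  dag OutOperandList = outOps;
  dag InOperandList = inOps;
  // Mnemonic and operand string form the assembler text template.
  string AsmString = !strconcat(asm, " ", operands);
}

class LogicRegSketch<bits<5> func, bit comp, string asm>
    : InstSketch<(outs GPROpnd:$rd),
                 (ins GPROpnd:$rs1, GPROpnd:$rs2),
                 // Append ".c" when the second operand is complemented.
                 !if(comp, !strconcat(asm, ".c"), asm),
                 "$rd, $rs1, $rs2"> {
  bits<5> rd;
  bits<5> rs1;
  bits<5> rs2;
  // Assumed layout: 6 + 5 + 5 + 5 + 1 + 5 + 5 = 32 bits.
  let Inst{31-26} = 0b111101; // assumed major opcode
  let Inst{25-21} = rd;
  let Inst{20-16} = rs1;
  let Inst{15-11} = func;
  let Inst{10}    = comp;
  let Inst{9-5}   = 0b00000;  // assumed to be unused and zero
  let Inst{4-0}   = rs2;
}

def ANDSketch  : LogicRegSketch<0b01000, 0, "and">;
def ANDcSketch : LogicRegSketch<0b01000, 1, "and">;
```

Running llvm-tblgen on a file like this prints the resolved records, which is a convenient way to confirm that every bit of Inst is covered and that the assembler string of the second record resolves to and.c $rd, $rs1, $rs2.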
We put the final instruction definition in the M88kInstrInfo.td file. Since we have two variants of each logical instruction, we use a multiclass to define both instructions at once. We also define the pattern for instruction selection here, again as a directed acyclic graph. The operation in the pattern is set, and the first argument is the destination register. The second argument is a nested graph, which is the actual pattern. Once again, the operation comes first in the graph; here it is the OpNode parameter. LLVM has many predefined operations, which you can find in the llvm/include/llvm/Target/TargetSelectionDAG.td file (https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Target/TargetSelectionDAG.td). For example, there is the and operation, which denotes a bitwise AND operation. The arguments are the two source registers, $rs1 and $rs2.
You read this pattern roughly as follows: if the input to the instruction selection contains an OpNode operation using two registers, then assign the result of this operation to the $rd register and generate this instruction. Utilizing the graph structure, you can define more complex patterns. For example, the second pattern integrates the complement into the pattern using the not operator.
A small detail to point out is that the logical operations are commutative. This can be helpful for the instruction selection, so we set the isCommutable flag to 1. Because the let statement is written without braces, it applies only to the definition that follows it, the plain register-register variant. The variant with the complemented operand is not commutative, since complementing the second operand makes the order of the operands matter.
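To make the nesting concrete, here is a minimal standalone sketch of the two pattern shapes, written out for the and operation. The records set, and, not, i32, and GPROpnd are empty stand-ins so that the file parses on its own; in a real backend they come from the LLVM .td files and the register description. PatternSketch is a hypothetical record that exists only to hold the two dags.

```tablegen
// Stand-in records so that the dags below parse without LLVM's .td files.
def set;
def and;
def not;
def i32;
def GPROpnd;

def PatternSketch {
  // rd = rs1 & rs2
  dag Plain = (set i32:$rd, (and GPROpnd:$rs1, GPROpnd:$rs2));
  // rd = rs1 & ~rs2: the complement is a dag nested as an argument.
  dag Complemented = (set i32:$rd,
                      (and GPROpnd:$rs1, (not GPROpnd:$rs2)));
}
```

In both dags the operation comes first and the arguments follow, and the second dag shows how an argument can itself be a dag.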
multiclass Logic<bits<5> Fun, string OpcStr, SDNode OpNode> {
  let isCommutable = 1 in
    def rr : F_LR<Fun, /*comp=*/0b0, OpcStr,
                  [(set i32:$rd,
                   (OpNode GPROpnd:$rs1, GPROpnd:$rs2))]>;
  def rrc : F_LR<Fun, /*comp=*/0b1, OpcStr,
                 [(set i32:$rd,
                  (OpNode GPROpnd:$rs1, (not GPROpnd:$rs2)))]>;
}
And finally, we define the records for the instructions:
defm AND : Logic<0b01000, "and", and>;
defm XOR : Logic<0b01010, "xor", xor>;
defm OR : Logic<0b01011, "or", or>;
The first parameter is the bit pattern for the function, the second is the mnemonic, and the third parameter is the dag operation used in the pattern.
To fully understand the class hierarchy, revisit the class definitions. The guiding design principle is to avoid the repetition of information. For example, the 0b01000 function bit pattern is used exactly once. Without the Logic multiclass you would need to type this bit pattern twice and repeat the patterns several times, which is error-prone.
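For comparison, this is roughly what you would have to write by hand for the and instruction without the multiclass. It is a fragment rather than a standalone file: it assumes that the F_LR class and the GPROpnd operand from the earlier sections are in scope, and it simply substitutes the arguments of defm AND into the two definitions of Logic.

```tablegen
// Fragment: assumes F_LR and GPROpnd from the earlier sections are in scope.
// Hand-written equivalent of: defm AND : Logic<0b01000, "and", and>;
let isCommutable = 1 in
def ANDrr : F_LR<0b01000, /*comp=*/0b0, "and",
                 [(set i32:$rd,
                   (and GPROpnd:$rs1, GPROpnd:$rs2))]>;

def ANDrrc : F_LR<0b01000, /*comp=*/0b1, "and",
                  [(set i32:$rd,
                    (and GPROpnd:$rs1, (not GPROpnd:$rs2)))]>;
```

The function bits and the mnemonic now appear twice, and the same duplication would be needed again for xor and or. The record names are formed by prepending the defm name to the names of the definitions inside the multiclass.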
Please also note that it is good to establish a naming scheme for the instructions. For example, the record for the and instruction is named ANDrr, while the variant with the complemented register is named ANDrrc. Those names end up in the generated C++ source code, and using a naming scheme helps to understand to which assembler instruction the name refers.
Up to now, we modeled the register file of the m88k architecture and defined a couple of instructions. In the next section, we will create the top-level file.