Implementing the assembler parser – The Target Description
By Peggy Johnston / January 19, 2022
The assembler parser is easy to implement because LLVM provides a framework for it and generates large parts of it from the target description.
The ParseInstruction() method in our class is called when the framework detects that an instruction needs to be parsed. That method parses the input via the provided lexer and constructs a so-called operand vector. An operand can be a token such as an instruction mnemonic, a register name, or an immediate, or it can belong to a category that is specific to the target. For example, two operands are constructed from the jmp %r2 input: a token operand for the mnemonic, and a register operand.
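To make the operand vector more concrete, here is a minimal standalone sketch. It does not use LLVM's lexer or its MCParsedAsmOperand class; the Operand struct, the parseInstruction() function, and the simplified syntax (a mnemonic followed by comma-separated %rN registers or decimal immediates) are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical operand type for illustration only; LLVM's real base class
// is MCParsedAsmOperand, which this sketch does not use.
struct Operand {
  enum Kind { Token, Register, Immediate };
  Kind kind;
  std::string text; // spelling of a token operand, e.g. the mnemonic
  long value;       // register number or immediate value
};

static bool isDigitAt(const std::string &s, std::size_t pos) {
  return pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]));
}

// Parses one line of the assumed syntax "mnemonic op, op, ..." into ops.
// "%rN" becomes a register operand, a decimal number an immediate.
// Returns false if the line does not follow that syntax.
bool parseInstruction(const std::string &line, std::vector<Operand> &ops) {
  ops.clear();
  std::size_t pos = 0;
  auto skipSpaces = [&]() {
    while (pos < line.size() &&
           std::isspace(static_cast<unsigned char>(line[pos])))
      ++pos;
  };

  // The mnemonic becomes a token operand.
  skipSpaces();
  std::size_t start = pos;
  while (pos < line.size() &&
         (std::isalnum(static_cast<unsigned char>(line[pos])) ||
          line[pos] == '.'))
    ++pos;
  if (pos == start)
    return false; // no mnemonic found
  ops.push_back({Operand::Token, line.substr(start, pos - start), 0});
  skipSpaces();

  // Every remaining operand is either a register or an immediate.
  while (pos < line.size()) {
    Operand::Kind kind = Operand::Immediate;
    if (line.compare(pos, 2, "%r") == 0) {
      kind = Operand::Register;
      pos += 2;
    }
    std::size_t numStart = pos;
    while (isDigitAt(line, pos))
      ++pos;
    if (pos == numStart)
      return false; // expected a register number or an immediate
    ops.push_back({kind, "", std::stol(line.substr(numStart, pos - numStart))});

    skipSpaces();
    if (pos == line.size())
      break;
    if (line[pos] != ',')
      return false; // operands must be separated by commas
    ++pos;
    skipSpaces();
    if (pos == line.size())
      return false; // trailing comma
  }
  assert(ops.front().kind == Operand::Token);
  return true;
}
```

Calling parseInstruction("jmp %r2", ops) fills ops with two entries, a token operand for jmp and a register operand with the value 2, which mirrors the example in the text.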
Then a generated matcher tries to match the operand vector against the instructions. If a match is found, then an instance of the MCInst class is created, which holds the parsed instruction. Otherwise, an error message is emitted. The advantage of this approach is that it automatically derives the matcher from the target description, without needing to handle all syntactical quirks.
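The matching step can be pictured as a table lookup. The following standalone sketch is only a model of the idea: in LLVM the matcher is generated from the target description, whereas here the pattern table is written by hand. All type names, the matchInstruction() function, and the opcode numbers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class OpKind { Reg, Imm };

struct ParsedOperand {
  OpKind kind;
  long value; // register number or immediate value
};

struct ParsedInstruction {
  std::string mnemonic;
  std::vector<ParsedOperand> operands;
};

// One entry per instruction form; hypothetical stand-in for generated tables.
struct Pattern {
  const char *mnemonic;
  std::vector<OpKind> kinds;
  unsigned opcode; // made-up opcode number
};

// Toy counterpart of MCInst: an opcode plus its operand values.
struct MatchedInst {
  unsigned opcode;
  std::vector<long> operands;
};

static const std::vector<Pattern> &patternTable() {
  static const std::vector<Pattern> table = {
      {"jmp", {OpKind::Reg}, 1},
      {"add", {OpKind::Reg, OpKind::Reg, OpKind::Reg}, 2},
      {"add", {OpKind::Reg, OpKind::Reg, OpKind::Imm}, 3},
  };
  return table;
}

// Tries every pattern in order. On success, fills out and returns true;
// otherwise stores a diagnostic in error and returns false.
bool matchInstruction(const ParsedInstruction &in, MatchedInst &out,
                      std::string &error) {
  bool sawMnemonic = false;
  for (const Pattern &p : patternTable()) {
    if (in.mnemonic != p.mnemonic)
      continue;
    sawMnemonic = true;
    if (p.kinds.size() != in.operands.size())
      continue;
    bool kindsMatch = true;
    for (std::size_t i = 0; i < p.kinds.size(); ++i) {
      if (in.operands[i].kind != p.kinds[i]) {
        kindsMatch = false;
        break;
      }
    }
    if (!kindsMatch)
      continue;
    out.opcode = p.opcode;
    out.operands.clear();
    for (const ParsedOperand &op : in.operands)
      out.operands.push_back(op.value);
    assert(out.operands.size() == p.kinds.size());
    return true;
  }
  error = sawMnemonic ? "invalid operands for instruction"
                      : "unknown instruction mnemonic";
  return false;
}
```

The two add entries show why the operand kinds are part of the match: the same mnemonic can select different instruction forms depending on whether the last operand is a register or an immediate.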
However, we need to add a couple more support classes to make the assembler parser work. These additional classes are all stored in the MCTargetDesc directory.
Implementing the MCAsmInfo support class for the M88k Target
In this section, we implement the first class required to configure the assembler parser, the MCAsmInfo class:
- We need to set some customization parameters for the assembler parser. The MCAsmInfo base class (https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/MC/MCAsmInfo.h) contains the common parameters. In addition, a subclass is created for each supported object file format; for example, the MCAsmInfoELF class (https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/MC/MCAsmInfoELF.h). The reasoning is that system assemblers on platforms using the same object file format share common characteristics because they must support similar features. Our target operating system is OpenBSD, which uses the ELF file format, so we derive our own M88kMCAsmInfo class from the MCAsmInfoELF class. The declaration in the M88kMCAsmInfo.h file is as follows:
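#include "llvm/MC/MCAsmInfoELF.h"

namespace llvm {
class Triple; // forward declaration; only a reference is needed here

class M88kMCAsmInfo : public MCAsmInfoELF {
public:
  explicit M88kMCAsmInfo(const Triple &TT);
};
} // end namespace llvm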
- The implementation in the M88kMCAsmInfo.cpp file only sets a couple of default values. The two settings that matter right now are that the system is big-endian and that the | symbol starts a comment. The other settings are used later for code generation:
#include "M88kMCAsmInfo.h"

using namespace llvm;

M88kMCAsmInfo::M88kMCAsmInfo(const Triple &TT) {
  IsLittleEndian = false; // the system uses big-endian mode
  UseDotAlignForAlignment = true;
  MinInstAlignment = 4;
  CommentString = "|"; // # as comment delimiter is only
                       // allowed at first column
  ZeroDirective = "\t.space\t";
  Data64bitsDirective = "\t.quad\t";
  UsesELFSectionDirectiveForBSS = true;
  SupportsDebugInformation = false;
  ExceptionsType = ExceptionHandling::SjLj;
}
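To illustrate what the two crucial settings mean in practice, here is a small standalone model. It is not how LLVM consumes MCAsmInfo internally; the AsmSettings struct and the encodeWord() and stripComment() helpers are hypothetical and only mirror the IsLittleEndian and CommentString values set above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, reduced model of the two settings discussed in the text.
struct AsmSettings {
  bool isLittleEndian;
  std::string commentString;
};

// Mirrors IsLittleEndian = false and CommentString = "|".
AsmSettings m88kLikeSettings() { return AsmSettings{false, "|"}; }

// Lays out a 32-bit word in memory order for the configured endianness.
std::array<std::uint8_t, 4> encodeWord(std::uint32_t word,
                                       const AsmSettings &s) {
  std::array<std::uint8_t, 4> bytes{};
  for (unsigned i = 0; i < 4; ++i) {
    // Big-endian stores the most significant byte first.
    unsigned shift = s.isLittleEndian ? 8 * i : 8 * (3 - i);
    bytes[i] = static_cast<std::uint8_t>((word >> shift) & 0xFF);
  }
  return bytes;
}

// Drops everything from the first comment delimiter to the end of the line.
std::string stripComment(const std::string &line, const AsmSettings &s) {
  assert(!s.commentString.empty() && "comment delimiter must be set");
  std::string::size_type pos = line.find(s.commentString);
  return pos == std::string::npos ? line : line.substr(0, pos);
}
```

With these settings, the word 0x12345678 is laid out as the bytes 0x12, 0x34, 0x56, 0x78, and the line "jmp %r2 | return" is reduced to "jmp %r2 " before any further parsing would take place.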
This completes the implementation of the MCAsmInfo class. The next class we will implement helps us create a binary representation of the instructions within LLVM.