Creating the disassembler – The Target Description
By Peggy Johnston / October 18, 2022
Implementing the disassembler is optional. However, the implementation does not require too much effort, and generating the disassembler table may catch encoding errors that are not checked by the other generators. The disassembler lives in the M88kDisassembler.cpp file, found in the Disassembler subdirectory:
- We begin the implementation by defining a debug type and the DecodeStatus type. Both are required for the generated code:
using namespace llvm;
#define DEBUG_TYPE "m88k-disassembler"
using DecodeStatus = MCDisassembler::DecodeStatus;
- The M88kDisassembler class lives in an anonymous namespace. We only need to implement the getInstruction() method:
namespace {
class M88kDisassembler : public MCDisassembler {
public:
  M88kDisassembler(const MCSubtargetInfo &STI,
                   MCContext &Ctx)
      : MCDisassembler(STI, Ctx) {}
  ~M88kDisassembler() override = default;
  DecodeStatus
  getInstruction(MCInst &instr, uint64_t &Size,
                 ArrayRef<uint8_t> Bytes,
                 uint64_t Address,
                 raw_ostream &CStream) const override;
};
} // end anonymous namespace
- We also need to provide a factory method, which will be registered in the target registry:
static MCDisassembler *
createM88kDisassembler(const Target &T,
                       const MCSubtargetInfo &STI,
                       MCContext &Ctx) {
  return new M88kDisassembler(STI, Ctx);
}
extern "C" LLVM_EXTERNAL_VISIBILITY void
LLVMInitializeM88kDisassembler() {
  TargetRegistry::RegisterMCDisassembler(
      getTheM88kTarget(), createM88kDisassembler);
}
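The registration pattern can be illustrated with a small standalone sketch. It is not LLVM's TargetRegistry: every name below (SketchDisassembler, registerSketchDisassembler(), and so on) is hypothetical, and the registry is reduced to a map from a target name to a factory function. It only shows why a free factory function is handed over at initialization time and called later on demand.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for MCDisassembler.
struct SketchDisassembler {
  virtual ~SketchDisassembler() = default;
  virtual std::string name() const = 0;
};

// Hypothetical stand-in for M88kDisassembler.
struct SketchM88kDisassembler : SketchDisassembler {
  std::string name() const override { return "m88k"; }
};

// A factory is a plain function pointer, as in the text above.
using FactoryFn = SketchDisassembler *(*)();

// Function-local static avoids initialization-order problems.
std::map<std::string, FactoryFn> &sketchRegistry() {
  static std::map<std::string, FactoryFn> Registry;
  return Registry;
}

// Analogue of the registration call: store the factory under the target name.
void registerSketchDisassembler(const std::string &Target, FactoryFn Fn) {
  sketchRegistry()[Target] = Fn;
}

// Analogue of what a tool does later: look up the factory and call it.
// Returns an empty pointer if no factory was registered for the target.
std::unique_ptr<SketchDisassembler>
createSketchDisassembler(const std::string &Target) {
  auto It = sketchRegistry().find(Target);
  if (It == sketchRegistry().end())
    return nullptr;
  return std::unique_ptr<SketchDisassembler>(It->second());
}

// The factory itself, mirroring createM88kDisassembler().
SketchDisassembler *createSketchM88k() {
  return new SketchM88kDisassembler();
}
```

In this sketch, a tool would call registerSketchDisassembler("m88k", createSketchM88k) once at startup and createSketchDisassembler("m88k") whenever it needs an instance.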
- The decodeGPRRegisterClass() function turns a register number into the register enum member generated by TableGen. This is the inverse operation of the M88kInstPrinter::getMachineOpValue() method. Note that we specified the name of this function in the DecoderMethod field in the M88kRegisterOperand class:
static const uint16_t GPRDecoderTable[] = {
    M88k::R0, M88k::R1, M88k::R2, M88k::R3,
    // …
};
static DecodeStatus
decodeGPRRegisterClass(MCInst &Inst, uint64_t RegNo,
                       uint64_t Address,
                       const void *Decoder) {
  if (RegNo > 31)
    return MCDisassembler::Fail;
  unsigned Register = GPRDecoderTable[RegNo];
  Inst.addOperand(MCOperand::createReg(Register));
  return MCDisassembler::Success;
}
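The table lookup can be tried in isolation with a minimal sketch that does not depend on LLVM. The enum SketchReg and its values are made up for illustration; they stand in for the TableGen-generated M88k::R0, M88k::R1, and so on. The table here has only four entries instead of 32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the generated register enum. The values are
// deliberately not 0, 1, 2, ... to show why a lookup table is used rather
// than treating the encoded number as the enum value.
enum SketchReg : uint16_t { NoReg = 0, R0 = 10, R1, R2, R3 };

// Index: register number from the instruction encoding.
// Value: the matching enum member.
static const uint16_t SketchDecoderTable[] = {R0, R1, R2, R3};
static const uint64_t SketchNumRegs =
    sizeof(SketchDecoderTable) / sizeof(SketchDecoderTable[0]);

// Returns false for an out-of-range register number, otherwise stores the
// enum member in Reg and returns true.
bool decodeSketchReg(uint64_t RegNo, uint16_t &Reg) {
  if (RegNo >= SketchNumRegs)
    return false;
  Reg = SketchDecoderTable[RegNo];
  return true;
}
```

The bounds check comes first for the same reason as in decodeGPRRegisterClass(): the register number is taken from the byte stream, so it must be validated before it is used as an array index.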
- Then we include the generated disassembler tables:
#include "M88kGenDisassemblerTables.inc"
- And finally, we decode the instruction. For this, we need to take the next four bytes of the Bytes array, create the instruction encoding from them, and call the decodeInstruction() generated function:
DecodeStatus M88kDisassembler::getInstruction(
    MCInst &MI, uint64_t &Size, ArrayRef<uint8_t> Bytes,
    uint64_t Address, raw_ostream &CS) const {
  if (Bytes.size() < 4) {
    Size = 0;
    return MCDisassembler::Fail;
  }
  Size = 4;
  uint32_t Inst = 0;
  for (uint32_t I = 0; I < Size; ++I)
    Inst = (Inst << 8) | Bytes[I];
  if (decodeInstruction(DecoderTableM88k32, MI, Inst,
                        Address, this, STI) !=
      MCDisassembler::Success) {
    return MCDisassembler::Fail;
  }
  return MCDisassembler::Success;
}
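The loop that builds the instruction word reads the bytes in big-endian order: the first byte ends up in the most significant position. The following standalone sketch isolates that step. The helper name readBigEndian32() is hypothetical and not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: combines the first four bytes into a 32-bit word,
// most significant byte first. Returns false if fewer than four bytes are
// available, mirroring the size check in getInstruction().
bool readBigEndian32(const uint8_t *Bytes, std::size_t Len, uint32_t &Word) {
  if (Len < 4)
    return false;
  Word = 0;
  for (std::size_t I = 0; I < 4; ++I)
    Word = (Word << 8) | Bytes[I]; // Shift earlier bytes up, append the next.
  return true;
}
```

For the byte sequence 0xf4, 0x22, 0x40, 0x03, the resulting word is 0xf4224003.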
That is all that needs to be done for the disassembler. After compiling LLVM, you can test the functionality again with the llvm-mc tool:
$ echo "0xf4,0x22,0x40,0x03" | \
  bin/llvm-mc --triple m88k-openbsd --disassemble
.text
and %r1, %r2, %r3
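To see why these four bytes print as and %r1, %r2, %r3, the register fields can be extracted by hand. The sketch below assumes the three-register instruction layout with the destination register in bits 25-21, the first source register in bits 20-16, and the second source register in bits 4-0. This layout is an assumption here, not taken from the code above, but it is consistent with the output shown. The names TriadicFields and extractTriadic() are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the three register numbers.
struct TriadicFields {
  unsigned Rd;  // destination register, assumed bits 25-21
  unsigned Rs1; // first source register, assumed bits 20-16
  unsigned Rs2; // second source register, assumed bits 4-0
};

// Extracts the three 5-bit register fields from a 32-bit instruction word.
TriadicFields extractTriadic(uint32_t Insn) {
  TriadicFields F;
  F.Rd = (Insn >> 21) & 0x1f;
  F.Rs1 = (Insn >> 16) & 0x1f;
  F.Rs2 = Insn & 0x1f;
  return F;
}
```

Applied to 0xf4224003, this yields the register numbers 1, 2, and 3, which decodeGPRRegisterClass() would then map to the corresponding register enum members.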
Moreover, we can now use the llvm-objdump tool to disassemble ELF files. However, for it to be really useful, we would need to add all instructions to the target description.
Summary
In this chapter, you learned how to create an LLVM target description, and you developed a simple backend target that supports the assembling and disassembling of instructions for LLVM. You first collected the required documentation and made LLVM aware of the new architecture by enhancing the Triple class. The documentation also includes the relocation definitions for the ELF file format, and you added support for them to LLVM.
You then learned about the register definition and the instruction definition in the target description and used the generated C++ source code to implement an instruction assembler and disassembler. In the next chapter, we will add code generation to the backend.