"libLISA is a tool that can fully automatically scan instruction space, discover instructions and synthesize their semantics. It produces machine-readable, CPU-specific x86-64 instruction semantics. It relies on as little human specification as possible: specifically, it does not rely on a handwritten (dis)assembler to dictate which instructions are executable on a given CPU, or what their operands are."
"Even though heavily researched, a full formal model of the x86-64 instruction set is still not available. This is caused by the sheer complexity of the x86-64 architecture: the informal specification found in Intel manuals is roughly 4700 pages, and even these are known to be not trustworthy."
"libLISA aims to solve this problem by using a CPU as the ground truth, and deriving semantics by observing instruction execution."
"We analyzed five different architectures: AMD 3900X, AMD 7700X, Intel i9-13900 (p), Intel i9-13900 (e) and Intel Xeon Silver 4110. For each architecture, we generated around 120k encodings."
Ok, so when I first learned assembly language (which was actually Z80, not x86, and after that I learned the 6502 assembly language, before learning any x86 or MIPS, which I still don't know all that well...), I learned that if you use any bit patterns as instructions that are not defined as instructions in the documentation, "here be dragons!" The computer will do something, but who knows what. The computer will do something because every bit pattern given to it as an instruction will be interpreted as an instruction by the silicon, but the silicon is only designed to do the right thing for the official documented instructions, and its behavior for every other bit pattern is just whatever it happens to do. It won't return an error message saying, hey, you can't do that -- in fact, it can't return an error message because assembly language is way below the level of abstraction where there's such a thing as an "error message".
You might wonder why, then, anyone would care about these "undocumented" instructions and make any effort to document them? Why not just ignore them? Just make sure you write your programs, whether written by hand in assembly language or output from a compiler, such that they avoid the "here be dragons" instructions.
The reason people can't do that is that hackers will try to get binaries on your machine that use undocumented instructions. They, the hackers, have taken the trouble to figure out what some of the undocumented instructions do, and if your reverse engineering tools don't know what those instructions do, then you can't reverse engineer code from hackers.
So it turns out that a complete mapping of bit patterns to the instructions they carry out is desired. It's essential to fighting hackers and ensuring good computer security.
What's produced by this project, assuming their results are as advertised, isn't a truly complete mapping, because that's really not possible -- there are just too many possible input bit patterns to test all of them -- but it's a complete mapping of certain groupings that are likely to produce meaningful instructions. They call this "mapping from encodings to semantics".
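
To put a rough number on "too many bit patterns", here's a back-of-the-envelope sketch in Python. It assumes the x86-64 architectural limit of 15 bytes per instruction, and the test rate of one billion patterns per second is a number I made up purely for illustration (it's not from the paper). The function names are mine too.

```python
# Back-of-the-envelope estimate of the x86-64 instruction space.
# Assumption: an x86-64 instruction is at most 15 bytes long.
MAX_INSTRUCTION_BYTES = 15

# Hypothetical test rate, chosen only for illustration (not from the paper).
TESTS_PER_SECOND = 1_000_000_000

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def count_bit_patterns(num_bytes: int) -> int:
    """Number of distinct byte strings of exactly num_bytes bytes."""
    return 256 ** num_bytes


def years_to_test_all(num_bytes: int, tests_per_second: int) -> float:
    """Years needed to try every pattern once at the given rate."""
    return count_bit_patterns(num_bytes) / (tests_per_second * SECONDS_PER_YEAR)


if __name__ == "__main__":
    total = count_bit_patterns(MAX_INSTRUCTION_BYTES)
    years = years_to_test_all(MAX_INSTRUCTION_BYTES, TESTS_PER_SECOND)
    print(f"{total:.3e} possible 15-byte patterns")
    print(f"about {years:.3e} years at {TESTS_PER_SECOND:,} tests per second")
```

That's 2^120 patterns for the 15-byte case alone, which is why grouping bit patterns into encodings (around 120k per architecture, according to the quote above) is the only practical way to go.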