the Andrew Bailey

Registers of the Itanium CPU Architecture

In this second installment of the CPU register series, I take a look at the Itanium CPUs. Intel and HP designed Itanium throughout the 1990s. Intel hoped that it would be the successor to the old x86 architecture, with the bonus of not being legally obliged to share its secrets with anyone else (AMD specifically). When it went on the market in 2001, its performance was not competitive with x86, and it was super expensive. While Itanium had x86 emulation, it was not fast enough to be useful. At the time, AMD was busy extending x86 to 64-bit, which proved to be the winning strategy.

Hopefully this will be less complicated than x86, but the way Itanium uses its registers is vastly different. Itanium is a very long instruction word (VLIW) design, though Intel likes to call it explicitly parallel instruction computing (EPIC). Instructions are issued in 128-bit bundles of three, so a single bundle is large enough to engage several execution units and registers at once. Compare that to x86's complex instruction set computing (CISC) design, which uses smaller instructions and fewer registers, but can be tricked out with out-of-order execution, register renaming, and branch prediction. Itanium relies on compile-time optimizations rather than runtime optimizations.
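To make the "bigger instructions" point concrete, here is a minimal Python sketch of how an Itanium bundle is laid out: 128 bits holding a 5-bit template plus three 41-bit instruction slots. The function names pack_bundle and unpack_bundle are my own, and the slot contents are treated as opaque numbers; this only shows the bit layout, not how real instructions are encoded.

```python
TEMPLATE_BITS = 5    # tells the CPU which execution unit each slot needs
SLOT_BITS = 41       # one instruction per slot
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Pack a template and three opaque 41-bit slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template  # template sits in the lowest 5 bits
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, (slot0, slot1, slot2))."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("a bundle is exactly 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    )
    return template, slots
```

Calling unpack_bundle(pack_bundle(0x10, (1, 2, 3))) gives back the same template and slots. Note that 5 + 3 * 41 is exactly 128, so nothing in the bundle goes to waste.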

There are 128 65-bit general purpose (integer) registers GR0-GR127, 128 82-bit floating point registers FR0-FR127, 64 one-bit predicate registers PR0-PR63, eight 64-bit branch registers BR0-BR7, a 64-bit instruction pointer IP, and a 38-bit current frame marker CFM. The 65th bit of each general purpose register is a not-a-thing (NaT) bit, which flags a value that is not valid, for example the result of a speculative load that failed; I can also see it being useful for nulls. Floating point registers have no extra bit for this and use a special NaTVal encoding instead. The first 32 general purpose registers (GR0-GR31) are static and always visible, while the last 96 (GR32-GR127) are stacked, and a function only sees the part of them it has allocated as its frame. By convention, GR1 is the global pointer and GR12 is the stack pointer. GR0 is hardwired to 0, FR0 is 0.0, and FR1 is 1.0.
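Here is a rough Python sketch of the general purpose register file described above: 128 registers, each a 64-bit value plus a not-a-thing bit, with GR0 always reading as zero. The class and method names (GR, GeneralRegisters, add) are made up for illustration. Treating a write to GR0 as an error and letting the NaT bit carry through an add are simplifications of what the hardware does.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1
NUM_GR = 128


@dataclass(frozen=True)
class GR:
    value: int = 0     # the 64-bit payload
    nat: bool = False  # the 65th bit: "not a thing"


class GeneralRegisters:
    def __init__(self):
        self._regs = [GR() for _ in range(NUM_GR)]

    def read(self, n):
        if not 0 <= n < NUM_GR:
            raise IndexError("no such register")
        if n == 0:
            return GR(0, False)  # GR0 is hardwired to 0
        return self._regs[n]

    def write(self, n, value, nat=False):
        if not 0 <= n < NUM_GR:
            raise IndexError("no such register")
        if n == 0:
            raise ValueError("GR0 is read-only")  # simplification
        self._regs[n] = GR(value & MASK64, nat)

    def add(self, dst, a, b):
        """dst = a + b, wrapping at 64 bits; NaT on either input taints dst."""
        ra, rb = self.read(a), self.read(b)
        self.write(dst, ra.value + rb.value, ra.nat or rb.nat)
```

For example, adding a register that holds 5 to GR0 yields 5 with the NaT bit clear, while adding anything to a register whose NaT bit is set yields a result that is also marked as not-a-thing.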

Itanium sections off regions of its massive register array and uses them as stack frames. The idea was to keep as much data in the registers as possible, thus enabling independent processing of large data sets. When things get crowded, these frames are SPILLed into memory and FILLed back when necessary. Depending on the app, I think this comes off as rather wasteful: lots of registers could be sitting idle, holding data that belongs to any number of functions further up the call stack, which the current subroutine can't access and doesn't care about.
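The spill and fill behaviour can be sketched as a toy Python model. It only counts registers: each call claims a frame out of the 96 stacked registers, the oldest registers get spilled when they run out, and they are filled back when a return needs them. The RegisterStack class and its methods are hypothetical, and the model ignores details of the real hardware, such as the overlap between a caller's output registers and the callee's frame.

```python
PHYSICAL_STACKED = 96  # GR32-GR127


class RegisterStack:
    def __init__(self):
        self.frames = []  # frame sizes of active functions, oldest first
        self.spilled = 0  # registers of old frames currently parked in memory

    def in_registers(self):
        """Stacked registers currently holding some frame's data."""
        return sum(self.frames) - self.spilled

    def idle(self):
        """Occupied registers that the running function cannot touch."""
        current = self.frames[-1] if self.frames else 0
        return self.in_registers() - current

    def call(self, size):
        """Enter a function needing `size` registers; return how many were spilled."""
        if not 0 <= size <= PHYSICAL_STACKED:
            raise ValueError("a frame cannot exceed the stacked register file")
        self.frames.append(size)
        overflow = max(self.in_registers() - PHYSICAL_STACKED, 0)
        self.spilled += overflow  # the oldest registers go to memory first
        return overflow

    def ret(self):
        """Leave the current function; return how many registers were filled."""
        self.frames.pop()
        # The frame being returned to must be fully back in registers.
        older = sum(self.frames[:-1])
        fill = max(self.spilled - older, 0)
        self.spilled -= fill
        return fill
```

With three nested calls of 40 registers each, the third call spills 24 registers, and at that point 56 of the 96 occupied registers belong to callers and just sit there, which is the waste I am grumbling about.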

The good news is that Itaniums were never really popular, affordable, or that fast for most things. What little market share it has is being eroded away, but since it has found its way into things like mainframes and other super-reliable systems, these machines aren't going to die out overnight, or even over a decade. At best, it's a novelty; at worst, it's a waste of money. It didn't get called the Itanic for nothing.

Posted under Programming.
