The Core Microarchitecture
Background
The Core 2 and Xeon 5100 families of processors belong to the Core microarchitecture. The previous generation Intel desktop chip belongs to the NetBurst microarchitecture, while previous mobile chips belong to the mobile microarchitecture of the Pentium M.
Architectures and Microarchitectures
Though the terms are often used interchangeably, there’s nevertheless a clear distinction between a processor’s architecture and its microarchitecture. Intel has gotten a lot of mileage out of the IA-32 architecture, much more so than out of the IA-64 and IXA architectures. IA-32 stands for Intel Architecture, 32-bit.
AMD, for that matter, has also gotten a lot out of IA-32, what with the success of processors like the Athlon 64, Athlon 64 X2, Athlon 64 FX, and Opteron.
IA-32 encompasses several generations of microarchitecture designs, stretching all the way back to the microarchitectures of the Pentium and its successors, and up through the NetBurst and Core microarchitectures of today.
A microarchitecture is an “implementation of processor architecture in silicon”.
A processor, in turn, is an implementation of a microarchitecture.
It is important that microarchitectures adhere to the instruction set definition and other structures of the parent architecture, so that code written for one microarchitecture will run on all the others that implement the same architecture: so that software will run on both AMD and Intel microprocessors, for example, and so that code written for the Pentium 4 will also run on the Core 2 Duo.
Intellectual Predecessor
Core is a brand new microarchitecture, which is to say a new implementation of the same architecture. It was redesigned mostly from the ground up.
Some of the microarchitectural innovations of Core are themselves new and cannot be traced back to any other microarchitecture before them.
Other elements of the Core microarchitecture go back to NetBurst, the desktop and server microarchitecture of the Pentium 4 and of many Xeon processors.
That being said, the intellectual predecessor of the Core microarchitecture is not NetBurst, but rather the mobile microarchitecture of the Pentium M and, later, of the Core Duo and Core Solo notebook processors. What the Core and mobile microarchitectures share is a common set of ideas around which their designs were built.
You see, while Intel was busy driving power usage up on the desktop and server, it was also driving down energy and heat on its notebooks. The reasons for this were eminently practical: longer battery life, and the heat constraints of smaller form factors.
While the reasons for managing energy may have been practical, energy efficiency in itself wasn’t very glamorous. At the time, gigahertz was all the rage.
The mobile microarchitecture of the Pentium M was thus designed around ideas about how to minimize power use and make the most of the power that is used. The same might be said of the Core microarchitecture. Core “extends the energy efficient philosophy first delivered in Intel’s mobile microarchitecture found in the Intel Pentium M processor”.
The Energy Efficient Ideas
One of the mistakes made in the past was equating performance with clock speed. “Contrary to a popular misconception, it is not clock frequency (GHz) … that equates to performance”.
While clock speed does not equal performance, it is nevertheless very important to performance. Hence the industry’s continued, and justifiable, fascination with clock speed, even after the gigahertz era has ended.
Clock speed remains one of the two most important determinants of performance, the other being the number of instructions that a processor executes per clock cycle, or instructions per clock (IPC) for short.
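To make the relationship concrete, here is a minimal sketch that treats performance as nothing more than clock speed multiplied by IPC. The function name and both example chips are hypothetical, and the numbers are made up for illustration rather than measured on any real processor.

```python
def instructions_per_second(clock_hz: float, ipc: float) -> float:
    """Rough throughput model: clock frequency times instructions per clock."""
    return clock_hz * ipc


# Two hypothetical chips (illustrative numbers, not benchmarks).
fast_clock = instructions_per_second(3.6e9, 1.0)  # high clock, low IPC
high_ipc = instructions_per_second(2.4e9, 2.0)    # lower clock, higher IPC

print(f"3.6 GHz at IPC 1.0: {fast_clock / 1e9:.1f} billion instructions/s")
print(f"2.4 GHz at IPC 2.0: {high_ipc / 1e9:.1f} billion instructions/s")
```

Under this simple model, the slower-clocked chip comes out ahead because it gets more done on every tick.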
The Core microarchitecture contains many features, new and old, that increase the number of instructions its processors are able to execute per clock, among them Wide Dynamic Execution, Advanced Digital Media Boost, Micro-Ops Fusion, and Macro-Ops Fusion.
Wide Dynamic Execution is the ability of the processor to execute up to four instructions at a time, or per clock cycle. Previous generation microarchitectures (NetBurst and Mobile) were only able to execute up to three at a time.
Advanced Digital Media Boost gives a processor the ability to execute a whole 128-bit SSE (Streaming SIMD Extensions) instruction in a single clock cycle. Previously, it took two cycles to execute a single 128-bit SSE instruction. Now it takes one.
Micro-Ops Fusion is the ability to combine the micro-ops that a single instruction is decoded into, so that they travel through the pipeline as one. Core inherited Micro-Ops Fusion from the previous generation mobile microarchitecture.
Macro-Ops Fusion is the ability to combine separate instructions together into one before the instructions are decoded, as opposed to after they are decoded, as with Micro-Ops Fusion.
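As a toy illustration of the idea, the sketch below walks a list of instructions and merges each compare that is immediately followed by a conditional jump, which is the typical kind of pair that gets fused. It is only a software analogy, not how the hardware is built. The function name, the sample program, and the two mnemonic sets are assumptions made for this example, and the exact pairs that a real processor fuses are more restricted than this.

```python
# Illustrative mnemonic sets (assumptions, not the hardware's actual rules).
COMPARES = {"cmp", "test"}
COND_JUMPS = {"je", "jne", "ja", "jae", "jb", "jbe"}


def macro_fuse(instructions: list[str]) -> list[str]:
    """Merge each compare that is immediately followed by a conditional jump."""
    fused = []
    i = 0
    while i < len(instructions):
        op = instructions[i].split()[0]
        has_next = i + 1 < len(instructions)
        next_op = instructions[i + 1].split()[0] if has_next else None
        if op in COMPARES and next_op in COND_JUMPS:
            # Treat the pair as a single item from here on.
            fused.append(f"{instructions[i]} + {instructions[i + 1]}")
            i += 2
        else:
            fused.append(instructions[i])
            i += 1
    return fused


program = ["mov eax, [mem]", "cmp eax, ebx", "jne target", "add eax, 1"]
fused_program = macro_fuse(program)
print(len(program), "instructions in,", len(fused_program), "items to decode")
```

The point of the toy is simply that fewer items reach the decoders, so the same work takes up fewer slots per clock.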
Wide Dynamic Execution, Advanced Digital Media Boost, Micro-Ops Fusion, and Macro-Ops Fusion are all technologies that enable processors of the Core microarchitecture to carry out more instructions per clock.
Instructions Per Clock
So, performance is a function of clock speed on the one hand, and of the number of instructions that a processor executes at a time, or instructions per clock (IPC), on the other. However, it’s not even so simple as that.
For the number of instructions that a processor executes at a time is itself a function of (1) the design of the microarchitecture and (2) the particular application being run: “IPC is a function of processor microarchitecture and the specific application being executed”.
Here we see one potential reason for Intel’s big emphasis on platforms.
A platform is a combination of a processor, chipset, and other technologies, that all work together as a system. Centrino, for example, is a platform for notebooks, and its different components are allegedly optimized to work together to provide, among other things, longer battery life.
Here’s the thing. The technologies that go to make up a platform include software.
Hence, in theory, one could increase the number of instructions that a processor executes at a time, and thereby increase both performance and energy efficiency, by writing the software a certain way.
Perhaps software optimization is one of the ways that the various parts of Intel’s platforms are designed to work together as a system.
The Problem of Power Consumption
So, by now we should be accustomed to the idea that performance is a function of clock speed and of the number of instructions per clock (IPC) that a processor executes.
However, if performance is one side of a two-sided coin, then the other side of the coin is power consumption. What good is performance if power requirements go through the ceiling?
Increasing the Frequency
For decades, microprocessor makers were able to advance performance primarily by increasing the clock speed of CPUs. The problem was that, in order to increase performance by a little, CPU designers had to increase the power by a lot, relatively speaking.
The farsighted knew this could not go on indefinitely. In 1999, Fred Pollack, comparing power requirements and performance between processor generations, made the astute observation, “we are on the wrong side of a square law”.
However, if common sense told us that this process of increasing the frequency could not go on forever, no one seemed to know exactly when it would end either. In the event, it was the NetBurst microarchitecture, and the Pentium 4 processor, that discovered the limits of gigahertz, and the life of the NetBurst microarchitecture had to be cut short.
Today, clock speeds have progressed to the point that it will be difficult to increase them at a sustained rate in the future. Clock speeds may still increase, but not at the rate of the past. In place of gigahertz, we are to see the proliferation of CPU cores on single dies and within CPU packages.
Moving to Multiple Cores
Adding cores is actually just another way of increasing the number of instructions that a processor is able to carry out per clock cycle. Remember, IPC is one of the two most important determinants of performance.
By doubling the number of cores on your CPU, in theory you double the number of instructions that the CPU is able to carry out at a time. That’s a 100% increase in performance. That’s not bad. It’s also only theoretical.
The problem is that if you have an application that is single-threaded, written to handle just one stream of instructions, all the cores in the world will do you no more good than a single core.
So it’s up to the software to extract that extra performance out of multiple computing cores. Remember, the number of instructions that a processor executes at a time (IPC) is itself a function of CPU microarchitecture design and software design.
In the past, software saw an immediate boost as the frequency of the CPU was pumped up. In the future, software shall have to be written to make the most of the hardware.
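The point about single-threaded software can be sketched in a few lines. This is a deliberately crude model, assuming that each core only contributes when the software gives it a thread to run and that the cores never get in each other’s way. The function name and all of the numbers are hypothetical.

```python
def chip_throughput(clock_hz, ipc_per_core, cores, software_threads):
    """Instructions per second, counting only cores that have a thread to run."""
    busy_cores = min(cores, software_threads)
    return clock_hz * ipc_per_core * busy_cores


# Illustrative numbers only.
single_core = chip_throughput(2.4e9, 2.0, cores=1, software_threads=1)
dual_one_thread = chip_throughput(2.4e9, 2.0, cores=2, software_threads=1)
dual_two_threads = chip_throughput(2.4e9, 2.0, cores=2, software_threads=2)

print("single core, one thread :", single_core)
print("dual core, one thread   :", dual_one_thread)
print("dual core, two threads  :", dual_two_threads)
```

In this model the second core adds nothing until the software supplies a second thread, at which point throughput doubles. Real applications rarely split so cleanly, which is why the doubling is a ceiling rather than a promise.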
Another Formula
If performance is a function of clock speed and instructions per clock (IPC), then there’s also an equation for power consumption. It’s complicated.
Power consumption is proportional to dynamic capacitance, multiplied by the voltage squared, multiplied by the frequency. There, I told you it was complicated.
In order to deliver energy efficient CPUs, the designers of the Core microarchitecture had to take into account both formulas, both the one for performance and the one for power consumption. They had to balance “IPC efficiency and dynamic capacitance with the required voltage and frequency to optimize for performance and power efficiency”.
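Here is a rough sketch of how the two formulas pull against each other. It takes performance as frequency times IPC and power as dynamic capacitance times voltage squared times frequency, and compares two design points. Both designs, their voltages, and the shared capacitance value are assumptions invented for the example, not figures for any real chip.

```python
def performance(frequency_hz, ipc):
    """Instructions per second: frequency times instructions per clock."""
    return frequency_hz * ipc


def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Dynamic power in watts: capacitance times voltage squared times frequency."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


CAPACITANCE = 1e-8  # farads, an assumed value shared by both designs

# Hypothetical design A: chase clock speed, which needs a higher voltage.
perf_a = performance(3.6e9, 1.0)
power_a = dynamic_power(CAPACITANCE, 1.4, 3.6e9)

# Hypothetical design B: lower clock and voltage, more work per clock.
perf_b = performance(2.4e9, 2.0)
power_b = dynamic_power(CAPACITANCE, 1.2, 2.4e9)

efficiency_a = perf_a / power_a  # instructions per second per watt
efficiency_b = perf_b / power_b

print(f"A: {perf_a / 1e9:.1f} G instr/s at {power_a:.1f} W")
print(f"B: {perf_b / 1e9:.1f} G instr/s at {power_b:.1f} W")
print(f"B delivers {efficiency_b / efficiency_a:.1f}x the performance per watt of A")
```

Because voltage enters the power formula squared, and a higher frequency generally needs a higher voltage, trimming both pays off much faster on the power side than it costs on the performance side, as long as IPC makes up the difference.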
Scalability
It is well known that processors of the Core microarchitecture are energy efficient and high performing. However, a third feature that is often overlooked is that Core is supposed to be scalable as well. It is “low power, high-performing, and scaleable”.
Basically this means that there’s lots of headroom for performance improvement.
It is to be a foundation for more to come, not just the present generation of processors.
Core is optimized for multicore. Dual-core is just the beginning. After that shall come quad-core computers. With any luck, the Core microarchitecture will scale well to quad-core and beyond.



























