AskAboutComputers.com
Keeping up with Technology,
so you do not have to

Itanium 2: The Full Monte-cito

The High End

Intel recently released the Itanium 2 9000 family of processors, formerly codenamed Montecito.

The Itanium 2 processor occupies a small but lucrative niche of the server space at the very top, along with the likes of IBM and Sun RISC (Reduced Instruction Set Computer) servers and mainframes.

Traditionally, there has been a sharp distinction between the high-end segment of business-critical computing, on the one hand, and mainstream computing, on the other. The mainstream makes up the large majority of servers. The high end is composed of mainframes, RISC servers, and Itanium.

For many years the mainframe was considered the best thing for enterprise applications. If the enterprise were to grow suddenly and sharply, you could keep up with the growth primarily by adding CPUs. You scale up by adding processors; you scale out by adding servers. For scaling up, "the mainframe was the gold standard for many years" (p. 3).

Then the RISC systems became more popular.

Over 10 years ago, Intel and Hewlett-Packard began work on what was to become the Itanium processor. The Itanium 2 platform has evolved into an industry-standard alternative for the high end. Intel supplies the central processors. Other companies supply the rest.

Today Intel claims that Itanium 2 has reached critical mass, meaning that growth around the processor is poised to explode. The IT industry remains skeptical. However, it seems that the platform is growing.

At stake in Itanium 2's progress is roughly $25 billion each year. The mainstream server market may be high volume, but it is not high in dollars compared to the high end. The high end makes up only about 10% of servers. Yet this small percentage accounts for about half of all revenue!
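To put those two figures side by side, here is a minimal sketch of the arithmetic in Python. The function name is made up for illustration, and the only inputs are the article's rough numbers (about 10% of servers, about half of revenue), so treat the result as a ballpark rather than a market statistic.

```python
def revenue_per_server_ratio(unit_share_high, revenue_share_high):
    """Average revenue per high-end server divided by average revenue
    per mainstream server, given the high end's share of units and revenue."""
    # Revenue per unit, in arbitrary units, for each segment.
    high = revenue_share_high / unit_share_high
    mainstream = (1.0 - revenue_share_high) / (1.0 - unit_share_high)
    return high / mainstream


# The article's approximate figures: ~10% of servers, ~50% of revenue.
ratio = revenue_per_server_ratio(0.10, 0.50)
print(f"{ratio:.1f}")  # prints 9.0
```

In other words, if those shares are roughly right, the average high-end server brings in about nine times the revenue of the average mainstream server.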

Intel wants a piece of this. So do a whole host of other companies. This is why the Itanium phenomenon is not going away any time soon. "This accounts for the broad vendor support for Itanium 2-based systems" (pp. 7-8).

Case in point: the Itanium Solutions Alliance recently decided to invest $10 billion in the platform over the coming years.

One Stop Shop vs. Best-of-breed

Never in computing can one say that one size fits all. In spite of the potential advantages that Itanium 2 brings to the table, there may be times when a business-critical system provided by IBM or Sun is best.

Sun, for example, designs its machines to work as a system. The software and the hardware are optimized for each other. It's sort of like Intel's Centrino, in which hardware and software are optimized to provide performance and long battery life for notebooks, except that in Sun's case it's for entire systems.

The IBM and Sun approach is just another spin on the one stop shop preference. One reason why many managers prefer one stop shop is that there's less finger-pointing when something goes wrong. You can only blame yourself, if you are the only vendor.

The Itanium 2 platform is more akin to a best-of-breed approach, wherein one selects the components one wants.

The one stop shop approach has dominated the high end for so long that it is perhaps only natural for a reaction to set in against this approach in this space. Don't be surprised if Itanium 2 continues to grow in the coming years.

Choice

Intel claims that Itanium 2 brings standards-based computing to the high-end server segment. The Itanium 2 systems enable migration "off of proprietary servers and mainframes ... to a standards-based architecture".

Intel's critics, on the other hand, say that Intel overemphasizes the industry-standard nature of Itanium 2, pointing out that only one company makes the CPU and that other companies are not invited to the party in designing it.

Whatever your feelings about this topic, it is difficult to deny that the platform offers greater choice.

Hardware

Take hardware, for instance.

Intel may make the CPUs, but other companies make the servers.

You are not limited to one vendor.

You can get competing bids.

If Dell stops making Itanium 2 systems, you can always go to someone else. This, in fact, actually happened. Dell used to make these systems and stopped. It wasn't the end of the world.

However, God forbid that IBM or Sun should stop making their servers.

Operating Systems

You can't say that IBM and Sun do not support operating systems other than their own on their high-end servers. They do. Linux will run on high-end IBM and Sun hardware, and that's just one example.

However, the large majority of high-end IBM and Sun installations use their own operating systems. For example, in spite of the small market share of Itanium 2, "revenue for Itanium 2-based servers running Linux exceeds that of Power-based servers running Linux by a factor of 17" (p. 4).

About 95% of all high-end IBM server deployments use "IBM operating systems" (p.3). "Almost all of the SPARC-based systems being sold today run on Solaris" (p. 4).

Intel, on the other hand, is not the primary provider of OS software for Itanium 2. It's not even a provider.

Linux makes up most installations of Itanium 2, followed by Unix and then by Windows. However, even Windows has a hefty share of the Itanium 2 market.

All in all, there are currently over 10 operating systems that will run on Itanium 2. "Itanium 2-based servers are the only 64-bit servers on the market that support 10 different operating systems" (p. 3).

Itanium 2 servers even support several mainframe-class operating systems. "True mainframe-class systems are also available" (p. 8).

Applications

Thousands of applications exist for IBM mainframes and RISC machines. Thousands, too, for Sun SPARC-based servers. IBM and Sun have been around for a long time.

By contrast, Itanium has been around for a much shorter period of time, yet it also has thousands of applications. At the time of this writing, over 8,000 exist, which "more than doubles the number of applications available a year ago".

Multicore

The line between the high-end server segment and mainstream computing has been blurred somewhat by the introduction of 64-bit extensions, so-called AMD64 and Intel 64, or EM64T, technologies. The high end is almost pure 64-bit. Mainstream computing used to be 32-bit or less. Now the mainstream is 64-bit, too.

The introduction of 64-bit to mainstream computing is a good example of the cross-pollination that can occur between the high end and the lower end.

Technologies from the high end tend to trickle down to the mainstream. Another example is Native Command Queuing (NCQ), which is today found on desktop hard drives. This technology used to be available only on higher-end systems.

However, if technologies trickle down from the high to the low end, it is also true that Itanium 2 borrows technologies from the mainstream--technologies like virtualization, hyper-threading, power management techniques that result in less power consumption, dual-core and multicore.

Of these technologies, by far the most important at the present time is multicore.

Frequency Ramping

Performance in the previous era of computing was primarily driven by increasing the CPU clock. "Frequency ramping became the primary engine behind processor performance gains" (p. 3).

The problem was that, as the CPU clock was increased, power consumption and the heat that it generated became more and more of a problem. "Power consumption and heat generation rise exponentially with clock frequency" (p. 3).

The problem became so bad that the old rate of frequency increases could not be sustained. Clock frequency will continue to increase, but not at the rate of the past. "Frequency ramping will continue into the future, but at a much slower rate" (p. 3).
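For a rough feel of why power climbs so much faster than the clock, here is a small Python sketch. It uses the common first-order model for CMOS dynamic power, in which power is proportional to capacitance times voltage squared times frequency. That model is not taken from the quoted source, and the default assumption that supply voltage must rise in step with frequency is a simplification. The function name is hypothetical.

```python
def relative_dynamic_power(freq_ratio, voltage_ratio=None):
    """Dynamic power relative to a baseline chip, using the first-order
    model P ~ C * V^2 * f with capacitance held constant."""
    if voltage_ratio is None:
        # Assumption for illustration: voltage scales linearly with frequency.
        voltage_ratio = freq_ratio
    return (voltage_ratio ** 2) * freq_ratio


# A 20% clock increase, with voltage raised in step (assumed).
print(f"{relative_dynamic_power(1.2):.2f}")
# Doubling the clock under the same assumption.
print(f"{relative_dynamic_power(2.0):.2f}")
# Doubling the clock if voltage could somehow stay fixed.
print(f"{relative_dynamic_power(2.0, voltage_ratio=1.0):.2f}")
```

Under these assumptions, doubling the clock costs roughly eight times the power, which is steep growth even though it is not literally exponential.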

A New Strategy

Microarchitectural innovations can improve performance. Within the 32-bit Intel Architecture (IA-32), however, microarchitectural innovations and the other techniques used in the past are not enough on their own to keep pace with the popular reading of Moore's law, in which processor power doubles every 18 months.

It is for this reason that "a new strategy is needed" (p. 4).

Transistor Size

While the rate of increasing the frequency can no longer be maintained, there is one thing that can still be done. The size of transistors continues to shrink. State of the art is currently 65 nanometers. 45nm is right around the corner. And there's no good reason why the shrinking process cannot continue beyond 45nm.

Shrinking the CPU die used to be one of the primary means of increasing the clock frequency. Today, however, the frequency can no longer be increased substantially, not even by shrinking the size of the transistors. Yet the die continues to shrink. So what is one to do with all those extra transistors that can be placed on a CPU die?

Additional functions can be added, technologies like virtualization, 64-bit, advanced management technology, maybe.

However, shrinking the size of the transistors also frees up room to double the number of computing cores on CPUs. CPUs currently come with one or two cores. Four cores are within sight by the end of the year. And this is just the beginning.

We thus stand on the cusp of a new era in computing, one in which performance shall be driven by the number of cores on a die or in a package, rather than by clock frequency. It is the age of multicore.

Doubling the cores is a great way to increase performance.

By doubling the cores, you potentially double the number of things that a CPU is able to do at any given time. By doubling the cores, you can therefore theoretically double performance. That's a potential performance increase of 100%. No amount of microarchitectural innovation, within the 32-bit Intel Architecture (IA-32) at any rate, can yield anywhere near this much added performance. It is for this reason that "Multi-core processors are the wave of the future" (p. 4).

It is for this reason, too, that CPU makers are racing to put more cores on CPUs.

Software

A problem arises if your application can run just one thread at a time. Such an application is single-threaded, and doubling your cores yields no benefit whatsoever. Zero. Nada. Performance is flat. This is because a single-threaded application can use only one core at a time.

However, if the work of a program can be divvied up among more than one core, your application is multithreaded, and it should show some performance improvement on multiple cores.

32-bit operating systems and multithreaded applications saw a significant boost in performance in going from single-core to dual-core CPUs. It is more than a little disconcerting, however, that most of today's multithreaded applications show very little improvement in going from two cores to four cores.

The applications may be bottlenecked by the graphics subsystem, the memory subsystem, or the hard drives. These potential bottlenecks shall have to be addressed. However, these bottlenecks on the PC are nothing new.

Another possibility is that the software needs to take better advantage of the multiple cores.
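One standard way to reason about this is Amdahl's law, which the article does not cite but which fits the argument: the part of a program that cannot be spread across cores caps the overall speedup. The Python sketch below is only an illustration. The function name and the 50% parallel fraction are assumptions, not measurements of any real application.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only `parallel_fraction` of the work
    (between 0.0 and 1.0) can be spread across `cores` cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Single-threaded application: extra cores do nothing.
print(f"{amdahl_speedup(0.0, 4):.2f}")  # prints 1.00

# Hypothetical application in which half the work can run in parallel.
print(f"{amdahl_speedup(0.5, 2):.2f}")  # prints 1.33
print(f"{amdahl_speedup(0.5, 4):.2f}")  # prints 1.60
```

With half the work parallel, the first extra core buys about 33%, while going from two cores to four adds only about 20% more. That matches the pattern of diminishing returns described above, before any graphics, memory, or disk bottleneck is considered.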

In the previous era of frequency ramping, increasing the clock frequency yielded an automatic boost to performance. In the current era of multicore, the software will have to get involved. It should be an exciting opportunity for software engineers, if monopolistic tendencies in the software industry don't impede progress.

Beyond Multicore

Just as the era of increasing the frequency came to an end, so the era of multicore will come to an end eventually. We don't know when, however.

The Itanium Solutions Alliance thinks that this process of shrinking and adding cores shall continue for some time to come. Multi-core holds "the promise of ongoing performance scaling through this decade and beyond" (p. 5).

The number of cores in a CPU should continue to increase for as long as die sizes shrink. Transistors, however, cannot shrink forever. At some point the current rate of shrinking has to end, and the CPU industry will have hit its second wall. It won't be possible to double the number of cores at the same rate as in the past.

Maybe this won't happen anytime soon. Maybe the era of multicore has a long life in front of it. However, maybe the end will come sooner than everyone thinks. The gigahertz era came to an untimely end, and no one seemed to see the end coming.

Regardless of when the end comes, at that time the computing industry will have to turn to alternative methods of increasing performance if it is to keep up the rate of the past.

At that time, the current 32-bit Intel Architecture (IA-32) should have run its course. There's only so much additional performance that can be squeezed out of IA-32 through microarchitectural innovations and other techniques.

The EPIC architecture of Itanium and Itanium 2, however, was designed with limitations of frequency speed and transistor size in mind.

Back when Itanium was under development, CPU architects could see that transistors could not shrink forever, and that the days of increasing the CPU frequency were numbered, too.

The Itanium family therefore strives to increase performance by increasing the number of things that a processor does at a time. Sound familiar?

The Itanium 2 microarchitecture is in many ways the antithesis of the NetBurst microarchitecture. NetBurst was designed primarily to increase performance by increasing the CPU clock.

Itanium 2, however, is engineered to increase performance through increasing the number of instructions that can be carried out during a given clock cycle, that is, by increasing the instructions per clock (IPC).

Compare Itanium 2 with Intel's latest x86 design, the Core microarchitecture, which is also optimized for greater instructions per clock.

One of the advantages that the Core microarchitecture has is its ability to execute up to 4 instructions for every clock cycle. That's an improvement over the Pentium D and Core Duo, which were only able to carry out three instructions at a time.

Itanium 2, however, can execute up to 6 instructions for every clock cycle.

And that's just the beginning. The architecture was designed to scale using microarchitectural innovations. To this end, the Itanium 2 CPU contains 128 general purpose registers.

Ever wonder why the Itanium CPUs are clocked so low? Clock frequency is less important to Itanium 2 in part because it can perform more work for each clock cycle.
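The trade-off between width and clock speed can be sketched with simple arithmetic: peak instruction throughput is the number of instructions per clock multiplied by the clock frequency. In the Python sketch below, the issue widths (6, 4 and 3) come from the figures above, while the 1.6 GHz clock is an illustrative assumption and the function names are hypothetical. Peak width is an upper bound that real workloads rarely sustain, so this is not a benchmark.

```python
def peak_instructions_per_second(ipc_peak, clock_hz):
    """Upper bound on instructions retired per second for one core."""
    return ipc_peak * clock_hz


def clock_needed_to_match(ipc_peak, target_ips):
    """Clock frequency (Hz) a core of the given width would need
    in order to reach the same peak throughput."""
    return target_ips / ipc_peak


# Assumed clock for a 6-wide core; 1.6 GHz is an illustrative figure.
wide_core = peak_instructions_per_second(6, 1.6e9)

# Clocks that narrower cores would need to match that peak.
print(f"4-wide: {clock_needed_to_match(4, wide_core) / 1e9:.1f} GHz")  # 4-wide: 2.4 GHz
print(f"3-wide: {clock_needed_to_match(3, wide_core) / 1e9:.1f} GHz")  # 3-wide: 3.2 GHz
```

On paper, then, a 3-wide core has to run at twice the clock of a 6-wide core to reach the same peak, which is one way to see why a wide design can afford a lower frequency.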

Maybe when the size of transistors has become so small that they can shrink no more, and the number of cores can no longer be doubled, and the frequency can be pushed no further, maybe then the IT industry will turn to the Itanium design to continue its onward march in performance.

Intel would certainly like that.