Doesn’t “add more cores” face the same physical limitations as “make the CPU faster”?

cpu · cpu-architecture

In 2014, I hear a lot about programming languages being touted for their
concurrency features. Concurrency is said to be crucial for performance gains.

In making this statement, many people point back to a 2005 article called The
Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. The basic
argument is that it's getting harder to increase the clock speed of
processors, but we can still put more cores on a chip, and that to get
performance gains, software will need to be written to take advantage of
multiple cores.

Some key quotes:

We're used to seeing 500MHz CPUs give way to 1GHz CPUs give way to
2GHz CPUs, and so on. Today we're in the 3GHz range on mainstream
computers.

The key question is: When will it end? After all, Moore's Law predicts
exponential growth, and clearly exponential growth can't continue
forever before we reach hard physical limits; light isn't getting any
faster. The growth must eventually slow down and even end.

… It has become harder and harder to exploit higher clock speeds due
to not just one but several physical issues, notably heat (too much of
it and too hard to dissipate), power consumption (too high), and
current leakage problems.

… Intel's and most processor vendors' future lies elsewhere as chip
companies aggressively pursue the same new multicore directions.

…Multicore is about running two or more actual CPUs on one chip.

This article's predictions seem to have held up, but I don't understand why. I
have only very vague ideas about how hardware works.

My oversimplified view is "it's getting harder to pack more processing power
into the same space" (because of issues with heat, power consumption, etc). I
would expect the conclusion to be "therefore, we'll have to have bigger
computers or run our programs on multiple computers." (And indeed, distributed
cloud computing is a thing we're hearing more about.)

But part of the solution seems to be multi-core architectures. Unless computers
grow in size (which they haven't), this just seems to be another way of saying
"pack more processing power into the same space".

Why doesn't "add more cores" face the same physical limitations as "make the
CPU faster"?

Please explain in the simplest terms you can. 🙂

Best Answer

Summary

  • Economics. It's cheaper and easier to design a CPU that has more cores than a higher clock speed, because:

  • Significant increase in power usage. CPU power consumption increases rapidly as you increase the clock speed - you can double the number of cores operating at a lower speed in the thermal space it takes to increase the clock speed by 25%. Quadruple for 50%.

  • There are other ways to increase sequential processing speed, and CPU manufacturers make good use of them.


I'm going to be drawing heavily on the excellent answers at this question on one of our sister SE sites. So go upvote them!


Clock speed limitations

There are a few known physical limitations to clock speed:

  • Transmission time

    The time it takes for an electrical signal to traverse a circuit is limited by the speed of light. This is a hard limit, and there is no known way around it1. At gigahertz-clocks, we are approaching this limit.

    However, we are not there yet. 1 GHz means one nanosecond per clock tick. In that time, light can travel 30cm. At 10 GHz, light can travel 3cm. A single CPU core is about 5mm wide, so we will run into these issues somewhere past 10 GHz.2

  • Switching delay

    It's not enough to merely consider the time it takes for a signal to travel from one end to another. We also need to consider the time it takes for a logic gate within the CPU to switch from one state to another! As we increase clock speed, this can become an issue.

    Unfortunately, I'm not sure about the specifics, and cannot provide any numbers.

    Apparently, pumping more power into it can speed up switching, but this leads to both power consumption and heat dissipation issues. Also, more power means you need bulkier conduits capable of handling it without damage.

  • Heat dissipation/power consumption

    This is the big one. Quoting from fuzzyhair2's answer:

    Recent processors are manufactured using CMOS technology. Every time there is a clock cycle, power is dissipated. Therefore, higher processor speeds means more heat dissipation.

    There's some lovely measurements at this AnandTech forum thread, and they even derived a formula for the power consumption (which goes hand in hand with heat generated):

    [Image: derived formula for CPU power consumption]
    Credit to Idontcare

    We can visualise this in the following graph:

    [Image: graph of power consumption versus clock speed]
    Credit to Idontcare

    As you can see, power consumption (and heat generated) rises extremely rapidly as the clock speed is increased past a certain point. This makes it impractical to boundlessly increase clock speed.

    The reason for the rapid increase in power usage is probably related to the switching delay - it's not enough to simply increase power proportional to the clock rate; the voltage must also be increased to maintain stability at higher clocks. This may not be completely correct; feel free to point out corrections in a comment, or make an edit to this answer.
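The scaling described above can be sketched numerically. Assuming dynamic CMOS power follows P ≈ C·V²·f, and assuming (for illustration, per the reasoning in the last paragraph) that voltage must rise roughly linearly with frequency, power grows with the cube of clock speed - which is exactly why two cores at base clock cost about the same as one core clocked 25% higher:

```python
# Sketch of dynamic CMOS power scaling, assuming P = C * V^2 * f
# and a linear voltage-frequency relationship. The constants are
# illustrative placeholders, not measured values.

def dynamic_power(freq_ghz, capacitance=1.0, volts_per_ghz=1.0):
    """Relative dynamic power of one core at the given clock speed."""
    voltage = volts_per_ghz * freq_ghz  # assumed linear V-f scaling
    return capacitance * voltage**2 * freq_ghz

base = dynamic_power(3.0)            # one core at 3 GHz
faster = dynamic_power(3.0 * 1.25)   # same core at 3.75 GHz (+25% clock)
dual = 2 * dynamic_power(3.0)        # two cores at 3 GHz

print(f"+25% clock: {faster / base:.2f}x power")  # 1.25^3 ~ 1.95x
print(f"2x cores:   {dual / base:.2f}x power")    # exactly 2x
```

Under these assumptions, a 25% clock bump costs about 1.95× the power - essentially the same thermal budget as doubling the cores, but for far less extra throughput.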


More cores?

So why more cores? Well, I can't answer that definitively. You'd have to ask the folks at Intel and AMD. But you can see above that, with modern CPUs, at some point it becomes impractical to increase clock speed.

Yes, multicore also increases the power required and the heat dissipated. But it neatly avoids the transmission time and switching delay issues. And, as you can see from the graph, you can easily double the number of cores in a modern CPU with the same thermal overhead as a 25% increase in clock speed.

Some people have done it - the current overclocking world record is just shy of 9 GHz. But it is a significant engineering challenge to do so while keeping power consumption within acceptable bounds. The designers at some point decided that adding more cores to perform more work in parallel would provide a more effective boost to performance in most cases.

That's where the economics come in - it was likely cheaper (less design time, less complicated to manufacture) to go the multicore route. And it's easy to market - who doesn't love the brand new octa-core chip? (Of course, we know that multicore is pretty useless when the software doesn't make use of it...)
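Just how useless multicore is without software support can be quantified with Amdahl's law: if only a fraction p of a program can run in parallel, n cores can speed it up by at most 1 / ((1 − p) + p/n). A quick sketch (the fractions chosen are illustrative):

```python
# Amdahl's law: the maximum speedup from n cores when only a
# fraction p of the work is parallelisable.

def amdahl_speedup(p, n):
    """Upper bound on speedup for parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.5, 0.9, 0.99):
    print(f"parallel fraction {p}: 8 cores -> {amdahl_speedup(p, 8):.2f}x")
```

With only half the program parallelisable, that shiny octa-core chip delivers less than a 1.8× speedup.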

There is a downside to multicore: you need more physical space to put the extra core. However, CPU process sizes are constantly shrinking, so there's plenty of space to put two copies of a previous design - the real tradeoff is not being able to create larger, more-complex single cores. Then again, increasing core complexity is a bad thing from a design standpoint - more complexity means more mistakes/bugs and manufacturing errors. We seem to have found a happy medium with efficient cores that are simple enough not to take too much space.

We've already hit a limit with the number of cores we can fit on a single die at current process sizes. We might hit a limit of how far we can shrink things soon. So, what's next? Do we need more? That's difficult to answer, unfortunately. Anyone here a clairvoyant?


Other ways to improve performance

So, we can't increase the clock speed. And more cores have an additional disadvantage - namely, they only help when the software running on them can make use of them.

So, what else can we do? How are modern CPUs so much faster than older ones at the same clock speed?

Clock speed is really only a very rough approximation of the internal workings of a CPU. Not all components of a CPU work at that speed - some might operate once every two ticks, etc.

What's more significant is the number of instructions you can execute per unit of time. This is a far better measure of just how much a single CPU core can accomplish. Instructions vary in cost: some take one clock cycle, some take three. Division, for example, is considerably slower than addition.

So, we could make a CPU perform better by increasing the number of instructions it can execute per second. How? Well, you could make an instruction more efficient - maybe division now takes only two cycles. Then there's instruction pipelining. By breaking each instruction into multiple stages, it's possible to execute instructions "in parallel" - but each instruction still has a well-defined sequential order relative to the instructions before and after it, so it doesn't require software support like multicore does.
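The throughput win from pipelining is easy to see with a back-of-the-envelope cycle count. Assuming an idealised 5-stage pipeline (a common textbook depth, and no stalls or hazards - both assumptions for illustration), n instructions finish in stages + n − 1 cycles instead of stages × n:

```python
# Cycle counts for n instructions with and without pipelining,
# assuming an idealised pipeline with no stalls or hazards.

def unpipelined_cycles(n, stages=5):
    """Each instruction occupies the whole pipeline before the next starts."""
    return n * stages

def pipelined_cycles(n, stages=5):
    """A new instruction enters the pipeline every cycle once it is filled."""
    return stages + n - 1

n = 1000
print(f"unpipelined: {unpipelined_cycles(n)} cycles")  # 5000
print(f"pipelined:   {pipelined_cycles(n)} cycles")    # 1004
```

For long instruction streams the pipeline approaches one instruction per cycle - nearly a 5× throughput gain at the same clock speed, in this idealised model.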

There is another way: more specialised instructions. We've seen things like SSE, which provide instructions to process large amounts of data at one time. There are new instruction sets constantly being introduced with similar goals. These, again, require software support and increase complexity of the hardware, but they provide a nice performance boost. Recently, there was AES-NI, which provides hardware-accelerated AES encryption and decryption, far faster than a bunch of arithmetic implemented in software.
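The idea behind SIMD instruction sets like SSE can be mimicked in plain Python: instead of one value per "instruction", each operation handles a whole lane of values at once. The 4-wide lane below mirrors SSE's 128-bit registers holding four 32-bit floats; the function and data names are purely illustrative:

```python
# Toy model of SIMD: one "instruction" processes a whole lane of
# values, the way an SSE register holds four 32-bit floats.

LANE_WIDTH = 4  # 128 bits / 32-bit floats = 4 lanes

def simd_add(a, b):
    """Add two lanes element-wise, as a single SIMD instruction would."""
    assert len(a) == len(b) == LANE_WIDTH
    return [x + y for x, y in zip(a, b)]

# Eight additions done in two "instructions" instead of eight:
data_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
data_b = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]

result = []
for i in range(0, len(data_a), LANE_WIDTH):
    result += simd_add(data_a[i:i + LANE_WIDTH], data_b[i:i + LANE_WIDTH])

print(result)  # [11.0, 22.0, 33.0, 44.0, 55.0, 66.0, 77.0, 88.0]
```

In real hardware each lane is processed by dedicated circuitry in the same cycle, which is where the performance boost comes from.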


1 Not without getting quite deep into theoretical quantum physics, anyway.

2 It might actually be lower, since electrical field propagation isn't quite as fast as the speed of light in a vacuum. Also, that's just for straight-line distance - it's likely that there's at least one path that's considerably longer than a straight line.