But other manufacturers were moving fast. Arch-rival AMD was already developing its own 386 CPUs and only Intel’s litigation was delaying their release. Cyrix was already producing Intel-compatible maths co-processors and was threatening to move into CPUs. Intel needed an awesome new product.
On the other hand, there was a lot of conflict within Intel – and within the computing community at large – over the future direction of processor architecture. Many felt that CISC (Complex Instruction Set Computer) architecture, as used in the x86 line, was a technological cul-de-sac; that performance would flatten out within a few years as Moore’s law met fundamental barriers of computing.
They saw RISC (Reduced Instruction Set Computer) architecture as the future, using a smaller number of more versatile instructions and optimising the hell out of the architecture to drive performance. While one team at Intel worked on a successor to the 386, another was working on a new RISC processor which eventually became the i860. You probably haven’t heard of the i860, which tells you a lot about how this situation played out.
In theory, the i860 should have trumped any 386 successor, but in 1985 Intel’s CEO, Andy Grove, put John Crawford and hotshot architect Pat Gelsinger in charge of the design. Crawford and Gelsinger had already worked together on the 386 and shared a strong belief in the potential of the x86 and CISC architecture. Both felt that, while RISC had its advantages, a redesigned x86 chip could keep up.
What’s more, it could do it without forcing big software publishers to redevelop their applications, rebuild operating systems and optimise compilers. When you threw more transistors at the problem and increased their frequency, there was no reason why a CISC chip couldn’t compete with a RISC CPU. Apply Moore’s Law and keep increasing speeds, and a CISC chip might even crush it.
Optimise the pipelines!
Gelsinger and Crawford focused on delivering a processor that was fully 386-compatible and would build on the existing 32-bit architecture but would give you a massive increase in performance – at least double, clock for clock. They took inspiration from what was going on with the new RISC CPUs, paying particular attention to how instructions were loaded, organised, decoded and executed on the CPU.
The big innovation was to combine a tighter, more streamlined pipeline with an integrated L1 cache; a first in a mainstream CPU. With 8KB of high-speed SRAM as a store for recently used instructions and data on the same silicon, the instruction pipeline could be fed with a consistent flow, enabling it to execute the simplest and most commonly-used instructions at a sustained rate of one per clock cycle – an achievement that RISC devotees believed was beyond a CISC processor.
The new pipeline had five stages, though the first – the Fetch stage – wasn’t strictly necessary for each instruction, as the CPU could fetch about five instructions with every 16-byte access to the cache. Once fetched, instructions went through two decoding stages, where they were organised and fed into the execution units. Here they were executed, and the results written back to registers or memory in a final write back stage.
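To make the benefit of pipelining concrete, here is a minimal sketch in Python of an idealised five-stage pipeline using the stages described above. It assumes every instruction spends exactly one clock in each stage and that nothing ever stalls, which is a simplification rather than a model of real 486 timing. The function names are hypothetical.

```python
# Idealised five-stage pipeline (assumptions: one clock per stage, no stalls,
# no cache misses). Stage names follow the description in the text.
STAGES = ["Fetch", "Decode 1", "Decode 2", "Execute", "Write back"]


def unpipelined_cycles(n_instructions, stages=len(STAGES)):
    """Clocks needed if each instruction must finish before the next starts."""
    return n_instructions * stages


def pipelined_cycles(n_instructions, stages=len(STAGES)):
    """Clocks needed when a new instruction enters the pipeline every clock."""
    if n_instructions == 0:
        return 0
    # The first instruction takes `stages` clocks to drain through;
    # each later instruction completes one clock after the previous one.
    return stages + (n_instructions - 1)


def stage_at(instruction_index, clock):
    """Stage occupied by an instruction at a given clock (both 0-based).

    Returns None if the instruction has not entered the pipeline yet
    or has already left it.
    """
    position = clock - instruction_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


if __name__ == "__main__":
    n = 100
    print("unpipelined:", unpipelined_cycles(n), "clocks")
    print("pipelined:  ", pipelined_cycles(n), "clocks")
```

Under these assumptions, 100 instructions take 500 clocks one at a time but only 104 clocks when pipelined, which is why the sustained rate approaches one instruction per clock.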
The cache did such an effective job of holding recently used data and instructions that the processor only had to go to system memory on roughly 5 to 10 per cent of memory reads. What’s more, many 486 motherboards incorporated a secondary cache with 16KB or more of high-speed RAM, reducing latency even further. Meanwhile, the two decoder stages enabled those instructions to be pipelined and processed more efficiently – with five instructions running through the pipeline, one would normally complete with every clock cycle.
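The effect of that miss rate can be sketched with the standard average-access-time formula: hit cost plus miss rate times miss penalty. Only the 5 to 10 per cent miss rates below come from the text. The cycle counts for a cache hit and for a trip to system memory are illustrative assumptions, not measured 486 figures, and the function name is hypothetical.

```python
def average_read_cycles(miss_rate, hit_cycles=1.0, miss_penalty_cycles=4.0):
    """Average clocks per memory read for a single-level cache.

    miss_rate           -- fraction of reads that go to system memory (0.0 to 1.0)
    hit_cycles          -- ASSUMED cost of a cache hit, in clocks
    miss_penalty_cycles -- ASSUMED extra cost of going to system memory, in clocks
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0.0 and 1.0")
    return hit_cycles + miss_rate * miss_penalty_cycles


if __name__ == "__main__":
    # Miss rates quoted in the text, plus a no-cache baseline for comparison.
    for rate in (0.05, 0.10, 1.0):
        print(f"miss rate {rate:.0%}: {average_read_cycles(rate):.2f} clocks per read")
```

With these assumed costs, a read averages between 1.2 and 1.4 clocks, against 5 clocks if every read had to go out to system memory.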
The result was a spectacular improvement in performance. On integer instructions – very much the meat and potatoes of computing at the time – the 486 was at least twice as fast as a 386 running at the same clock speed, and sometimes 2.5 times as fast. This meant the CISC-based 486 could hit similar levels of performance to the RISC-based i860, while still being compatible with all the existing x86 software. There was no need to rebuild or recompile – code developed for the 286 and 386 just worked.
At this point floating point instructions weren’t so commonly used, but here the news was just as good. Previous Intel processors had worked with optional, discrete maths co-processors, which handled all the floating-point logic. These were expensive and not popular outside of business, as only a few major business applications, such as dBase, Borland Quattro Pro and Lotus 1-2-3, actually used a Floating-Point Unit (FPU). The 486-DX, however, integrated one directly onto the processor die, connected to the CPU by its own dedicated local bus.
This meant there was less overhead in shifting data between CPU and FPU and this, combined with other optimisations, resulted in a significant improvement in floating point performance. Fast forward a few years, and Quake would require a CPU with a floating point unit, with the system requirements citing a 486-DX4 as the minimum. Today, it’s impossible to imagine a CPU without an FPU, and that’s thanks to the mighty 486.
Beyond this, differences from the 386 were relatively small. The 486 had a few extra ‘atomic’ instructions that sped up some basic operations, but nothing compared to the instructions added with the 80286 or 386. The 486 also didn’t mess with the 386’s memory model; it could still address 4GB of RAM across a 32-bit bus, with a protected mode that presented both real and virtual memory as one big pool. However, its improved Memory Management Unit performance meant it was much more efficient at shifting data between the system RAM, the CPU and the cache.
Double the clocks
Yet there was one final architectural change that was to have a major impact, even on the PCs that we use today. Intel CPUs from the 8086 through to the first-generation 486 ran at the same frequency as the external bus that connected all the core components together. This meant that the initial 486-DX processors, introduced in 1989-1990, ran at the same 20, 25 and 33MHz speeds as the I/O bus. Intel pushed speeds higher, releasing a 50MHz 486-DX, but the 50MHz bus speed began to cause problems for components elsewhere on the bus.
Luckily, the 486 design team had an ace to play: it decoupled the CPU clock speed from the motherboard clock speed and enabled the CPU to run at double the system clock. This fired up the 486-DX2, launched in 1992, to run at internal speeds of 40MHz, 50MHz and even a staggering 66MHz, making the 66MHz 486-DX2 the RTX 3080 of its day in terms of its impact on gaming performance.
The 486-DX4, introduced two years later, went even further, running the core at three times the bus speed to hit 75MHz and 100MHz; a staggering level of performance that trashed the available RISC competition. The team’s confidence in the x86 architecture no longer looked misplaced.
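The arithmetic behind clock multiplication is simple enough to sketch: the core clock is the bus clock times a fixed multiplier. The bus speeds and multipliers below are the nominal figures quoted in the text, and the function and constant names are hypothetical.

```python
# Multipliers as described in the text: the DX2 doubles the bus clock,
# the DX4 runs the core at three times the bus clock.
DX2_MULTIPLIER = 2
DX4_MULTIPLIER = 3


def core_clock_mhz(bus_mhz, multiplier):
    """Internal CPU clock for a given external bus clock and multiplier."""
    if bus_mhz <= 0 or multiplier <= 0:
        raise ValueError("bus_mhz and multiplier must be positive")
    return bus_mhz * multiplier


if __name__ == "__main__":
    for bus in (20, 25, 33):
        print(f"DX2 on a {bus}MHz bus: {core_clock_mhz(bus, DX2_MULTIPLIER)}MHz core")
    for bus in (25, 33):
        print(f"DX4 on a {bus}MHz bus: {core_clock_mhz(bus, DX4_MULTIPLIER)}MHz core")
```

Note that three times a nominal 33MHz gives 99MHz rather than the quoted 100MHz, because the speeds quoted for these chips are rounded figures. The key design point is that the motherboard only ever sees the slower bus clock, so the rest of the system does not have to keep pace with the core.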
So, the 486 launched with an undeniable advantage in performance in a market where – thanks to Intel’s ace legal department – other x86 chip vendors had practically nothing. There was just one problem. While Intel had moved production down to a 1-micron process, the 486 still had over 1.2 million transistors – a big step from the 275,000 in the original 1.5-micron 386. This made it a comparatively big chip and, partly thanks to its $250 million US R&D costs, also an expensive one.
At launch, the 33MHz 486DX alone cost around $950 US (nearly $1,900 in today’s money), which was roughly three times the cost of the equivalent (and still pretty speedy) 386. A 486 PC cost users somewhere north of £2,000 (roughly £4,500 today). Intel’s response – you guessed it – was to put out a cut-down, cost-conscious alternative, and 1991’s 486-SX wasn’t actually such a bad deal.
When Intel tried the same trick with the 386, it released a hobbled version with a 16-bit data bus and slower clock speeds, but the 486-SX was basically a 486-DX with the FPU disabled. At the time, with so little software that supported the FPU, this wasn’t much of an issue, and by the time the 486-SX was released it only cost around $250 to $300 US.
The 486 effect
The power of the 486 was transformative at a time when the CPU was the biggest star of the PC show. Sure, it was supported by a platform where VGA and SVGA graphics cards were growing more powerful, and where standardisation around the VESA local bus and, later, PCI standards was opening up the PC for more powerful add-on cards. However, the 486’s advances in integer and floating-point performance arrived just at a point where advances in gaming graphics needed them most.
In the early 1990s, as prices dropped to more affordable levels, the 486 hit its peak. Just check out the games that emerged. Ultima Underworld and its sequel, Strike Commander, Wing Commander III, X-Wing, Ultima VIII: Pagan, IndyCar Racing and Alone in the Dark all launched between March 1992 and December 1994, and with their texture-mapped, Gouraud-shaded 3D graphics these PC showcases needed all the processing grunt that they could get.
A few simulations, such as Spectrum Holobyte’s Falcon 3 and Digital Image Design’s TFX, even used the FPU. And then, of course, came Doom; a game you could just about run on a 386 in a stamp-sized patch in the middle of the screen, but which looked amazing running full-screen at the full VGA resolution on a 66MHz 486-DX2.
If all those other games had pushed the PC as the high-end gaming platform of the early 1990s, Doom confirmed it. Even when the PlayStation and Saturn consoles launched a few years later with their fancy-pants, hardware-accelerated 3D tricks, they still struggled to run the classic Doom smoothly in full-screen. The 66MHz 486-DX2 could do it on its own, simply using sheer number-crunching power. People saw it, liked it and pulled out their wallets. The idea of the PC as the real gaming powerhouse was born.