The Rise and Fall of the Cray Supercomputer

While working at Control Data Corporation, computer designer Seymour Cray was once asked about his five-year and one-year goals. He said:

> Five-year goal: Build the biggest computer in the world.
>
> One-year goal: One-fifth of the above.

Seymour Cray literally only wanted one thing: to take it to the limit.

In today's video, we look back at a genius's lifelong quest to build the biggest supercomputers.

## Beginnings

Seymour Cray Jr. was born in 1925 in the small town of Chippewa Falls, Wisconsin, the son of a civil engineer at the local power utility. The elder Cray fostered in his son a love of science and engineering. The son adored chemistry, did the electrical work for his junior prom, and showed a talent for radio.

After serving in the Army as an electrician, Cray went to college on the GI Bill and graduated from the University of Minnesota with a bachelor's in electrical engineering and a master's in applied math.

Then in 1950, he joined a small company called Engineering Research Associates, or ERA, in St. Paul, Minnesota.

## ERA

During World War II, the US Navy employed a team of elite code-breakers. This division was known as the Communications Supplementary Activity - Washington, or CSAW. In secret, they designed and built powerful machines to break Axis naval codes, mostly to find German submarines.

At the war's end, people realized that the US government still needed to break codes - Soviet codes this time - but with budget cuts it could no longer afford to keep the team on staff.

So the Navy worked with an investment banker named John Parker to set up a for-profit company that would hire the old team at higher salaries. This was ERA.

ERA's people worked largely in secret out of an old glider factory. Their first products were code-breaking machines like before. But the team quickly began moving towards useful general-purpose computers, leveraging emerging technologies like magnetic drum memory.
One of the first such products was the ERA 1101. The computer was a commercialized version of a machine sold to one of the precursors of the NSA.

You could say that Minnesota in the 1950s was Silicon Valley before Silicon Valley existed. Thanks to the strong military presence, it was America's premier computing center.

## Working With Cray

Despite being just 25 years old, Cray showed himself to be a talented worker.

Cray was quiet, but had the unshakable confidence to speak up when something was wrong. He had incredible focus and rigid discipline. And of course, he was brilliant, with a gift for understanding the world of binary numbers.

So Cray was given the challenge of producing the control system for ERA's next computer, the ERA 1103. The control system breaks the software's instructions down into execution steps.

Control system design required front-to-back knowledge of the computer's guts, because the control system had to coordinate all of the machine's resources to execute the program efficiently. Cray did the work, and quickly rose through the ranks to supervise a team.

Like another legend of the times, An Wang of Wang Labs, Seymour Cray's genius could sometimes make him difficult to work for. Cray preferred to work alone, spending late evenings in the workshop. If someone wasn't doing it right, or was taking too long, he reassigned them and did it himself. They called it "being scrayed".

But despite being very quiet, Seymour treated his colleagues well and was generally liked and respected. Just a bit eccentric and enigmatic - exactly as a genius computer designer should be.

## Rand

In December 1951, president and original investor John Parker sold ERA to Remington Rand.

Parker didn't tell the employees at ERA about this - which in bird culture is considered a dick move. And it angered key workers like William Norris, who joined early and rose to become VP of Operations.

Remington Rand was a typewriter and shaver company.
Why computers? Even more strangely, Rand had already bought a computer company earlier: the Eckert–Mauchly Computer Corporation.

J. Presper Eckert and John Mauchly will forever be known for building ENIAC, the first programmable, general-purpose electronic digital computer.

And then EDVAC, which pioneered the stored-program design now known as the Von Neumann architecture.

Along the way, Eckert also helped teach the first course on electronic digital computers - the Moore School Lectures, which helped popularize the Von Neumann architecture. No relation to Moore's Law.

And then UNIVAC I, a computer that garnered renown for predicting that some dude named Dwight Eisenhower would win the 1952 US presidential election.

The world-famous Eckert and Mauchly had serious computer credentials, so the guys at ERA accepted the arrangement. Eckert–Mauchly would focus on the business computer marketplace while ERA focused on scientific computing.

## Tensions

But tensions remained.

Today, the computing needs of both business and science can be handled by a general-purpose computer. But in those days, computers had to be more specialized.

Computers for business did simple operations - adding, subtracting, and multiplying - with only two decimal places of accuracy. However, this had to be done at large scale, with thousands of records going in and out of the computer.

Scientific computing was different in that it required complex calculations with up to 20-30 decimal digits of accuracy. Such a computer might munch on a problem for hours just to produce a single line of output.

The guys at Eckert–Mauchly in Philadelphia looked down on the Minnesota guys as mere "farmers" who did not work on the state of the art. And the Minnesota guys saw the Philadelphians as theoreticians who only cared about making computers faster, even if it meant the machines broke down a lot.

Things got worse in 1955 when Remington Rand merged with the Sperry Corporation to become Sperry Rand.
The merger forced the two formerly independent units together into one Univac division.

Tensions really flared between ERA, Eckert, and their new corporate overlords. The aforementioned William Norris, the division's new general manager, said to the Philadelphia guys, "You people run a laboratory and ERA runs a business".

When Sperry took over, its top management, including president Harry Vickers, thought they were buying a market leader. That was not the case. UNIVAC had the potential to be what IBM eventually became, but it needed capital to get there.

Not only for R&D to build the machines, but also because computers were an equipment leasing business back then, which was very capital intensive.

In 1957, enough was enough. Norris left Sperry Rand to found a new company - Control Data Corporation, or CDC. Starting a new company was a huge risk, but Norris figured that if it did not work, he and his family could go back to their farm in Nebraska.

Norris invited a few of his ERA coworkers to come with him. Cray was one of those who accepted, having seen the writing on the wall when he noticed an accounting system categorize his project as "999 Miscellaneous and Other".

## Control Data

CDC started in July 1957 with its employees, a vague idea to make computers, and some money from friends. No plant, no product, and little money.

To get the company off the ground, CDC IPO'ed. Literally: they stood on the street and sold shares to ordinary members of the public for a dollar each. This is probably not possible today.

The company got its first splash of publicity from an unexpected spot. Sid Hartman, a sports columnist at the Minneapolis Tribune - and former part-time general manager of the Minneapolis Lakers (what?!) - mentioned the move at the end of one of his sports columns.

Many prominent local investors declined to invest in CDC, one saying that they "didn't have a ghost of a chance" against IBM.
Decliners included a guy named Warren Buffett. Despite being William Norris's nephew by marriage, he declined because he did not understand the business. Bet he's poor now and deeply regrets that.

Years later, the $1 shares would be worth many times that. The early CDC IPO enriched the company's 300 initial investors and created a generation of new wealth in the Minnesota area.

## The 1604

Cray was the most technical person at the new company, and he persuaded his cofounders to build scientific computers rather than go after the commercial market.

His reasoning was that their clientele - universities and nuclear weapons research labs - cared less about marketing and client service. And they programmed their own software.

What they wanted was compute. Serious compute. Due to treaty obligations, you can't just test-fire a nuclear weapon, and even setting aside the environmental issues, you cannot easily measure its inner workings.

So to study a detonation, scientists needed computers to simulate the bomb's chain reactions, stepping through all the equations for every microsecond after detonation. That meant the biggest and fastest computers on the market.

Cray was convinced that he could build such a computer at a relatively bearable cost using transistors. He went down to the local electronics shop and found that it was selling reject bipolar transistors meant for radios, more cheaply than you could get them from the factory.

These reject transistors sucked, outputting a weak signal. So Cray paired them up in what is called a Darlington pair, with the second transistor amplifying the output of the first. The experience taught him that with the right design, you can use substandard components and still achieve the goal.

Over a year into the venture and with money running low, Bill Norris struck a deal with the US Navy in 1958 for what would be called the CDC 1604 computer.

But now they had to build it at scale, buying a factory and hiring more engineers.
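The Darlington-pair trick described above is easy to quantify. Below is a minimal sketch, assuming idealized bipolar transistors with a fixed current gain (beta) and ignoring voltage drops; the beta of 20 in the usage lines is an illustrative assumption, not a figure from the story.

```python
def darlington_current_gain(beta1: float, beta2: float) -> float:
    """Overall current gain of an idealized Darlington pair (constant betas assumed)."""
    # Base current I_b enters Q1. Q1's emitter current, (beta1 + 1) * I_b,
    # becomes the base current of Q2. With the collectors tied together, the
    # total collector current is beta1*I_b + beta2*(beta1 + 1)*I_b.
    return beta1 * beta2 + beta1 + beta2


if __name__ == "__main__":
    weak_beta = 20.0  # hypothetical gain of a single reject transistor
    print(darlington_current_gain(weak_beta, weak_beta))  # 440.0
```

In this idealized model, two weak transistors with a gain of 20 each behave like a single device with a gain of 440, which is why pairing up rejects could recover a usable signal.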
Norris and other managers cut their salaries in half to save money. Engineers resorted to swiping transistor companies' free sales samples for their computer.

The CDC 1604 first hit the market in 1960, carrying a price tag of $990,000, or about $10.4 million today. At 0.2 megahertz, it was the most powerful commercially available computer of the time. A supercomputer.

## Supercomputers

"Supercomputer" is a bit of a squishy term. It is about pushing the envelope in computing, bringing out a computer that leads all others in its field. Control Data was not alone in the market for super-fast computers for niche customers.

UNIVAC released the Livermore Automatic Reaction Calculator, or LARC, in 1960 - the same year as the 1604's release. It helped Edward Teller run simulations for the hydrogen bomb and was the most powerful computer in the world from 1960 to 1961.

The LARC scared IBM so much that it built the IBM 7030 Stretch supercomputer, designed by the legendary Gene Amdahl. The 7030 took back the crown of the world's most powerful computer and retained it until 1964.

But the LARC and Stretch were basically made-to-order products. UNIVAC only made two LARC units.

Control Data turned the supercomputer into a category - a commercially successful one, at that. The 1604 sold to the University of Illinois, Lockheed, the State of Israel, and more.

The company began to turn a profit, challenging old computer giants like Sperry Rand and IBM. CDC stock went from $1 to $9. Now Norris had to keep the engineers from selling their stock too early.

## A New Hope

After the 1604, Cray and CDC debated how to proceed.

Should they follow up on the 1604 and finally attack the lucrative business data processing market? That would mean iterating on the 1604 architecture and making smaller computers like the CDC 160-A, a very good and successful control applications computer.

But Cray only wanted to build the fastest possible machine.
The scientific community had only started to realize the potential of a new form of computational modeling - Finite Element Analysis, which I mentioned in a prior video.

Finite Element Analysis involves splitting something into millions of simpler elements and running simulations based on how those elements might act. For instance, breaking a car down into tiny shapes and using them to predict how it might survive a crash.

With Finite Element Analysis, the more elements you can break something into, the better you can model and predict complex systems like the weather or nuclear explosions. This basically implied an infinite need for compute. Cray wanted to be the guy to feed that need for speed.

He threatened to leave the company over this issue, which eventually led Norris to agree to split the two teams. One CDC team would work on a 1604 follow-up.

Meanwhile, the 35-year-old Cray and his team were allowed to open their own lab in Cray's hometown of Chippewa Falls, Wisconsin. The new lab was just a brief stroll from his house. There, Cray and his team worked on a machine some 15 times faster than the 1604, named the 6600.

## The CDC 6600

When Cray and his team sat down to make the CDC 6600, they started off with something like the 1604.

The 1604 was built with ferrite magnetic cores for main memory and magnetic tape for secondary storage. For compute, it used germanium transistors. Everything was built inside air-cooled pluggable building blocks.

But as they worked on the 6600, Cray changed many things. Critically, he sourced silicon planar transistors from Fairchild. They switched far faster than germanium transistors, automatically granting a 5x speed boost.

The rest of the 10-15x speedup goal, though, had to come from somewhere else. Cray soured on the building-block approach.
Each block had extensive back-panel wiring, which not only caused noise issues, but also limited input/output and increased the time it took to transmit data.

So Cray threw it all out and switched to denser, more complex custom modules called "cordwood modules". The shorter wires improved speed, but the modules were also harder to repair, and their density required replacing air cooling with Freon cooling.

Another concept they implemented was parallelism. Every system has to do housekeeping functions and the like in addition to the main compute task. Why should the "main processor" have to do that? Offload it to something else.

The 6600 contained 11 individual computers that could execute programs separately from each other - they only shared a central memory. Ten of them, the peripheral processors, handled secondary work like input/output, leaving the eleventh computer free to do nothing but high-speed math.

Additionally, the 6600 lived and breathed simplicity. A computer CPU uses something called an instruction set architecture, or ISA, to define its basic operations, which in turn defines how software can control it.

The 6600 simplified its ISA, ditching everything unrelated to scientific computing, such as instructions for the kind of bulk data handling geared more towards commercial users. This simplified instruction set let the central processor overlap its work, dispatching independent instructions to ten separate functional units inside the CPU so that several could execute at the same time.

CDC delivered its first 6600 to Lawrence Livermore in 1964. The machine's incredible speed - three times faster than the 7030 - shocked IBM.

Chairman Thomas J. Watson Jr. wrote a scathing memo asking how "34 people - including the janitor" beat the biggest technology company in the world. The answer, of course, was that IBM's architects could not bear to sacrifice compatibility for speed.
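The payoff from overlapping work can be seen with a toy timing model. This sketch assumes an idealized pipeline in which every instruction passes through the same number of equal-length stages with no stalls or dependencies; the workload size and stage count are hypothetical. The real 6600 overlapped work differently, by issuing instructions to multiple functional units coordinated by a scoreboard, so treat this as a generic illustration of the principle rather than a model of that machine.

```python
def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed if each instruction must finish before the next one starts."""
    return num_instructions * num_stages


def overlapped_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles for an idealized pipeline: fill it once, then finish one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    n, k = 1000, 5  # hypothetical workload size and stage count
    print(sequential_cycles(n, k))  # 5000
    print(overlapped_cycles(n, k))  # 1004
    print(round(sequential_cycles(n, k) / overlapped_cycles(n, k), 2))  # 4.98
```

For long instruction streams, the speedup in this model approaches the number of stages, which is why overlap was worth so much design effort.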
## Discontent

The splitting of the teams within Control Data prevented a blowup, but discontent continued to fester.

Norris and other managers continued to build up the business. CDC began making its own peripherals and software to accompany the main computer, building a services business.

Control Data also purchased a consumer finance company, Commercial Credit, intending to use its $3.4 billion of working capital to fund CDC's computer leasing strategy. This strategy, which seemed smart at the time, eventually backfired when Commercial Credit ran into difficulties.

Over a hundred CDC 6600s were sold to big customers like the Atomic Energy Commission. But each cost about $2.4 million, or about $23 million today. As you might expect, that limited the market to about 50 customers in the world.

But Seymour Cray felt this was a feature, not a bug. He loved knowing the first names of each of his customers. Yet Control Data's management was increasingly coming to believe that peripherals and services, not hardware, were the company's future.

Control Data followed up the CDC 6600 with the 7600. The 7600 was hailed as the world's fastest computer, five times faster than the 6600. But despite costing only twice as much as its predecessor, it sold poorly, in part due to frequent breakdowns and a weak economy.

After that came the 8600. This supercomputer was built with ordinary discrete transistors. But Cray wanted a clock cycle time of 8 nanoseconds, which meant every wire had to be shorter than 2.5 meters, squeezing the parts very close together.

Things got so dense that Cray couldn't figure out how to cool them sufficiently. After many months, he decided to throw everything out and start again from scratch. It was his style - the "Cray way".

But this was 1971, and Control Data was in the midst of an expensive antitrust lawsuit against IBM. Cash flow was running low. Cray was asked to cut expenses by 10%.
Unwilling to do that, he cut his own salary to minimum wage, or $1.25 an hour.

This did not solve the issue. In the end, Norris told Cray that a redo like before could not happen - they had already pre-sold two 8600 systems. And in 1972, Cray decided to leave and start his own shop, Cray Research.

## Cray Research & the Cray-1

Seymour Cray founded Cray Research with $2.5 million - 20% of which was his own money - and a bunch of bank loans.

The company's goal was to build the biggest computer, one at a time, like a master artisan. It did not care for big revenues, nor did it expect them. The focus was on research rather than manufacturing. Like Star Trek, plumbing the outer limits of possibility.

In a show of goodwill, Norris and Control Data arranged a luncheon to say goodbye and invested a quarter of a million dollars in the new company. Norris called it "heart money".

For his first computer, the Cray-1, Seymour Cray wanted revolutionary performance. To get it, he turned to a new concept: vector processing.

Most CPUs of the time used scalar processing, meaning they processed single data items, like integers or floating-point numbers, one at a time.

So imagine the job of adding 1 and 1. A scalar CPU would load the first 1 from memory into a register, load the second 1, add them, and then store the result back into memory.

Count it up. This job used 4 instructions. So if we are adding two sets of twenty numbers element by element, that is 80 instructions the scalar CPU has to handle.

A vector processing machine shortcuts that by operating on single-dimension arrays of data: vectors.

So with those two vectors of 20 numbers each, the machine loads the two vectors into registers, adds them, and stores the result vector into memory. That only uses 4 instructions rather than 80.

Control Data knew about vector processing too. They had a small team working on a vector computer called the STAR-100.
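The instruction-count comparison above can be sketched as a toy model. It assumes every load, add, and store counts as exactly one instruction and ignores timing entirely. It also assumes 64-element vector registers, the size the Cray-1 used; arrays longer than that are processed in 64-element chunks (a technique usually called strip-mining). The function names are illustrative only.

```python
import math

VECTOR_LENGTH = 64  # elements per vector register, as on the Cray-1


def scalar_instruction_count(n: int) -> int:
    """Scalar code: load a[i], load b[i], add, store c[i] for every element."""
    return 4 * n


def vector_instruction_count(n: int, vector_length: int = VECTOR_LENGTH) -> int:
    """Vector code: one load/load/add/store group per chunk of up to vector_length elements."""
    return 4 * math.ceil(n / vector_length)


def vector_add(a: list[float], b: list[float],
               vector_length: int = VECTOR_LENGTH) -> tuple[list[float], int]:
    """Add two equal-length lists chunk by chunk, counting idealized vector instructions."""
    result: list[float] = []
    instructions = 0
    for start in range(0, len(a), vector_length):
        va = a[start:start + vector_length]   # one vector load
        vb = b[start:start + vector_length]   # one vector load
        vc = [x + y for x, y in zip(va, vb)]  # one vector add
        result.extend(vc)                     # one vector store
        instructions += 4
    return result, instructions


if __name__ == "__main__":
    ones = [1.0] * 20
    _, count = vector_add(ones, ones)
    print(scalar_instruction_count(20), count)  # 80 4
    print(vector_instruction_count(1000))       # 64
```

Even with strip-mining, adding 1,000-element arrays takes 64 vector instructions in this model instead of 4,000 scalar ones, which is the shortcut the Cray-1 was built around.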
But the STAR was tremendously complicated and failed to live up to its promises. Control Data shipped it four years late and sold only three of them.

Cray studied the STAR-100 and identified its flaws. First, its scalar processing was slow, bottlenecking overall system performance.

Second, the computer's vector processing implementation had a hitch. Recall my example from before. Before you can run the addition operation on the two vectors, you first have to load them both from memory. Same with sending the results back to memory.

The problem was that this was taking too long. Does it matter how much faster vector processing is than scalar processing if moving the vectors around takes forever?

So Cray introduced "vector registers": very fast intermediate storage inside the processor that worked a bit like cache memory to improve speed.

Seymour also decided to adopt integrated circuits for the first time. This allowed for more density and cut down on wiring, allowing the Cray-1 to be far smaller than its predecessors.

By then, ICs were about 14 years old and quite mature, but the choice reflects Cray's approach of using older technologies - "a decade behind", as he liked to say - so that they would be more reliable.

But when it came to memory, Cray could not compromise. He broke his principle and bought bipolar semiconductor memory chips to replace the old core memory. They cost less, were denser, and used less power.

With a clock cycle of 12.5 nanoseconds, the computer was five times faster than the CDC 7600. That cycle time meant every wire in the machine had to be less than four feet long.

And of course, you can't forget its iconic look: a circular shape to accommodate the new cooling scheme, but with an added bit of flair to stand apart from the boring gray boxes of the era.

And it had cushions too.

## The Cray-1's Stir

The Cray-1 made a huge stir upon its release in 1976 with its flashy look and world-beating speed.
A hundred Cray-1s were sold to various government and university lab customers like the National Center for Atmospheric Research and the Department of Defense.

The Cray-1 drove 150% revenue growth for Cray Research from 1978 to 1979, with another 50% growth the year after that. The orders came in so fast - one a month, which is a lot for an $8 million product - that a big backlog developed.

IBM did not even try to compete. Cray's former employer, Control Data, found itself thrown off its feet. It tried, producing vector computers like the Cyber 205, but it had become bloated and complicated, unable to keep up.

The company suffered large financial losses and eventually sold itself off in pieces. By the mid-1980s, CDC's most profitable business was Ticketron - a rival to the widely despised Ticketmaster.

## The Fruits of Success

As I said, Seymour Cray wanted to build the fastest computer, and to build it from a "clean piece of paper".

Even as the Cray-1 was in the late stages of development, Seymour started to shift his gaze towards an even more ambitious machine: the Cray-2, with a clock speed some three to six times faster than its predecessor.

Such a machine had obstacles. At that clock speed, no wire could be longer than 40 centimeters, bringing back the same heating challenges Cray had faced with the 8600.

To his dismay, Seymour could not focus on solving these problems because his business needed him. To fund early development, Cray Research IPO'ed, which brought a whole new load of responsibilities.

And a bit poetically, the company's success caused new headaches. Since each Cray-1 was hand-wired and custom-made like some limited-edition supercar - a process that took a year - the company had no choice but to staff up to deliver on its big backlog.

From 1978 to 1980, the company grew rapidly, from 300 to 500 employees.
At its peak, Cray Research employed over 5,000 people in the tiny town of Chippewa Falls, Wisconsin.

## The Same Dilemma

Seymour originally pursued scientific computing because users wrote their own software. It let him focus purely on hardware.

But times had changed. Customers no longer had the budget to rewrite their software every time. They wanted portability - it is why the Unix operating system became so popular.

So Cray's customers were increasingly more interested in a better Cray-1 than in a radically different Cray-2 that would require them to redo all their software. It was the story of Control Data and the 1604 all over again!

Eventually, Cray Research's management, including CEO John Rollwagen, took the dual approach once more. On one side, they extended the Cray-1 line with the Cray-1S - still a very powerful computer, but not radically different the way the Cray-2 would be.

Meanwhile, Seymour Cray stepped down as chairman in 1981, handing that job to CEO Rollwagen, and became an independent contractor so that he could work on the Cray-2. He moved to Boulder, Colorado, to work in peace once more.

## The Cray X-MP

The Cray-2 eventually did come out in 1985, after three false starts on the cooling system.

Famously, it had a massive liquid immersion cooling system that made the machine resemble an aquarium. Even so, memory latency kept the system from reaching its full potential.

Then, to the surprise of many, Seymour's Cray-2 found itself upstaged by a computer produced by a separate team: the Cray X-MP supercomputer.

The X-MP team was led by long-time Cray collaborator Les Davis as well as a talented young Taiwanese-American designer named Steve Chen.

Where the Cray-1 had a single CPU, the X-MP introduced parallel processing with up to four CPUs, along with a new semiconductor-based solid-state storage device.
Released in 1983, the X-MP was the world's fastest supercomputer - 2-5 times faster than the Cray-1S - without the radical design changes of the Cray-2. People were stunned.

It sold very well compared to the Cray-2. By 1989, only 24 Cray-2 units had been sold, compared to almost 200 units of the X-MP and its immediate successor, the Y-MP.

## New Pressures

There were other changes. In the 1970s, Cray had no competitors in its unique part of the market.

But throughout the 1980s, new supercomputer competitors started to come out of the woodwork. First in Japan, where Fujitsu, Hitachi, and NEC leveraged Japan's growing advantages in VLSI semiconductor production to make compelling supercomputers.

Cray still dominated the market. In 1988, it had 56% market share in traditional supercomputers, but the Japanese together had 37% and were gaining ground.

In a similar vein, supercomputer startups like Thinking Machines and nCUBE began exploring approaches to supercomputing beyond vector computing. The most prominent of these were Massively Parallel Processors, or MPPs.

These systems coordinate many commercial microprocessors to do millions or even billions of floating-point operations each second. Because those microprocessors were bought off the shelf, MPPs offered far better price-performance.

The combined competition from the startups, the Japanese, and even old friends like Control Data's supercomputer spinoff ETA Systems put a lot of pressure on Cray to focus its approach and its product lineup.

## Cray Leaves Cray

Cray Research's unexpected success with the X-MP would come back to haunt it.

Designer Steve Chen was featured as one of the company's exciting young rising talents - the next Seymour Cray, even. But unfortunately, the aggressive vision that Chen and his team had for the MP line after the Y-MP spiraled beyond what the company could financially support.
64 processors, custom integrated circuits, and maybe even optical interconnects. In 1987, the company was already investing in Seymour Cray's next machine, the Cray-3, and in the Y-MP, in addition to three existing products that needed money too. It had no money for Chen's science-fiction dream.

So that year, Cray Research scaled back the MP line of computers. In response, the much-heralded Steve Chen quit Cray Research and started his own company, Supercomputer Systems.

Supercomputer Systems took $150 million of investment from IBM and other investors like Ford and Boeing, but went bankrupt in 1993.

In 1989, the company could no longer accommodate its own founder. Seymour Cray joined a spinoff called Cray Computer Corporation, or CCC, in Colorado to work on the future Cray-3. Cray Research would go onwards with the existing X-MP architecture, building out the ecosystem.

The Cray-3 used gallium arsenide semiconductors for switching performance far faster than what was possible with silicon. This required buying a lot of chipmaking equipment.

The Cray-3 soon fell behind, and in 1991, various customers started cancelling their orders as defense demand cratered after the fall of the Soviet Union.

CCC sold only one system before filing for bankruptcy in 1995. An announced Cray-4 never materialized either. Seymour Cray's cherished approach of "sitting down with a clean sheet of paper" and building a "big iron" supercomputer was no longer financially viable.

Seymour Cray formed a new company, SRC Computers, to begin exploring parallel designs. But he passed away in 1996, at the age of 71, from injuries sustained in a car accident.

Cray Research was eventually sold to Silicon Graphics. It bounced around for a while, but is now part of Hewlett Packard Enterprise as just Cray Inc.
## Conclusion

Advancing semiconductor technologies have made the supercomputers of the past seem comically behind.

In 2010, an electrical engineer named Chris Fenton did a project to emulate the Cray-1A supercomputer on a Xilinx Spartan-3E 1600 development board.

He even put it into a cute little Cray-1 package. It now sits in the Computer History Museum, and it inspired this video.

Today's leading-edge semiconductor makers face the same issues Cray did with his supercomputers. Thermal problems. Interconnect problems. Slowdowns due to memory retrieval. I am struck by the similarities.

Unfortunately, the semiconductor industry cannot do as Seymour did: throw it all out and start anew with a fresh sheet of paper.
