E01: What is the FASTEST Computer Language? 45 Languages Tested!

Video Statistics and Information

Captions
What's the fastest computer language in the world? Find out in Dave's Garage, where today it's time for the latest round of the biggest software drag racing extravaganza yet. We're pitting 42 different computer languages against one another in a no-holds-barred Battle Royale of prime number generating madness with no quarter given and none asked. Who's the fastest and who's the slowest, and where does YOUR favorite language rank relative to the big dogs like C, Rust, and assembly language? It's an episode so compelling that if you're not watching you better be dead or in jail. And if you're in jail, break out! [Intro] Hey, I'm Dave, welcome to my shop! I'm Dave Plummer, a retired operating systems engineer going back to the MS-DOS and Win95 days, and today's going to be a lot of fun. A few months back I had what started out, I think, as a pretty cool idea: take a standard prime sieve and then write it in C, Python, and C# to see how the performance compared, what code written in the different languages actually looks like, and how the languages differ. You should absolutely check out that first episode for the detailed comparison, but after I posted it, something blew up. [Boom] No, not, not like that. I mean it blew up on GitHub, and in a good way. I uploaded my code in those three languages and soon enough, something amazing had happened. People started porting my code to other languages. They improved on my implementations and added many new ones. And now, from Ada to Zig, we have over 100 implementations in some 45 different programming languages, and as we go, we're going to take a brief look at every one of them. You'll get to see the same algorithm implemented in each language so that I can show you the basics of each language: the syntax, data declarations, structure, and simple I/O. It'll be a quick tour where you learn just enough about each language to be dangerous and perhaps entertaining at cocktail parties. That in and of itself is amazing, but it wouldn't be that useful unless there was some way to keep track of all these implementations and even find out how they performed relative to one another. Being new to GitHub collaboration, not to mention busy with writing code and speeches and making these videos, I realized there was no way I was going to be able to do a good job of it, so I turned to you guys, the viewers, for help, hoping someone knowledgeable in the ways of forks and branches would step forward and volunteer. I didn't have to wait long, as right away a fellow named Rolf in the Netherlands offered to take over. I handed him the keys to the kingdom and hoped for the best, and I was not disappointed. Pretty soon the amount of work grew, and Tudor from Romania joined the effort along with Rutger, who is also from the Netherlands. Together they've not only whipped the project into shape, organized it, and made sense of it, but even contributed new implementations. I'm indebted to everyone in the community who pitched in, but it just wouldn't have been possible without the hard work of these three guys especially. You can imagine that with all of these languages this could quickly turn into a giant mess. It's hard enough to get a single project to compile and work on everyone's machine everywhere, but now we're talking about a build process that uses potentially 40+ different languages! I don't know about you, but I sure don't have COBOL or Fortran compilers ready to go on my machine, let alone V or Zig. The other problem is testing them all. 
You'd have to run up to 100 different implementations in those 40 languages, collect the results data, and then somehow put it all together into a useful report of some sort. I'm very happy to say that everything is completely automated, so that a single make command not only builds and benchmarks all 40+ languages but also collects the statistics and tabulates a comparison report at the end. You can literally enlist in the GitHub project and type 'make', and given some time, you'll get a full report of the performance of every language on your personal machine. Thanks to the magic of Docker and the Primes team, there are 40+ complete development environments set up and ready to go. That's some high-level stuff! If you think you can make a prime sieve faster and have been looking for an open source project to dip your toe into, check out the video description for a link to the repository. If you've never worked on an open source project before, this could be a great place to start, and we'd be more than happy to have you as a contributor. I do ask that the primary focus be improving and optimizing the implementations already in the tree before simply adding new ones. For me, there are two very cool and valuable aspects to this software showdown. The first is that we get objective speed comparisons between the languages. If you've wondered how C++ compares to C#, or how Python compares to Ruby, or even Pascal to Ada, we've now got the hard facts, at least when it comes to high-speed memory and bit manipulation. I was particularly looking forward to the Rust results to see how memory protection impacted code like this relative to a raw C implementation. So, which languages are represented? Well, as just a partial list we have ARM and x86 assembly, Ada, BASIC, C, C++, C#, D, Dart, Delphi, F#, Fortran, Go, Haskell, Java, Julia, Lisp, PHP, Pascal, Perl, Python, Ruby, Rust, Scala, Swift, TypeScript, V, and Zig. There are actually quite a few more, and in many cases, we have multiple solutions in each language. That's how we wind up with more than 100 variations in total. We also have single-threaded and multithreaded solutions. In each case the goal is always the same: solve for all the prime numbers up to one million. Do that as many times as possible in five seconds, and then report the average number of passes per second. That number - prime sieve passes per second - becomes the language's performance score in the drag race. It's worth noting that the language itself is in fact the primary factor in the results. For example, Python, being an interpreted language, is going to be necessarily slower than C or Rust. But that doesn't mean it's going to be slow. After all, it's still running dozens or even hundreds of passes per second, and that alone is impressive. But the languages that compile right to fully optimized binary executables can measure their results in thousands of passes per second, so there's really no competition between them if raw performance is the goal. What you can do more fairly is to compare results within language types - BASIC vs Python, for example, or C vs Rust, or C# vs Java. And that's exactly what we'll be doing: popping a small handful of related languages off the stack in each episode so we can compare and contrast them. And it just makes a lot more sense to compare Bash to PowerShell than to, let's say, Rust. 
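To make those race rules concrete, here's a rough sketch in C of what a benchmark driver for this contest boils down to. To be clear, this is not the actual code from the repository - names like run_sieve are mine, and it uses a simplified byte-per-flag sieve rather than the packed-bit implementations we actually race - but it shows the shape of the contest: sieve to one million, repeat for five seconds, report passes per second.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Simplified sieve of Eratosthenes: counts the primes up to 'limit'.
       This sketch uses one byte per flag; the racing implementations pack
       the flags into individual bits. */
    static int run_sieve(long limit)
    {
        char *not_prime = calloc(limit + 1, 1);
        int count = 0;

        for (long factor = 2; factor * factor <= limit; factor++)
            if (!not_prime[factor])
                for (long multiple = factor * factor; multiple <= limit; multiple += factor)
                    not_prime[multiple] = 1;

        for (long i = 2; i <= limit; i++)
            if (!not_prime[i])
                count++;

        free(not_prime);
        return count;
    }

    int main(void)
    {
        const long   limit    = 1000000;   /* primes up to one million */
        const double duration = 5.0;       /* race for five seconds    */
        int          passes   = 0;

        clock_t start = clock();
        while ((double)(clock() - start) / CLOCKS_PER_SEC < duration)
        {
            run_sieve(limit);
            passes++;
        }

        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("Passes: %d, Time: %.2f s, Passes/sec: %.2f\n",
               passes, elapsed, passes / elapsed);
        return 0;
    }

Every implementation in the tree follows that same contract, which is what makes the passes-per-second numbers comparable across languages.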
Back in college, one of my very favorite classes was Computer Science 405 with Dr. Yang. Though formally a languages class, for me it was really an excuse to play with a wide array of different languages all within a single semester. From Ada to LISP to Prolog and more, Dr. Yang introduced each language to us in his inimitable style and then assigned a programming task in it so that we were forced to become at least somewhat proficient with each language. Since then, I've spent a lifetime in operating systems, but it's been entirely in assembly language, C, and C++. While I'm also recreationally fluent in C# and dabble in some Python, that's about it. As a result, the opportunity to tour more than 40 different languages was simply too good to pass up. It would be like CS405 all over again but turned up to 11, then on steroids. Like the CS405 Turbo Plus 3D Max Professional edition. While I was genuinely curious about which language in each category would be fastest, I was even more excited about seeing the code for my sieve implementation ported to all these different, and sometimes exotic, languages. Seeing the same algorithm ported to a new language should be far more instructive than reading code where I don't even know what it's trying to accomplish. And my goal is to share whatever I can learn about each language along the way with you. To that end, we're going to break the languages into groups and then run the benchmarks between them. Then we'll go through the code and see how it's done in each language, and I'll point out the interesting differences and the things that stand out to me about each. Think of it as a little sampler of up to 40 languages, many of which I'd wager you'd otherwise only hear of but never get to see. Now you'll know just enough about each of them to be dangerous and to appear clever at cocktail parties. It's worth noting that I'm just an old C programmer. I'm not an academic, and I'm certainly not someone like Dr. Yang who can offer you intelligent comparative analysis between languages. But I can show you how to do things like allocate a packed bit array and a dictionary and run a loop in each language, and even seemingly mundane things like if/then/else control and how to output your results to the console can be worthwhile knowledge. So, grab a helmet and a cup of coffee and hang on as I take a look at the first group of languages: Pascal, Delphi, and Ada. Delphi and Ada are both based on Pascal, and we'll see how they're similar, how they differ, and which is fastest once we benchmark their implementations later in this episode. And before you even think about bailing because you don't use those languages, remember that's half the point: getting to see how the stuff you never touch works! Fortunately for us, the process has been completely automated by the GitHub Primes team. You simply enlist in the project, cd into the folder, and run make. From there it's all automatic. All 100 or so implementations will be built and executed, and the results tabulated into a report automatically. The first run can be slow, as each language comes with its own Docker container that represents the build environment for that language. But that's the genius of how they've set it up, because it means you don't have to install 50 different compilers; you simply need a working Docker installation and a few basic tools like Git. Let's start our tour with the Pascal implementation. Pascal is an elegant procedural language that lends itself to good programming practices such as data structures and structured programming. 
It was invented by Niklaus Wirth. Americans will often call him Nicholas Worth, and so he's been known to joke that Americans call him by value whereas Europeans call him by name. Well, if you've got a better Pascal joke, I'd love to hear it, so please leave it in the comments! Pascal was largely intended as an educational language, but good compilers and tools back in the 90s made it popular among computer enthusiasts and even a few professional developers. Most notable among these was Borland's Turbo Pascal, which was written largely in assembly language by Anders Hejlsberg [Haylsberg]. In the "small world" department, Anders later went on to design C#, and is currently the lead architect of that language at Microsoft. [Pascal Code] Let's dive right into the Pascal code to get things started. The first thing we see is that Pascal supports classes, and there is a PrimeSieve class with the three main members that we've come to expect: RunSieve, CountPrimes, and ValidateResults. As you might expect, RunSieve does the prime number calculations, CountPrimes determines how many primes were found, and ValidateResults compares the results to known values to make sure it's all working properly. In case you've never seen Pascal before, one thing that will look different right up front is the assignment operator. Where C uses a single equal sign for assignment and a double equal sign for equality, Pascal uses the single equal sign for equality testing and then adds the colon-equals operator for assignment. Whenever you see the colon-equals operator, something is being assigned to something else. When you see the equals sign by itself, something is being tested for equality. The one concern I have right away with the Pascal version is that it's using a 32-bit integer for the index into the map of primes. That means it's limited to 32-bit values. My initial implementation supported 64 bits, so while this isn't a problem for the benchmark testing, as it only runs up to one million, it is a broader limitation that you should be aware of. My original sieve can handle up to 100 billion, but this implementation can't go past about 4 billion. As always, our prime sieve needs to keep a bitmap of which numbers are prime or not. Thus, if we're testing up to one million, the program keeps a million bits in a row to indicate which numbers are or are not prime. That array of bits is named the NotPrimeArray in this case. The declaration of the PackedBoolArray is interesting - it turns out that Pascal natively supports packed arrays. Normally, if you had an array of 8-bit or even 1-bit values, they would be stored in words for faster memory access. That does improve performance but wastes a great deal of memory space. A packed array squeezes all the data together efficiently into an array of raw bits within bytes. It's slower to access but much more efficient on space. In some languages, like classic C, we would need to do this work ourselves, but apparently, Pascal does it for us. The RunSieve function will return that packed bit array to us for results and validation. RunSieve itself has three main variables - the factor by which we are stepping through the array to eliminate primes, the number we use to step through the array while marking off those non-primes, and the square root of the sieve size, which is the upper bound we need to work up to. As you can see, Pascal is very readable. The loops are obvious, and it uses BEGIN and END keywords, rather than punctuation or indentation, to delineate them. 
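Speaking of doing that packing work ourselves in classic C: a hand-rolled packed bit array usually ends up looking something like the sketch below. Again, this is just an illustration of the technique with names I made up, not code lifted from any of the repository's implementations.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* A hand-rolled packed bit array: eight flags per byte.  This is roughly
       the bookkeeping that Pascal's packed array of booleans performs for
       you automatically. */
    typedef struct
    {
        uint8_t *bits;
        size_t   size;
    } BitArray;

    static BitArray bitarray_create(size_t size)
    {
        BitArray ba = { calloc((size + 7) / 8, 1), size };
        return ba;
    }

    static void bitarray_set(BitArray *ba, size_t index)
    {
        ba->bits[index / 8] |= (uint8_t)(1u << (index % 8));
    }

    static int bitarray_get(const BitArray *ba, size_t index)
    {
        return (ba->bits[index / 8] >> (index % 8)) & 1;
    }

    int main(void)
    {
        BitArray not_prime = bitarray_create(1000000);
        bitarray_set(&not_prime, 4);                 /* mark 4 as not prime */
        printf("4 marked: %d, 5 marked: %d\n",
               bitarray_get(&not_prime, 4),
               bitarray_get(&not_prime, 5));
        free(not_prime.bits);
        return 0;
    }

The divide-by-eight and shift work is exactly the overhead that makes packed storage a little slower to access, which is the trade-off mentioned above: less memory, more work per access.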
Looking at the CountPrimes function, there are a couple of interesting things to note. You can see that the return type of a function, in this case an Integer, follows the function name in the declaration, and that variables are given their own declaration section above the code block. Taking a quick look at the For loop, we can see the use of the LOW and HIGH operators. In Pascal you have the option of defining the range of your array, by which I mean where the array indices start and end. Whereas in C arrays always start at index 0 and go up from there, in Pascal they can start anywhere, even at a negative index. Let's say you had an array of printable characters. If you made the index into the array be the ASCII code of the character, you could start the array at 32, the first printable character, and extend it up to 126, the last printable character. Any references to characters outside this range could be caught by the compiler or at runtime. The Low and High operators tell you, the programmer, where any given array starts and ends. The ValidateResults function returns a simple true/false value to indicate whether or not the results can be confirmed. The program contains a dictionary map of how many primes can be found up to one hundred, one thousand, one million, and so on. If your results match the known values, the function returns true. Based on the way this is written, it appears that Pascal doesn't support static declarations of dictionaries. Instead, an empty dictionary is created dynamically and each of the value pairs is added programmatically, one by one. Finally, let's look at where the Pascal version prints its results. There appear to be two functions of interest, Write and WriteLn, which differ by whether or not they automatically add a newline to the output. We can also see how the DurationTickCount variable's output is formatted: in Pascal, you can provide the total field width, which is given as 4 in this case, and the decimal part width, which is specified here as 2. Whereas languages like C# and Java use bytecodes or an intermediate language, the Pascal code will compile and link down to a native binary. Moving on now to Delphi, I should explain that Delphi is technically a development environment, not a language per se. The underlying language is perhaps most accurately described as Object Pascal. The first versions of Delphi actually evolved from Borland's Turbo Pascal, but it was decided that the object-oriented extensions that had been added to the Pascal language up to that point were not ideal, so they effectively started fresh and derived the language more from Apple's Object Pascal. Still, if I called it Object Pascal, I'd have to explain that I really meant Delphi, so we'll just call it Delphi for simplicity. The Delphi code looks quite different, and there are two reasons for that. The first is that the object handling is different in Delphi than in classic Pascal, and the other is that each version was written by a different author in their own style. In my original implementations I tried to keep them as similar as possible between languages, but in a few cases the authors took greater style liberties. If we can still learn from looking at the code, though, that's the main thing. Let's start with some simple things: the prime counting and validation. We can see that CountPrimes and ValidateResults are both defined as member functions. The former returns an integer, the latter a Boolean. Otherwise, the functions are quite straightforward. 
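Since that validation step has now come up in both the Pascal and Delphi versions, here's the general idea sketched in C: a static table of the well-known prime counts and a simple lookup. The helper name and the exact shape of the table are my own; each implementation in the repository keeps an equivalent map in its own language, as described above.

    #include <stdio.h>

    /* Known prime counts for common sieve sizes, kept as a static lookup
       table so validation is a simple search.  These are the standard
       prime-counting values, e.g. 78498 primes below one million. */
    static const struct { long limit; long count; } known_counts[] = {
        {      10,     4 },
        {     100,    25 },
        {    1000,   168 },
        {   10000,  1229 },
        {  100000,  9592 },
        { 1000000, 78498 },
    };

    static int validate_results(long limit, long primes_found)
    {
        for (size_t i = 0; i < sizeof known_counts / sizeof known_counts[0]; i++)
            if (known_counts[i].limit == limit)
                return known_counts[i].count == primes_found;
        return 0;   /* unknown sieve size: cannot validate */
    }

    int main(void)
    {
        printf("Valid: %d\n", validate_results(1000000, 78498));   /* prints 1 */
        return 0;
    }

That check is what keeps the drag race honest: an implementation only gets a score if the number of primes it finds matches the known value for its sieve size.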
The main loop is also similar to what we saw with Pascal. One thing that stands out is that the time math here shows that Delphi contains the notion of a timespan, and that durations can be converted to units like total seconds. This is likely more a function of the runtime library than the language itself, but it's quite reminiscent of the .NET Framework. Rather than checking for null, which in C means "no pointer value assigned", Delphi has an explicit operator called Assigned to let you know whether or not a pointer has been set. I'd venture to say that in C, using i++ or even i=i+1 has always felt quite natural. In Delphi, however, you have explicit increment and decrement operators. They accept an optional "how much to increment" value and assume one if it is not provided. At the bottom of the PrintResults function we can see the use of a Format function that appears to provide C-style printf formatting for strings and variables. While I imagine the Pascal mechanism of total and decimal field widths still works in Delphi, you might prefer the expressiveness of this printf mechanism. The RunSieve function again shows the use of the increment operator, but this time with a variable increment amount. Pulling back in scope for a moment to look at the class declaration itself, we can see that at least in this case, all data is private to the class, but there are both public and private functions. Delphi appears to support virtual functions as well, at least judging by the presence of the override keyword. The bit array is declared as an array of the type ByteBool. That means each flag is actually taking up a full byte, rather than a single bit. There are also WordBool and LongBool types, but they just waste even more space. Because the Delphi version uses 8 bits per prime, in my mind, it's not authentic to the original, and so its score cannot be counted. Add in the further complication of it needing a commercial license for the environment, of which we have only one, and you can see why we generally omit its performance numbers from our test matrix. Since Delphi is just Object Pascal, its performance is very similar to that of Pascal itself when not doing object-oriented programming. We finally turn our attention to Ada, which you won't be surprised to find out is another declarative and imperative language just like its progenitor, Pascal. Ada has built-in language support for interface contracts, extremely strong typing, explicit concurrency, tasks, synchronous message passing, protected objects, and non-determinism. Ada is very popular with the military, and if you're going to write the software for a nuclear ICBM, odds are you'll be doing it in Ada. Perhaps one of the reasons is that Ada is particularly adept at catching things at compile time, rather than runtime. Because after all, runtime is a terrible time to find bugs in your ICBM code. I've long had a fascination with Ada, but have never really done more than dabble with it. The time I did spend with the language convinced me that, by and large, if your code compiles in Ada, it's got a very good chance of working. Ada can catch things like invalid parameters, range violations, invalid references, and mismatched types, all at compile time. Because concurrency support is actually built into the language, the compiler can even recognize certain potential deadlock situations. 
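One quick aside before we look at the Ada code itself: it's easy to quantify why that ByteBool array in the Delphi version bothers me. Here's a trivial back-of-the-envelope in C, just to show the eight-to-one difference in memory footprint between a byte per flag and a packed bit per flag.

    #include <stdio.h>

    /* Rough memory cost of the flag array for a one-million sieve:
       ByteBool-style storage spends a whole byte per flag, while a
       packed bit array needs only one bit per flag. */
    int main(void)
    {
        const long sieve_size = 1000000;

        long byte_per_flag_bytes = sieve_size;            /* 1 byte per flag */
        long packed_bit_bytes    = (sieve_size + 7) / 8;  /* 1 bit per flag  */

        printf("Byte per flag: %ld bytes\n", byte_per_flag_bytes);  /* 1000000 */
        printf("Packed bits:   %ld bytes\n", packed_bit_bytes);     /*  125000 */
        return 0;
    }
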
Dipping our toes into the Ada code by starting with CountPrimes, we see that the code is written to use a 64-bit quantity, a long long. The loop and if-then syntax are unremarkable, and for most intents and purposes, this stuff could be Pascal. The output is similarly straightforward. We can see that there are two ways of sending output to the console: Put and Put_Line, the latter of which adds the newline. One novelty is that the string representation of a numeric variable is obtained by looking at its Image attribute. Before the apostrophe comes the type, and the instance, if needed, is passed in parentheses. Finally, we look at Ada's RunSieve implementation, which to me reads clearly enough. It's perhaps reassuring that, at least at the level of a prime sieve, the language looks quite natural. It's only when you start delving into topics like concurrency that the language really starts to get complicated, and we don't need that for our single-instance sieve implementation. Well, we've had our quick peek at the implementations, and that means it's time to race them. As noted, Delphi will be represented by Pascal, so it's a heads-up Ada vs Pascal showdown. Before we do any racing, though, it's likely a good time to let you in on the number one rule of software drag racing club. And that is that if you don't like the results your favorite language is getting, you don't whine online, you improve the code. You get out your editor and head on over to the GitHub repo and you hack away until your code is faster than my code. And while the individual races will be determined by the state of the code as it is today, the official winner will be decided in the grand finale episode with the code as it sits then. That way you can improve anything you're not happy with and achieve redemption for PowerShell, or whatever it is you're really, really passionate about. With that out of the way, let's hit the track. Primes to one million, as many times as possible for five seconds, head to head: Ada vs Pascal. Ooh, that's gotta hurt for the Ada guys. Especially because I believe Ada is now compiled with GCC, which I thought might give it a leg up over whatever older tech the Pascal compiler is probably based on. The languages are similar enough at this level of coding that odds are this one comes down to algorithm implementation, so if you're an Ada fan and aficionado, you know what to do. And heck, if you're a Pascal guru, it could be faster as well, I bet! To put these results in some larger perspective, the world record score recorded on that machine on that day was 7301 passes per second. The lowest score turned in by any language that day was one pass every 294 seconds - a spread of roughly two million to one. Clearly that's a huge range of results, and it's part of what's going to make this tour so interesting as we keep revealing languages. For now, both Ada and Pascal can be found on the top half of the bottom half of our leaderboard. We're just scratching the surface with these first languages. We've yet to even talk about scripting languages, shell languages, functional languages, notationals, and many more types. We'll race and explore the highest-performance languages such as Rust, assembly, and C while also looking at exotics like F#, oddities like LISP and Prolog, and even some comical matchups like the PowerShell vs Bash showdown. But unless you take the time to subscribe to the channel and turn the bell on, you might never know about them. Which makes right now a really, really good time to do it. Because later means no. If you enjoyed this episode, please find that like button in the toolbar and smash it real good, or whatever the kids are doing these days. I don't even know anymore. 
But I can count, and if lots of people like and subscribe, it makes both me and the algorithm happy. So, click on it just to prove to yourself it really does turn blue. Thanks for joining me out here in the shop! In the meantime, and in between time, I hope to see you next time, right here in Dave's Garage!
Info
Channel: Dave's Garage
Views: 128,316
Rating: 4.8496599 out of 5
Keywords: fastest computer language, programming languages, top programming languages, software drag racing, coding languages, learn programming, code racing, best programming language, learn to code, learn to program, daves garage, ARM ASM, X86 ASM, Ada, BASIC, Bash, C++, C#, Dart, Delphi, F#, Fortran, Go, Haskel, Java, Julia, Lisp, Lua, Node, Nim, OCaml, Octave, PHP, Pascal, Perl, Powershell, Python, Ruby, Rust, Scala, SQL, Swift, TypeScript, Zig, for beginners, learning programming, 2021
Id: tQtFdsEcK_s
Length: 22min 27sec (1347 seconds)
Published: Sat Jul 03 2021