[EPILEPSY WARNING] How fast should an unoptimized terminal run?

Reddit Comments

Although I strongly agree with you that people often unnecessarily overcomplicate things, I think std::vector, or the STL as a whole, is a very bad example. The main reason to prefer C++ over C is that you don't have to deal with malloc/free and you have a generic standard library at your disposal. And this makes code a lot simpler.

In the early days of the new Windows Terminal, I watched a talk where a developer said that the terminal does not just get the output text from the program and render it to the screen. There is a complicated process that has to happen before the text is rendered.

👍︎︎ 2 👤︎︎ u/essanto 📅︎︎ Jul 04 2021 🗫︎ replies
Captions
Hello everyone, and welcome to a little terminal rendering aside. Normally this is Handmade Hero, but because last weekend was a heatwave, I had promised to do some terminal demo stuff. I did, but it was silent, in the middle of the night, and only about five or ten minutes long, so I'm going to do the actual terminal demonstration with talking today.

Just to give people some background: for reasons that are totally not worth going into, there was a lot of the standard nonsense developer excuse-factory stuff going on with regards to monospace text rendering. Generally speaking, one of the problems that's endemic in development today is that instead of people just acknowledging the fact that pretty much 100% of software is either 100 or 1,000 times slower than it should be, they make all these excuses as to why that is correct. There are so many that I don't even know what they all are off the top of my head. Some of them are not really excuses; they're potentially even reasonable explanations, like "we don't care about that, we just want fast time to market; we know we're shipping bad-quality software, that's just not our goal." That's pretty reasonable as an excuse, because you're acknowledging the fact that the software is bad, but you're overtly saying you don't care. There's a lot to that: doing a bad job on something but acknowledging it is big, because then at least it's intentional.

But most of the excuses you hear are not even really excuses; they're factually incorrect. They're like, "if it wasn't one thousand times slower, it would be too hard to maintain," which is crazy, because I've seen the code bases where people say that, and actually the fast version
would have been much easier to maintain, because it would have been a lot less code. But whatever. Or there's "oh, this problem is really hard, that's why it's so slow." That's almost never actually the case. There are hard problems that computers do struggle with, but that's almost never what it is. So there are just all these excuses, and this was another case where there was a lot of excuse making as to why terminal rendering was slow.

Now, terminal rendering is a problem that's been around for more than half a century at this point, I would wager. I don't know when the first terminal display came into being, but terminal displays have been with us since the very beginning, really. In a sense, a terminal is just an extension of a line printer: the very earliest computers just printed stuff out onto paper, and a terminal was an innovation on that. As far as I know (and again, I'm not a historian of computers), we went to terminals as being better than paper because you're not wasting paper to see the output of your computer, and it can go much faster because you don't have to physically print things. A lot of computing grew up around that idea, and so even today we have terminals.

What this was basically about is that I am working on a course that's coming out called Star Code Galaxy, and one of the first things you do is start in text mode, because part of the idea of the course is that you should follow the steps of how people originally learned to program. A lot of the very best programmers went through a certain path, and the course tries to mimic that path. One of the things I noticed is that actually trying to output things to the terminal, which I don't normally do (I don't use terminals very much at all myself), outputting
through the terminal is very slow. Very, very, very slow. Just to give you an idea of how slow it is, I'll show you some demos today on the actual Windows Terminal. This is actually the best case: this is the Windows Terminal that you have to download from the Windows Store, which is faster than the one that comes with Windows. I don't really know why it isn't the default; maybe for backwards compatibility they don't want to force people to use this one. I'm not sure. Anyway, I noticed it was very, very slow compared to what it should be, so I posted a bug report, and the excuse factory kind of rolled from there.

What I wanted to do was show what you can do with not very much time and not very much code. I wrote what I'm going to show you in just a couple of days, and the code is very small: it's 3,000 lines of code, and most of that code is not even relevant to the problem; it's just things like wrangling DirectWrite. With not very much code you can take pretty much any slow terminal rendering system and turn it into a very fast terminal rendering system. I think that was a lot of the confusion: people think that speed somehow only comes from redoing everything. It's true that for the very fastest thing you probably have to do that, but just using some sensible code and caching, you can turn a slow terminal renderer into a fast terminal renderer. It's quite easy, and that's what I wanted to demonstrate and show the results of today.

Again, nothing here is even worth demoing, in my opinion. This is all stuff that in general should just be obvious and simple knowledge. There's nothing new that you're going to see today; it's all old hat. Any rendering person, like a game engine rendering person, who came and looked at this would probably not even think it's good. They'd probably say everything I'm showing is kind of garbage anyway, and they
could do a much better job, and that's true. I just want to emphasize that before we start. All right, that's the preamble; I don't want to belabor it too much.

So let's take a look at what I'm talking about when I say that the performance isn't very good in current terminals. I wrote a program called splat, and all splat does is feed stuff to standard out as fast as possible. You can give it a file and say "just dump this to standard out," and then you can test how long your terminal takes to process that. So what I'm going to do is dump a one-gigabyte file of text out to Windows Terminal. This is what that looks like.

One of the things about this that's kind of surprising is that a gigabyte of data is almost no data. Those of you who work with data for a living know that a gigabyte is laughably small, and nobody would care about it. Computers deal with that all the time; they routinely have more core memory than that. A computer with 16 gigabytes of core memory today is de rigueur; no one would think that was weird. And yet, for a single gigabyte of data, this is what it looks like, and it takes so long to wait for this that I'm just going to do the terminal demo while we wait for it to finish. I'll put that over here, and we'll look at refterm, which is the terminal emulator that I made. It's not really a terminal; it's just a reference rasterizer.

So, first of all, this is a regular terminal display. This is the one that I made, and it's just designed for testing what the efficiency of a terminal should be, so we have some solid numbers. It supports most things you would expect. For example, if you want it to do line wrapping, it will do it, and that line wrapping is dynamic. There's nothing up my sleeve here; it'll just do dynamic
wrapping. There are some things it doesn't do; like I said, I haven't really spent any time on this for the most part. Right now it doesn't leave a margin, so for the last character it puts there, it should probably wrap one character early, because you want to see the whole character. There are a bunch of things like that. This code is GPLv2 and on GitHub, so if anyone wants to turn this into something real, they're welcome to do so and fix some of these little things, and I'll point out what they are as we go.

But for the most part it does everything you would expect. It can do animation like blinking characters, which I don't even think Windows Terminal handles, but that's beside the point. It also has basic stuff: if that font was too big, you can change the font size dynamically, and you can change which font you're using dynamically. So there's nothing hard-coded or weird up my sleeve here. I'm not playing any games or tricks; this is designed to test the full pipeline of a terminal.

Furthermore, I wanted to try to handle everything, so I could prove that this wasn't fast because it wasn't handling a lot of special cases. That was an excuse I heard a lot. Windows Terminal doesn't really even support Arabic correctly, so I put in enough to make sure that if someone wanted to support Arabic correctly, and stuff like that, it could. For example (I don't even remember what this is called; yeah, there you go), if you dump Arabic text to it, it will actually do it in correct right-to-left order, and it will do the correct glyph combinations, as you can see here. This works with most Arabic. I really want someone who knows Arabic well to help me with this; some people on Twitter were nice enough to give me these
samples, and we'll check it out. But basically what you can see here is that not only does it support Arabic properly, it supports it even when the size of the Arabic would not be evenly monospaced. It just allows these characters to be however big the glyph wants to be. So if your Arabic fallback is essentially a proportional font, it still just works with this renderer. In fact, proportional fonts just work with this renderer: it'll do the proportional part that you ask for, and then it'll render on the next cell. Unicode combining characters, right-to-left, it handles all that stuff. You can see it wraps them just like everything else. So all of that is what you would expect.

You can also see the frame rate up at the top. This is measuring the actual frames per second that the terminal is getting. Vsync is turned off, so it's allowed to update as fast as it can, and what you can see is that the frame rate is literally running at something like 7,000 frames a second. This is what I was trying to explain to people, and I wanted to demonstrate that I'm not making these numbers up. When I said these things should run at thousands of frames a second, I was not joking. Thousands of frames a second is how fast a terminal display runs nowadays; if your terminal isn't running at thousands of frames per second, then something's probably wrong.

Now, that's not to say that the terminal should always run at thousands of frames a second. As you'll see, this terminal itself won't run at thousands of frames a second when it detects that there are very large amounts of input coming over the pipe. What it will do instead is focus all of its effort on trying to process that input, so the frame rate will drop down to something like 30 frames a second while it's processing all that input, and then as
soon as it's done, it will pop back. That brings up another point: the only reason that happens is that, again, I wanted to do the worst-case scenario, effectively. So this isn't multi-threaded. None of the performance that you're seeing here comes from actually using the computer; the 7,000 frames a second is the single-thread performance of a terminal renderer, because terminals just don't require that.

So, we finally finished. Here's how long it takes Windows Terminal to output a one-gigabyte file: it's 330 seconds. I don't know what else to say. 330 seconds. You can see the gigabytes per second of throughput that you're getting here. We should probably run my memory bandwidth tester on this machine; I've never run it on the streaming machine (well, maybe I have, but I forgot what it was). I don't know what the bandwidth is, but it's probably at least 10 gigabytes a second, and might be as high as 20.
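To put the 330-second figure in perspective, here is a rough sketch of the arithmetic. It assumes a decimal gigabyte (the exact file size is not stated in the talk) and uses the speaker's guess of 10 gigabytes a second of memory bandwidth as the reference point. The function and variable names are made up for this illustration.

```python
import math

GIGABYTE = 1_000_000_000  # assumption: decimal gigabyte; the exact file size is not given


def throughput_bytes_per_sec(num_bytes: int, seconds: float) -> float:
    """Average throughput for moving num_bytes in the given wall-clock time."""
    return num_bytes / seconds


def orders_of_magnitude_slower(measured: float, reference: float) -> float:
    """How many powers of ten separate the measured rate from the reference rate."""
    return math.log10(reference / measured)


# Windows Terminal: one gigabyte in 330 seconds (figure from the talk).
measured = throughput_bytes_per_sec(GIGABYTE, 330.0)

# Assumption: the speaker's lower-bound guess for memory bandwidth, not a measurement.
assumed_bandwidth = 10 * GIGABYTE

print(f"{measured / 1e6:.2f} MB/s")  # prints 3.03 MB/s
print(f"{orders_of_magnitude_slower(measured, assumed_bandwidth):.1f} orders of magnitude")  # prints 3.5
```

Swapping in a smaller reference rate changes the gap only slightly, since each factor of two in the reference moves the result by about 0.3 orders of magnitude.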
I don't remember what this machine has. But if you assume that this machine is capable of at least 10 gigabytes a second of memory bandwidth, it's got to read and write the data, so you figure you're going to get half of that even at peak; maybe five gigabytes a second. This isn't just an order of magnitude away from that; it's three orders of magnitude away from that. An order of magnitude away might make some sense, because I don't expect people making a terminal program to necessarily spend all their time hyper-optimizing it. It would be nice if they did, especially at a company like Microsoft that has literally billions of dollars and tens or hundreds of thousands of employees; it would be nice if someone spent some time actually getting the throughput of their terminal up. But like I said, the main thing to point out here is that even without really doing any work, with just the minimum effort of a couple of days, you can get a terminal that has all of the support you want on the rendering side, and it doesn't have to be this slow.

So let's take a look at what happens now, because there's a separate source of slowness, and I want to show you that separate part and talk a little bit about why it's there. Suppose that, without doing anything special, I try to dump that same file from our terminal. What you'll see is a little bit surprising, and one thing is very suspicious. I just told you that this terminal will focus on ingressing input when it sees it and won't spend time rendering. But notice the frame rate here: that is not a good number, because it's very close to the same number that we were getting when we weren't
doing any of this. It's very close to the frames per second that the terminal was getting when we were not streaming any data to it. And you can see here that we get about 10x faster: they were 330 seconds, we're 39, so we're about 10 times faster. Which is great, and you might be like "woo-hoo, it's 10 times faster," but I just said we know that we have two orders of magnitude left to get here. So the question is: what's going on?

Now, I should mention, as the icing on this particular cake of how ridiculous some of this excuse making gets: I didn't optimize any of this code. I have never taken a performance measurement of this code of any kind. I've never run a profiler on it. I've never optimized any code path at all. This is literally just the terminal as I typed it in, compiled in release mode. So you might say, "well, since Casey didn't optimize any of the code, maybe that's as fast as it runs." But some folks on the Molly Rocket Discord were looking at this problem after I posted about how slow the terminal was. They started looking into it, and they found some things. So I happened to know, going into this example project, that the Windows console subsystem is really bad.

The Windows kernel is actually pretty good, in my opinion. I'm not an expert on kernels, so I couldn't say for sure how good it is in some absolute sense, but you don't normally expect the kernel part to have really slow performance. In the case of conio, though, when you're sending things like standard in and standard out back and forth, Windows is actually terrible. The reason is that rather than just doing direct piping through to the terminal, like it would if you used a regular pipe, you go through this subsystem, which is not written by the kernel team, that does a bunch of stuff to maintain things like the layout of the screen so that people can call functions to read back from the
console, and stuff like this. Now, none of those functions are difficult, and it shouldn't really be that slow to do that. Probably it's a little slower: just sending data over a pipe is obviously much faster than actually having to look at the data for things like VT codes, and you've got an extra write in the mix because you've got to write to that terminal buffer. So it is more expensive, but it's not that much more expensive. It's not 100x more expensive, or potentially 1,000x more expensive, depending on what we're looking at here. So that subsystem is actually the bottleneck when I do this splat. The reason we get 40 seconds is not actually because my code is slow. My code is slow, but it's just not slow enough to be as slow as the Windows console subsystem.

So what I wanted to do was also benchmark that: how fast should we expect naive, unoptimized code, which is what this terminal is, to ingress input? The trick to answering that is bypassing the terminal. Here's how I did that. I introduced this concept called fast pipes, and fast pipes are just a way of bypassing the Windows console layer, because the Windows console layer is really bad. The way they work is I made this .h file called fastpipe.h. If you include it normally, it won't do anything; you can see here it just returns zero as a macro. So it doesn't do anything if you include it on Linux, for example, because I don't know if Linux has this problem; I doubt it does. If you include it on Windows, however, it will look for a secret back-channel pipe. This pipe is just called "fast pipe" with the process ID of the current process stuck in there. If it finds that pipe, it will redirect standard in and standard out
to go across that named pipe, bypassing the Windows conio stuff. It'll still go through the kernel, so we can test how fast the Windows kernel actually is at transferring this data and get a realistic estimate of what it takes to round-trip everything, but we bypass the conio stuff, so we can see just how bad it actually is.

In refterm (you can always type "status", by the way, and it shows you what's going on), if you type "fast pipe", it'll turn that on: it'll create that pipe before it launches the process. It launches the process suspended and creates the fast pipe with that name. The whole reason I'm using the splat program that I made, rather than type or cat or something, is that it has that fast pipe header file. type and cat wouldn't get any faster, because they don't have that header file; splat does. You would assume splat is roughly the same speed as something like cat or type if you don't turn fast pipes on, but if you turn fast pipes on, it can be drastically faster, because it can bypass the Windows console layer.

So it took 300-something seconds in Windows Terminal, 330 I think, and it took about 40 seconds with fast pipes off in our terminal. Now let's go ahead and do gig.text with fast pipes on. This is what that looks like, and you can see that it drops by another order of magnitude. By bypassing the Windows conio stuff, which is an epic disaster, it gets 10x faster. That's a huge performance penalty to go through that layer, and it slows all terminals down, even ones that weren't slow for other reasons.

Now, there are even worse scenarios than this. That is with VT parsing turned off. Not in my terminal; my terminal is doing the VT parsing. That's with VT parsing turned off in Windows conio. If you turn VT parsing on, it probably gets even worse. I haven't done that
test, but I suspect that if it has to actually do VT parsing, which you have to call SetConsoleMode to enable, then you're in even more trouble. So best-case scenario, it's 10x slower; that's when there's no color information getting sent down, or cursor positioning. All of that's turned off in splat. I should probably make a thing that turns it on so we can see how bad it is, but either way.

So that's pretty much it. Like I said, there's not much to say here; this is just kind of obvious stuff. I felt like it had to be done at least once, because the excuse making is so absurd these days that it would be nice to take something that people were saying was impossible ("there's no way you could write maintainable code that does this," or "it would be this big research project"; it was called a PhD project by one of the Microsoft developers) and just do it. What is literally being done on the screen right now took me a couple of days. So I just wanted to show that all of that is nonsense.

In case you're wondering how complex the code is, let me show you basically the entirety of the code that you actually need to support this rendering. This is it. It's called the glyph table; the file's called glyph cache, and it's just a glyph table. It's basically these functions that you see here: initialize direct glyph table, get footprint, place in memory, unpack, find, update, and then a stats thing if you want to know how it did. That's the entirety of the code. All it is is a least-recently-used cache that takes a strong hash value. In this case I use a modified pelican hash, basically, but you can feed it whatever hash you want. All it does is say: tell me how many glyphs you want to map directly, which is, you know, ASCII. You probably want to map those directly, because there's a
small number of them and they're the most heavily used thing in a terminal, because most code is written in ASCII. So it directly maps those, so that when you get an ASCII character you can go directly to a cache slot. Then it just has a 128-bit hash lookup, least recently used, for everything else. If you get a big run of Arabic character-combination stuff that's 20 characters long, you just hash those 20 characters and send the hash to this thing, and it says either "oh yeah, we've produced that glyph already, here it is" or "no, we haven't, go get it." It uses kind of a rolling hash to allow any of those glyphs to take as many tiles as you want. That's how it works if an Arabic thing wants to be very long, which sometimes they do, or for an emoji that takes up two squares. And that's all it is: it's just that, and then you put the glyphs in a texture. That's the code. There's nothing else there.

There's a bunch more code, but it would be a couple hundred lines if you just want to talk about the logic of how you render something like this. It's a couple hundred lines of code that anyone could do. Here's the shader; it's nothing. And by the way, this shader also runs in two formats: it will run as a vertex and pixel shader if you want, or it'll run as a compute shader, in the same shader. This is both shaders together, because they're so simple you don't even have to pick; you can do it however you want. Here's the entirety of the code. It doesn't do anything; it just looks up the screen position and composites the glyph.

Now, one change I'll probably make to this: I wanted to show that you could do even things that don't currently really work. I mean, it already does things that don't work in
Windows Terminal, but it could do even more things that don't work in Windows Terminal. Specifically, because this runs so fast (it's like 7,000 or 8,000 frames a second), I think we should probably spend a little of that time by adding a second texture fetch. The reason is that then you could do arbitrarily oversized glyphs. If you want to do italics that lean into the next cell, a second texture fetch here would just do that. Since it runs so fast, you might as well, because it probably isn't even really going to slow it down that much, and you might as well spend a little bit of that frame rate to get an extra feature for free.

So it's all very, very simple. For people who want to look at the code base, I should mention this is all on GitHub as refterm under cmuratori, so you can just go look at it. The code's GPLv2. There's also example code in there for all the other stuff, because I had to do it, but it's just not good code. There's example code for things like streaming, for receiving fast pipes, for generating glyphs with DirectWrite, and so on.

And again, when I did this, I accepted as constraints all of the constraints that I felt the Windows Terminal team had. That is, I have to use DirectWrite; I can't go write my own glyph renderer that would actually be fast. DirectWrite is horribly slow. I can't even imagine why it's so slow; it's incredibly slow. So I accepted that. I said, okay, we have to assume the glyph generator is so slow that you'd never get above 30 frames a second, maybe not even that, if you were actually going to use it to generate all the glyphs on the screen every frame. I just took that as given. And the same
is true for figuring out what's Unicode and what's not. I used Uniscribe, the thing that's in Windows, which is terrible. If I actually wrote that myself it would take longer, but it would be fast; it's horribly slow right now. So I accepted all those things as constraints on the design of this system. But the truth is, again, you don't have to write everything to be fast. You just insert caching, or you do a few things to make sure that the bad parts of the code, if they are things you can't change, are isolated and cached, and off you go. Some problems you can't do that with, and that would be a harder problem. But that's why I say terminal rendering is so simple: it's an embarrassingly simple problem. You don't have to rewrite the world to make a fast terminal. You just cache the things that are slow, and you're done.

So anyway, that's basically it. I don't know that there's really much else to say. There are some things that I would like to fix, just for reference purposes; they have nothing to do with the renderer. This completely validated the renderer for me. It obviously proves everything that I wanted to prove, without exception. I don't think there's any excuse you can possibly make now as to why things are slow. If you've got a new one, I'm happy to hear it, but I can't think of any excuse that people could have, because it's like: here's the code, and that's how fast it runs, just like I said it would.

But I don't know anything about DirectWrite, so in order to do this project I had to learn DirectWrite, because I had no idea how that worked. It's terrible, by the way. It's one of the worst APIs. It's not as bad as Event Tracing for Windows, but it's absolutely horrible. DirectWrite is fantastically terrible. I had to learn it, and I don't really know how it works. The documentation's bad, the API is bad. So one of the things
that I can't figure out how to get it to do (and I'll show you an example here) is shrink. What is happening here is called font substitution, I believe. When you are printing something out and the font doesn't have the thing that you're printing, it will substitute a different font in place of the one you're using, so that it can get glyphs that it doesn't have. One of the problems that seems to happen with DirectWrite, which I don't know how to fix (I've come up with ideas for fixing it on my own, but I'm not sure how you make DirectWrite do the right thing), is that I don't know how to tell it, "look, I need you to shrink, or pick a font where this thing is going to fit inside the cell." Like I said, I wanted to add oversized glyph rendering. If I added that, it wouldn't clip these; you'd see them, but they'd still be wrong. Especially with something like this, where it looks like there's quite a bit down there, it would kind of obscure the line below it. So what I really want DirectWrite to do, when it does the fallback, is pick a size that's small enough that the glyph still fits in the square.

I had a GDI path as well. You can't enable it right now, because I don't know how to get GDI to actually do alpha, so with colored glyphs it doesn't really work. I could put the alpha in myself, but the alpha wouldn't be correct for color glyphs. It doesn't matter; it's a long story. But anyway, GDI picked correctly: when you run this through GDI, you actually get fonts that shrink down properly, but DirectWrite doesn't do it. So one of the things I'd like to do is (a) find out if somebody knows how to get DirectWrite to stop doing that, but (b) if DirectWrite just won't do that, then one of the things I was thinking is that I could make my own shader so that, instead of updating these resources by copying them, we could shrink them
ourselves, because when the glyph generator generates them, it knows how big they are, so we could resample them downward. Ideally you wouldn't do that, because the whole point of rasterizing glyphs on the fly is that they're not bitmaps, so they can be infinitely precise for whatever the size is. So I'd really prefer it if there was some way to tell DirectWrite not to do this. The other thing I was thinking is maybe we just implement fallback ourselves. Again, that's free to do because the glyph generator is cached, so even if it takes a little while, it's not going to affect the performance. That's another option, but I would love to hear suggestions from anyone who knows how to get DirectWrite to stop doing that, because that'd be the best case. I think there are other ways we could fix it, but that was ridiculous; I don't know why it does that. So it would be nice to fix that.

The other thing: I use Uniscribe in the code path. Like I said, I accepted the constraint that you have to call things in Windows, because I wanted to pretend you're in the shoes of the Windows Terminal team and you have to call these Windows things that are very bad. Uniscribe is also terrible, and I don't think I really understand that API very well at all. It's a better API than DirectWrite, but it's also horrible. If you look at what happens here, this is actually subtly wrong. The reason is that Uniscribe puts the space character into the same batch as the glyph, and so the space doesn't really show up here, because it goes to the glyph generator as a space and then the space gets truncated. I should probably fix that by just hacking the Uniscribe path. But the Uniscribe path is also horribly slow, so any time you're dumping a lot of Unicode to things, things slow down slightly, and
that would be nice to fix. So another thing that I think might be nice is to just put in some UTF-8 Unicode parsing, because all you really need is something that tries to figure out the boundaries of the things that you need to pass to the glyph generator as a chunk. That's all you need to do. It's not even UTF-8 parsing really, it's just chunking. So having a nice community-verified one would be good, where people who speak lots of different languages could try it and make sure that it actually worked on their languages, because that's the hard part. It's trivial for ASCII, you don't have to do anything, it's just every character, right? And it's maybe slightly more difficult, but still pretty trivial, to do this kind of text where all you have is accents. But if you want text like this, where it's a complex combining character set, I think that's probably somewhat difficult to do. I mean, difficult, right: it's still a very simple problem, it just takes some elbow grease to know what all the cases are. You have to actually go look it up and work with some people who are fluent in Arabic writing to find out what that is. So that would be a nice thing to do, because Uniscribe is not good, and it would be nice to just get rid of that, and that would also speed things up. So basically it's those two things. And the text sizing issue, DirectWrite just picking bad things, is really a bad issue for monospace, because you can see here... let me show you another thing. What is the actual... I'm sorry, I don't remember what the... oops, it's like plain text stress UTF... so, splat plain text stress UTF-8. So you can see this stuff is not correct either, right? What happens is it gets the right characters, but it doesn't keep them in line with the grid, and this is the same exact
problem as the emoji being too large. What's happening here is, I believe, DirectWrite is substituting a font to get these other characters, but that font is not the right size, so when I ask it how big it is, it reports that it's bigger than a cell. So I think, oh, this must be something that takes multiple cells, right? But that's not what's actually going on. So it would be nice to fix some of this stuff, because the renderer doesn't care; it's just a question of getting the glyph generator to do something right. And of course we could fix all this by writing our own glyph generator, but the whole point was to show that you could do it without your own glyph generator as long as you're just sensible. I think that's basically it. I'll be happy to take questions now, but I'm going to go ahead and stop this part of the recording for the questions. The only thing I would point out too is, again, nothing up my sleeve here. I tried to implement everything I could think of that goes into a terminal to ensure that there were no code path shortcuts being taken to get the speed. But again, it's not a real terminal, so you can't go use this as your terminal, I apologize. People could make a terminal out of it if they want to, but it's just a demonstration of how you do the rendering. But again, I tried for everything. So another thing that's here, you'll notice, is scrollback works just fine too. One of the excuses people might have had was, oh, but it's not doing scrollback buffers. Yes it is, and also the scrollback buffers line wrap. So again, there's no trickery here as far as I know. I tried to do everything that you might want to do, to make sure that there's no shortcutting and no weird excuses that people can come up with. Also, it works pretty much with whatever font size; it's not really about the
size of the font. Obviously it'll get slower if you have more stuff on the screen, because it has to send down more character cells. This, by the way, has no optimizations: it sends down the whole screen's worth of cells every frame. You wouldn't have to do that, but it does, so it's the worst case. Again, it slows down to like 2,000 frames a second with really tiny cells, not a big deal. And again, you can still run these tests. I believe fast pipe's still on, right? Yeah. You can still run these tests and see it doesn't matter that the fonts are tiny; it still only takes two seconds. So there's no trickery. All I can say is there's no excuse for this stuff being slow. It's so simple: a very small amount of code gives you a complete terminal renderer that runs incredibly fast, and I do not know why people (a) won't accept that fact and (b) make up weird excuses, like somehow it would be better if you included tons of standard library stuff and had tons of classes in here. So instead of just five simple functions that you just call, that are very self-explanatory, simple, and short, you add tons of resizing std::vector all over the place and all this garbage. You look at the Windows Terminal code base and it's just that garbage for days, huge amounts of code. People somehow think that's more maintainable. How is that more maintainable? It's way more code, and it's code you don't know what it does. Most of the people who are calling those things, std::vector, std::string, they don't know what they do internally; they've never even stepped through them half the time. So this is something where you can actually see the entire code path. You can step through every line, that's trivial, there's almost nothing in it. For the renderer side it's nothing, right? It's a few lines of code, it's very easy to understand. And the bonus part is, because it doesn't use anything, no libraries are in there in the renderer,
it's just a bare file. There's no #includes that it even uses; in the cache, for example, it's just straight-line written code. It doesn't break, right? If somebody changes something in the standard library so that it gets slower, it doesn't matter, whereas otherwise you've got all these weird performance dependencies: somebody screws up the behavior of std::vector and all of a sudden your code is slow, and you don't know why. So I don't know, it's a very exasperating situation. I wish I didn't have to keep doing this stuff. I don't know what's going on. I don't know why people refuse to see obvious stuff that's plain to see, and it just gets really tiring. It's really exhausting to program these days, because nobody seems to know anything and they just make excuses all day long instead of just going and looking at what you would actually do, what the code is that's necessary to make something happen. So in one sense, there's some proof, hopefully. In another sense, it's like, I give up. It was exactly like I said it was: it was trivial code to write. Most of the pain in the butt was working with DirectWrite and Direct2D, which are terrible, and Uniscribe. And it's super simple code that anyone could maintain if they know how to program C. The LRU cache is a little finicky; it's the kind of code that, if I was going to ship this, I would probably make some unit tests for, because it does maintain two linked lists in there internally, chained together. But even if you didn't do that, if you replaced that with some standard kind of hash table, I still think this code would probably run at roughly the same speed. So even if you don't want to write a hash table for some reason... although, yeah, I don't know, does the standard library even have an LRU hash table? Probably not. So that's it, that's the end of the video. It was very
depressing, the response to making simple statements about this, and it's the kind of thing that just makes you not want to engage with development communities these days, because it's just so ridiculous that you have to keep saying this stuff and demonstrating this stuff. Everyone should just know this if they're actually working at a job, right? This is stuff you should have learned in your first year of working as a programmer professionally, but we're so far away from that. So anyway, that's it. I'm gonna go ahead and stop this so I can post that part to YouTube, and I'm gonna go be depressed now in the Q&A. Okay, so, sorry, I forgot one thing. I went to the Q&A and people reminded me that colored text, or text on a colored background that changes every time, or whatever, was one of the standard excuses, so let me show that as a simple demo as well. So here is Windows Terminal running a test where you can see that it's different-color text that changes over time. I don't know if you can see, but the font looks a little too large; let me see if I can get the whole thing on the screen. I'm sorry, I don't really know how to use Windows Terminal. So maybe if I set it down to that it would work. Did that change it? You can see that there's a break there; I think that line is still not quite on the screen for whatever reason. Did that actually set it? Do I have to hit OK or something? I don't know. I'm sorry, I don't really know how to use this thing. I don't think that's doing anything. Is there a thing hidden here? Let me see... there it is. All right, that's a great place for that save button. Okay, so let's go ahead and try it now. There we go, so now we can get the whole thing on the screen. So let me do a cls. Okay, so this is an example where all this is doing is
just splatting VT codes out to the screen, and it changes the color of the text and it changes the color of the background. And you can see the draw speed here; this is a measure of how many characters it's able to sink. So this is 163,000 colored glyphs per second, with the background and the foreground color changing. So that gives you some background: 169, or 162, something like that, somewhere between 160 and 170 is what that's able to maintain in Windows Terminal Preview, or whatever this is, the latest Windows Terminal I could find on the Windows Store. So that's where they're at. Here's us running. If we do a status here... oops... if we do a status, this is with fast pipes off, so no fast pipe. I'll run the same program, and this is how fast it runs. This is about how fast you'd expect Terminal to run, because Terminal still goes through the Windows kernel; if they were rendering properly, this is what it would be. So that's like 5,300 and whatever. So it's about, what is that, like 40 times faster? I don't know, something like that. So you may ask, why is it so much slower? I believe that's because they call DirectWrite for every glyph in this case. Remember, I said DirectWrite is really slow, so they probably can't get above that speed without fixing how they're doing glyph rendering, would be my guess. And this was actually what started the whole thing: on their GitHub they were like, oh, well, we'll separate out the foreground and the background color rendering so that we can pass more things to DirectWrite, and I'm like, what? Don't do that, just use a tile renderer. And that's when they called that a PhD project. So that's where we're at today, right? Professional people getting paid very high salaries to say that on GitHub. So this is without fast pipes. If we actually turn fast pipes on
and then run termbench, you can see the actual speed. This is how fast it should be running if you got rid of the Windows con IO nonsense that's going on in there, that you don't need for anything. This is what you'd actually get. So it's 3x faster even than the one that's 40 times faster, so this is about a hundred times faster than Windows Terminal. They were between 160 and 170; we're between 17,200 and 17,300. So I don't know what to tell you; that's where we're at. Again, this is unoptimized code, I can't stress that enough. Let me just leave you on that final thought: I did not optimize this code. I don't even know what the slow part of my code is. I have never timed it. I have never even run a profiler on this code. So just so we're clear, this is the speed you get when you just type in the basic, simplest possible thing that you can think of to cache a glyph. And of course you have to bypass the Windows kernel, so I suppose you could count that as an optimization, but that was really just so I could test, so I don't know how you want to classify that. But I didn't time that; I just happened to know that that was a very slow part of the process because other people had already posted that. So that's not necessary; someone else can tell you that. So it would be interesting to see how fast you could get this to go if you actually optimized the code; we could potentially make this much faster. We don't know how fast this could run. So really, let me just end with: refterm is the slowest your terminal should run. It's the minimum bar. refterm is not highly optimized code you should aspire to, made by amazing monk programmers who slaved over every line or something. This is the bare minimum. This is just what a basic, sane, simple code base produces: that speed, or the speed I showed before that's one-third of this if you don't bypass the kernel, which you might not be able
to do for other reasons, if you're trying to do something where you still want that backwards compat and you don't want to implement all of ConPTY yourself. So that's it, and now I'm gonna actually go to Q&A.
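The chunking step mentioned in the talk (finding the boundaries of what gets passed to the glyph generator as one unit) could look roughly like the C sketch below. It is not refterm's code. The names `decode_utf8`, `is_combining_mark`, and `next_chunk` are made up for illustration, and the sketch assumes that only the Combining Diacritical Marks block (U+0300 to U+036F) attaches to the preceding character. That covers the simple accent case, but none of the complex scripts such as Arabic that the talk calls out as the part needing real elbow grease.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 code point from p (n bytes available, n > 0).
   Stores it in *cp and returns the number of bytes consumed.
   Malformed or truncated input is consumed as a single byte (reported as
   U+FFFD) so the caller always makes progress. Overlong encodings and
   surrogates are NOT rejected; this is a sketch, not a validator. */
static size_t decode_utf8(const unsigned char *p, size_t n, uint32_t *cp)
{
    assert(n > 0);
    unsigned char b = p[0];
    size_t len;
    uint32_t value;

    if (b < 0x80)                { *cp = b; return 1; }
    else if ((b & 0xE0) == 0xC0) { len = 2; value = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { len = 3; value = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { len = 4; value = b & 0x07; }
    else                         { *cp = 0xFFFD; return 1; }

    if (len > n) { *cp = 0xFFFD; return 1; }
    for (size_t i = 1; i < len; ++i) {
        if ((p[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        value = (value << 6) | (uint32_t)(p[i] & 0x3F);
    }
    *cp = value;
    return len;
}

/* ASSUMPTION: only U+0300..U+036F counts as "combining" in this sketch. */
static int is_combining_mark(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Byte length of the next chunk: one base code point plus any combining
   marks that directly follow it. Returns 0 when n == 0. */
size_t next_chunk(const unsigned char *text, size_t n)
{
    uint32_t cp;
    size_t used;

    if (n == 0) return 0;
    used = decode_utf8(text, n, &cp);
    while (used < n) {
        size_t len = decode_utf8(text + used, n - used, &cp);
        if (!is_combining_mark(cp)) break;
        used += len;
    }
    return used;
}
```

Calling `next_chunk` repeatedly and advancing by its return value walks a buffer one chunk at a time. For pure ASCII every chunk is a single byte, which matches the "it's just every character" case from the talk.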
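The LRU glyph cache described in the talk, with its two internal linked lists, could be sketched along the following lines. This is an illustration under assumptions, not refterm's implementation: the type and function names (`GlyphCache`, `cache_init`, `cache_lookup`) are hypothetical, the two lists are taken to be a per-bucket hash chain and a recency list, and the key is a plain integer standing in for a hash of a chunk's bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS  4   /* tiny on purpose so eviction is easy to observe */
#define BUCKET_COUNT 8   /* must be a power of two */

typedef struct {
    uint32_t key;        /* stand-in for a hash of the chunk's bytes */
    uint32_t next_hash;  /* list 1: next entry in the same bucket, 0 = end */
    uint32_t lru_prev;   /* list 2: neighbor toward most recently used */
    uint32_t lru_next;   /* list 2: neighbor toward least recently used */
} Entry;

typedef struct {
    Entry    entries[CACHE_SLOTS + 1]; /* entry 0 is the LRU list sentinel */
    uint32_t buckets[BUCKET_COUNT];    /* first entry per bucket, 0 = empty */
    uint32_t used;                     /* slots handed out so far */
} GlyphCache;

void cache_init(GlyphCache *c)
{
    assert((BUCKET_COUNT & (BUCKET_COUNT - 1)) == 0);
    memset(c, 0, sizeof *c); /* sentinel links to itself: empty LRU list */
}

static void lru_unlink(GlyphCache *c, uint32_t i)
{
    Entry *e = &c->entries[i];
    c->entries[e->lru_prev].lru_next = e->lru_next;
    c->entries[e->lru_next].lru_prev = e->lru_prev;
}

static void lru_push_front(GlyphCache *c, uint32_t i)
{
    Entry *e = &c->entries[i];
    e->lru_prev = 0;
    e->lru_next = c->entries[0].lru_next;
    c->entries[e->lru_next].lru_prev = i;
    c->entries[0].lru_next = i;
}

static void bucket_remove(GlyphCache *c, uint32_t i)
{
    uint32_t *link = &c->buckets[c->entries[i].key & (BUCKET_COUNT - 1)];
    while (*link != i) link = &c->entries[*link].next_hash;
    *link = c->entries[i].next_hash;
}

/* Find key, or claim a slot for it (evicting the least recently used entry
   when the cache is full). Returns the slot index in 1..CACHE_SLOTS and sets
   *hit to 1 if the key was already cached, 0 if the slot was just assigned. */
uint32_t cache_lookup(GlyphCache *c, uint32_t key, int *hit)
{
    uint32_t *head = &c->buckets[key & (BUCKET_COUNT - 1)];
    uint32_t i;

    for (i = *head; i != 0; i = c->entries[i].next_hash)
        if (c->entries[i].key == key) break;

    *hit = (i != 0);
    if (i != 0) {
        lru_unlink(c, i);
    } else {
        if (c->used < CACHE_SLOTS) {
            i = ++c->used;
        } else {
            i = c->entries[0].lru_prev; /* least recently used entry */
            lru_unlink(c, i);
            bucket_remove(c, i);
        }
        c->entries[i].key = key;
        c->entries[i].next_hash = *head;
        *head = i;
    }
    lru_push_front(c, i);
    return i;
}
```

In a renderer, the returned slot index could identify a tile in a glyph texture, so a miss would mean rasterizing the chunk into that tile and a hit would mean reusing it. The index-based links keep the whole cache in one flat block of memory, which is one way to get the dependency-free, step-through-every-line property the talk argues for.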
Info
Channel: Molly Rocket
Views: 40,978
Rating: 4.8795986 out of 5
Keywords: Windows Terminal, rendering, performance
Id: hxM8QmyZXtg
Length: 51min 3sec (3063 seconds)
Published: Sat Jul 03 2021