So this talk is about the inside of the Python virtual machine and how that actually affects Python as a language. Because there's a lot of potentially boring stuff in there, I would really welcome it if people would just ask questions or anything of that sort along the way, to spice it up a little bit.

What's interesting about Python as a language, I think, is that we all use it and we sort of like it, for the most part; at least people don't seem to mind it too much. But it's actually surprising how nitty-gritty the interpreter is internally, how much it restricts us in what we can do, but also how much it enables us in other ways.

I should give a little preface for this. I've been doing Python since around 2.0, although I wasn't really programming when 2.0 came out; I think the first version I actually used was 2.2. But I had an interest in going back to see how the language evolved and how the interpreter evolved. Just recently, I think last year, I gave a talk similar to this one, and for that one I decided to see what it is like to compile Python 1.3, I think. It's interesting how far you can go back with the Python interpreter and still get a running binary out of it. You can actually compile the old versions on modern computers and see how the language evolved, and there's a lot of stuff in the interpreter which didn't really change over the years.

I had an interest in it because I like Python as a language. I spend a lot of time working in it, I have made many open source projects in it, and I was always curious to see how far you can take the language and how you can evolve the interpreter, or in this case how you can't really change all that much, for reasons which we'll go into.

My day job at the moment: I work for the most part on Sentry, which is an open source project written in Python, mostly based
on the Django framework, and it's an error reporting tool. Other than that, I did the Flask framework, I recently started a new sort of static file content management system called Lektor, and I have lots of Python libraries like Werkzeug, Jinja, MarkupSafe and many more. So I have done a lot of Python, but never really as a core developer on the language. I do have commit access for some inexplicable reason, but I barely ever contributed anything to the interpreter. I do have a lot of fun looking in and seeing how the interpreter works.

I would say that all of the programming languages that exist in the world sit on a line that goes from very dynamic to very static, and "very dynamic" can be seen in many different ways. For instance, how many here use Ruby or have used Ruby a little bit? Okay, not that many. I should probably explain a little bit about Ruby, because it's kind of interesting in comparison to Python. The Ruby community prides itself on Ruby being a very highly dynamic language, in the sense that you can change a method on, I don't know, the float class, and then you can change addition to subtraction. Not that you would do that, but you can do it. So it's very dynamic in the sense that you can change behavior which you would think shouldn't necessarily be something that you want to change. Python doesn't do that. You can't change the __add__ method on the float class. I mean, technically I guess you could, but it was never designed to support that.

But I actually think that Python as a language is way more dynamic, and as a result way harder to optimize, than Ruby, and I have some examples of this. I find it interesting because we are all being told that Python works in a certain way, and then as you dig deeper and deeper into it you realize that that's actually totally not how the language works. And you would think, oh, maybe we change the
language so that it works how we are told that the language works, and then you realize you can't do that, because people actually depend in very bizarre ways on some of the internals of the language, and as a result it is very hard to evolve the language.

One thing that people have wanted for a long time is getting rid of the global interpreter lock, and that turns out to be hard and also very much tied to how the interpreter was designed initially and what guarantees it gives. It's very hard not just to bring concurrency into the language but also to bring in the ability to have multiple versions of the same library loaded at the same time. It makes it very tricky to have multiple interpreters running in the same executable side by side without sharing anything. And it makes it even harder to change some of the basic principles of how the types in Python work. Over the years more and more stuff has been piled on top of this already very complex language. Python 3 in particular, I think, goes very far in doing all kinds of stuff on top of all of this, which also lacks a little bit of peer review, I would say.

Okay, so this is the language we are all told exists. You have a global variable called magic, it's 42, and there's a function called add_magic, and it returns a plus magic. I think that makes a lot of sense. It's not a very useful function, but it is a function that everybody understands. A lot of tutorials on the internet will tell you that this is actually equivalent to this, which it's not really, for many different reasons; in fact it is totally not that.

What is actually happening behind the scenes is that Python compiles all of this into bytecode, and the instruction for adding things together is called BINARY_ADD, and from that moment on we go down a rabbit hole which has nothing to do with __add__. There is a file in Python called ceval which is, I don't know how large it is at
this point, but it is very huge, I think it's a couple of thousand lines. It's basically a huge switch statement in a big loop, and it checks every single opcode. So for instance there's a branch in there for binary addition. I cut this tremendously; in fact, wherever I put dot dot dot, all of that I just deleted entirely from the slides because it's not super relevant. But if you actually consider how much code the interpreter has to walk through to add two numbers together, it's a lot, and it became bigger and bigger over the years. It might have been easy at one point.

At the moment it takes two arguments from the stack, so it pops these two arguments, and then it does all kinds of stuff. For instance, it checks if the first argument and the second argument are both exactly integers, not just subclasses or anything of that sort; then it does a special thing. If both of them are exactly strings, then it does a special thing. Otherwise it goes to PyNumber_Add.

I will look into what PyNumber_Add is, but what do you think PyNumber_Add should be? With that name it implies something with numbers and addition. If you actually look at what it does, the first part has something to do with nb_add, that's the number addition slot, we'll go into this. But then it also has special handling for sequences in there. So PyNumber_Add doesn't actually have anything to do with numbers; it's really a PyObject add, it just has a bad name.

But one thing you will notice here: there is no __add__ anywhere in this function. It doesn't look for an attribute called __add__. What it does is invoke this binary_op1 on the variables v and w, which are the two arguments to PyNumber_Add, and then it looks at the slot nb_add.

So how many here know what a slot is in Python? Okay, so I will just pick one person at random over there, who said slots [inaudible]. Slots in
Python are two things. A lot of people think that slots are __slots__, which gives an object what's called a slot and removes the dictionary. That's one of the two kinds of slots that Python has. The other kind of slot that Python has is these special things which are added on top of the type in the C interpreter, so that it does not have to look into a dictionary for an attribute.

What this means in practice is that up here, on the v and the w objects which we have on the interpreter level, there are special containers. It's a huge struct, and at different positions in this struct there are different methods stored. So for instance, if you add two numbers together (it's ridiculously hard to read, we're just going to skip past this as much as we can), each variable, each Python object, has an ob_type, which is a pointer to the type object it has. On ob_type we have for instance tp_as_number, which means that if the type is seen as a number there is a huge list of methods it could have, and in this case we might invoke that thing. Otherwise, what do we have? We look at these slots on the two different types, and then depending on the relationship between them we add the numbers.

If you look at the source rather than at my slides, this is in total something like four hundred lines of code just for adding numbers together, and the word __add__ doesn't appear anywhere. That's sort of the big lie. The big lie is that we are invoking these special methods, but we're not; instead we are looking at these stupid slot things.

And they're surprisingly frustrating, I would say. If you look at the interpreter and try to reason about it, it would be very nice to say: you have an object a and an object b, and if you do an addition with them, or a subtraction or a multiplication, there is a very, very clear thing that happens. JavaScript learnt this recently: they have a standard, and it defines very clearly what happens if you add two objects
together. In Python it's impossible to tell. It depends. Especially in Python 2, it depends on flags on the object, and it depends on the specific relationship the types have with each other. I would argue that there is probably not a single programmer in the Python community who could tell you exactly what happens if you add two objects together. I kind of assume this from the fact that there is the PyPy project, which tries to implement the Python language in Python, and it's kind of interesting when PyPy sometimes disagrees with the implementation that CPython has. You would think that they must get it right, but they still disagree in certain ways, and it is very tricky.

So to make this a little bit more concrete: what is a slot? A slot is basically a slot on a struct which the Python interpreter keeps for every single type. You can imagine it this way: you have an integer, and if you look at the class from the Python side there is a dictionary on it, and in this dictionary there is that special __add__ with a function attached to it. So there's a function that is invoked with that name.

But the C interpreter was written long before, I guess, this special method existed. When you go back to, I don't know, 1995, there were two things: there were Python classes and there were C types, and they had no relationship with each other. If you added two integers together, which were written in C as built-in types of the language, then it always invoked this special slot method called nb_add. That's what it did. Later on they added support for classes and said, okay, it would be nice if classes could also be added together. So independently of these C types they added Python classes, and in a Python class the logic was that __add__ is the special callback method to add numbers together. And so they had these two separate systems
and they tried to unify them, I guess, or maybe that even happened side by side, I can't really tell, it's hard to say. But nowadays what happens is: if you have a C class and it has a slot, then there is some magic in the C code which exposes these slots as these special methods to Python code. In the same way, if you make a Python class with __add__, it takes this function which is there, stuffs it into the C type, and stores it there as a C function. So they try to synchronize with each other. You could think that Foo.__add__ is the same as this Foo type's tp_as_number nb_add, which we have on the C interpreter side. That's sort of the expectation, that they are synchronized with each other.

But history makes this really hard. For instance, you have to imagine that very early on Python had integers and floats, but it also had strings and lists and tuples. Whoever wrote that, it was probably Guido himself, decided that there are two different concepts for adding things together, one for numbers and one for sequences. So there are two different slots for adding things together, and plus looks at both of those slots. You can already see that there might be desynchronization that can happen, but theoretically they're both kind of supposed to map to __add__. It just becomes very hard to explain to a user, and that's also why, I guess, nobody really explains it, because it seems irrelevant for the most part.

If you read the tutorial, it says a + b is a.__add__(b). Slightly more correct tutorials have discovered that what seems to be happening is actually type(a).__add__ invoked with, it should actually say, a comma b. But both of them are wrong, because here's what happens if you write a + b. The interpreter checks if a and b are integers. In case they're integers, it tries to do a fast addition in a very
special place in the interpreter. If they're both strings, it tries to do a fast concatenation. Then it will try to do a number addition: is a implementing number slots, is b implementing number slots, and then it resolves that slot based on the type relationship between them, which is very complex, I'm not going to go into this. It invokes them slightly differently, but usually it ends up with something like the slot of a invoked with a and b. If that also doesn't work, it tries to do the sequence concatenation, which goes down a different interpreter path.

In comparison, this is what happens if you do a.__add__(b): first it looks for an attribute called __add__ on the type, then it looks this up and invokes it.

The scary thing is that this thing is easier than this thing. The fast path that we have in the interpreter, through these slots and everything we have at this point, is significantly more complex than what we as Python programmers are actually told happens. This path is not optimized, this path is often not directly invoked, but you can find places where manually writing this is actually faster than this. Not always, but in some cases.

The reason why I want to mention this is that the interpreter has grown over the years, with optimizations and other things to make it operate faster in this very complex world it has set up. You would think: why don't we just take this thing and optimize it, and then say that when you do a + b it does a.__add__(b)? Why don't we just do that? That seems to be the reasonable thing to do, that's easy, that's what every tutorial says. Why do we do this complex thing?

The reason for this is that unfortunately we expose so much of the language to the ecosystem around it that we can't change the interpreter. Everyone who has ever written a C extension in Python is quite aware of how low-level the C API is, so
every custom C type out there knows about the slots. Even if you ignore the C API entirely, slot wrappers as a concept are exposed to Python as a language. Every once in a while even newbies who have never used the language before encounter an exception message that says something like "slot wrapper has no attribute __foo__", because all of those internals need to be exposed to the runtime. So we have this, I would say, very, very complex language buildup, which is hard to explain to people and which is, for some inexplicable reason, used by everyone with all of those little nitty details.

I have a better example later on where we have some code for this, but the point is that they are not equal, and the interpreter tries to hide that from you as a user as much as possible. So for instance, if you have a class X which is a subclass of object, and I think most people have written this at one point, you add the special __add__ method, and when you access it you can see there's an unbound method on X which happens to be a lambda. What happens if you look at what a C type looks like? Well, if you access __add__ on the C type, then you get a slot wrapper. What's a slot wrapper? It is a special thing which wraps a C function on the C interpreter level, but it's exposed to a Python user. You can see that these are clearly not equivalent, even though they should be.

So we have exposed all of this very nitty-gritty detail of the interpreter all the way up to the Python language. If someone ever changes this behavior, all the Python code would have to deal with a different object there. This is something that unfortunately Python didn't do as well as JavaScript, because in JavaScript everything is a function. There is no difference. We have one function type in JavaScript, and whether you write one yourself or do it on the C level, they're just the same. There is no real difference in looking at them, other than that one of them
might not expose the source code that created it. Whereas in Python we have slot wrappers, we have method descriptors, we have built-in functions, we have functions, we have callable objects. We have a huge range of all of those, and they all expose small details in how they behave differently.

To give you an example: if you have an unbound method __add__, and you take a reference to it and put it onto another class, then it acts as a descriptor, so it receives self as the first argument. If you try to do the same thing with a slot wrapper, it doesn't do that, and people started depending on this. For instance, one thing I have seen a couple of times is that someone writes a class, and as a class attribute there is written conversion_function = int. That depends on int, the type, not being a descriptor, so not binding self as the argument. So you could never change that behavior, you could never clean it up, because we depend on these little differences.

And as you go to something like PyPy, they all have to replicate every single bit and piece of these very intricate parts. Python tries to sync them up as much as possible, to hide this complexity from you as much as possible, but unfortunately it doesn't get it right all the time. The reason why we should care about this is that it's very complex, everybody uses it, and it makes optimizations almost impossible. We have to emulate it as we go to other implementations, but it also shapes how our own code works.

This is an example which thankfully has been fixed, but I think it's a very interesting one. If you're running Python 2.7 or newer it behaves differently, but I think it shows very well how weird it gets. There's a module called re for regular expressions in Python, and if you compile an expression you get an object back, and that object is a specific object called SRE rule or SRE regex or something, and if
you access __class__ on it, it gives you an AttributeError. That seems weird. Why does it do that? How does it do that? How does it make any sense for an object in Python not to have a class attribute? In fact it also doesn't have __repr__. So it has a whole bunch of the behavior of a normal Python object, but it never responded to those special attributes. How can this be?

The reason for this is that this object was written in C, and whoever wrote this C object customized the entire behavior of how it looks in Python land. So the C interpreter sees a different version of this object than Python code does. For instance, if you wanted to write the repr function in Python, to invoke the special __repr__ function, you could never implement this, because at least this object would always disagree with it: you can only access the class information and the repr information from the C language, you can't do it from the Python language. It gets really weird because there are so many cases of this. This is an example where the C API allows you to implement a certain behavior that is completely invisible from the Python level.

So this obviously is not Python 3 code, this is Python 2.4 code, let's say. In old versions of Python it was possible to create a class that is not a new-style class, where you implement __getattr__ and it returns an attribute from another object, and it would pass through all of these special methods. So you could for instance instantiate A and it looks like 42, because when the interpreter looks for the __repr__ function, it goes and looks up the attribute and it will find the __repr__ function on the integer 42. So it is a 100% proxy, so to say.

Modern Python doesn't work like this anymore, and the reason for this is that this special instance type, which exists in old Python versions
and in old-style classes in newer Python versions, forwards all of the calls like it did in very old versions of Python. So it emulates what the Python interpreter used to be at one point, except that at this point it has to do it as this very complex proxy object. So when you actually consider how Python as a language was taught to people, at one point this a.__add__(b) was actually correct. It just became more complex over the years, and now there is this massive disagreement in how it works.

The other thing that leaks through, which is a little bit easier to understand, is Unicode. I think for Unicode it is kind of important to understand where the language comes from. When Python was created originally, it didn't have any Unicode support at all. It had strings, and strings were implemented as straightforwardly as you can imagine: they hold a sequence of characters, and each character has a numeric value from 0 to 255. Because of that, it gives you certain ideas of how a string is supposed to work, and Python exposed this. For instance, if you have a string of ASCII characters you can always get the first character, you can always get the second character, the third character, the fourth. It's very easy to access these individual characters.

When they started adding Unicode support, they decided that the best way to do it would be to mimic this behavior, but instead of going from 0 to 255 they would go from zero to sixty-five thousand. Then later on, when they realized that Unicode was going to grow, they decided that you can also compile Python in a version where it will go to much larger numbers. This is nice on the one hand, but it's also very disappointing on the other.

So does anyone here know what UCS means? It's an encoding. UCS, I don't actually know what it stands for, I think it might be Unicode something system. Basically, about 25 years ago, I guess, someone decided that we need to evolve ASCII for all the languages in the
world, and there were two independent consortiums which decided to do that. One of them was the Unicode Consortium, and they decided 65,000 characters is enough for everyone. They were wrong; they had to make it larger afterwards. But independently there was a group of people on the ISO committee which decided that they were going to allow two to the power of 32 different characters, and they made a much more complex standard which didn't really go anywhere, because they couldn't figure out how to encode it efficiently.

Along the way, while these two standards were evolving, two things happened. The first one was that people realized that Unicode would have to get more characters than 65,000. In particular, if you use emojis: emojis are way past 65,000, they need more than two bytes. But also, eventually they realized that it's actually very valuable to encode the Unicode data into more compact encodings. If you imagine that most of your text fits into ASCII, or it's Cyrillic or something that's very low in numbers, then it doesn't really make any sense to store all the numbers with a lot of zeros in between, because that's just a waste of space. If you stored all the text in two bytes per character, first of all you could only store 65,000 individual characters per position, but it would also be very wasteful.

UTF-8 is the most common encoding where the individual character gets larger depending on how large its number is. ASCII for instance fits into eight bits, so one byte; Cyrillic I think fits into two, if I'm not mistaken; and it grows like this. UCS-2 on the other hand is basically the original Unicode encoding, with each character going from zero to sixty-five thousand, and there is no support whatsoever for encoding characters in it which are larger than 65,000. So if you take a standard Python 2 interpreter and it's compiled with UCS-2, then accessing the first
character, the second, the fourth and so forth will give you exactly that character, but if you encounter an emoji, the emoji will be split in half, so you would have to access the first part and the second part of the emoji independently. So if you want to deal with emojis in Python 2, you would have to compile it with UCS-4, which takes 4 bytes per character. That is an enormous waste, in particular because the largest character which can ever be in Unicode is only 21 bits, I think, yeah, 21 bits, which is just slightly above half of it. So you will have the vast majority of the space unused.

So why do we waste so much memory in Python? Why do we not just use UTF-8 or something like this? Because we guaranteed that if you access the first character you will get the first character, and if you access the second character you will get the second character, and the only way to do this in UTF-8 would be to scan character by character by character to figure out which one you want to access, whereas if we store it in UCS-2 or UCS-4 we just skip the right number of bytes to where the character is.

There were many arguments made about a system like this versus other systems, but I would say in the last 10 years it has emerged that UTF-8 is the way to go, and a lot of newer languages actually embrace this a little bit. For instance Go is UTF-8 everywhere, and it doesn't give you character access. Same thing with Rust. I think Swift might be doing something similar, but at the very least Apple never really gave you this direct access to characters, so the language didn't give you this idea that you could do this efficiently. It's kind of disappointing that in Python we did.

The nice thing is we fixed this in Python 3 a little bit, where depending on the largest character in a string, the storage gets bigger or smaller. But it still gives you this guarantee, and this guarantee is only given to you because the interpreter decided
at one point that that is a reasonable thing to do. I don't think there was a conscious decision in doing this; I think it just happened.

This is sort of a general story in Python: the language that we use is pretty much exactly the interpreter as it exists. On the one hand this makes it very hard to do optimizations, and it can get really weird if things desynchronize, although that doesn't happen very often. On the other hand it gave us as Python programmers, I would say, a very different way to look at the language compared to many other programmers. If you're a JavaScript programmer, or, well, maybe not others, but if you're a JavaScript programmer at the very least, you take the language and you use it and you live with the limitations of it. Maybe over the years you realize that there are so many limitations in it, and you go to the browser vendors and ask them to give you a little bit more access to the language, but it stays very high level, it stays very sandboxed and contained. Whereas in Python, because the interpreter has always shone through so directly, a whole bunch of stuff has happened which I think is quite unique.

The first one, and I think that's for me the biggest reason why I like Python so much: even though I know that all of this exposed interpreter makes it hard for us to ever get rid of the GIL, makes it hard for us to ever do certain optimizations, and we were always kind of locked into a language that works in this way, on the other hand we have so much power in our hands because we have access to the interpreter.

Who uses sys._getframe or has heard of it? A shockingly small number of people. sys._getframe is sort of the secret sauce of what makes a lot of Python code spicy; I wouldn't say good. Basically, at any point in time you can ask the interpreter to give you the internal state of the current call site. This returns a frame object, and this frame object exposes
almost the entire interpreter state at this point in time, which from a performance point of view is nuts. You shouldn't do that. If you were designing a new high-level language, everybody would argue: never expose this thing, this thing is slow, it's ridiculous, just don't expose it, it shouldn't exist. Python exposed it very, very early on, and code started using it a lot, and because code started using it, the core language developers were forced to keep it at least somewhat performant. Something like sys._getframe also exists in other languages like Ruby, and in Ruby it's really, really slow because nobody uses it.

So to understand what sys._getframe is: it gives you a frame object, and with the frame object you can access everything that's there. For instance you can access local variables, you can under some circumstances change local variables, you can see the function and the file name where you're currently executing, you can see what exception has been raised on this frame, and so forth.

This has been abused but also heavily used. For instance there is a project called zope.interface where you call a method called implements, and it looks at the current call site, goes one frame up, and then asks the class to inject a new attribute into it. This is something that doesn't make any sense from a calling point of view, but they just decided to do that. The warnings module, depending on how you call it, will anchor the warning to a different code location. So for instance you can say: I want to warn that this method is going to be deprecated, and I want the warning to be relevant two frames up. It does this sort of stuff. The inspect module is full of this stuff. Logging, for instance, will skip frames in the logging module so that it logs based on the location where you invoked the method yourself and not where the method was defined.

And then there is so much debug support. For instance, does anyone know Twisted manhole? Probably not. It's one
person. It is a very uncommon thing, but there are so many versions of this that people use, even inside companies, where you can take a running Python interpreter, and it just runs there and does its thing, and then something weird happens in it. Let's say you have a computer game and it has a server process. It runs there, it emulates a game world, and you don't want to stop it, but it misbehaves slightly. You can actually attach a debugger to the Python interpreter, execute Python code running in there, and figure out what it does. At any point in time it gives you all the state of the interpreter, and you can do really interesting things with it.

I work for Sentry, and Sentry is a system you can report crashes to, and we get all of this information out of a crash at runtime from the interpreter. So we can not just report where the crash happened and what the other frames were; we can report every single local variable that was there, we can report how much time was spent. All of that we can get from the interpreter state. This sort of flexibility is not available in most other languages.

As Python programmers we are kind of used to this, because even if we don't use it ourselves, if you write a Django application and it crashes, the Django debug page comes up and prints out all of the state that was there. The framework, Django, depends on having access to this. Same thing with Flask: if there's an exception it will print out the stack trace, and in Flask you can even execute your own code in an HTML page and figure out more about what's there. That wouldn't be possible if this wasn't exposed.

The other thing that is exposed is sys.modules, which is nice on the one hand, but it's not so nice for us. sys.modules works like this: whenever you import a module, it's cached in this special attribute called sys.modules, and it guarantees that whenever you import this module, every single other thing will see the same version of it. So
if you import the cgi module in one part of the application and import the cgi module in another part of the application, then they both will see the same one. The reason they see the same one is that the module is cached in this sys.modules attribute. The downside of this is that everybody expects it to be there. At some point people started saying: I have a class called X, it's from module foo, so I can just go to sys.modules, look up foo, and find my class in there. pickle uses this to re-import stuff, copyreg uses it to make copies of things, and people write import functions like this: an import_module function that calls __import__ with the module name and then looks the module up in sys.modules.

This is exposed interpreter state. You don't have that in many other languages, and the downside is obviously that we will never be able to change this now, because everybody expects that this is the case. There have been many attempts, for instance, to allow multiple versions of the same Python library to be loaded in the same interpreter, and it seems almost impossible to do now, because we would have to key sys.modules by a version identifier or a scope identifier or something else as well, and not just by the module name. So the upside is we can discover every single module; the downside is we can only ever have one of them at the same time.

Everybody depends on this, especially pickle, and you have to imagine that when people pickle objects, they are pickled for eternity. Zope has a database which is entirely based on pickling objects. If you change pickle now, you're going to break all the pickles which were ever created. So until pickle dies we will never have versioned modules. This is disappointing.

This next part is part of the reason why getting rid of the global interpreter lock is hard. As an aside: have you ever wondered (probably not, but I have wondered for a long time) why int is a type and a subclass of int is a
class? Am I the only one who wondered why this is the case? You can show hands. Nope. So I wondered: why is one thing a type and the other thing a class? How does that make sense? Theoretically, why does making an empty subclass of an integer give you a different kind of thing? By the way, in Python 3 this looks a little bit different. I think they both say type or class now, I'm not sure, but they still work the same. The answer turns out to be that a type is guaranteed to always live at exactly the same memory location, and a class sits somewhere on the heap.

You have to imagine it like this: the Python interpreter has a lot of internal types, like integers, floats, strings, unicode objects, lists, tuples, and so forth. One of the things the interpreter internally has to do is compare them. For instance, you had this example before where you add two numbers together. If you add two numbers together, the interpreter would like to know: are those two exactly integers? So how does it know if a number is exactly an integer? There is PyInt_Type, which is the internal representation of the integer type, and it's a global variable. One of the things about global variables is that you can only have one of them, so you can't have two PyInt_Types. So what the interpreter does is check whether the object's type is exactly the memory address of that integer type.

So what does this mean? It has a very annoying consequence. Imagine you are nginx and you would like to use Python as a scripting language. What you would like to do is have, let's say, four or five threads running, each of them with an individual Python interpreter. That is how JavaScript works: if you use a browser, each of your tabs might run an independent JavaScript engine, and these engines don't share anything. In Python we can't do that, because if you ran two Python interpreters they would see
the same PyInt_Type, and this is not a constant. It can change: there's a reference count on it, and people could patch their own things onto it if they managed to bypass the interface. I think this is the biggest reason why Python was never a language that got embedded in things: the interpreter design never internally supported this idea of having multiple interpreters running, which is pretty disappointing. I tried to fix this. I tried to go in and change all of those types to go through a level of indirection, to make it possible, for instance, to have one thread running with one completely independent interpreter and a second thread running with another completely independent interpreter. You start going down a rabbit hole of things you can't change because everybody depends on them.

So what are the consequences of all of this? My biggest one is that we can't get rid of the GIL, because we have, for instance, only one integer type and it's shared across everything. We would like to have one thread running here with one interpreter and another thread running there with another interpreter, and they wouldn't share integer types. But we also expose, on the interpreter level, reference counts and other things that are in every single C extension, and it's so ingrained in the language that at this point it's something we can't take away without changing the language. We can't change the internals because the internals have been exposed. And as mentioned before, it's really hard to do multi-version libraries now, because sys.modules is an exposed interpreter detail.

It just goes on and on, with having this really complex interpreter which shines through all the way to the highest level of the language. It's something where I understand at this point that Python as a language is just like this; this is how it works, and we have to live with this history of how the interpreter works, and we also couldn't really change that,
but it's something that, as you go and discover other languages, you maybe wouldn't want to repeat. It obviously has strong implications for the speed of the interpreter and for the concurrency you can have with it. I would say, in the current state of the language, it's not something that really stops us from doing a lot of the work that we do with it, but there is a ceiling that is hard for us to get over. Yeah, that's sort of what I have here.

As for parallelism, I mentioned we can't separate these static types because they're shared, and we also can't really become more dynamic, because the interpreter already has to cheat in certain places. For instance, Ruby can turn addition into subtraction if it wants, because there is no fast path for integer addition. But it was important for Python to stay fast despite all of this complexity, so we lost a little bit there.

Yeah, I'm sorry, I'm rambling through the slides a little bit, because I'm realizing that it's actually very hard to convey this overall message of how the interpreter works internally in a way that is easy to understand, and what the ramifications of it are. Which is why I would propose that if people have questions, feel free to go wildly off the actual topic of this presentation, because I can see that it's maybe not as clear what the point of this leakage is and how it affects people's code. Yeah, that's it.

So, basically, since you said that it's maybe hard to get the point across: what's the point, actually? That you should expose as small a public API as possible, basically? Because that's what's happening when you expose all the internals: people depend on them, and you can't take it away. That's the problem?

Yes. The correct thing to do for a language is sort of what JavaScript did in many ways, where you keep
the publicly exposed interpreter API very, very small to non-existent, because then people will never get into a position where they can start abusing it. With Python, as mentioned, we have this exposed API, and now it's hard to change it. So yeah, it's this idea of having a small public API.

Best exemplified, I guess, in the interpreter. And then again, what about the special cases, like the one you illustrated for addition? I believe there are lots of special cases in the dict implementation. What's the alternative? I mean, if you lose them, then it's slow, right?

I actually agree. I can see why Python got all of this special handling to make it faster, but it did it in a really wrong way. If you consider how a computer works nowadays, a lot of code is fast because it's cached: it has a very small footprint, it goes over the same memory all the time, and it doesn't invalidate caches. So it would be in the interest of the interpreter to keep the hot code paths as small as possible. For instance, if we went through dictionaries all the time, then we could make the dictionary code as fast as possible, and we are spending more time in dictionary code than in all of the rest of the interpreter loop.

So I think that by reducing the overall footprint, the complexity that goes into one Python object or the operations it exposes, we could actually be both more dynamic and more flexible as a result, and also faster, because it becomes easier to reason about and we have less code to execute all the time, so it has a higher chance of being cached and a higher chance of being correctly predicted. I think the irony is that we are less dynamic than Ruby, but we are probably slower per opcode than Ruby as a result of this. Because it was obviously slow, people started doing these optimizations for the common cases, but we could have made the uncommon case significantly faster by
simplifying it.

Thanks very much, that was insightful.

So that's it then. Thanks, Armin.
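The frame access described in the talk can be sketched in a few lines of Python 3. This is only an illustration, not code from the talk: the names caller_info, old_api and handler are made up here, and sys._getframe is a CPython implementation detail that other interpreters are not required to provide.

```python
import sys
import warnings


def caller_info():
    """Return (function name, line number, locals) of whoever called us."""
    frame = sys._getframe(1)  # depth 0 is this function, depth 1 is the caller
    return frame.f_code.co_name, frame.f_lineno, dict(frame.f_locals)


def old_api():
    # stacklevel=2 anchors the warning at the caller of old_api(),
    # not at this line inside old_api() itself.
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)


def handler():
    secret = 42
    return caller_info()


# The callee can see the caller's name and local variables.
name, lineno, local_vars = handler()
print(name, local_vars)  # prints: handler {'secret': 42}

# Record the warning instead of printing it, to inspect where it was anchored.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_api()
print(caught[0].category.__name__, "reported at line", caught[0].lineno)
```

The same mechanism is what lets a crash reporter or a debug page show local variables per frame, and it is also why the interpreter has to keep frame objects available and reasonably cheap.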
First, this is a very good talk and he makes a number of valid points.
The reason the talk itself was unclear, in many respects, is that the 'problem' he's talking about (exposing internals) is also a powerful advantage - allowing us to implement things in Python which would be difficult or impossible to implement from the dynamic side of other languages. He, himself, clearly uses these internals to advantage in his own code - and enjoys the flexibility that they offer. He also bemoans that exposing those interfaces makes the underlying interpreter difficult to change. These interfaces are clearly a double-edged sword, and should be treated with respect.
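To make both edges a bit more concrete, here is a minimal Python 3 sketch. The helper names find_class, is_exact_int and MyInt are invented for illustration, and the exact-type test is only a Python-level analogue of the pointer comparison the talk describes inside the C interpreter.

```python
import importlib
import json
import pickle
import sys


def find_class(module_name, class_name):
    """Resolve a class from its module name and class name via sys.modules."""
    importlib.import_module(module_name)  # makes sure the module is cached
    return getattr(sys.modules[module_name], class_name)


# The useful edge: the import cache is public and keyed by module name only,
# so every importer gets the very same module object.
print(sys.modules["json"] is json)  # prints: True
print(find_class("json", "JSONDecoder") is json.JSONDecoder)  # prints: True

# pickle stores classes by name, not by value, so loading a pickle depends
# on the name resolving to "the" one module with that name.
payload = pickle.dumps(json.JSONDecoder)
print(pickle.loads(payload) is json.JSONDecoder)  # prints: True


class MyInt(int):
    pass


def is_exact_int(obj):
    # An identity check against the single built-in int type object.
    return type(obj) is int


print(is_exact_int(1), is_exact_int(MyInt(1)))  # prints: True False
```

Because the cache has exactly one slot per module name, and because existing pickles already contain those names, loading two versions of one library side by side would break code like this, which is the "difficult to change" edge.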