String Literals in C++

Video Statistics and Information

Captions
Hey, what's up guys, my name is The Cherno, and welcome back to my C++ series. Today I'm going to be talking all about string literals. This kind of extends on from strings, which we talked about last episode; if you guys haven't seen that, click on the card or the link in the description below. This is just going to be more of an in-depth look at what we covered in that video.

A string literal is basically a series of characters in between two double quotes. So if we jump in here, I can define a string literal by writing double quotes and then something in between them, such as "Cherno". There we go, that's a string literal. Now, what this actually becomes depends on a number of factors. In the very basic case, what this actually is, if you hover your mouse over it, is a const char array of size seven. Straight away you may notice that there are actually only six characters here, so why is it a const char array of seven? The reason is that there is one extra character at the very end of that string, and that is a null termination character, which, if we were to write it manually, would look like a backslash zero, or alternatively it can just be written as an actual zero. The reason this is needed is to signal the end of the string. It's a numeric zero, not the character '0'; if we write the character '0', that actually has a different numeric value altogether. If you write a backslash zero, that is the null character, an actual numerical zero, and that's there to signal the end of the string. So if we want to do something like put that backslash zero in the middle of our string, we would actually break the behavior of this string in many cases.

Let's take a look at the standard C library for a bit. I'll just include stdlib.h, which includes some string functions. If I assign this to something, as we hover our mouse over this you can see that it's a const char array, so I might just assign it to that, const char name, and put it over here. If I put a breakpoint
over here and hit F5, I can inspect my memory. I'll put in the name of the string, which is name, and I'm going to make this have like 16 columns. Okay, so you can see that we have Cherno printing over here, and we have two dots here in the ASCII representation, which represent the two zeros. Now, the length of this string, if you actually count the characters, is going to be 7, because the backslash 0 is an escape sequence, meaning it just counts as one character. However, this is set to 8, because we have an implicit backslash 0 at the very end here, which signals the end of our string.

So if I want to actually see what my string is by running strlen, which is a C function that will basically tell me how long my C string is, I'll pass in name as my string and we'll see what value is printed. You can see that we actually get the value 3 printed here; however, "Cherno" is way more than 3 characters. The reason for that is that it only counts the characters up until that backslash 0, because as soon as it runs into a 0 it thinks that's the end of the string. If we remove that 0 and rerun this code, of course we'll get six, which is the length of the string, and not eight, which is what the array size happens to be right now.

So at its core type, this is a const char array. However, we can also assign it to a const char pointer; that's totally fine. The const here promises that you won't be manipulating the string, so I won't be able to do something like name[2] = 'a', because it's marked as const. Now if I removed const, this appears to be possible, and I actually accidentally said that it was in the previous strings video. However, it's not. Well, it might be: it's something called undefined behavior, which basically means that the C++ standard doesn't define what should happen in this case. So some compilers may generate valid code for this, but you can't rely on that. So basically this is banned. Other
compilers won't even let you compile the code. MSVC, which is Microsoft's Visual Studio compiler, which is what I'm using right now, compiles this with no problems at all; however, compilers such as Clang only allow const. The reason that this is undefined and not allowed is that what you've actually done here is taken a pointer to the memory location of that string literal, and string literals are stored in a read-only section of memory.

Let's talk about that for a minute. Let's jump back into our code. I'm going to open up my compiler settings here, make sure that I'm on All Configurations, go to C/C++ and Output Files, and make sure that Assembler Output is set to Assembly With Source Code. I'll hit OK, and I'll switch to Release mode over here just to simplify the assembly output, and I will build my project. Next I'm going to go to the output directory for this, so we have Release and we have Main.asm. There's our file, Main.asm; I'll drag this into Visual Studio so that we can take a look at the generated assembly. Over here you can see a section called CONST SEGMENT, and we have this Cherno listed here. What you see here is actually the identifier, so that the compiler can refer to this Cherno string, but the data is set over here to Cherno, and there's the CONST ENDS. So basically this string is stored in a const section in our binary.

If you open up the exe file that you get from this... I'll just change this code to do something a little bit more useful, so I'll get it to print out Cherno to the console; that way we're actually using this string, so when we build this in release mode the compiler won't optimize away our Cherno string. Now if I find the exe file and open it in something like HxD, which is just a hex editor, you'll see that we literally have Cherno defined here inside our binary. Those characters are embedded into our binary, and when we reference this it's actually referring to a constant data
segment that we are not allowed to edit. If you do try and edit it like that, even though it'll compile just fine, if you run code like this in release mode, if I hit F5, you'll see that even though we tried to edit this, it actually didn't work: the third character is still 'e', not 'a'. Now if we were to run the same code in debug mode, as soon as we try to actually execute this code we would get an exception thrown, because you can't actually do that; we're trying to write to read-only memory. If you did want to modify this for some reason, you can; you just need to define the type as an array instead of a pointer. Now if we run this code, we can hit F10 and you can see it works just fine. If we look at our output here, we have Cherno with an 'a'. Please never spell it like that, though.

So to sum this up, you cannot write code like this; it's undefined behavior and you should never do it. Other compilers will most likely warn you about this, or even just throw an error and prevent you from compiling code like this, because you shouldn't be doing this. From C++11 onwards, some compilers such as Clang will actually only let you compile a const char pointer; if you want to compile a char pointer from a string literal such as this, you will actually have to cast it into a char pointer manually. However, MSVC doesn't seem to care; it has always been fine with it. So basically, if you declare code like this, you should really always be declaring it const, just to remind yourself that you can't actually write code like this.

Okay, some more fun facts about characters. We have char, of course, a one-byte type. However, there is also something called wchar_t, which is called a wide character. Let's just go over these types really quickly. So we have a wide character pointer; I'll call it name2. If I set this equal to "Cherno", we'll get an error, because it actually needs a capital L appended to the front of it. This signifies that the following string literal is made up of wide characters. C++11 also
introduced a number of other types, such as char16_t, which again you will need to set equal to a lowercase u and then, in quotes, your text. Don't forget the pointer. And then we also have a char32_t, name4, which has an uppercase U and your text. You can also define the normal const char one with a u8 prefix like this, if you really want to enforce that. And there are compiler settings which control whether a char or a wchar is used. There are a lot of things we could talk about; again, I don't want to go into this for too long, so I'll keep this brief. But basically, a char is of course a one byte per character thing, a char16_t is two bytes per character, a 16 bits per character string, and then we have char32_t, which is 32 bits per character, or 4 bytes per character. This one is basically meant to adhere with UTF-32, this one is meant to adhere with UTF-16, and then we have UTF-8, which is const char.

Now the question is, what is the difference between wchar_t and char16_t, because they appear to both be two bytes per character? Now, I keep saying two bytes per character for wchar_t; however, that's actually up to the compiler to decide. It might be one byte, it might be two bytes, it might be four bytes. In practice I've never seen it be one byte before; it's usually either 2 or 4. It is 2 on Windows and 4 on Linux, and I expect Mac as well, so it is a little bit variable. If you definitely want a two-byte string, you can use char16_t, which is always going to be 16 bits, or two bytes.

Speaking about weird things to prepend to strings, such as u and L, you can actually also append things to strings. There is something in C++14 called std::string_literals, which gives us a number of functions just for convenience. In the previous video about strings I wrote code such as this: we have a std::string name0 equals "Cherno", and I said that if you wanted to append some other string onto this one then you actually couldn't do that, because these are string literals, of course,
which as you can see are arrays or pointers, so we can't just add them together. My solution was to surround one with a constructor to basically make it a string. However, since C++14 there is something inside the string_literals library which actually kind of makes that a little bit easier, maybe, depending on how you look at it. You can actually just add the letter s to the end of your string. What this does is, it's basically a function, and if you hover your mouse over it you can see it's an operator function that actually returns a standard string. Now similarly to this, if you were to put u8 at the front, that's fine; if you put L at the front then you get a wide string, which means that this becomes a wstring, and this also has to be a wide string. You can also do u and uppercase U to be able to do things like u32string, for various character lengths. So yeah, confused about strings yet? Aren't we all.

One other thing that we can actually prepend to string literals is the letter R. So I can write a const char pointer here, and I'll start this off with the letter R at the front. What this means is to ignore escape characters. In practice, what it's useful for, and we actually have to have these parentheses in here, is multi-line strings. So if I wanted to have something like line 1, line 2, line 3, line 4, it makes life a little bit easier, because without this we would have to either do something like this, where we actually append all this stuff together, or we could also do const char* ex equals "Line1" and then just simply write "Line2", "Line3". You can see that these don't actually have pluses or anything in between them. Additionally, we would have to put a backslash n onto each of these if we wanted them to actually be on new lines. So this is very common if you want to just write a full paragraph of text, or maybe some code in your code as a string, and you want to be able to define it fairly easily; and this is a lot more work than just being able to
write your text freely like this. So R is quite useful after all.

Now the last thing that I wanted to mention was again about the memory of string literals and how that works. String literals are always stored in read-only memory, right, always. Say we write something like char name with an array and we set it equal to "Cherno", and then we decide to change something like I did earlier, and we'll print that out to the console. If I actually take a look at the code that this generates: I'll compile it in release mode, I'll open up my directory here and I'll drag this assembly file into Visual Studio. We'll search for Cherno, and you can see that Cherno is still defined in the const segment; however, we're obviously editing it, so how is that working?

Let's scroll down. Okay, great, here is our function. Let's take a look at the code and look at what is happening. We are loading a memory address over here, which is the location of the name variable. Now if you look a little bit up here, you'll see we have name, which has an offset of minus 12; it's basically a variable that is declared on the stack. Again, this is getting a little bit complicated, so we'll have videos with details on this in the future, but for now, this is our name variable. Basically this address is being loaded into the edx register, and we are then moving the Cherno; this is the location of that Cherno string literal in our read-only memory. We're putting that into the eax register. Again, it's a bit messy because the compiler is trying to do optimizations here; this is release mode, so it might be a little bit harder to read this code. And then we're loading eax into that name variable. So what we've actually done here is basically gotten that Cherno segment and copied it into that name variable, so we've actually created an actual variable here. Before, when we didn't write this code, what we were trying to do was modify through the pointer that points to that constant data segment, so we were actually trying to write into the constant data. Here it creates
another variable, and you can see on this line here we move the numeric value 97 into that name variable at an offset of two; that's what this line of code here is doing. 97 is the decimal representation of the character lowercase 'a'.

So that's it. That got pretty deep. I hope you understand now how char pointers and all that stuff work, and string literals in general. It's really important to understand this stuff, because you're probably going to be dealing with strings for the rest of your programming career. As always, I hope you guys enjoyed this video. If you did, you can hit that like button just so that I know that you liked my video. You can also support this series on Patreon if you want to. You get some cool rewards, such as being able to contribute to the planning of these videos, as well as getting these videos early sometimes, and all that fun stuff. I'll see you guys later. Goodbye. [Music]
Info
Channel: The Cherno
Views: 114,954
Keywords: thecherno, thechernoproject, cherno, c++, programming, gamedev, game development, learn c++, c++ tutorial, string literal, c++ string literal, strings, string, char, array, wchar, wstring, unicode, text
Id: FeHZHF0f2dw
Length: 14min 6sec (846 seconds)
Published: Wed Aug 23 2017