Golang Rune - Fully Understanding Runes in Go

Reddit Comments

For an authoritative explanation, read this:

The Go Blog
Strings, bytes, runes and characters in Go
Rob Pike
https://blog.golang.org/strings

Rob Pike is a co-creator of the Go programming language and a co-designer of the UTF-8 encoding.

22 upvotes | u/ppetreus | Apr 26 2021

What kind of IDE are you using in this video?

1 upvote | u/helotpl | Apr 26 2021

Captions

When we write in the real world, sentences are made out of words, and words are made out of letters, or characters. When we write a computer program, not just in Go but in other languages as well, it's the same, except that sentences are now called strings, and these strings are also made out of individual characters. For example, the most basic program, the Hello World program, prints out a string made of the individual characters of "Hello World", and these characters are known as ASCII characters.

But what is an ASCII character? ASCII assigns a number from 0 to 127 to each of the common characters we use, as you can see from this table, and 127 is the largest number we can represent with seven binary bits. However, we have some problems here. The ASCII table is great, but only for Latin-alphabet languages like English, and not so much for the rest of the world, even for other European languages like French, German and many more. So people started using the numbers from 128 to 255 for extended character sets, which pushed the number of bits from seven to eight, a full computer byte. Standards that use 8-bit characters, like ISO 8859, became popular, and for the most part they were great. However, there were still problems: there are many other languages, Chinese for example, with far more than 256 characters. As a result, many other standards popped up in an attempt to solve this problem.

Fast forward to the present day: the most widely accepted encoding is none other than UTF-8. UTF-8 is a flexible encoding that allows single-byte and multi-byte characters to coexist in one single sentence, or string. Now, what does that actually mean?

Here on the side we have some gibberish that doesn't seem to make any sense, but this is the UTF-8 encoding and what it looks like under the hood. Each row shows the binary bits we use to store a character. You can see that we have actual ones and zeros, which are fixed bits, and we have the x's, which represent the wildcard spaces we can use to store the exact character we want. In UTF-8 we have four kinds of characters: one-byte, two-byte, three-byte and four-byte characters. One byte is eight bits, which is why you have eight of these for a single byte, sixteen for two bytes, and so on.

Now, how do you tell one-, two-, three- and four-byte characters apart? As you can probably already guess, you differentiate them by their leading bits. A single-byte character always starts with 0. A two-byte character starts with 110, a three-byte character with 1110, and a four-byte character with 11110. The caveat is that if you are looking at a byte in memory and it starts with 10 instead of anything else, then you know it is only a continuation byte in the middle of a two-, three- or four-byte sequence. That leaves all of these wildcard spaces to store any character we want: a total of 21 bits in the four-byte form, room for over 2 million values instead of the 256 a single byte can hold (Unicode itself stops at U+10FFFF, roughly 1.1 million code points). And this is the encoding that Go natively supports.

If you have any previous programming experience, when we only work with ASCII characters we typically use something called char to store them, and Java is one of these languages. As you can see from this example, we have the type char, we declare a variable called c to store the character 'c', and we print it out. If we compile this, it is completely fine; no error pops up. However, if we use a character that is outside the ASCII table but within UTF-8, it might or might not behave the same way. So if we take "Hello World" and use the first character of "world" in Chinese, let's see what the compiler tells us. As you can see, in this setup the compiler does not accept it.

On the contrary, if we move back to Go, in GoLand, which is an IDE, and build the same program with exactly the same instructions, we can see that it builds without any of the errors we got with Java. But how is this even possible? This is a multi-byte character, and a char, as we all know, is a small fixed-size type (one byte in C, one 16-bit UTF-16 code unit in Java). If we print out the type of c with Printf, we see that it is actually of type int32. And if we explicitly declare it with its original type, the IDE flags that as redundant, because it is already of type rune. If we go into the declaration of rune, we see that rune is an alias for int32, a four-byte type, which is the most space a single UTF-8 character can need, instead of a char or some other single-byte type.

Now you might be thinking that this implementation kind of defeats the purpose of UTF-8, right? We said that UTF-8 allows single-byte and multi-byte characters to coexist in a single string, and that is exactly what Go does with strings. If we declare a string that mixes single-byte characters like "Hello" with multi-byte characters in the same sentence, like "world" in Chinese (世界), and print it out, we can see that it indeed prints "Hello 世界". Let's also make it look a little prettier by adding a newline, and now the output ends on its own line.

Let's also print out the length of the string. When you pass a string to len, it gives you the number of bytes in that string. Look at the first six characters: "Hello" plus the space, because a space is also an ASCII character. They should be one byte each since they are ASCII characters, so their length should be six. But what about "world" in Chinese? If we print the length of the whole string, we don't get eight, six plus two, because these are not single-byte characters. If we are a bit more explicit and print the length of just the two Chinese characters, we get six, which means each of them is a three-byte character. Add those six bytes to the six ASCII bytes and we get the actual length of the string, which is 12. That means Go uses UTF-8 encoding natively as long as we are working with strings.

But when we work with individual rune characters, each one takes 4 bytes, the maximum space required for a UTF-8 character. So yes, using runes may not be the most memory-efficient option, but we can use memory-efficient strings instead of runes when storing anything more than individual characters. Also note that if you declare a character like this, even if you pass in an ASCII character that should fit in a single byte, you don't get a single-byte character by default but a rune, which is four bytes.

However, if the requirement is to be as memory-efficient as possible and you know for a fact that you are only using ASCII characters, you can use a different type in Go called byte and pass in an ASCII value, say 65, which is a capital A. If we print out the character and its type, with a newline at the end, we see the character printed and the type reported as uint8, an unsigned 8-bit integer, which is what byte is under the hood: a single-byte type.

So that was a pretty lengthy but full explanation of runes in Go. If you have learned anything at all in today's video, make sure to give it a thumbs up and consider joining the Golang Dojo by hitting that subscribe button so that we can become Golang ninjas together. With that, I'll see you in the next video.

Info

Channel: Golang Dojo

Views: 4,136

Keywords: golang, golang 2021, learn golang, go lang, golang in 2021, go language, golang tutorial, go tutorial, go programming language, golang tutorial for beginners, golang crash course, golang for beginners, golang rune, go runes, golang runes, go rune, golang characters, golang utf8, golang utf-8, runes in go, golang rune values, golang rune explained, Fully Understanding Runes in Go, runes in golang, runes, understanding runes in go, understand golang runes, runes for golang

Id: 7isCXLWPTqI

Length: 10min 51sec (651 seconds)

Published: Wed Apr 21 2021