Split C strings into tokens with strtok.

Captions
Hey folks, today I want to talk about how to make string parsing easier in C using strtok. We'll talk about some of the things that are great about it, some of the things that are not so great about it, and some of the things you've got to watch out for.

Strings in C are simple. If you're coming from another language, note that they're not objects and they don't have methods: a string in C is just an array of characters with a null character at the end, so that you know where it ends. That's it. The most common functions for manipulating strings are in string.h, and strtok is one of them that a lot of my students don't realize is there.

strtok tokenizes strings. That means it takes a string and splits it into individual chunks using delimiters that you provide. This is a common enough task that every language has some variation of it: Java has the StringTokenizer class, Python and Ruby have a split method on every string, and C has strtok. The first time you call it, you give it the string you want to tokenize and a string containing the delimiters you want. Each call to strtok returns one token until there are no more, and then it returns NULL.

So you can see what happens, let's look at a simple example program, just for illustration purposes. I have an input string that I want to tokenize, but I'm using a delimiter that doesn't show up in the string at all. If I compile it and run it, you'll see that it just returns the same string. I'm also printing out the address of the string so that you can see, as we go, that strtok is not copying the string; it's just returning the string I gave it in the first place.

For illustration, we'll leave that example right there, but now let's change up the delimiter and use a comma. Now you can see that the first call still returns the same string, and the second one returns just the first token: it went until it found a comma and then returned "Ray".

If I want to tokenize the whole string, I simply put it into a loop, but instead of giving it the original string again, I pass in NULL. Passing NULL is basically saying, "remember where you left off last time and keep going until you see another comma." The loop keeps going until strtok returns NULL. Now if I run it, you can see that each time it returns an individual token.

Notice that two of my elements were actually separated by a colon rather than a comma. Sometimes you end up with text you have to parse that has weird structure like this, so I can simply add a colon to my delimiter string. That's just saying, "I want you to treat colons as delimiters too." Now we have all the names parsed out, and that's cool.

One thing I want to point out is these addresses. The first call returns a pointer to where the first token starts, and each of the other tokens is also a pointer into the original string. One thing to keep in mind is that strtok is destructive: it actually changes the string. To give you a sense of how this works, if at the end I print out the original string, called input, you'll notice that it only prints "Ray". That's all that's left.

We can see what's really happening in the debugger. Let's break on line 26 and run to that point, then examine memory, looking at all 64 bytes of input. In the output you'll notice that the bytes for "Ray" are here, and strtok inserted a null character where the delimiter had been. It then returned a pointer to the next token, which starts right here and runs to here, where it added another null character. So it's going through the original string, breaking it up by sticking null characters into it, and giving us back pointers into the middle of that string.

So, a note: if you're going to use strtok and you need the original string to stay intact, make a copy of it before you start calling strtok. That's one important thing.

The other thing is that strtok stores where you left off in a static variable, which means strtok is not thread safe. If you have a multithreaded program and more than one thread is going to call strtok, this could be really bad. I won't give you an example here, just because of time, but if you look at the documentation for strtok, you'll notice there's strtok and there's strtok_r. The _r means reentrant, and that version takes an additional pointer that keeps track of where it left off. You're basically saying, "here's where I want you to store the last location," and that allows it to work in multiple threads without things getting corrupted and blowing up on you.

So that's strtok. I hope that's useful for you. If you think about what it's doing, it's also fairly simple: you could easily write your own version if you wanted it to behave slightly differently, for example so that it makes copies or isn't destructive. I won't do that here; I'll leave that as an exercise for the viewer. It's all open source, and you can find a bunch of different implementations out there. I just want you to know that it's available, so if you're in a hurry and need to break up some strings, you don't have to do it yourself. strtok is super useful. It does have some problems, and hopefully I've made those clear, but have fun with it, play around with it, put it to good use, and I hope you do something great with it.
Info
Channel: Jacob Sorber
Views: 27,557
Rating: 4.9409838 out of 5
Keywords: strtok, strtok in c, strtok_r, strtok in c programming, strtok explained, split strings, split strings with strtok, tokenizing strings, splitting c strings, programming tutorial, how to split strings, how to strtok, tokenizing with strtok, break up strings, how to break up strings, parse strings, split string, c programming, string tokenizer, c programming tutorial, c programming language
Id: a8l8PwCzw20
Length: 8min 26sec (506 seconds)
Published: Fri Apr 21 2017