Advanced JSON Handling in Go

Hi, I'm Jonathan Hall. In this video I'm going to talk about some advanced techniques for JSON handling in Go. This is material I originally prepared for a meetup where I presented a few months ago, but the recording from that day wasn't very good quality, so I decided to re-record it here. I also go into greater detail on these and related topics on my blog and in a book I'm writing called Data Serialization in Go. The book is available for early purchase on Leanpub. You can read a free sample, and if you're interested you can of course buy the book; as new chapters are added, you get the additional material at no extra cost.

Let's get started. In this video I will talk very briefly about what JSON is and about basic usage of maps and structs in Go when dealing with JSON. Then I'll talk about handling inputs of unknown types (I'll give some examples of that in a moment) and handling data with unknown fields. These are based on questions I see frequently on Stack Overflow and in other places, so I hope you'll find at least some of this valuable.

I expect most of you are familiar with what JSON is. It's JavaScript Object Notation, as defined in RFC 8259: basically a human-readable, textual representation of arbitrary data. Compared to Go's type system it's very limited, with just five basic types plus null, but it is used for many, many things, from configuration files to REST APIs; the uses are endless. There are of course many alternatives to JSON, and most, if not all, of the techniques I describe in this video also apply to others like YAML, TOML, MessagePack, and possibly even XML and protobuf. But I like to use JSON as a baseline because it's so common and practically everybody deals with it at some point.

So, to get started: creating JSON from a Go object, in this example a map of string to string, is pretty straightforward. You can pass more or less any Go value into the json.Marshal function, which then produces valid JSON.

One note: throughout this talk I often skip error checking. That's obviously not a good idea; in production, never do that unless you have a really good reason. But to keep the demonstrations in this video short, I often skip it, so just be aware of that.

This simple example program takes a Go map of string to string with the key foo and the value bar and converts it to JSON, which gives the expected result: foo maps to bar.

Of course, in Go we often don't use just maps; we use structs, and that's where JSON usually gets a little more interesting, and easier to work with as well. Here's a really simple example: I have a Person struct that contains Name, Age, and Description fields, and using JSON tags I can tell the marshaller and the unmarshaller the JSON names of the respective fields, so they don't have to match the Go names exactly. I also have a fourth field here called secret, and as you can see it is unexported, because it does not begin with a capital letter.

This is a common mistake. I probably see two to four questions about it on Stack Overflow each week from people who are new to JSON handling in Go, so it throws a lot of people off. Let me address it here. An unexported struct field cannot be accessed by any package outside the one that defines the struct, and that includes the encoding/json package in the standard library. That's why unexported fields are excluded both when marshalling and when unmarshalling JSON in Go. While this is often seen as a limitation or a hassle, I actually see it as a benefit, because it gives us some flexibility: we can define structs with unexported fields that don't get marshalled into JSON when that's what we want, and in some cases it's also useful when unmarshalling JSON, to exclude some data. We'll talk about both of these in a little bit.
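As a minimal sketch of the basics just described (my own code, not the code shown in the video), the program below marshals a map and a Person value, then unmarshals JSON back into a Person. The field set follows the video; the `omitempty` option on description, the helper names `marshalPerson` and `unmarshalPerson`, and the sample values are assumptions. Unlike the demos, it checks errors.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person follows the struct from the video. The json tags set the key names;
// secret is unexported, so encoding/json can neither read nor write it.
// omitempty on description is an assumption, not taken from the video.
type Person struct {
	Name        string `json:"name"`
	Age         int    `json:"age"`
	Description string `json:"description,omitempty"`
	secret      string
}

// marshalPerson is a hypothetical helper that encodes p as a JSON string.
func marshalPerson(p Person) (string, error) {
	b, err := json.Marshal(p)
	return string(b), err
}

// unmarshalPerson is a hypothetical helper; a "secret" key in the input is
// ignored because the matching struct field is unexported.
func unmarshalPerson(data string) (Person, error) {
	var p Person
	err := json.Unmarshal([]byte(data), &p)
	return p, err
}

func main() {
	// A map[string]string becomes a JSON object.
	m, err := json.Marshal(map[string]string{"foo": "bar"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(m)) // {"foo":"bar"}

	out, err := marshalPerson(Person{Name: "Bob", Age: 32, secret: "likes pineapple pizza"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"name":"Bob","age":32}

	p, err := unmarshalPerson(`{"name":"Bob","age":32,"secret":"likes pineapple pizza"}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p) // {Name:Bob Age:32 Description: secret:}
}
```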
I just want you to be aware of this and consider it as you go forward.

Now that we have our struct defined, here's an example of using it: we define a Person whose name is Bob, who is 32 years old, and who has a secret. If we marshal this, we get the expected JSON output: name is Bob, age is 32, and, the interesting part, the secret is excluded.

Unmarshalling can often be a little trickier, just because it takes a little more forethought to map arbitrary JSON data to a Go struct. But here's a really simple example: the same JSON we saw a moment ago being converted back into Go, and we get a map of string to interface.

You should often avoid unmarshalling JSON into a plain map, whether a map of string to string or of string to interface, because those types are cumbersome to work with. They're necessary sometimes, but whenever it's possible and makes sense, you should use a struct instead, with defined data types in a defined structure. Here's an example using the same Person struct we had a moment ago, this time converting JSON into a Go struct. Our sample JSON input is as before: name Bob, age 32, and it contains a secret. When I execute this code, we get, as we might expect, a Go struct with a name of Bob and an age of 32, and, as described a moment ago, secret is blank. Even though it's included in the JSON, the field is unexported, so the unmarshaller cannot access it and leaves it alone.

So structs are nice and maps are useful, but what happens when you don't even know the data type before you unmarshal it? Let's talk about some examples of that.

One common example that I've dealt with many times is when you get either a number or a string representing a number. Some APIs are very inconsistent: they may return one or the other depending on the phase of the moon. Maybe it changes between software versions, maybe it depends on exactly which options you send to the API.

Another common pattern I see is an API that returns either an object or an array of objects: rather than returning an array with a single object when there's only one, it returns that object alone. I've seen this fairly commonly in more dynamically typed languages like JavaScript, Perl, or Python, where it's simple to deal with, but it's a little more complicated in Go. I'll talk about how to work around that.

The other case I'll talk about is when you get completely different object types; maybe an API returns one type of object for success and a different type of object for failure. I'll go into more detail on that shortly.

So let's start with the literal number or literal string option. This is an example of something I actually dealt with not long ago: we had an API that was returning a literal number, and then, I guess through a change or upgrade in the service, it started returning a quoted string instead of a plain number, and that broke our code. How can you handle both without having to change your code all the time?

In this example I've created a custom type called IntOrString (please use more descriptive names in your own code, but for the example it's fine), and on that type I've defined a custom unmarshaller, which you do by defining a method called UnmarshalJSON. Inside that method I simply unmarshal into an integer called v, and then I assign that integer to the receiver, which is called i. The magic part is that I remove any leading or trailing quote marks before doing that unmarshal, and in a simple example like this, that's sufficient. Suppose I receive the quoted string "123": by removing the quote marks I'm left with just 123, and I can unmarshal it as an integer normally. If I get a plain integer, 123 without quote marks, the trim does nothing and I still just unmarshal 123.
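A sketch of IntOrString along the lines just described, assuming the underlying type is int and using bytes.Trim to strip surrounding quote marks; `decodeInts` is a hypothetical demo helper. For comparison, the standard library's `,string` struct tag option only accepts the quoted form and rejects a bare number, which is why a custom UnmarshalJSON is needed when both forms can appear.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// IntOrString accepts either a JSON number (123) or a quoted number ("123").
type IntOrString int

// UnmarshalJSON strips any leading or trailing quote marks and decodes the
// rest as an int. A non-numeric string such as "hello world" still fails,
// because hello world is not valid JSON once the quotes are gone.
func (i *IntOrString) UnmarshalJSON(data []byte) error {
	var v int
	if err := json.Unmarshal(bytes.Trim(data, `"`), &v); err != nil {
		return err
	}
	*i = IntOrString(v)
	return nil
}

// decodeInts is a hypothetical helper for the demo below.
func decodeInts(s string) ([]IntOrString, error) {
	var out []IntOrString
	err := json.Unmarshal([]byte(s), &out)
	return out, err
}

func main() {
	vals, err := decodeInts(`[123, "321"]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(vals) // [123 321]

	_, err = decodeInts(`["hello world"]`)
	fmt.Println("error:", err != nil) // error: true
}
```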
Now suppose I received a string that was not a number, say "hello world". I would remove the quotes and try to unmarshal what's left into the integer; that would produce an error, and I would return that error.

Let's watch this in action. I've created a JSON input that contains an array of two elements: the first is a literal number, the second a quoted number. Our Go result is a slice of IntOrString, with the first value 123 and the second 321.

On to the next example, this time either a single value or an array of values. This is common in some APIs: maybe you're querying a database or something similar, and if you get a single result you get one object, while for multiple results you get an array of objects. To simplify this example I'm going to use strings rather than objects, but the principles are the same, and of course you can update the code to use objects if that's what you need.

In my example I have a custom type called SliceOrString whose underlying type is a slice of string, and then my custom UnmarshalJSON method. The first thing this method does is check whether the first character of the input data is a quote mark. If it is, I know I'm dealing with a string and can act accordingly: I unmarshal into a temporary variable called v of type string, set my receiver to a slice containing that single string, and return any error I've received. If the first character is not a quote mark, then presumably I have an array, in which case I unmarshal into a temporary slice variable, set the receiver to that value, and once again return any error. So if I were to receive an object or a number instead, the unmarshal into the slice would fail and I would return that error.
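Here is a sketch of SliceOrString following the first-byte check just described. The whitespace trim is a defensive addition of mine, and `decodeSlices` and the sample inputs are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// SliceOrString accepts either "a" or ["a", "b"] and always yields a slice.
type SliceOrString []string

func (s *SliceOrString) UnmarshalJSON(data []byte) error {
	// Defensive: skip any leading whitespace before looking at the first byte.
	data = bytes.TrimLeft(data, " \t\r\n")
	if len(data) > 0 && data[0] == '"' {
		// A single string: wrap it in a one-element slice.
		var v string
		if err := json.Unmarshal(data, &v); err != nil {
			return err
		}
		*s = SliceOrString{v}
		return nil
	}
	// Otherwise assume an array; an object or a number makes this fail.
	var v []string
	if err := json.Unmarshal(data, &v); err != nil {
		return err
	}
	*s = SliceOrString(v)
	return nil
}

// decodeSlices is a hypothetical helper for the demo below.
func decodeSlices(in string) ([]SliceOrString, error) {
	var out []SliceOrString
	err := json.Unmarshal([]byte(in), &out)
	return out, err
}

func main() {
	vals, err := decodeSlices(`["foo", ["bar", "baz"]]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(vals) // [[foo] [bar baz]]

	_, err = decodeSlices(`[42]`)
	fmt.Println("error:", err != nil) // error: true
}
```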
Let's look at an example. My test input is an array of two elements: the first is just a string, the second an array of two strings. As expected, my Go result is a slice of SliceOrString with a single element for the first input and two elements for the second.

What happens when you don't know the data type at all? Let's use the example of an API that returns one type of object for success and another type for failure, and for the sake of argument we'll assume you cannot tell which is which from, say, the HTTP response code; you only know by reading the JSON result. So I've created a Success type and an Error type, and my goal, of course, is to unmarshal success objects into Success and error objects into Error. With these two types defined, I create a third type, Response, that embeds both of them; it's a sort of generic wrapper, a container type if you will.

In this case I don't actually need a custom UnmarshalJSON method, because the Success and Error structs have no common fields, so there's no ambiguity when the JSON unmarshaller unmarshals data into the embedded structs. Let's look at an example: our input is an array of two values, the first a success object and the second a failure object. Our result for the success case contains a Success value with populated fields and an empty Error value; you can see both error and reason are empty strings. For the error case, the Success value is empty and the Error is populated.

But what happens when we have conflicting or overlapping fields? Let's look at an example. Here our API returns a status key for both success and failure, so we update our Success and Error types to include Status; you can see it's duplicated between Success and Error.
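To see why the duplicated status matters, here is a small experiment of my own with hypothetical field names. Embedding works as described when the two structs share no keys. With a tagged Status field in both, encoding/json's rule for conflicting fields at the same embedding depth makes it ignore both fields, so the status is dropped without any error.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Success and Error share no JSON keys, so embedding both in Response lets
// encoding/json route each key to the right struct with no custom code.
// Field names here are hypothetical.
type Success struct {
	Data string `json:"data"`
}

type Error struct {
	Message string `json:"error"`
	Reason  string `json:"reason"`
}

type Response struct {
	Success
	Error
}

// Both of these declare a tagged "status" field at the same embedding depth.
// encoding/json treats such duplicates as ambiguous and ignores all of them:
// there is no error, the status value is simply dropped.
type StatusSuccess struct {
	Status string `json:"status"`
	Data   string `json:"data"`
}

type StatusError struct {
	Status string `json:"status"`
	Reason string `json:"reason"`
}

type StatusResponse struct {
	StatusSuccess
	StatusError
}

// decodeEmbedded and decodeWithStatus are hypothetical demo helpers.
func decodeEmbedded(in string) ([]Response, error) {
	var out []Response
	err := json.Unmarshal([]byte(in), &out)
	return out, err
}

func decodeWithStatus(in string) ([]StatusResponse, error) {
	var out []StatusResponse
	err := json.Unmarshal([]byte(in), &out)
	return out, err
}

func main() {
	rs, err := decodeEmbedded(`[{"data":"hello"},{"error":"not found","reason":"no such id"}]`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rs)

	srs, err := decodeWithStatus(`[{"status":"ok","data":"hello"}]`)
	if err != nil {
		panic(err)
	}
	// Both Status fields stay empty even though the input had a status.
	fmt.Printf("%q %q\n", srs[0].StatusSuccess.Status, srs[0].StatusError.Status) // "" ""
}
```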
But now it's not sufficient to just embed them, because the JSON unmarshaller won't know where the status field goes, so we have to create a custom UnmarshalJSON method. The first thing this method does is try to unmarshal into Success. Of course, if that returns an error, we return that error as well. But checking for an error isn't enough; we also have to check the value of status, because an error response will still unmarshal into Success without complaint. It just won't have the error reason, which would simply be blank. So after unmarshalling into Success, we check its status value. If it's "ok", as we expect for success, we're done and return nil. If we get any other value, we assume it's an error: we set response.Success back to its zero value and then unmarshal into Error.

Let's look at our demonstration. Once again we have a success case and a failure case, both with a status, but with different values. As before, in the success case we have a Success value with status ok and its other values set, and Error is empty. For the error case we have an empty Success value, and the Error field is set, with its own status.

But the Response type I've been using here is maybe a little cumbersome: you always have to check whether the Success or the Error value within that response is the populated one. Maybe you don't want two values; you want just a single one that you can deal with directly. So let's talk about a way to use a different container type to solve that problem. Here I'm still using the same Success and Error structs as before; I'm only changing the Response struct. Rather than embedding both Success and Error, I've created a Result field that's an empty interface.

Now let's look at the custom UnmarshalJSON method. Here I create a value of type Success, again the one I defined a moment ago, and unmarshal into it, same as before. Also as before, I check whether the status is ok; if it is, I set Result to the success value and return nil. If the status is anything other than ok, we continue: we define a value called fail of type Error, unmarshal into that, set Result to fail, and return nil. Let's look at an example. We have the same input as before, but now our output has only one result value for each input, rather than one populated and one empty: for success you see the Success case, and for error the Error case.

Let's go a little further with container options. In this example I'm defining my UnmarshalJSON method on a slice of the empty interface rather than on each individual object. The way this works is that I start by unmarshalling the JSON into a slice of json.RawMessage. As we can see from the Go doc for the encoding/json package, json.RawMessage is just a byte slice, but it has custom marshal and unmarshal methods that simply pass the data through unaltered. That's useful here because all I care about at this point is chopping my JSON into individual elements rather than parsing the entire thing, so that I can process each element one at a time.

So I unmarshal into my slice of json.RawMessage, then define a success and a fail value, similar to before. I preallocate a result slice of the same length as my input and loop through that input. For each value in my raw input slice, I unmarshal into the Success struct; if the status value is ok, I assign that success value to the result and continue. If it's anything other than ok, I unmarshal into fail and set that value in the result. At the end I assign the result to the receiver and return nil. So it's essentially the same logic as before, but in a loop this time. Let's look at the example: our output is pretty much exactly what we would expect, almost identical to before. For the success case we have one Success object, and for the failure case one Error object.
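A sketch of the slice-level container just described, combining the json.RawMessage split with the status check. The "ok" status value, the data and reason fields, and `decodeResults` are assumptions. One deliberate difference: I declare success and fail inside the loop, because json.Unmarshal leaves struct fields that are absent from the input untouched, so values reused across iterations could carry fields over from an earlier element.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Both shapes carry a "status" key; the other fields are hypothetical.
type Success struct {
	Status string `json:"status"`
	Data   string `json:"data"`
}

type Error struct {
	Status string `json:"status"`
	Reason string `json:"reason"`
}

// Results holds one Success or Error per element of a JSON array.
type Results []interface{}

func (r *Results) UnmarshalJSON(data []byte) error {
	// Split the array into raw, still-undecoded elements.
	var raw []json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	result := make(Results, len(raw))
	for i, v := range raw {
		// Fresh values per element, so no fields carry over between elements.
		var success Success
		if err := json.Unmarshal(v, &success); err != nil {
			return err
		}
		if success.Status == "ok" {
			result[i] = success
			continue
		}
		var fail Error
		if err := json.Unmarshal(v, &fail); err != nil {
			return err
		}
		result[i] = fail
	}
	*r = result
	return nil
}

// decodeResults is a hypothetical demo helper.
func decodeResults(in string) (Results, error) {
	var r Results
	err := json.Unmarshal([]byte(in), &r)
	return r, err
}

func main() {
	r, err := decodeResults(`[{"status":"ok","data":"hello"},{"status":"error","reason":"boom"}]`)
	if err != nil {
		panic(err)
	}
	for _, v := range r {
		fmt.Printf("%T %+v\n", v, v)
	}
}
```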
Of course, it can get much more confusing. Suppose that rather than just success or failure we have three, or four, or a thousand different object types. For my example I have three: a person, an animal, or an address. Other than the type field, everything else about each object is arbitrary or completely different.

So I've defined my three core types, Person, Animal, and Address, and then I create my container, Responses, which is a slice of the empty interface. My UnmarshalJSON method goes on that container type, and as before I create a temporary slice of json.RawMessage and unmarshal into that. Here I've created a variable called header, whose type is an anonymous struct containing just a single field called Type. As before, I define my result slice, preallocated to the same length as my input.

My loop, of course, is a little more complicated now. For each element in my input slice of json.RawMessage, I unmarshal first into this header, and then I check the Type field of that header. If the type is person, I unmarshal into Person; if it's animal, into Animal; if it's address, into Address. In each case I assign that value to that particular index in the result. Finally, as before, I assign the result to the receiver and return nil.

Let's look at an example. Our input here has three objects of three different types, and the output is a slice of the empty interface: the first element is of type Person, the second Animal, and the third Address, as you would expect.
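The type-dispatch version might look like the sketch below. The lowercase type names ("person", "animal", "address"), the payload fields, and `decodeTyped` are assumptions, and returning an error for an unrecognized type is my own choice; the video doesn't say what happens then. header.Type is reset on each pass because json.Unmarshal leaves a field unchanged when its key is missing from the input.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical payload types; only the "type" key is common to all three.
type Person struct{ Name string `json:"name"` }
type Animal struct{ Species string `json:"species"` }
type Address struct{ Street string `json:"street"` }

// Responses holds a Person, Animal, or Address per element of a JSON array.
type Responses []interface{}

func (r *Responses) UnmarshalJSON(data []byte) error {
	var raw []json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	// header reads only the discriminating "type" key.
	var header struct {
		Type string `json:"type"`
	}
	result := make(Responses, len(raw))
	for i, v := range raw {
		header.Type = "" // a missing "type" key would otherwise keep the old value
		if err := json.Unmarshal(v, &header); err != nil {
			return err
		}
		var err error
		switch header.Type {
		case "person":
			var p Person
			err = json.Unmarshal(v, &p)
			result[i] = p
		case "animal":
			var a Animal
			err = json.Unmarshal(v, &a)
			result[i] = a
		case "address":
			var a Address
			err = json.Unmarshal(v, &a)
			result[i] = a
		default:
			err = fmt.Errorf("element %d: unknown type %q", i, header.Type)
		}
		if err != nil {
			return err
		}
	}
	*r = result
	return nil
}

// decodeTyped is a hypothetical demo helper.
func decodeTyped(in string) (Responses, error) {
	var r Responses
	err := json.Unmarshal([]byte(in), &r)
	return r, err
}

func main() {
	r, err := decodeTyped(`[{"type":"person","name":"Bob"},{"type":"animal","species":"cat"},{"type":"address","street":"Main St"}]`)
	if err != nil {
		panic(err)
	}
	for _, v := range r {
		fmt.Printf("%T %+v\n", v, v)
	}
}
```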
Now, while this last example works fairly well and is perfectly good for small bits of JSON, it's not particularly efficient, because it reads the entire JSON array into memory when it parses it into the slice of json.RawMessage, and then goes through each of those elements individually and parses it into the final result. So we have a completely extraneous copy of the entire array in memory that we don't need. That's especially wasteful if you get an error early in processing: you've read the entire thing into memory, you hit an error, and then you throw the whole thing away. But even in the success case, it would be much more efficient to read each element individually rather than using up all that memory. So let's look at a way to accomplish that.

For this example I'm using exactly the same configuration as before: I still have Person, Animal, and Address types, and a Responses type that is a slice of the empty interface. The only thing I'm changing is my implementation of UnmarshalJSON; the functionality should be the same, just hopefully more efficient.

My new implementation of UnmarshalJSON takes advantage of json.Decoder, which takes an io.Reader as input and lets you interact with it through its tokenizer interface. I define my decoder here, reading the input through bytes.NewReader. The first thing I want to do is consume the first token, which for this example I'm assuming is correctly an opening square bracket. To be more robust, I would actually validate that and return an error if it's not; I trust you can add that to your own implementation. Then I create my result output. In this case I don't know the size of my output, so I can't preallocate it; I just set it to zero length. Once again I create my variable called header, of the anonymous struct type that contains just the Type field, same as before.

Now my loop is slightly different.
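A sketch of the streaming variant, including the loop walked through next. It uses json.Decoder's Token, More, and Decode methods, and it validates the opening bracket, which the video leaves as an exercise. The type names, payload fields, and `decodeStreamed` are assumptions. Inside UnmarshalJSON the whole input is already in memory, so the saving here is that only one element's json.RawMessage is held at a time; running the same loop directly on an io.Reader, such as an HTTP response body, would also avoid buffering the whole array.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Hypothetical payload types; only the "type" key is common to all three.
type Person struct{ Name string `json:"name"` }
type Animal struct{ Species string `json:"species"` }
type Address struct{ Street string `json:"street"` }

type Responses []interface{}

// UnmarshalJSON walks the array with a json.Decoder, holding only one raw
// element at a time instead of a []json.RawMessage for the whole input.
func (r *Responses) UnmarshalJSON(data []byte) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	// Consume the opening token and check that it really is '['.
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return fmt.Errorf("expected '[', got %v", tok)
	}
	var result Responses // size unknown up front, so start empty and append
	var header struct {
		Type string `json:"type"`
	}
	for dec.More() {
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			return err
		}
		header.Type = ""
		if err := json.Unmarshal(raw, &header); err != nil {
			return err
		}
		switch header.Type {
		case "person":
			var p Person
			err = json.Unmarshal(raw, &p)
			result = append(result, p)
		case "animal":
			var a Animal
			err = json.Unmarshal(raw, &a)
			result = append(result, a)
		case "address":
			var a Address
			err = json.Unmarshal(raw, &a)
			result = append(result, a)
		default:
			err = fmt.Errorf("unknown type %q", header.Type)
		}
		if err != nil {
			return err
		}
	}
	// Consume the closing ']'.
	if _, err := dec.Token(); err != nil {
		return err
	}
	*r = result
	return nil
}

// decodeStreamed is a hypothetical demo helper.
func decodeStreamed(in string) (Responses, error) {
	var r Responses
	err := json.Unmarshal([]byte(in), &r)
	return r, err
}

func main() {
	r, err := decodeStreamed(`[{"type":"person","name":"Bob"},{"type":"animal","species":"cat"},{"type":"address","street":"Main St"}]`)
	if err != nil {
		panic(err)
	}
	for _, v := range r {
		fmt.Printf("%T %+v\n", v, v)
	}
}
```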
I still use json.RawMessage, but as you can see I have just a single instance rather than an entire slice, because I still need to unmarshal each value twice. I start by decoding the next value from the decoder into this raw value, then I decode that raw value into the header. Once I've done that, I have header.Type, which tells me where to unmarshal the rest: if the type is person, I unmarshal into Person; if it's animal, into Animal; if it's address, into Address, essentially the same as before. This time, rather than assigning to a particular index of the result, I append, since I don't know how many results I'll have. At the end, as before, I assign the result to the receiver and return a nil error.

Our example output should be identical: we have the same inputs, we expect the same outputs, and indeed we see the same outputs. It's just a more efficient implementation.

Now let's talk about something that's even more interesting, at least for me. Suppose you have a data structure with some known fields that would map well to a struct, and some unknown fields that would require a map, say of string to string or string to interface, but you don't want to do everything as a map. Let's talk about a hybrid approach.

Here's our example input data. In this case we have two fields that are the same across all objects, _id and type; everything else is arbitrary, and we have no way to know what it is. It might be that these fields vary based on the type, or they might be completely arbitrary. We'd like to access the ID and type through struct fields rather than through a map, and everything else must be in a map because we don't know what it is. So here's our target type: I've created a type called Item that represents some generic thing with an ID, a type, and arbitrary data.

Let's look at the custom UnmarshalJSON method I've created. The first thing I do in this method is define a variable called x, of an anonymous struct type. This anonymous struct embeds the Item type I defined on the previous slide, and it also adds a field called UnmarshalJSON. The only reason this field exists is to shadow the UnmarshalJSON method on the Item type. If I don't do that, then on the next line, when I call json.Unmarshal on x, it will find that the embedded Item has an UnmarshalJSON method, the one defined right here, which would cause infinite recursion and crash the program. So, with that in mind, I call json.Unmarshal on the raw data into x. Then I unmarshal the data a second time, this time into a map of string to interface, and I delete the two fields that were already handled by Item, the ID and type. I assign the Item from x to my receiver, and then I set my arbitrary data.

Let's look at an example. Here are our three example input objects from the beginning, and when we execute, we get a Go slice of type Item. The first has an ID of bob, a type of user, and its arbitrary data, the name. The second has an ID of meetup, a type of website, and arbitrary data with a URL. And the third has an ID of soup, a type of recipe, and its arbitrary data.

Of course, doing this is most useful if you can reverse the process, so let's talk about that. Now, instead of an UnmarshalJSON method, we're creating a MarshalJSON method on the same Item struct. The first thing I do is marshal the arbitrary map data into a value called data. Then again I use an anonymous struct type for the marshalling: I embed Item inside it and, same as before, create an empty struct field called MarshalJSON to avoid the infinite recursion. In addition, I added a json tag to that field; if I don't, the output will contain a MarshalJSON key holding an empty object, which I don't want. Then I marshal this temporary value into raw data called obj. At this point, both data and obj should be completely valid JSON objects, each with a subset of the data we're concerned with.
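A sketch of the hybrid Item with both methods, using the field-shadowing trick described above; MarshalJSON also performs the brace-and-comma splice of obj and data that the transcript goes on to describe. The json tags (`_id`, `type`), the `json:"-"` on Data, the sample URL, and the helpers `decodeItems` and `encodeItem` are assumptions. I also tag the UnmarshalJSON shadow field with `json:"-"`, and MarshalJSON skips the splice when Data is empty, since splicing in an empty object would leave a trailing comma. Keys in Data that repeat `_id` or `type` would produce duplicate keys; this sketch doesn't guard against that.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item has two known fields; every other key lands in Data.
type Item struct {
	ID   string                 `json:"_id"`
	Type string                 `json:"type"`
	Data map[string]interface{} `json:"-"`
}

func (i *Item) UnmarshalJSON(raw []byte) error {
	// The UnmarshalJSON field sits at a shallower depth than the method
	// promoted from the embedded Item, so it hides that method and
	// json.Unmarshal below does not recurse back into this function.
	var x struct {
		Item
		UnmarshalJSON struct{} `json:"-"`
	}
	if err := json.Unmarshal(raw, &x); err != nil {
		return err
	}
	// Decode again into a map and drop the keys the struct already holds.
	var data map[string]interface{}
	if err := json.Unmarshal(raw, &data); err != nil {
		return err
	}
	delete(data, "_id")
	delete(data, "type")
	*i = x.Item
	i.Data = data
	return nil
}

func (i Item) MarshalJSON() ([]byte, error) {
	// Same shadowing trick; the json:"-" tag keeps the field out of the output.
	obj, err := json.Marshal(struct {
		Item
		MarshalJSON struct{} `json:"-"`
	}{Item: i})
	if err != nil {
		return nil, err
	}
	if len(i.Data) == 0 {
		return obj, nil // nothing to splice; avoids a trailing comma
	}
	data, err := json.Marshal(i.Data)
	if err != nil {
		return nil, err
	}
	// obj is {"_id":...,"type":...} and data is {...}: swap obj's closing
	// brace for a comma, then append data without its opening brace.
	out := append(obj[:len(obj)-1], ',')
	return append(out, data[1:]...), nil
}

// decodeItems and encodeItem are hypothetical demo helpers.
func decodeItems(in string) ([]Item, error) {
	var items []Item
	err := json.Unmarshal([]byte(in), &items)
	return items, err
}

func encodeItem(it Item) (string, error) {
	b, err := json.Marshal(it)
	return string(b), err
}

func main() {
	items, err := decodeItems(`[{"_id":"bob","type":"user","name":"Bob"},
		{"_id":"meetup","type":"website","url":"https://example.com/"}]`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", items)

	out, err := encodeItem(items[0])
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"_id":"bob","type":"user","name":"Bob"}
}
```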
Data contains the arbitrary values and obj contains the ID and type fields. To combine them, I replace the final character of obj, which should be a closing curly brace, with a comma, then append everything except the first character of data, and return that.

Here our input data is in Go objects, but it's essentially the same data we had before: bob, meetup, and soup. Let's see what the JSON output looks like. In the first case we have an object with three fields, ID, type, and name, as expected; the second has ID, type, and URL; and the third has ID, type, and ingredients.

Thank you for watching. I hope you found this information valuable. If you have any questions, please reach out to me either in the comments or on my blog; I'd love to hear from you. And don't forget to like and subscribe if you found this valuable.
Info
Channel: Boldly Go
Views: 8,901
Id: Tgg-ChT4IZE
Length: 29min 32sec (1772 seconds)
Published: Sun Oct 16 2022