JSON: Like a Boss

Captions
So I'll introduce myself on the next slide. I just wanted to take a minute to reflect on the title of the presentation. You might be asking yourself: JSON like a boss? What does that even mean? Bosses don't use JSON. Well, that's true, but let's turn to Urban Dictionary and figure out what "boss" means. Boss means incredibly awesome, miraculous, and great. That is something I feel towards jq. But there's also a song called "Like a Boss" by The Lonely Island; I'm not sure if you've ever heard it. I was really struck by how boss this individual was in the Lonely Island video, and he'll make a cameo later on in the presentation, so you'll get to meet him. Just know that this topic, jq, is very dear to my heart. Anyone who's been around me for the last couple of years knows I like to throw out there a cameo for jq itself. So that leads us into what I do and how jq has changed my life. Let's go to the next slide. It's not going to the next slide. All right, okay.

So I'm Bob. I've been working in the Ferretti lab now for four years. Through my time at the Ferretti lab I've had to deal with many different types of processing: data pipelines, data integrations, ETLs, APIs, JSON dumps. A bunch of JSON, basically. And as JSON is the lingua franca of the web, something that I needed in my daily life was something to help me manage JSON data.

So let me give you a little bit of motivation for jq. How many times has this happened to you: someone has given you an API, internal, probably without any documentation, and they said, okay, the data is in there, you just have to find it. In this case let's assume we want to find the gender of what is called a case; it's a concept in genomics and clinical data, so very relevant. How do we get this piece of information? All we see is a wall of text that might go thousands of lines in either direction. So there is a solution, and that is jq. And jq is going to be part of your arsenal, if you invite it,
to tackle many different types of problems.

So let's talk about jq and its family. jq is actually part of a family of tools. They weren't written by the same author, but they all share the same philosophy, and that is the UNIX tools philosophy: have small programs that each accomplish a particular task, instead of trying to develop one large monolithic program to do a large number of tasks, because when you do that you end up serving really nobody, since everything's watered down. Some of the tools that I've used in the past that are kind of in this family are, well, jq; xmlstarlet, very similar to jq although not as powerful, for XML wrangling on the command line; there's pup for HTML, which lets you use things like jQuery selectors on HTML to wrangle it and transform it; and yq is kind of a close twin, I'd say, of jq, which is for YAML. And of course in the lineage there are also the cousins, or some of them are actually, I would say, parents of this philosophy, and those are sed, awk, and grep, which I'm sure you've used in your daily work.

So let's focus on jq for a bit. "jq is a lightweight and flexible command-line JSON processor": that's how the author describes jq, but I think it's not really that telling as a way to encapsulate it. I think what's more important is actually how you express yourself in jq. It's written in C. It has no runtime dependencies, so you can just install it and it'll work. It's actually a functional programming language; I think it's pure functional, but I don't really know if that's true. It's also Turing complete, and there is a proof, in the language I'm not allowed to pronounce. It's very terse and expressive, and it allows you to do all the nice functional operations that you may be used to from other languages. As we'll see later, there are a lot of concepts borrowed from languages like Scala and ES6 as well.

So you might have got a sense for how the applicability of
the tool is, but these are the main use cases that I see when I use it: exploring JSON APIs and exploring JSONL dumps. A JSONL dump is nothing but newline-separated, concatenated JSON documents. For exploring JSON APIs, I've used it very frequently with Elasticsearch, the GitHub API, and Docker; Docker has commands that spit out things in JSON. For JSONL dumps, I use it to slice and dice the data, to do very simple command-line pipelines and one-off mini ETLs, and for aggregations, counts, and reports that you need to do from JSON metadata. And in the end it's also just a darn handy JSON formatter; it's very convenient for that task alone.

If there's one thing you take away, it should be one of the slides I want to present next. These are the main entry points into the world of jq. The first one is the home page of jq; it has the documentation on how to use jq and it has the installation instructions. There's also an online interactive jq playground that's great for just hacking if you don't want to download it and you're saying, "I don't know if I trust this guy Bob about jq, I just want to see for myself without having to install a new binary." It's also a way to share snippets with people. And of course there's a dedicated tag on Stack Overflow and a Freenode channel for IRC, if people still use that. It's multi-platform, all the major platforms are supported; I've only really used it on Mac and Linux. And this is how you can quickly install it, although there are tarballs you can download from the site.

And now we'll get into how you use this thing. There are two main ways you can invoke jq: one is the streaming way, and the other is reading from the file system. Typically what you end up doing is you write a curl command, if you're reading from an API, and you pipe it directly into jq; or if you're just hacking on the command line, you use echo to echo into the standard in of jq; and of course you can cat, and so forth.
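The streaming style of invocation described here can be sketched roughly as follows. This is a minimal illustration, not the commands from the slides: the JSON payloads are made up, the temporary file stands in for a hypothetical JSONL dump, and the curl line is left as a comment because the URL is only a placeholder.

```shell
# Streaming from echo: JSON goes into jq's standard input, "." prints it back.
echo '{"x": 1, "y": 2}' | jq '.'

# Streaming from cat: a hypothetical two-record JSONL dump in a temp file.
tmp="$(mktemp)"
printf '%s\n' '{"x": 1}' '{"x": 2}' > "$tmp"
cat "$tmp" | jq -c '.'     # -c prints one compact JSON value per line
rm -f "$tmp"

# From an API the shape is the same (URL is a placeholder, not a real endpoint):
#   curl -s "$SOME_API_URL" | jq '.'
```

The only thing that changes between these is what sits on the left of the pipe; jq itself is invoked the same way each time.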
But there's also jq with a file: if you have a big JSON file, maybe a JSONL dump or something, you don't necessarily want to cat it or anything, you can just use the file input. There's also HTTPie, which I recommend as well if you want to be hacking on a lot of JSON endpoints on the command line; it's very useful and adds a lot of features on top of curl.

I'm not going to go through all the command-line options here, but some of the important ones are slurp mode, colorize, and output as raw strings. This is just kind of a quick, handy cheat sheet if you need it, but it's also available on the command line. There are two primary modes of operation. You can go line by line, which means that every line from standard in is read, processed through the expression you supply to jq, and then output. Or you can treat the whole input as one big array, and it will be put through jq as well. These serve two different purposes: sometimes you want to process everything at once, and that's hard to do in line-by-line mode, so slurp mode is very convenient for that.

So I'm going to work through the language features, the core concepts of jq, and then we'll go through a realistic exercise that I've used in my daily life and that some other people might actually find useful. It's in the field of bioinformatics.

The core idea in jq is that of the filter, and the filter is very much like a Bash filter or shell filter. The idea is being able to pipe things into one another and create chains; filters just allow you to do the basic transformation steps in a jq expression. Let's examine the most basic one, and this is the one I was talking about earlier: if you learn anything from this presentation, it should be this. It's the jq dot filter, the identity filter. What we have here is some JSON, and we're piping that JSON into jq with the dot, and it just colorizes it by default and pretty-prints it, so you get a nice indented,
colorized output.

Now, the next one is one that will project out a property. Say you have an object and you want to just get the value of one of the properties in there: you can just specify dot property. In this case I just wanted to grab, or pluck, the x value, and it gives you 1 and drops the rest of the object.

So here's a pro tip. jq might squawk at you if you try to do certain types of operations on things where they don't make sense. What you can do is add this little question mark at the end, and it won't give you an error if you try to, say, project a property out of an integer or something like that. It's a way to make your pipelines a little more resilient to differing data that you're not anticipating. I've used it quite a few times; it's very helpful.

Of course you can take the property access and projection to multiple levels. Here we have this nested document with x, y, and z, and we can project out the value of z just by changing the path expression.

This is another operation; it's useful for arrays. I call it flat map, if you're familiar with the concept of flat map. It just takes the array, unwraps it, and gives you a set of elements. The outer array goes away and you end up with one output per element, basically, and you can do processing on a per-element basis downstream. I'll cover a few more complex examples of this.

The next concept is that of operators. We saw the basic filters; the operators allow you to combine different types of filters together to form more complex expressions. The first one is the pipe. The pipe is very useful for chaining things, just like in Linux or UNIX: if you're on the command line and you want to pipe one command into another, and another into another, you can do the same sort of thing within jq. In this case what we're doing is taking the x property, so we're taking the value y: z, and then we're projecting out the .y. This could also be done in the
previous way, by doing .x.y, and we get z. So, a very important operator.

The comma: the comma is a way of producing multiple outputs. In this case we have this x: 100. Okay, I want the identity, but I also want the identity output again. You can see that there are two outputs now, so one record becomes two. And you can do this in more interesting ways: instead of just using dots, you can put more complex expressions on the left or right, or you can have multiple commas.

Now, this one is grouping. You can group things together just like in math: you can change the order of operations, or you can encapsulate some processing in a subtree of the expression and use the resulting value as part of the output. This one's not that interesting, but you can see that I have the identity, and then I have something where I'm piping and projecting out the x. It's more of a pedagogical example to show you that you can combine these operators together.

And now we're getting to the arithmetic operations. Say you have this array on the left and you pipe it into jq; you end up doing different operations: adding two to it, subtracting two from it, multiplying by two, dividing by two, modulus.

So those are the two main building blocks, and on top of these you can actually build new objects. So far we saw streaming through some of the results and plucking out some stuff; this allows you to build new values from the existing ones. The first one is the array; this is how you build an array. You have elements flowing in. By the way, I'm showing one element here just for ease, but normally each one of these would be newline-separated, so you would have x: 1, then x: 2, or something, and this would apply to each one of those records as it flows through. In this case I'm saying: for every element that flows through jq, wrap it in an array. It's very useful. You'll find yourself using this, I think, frequently. Another one
is the object constructor. In this example I'm taking 1, piping it to jq, and saying: wrap an object around this one value and then output that. Then you can start to see how you can pipe these things, break a problem up into logical steps and reusable pieces that you might use in other contexts, by using these primitives.

Variables. These aren't really variables, for the theorists of programming languages; these are what they would call bindings. It's a way of just capturing the value of an expression for reuse later. As mentioned, they're usually not needed, but they help to cut down on noise in certain situations. They must start with a dollar sign, and they're scoped over the expression that defines them. If we look at this basic pattern here, we have "expression as variable": you can take . as $x, and then you can refer to that dot as $x in the next expression or sub-expression.

You can use destructuring. This is an example of doing a multiple assignment based on something that's incoming. This is saying: take whatever record is incoming, which is expected to have a and b fields, and then assign the value of the a field to x, and assign y and z to whatever the elements of b are. It's very similar to how this works in JavaScript.

Okay. Of course, now that you have these basic building blocks, what language would be complete without a nice set of functions to operate on them? I'm not going to go through all the functions, but I'm going to give you a little flavor of some of the ones that I find myself using from time to time. The function is the unit of reuse and encapsulation in jq. Functions are introduced with the def keyword and can take arguments, and the important point is that the dot is an implicit argument, which means you don't have to pass it in. You can think of it as a kind of global variable that you can always access. A very simple example is this increment. Basically this creates a function where, when you pass a value into it,
whatever comes out will have 1 added to it. And on the right, this is actually a standard function, but I'll just show the definition here. This is doing a mapping, which basically will make an array of the values that are passed through the function f. By the way, if there are any questions you can stop me and ask at any time; I'm more than willing to take questions at the end as well.

So I'm going to break down some examples based on the types that we have in jq. For arrays, some of the common ones you might see are length, so this would give you four. The indices of the element 8 would be three. Does it contain any of the values in this array? Yes, it does. What does it look like if I reverse it? What's the minimum value, what's the maximum value?

Same thing with strings: you can split, just like you might do in JavaScript; you can test a regular expression; you can get the length of a string; you can see if it contains a string; you can see if it starts with something; you can convert to lower case.

And objects have a very useful set of functions as well. The one I use quite a bit is keys. keys basically tells you what the keys are of an object that comes through. It also works on arrays, so for an array it will give you a list of the ordinals of that array. You can delete a property out of an object; you can add, which will add all the values; you can pivot the structure to turn it into an array of key-value pairs; and you can also flatten it, which gives you the values out.

select. Now, these are more like higher-order functions, where you can pass in a boolean expression and it will only return those values that pass the boolean expression. This one is used quite a bit. It's what some other languages might call filter, but that's an overloaded term here, so I think they went with select.

A very useful one that we'll touch on a bit later is paths. This nested document here, this nested JSON, will pass through paths, and I'm providing an
expression to paths to say: only show me the scalars. That will basically show me only the paths to scalar values. I think I have a problem here on the slide; I think it should be showing only the last two.

recurse. It's so handy that they have a shortcut for it, called dot dot (..). Again, a very useful one. How can you go through the entire substructure of a JSON document and do something for each one of the things in it: the object itself, the value of a, the value of b, and all the primitives? Well, you can do this with the .. expression or with recurse. The .. expression does not take any arguments, so it's just used as it is.

Another one: group_by. If you're familiar with SQL, or something like Lodash, this is great. It allows you to group by a certain criterion, and in this case I'm grouping by the field x. You can see how all the x's with 1 go together in an array and the x's with 2 go together in an array.

You can have modules. I'm not going to talk much about these, but modules are the way that you can break down a more complicated program into smaller bits, reuse them, and share them with other people. There's actually a loader for this module system for jq that someone wrote, and you just import the functions from the modules.

Another interesting aspect of jq is its formatters and escapers, and you can see some of the common ones here: you have text, json, html, uri, csv, tsv, sh (shell), and base64. If your output needs to conform to one of those specifications or environments, then you can use one of these at the very end. It's very handy for TSVs, as we'll see. A very simple example of how to convert an array to CSV: you pipe it through @csv. Great for bioinformatics.

Then there's a whole bunch of other language features that I just won't have time to discuss today, and that's dates; control flow like while, if, and try-catch; generators; parsers. Generators are actually very integral to jq, but more of an advanced topic. Then parsers, streaming, and some input/output functions.
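As a rough recap of the language tour above, the following sketch runs one small command per concept. The sample JSON values are made up for illustration and are not the ones on the slides; only filters and flags documented in the jq manual are used. Expected results are noted in comments where each command produces a single value.

```shell
# Line-by-line vs. slurp mode (-s gathers all inputs into one array).
printf '%s\n' '{"x": 1}' '{"x": 2}' | jq -c '.x'             # 1, then 2
printf '%s\n' '{"x": 1}' '{"x": 2}' | jq -c -s 'map(.x)'     # [1,2]

# Basic filters: identity, property access, "?" to suppress errors, nesting, unwrap.
echo '{"x": 1, "y": 2}' | jq '.'
echo '{"x": 1, "y": 2}' | jq '.x'                            # 1
echo '1' | jq '.x?'                                          # no output, no error
echo '{"x": {"y": {"z": 3}}}' | jq '.x.y.z'                  # 3
echo '[1, 2, 3]' | jq '.[]'                                  # one output per element

# Operators: pipe, comma (multiple outputs), grouping, arithmetic.
echo '{"x": {"y": "z"}}' | jq '.x | .y'                      # "z"
echo '{"x": 100}' | jq -c '., .'                             # the same object twice
echo '{"x": 1}' | jq -c '., (. | .x)'                        # the object, then 1
echo '[1, 2, 3]' | jq -c 'map(. + 2), map(. * 2), map(. % 2)'

# Constructors: wrap the input in an array or an object.
echo '{"x": 1}' | jq -c '[.]'                                # [{"x":1}]
echo '1' | jq -c '{x: .}'                                    # {"x":1}

# Bindings and destructuring.
echo '5' | jq '. as $x | $x * $x'                            # 25
echo '{"a": 1, "b": [2, 3]}' \
  | jq -c '. as {a: $x, b: [$y, $z]} | [$x, $y, $z]'         # [1,2,3]

# A user-defined function; "." is the implicit input.
echo '1' | jq 'def increment: . + 1; increment'              # 2

# A few builtins on arrays, strings and objects.
echo '[1, 2, 8, 4]' | jq -c 'length, indices(8), contains([8]), reverse, min, max'
echo '"A,b,c"' | jq -c 'split(","), length, startswith("A"), ascii_downcase'
echo '{"b": 1, "a": 2}' | jq -c 'keys, del(.b), to_entries'

# select, paths, recurse, group_by.
echo '[1, 5, 3, 0, 7]' | jq -c 'map(select(. >= 2))'         # [5,3,7]
echo '{"a": {"b": 1}, "c": 2}' | jq -c '[paths(scalars)]'    # [["a","b"],["c"]]
echo '{"a": 1, "b": 2}' | jq -c '..'                         # the object, then 1, then 2
echo '[{"x": 1}, {"x": 2}, {"x": 1}]' | jq -c 'group_by(.x) | map(length)'   # [2,1]

# Formatter: turn an array into one CSV row (-r prints the raw string).
echo '[1, "a", true]' | jq -r '@csv'                         # 1,"a",true
```

Every filter is wrapped in single quotes so the shell leaves characters such as `$`, `|` and `?` alone.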
Now we go to the second segment of the presentation, and this is the "like a boss" section. That's of course the guy from The Lonely Island, and he is the boss. We will see some of the useful applications of jq; actually, just one example.

So let's just say someone came to you and said: create a TCGA barcode to UUID mapping in TSV format. This is actually something that I need to do quite often, because of certain issues with the way TCGA submits to ICGC, so this one is near and dear to my heart. It also uses the GDC API, which is a great API, by the way. But if you query the GDC API you get something that looks like that second or third slide I had, this wall of text, especially if you add some extra stuff at the end there. So you have the same problem: okay, I have this data, now how do I understand what it is? Of course they have great documentation, but let's just assume that they don't.

The first thing you want to do is just apply the identity operator, which formats everything by default. You can start to see the structure a bit better now. However, it's quite a big structure, so it might go on for thousands of lines. One of the things that I like to do in this case is to curl it, jq-ify it, and then pipe it to less, and this sequence of flags will preserve the colorization that jq does. It might be a handy snippet you could put in your aliases or something. So that's good, and you can page the results in less, just like you might do for a large text document.

This one is actually pretty cool. It's what I call instant schema, and it's a way of figuring out the structure of a document without having its JSON Schema. What it does is it recursively goes through and prints out the paths; the paths have to be converted to strings, and then it joins all the segments together. You get something that looks like this as a result, which is basically showing you the structure. You can see you have data; it doesn't tell you if it's an array or an
object, but you have hits, you have the array indices, and you have the fields that are underneath there. Looking at it this way makes it much more immediate to me how to actually craft jq queries, because as you can see, the paths resemble how you might express a jq filter.

So this is actually the solution to the problem. There are three types of barcodes: there's one for cases, one for specimens, and one for samples; they use slightly different terminology. And this is the solution here. It's quite a complex data structure to try to process, but this is the result at the end. You can see I'm doing @tsv on the result.

And here's an example: because this is so useful, you can save it away as something that you might want to use in the next session. This little snippet on the top is basically appending it to a .jq file, and jq will source that automatically for you every time it loads up. So I just defined a new function called schema, and if you pass that stuff into it, it's the equivalent of what we did back here.

So here are some things that you don't want to do; you can learn from my experiences about some of the pitfalls that you can fall into with jq. There are really only a couple that will catch a newbie off guard, and the first is shell quoting. In some of the examples I was a bit liberal with my use of quotes, but quoting is actually very important when you are on the command line, because the shell will interpret an unquoted filter as multiple shell expressions or pipes. The way around that problem is to wrap things in single quotes or double quotes. I would say by default use single quotes, so you don't have something like a dollar sign being replaced by a shell environment variable. That is a very important thing to consider when you're first starting off with jq, and if something goes wrong, it's probably because of this.

Another one: a lot of my examples used x and y. This one doesn't have a dot in front of it,
so it thinks that it's a function, but that's not really what we wanted; we wanted the actual property. So make sure you add the dot in front of it to access the property.

Another thing that you might see in a lot of different types of JSON is dash-separated field names or property names. That's a no-go in jq, so you need to wrap the name in what looks kind of like an array access. It's like the associative-array style of accessing something that you might see in JavaScript as well.

And this one is a little edge case. It's not something I've really run into, but just be aware that you can't do this, and this is the solution: just add a pipe. Look, if something doesn't work, just add a pipe. Basically that's what I've learned.

So that's the basics of jq, an overview with a little application showing how you might use it in your daily life. But there's so much more to talk about in the jq world, so I'm just going to throw out some stuff here and see what sticks.

So we have Java. This is a pure Java port of jq. I've actually used this in a project, as it was really handy for quiddity. Yeah, I'm looking at you, Vincent; this is in quiddity. It's a hundred percent Java, it's embeddable, and it works with Jackson's JsonNode, and if anyone's done any JSON programming in Java they know that Jackson is really at the core of it. And you can write functions in Java: instead of writing a jq function, you can extend or implement a certain interface and get that code to run, so you can do interesting things like database operations inside of it. This particular example is just showing what you might see on the command line compiled into a Java object, then taking a JsonNode and applying the expression with a scope, in which you can define variables and so forth, and getting a list of results just like you would get at the command line.

This is node-jq. This is a nice module for
integrating jq, which, remember, is native; it's not written in JavaScript or Node or anything like that. This allows you to integrate it into other products such as VS Code or Atom, which we'll talk about later, and it's just a wrapper around the native library, so it's more of a convenience thing. At the moment there is no VS Code integration or plugin, so writing one would be a great way to get your jq on.

This is kind of interesting, and this is why I was saying it could be really cool for VS Code, which I know a lot of people use: there is an Atom module that runs jq inside Atom. Maybe you can see that on the bottom there's a jq filter that's dynamically filtering the left-hand side to give you the right-hand side. It's quite expressive, and great.

There are actually quite a few other things I found just by surfing GitHub. There's a Go wrapper. There's a jq shell, a jq-based shell implementation. There are three Ruby implementations for jq: there's RBQ, JQR, JR. I don't know if it's called JR, but I like to call it JR.
There's jqr, which is an R wrapper. There's a Lua wrapper. Then there's this collection of snippets that I found, which are quite useful. And there's also this one; I kind of meant to strike through it, because it kind of puts down jq. It's basically a JavaScript-like language that is apparently more powerful than jq, but it's really not.

Then there are some other tools and stuff. jq-httpd, which is a server that spins up so that you can fire JSON at it and it responds with JSON; I guess there's a use case for that. There's a Scala jq parser. There's this jq package manager, which allows you to use something like npm for jq, so you can (I don't know if it's called jq npm) install a particular module. There's a UI, a Mac thick client, for doing some of the processing. And then this one was kind of cool: just like that schema function I showed, there's actually a script that will generate jq paths instead of simply the concatenated fields. I think that's kind of interesting, because it allows you to take those and then use them in a jq expression.

And that's basically it. All the images are by the Noun Project, and these are all the things I used. Are there any questions?

Yeah. So, back on the module side, there's a way you can specify the module path, just like you do in Python or something like that. You can have a directory that's just full of jq files, and you can import those in your application or even on the command line. So you could have something to do with a particular API, maybe, and you file that away in a folder that has some files in there with a .jq extension, and you can import that and use it. As for the .jq: if it's a file, it will source it; if it's a directory, then you can import what's inside of it. So there's a kind of convention there, but you can override it on the command line.

Yeah, it is a functional language; it is a language in its own right, but I'd say 90% of the time
you're using it more like sed or awk or grep than you are using it like a language, because there's really no native looping in it, or because things are built on generators. It feels very much like a pipeline you might build in a Linux shell. So that's the way it has always felt to me, until I learned some of these other features, which I thought were quite interesting, especially the generators and destructuring and stuff like that. Any other questions?

So the question was: how similar are the related tools? I haven't played around much with the YAML one, because I've never really had a need to do that type of processing. I guess someone felt it necessary to create another tool for it. But yeah, they all have their own dialects and flavors. I think jq is pretty advanced. I would say it's more akin to awk, maybe more advanced than awk. awk has a lot of structures and it's its own programming language: you can write programs in it and you can also do very small snippets. So I'd say if I had to compare it to something else, it's more like awk.

All right, I think that's all. Thank you. [Applause]
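The "instant schema" trick and the pitfalls from the talk can be sketched as below. This is an approximation under stated assumptions: the filter is reconstructed from the spoken description (walk the paths, convert the segments to strings, join them), the sample document and the `id`/`barcode` field names are hypothetical rather than the real GDC response, and the `-C`/`-R` flags for paging are one plausible choice, not necessarily the ones on the slide.

```shell
# Hypothetical sample document, loosely shaped like "data -> hits -> fields".
doc='{"data": {"hits": [{"id": "u1", "barcode": "TCGA-XX-0001"}]}}'

# Instant schema: every path, segments stringified and joined with dots.
echo "$doc" | jq -r 'paths | map(tostring) | join(".")'

# The same thing as a reusable function. Appending this def to ~/.jq would make
# it available in every session; here it is defined inline instead.
echo "$doc" | jq -r 'def schema: paths | map(tostring) | join("."); schema'

# Tab-separated output for a mapping (field names are made up for this sketch).
echo "$doc" | jq -r '.data.hits[] | [.id, .barcode] | @tsv'

# Paging with colors kept (interactive, so left as a comment):
#   echo "$doc" | jq -C '.' | less -R

# Pitfall 1: quote the filter, preferably with single quotes, so the shell
# does not expand $x or treat | as a shell pipe.
echo '3' | jq '. as $x | $x + 1'                     # 4

# Pitfall 2: a property needs its leading dot; a bare name is read as a function.
echo '{"x": 1}' | jq '.x'                            # 1

# Pitfall 3: dash-separated keys need the bracketed, quoted form.
echo '{"some-key": 1}' | jq '.["some-key"]'          # 1
```

Because the schema output uses the same dotted shape as a jq path expression, each printed line is a reasonable starting point for the filter that extracts that field.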
Info
Channel: OICR Software Engineering Club
Views: 15,968
Keywords: OICR, software engineering, JSON, jq
Id: _ZTibHotSew
Length: 35min 10sec (2110 seconds)
Published: Mon Feb 13 2017