Firebase & BigQuery - Do Mobile App Analytics Easily & at Scale - Queries Included (Cloud Next '18)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
👍︎︎ 2 👤︎︎ u/fhoffa 📅︎︎ Aug 03 2018 🗫︎ replies
Captions
[UPBEAT MUSIC PLAYING] CHAD JENNINGS: Hey, everyone. Thanks for coming. My name is Chad. I'm a product manager on BigQuery. And thanks for spending the next hour of your lives with us. We're going to-- I've got joined by a couple of colleagues here, and we're going to spend the next roughly 40, 45 minutes going through Firebase Analytics, the Firebase Analytics schema, and showing you some really slick SQL tricks to be able to deal with the schema. So how many folks here are, like, primarily Firebase Analytics people? Cool. How many people here are primarily SQL analytics folks? OK. This talk is for you. And it's a 300-level talk, so we're going to be swapping between demos, and lots of SQL code, and back and forth between that and a couple slides. And there's very little marketecture in this talk, right? [LAUGHTER] Yeah, that generally gets a shout out. All right, so here's what we're going to do with the session. We're going to go through the schema in detail. It's a highly nested and repeated schema. And so if you don't know the tricks about working with that, it can be a little bit daunting. But hopefully at the end of this session you will have completely demystified it. We're going to go through a whole bunch of queries that are outside the scope of the standard Firebase Analytics dashboard, and then cover the SQL techniques. And then at the end, Felipe Hoffa is going to go through how to take the Firebase Analytics template that's available to everybody and then customize it with whatever query you want. And he'll do an example from the session here, all right? So your presenters are me, I'm on BigQuery. Eugene Girard is Product Manager for Firebase. And then Felipe Hoffa, literally if you type Mr. BigQuery into Google, you get him. All righty, I won't spend a ton of time on these kind of introductory slides. But for the benefit of those watching on YouTube, and for those who don't know, we'll just do a little bit of high-level stuff. So Google Analytics for Firebase is the agent by which we collect data that comes from your app, do some aggregates, and then produce the stats that you can then render inside the template or wherever you want. But you have a question. One of the problems that folks encounter here is what if you get to a question that's not contained in that template or what if you get to our question that's kind of outside what Firebase produces out of the box. What happens if you want something that's not on this dashboard? And that one is enter BigQuery. So BigQuery is Google Cloud platform's enterprise data warehouse. It's where big data comes and lives and thrives. It goes super fast over super big amounts of data. It also goes really fast over smaller amounts of data. So Firebase is an analyst from folks working over gigabytes to fix folks working over petabytes, works equally well for both use cases. Couple of details here. It's server-less, which means you don't have to worry about provisioning resources. You just pipe your data in and bring your workloads, write your queries, and go for it. So you don't have to spin up nodes. We take care of all of that in the background for you. This one is the only call it marketecture slide. And I actually think this is the most useful marketecture slide that we have, because GCP has over one hun-- yeah, now the phones come out, right? [LAUGHTER] GCP has over a hundred products right now. So there are over a hundred little blue hexagons. And I don't know about you all, but I can get a little bit confused when I'm trying to interpret what these icons mean. So I wanted to put this up here to show you kind of the spectrum of storage and database products that we have and where BigQuery sits. So BigQuery is over here as the Enterprise Data Warehouse. So it behaves somewhat like a relational database, except you can go arbitrarily big. Really good for analytics, really good for porting, really good for dashboarding. So yeah, the old story was this. New story is this. And you can ask it kind of any question you want. You don't have to do indexes. You don't have to pre-aggregate. You can do full table scans on whatever question you want. Cool. So with that, we're going to get into the next part of the talk. So Eugene is going to come up, and he's going to talk a little bit about SQL and then go through the schema. All right, tag. You're it. EUGENE GIRARD: Awesome. Thanks, Chad. So obviously, we're talking about BigQuery, so we're going to be talking about SQL here. And the important thing about SQL is that you've got to know your data if you're going to make use of it. Little-- thank you. So when I started learning SQL, way back in university, I ran a lot of queries that kind of looked like this where you select some stuff from a bunch of rows and columns, and you do a select statement that said which column, which rows you were actually interested in. And then you add it up with something down at the bottom that kind of showed you what you're looking at. The thing there is that it was always kind of a very grid-like look. You have all these rows and columns of data, and you kind of know exactly what you're going to get out of it. Now that's got me through CS 322 just fine a few years ago. The world's changed, and we've got a slightly more robust and capable engine in BigQuery that can represent a lot more than just rows and columns. So a lot of these things are things like key value pairs. There are things like records or objects. There's things like arrays of objects or records or key value pairs or all those things. And all of that gets a little involved. So you have to really understand your schema if you're going to be able to make use of a big table like this. So this is the schema that we generate as part of analytics from Firebase. And the thing here, we're going to go through a few of these. A lot of these things we really don't care about today. And you can kind of ignore the parts you don't need, that you're not using at the moment. But we're going to focus on a couple of these. So right off the bat, there's a couple of key value pairs that are really important. There's things like the date and the timestamp, and also the event name. Every event that the developer emits in their Firebase app gets its own unique kind of schema here. And we always key off the event name to say what kind of an event are we looking at. Afterwards, there's a bunch of event parameters. This is an array. You can throw in a many things you want here. And in those, there's a name for the key. So that says-- for example, here we're looking at the final score when your level is done. And that's going to have an int value associated with it. Just to make these easier to use we throw in, like, four different values. Only one of those is ever going to be non-null. Every single final score will always be using the end, and all the others are going to be null. And something like a level name we'd be using the string value and all the others would be null. So this makes it very predictable from event to event. But the actual values that are filled in you're going to have to kind of know your schema for your own events that you're generating in your app. And this is where your own app can specialize and generate the kind of events that you care about. So we also have a user ID. This is something that the app can set if they choose to. And we also generated a pseudo ID for the user. The pseudo ID is unique to that user for one run of the app on one device. So this is what we use to tie in a bunch of events together to say that they're from the same run of that app. And this is the one that we're keying in all the time. The user ID, again, it's optional if you use that or not. In our sample data we've actually nulled all of those out, because we don't want any PI leaking out in our demos. So we have a slightly sanitized, pseudonymized version of the data that we're going to be working with. There's also a bunch of user properties that get associated with that row in the database. And the nice thing about throwing the user properties here is that those properties change a lot over the run of an app. And this way you can see what the properties were at the moment that that particular event fired. And that makes it really easy to have complex things like the experience level, which for this user is going to change. Basically every time there's an event, their XP is going to be slightly different at that point. So it lets you see at that moment in time what the values were. We also have whether they're a premium player or not. All of this data is up to the app developer to decide what they want to stow into this. We also track a bunch of useful metrics. Firebase supports a lot of different devices and a lot of different platforms. So it will let you track down where the users are, what devices they're using, where they came from, which exact version of your app they're using. If you're doing A/B testing, that would also show up there, the platform that you're running on. All of this kind that goes into the app itself. And so every app that you write is going to have its own unique concerns about what the user's up to or what it's doing. And these are the kind of things that you're going to want to ask questions about. So here's an example of the Bingo Blast game. It's a game that's out in the app store. You can run it and play it. It's actually kind of fun. But we use that to demonstrate some of the capabilities of Firebase. It happens to spit out a couple of events. So every time you finish around in Bingo Blast we create a round completed event. And we tie-in a bunch of things like the score. Whether the game was a solo or a social game-- the type of game that you're playing. We also keep track the user data down at the bottom, whether or not this user happens to be signed in right now. And if they are, we want to know whether they have ever spent money on the game. Because at the end of the day, that's a very interesting metric to be looking at. We also keep track of things like XP, which is really a measure of how long they've been playing the game, and how experienced you'd expect them to be. So with all of that background, I'm going to turn it back over to Chad and ask him to walk you through how to use this kind of data. CHAD JENNINGS: The next section of the talk-- thanks Eugene. The next section of the talk is we're going to go through a series of questions, I think back to the deck, please. We're going to do questions at the end. Oh, and also-- we'll remind you of this-- when you do questions, come up and use the mic. But we'll do them all at the end. So speaking of questions, we're going to go through a few. These are the questions we're going to do. We'll start-- we'll do them in order. These are crafted in such a way as to literally run us into a problem, and then show you how to fix the problem. Run us into a new problem. But we've played-- oh, and all of the code for all of these is in the GitHub repo. And the link to that repo will be at the end of the talk. But to save a little bit of time for Felipe's section on Augmenting Data Studio Templates, we're going to take question four about filtering by a range of properties and question seven about closed funnels, and we're just going to leave them for whoever wants to play with that in the GitHub repo. The codes already there. And what we're going to do is we're going to do number eight, Customizing Data Studio Template. And we're going to do that with question number five. So the bulk of the talk is just running through this henceforth. Okey-doke. Now, let's switch to the demo. Okey-doke. So here is the BigQuery UI. And what we're going to do is we're going to run through a series of queries to get to the answer to question number one. So for those not familiar with it, this is the query composition. Oh, let me just do an eyesight check. So for the folks in the back of the room, is this big enough? I can make it bigger. Would anybody like this larger? OK, no hands. We'll go with this. So anyway, this is where the-- this is where you compose the queries. I've run all of these in advance, so the results are already here. But what I want to show with this one-- so select star, also this is a great thing to do for demonstrations. I don't recommend selecting all the columns, right? BigQuery is a columnar store. Just pick the ones you want. Because select star queries can get expensive. So not a best practice, just an example for the talk. And so when you run this query, you can see that we've got an event right here. And then I'll show you how that nested and repeated field, right? And so here is the country, the device. And all the way out here you can start to see that in this row there are a bunch of these other nested and repeated fields. All right. So how do you start selecting these things? Okey-doke. So first off, what I'm going to do, like, just a simple where. We're going to select round completed. OK, so down here on the query window, let's see I have to-- for some reason it's not showing the-- oh, there we go. For some reason it's not showing me the scroll bar. If it showed-- it was showing me the-- let's see if I can get that to refresh. Well, if it was showing me-- there it is. All right. So event type, like, here are the basic parameters here. Here's round completed. But then still we've got all the events, like the possible event keys. And you can see that, right, this round completed. There is data here for bingos claimed. And then further out-- I won't bother scrolling out here but-- I guess it will-- further out here we've got the user events. So that's still-- like, the result here is still a nested and repeated field. So what we want to do is let's just pick out all of the event-- or the event params where the key is equal to score, right? We want to find out-- like, the goal of this question is, like, what are the scores? And let's do some stats on the scores that people are playing. And as you can tell, right, we're getting an error here. So why are we getting an error here, right? One of the event parameters is score. But the problem is event params is an array. So you can't-- you have to do something to event params first. All right. So let's go ahead and switch back to the slides. And we'll walk through what you have to do with event params to get there. Can we switch back to the slides, please? All righty. So here's event params. And here's the problem, there's no .key property. It's part of this. So we have to extract that first. And so unnest to the rescue. So for those of you who are SQL-minded, unnest acts like a cross join. For those of you who are table-minded, unnest turns nested and repeated into a table. So let's see how that works. So in this example, we want to find all the starships where crew member equals Zoe. So if you do it like this, this is what we just did in the BigQuery UI. If you want to do this, you're going to get an error. So you've got to unnest it first. So select star from starships. That was what we did in the very first query. And if you cross join-- if you wrote this out, right, this would actually work. You know, so this will actually produce this table where you blow out-- you create a new row for every member of the event params with crew names. And then you can filter on that by picking the crew member that you want. But they're simpler syntax to do that where you can just use this unnest command. Again, all of this code is in the GitHub repo. So feel free to take pictures, but you have access to it at the end. And now you can filter on the parameter you want, right? So instead of just doing this filter on crew member as that first-- like, in that first letter query-- unnest first, and then you can filter on what you want, right? Turn the array into a table with unnest, and then you filter the way you like. There we go. And who would have thought that Zoe likes wingspans of 32.4 whatever the units are. OK, so in the context of our example here we want to do the same thing, right? We want to pick the score param. So unnest event params, and then filter-- or do the where-- on score, right? And that filters out the row that we want. OK, oh, and then here one thing I should say. If you select star here, you're going to get the event params array in your answer. And that can just make your answer a little bit messy, like a little larger than you need. So there's a neat tip here where you select star except for the event params, and you just cut that out. Up to you. I thought that was a cool trick. All right. Let's go back to the demo, and we'll get to where we want to go. So here's that same command where we unnest. And then you can see what's happened here to the results, right? The results now look like a table. And then we can look-- what we're going to do next is we'll look through the param.key key and find all of the score events. All right, so here it is. So unnesting where event name equals round completed and where the param key equals score. And there we have it. OK, so that's pretty cool. So that's how you deal with the top level of schema. But remember, the goal was we want to do something with these numbers. So we'll go ahead and select. Instead of just the event param, we're going to select the-- like, instead of select star, we'll pull out the stuff that we actually want. And as Eugene showed you, right, we'll just pull out the int value for score, since that's the one that's not null for this parameter. OK, cool. And then we can actually do some arithmetic on it right in the select. So here we have-- so we're calculating the average of the int value. And we're calculating the standard deviation, and then returning those. So, I mean, as analysts on a game, you can then run this back to your game developers and say you know, hey, the standard deviation is way too high. I don't know, maybe that translates to a bad user experience in some way. But this is data that you can start to use to see whatever you want to see. In any can, let us now-- let's see what the message here is. We're going to go ahead and start number two. Let's go ahead and switch back to the slides, please. Cool. All right. So finished number one. On to number two with, how does this score correlate with the user's experience level? OK, back to the demo, please. Okey-doke. So starting with the same query that we had before-- right, this is exactly the same one-- we want to correlate this with user's experience. So we're going to have-- so remember, the event parameters and the user experiences were in two different-- remember from what Eugene was talking about-- those are in two different nests inside a single row. So we're going to have to unnest that second parameter here-- or sorry-- that second array I should say. Well, boy-- like, not clairvoyance, right. I just managed to get that right. And so here we go. We're going to unnest user properties. And then you can see that that blows out, or it just increases the scope of the table. So the answers include the user properties as well as the event properties, again, filtered by a score. So I'm saving myself the trouble of running every query here live. But now how do we correlate between the two? All right. So next step is go ahead and filter by the user property that you want. So we wanted to correlate score with experience points. So we'll go ahead and filter those out. So we get the table. So for round completed events, here's the score. Here's the user experience points. And, yeah, this step-- oh, yeah, we should point out almost all of the user parameters come back as strings. So before we actually do some computation on it we have to cast the string as an int, and call that experience points. And here is the beginning of the next one. Sorry, here's the beginning of the next trick. We'll go through this one in a little bit of detail in just a second, but I'll introduce it here. So in that last query, right, we selected event name, and then the params as the score. Then we cast the experience points into it. So we've got all the data that we want here, but now you have to calculate the correlation between the two. And for those of you who aren't SQL, or who weren't born speaking SQL like me, a very clever way to do that-- and for those-- the other problem with this construct in SQL is it's kind of counter-intuitive if you've learned another programming language first. And so what this is called is a sub select. So what I've highlighted here is the query that generated the table that we wanted. And then you can take that inside a SQL query and write a query on top of that, which is this top row here. And then do operations on the result of the subquery. So that's kind of cool. We'll go through that in a little bit of detail. But here's the correlation between experience points and score. So you can see that folks with a lot of experience have a slightly positive correlation to better scores. I don't know, bingo is mostly random. But if you get the hang of it, you're better than random. I guess that's what that says. So for those of you who gamble on bingo, beware. That was a joke everyone. Thank you. All right. Let's go back to the demo. I'll just queue up the next query-- oh, sorry, go back to the slides. And we'll go through what the sub selects are. So Yeah, correlation of score with experience. Subqueries make life easier. So here we'll go through it. So we have this schema for Google-- yeah, the schema for Firebase Analytics inside of BigQuery. And then you do some operation on it, right? That was that query where we pulled out the parameters that we wanted, where we did the unnesting to turn it into a table, and we get the table that we want. Now we want to do an operation on this. In the example, this was doing the correlation between experience points and score. So we want to turn that into this other table. And we do that with a query that goes around the outside of that first one. So here, I'll just get to this story. So here's a long one. We're not going to go through this one in detail. But the way you read queries in SQL is not the way you read Python or-- like in my case-- Matlab or C or C sharp, right? Where you define stuff at the top of the file and then you kind of go down. In SQL you start here kind of at the bottom. Like, you read from the middle out, right? So that's the base query that's going into the table-- in our case, like, pulling out and unnesting stuff. And then you operate on that with a query that's above it. And then, you know, your final query you're doing a little bit of counting and then grouping by. But, like, realizing that this is how SQL worked was a total watershed moment for me. And so hopefully that's helpful for the folks who are, like, deeply steeped in writing games but not so steeped in writing SQL. Very, very helpful trick. All righty. So we'll go back to the code to finish number two. Actually we already finished number two. Let's go back to the slides and we'll move on. All right. So we covered the correlation. Next one is how do we break out score by game type? All right. So switching back to the demo. So you would think this one would be super easy, right? We just figured out this unnest. Unnest works beautifully when you've got two different nests in the same row. But this problem is-- sorry, this question is actually going to run us into a problem where game type is actually stored in a different row. So we'll have to see how we deal with that one. So here is the way to do it. So in a subquery, in this line, we're going to go into our event params table and pull out all of the event params where the key equals score. And so that's kind of what we did in the previous example. And now since we need game type we have to go-- we have to unnest again and pull out the game type. So you'll see this come up, especially in Felipe's section. And we'll go through this in a second in the demo. But when you-- so here we pull out event name. We unnest. So we've now turned these two nested fields into a really big table. And then we can also-- we can clean this up again. Give these some names to make the SQL a lot readable. By the way, these aliases-- super useful to make your SQL readable. It's fun. As a product manager working on BigQuery, I work with people who are literally born speaking SQL. And so they don't do stuff like this, because this all makes sense to them. And then they send me a query like, hey, Chad what do you think of this query? I'm like, I can't tell. So if you want your code readable, do that. That's a good thing. And so here we're using the sub select. So in this, how many lines of code? One, two, three, four, five, six, seven lines of code. We're doing four-- sorry-- three subselects and one select. So you can tell this trick of using subselects is really, really prevalent. So select from unnest to get the score. Select from unnest-- whoops, sorry-- select from unnest to get the type of game. And that is all executed inside this subquery. And then we're filtering to get the round completed. And then in the outermost query, we're doing the math and getting the average score for the game type. And so here-- down here, like, here's the result, right? So we see-- and again, this is the thing you take back to the developer team. If you're on the analyst team you say, look, folks are doing much better in the social game than they are in the solo game. Is that what we want? Is that the good thing, or is that the bad thing? But having this detail, you know, it's good. Especially if that causes you to make a change or to double down on a behavior that you like. All right. Make sense? Cool. Again, all the code is in GitHub. So you can sit with that as much as you like. All right. We'll go ahead and switch back to the slides, please. Let me just get this window set up for the next one. All right. Cool. So select from unnest. Let's go through a little bit of an example of how this operation really works. So here we are. So selecting event name from the events table, from this big table. And you see the gold bricks here represent the different key values in the array. So we'll go ahead and unnest event params and search on score. Come along. There we go. All right. So we unnest the key. And then we're going to filter that and just pick out where key equals score. There we go. And that will put in a score column and expand our table. So, again, turning the nested array into a table. And then we're going to do that again. All right. Now we're going to go in and get type of games. So still going into the event params, and this one we're going to pull out type of game. And that adds a column over here. And now this is kind of like a regular table. I find this one much easier to hold in my head conceptually. And then applying, you know, filters where event name equals round completed or where the game type equals whatever, like, that kind of operation in SQL is far more intuitive to me. So once you get through this part-- like once you get everything into this kind of tabular format-- then operating on it, I think, gets a lot easier. All right. We'll go back to the demo. Actually, to be honest, I think we've finished this part of the demo already. So why don't we stay on the slides, please. Yap. So we finished breaking out the score by game type. We're going to skip over four. Felipe is going to do five. So now we're going to literally cut right to the money, and we're going to do an analysis to figure out what are the patterns that end with spending virtual currency, right, since this is what makes the world go around in some circles. Okey-doke. Now we go back. Thanks. All right. Here, let me go ahead and maximize this so that everybody can see a bit easier. OK, so again with the select star. Just to show you what the report-- sorry-- what the result looks like. And we're looking for event names where the event is spend virtual currency. Here we go. That filter works. And then, like, as you see out here, like, event params has a bunch in the array. So let's start working with that. So even if you want to just take a quick look, you can-- pulling out event names, user pseudo ID and timestamp. And then ordering by pseudo ID and timestamp, you can start to get a look at what the things were that a particular pseudo-- or a user pseudo ID went ahead and did, right? Because we're ordering by timestamp, right, they initialized the API, session start, logging in. The problem with this is that-- I can't quite show-- is that you can actually get some mixed results when people-- or different people-- are interacting with the game. So you've got to be a little bit clever here. All right. So let's go back to the slides. We're going to introduce another topic calling lead-- are called Using LEAD to Peek Ahead. So this is a trick that we're going to employ in the demo in just a couple of minutes. So what we want to do here, right, because we're trying to identify-- like in the demo we're going to identify these strings of events. So the user did this, then the user did that, user did this. And this is paramount to, like, looking ahead a little bit in the database. So with our friends-- actually with the exception of Mortimer, this kind of looks like names from a pop band. I don't know Mortimer. Maybe Mortimer could be a good pop band name. Any case, we're going to take we're going to take this, and we're going to figure out how we look at the next age. So the command to do that is lead by name. So we're going to look ahead by one space over order by age. So we're going to order this table by age and then look ahead one space. So in memory, BigQuery creates this other table where it's ordered by-- there we go-- where it's ordered by age. And then you can see that as we look at Mortimer, the next oldest person in this list is Summer. So that's going to be the answer to our question of, like, what's the next one looking ahead in the Database And then we assign that. We can create a whole new column here as next oldest. So let's go back and see how that fits into our construct of doing this work in our example here. So-- whoops-- so here we go. So we have lead over-- so we're going to order by pseudo ID and event timestamp. And so you can see here that-- so when the event name is, like, initialize the API, looking ahead-- like, looking at the next event for that user ordered by timestamp-- you can see that the next event is session start. So you can actually start to get a look at a funnel for this particular user ID, right? But that's not-- like, we don't want to look at what, you know, Bob, Jane, Sally, Phil, whatever, what that person did. We want to look in aggregate and see what our users are doing and which one is the most popular funnel, or most popular path through our game that ends in spend virtual currency, right? Make sense? All right. Here's another trick, by the way. Including this partition by command guarantees that the segments of the table you're looking at are partitioned on the user's pseudo ID. If you don't put that, you're not guaranteed to get all the block of user that ends in BA5. You're not guaranteed to get all of them together in one chunk. All right. A very useful tip. I'm sure that's going to save somebody, like, one all-nighter of heartache right there. That's our good deed for the day, everyone. Well done, well done. Okey-doke. So we're going to peek ahead. Oh, yeah, and sorry-- a funnel is not super useful if it just has one step in it. So we're going to look at the event name, right? Eventually we're going to look at that event name that ends in spend virtual currency. And we'll look ahead one spot, and we'll call that one, you know, S1. And then we'll look ahead one other spot, and we'll call that one S2. So we'll get this nice little triplet of they did this, they did this, and then, boom, that ended with spend virtual currency. So here we are. Here's where we introduced the filter that defines the endpoint for our funnel. And that starts to generate these cool little triplets. They log in. They spun some slots. Spend virtual currency. So that's cool, right? Now we know a whole bunch of tricks. But the question now becomes, you know, great Chad. We now have like how many results? Like we now have 1,000-- here, down here at the bottom. We now have 1,057 different things that ended up with spend virtual currency. Which one is the most important? Well, fortunately with the tricks that we learned before, we can figure that one out. Here let me expand this a little bit. So what we're going to do is we're going to take that same query that we had before, and we're going to wrap it. We're going to make it a subquery. And we're going to wrap it in a larger one where we select the elements that we want. We count the ones. And then this code down here goes with the outer query, right, because it's outside. See, because it's outside of the parentheses. We'll order by count descending. So here we go. So here is the most popular or the most often used trajectory that ends in spin virtual currently, goes with select content, and round completed. Round completed, that's interesting. When they complete a round, they're more apt to spend virtual currency. There is a bit of interesting information that-- you know, now you can create these funnels and figure out where are the places that we need to invest. Or maybe you look at this and you're like, oh, you know what? I really shop page viewed. I'd really rather this be the funnel that drives the most value. What do we need to do to our design to push that one higher up on the list? OK, let's go back to the slides real quick. So we did that. Everybody gets a checkmark for number six. Well done, everyone. You've accomplished something today. And now we're going to kick it right over to Felipe, who's going to show you how to take one of these queries and customize the dashboard with it. So over to you, sir. FELIPE HOFFA: Thank you, Chad. Hey, I'm loud now. Thank you, Chad. How are you doing so far? Good. Yes, I love Chad. I love Eugene. I try to be quick, because I love speaking. I'm a developer advocate. But today you have the chance to ask questions to my favorite product managers, so please get your questions ready. Who is familiar with Data Studio? Good. Good, good, good. Are you using Data Studio with Firebase already? OK, let me show you. So we've been writing a lot of queries. Just for the ones that don't know data Studio, I want to connect it to BigQuery, and show you how to extend the two with the queries we've been running. So if you look for the Firebase Data Studio report template, you will find this page. And this will point you straight to the template. And the template is pretty nice. It's connected to a sample data set. You can play with it. You can run your filters. You can-- Someone already figured out interesting visualizations to do with existing data-- dashboards, advanced, conversions. Now, this doesn't have my data. So if I want to pull-- connect it to my data set in the query, I just need to click use template. And instead of connecting it to an existing data source, I can create a new data source where I choose BigQuery. I choose my billing project. I choose mine, where my data is. And then I get the template that I can edit and change in any way that I want. And I'm getting visual results. I can share this with my team. I can let my team filter by whatever they want. Let's get all the Android results. Perfect. This goes on the background to BigQuery [INAUDIBLE] things. So let's look at events. And here I can search for an event name. There's post score, for example. And I can only look at those events and whatever name I want. Where the score, for example, is wherever I want. And this is pretty cool. Now I don't need to write SQL queries anymore. Except that there are many things that I cannot solve just with an interactive report. Sometimes I want to write smarter queries, more complex queries. And we can get back to BigQuery for that. So if I get back to BigQuery, one of the cool things you will notice is as Data Studio is running queries on BigQuery on your behalf, you get to see all the queries that Data Studio is running. So I can open the queries here, and I can-- let me format this query. You can see the queries that Data Studio is writing. You can use this to write your own queries. Even better than that, you can figure out sometimes how the unnests are done. Now in this case, in Data Studio I could not filter for two different properties. So I wrote my own query. Here I'm looking for three things. I'm looking for an event. I only want to look at the event name when the round is completed. And I want to be able to look at the score to filter out if people are spending money or not and how many powers they have. And I can run the query here. And these are my results. Like, when the round is completed, my first result I had a score of 4,000. This person did not spend any money. And this person had this many powers. And my next challenge here is that I want to group this. I want to count. I want to get my averages. Grouping by is a spender is pretty easy. I can just group by is a spender or not. But how do I differentiate if I have people that use a lot of powers or people that are not using a lot of powers? First you'll notice that-- as Chad showed you earlier-- I'm running my unnest. Not at the from level. I'm not doing joins between the unnest. I'm just doing these mini selects at the select level where I go through these properties. And I just choose the ones I want. And then with these lines that I will uncomment here, I'm getting the average score. I want to know what score are people getting. Oh, how did I click that? I don't know. So I'm here getting the average score. I'm getting the variable if they are spending or not. And I'm differentiating between people that have a lot of powers or not with an if sentence. If they have less than 20 powers, it's few powers. If they have more than that, it's many powers. And I will be able to group by those valuables. So let's see our people here. Let me run the query. Here I have my results. This is the average score for people that are not spenders. And they are divided by are they using few powers or many powers. And these results are hard to understand. Because I have the numbers here, but it would be really nice if I could visualize them. And now in the new BigQuery UI, we have this handy button, Exploring Data Studio. And whatever queries I'm running, I can bring back to Data Studio and just start visualizing my results. So let me do a scatter chart here. Let me get on the X side I will get the average score. Let me add in another column here. I just want a count. This is my count of how many people I have in every block. Because, yeah, I have my four groups. But I don't have enough data. Like, it would be nice to see how many people we have in each group. Let me go back to exploring Data Studio. Again, I'm getting my results here. I'm going to do a scattered chart. On X, I want to get the count of people. On Y, I want to get the average score. And to make it easy to differentiate between people that use a lot of powers or not, I can add that dimension here. So now I have my four different groups visualized. Now I can even style this to get these people in a different color, depending on if they are spenders or not. Let's do it like this. And now my results make sense. Like, who's getting the highest average score? People that have spent money. So it's good to spend money. And that of many powers. Even then people have spent money and don't have money-- but only have few powers-- you can visualize here that get lower scores than people that are not spenders and have the situation. Now, this is a pretty visualization, and I want to use it in my larger dashboard. You can just save this. You can do-- let me wait until this saves. It saves. Now, I can export the report to an existing report. The one we just created here I will edit it. I will create a new page, because every page is super full of data already. And here I can just Control, Paste, my new visualization. And that's how simple it is to go from Data Studio to the BigQuery, create your queries, visualize them, and paste it in your new reports. I think it's pretty, pretty powerful. And then the rest of your team can come here, use it, and they don't need to learn anything about how to query BigQuery. We just do this for them. Two important concepts here is that in this report every time someone changes one variable, filters by different features, that's going back to BigQuery. That is taking time. That takes some cost. And if you have a lot of people looking at this, your costs could go up. There are two things that you should do. One, is when you're using BigQuery, please set up your cost controls. We have nice features there that will limit how much you're spending. You won't wake up to any surprise. Also there there's a lot of tricks to make the spend use when you're using Data Studio to go almost to zero. I made a presentation about that two days ago. It's already on YouTube. Look for it on YouTube or on my Twitter account, @FelipeHoffa. [MUSIC PLAYING]
Info
Channel: Google Cloud Tech
Views: 9,476
Rating: undefined out of 5
Keywords: type: Conference Talk (Full production);, pr_pr: Google Cloud Next, purpose: Educate
Id: L5J6RwQ_Y-M
Channel Id: undefined
Length: 46min 23sec (2783 seconds)
Published: Thu Jul 26 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.