[UPBEAT MUSIC PLAYING] CHAD JENNINGS: Hey, everyone. Thanks for coming. My name is Chad. I'm a product
manager on BigQuery. And thanks for spending the
next hour of your lives with us. We're going to-- I've got joined by a
couple of colleagues here, and we're going to spend
the next roughly 40, 45 minutes going
through Firebase Analytics, the Firebase
Analytics schema, and showing you some
really slick SQL tricks to be able to deal
with the schema. So how many folks
here are, like, primarily Firebase
Analytics people? Cool. How many people here are
primarily SQL analytics folks? OK. This talk is for you. And it's a 300-level
talk, so we're going to be swapping between
demos, and lots of SQL code, and back and forth between
that and a couple slides. And there's very little
marketecture in this talk, right? [LAUGHTER] Yeah, that generally
gets a shout out. All right, so here's what we're
going to do with the session. We're going to go through
the schema in detail. It's a highly nested
and repeated schema. And so if you don't know the
tricks about working with that, it can be a little bit daunting. But hopefully at the
end of this session you will have completely
demystified it. We're going to go through a
whole bunch of queries that are outside the scope of the
standard Firebase Analytics dashboard, and then
cover the SQL techniques. And then at the
end, Felipe Hoffa is going to go through how to
take the Firebase Analytics template that's
available to everybody and then customize it with
whatever query you want. And he'll do an example from
the session here, all right? So your presenters are
me, I'm on BigQuery. Eugene Girard is Product
Manager for Firebase. And then Felipe Hoffa, literally
if you type Mr. BigQuery into Google, you get him. All righty, I won't
spend a ton of time on these kind of
introductory slides. But for the benefit of
those watching on YouTube, and for those who
don't know, we'll just do a little bit
of high-level stuff. So Google Analytics for
Firebase is the agent by which we collect data
that comes from your app, do some aggregates,
and then produce the stats that you can then
render inside the template or wherever you want. But you have a question. One of the problems that
folks encounter here is what if you get
to a question that's not contained in
that template or what if you get to our
question that's kind of outside what Firebase
produces out of the box. What happens if
you want something that's not on this dashboard? And that one is enter BigQuery. So BigQuery is Google Cloud
platform's enterprise data warehouse. It's where big data comes
and lives and thrives. It goes super fast over
super big amounts of data. It also goes really fast
over smaller amounts of data. So Firebase is an analyst from
folks working over gigabytes to fix folks working
over petabytes, works equally well for both use cases. Couple of details here. It's server-less, which
means you don't have to worry about provisioning resources. You just pipe your data in
and bring your workloads, write your queries,
and go for it. So you don't have
to spin up nodes. We take care of all of that
in the background for you. This one is the only call
it marketecture slide. And I actually think this is the
most useful marketecture slide that we have, because GCP
has over one hun-- yeah, now the phones come out, right? [LAUGHTER] GCP has over a hundred
products right now. So there are over a hundred
little blue hexagons. And I don't know
about you all, but I can get a little
bit confused when I'm trying to interpret
what these icons mean. So I wanted to put
this up here to show you kind of the spectrum of
storage and database products that we have and
where BigQuery sits. So BigQuery is over here as
the Enterprise Data Warehouse. So it behaves somewhat
like a relational database, except you can go
arbitrarily big. Really good for analytics,
really good for porting, really good for dashboarding. So yeah, the old story was this. New story is this. And you can ask it kind
of any question you want. You don't have to do indexes. You don't have to pre-aggregate. You can do full table scans
on whatever question you want. Cool. So with that, we're going to get
into the next part of the talk. So Eugene is going
to come up, and he's going to talk a
little bit about SQL and then go through the schema. All right, tag. You're it. EUGENE GIRARD: Awesome. Thanks, Chad. So obviously, we're
talking about BigQuery, so we're going to be
talking about SQL here. And the important
thing about SQL is that you've got to
know your data if you're going to make use of it. Little-- thank you. So when I started learning
SQL, way back in university, I ran a lot of queries that
kind of looked like this where you select some stuff from
a bunch of rows and columns, and you do a select
statement that said which column, which
rows you were actually interested in. And then you add it
up with something down at the bottom
that kind of showed you what you're looking at. The thing there is
that it was always kind of a very grid-like look. You have all these rows
and columns of data, and you kind of
know exactly what you're going to get out of it. Now that's got me through CS
322 just fine a few years ago. The world's changed, and we've
got a slightly more robust and capable engine in BigQuery
that can represent a lot more than just rows and columns. So a lot of these things are
things like key value pairs. There are things like
records or objects. There's things like arrays
of objects or records or key value pairs
or all those things. And all of that gets
a little involved. So you have to really
understand your schema if you're going
to be able to make use of a big table like this. So this is the schema that we
generate as part of analytics from Firebase. And the thing here, we're going
to go through a few of these. A lot of these things we
really don't care about today. And you can kind of ignore
the parts you don't need, that you're not
using at the moment. But we're going to focus
on a couple of these. So right off the bat, there's
a couple of key value pairs that are really important. There's things like the date
and the timestamp, and also the event name. Every event that the developer
emits in their Firebase app gets its own unique
kind of schema here. And we always key
off the event name to say what kind of an
event are we looking at. Afterwards, there's a
bunch of event parameters. This is an array. You can throw in a many
things you want here. And in those, there's
a name for the key. So that says-- for
example, here we're looking at the final score
when your level is done. And that's going to have an
int value associated with it. Just to make these easier
to use we throw in, like, four different values. Only one of those is ever
going to be non-null. Every single final score
will always be using the end, and all the others
are going to be null. And something like
a level name we'd be using the string value and
all the others would be null. So this makes it very
predictable from event to event. But the actual values
that are filled in you're going to have to kind of know
your schema for your own events that you're generating
in your app. And this is where your own app
can specialize and generate the kind of events
that you care about. So we also have a user ID. This is something that the
app can set if they choose to. And we also generated a
pseudo ID for the user. The pseudo ID is unique
to that user for one run of the app on one device. So this is what we use to tie
in a bunch of events together to say that they're from
the same run of that app. And this is the one that
we're keying in all the time. The user ID, again,
it's optional if you use that or not. In our sample data we've
actually nulled all of those out, because we don't want any
PI leaking out in our demos. So we have a slightly
sanitized, pseudonymized version of the data that we're
going to be working with. There's also a bunch
of user properties that get associated with
that row in the database. And the nice thing about
throwing the user properties here is that those
properties change a lot over the run of an app. And this way you can
see what the properties were at the moment that
that particular event fired. And that makes it
really easy to have complex things like the
experience level, which for this user is
going to change. Basically every time
there's an event, their XP is going to be slightly
different at that point. So it lets you see at
that moment in time what the values were. We also have whether they're
a premium player or not. All of this data is up
to the app developer to decide what they
want to stow into this. We also track a bunch
of useful metrics. Firebase supports a lot
of different devices and a lot of
different platforms. So it will let you track
down where the users are, what devices they're using,
where they came from, which exact version of
your app they're using. If you're doing
A/B testing, that would also show up
there, the platform that you're running on. All of this kind that
goes into the app itself. And so every app
that you write is going to have its
own unique concerns about what the user's up
to or what it's doing. And these are the kind of
things that you're going to want to ask questions about. So here's an example of
the Bingo Blast game. It's a game that's
out in the app store. You can run it and play it. It's actually kind of fun. But we use that to demonstrate
some of the capabilities of Firebase. It happens to spit out
a couple of events. So every time you finish
around in Bingo Blast we create a round
completed event. And we tie-in a bunch of
things like the score. Whether the game was a
solo or a social game-- the type of game
that you're playing. We also keep track the user
data down at the bottom, whether or not this user happens
to be signed in right now. And if they are, we want to
know whether they have ever spent money on the game. Because at the end
of the day, that's a very interesting
metric to be looking at. We also keep track of things
like XP, which is really a measure of how long they've
been playing the game, and how experienced
you'd expect them to be. So with all of that
background, I'm going to turn it
back over to Chad and ask him to walk you through
how to use this kind of data. CHAD JENNINGS: The next
section of the talk-- thanks Eugene. The next section of
the talk is we're going to go through a
series of questions, I think back to
the deck, please. We're going to do
questions at the end. Oh, and also-- we'll
remind you of this-- when you do questions,
come up and use the mic. But we'll do them
all at the end. So speaking of questions, we're
going to go through a few. These are the questions
we're going to do. We'll start-- we'll
do them in order. These are crafted
in such a way as to literally run
us into a problem, and then show you how
to fix the problem. Run us into a new problem. But we've played-- oh, and all
of the code for all of these is in the GitHub repo. And the link to that repo will
be at the end of the talk. But to save a little bit of
time for Felipe's section on Augmenting Data
Studio Templates, we're going to take question
four about filtering by a range of properties
and question seven about closed funnels,
and we're just going to leave them for
whoever wants to play with that in the GitHub repo. The codes already there. And what we're
going to do is we're going to do number eight,
Customizing Data Studio Template. And we're going to do that
with question number five. So the bulk of the
talk is just running through this henceforth. Okey-doke. Now, let's switch to the demo. Okey-doke. So here is the BigQuery UI. And what we're
going to do is we're going to run through a series
of queries to get to the answer to question number one. So for those not
familiar with it, this is the query composition. Oh, let me just do
an eyesight check. So for the folks in the back of
the room, is this big enough? I can make it bigger. Would anybody like this larger? OK, no hands. We'll go with this. So anyway, this is where the-- this is where you
compose the queries. I've run all of
these in advance, so the results are already here. But what I want to show with
this one-- so select star, also this is a great thing to
do for demonstrations. I don't recommend selecting
all the columns, right? BigQuery is a columnar store. Just pick the ones you want. Because select star
queries can get expensive. So not a best practice, just
an example for the talk. And so when you
run this query, you can see that we've got
an event right here. And then I'll show you how
that nested and repeated field, right? And so here is the
country, the device. And all the way out here you can
start to see that in this row there are a bunch of these other
nested and repeated fields. All right. So how do you start
selecting these things? Okey-doke. So first off, what I'm going to
do, like, just a simple where. We're going to select
round completed. OK, so down here on the query
window, let's see I have to-- for some reason it's not
showing the-- oh, there we go. For some reason it's not
showing me the scroll bar. If it showed-- it
was showing me the-- let's see if I can
get that to refresh. Well, if it was showing
me-- there it is. All right. So event type, like, here are
the basic parameters here. Here's round completed. But then still we've
got all the events, like the possible event keys. And you can see that,
right, this round completed. There is data here
for bingos claimed. And then further out-- I won't bother
scrolling out here but-- I guess it will-- further out here we've
got the user events. So that's still--
like, the result here is still a nested
and repeated field. So what we want to
do is let's just pick out all of the event-- or the event params where the
key is equal to score, right? We want to find out-- like, the goal of this question
is, like, what are the scores? And let's do some stats on the
scores that people are playing. And as you can tell, right,
we're getting an error here. So why are we getting
an error here, right? One of the event
parameters is score. But the problem is event
params is an array. So you can't-- you have to
do something to event params first. All right. So let's go ahead and
switch back to the slides. And we'll walk through what you
have to do with event params to get there. Can we switch back to
the slides, please? All righty. So here's event params. And here's the problem,
there's no .key property. It's part of this. So we have to
extract that first. And so unnest to the rescue. So for those of you
who are SQL-minded, unnest acts like a cross join. For those of you who
are table-minded, unnest turns nested and
repeated into a table. So let's see how that works. So in this example, we want
to find all the starships where crew member equals Zoe. So if you do it like
this, this is what we just did in the BigQuery UI. If you want to do this,
you're going to get an error. So you've got to
unnest it first. So select star from starships. That was what we did in
the very first query. And if you cross join-- if you wrote this out, right,
this would actually work. You know, so this will
actually produce this table where you blow out--
you create a new row for every member of the
event params with crew names. And then you can filter on
that by picking the crew member that you want. But they're simpler syntax
to do that where you can just use this unnest command. Again, all of this code
is in the GitHub repo. So feel free to take pictures,
but you have access to it at the end. And now you can filter on the
parameter you want, right? So instead of just doing this
filter on crew member as that first-- like, in that
first letter query-- unnest first, and then you can
filter on what you want, right? Turn the array into
a table with unnest, and then you filter
the way you like. There we go. And who would have thought
that Zoe likes wingspans of 32.4 whatever the units are. OK, so in the context
of our example here we want to do the
same thing, right? We want to pick the score param. So unnest event params,
and then filter-- or do the where-- on score, right? And that filters out
the row that we want. OK, oh, and then here
one thing I should say. If you select star here, you're
going to get the event params array in your answer. And that can just make your
answer a little bit messy, like a little larger
than you need. So there's a neat tip
here where you select star except for the event params,
and you just cut that out. Up to you. I thought that was a cool trick. All right. Let's go back to
the demo, and we'll get to where we want to go. So here's that same
command where we unnest. And then you can see what's
happened here to the results, right? The results now
look like a table. And then we can look-- what we're going
to do next is we'll look through the
param.key key and find all of the score events. All right, so here it is. So unnesting where event
name equals round completed and where the param
key equals score. And there we have it. OK, so that's pretty cool. So that's how you deal with
the top level of schema. But remember, the goal was
we want to do something with these numbers. So we'll go ahead and select. Instead of just the
event param, we're going to select the-- like,
instead of select star, we'll pull out the stuff
that we actually want. And as Eugene showed
you, right, we'll just pull out the
int value for score, since that's the one that's
not null for this parameter. OK, cool. And then we can actually
do some arithmetic on it right in the select. So here we have--
so we're calculating the average of the int value. And we're calculating
the standard deviation, and then returning those. So, I mean, as
analysts on a game, you can then run this back
to your game developers and say you know, hey, the
standard deviation is way too high. I don't know, maybe that
translates to a bad user experience in some way. But this is data
that you can start to use to see whatever
you want to see. In any can, let us now-- let's see what the
message here is. We're going to go ahead
and start number two. Let's go ahead and switch
back to the slides, please. Cool. All right. So finished number one. On to number two with,
how does this score correlate with the
user's experience level? OK, back to the demo, please. Okey-doke. So starting with the same
query that we had before-- right, this is
exactly the same one-- we want to correlate this
with user's experience. So we're going to
have-- so remember, the event parameters
and the user experiences were in two different-- remember
from what Eugene was talking about-- those are in
two different nests inside a single row. So we're going to have to unnest
that second parameter here-- or sorry-- that second
array I should say. Well, boy-- like, not
clairvoyance, right. I just managed to
get that right. And so here we go. We're going to unnest
user properties. And then you can see
that that blows out, or it just increases
the scope of the table. So the answers include the
user properties as well as the event properties,
again, filtered by a score. So I'm saving myself the trouble
of running every query here live. But now how do we
correlate between the two? All right. So next step is go ahead and
filter by the user property that you want. So we wanted to correlate
score with experience points. So we'll go ahead
and filter those out. So we get the table. So for round completed
events, here's the score. Here's the user
experience points. And, yeah, this step-- oh, yeah, we should
point out almost all of the user parameters
come back as strings. So before we actually do
some computation on it we have to cast the
string as an int, and call that experience points. And here is the beginning
of the next one. Sorry, here's the beginning
of the next trick. We'll go through this one
in a little bit of detail in just a second, but
I'll introduce it here. So in that last query, right,
we selected event name, and then the params as the score. Then we cast the
experience points into it. So we've got all the
data that we want here, but now you have to
calculate the correlation between the two. And for those of
you who aren't SQL, or who weren't born
speaking SQL like me, a very clever way to do
that-- and for those-- the other problem with
this construct in SQL is it's kind of
counter-intuitive if you've learned another
programming language first. And so what this is
called is a sub select. So what I've highlighted
here is the query that generated the
table that we wanted. And then you can take
that inside a SQL query and write a query
on top of that, which is this top row here. And then do operations on
the result of the subquery. So that's kind of cool. We'll go through that in
a little bit of detail. But here's the correlation
between experience points and score. So you can see that folks
with a lot of experience have a slightly positive
correlation to better scores. I don't know, bingo
is mostly random. But if you get the hang of
it, you're better than random. I guess that's what that says. So for those of you who
gamble on bingo, beware. That was a joke everyone. Thank you. All right. Let's go back to the demo. I'll just queue up the
next query-- oh, sorry, go back to the slides. And we'll go through
what the sub selects are. So Yeah, correlation of
score with experience. Subqueries make life easier. So here we'll go through it. So we have this
schema for Google-- yeah, the schema for Firebase
Analytics inside of BigQuery. And then you do some
operation on it, right? That was that query where
we pulled out the parameters that we wanted, where
we did the unnesting to turn it into a table, and
we get the table that we want. Now we want to do an
operation on this. In the example, this was
doing the correlation between experience
points and score. So we want to turn that
into this other table. And we do that with
a query that goes around the outside
of that first one. So here, I'll just
get to this story. So here's a long one. We're not going to go
through this one in detail. But the way you read queries
in SQL is not the way you read Python or-- like in my case-- Matlab or C or C sharp, right? Where you define stuff at
the top of the file and then you kind of go down. In SQL you start here
kind of at the bottom. Like, you read from
the middle out, right? So that's the base query
that's going into the table-- in our case, like, pulling
out and unnesting stuff. And then you operate on that
with a query that's above it. And then, you know,
your final query you're doing a little bit of
counting and then grouping by. But, like, realizing that
this is how SQL worked was a total watershed
moment for me. And so hopefully that's helpful
for the folks who are, like, deeply steeped in writing
games but not so steeped in writing SQL. Very, very helpful trick. All righty. So we'll go back to the
code to finish number two. Actually we already
finished number two. Let's go back to the
slides and we'll move on. All right. So we covered the correlation. Next one is how do we break
out score by game type? All right. So switching back to the demo. So you would think this one
would be super easy, right? We just figured out this unnest. Unnest works
beautifully when you've got two different
nests in the same row. But this problem is-- sorry, this question is actually
going to run us into a problem where game type is actually
stored in a different row. So we'll have to see how
we deal with that one. So here is the way to do it. So in a subquery,
in this line, we're going to go into our
event params table and pull out all
of the event params where the key equals score. And so that's kind of what we
did in the previous example. And now since we
need game type we have to go-- we
have to unnest again and pull out the game type. So you'll see this come up,
especially in Felipe's section. And we'll go through this
in a second in the demo. But when you-- so here
we pull out event name. We unnest. So we've now turned
these two nested fields into a really big table. And then we can also-- we can clean this up again. Give these some names to
make the SQL a lot readable. By the way, these aliases-- super useful to make
your SQL readable. It's fun. As a product manager
working on BigQuery, I work with people who are
literally born speaking SQL. And so they don't do stuff
like this, because this all makes sense to them. And then they send
me a query like, hey, Chad what do you
think of this query? I'm like, I can't tell. So if you want your
code readable, do that. That's a good thing. And so here we're
using the sub select. So in this, how
many lines of code? One, two, three, four, five,
six, seven lines of code. We're doing four-- sorry--
three subselects and one select. So you can tell this
trick of using subselects is really, really prevalent. So select from unnest
to get the score. Select from unnest--
whoops, sorry-- select from unnest to
get the type of game. And that is all executed
inside this subquery. And then we're filtering
to get the round completed. And then in the outermost
query, we're doing the math and getting the average
score for the game type. And so here-- down here, like,
here's the result, right? So we see-- and again, this
is the thing you take back to the developer team. If you're on the
analyst team you say, look, folks are doing
much better in the social game than they are in the solo game. Is that what we want? Is that the good thing,
or is that the bad thing? But having this detail,
you know, it's good. Especially if that causes
you to make a change or to double down on a
behavior that you like. All right. Make sense? Cool. Again, all the
code is in GitHub. So you can sit with that
as much as you like. All right. We'll go ahead and switch
back to the slides, please. Let me just get this window
set up for the next one. All right. Cool. So select from unnest. Let's go through a little
bit of an example of how this operation really works. So here we are. So selecting event name
from the events table, from this big table. And you see the gold
bricks here represent the different key
values in the array. So we'll go ahead and unnest
event params and search on score. Come along. There we go. All right. So we unnest the key. And then we're going to
filter that and just pick out where key equals score. There we go. And that will put in a score
column and expand our table. So, again, turning the
nested array into a table. And then we're going
to do that again. All right. Now we're going to go in
and get type of games. So still going into
the event params, and this one we're going
to pull out type of game. And that adds a
column over here. And now this is kind of
like a regular table. I find this one much easier to
hold in my head conceptually. And then applying,
you know, filters where event name
equals round completed or where the game
type equals whatever, like, that kind of
operation in SQL is far more intuitive to me. So once you get
through this part-- like once you get everything
into this kind of tabular format-- then operating on it, I
think, gets a lot easier. All right. We'll go back to the demo. Actually, to be honest,
I think we've finished this part of the demo already. So why don't we stay
on the slides, please. Yap. So we finished breaking
out the score by game type. We're going to skip over four. Felipe is going to do five. So now we're going to literally
cut right to the money, and we're going
to do an analysis to figure out what are the
patterns that end with spending virtual currency, right, since
this is what makes the world go around in some circles. Okey-doke. Now we go back. Thanks. All right. Here, let me go ahead
and maximize this so that everybody
can see a bit easier. OK, so again with
the select star. Just to show you
what the report-- sorry-- what the
result looks like. And we're looking
for event names where the event is
spend virtual currency. Here we go. That filter works. And then, like, as
you see out here, like, event params has
a bunch in the array. So let's start
working with that. So even if you want to just
take a quick look, you can-- pulling out event names,
user pseudo ID and timestamp. And then ordering by
pseudo ID and timestamp, you can start to get a look
at what the things were that a particular pseudo--
or a user pseudo ID went ahead and did, right? Because we're
ordering by timestamp, right, they initialized the
API, session start, logging in. The problem with this is that-- I can't quite show-- is that you can actually
get some mixed results when people-- or different people-- are
interacting with the game. So you've got to be a
little bit clever here. All right. So let's go back to the slides. We're going to introduce
another topic calling lead-- are called Using
LEAD to Peek Ahead. So this is a trick
that we're going to employ in the demo in
just a couple of minutes. So what we want
to do here, right, because we're trying to
identify-- like in the demo we're going to identify
these strings of events. So the user did this, then the
user did that, user did this. And this is paramount to, like,
looking ahead a little bit in the database. So with our friends-- actually with the exception of
Mortimer, this kind of looks like names from a pop band. I don't know Mortimer. Maybe Mortimer could be
a good pop band name. Any case, we're going to take
we're going to take this, and we're going to figure out
how we look at the next age. So the command to do
that is lead by name. So we're going to look ahead
by one space over order by age. So we're going to
order this table by age and then look ahead one space. So in memory, BigQuery
creates this other table where it's ordered by-- there we go-- where
it's ordered by age. And then you can see that
as we look at Mortimer, the next oldest person
in this list is Summer. So that's going to be the
answer to our question of, like, what's the next one
looking ahead in the Database And then we assign that. We can create a whole new
column here as next oldest. So let's go back
and see how that fits into our construct of doing
this work in our example here. So-- whoops-- so here we go. So we have lead over-- so we're going to order by
pseudo ID and event timestamp. And so you can see here that-- so when the event name is,
like, initialize the API, looking ahead-- like, looking
at the next event for that user ordered by timestamp-- you can see that the next
event is session start. So you can actually
start to get a look at a funnel for this
particular user ID, right? But that's not-- like, we
don't want to look at what, you know, Bob, Jane,
Sally, Phil, whatever, what that person did. We want to look in aggregate
and see what our users are doing and which one is the most
popular funnel, or most popular path through
our game that ends in spend virtual
currency, right? Make sense? All right. Here's another
trick, by the way. Including this
partition by command guarantees that the segments
of the table you're looking at are partitioned on
the user's pseudo ID. If you don't put
that, you're not guaranteed to get all the
block of user that ends in BA5. You're not guaranteed to get all
of them together in one chunk. All right. A very useful tip. I'm sure that's going to save
somebody, like, one all-nighter of heartache right there. That's our good deed
for the day, everyone. Well done, well done. Okey-doke. So we're going to peek ahead. Oh, yeah, and sorry-- a funnel is not super useful
if it just has one step in it. So we're going to look
at the event name, right? Eventually we're going to
look at that event name that ends in spend virtual currency. And we'll look ahead one
spot, and we'll call that one, you know, S1. And then we'll look
ahead one other spot, and we'll call that one S2. So we'll get this nice little
triplet of they did this, they did this, and then,
boom, that ended with spend virtual currency. So here we are. Here's where we introduced
the filter that defines the endpoint for our funnel. And that starts to generate
these cool little triplets. They log in. They spun some slots. Spend virtual currency. So that's cool, right? Now we know a whole
bunch of tricks. But the question now becomes,
you know, great Chad. We now have like
how many results? Like we now have 1,000--
here, down here at the bottom. We now have 1,057 different
things that ended up with spend virtual currency. Which one is the most important? Well, fortunately
with the tricks that we learned before, we
can figure that one out. Here let me expand
this a little bit. So what we're going
to do is we're going to take that same
query that we had before, and we're going to wrap it. We're going to
make it a subquery. And we're going to wrap it in
a larger one where we select the elements that we want. We count the ones. And then this code down here
goes with the outer query, right, because it's outside. See, because it's outside
of the parentheses. We'll order by count descending. So here we go. So here is the most popular or
the most often used trajectory that ends in spin
virtual currently, goes with select content,
and round completed. Round completed,
that's interesting. When they complete
a round, they're more apt to spend
virtual currency. There is a bit of interesting
information that-- you know, now you can
create these funnels and figure out where are the
places that we need to invest. Or maybe you look at this and
you're like, oh, you know what? I really shop page viewed. I'd really rather
this be the funnel that drives the most value. What do we need to do to
our design to push that one higher up on the list? OK, let's go back to
the slides real quick. So we did that. Everybody gets a
checkmark for number six. Well done, everyone. You've accomplished
something today. And now we're going
to kick it right over to Felipe, who's
going to show you how to take one of these
queries and customize the dashboard with it. So over to you, sir. FELIPE HOFFA: Thank you, Chad. Hey, I'm loud now. Thank you, Chad. How are you doing so far? Good. Yes, I love Chad. I love Eugene. I try to be quick,
because I love speaking. I'm a developer advocate. But today you have the
chance to ask questions to my favorite product
managers, so please get your questions ready. Who is familiar
with Data Studio? Good. Good, good, good. Are you using Data Studio
with Firebase already? OK, let me show you. So we've been writing
a lot of queries. Just for the ones that
don't know data Studio, I want to connect
it to BigQuery, and show you how to extend
the two with the queries we've been running. So if you look for the Firebase
Data Studio report template, you will find this page. And this will point you
straight to the template. And the template is pretty nice. It's connected to
a sample data set. You can play with it. You can run your filters. You can-- Someone already figured out
interesting visualizations to do with existing data-- dashboards, advanced,
conversions. Now, this doesn't have my data. So if I want to pull-- connect
it to my data set in the query, I just need to
click use template. And instead of connecting it
to an existing data source, I can create a new data source
where I choose BigQuery. I choose my billing project. I choose mine, where my data is. And then I get the template
that I can edit and change in any way that I want. And I'm getting visual results. I can share this with my team. I can let my team filter
by whatever they want. Let's get all the
Android results. Perfect. This goes on the background to
BigQuery [INAUDIBLE] things. So let's look at events. And here I can search
for an event name. There's post score, for example. And I can only look
at those events and whatever name I want. Where the score, for
example, is wherever I want. And this is pretty cool. Now I don't need to write
SQL queries anymore. Except that there are many
things that I cannot solve just with an interactive report. Sometimes I want
to write smarter queries, more complex queries. And we can get back
to BigQuery for that. So if I get back
to BigQuery, one of the cool things
you will notice is as Data Studio
is running queries on BigQuery on your
behalf, you get to see all the queries that
Data Studio is running. So I can open the
queries here, and I can-- let me format this query. You can see the queries
that Data Studio is writing. You can use this to
write your own queries. Even better than that, you
can figure out sometimes how the unnests are done. Now in this case,
in Data Studio I could not filter for two
different properties. So I wrote my own query. Here I'm looking
for three things. I'm looking for an event. I only want to look
at the event name when the round is completed. And I want to be able
to look at the score to filter out if people
are spending money or not and how many
powers they have. And I can run the query here. And these are my results. Like, when the
round is completed, my first result I
had a score of 4,000. This person did not
spend any money. And this person had
this many powers. And my next challenge here
is that I want to group this. I want to count. I want to get my averages. Grouping by is a
spender is pretty easy. I can just group by
is a spender or not. But how do I
differentiate if I have people that use a lot
of powers or people that are not using
a lot of powers? First you'll notice that-- as Chad showed you earlier-- I'm running my unnest. Not at the from level. I'm not doing joins
between the unnest. I'm just doing these mini
selects at the select level where I go through
these properties. And I just choose
the ones I want. And then with these lines
that I will uncomment here, I'm getting the average score. I want to know what
score are people getting. Oh, how did I click that? I don't know. So I'm here getting
the average score. I'm getting the variable if
they are spending or not. And I'm differentiating
between people that have a lot of powers
or not with an if sentence. If they have less than 20
powers, it's few powers. If they have more than
that, it's many powers. And I will be able to
group by those valuables. So let's see our people here. Let me run the query. Here I have my results. This is the average score for
people that are not spenders. And they are divided
by are they using few powers or many powers. And these results are
hard to understand. Because I have the
numbers here, but it would be really nice
if I could visualize them. And now in the new
BigQuery UI, we have this handy button,
Exploring Data Studio. And whatever
queries I'm running, I can bring back to
Data Studio and just start visualizing my results. So let me do a
scatter chart here. Let me get on the X side I
will get the average score. Let me add in
another column here. I just want a count. This is my count of how many
people I have in every block. Because, yeah, I
have my four groups. But I don't have enough data. Like, it would be nice
to see how many people we have in each group. Let me go back to
exploring Data Studio. Again, I'm getting
my results here. I'm going to do a
scattered chart. On X, I want to get
the count of people. On Y, I want to get
the average score. And to make it easy to
differentiate between people that use a lot of powers or not,
I can add that dimension here. So now I have my four
different groups visualized. Now I can even style
this to get these people in a different color, depending
on if they are spenders or not. Let's do it like this. And now my results make sense. Like, who's getting the
highest average score? People that have spent money. So it's good to spend money. And that of many powers. Even then people have spent
money and don't have money-- but only have few powers-- you can visualize here
that get lower scores than people that
are not spenders and have the situation. Now, this is a
pretty visualization, and I want to use it
in my larger dashboard. You can just save this. You can do-- let me
wait until this saves. It saves. Now, I can export the report
to an existing report. The one we just created
here I will edit it. I will create a new
page, because every page is super full of data already. And here I can just Control,
Paste, my new visualization. And that's how simple it
is to go from Data Studio to the BigQuery,
create your queries, visualize them, and paste
it in your new reports. I think it's pretty,
pretty powerful. And then the rest of your
team can come here, use it, and they don't need
to learn anything about how to query BigQuery. We just do this for them. Two important concepts
here is that in this report every time someone changes
one variable, filters by different features, that's
going back to BigQuery. That is taking time. That takes some cost. And if you have a lot of
people looking at this, your costs could go up. There are two things
that you should do. One, is when you're
using BigQuery, please set up your
cost controls. We have nice features
there that will limit how much you're spending. You won't wake up
to any surprise. Also there there's
a lot of tricks to make the spend use when
you're using Data Studio to go almost to zero. I made a presentation
about that two days ago. It's already on YouTube. Look for it on YouTube or on my
Twitter account, @FelipeHoffa. [MUSIC PLAYING]
Todd's queries: https://gist.github.com/ToddKerpelman/eec3e60c0a5e0bd720d8991bd45487cb