[THEME MUSIC] FELIPE HOFFA: Are
you ready for this? AUDIENCE: Yeah. Yes, excellent. Welcome to-- MINHAZ KAZI: We can't hear you. Are you ready for this? [APPLAUSE] Awesome. FELIPE HOFFA: Thank you
for that, [INAUDIBLE].. So welcome to the Secrets
of Scaling Interactive and Insightful Dashboards
with Data Studio and BigQuery. Pretty long title. I'm Felipa Hoffa. I've been at Google for
the last seven years. MINHAZ KAZI: Felipe is
the developer advocate for BigQuery, and
I'm Minhaz Kazi. I'm the developer advocate
for Google Data Studio, and I've been here for a year. FELIPE HOFFA: Yes,
welcome to Google, Minhaz. MINHAZ KAZI: Thank you. It's been exciting. FELIPE HOFFA: Happy
to have you here. So a lot of people
know me because I've been doing BigQuery. I'm being the developer
advocate [INAUDIBLE] for the last five years. I love analyzing data. I love visualizing it. And we have a lot
of partners that help us visualize this
data, but there has always been an empty space in my life
until Data Studio showed up. True story. MINHAZ KAZI: Yes. Data Studio is Google's
business intelligence slash virtualization
platform that anyone can use for free to build
scalable, insightful dashboards. And well, BigQuery
makes Felipe happy, Data Studio makes me
happy, and together we would like to make
the audience happy. So here, we would like to
answer this question for you. How can you get your big data
ready to visualize and share with the world? And the answer to that is-- FELIPE HOFFA: Let's combine
BigQuery and Data Studio. But if you want to
combine them, there are certain rules, as when
you want to combine me and Minhaz on a stage. MINHAZ KAZI: Yes, it
doesn't always work. Sometimes, it does. Sometimes, it doesn't, and you
have to follow certain rules. FELIPE HOFFA:
That's right, so we want to share with you
our best secrets when using both together. MINHAZ KAZI: One thing
is that everything we talk about here today are
all information as of July 2018, so if you are looking at this
recording at a later date, please know that new
information might be available. So yeah. FELIPE HOFFA: That's right, so
YouTube people, the principles we're going to show
you here remain, but the relationship with
BigQuery and Data Studio will all improve, as my
relationship with Minhaz. MINHAZ KAZI: Hopefully. FELIPE HOFFA: Hopefully, yes. So let's get started. Let's do some examples. MINHAZ KAZI: So we can
look at this dashboard that you built a few days ago. FELIPE HOFFA: If
anyone likes football, I made this dashboard
with all the goals that happened last month. What I love about
this dashboard is that I was able to put it on
the middle of a Medium post. MINHAZ KAZI: So you
have a dashboard in the medium of a Medium post? FELIPE HOFFA: Exactly,
and it's interactive. So let's say you want to see the
goals from your favorite team, which would be? MINHAZ KAZI: Chile. FELIPE HOFFA: Chile. Yes, I am from Chile, but
Chile did not play this year. MINHAZ KAZI: Oh, OK. FELIPE HOFFA: Give
me another one. MINHAZ KAZI: I don't know. Brazil? FELIPE HOFFA: Brazil, cool. So this is interactive. You can show your
favorite team, and these were the goals of Brazil. MINHAZ KAZI: Felipe,
there are all these dots on the right side. What is the one
on the left side? FELIPE HOFFA: Oh, the one
on the left is an own goal. Belgium says obrigado,
but let's continue. MINHAZ KAZI: Sure, so
this is one dashboard that you built in Data
Studio from the creator and published it. People are using it. We have a few other examples. FELIPE HOFFA: Mm-hmm, Wikipedia. Who knows Wikipedia? If you don't know it,
look for it on Wikipedia. How many pages does Wikipedia
have in a month or this year? MINHAZ KAZI: Billions? FELIPE HOFFA: Millions? Billions? MINHAZ KAZI: Maybe. FELIPE HOFFA: Yeah, so
in this shared table, I have all of the
page views for 2018. And this table has a summary
of page views per hour, and this table has,
already, 31 billion rows, more than one terabyte of data. And visualizing this will
be pretty interesting, and we're going to do that next. MINHAZ KAZI: What else are
we doing to look at today? FELIPE HOFFA: Stack Overflow. You know Stack Overflow? MINHAZ KAZI: Yes. FELIPE HOFFA: Yes. MINHAZ KAZI: That's where
I spend most of my time. FELIPE HOFFA: [INAUDIBLE],,
so again, a lot of us spend a lot of time
on Stack Overflow, and I want to see the
most current trends. Where are all the
page views going? And again, a publisher
dashboard, a medium, it's interactive. Let's look for
your favorite tag. MINHAZ KAZI: Angular? FELIPE HOFFA: Angular? Angular.js? MINHAZ KAZI: Let's take the-- FELIPE HOFFA: One
of the top tags? Is it going up or down? MINHAZ KAZI: Show
of hands-- who here thinks Angular is going up? And who here thinks
Angular is going down? FELIPE HOFFA: Oh, come on. Angular.js is going down. But wait a minute-- right below Angular.js,
there is a tag. MINHAZ KAZI: Different
tag called Angular. FELIPE HOFFA: Angular, yes. MINHAZ KAZI: Let's
try that one out. FELIPE HOFFA: So the
Angular team switched tags. MINHAZ KAZI: Everyone is right. FELIPE HOFFA: Exactly,
everyone is right. It's going down and up. But Angular as a technology
is going up and to the right. MINHAZ KAZI: Cool. FELIPE HOFFA: Which
is pretty good. MINHAZ KAZI: So we have
also other datasets, like Google Analytics
for Firebase. If you have your own data for
Google Analytics for Firebase for your app, you
can visualize that by putting the data in BigQuery
and pulling out that into Data Studio and having a dashboard. FELIPE HOFFA: Exactly. So it's pretty good. You can also use these
techniques for private data, and we're going to show
you a little bit of that. MINHAZ KAZI: Yes,
and the last dataset that we have a very
interesting one. It's the Chrome UX report. The Chrome UX team
has anonymized data up for all the Chrome users who-- for you can basically see, for
five million different URLs, the actual website usage
performance, like how fast does it take to-- for that website to load. And you have that data for
five million different URLs, and you can look at it
from different countries, by different connection types,
and by different devices. FELIPE HOFFA: So this
is millions and millions of observations,
anonymized and shared with BigQuery by
the Chrome team, and you can look at
three million websites? MINHAZ KAZI: Five million. FELIPE HOFFA: Five
million websites. MINHAZ KAZI: And you can have a
dashboard that looks like this. FELIPE HOFFA:
[INAUDIBLE],, and here, you can see how the Google's
homepage performance changed through time. MINHAZ KAZI: Yes, so from
October 2017 to February, most of the times it
was 80% people had a fast performance for the
first contentful paint, but then something
happened in March. It went down, and then it
slowly went back up again in June 2018. FELIPE HOFFA: There
was a regression. Someone fixed that pretty fast. But the fun thing
is, on one hand, it's not easy to
write these queries, because how do you
query anonymized data ? It's a little different. But you can make everything easy
for users with a Data Studio report. MINHAZ KAZI: Yes,
and we will show you how you can see the same
thing for your own website. FELIPE HOFFA: Cool. MINHAZ KAZI: So, Felipe,
these are all big data sets. FELIPE HOFFA: Mm-hmm. MINHAZ KAZI: I have
a question for you. What is big data? FELIPE HOFFA: Do you know what
we call big data at Google? MINHAZ KAZI: No. What? FELIPE HOFFA: We call it data. MINHAZ KAZI: I see
what you did there. FELIPE HOFFA: Yes. MINHAZ KAZI: All right,
I can get behind that. FELIPE HOFFA: We'll have data. We deal with huge
datasets all the time. MINHAZ KAZI: So what do
you use to deal with data? FELIPE HOFFA: We have
amazing tools inside Google. We share them with the world. One of them is BigQuery. We use it to analyze
terabytes of data. Everyone here is
familiar with BigQuery? OK, so many hands. I think you might be
at a Google conference. So it's-- BigQuery's awesome. You can analyze a lot. You can use SQL. It's scalable. It's always on. It operates with your
favorite data science tools, and you can share data. Everyone has a free terabyte
for queries every month, and it's ready for you to try. MINHAZ KAZI: Awesome. So maybe we can see an example? FELIPE HOFFA: Yeah, let's look
at an example of BigQuery. Page views of
Wikipedia-- we were talking about how many page
views we have in a month. Oh. MINHAZ KAZI: Oh. FELIPE HOFFA: This is how
we keep our website secure, with a little [INAUDIBLE]. And we are here on the
new UI for BigQuery. For those that have not
tried it, it's pretty cool. MINHAZ KAZI: So we-- FELIPE HOFFA: Let's compare
what page got more page views? Which [INAUDIBLE]
should we compare? MINHAZ KAZI: We were
talking about the World Cup. Maybe we can look at
France and Croatia. FELIPE HOFFA: OK, let's
compare France and Croatia. MINHAZ KAZI: We have
this query saved. We'll open up the query. FELIPE HOFFA: The
query's ready there. Cool. MINHAZ KAZI: And
run the queries. FELIPE HOFFA: Which one
got more page views-- France or Croatia? MINHAZ KAZI: Who here thinks
France got more page views? FELIPE HOFFA: Croatia? MINHAZ KAZI: All right, Croatia? FELIPE HOFFA: We have
the results there, and Croatia got eight
times more views-- 5 million views. France was less than a million. MINHAZ KAZI: And this says
data for last one month around. FELIPE HOFFA: Exactly. So the question now
is, is this related to the World Cup or not? MINHAZ KAZI: We don't
know that, because we have to look into the data. This is just one number. If you can do this quickly
maybe by day or by hour, that would give us more insight. FELIPE HOFFA: Let's try
to find some correlation. So can you visualize? MINHAZ KAZI: So let's
try to see the query. FELIPE HOFFA: OK, how-- MINHAZ KAZI: We will
run the query for-- by hour. FELIPE HOFFA: Let's get
the results hour by hour. MINHAZ KAZI: And we'll run
this query in the Query for France and Croatia. FELIPE HOFFA: We have the
results hour by hour here. MINHAZ KAZI: So the
problem with this is these are a lot of numbers,
and from here, you don't really know what's going on. You can't compare. This is where you
need visualization. FELIPE HOFFA: Here
comes the question. How do we visualize the results? What's the easiest way? MINHAZ KAZI: Well,
you use Data Studio. FELIPE HOFFA: Let's
use Data Studio. MINHAZ KAZI: So Data Studio
is Google's, like I said, VI and visualization
platform that lets you build engaging and-- engaging reports and dashboards. It works like other
GSuite applications like Sheets or Docs or Slides,
so you can cooperatively build dashboards. It's a turnkey
publishing platform, which means you just turn
the key, and it works. That's it. It has built in
permissions model, and it is easy and
completely free to use. FELIPE HOFFA: Free. MINHAZ KAZI: Free. FELIPE HOFFA: Cool. MINHAZ KAZI: [INAUDIBLE] FELIPE HOFFA: Let's try it out. MINHAZ KAZI: All right, so we
will take the same query you had or this one. So copy this query. And go into Data Studio,
create a new data source. And there are a
lot of characters in Data Studio that will let
you connect to different data sources. So we will connect to Big
Query, create a custom query, take our project. FELIPE HOFFA: And we choose a
project, the building project. MINHAZ KAZI: And we will
paste the query we had. And we were looking
at only two countries. FELIPE HOFFA: Let's look
at all of the countries. MINHAZ KAZI: All the
countries in the world FELIPE HOFFA: OK,
let's do that one. MINHAZ KAZI: Let's do that. So-- FELIPE HOFFA: And every
country that participated. Let's see which one got more
page views hour by hour. MINHAZ KAZI: Yes. So we will [INAUDIBLE] this. And we know that
this is hour by hour. So we'll pick this hourly. FELIPE HOFFA: Cool. MINHAZ KAZI: And
create dashboard. So what Data Studio
is doing now is creating a live
connection to Big Query and pulling in the data. FELIPE HOFFA: All right,
so, yeah, Data Studio has thousands of connectors? MINHAZ KAZI: Almost
there-- hundreds. FELIPE HOFFA: Hundreds. And one of them is a
connector to Big Query. MINHAZ KAZI: Right. FELIPE HOFFA: And people can
keep writing more connectors until we have samples. MINHAZ KAZI: Yes. FELIPE HOFFA: Cool. This is the-- now
we're visualizing the results of this query. MINHAZ KAZI: We'll
make a line chart. FELIPE HOFFA: A
timeline, hour by hour. And Data Studio is now going
all the way to Big Query to get the results. MINHAZ KAZI: [INAUDIBLE] again
and then pulling in the data. So here, you can see the
results for all the countries. FELIPE HOFFA: OK, but I
only see 10 countries. MINHAZ KAZI: Data Studio is just
picking you 10 to begin with. You can just change this number. You can put in any number
of countries you want. For example, I'll just
put in a filter here. FELIPE HOFFA: Yeah, show
me just France and Croatia. MINHAZ KAZI: All right,
so filter is here. Review it again so it's
now again populating the value for the filters. We will pick Croatia and France. FELIPE HOFFA: France. We already know that
Croatia got more page views, but when did they
get the page views? MINHAZ KAZI: Yes. FELIPE HOFFA: This
is taking some time because Data Studio is going
all the way to Big Query, analyzing our huge table. MINHAZ KAZI: Every time
we make the changes, it will make a new query
and pull in the data again from the query. So as you can see, Croatia
got these huge spikes at different times. And let's see at what time. This is around 9:00 PM. And this is UTC. FELIPE HOFFA: UTC. So it's when the
games were playing. So-- MINHAZ KAZI: This is basically
when the semifinal happened and the final happened. So all the spikes
Croatia is getting is exactly when either
they were winning or they're playing the finals. FELIPE HOFFA: Exactly. So correlation is not
causation, but at least we can see that that's when Croatia
was getting all the page views. And probably they were getting
more page views than France because less people
know what is Croatia. But now they do. Oh, that's why they
go to Wikipedia. Now, this was cool,
but a little slow, because every time we're
going to Big Query, we're doing a round trip. So can you make it fast? MINHAZ KAZI: You can. There are different
ways of making it fast, and we are introducing a new
feature in Data Studio shortly. It's not available
right now, so this is like a sneak peek preview. We already have a
feature called Explore that lets you look at
data in a more visual way. And you can drag and
drop and visualize it. And along with that, you'll be
able to materialize the data. Or whatever data set you're
pulling in, you can take that-- Data Studio can take the data
and materialize inside GCS and pull in data from there. And we can probably
show a quick of that. FELIPE HOFFA: Let's show a quick
demo of that future feature. MINHAZ KAZI: So we will
go back to Big Query and we will run this
query inside the query. FELIPE HOFFA: Again,
the new Big Query UI-- try it out, now on
beta, open for everyone. MINHAZ KAZI: Running the query. And I believe the query ran. And then you have this button
called Explore in Data Studio. Once you think this
button, it will take you immediately
from Big Query into Data Studio with the
query that you just selected and let you explore this data. And we can try to recreate
what we did earlier. So we will just get rid
of that record count. And we will make this-- FELIPE HOFFA: Edit by hourly. By default, it makes it daily. But let's look at things hourly. MINHAZ KAZI: Time series
chart and [INAUDIBLE] views. And this should recreate
what we did in earlier but in the Explore view. So it's pulling in the data. And what will happen
is once you do this, you'll get a nice button
here that will let you materialize the data set. So you'll be able
to click the button. And then that button will
basically take that data and put it in Data Studio. And then you'll be
able to query it very fast from within Data Studio. FELIPE HOFFA: Yes, and you
will stop doing the round trips to Big Query. MINHAZ KAZI: So where it takes
maybe five or four seconds right now, it will probably
take less than a second, half a second, to materialize. So that's how basically
Materialize would work. It would work like just
any other data source. FELIPE HOFFA: Yes, I can't wait
until we get the Materialize button. Will it have some limitations? MINHAZ KAZI: It will. So it's a feature
still being developed. It's not available right now. The workflow probably
might be a bit different. There will be a 100-megabyte
limit initially, so because the data will be
materialized in cloud storage. You might also get some
limitations around scheduling. You'd have to manually
update the data, or there might be
scheduling options available where you can set up for
that data to be updated every maybe one day or hourly. So we're still looking into it. FELIPE HOFFA: So
that a great feature. You can write your queries. It will materialize it. You will schedule them. But what if I need
a solution now? MINHAZ KAZI: So you want
to do scheduling today. FELIPE HOFFA: Yes. MINHAZ KAZI: I believe you
have a solution for that. FELIPE HOFFA: Yes, because
as I showed you before, we have published
these dashboards online and we wanted a solution that
didn't go through Big Query every time. So let's show them. This is a Google spreadsheet
developed by Andy Lopez, one of our teammates. And in this sheet, I
just can copy any query. And behind it, we
have App Script. MINHAZ KAZI: Who here
knows about App Script? Awesome. So if you don't know
about App Script, App Script is Google's
scripting language that is a subset of ES5 JavaScript. You can code on the
web UI, or you can also code locally and push
the code into the web UI. It works very well with
other Google technologies like Docs, Sheets,
Slides, and Google APIs, and services like Big Query. FELIPE HOFFA: Exactly. So with a few lines
of code, I can get App Script to query Big Query. And with no code, I
can schedule App Script to go there every hour, every
day, whenever I want to run it. MINHAZ KAZI: Awesome. So what do you do with
this App Script thing? FELIPE HOFFA: [INAUDIBLE] I
scheduled it to work every hour and I added three lines of code. So App Script would materialize
the results of my query into Google Cloud Storage. This is all it takes to
materialize to Google Cloud storage. And then Data STudio can
read the data straight from Google Cloud Storage. MINHAZ KAZI: Yes, that is true. So it will work with scheduling. However, there would be
some problems with that. One is you still get limited
by that 100-megabyte dataset in cloud storage. And this data is not indexed. It's like a flat file. FELIPE HOFFA: Exactly. You get pretty fast
results with small files, but bigger files will not
give you indexed results. So for that, there
are more solutions. You can connect Data
Studio to more sources. In this case, for a bigger
dashboard, I use CloudSQL. Let me show you,
then, the dashboard. MINHAZ KAZI: Sure. So this is a dashboard that
you haven't published yet? FELIPE HOFFA: Exactly. I would publish it soon. Let me show you a preview. This is, again, Stack Overflow. But in this case, I'm looking
at the trends question by question for every question
in Stack Overflow so you can see what is
trending, what are the most important questions
right now, what questions are not needed anymore. And this one had the
questions for Big Query. One of the top questions of this
quarter and two quarters ago was getting the application
credential for Python. Another big question is, how
come we cast in Big Query. Some people are lost. They go to Stack Overflow
and they keep going there. On the other side, you can
see the trending question, things that are new questions
for our users, how to move-- MINHAZ KAZI: Straight to date. FELIPE HOFFA: Yeah, straight
to date in Big Query. And the other side
are questions that are not that important now. And as you can see,
this is a lot of data. So put it in a database that
gives me index of result is a good way to serve this. MINHAZ KAZI: So what is this
trend line on the left side? FELIPE HOFFA: Oh yeah,
a couple of tricks when using Data Studio. In this case, I managed
to get a [INAUDIBLE].. You can change the tag
for anything else-- let's say, TensorFlow. MINHAZ KAZI: TensorFlow? All right. FELIPE HOFFA: Or JavaScript. Let's see if JavaScript is
trending up or down on-- MINHAZ KAZI: Let's
do JavaScript. FELIPE HOFFA: OK, anything. We can just jump anywhere here. And you can see that JavaScript
is not getting any more views, but you can get this
little thread here. And you can also see
those little lines there in each line. MINHAZ KAZI: How
do you do those? FELIPE HOFFA: Oh, those we build
using just some SQL magic to-- and if you want to
see how we made them, we did this as a collaborative
project on Stack Overflow. So we have-- looked for
[INAUDIBLE] in Big Query, you will find my answer and
[INAUDIBLE] and Mikhail's. We got there. MINHAZ KAZI: They don't
want to know about SQL. They just want to
drag and drop and make beautiful visualizations. FELIPE HOFFA: Yeah, but this is
a pretty cool trend line here. MINHAZ KAZI: So can you tell us
what you exactly did behind us? How did you set up
your data pipeline? FELIPE HOFFA: Oh, yes. Let's go back to how we
made this with CloudSQL. Basically, I had Big Query. And in Big Query ran my huge
analysis reducing the data. I store it in cloud storage. That can be imported
into CloudSQL, which is MySQL in this case. And then Data
Studio can read it. MINHAZ KAZI: OK, so you
have this whole pipeline where you are getting
data from Big Query, putting it in cloud storage. And from cloud storage, it's
getting imported into CloudSQL. And Data Studio just uses its
native connector to CloudSQL to pull in the data. And the data is fast
because it's indexed. And data studio has a native
connector to CloudSQL. You don't have to set
up the firewall rules. It just works out of the box. FELIPE HOFFA: Exactly. So it's pretty easy. And then I could put any
other catch in the middle. CloudSQL was just pretty
easy for me for this case. MINHAZ KAZI: OK. FELIPE HOFFA: What
about private data? Because all of these, we've
been looking at public datasets. But I want to bring these
principles also to my own data. MINHAZ KAZI: Sure. We can do that. Like, we looked at Stack
Exchange data, Wikipedia data. But you might have private
data where you want to use the data with yourself. Or maybe it's an
organizational data where do you want multiple
people within the organization to look at the same data. From there, the question
also becomes that, hey, can I use templates? I have my private
data and I want to create a template where
all my internal users can use the same template with my data. Or can I create a template
where different private users can use the same
template with their data? And the thing is,
yes, it is possible. Data Studio lets you use all
the reports as templates. And we have a Template Gallery. If you go to
datastudio.google.com/gallery, there's a huge gallery with all
different kinds of templates available. And we will look at one example. So this is the example for our
Google Analytics for Firebase. Let's say you have
your Google Analytics for Firebase data in Big Query. FELIPE HOFFA: [INAUDIBLE]. So this is a sample
visualization with someone else's data. MINHAZ KAZI: Yes, this
is like sample data. And you have your own
dataset in Big Query. And you want to use that
dataset with this dashboard. So the way you do it is you'd
need the name of your project. So I will just take an
example because I don't have my own Firebase project. FELIPE HOFFA: That's
the name of the project you want to visualize, one
you have permissions to. MINHAZ KAZI: I'll go back
and all I have to do is use-- click-- click on this Use Template. And I will change and
create a new source. I know the data is
in Big Query and I know it's a shared project. If you're doing it
for your own project, you might have to go
through your My Projects and go through that flow. And I'll pick a project,
the shared project name. The data set is Analytics
and the table is Events. FELIPE HOFFA: Boom. MINHAZ KAZI: Click on
Events and click Connect. And that should be it. FELIPE HOFFA: Exactly. You created a connector
for the pretty template. And now any template
that you look-- use data that you might
have the similar shape, you can use it for your
own data with Data Studio. MINHAZ KAZI: Yes. So this is the sample data
set we have on Big Query. It's using that sample data set. But I just connected that
data set to this template. And it's working. FELIPE HOFFA: Cool,
and this is working. And it's connecting
straight to Big Query. And now we're in
a similar place. Like, this is great
for a data analyst. MINHAZ KAZI: Yes. FELIPE HOFFA: But if I
want to publish this-- MINHAZ KAZI: So if you
want to make this scalable, for an analyst,
what would happen is let's say analyst
would probably create a direct connection
from Big Query to Data Studio. So you write a query. You create a dashboard and you
have the direct connection. And for the analyst,
analyst this can create a dashboard
like this, for example, based on Chrome UX dataset. But they would incur
very little cost, depending on how many
queries they're doing. So if one analyst does 300
completely unique queries, so it doesn't use any of the
cache in Big Query or Data Studio, it's completely
unique queries, and on average, 30
gigabytes of query size, they will use 9 terabytes
of big data, Big Query data. And then that would
cost them $40. FELIPE HOFFA: Yeah, for
an analyst, pretty cool. Now you are querying 300
times, 30 gigabytes of data. That might be a big table,
but the price is reasonable. Everything is working fine. But then you want
to publish this. MINHAZ KAZI: That's where you
might face some difficulties, but-- because let's say you
take the same dashboard. You put it on a
website, embed it somewhere in your Medium post. And then suddenly, hundreds
of people are looking at it. So lets just assume,
hypothetical scenario, 500 visitors are
coming to your website, There are four different
queries in your dashboard and each user is looking at
the dashboard six times-- again, around 30
gigabytes of size. You end out with 350 terabytes
of data usage in Big Query. FELIPE HOFFA: So
the first thing you have to do when you start
playing with this is-- MINHAZ KAZI: You have to have
some kind of caching there. FELIPE HOFFA: Yes, or turn on? MINHAZ KAZI: Well, yes,
that's a good point. First line of defense is this. FELIPE HOFFA: Yes turn on your
cost controls in Big Query so you will not wake
up to any surprise. These are pretty easy to use. You just turn them
on and everything-- you will not wake
up to any surprise because your queries
will be contained. But them I still want to
put this dashboard online and I want to have all
my costs contained. MINHAZ KAZI: There
is a way to do it. FELIPE HOFFA: Mm-hm. MINHAZ KAZI: So what we
could do is everything that we just looked at, you
have this caching mechanism. You have templates and
you have scalability. We can put all this
together into one solution. And that is, how can your
users go from complex data to a scalable, cost-effective
dashboard in five clicks? By using community connectors. FELIPE HOFFA: Exactly. With a community
connector, we will be able to encapsulate
all of this logic without having the users need
to know all of these tricks. MINHAZ KAZI: Yes. Community connectors are,
again, written in App Script. They are connectors
for Google Data studio. You basically define
a few functions and then add your
connector to Data Studio. And then Data Studio can pull
the data through the connector. And you can implement
your own logic for caching or permissions. You can add all of that. FELIPE HOFFA: Yeah,
with App Studio, you have the flexibility to
build any logic you want. MINHAZ KAZI: Yes. For example, we took
the Chrome UX data set. From the Chrome UX data
set, you can potentially build a dashboard
that looks like this. But we wanted to give the
user more flexibility. So what we did is we
created a workflow that looks kind of like this. We have the initial
dataset in Big Query. So every month, we do an
initial level of aggregation, and we create a smaller
Big Query table. FELIPE HOFFA: Exactly. You go from the 30
gigabytes table to-- MINHAZ KAZI: Less
than 2 gigabytes. FELIPE HOFFA: Less than 2
gigabytes-- that's much better. MINHAZ KAZI: Yes, so that
is our first level of cache. Then for every URL,
once per month, we cache that URL's data
into Cloud Firestore, which again has a free tier. And from Cloud Firestore,
every time someone wants to look at a dashboard
containing that URL, we pull in that data
into Apps Script Cache. Apps Script Cache
is not permanent. It will be alive maximum
from maybe a few minutes to six hours. So if you're using the connector
to connect to that data, you can get very
fast performance, but it's not persistent. So you'll have to pull
it again from Firestore. So we take all of that. We take the cache data and we
add it to a custom template. And then we have this
community connector that you can use in Data Studio. FELIPE HOFFA: Yeah,
that so Data Studio connects to the connector. You encapsulate a lot
of the logic there, and it goes on every level
of cache that you want. MINHAZ KAZI: Exactly. FELIPE HOFFA: But as a user, I
want to know my own websites. How do I use this? MINHAZ KAZI: So if
you're the user, you don't see any of this. You don't have to worry
about any of this. All you get is one URL. So you go to this URL. And we have hyperlinked it. We go to this URL. That's it. You get a new
interface where you have to add in your website. So let's try Wikipedia. FELIPE HOFFA: Wikipedia or
any of your favorite websites. We have 5 million URLs there. MINHAZ KAZI: So
we'll add Wikipedia. We will add these
origins to be modified. Connect, allow. FELIPE HOFFA: And
this is a public URL where you are right now. MINHAZ KAZI: Yes,
this is [INAUDIBLE].. And so what this is doing is
it's going into Big Query, pulling in the data. So right now, it went into
Big Query, pulled in the data, put it in Cloud Firestore,
pulled the data from Firestore, put in Apps Strip Cache. And it's now ready to
create the dashboard. So we create the
report, and we already have a template attached to it. So the user makes-- puts in the URL,
makes five clicks. And they end up with their
own dashboard for the URL. FELIPE HOFFA: Ooh. MINHAZ KAZI: And this
is their own dashboard. It's not like a shared
dashboard that different people are viewing. It's their dashboard. They can go in. They can change
things, edit things. They can do whatever
basically they want. FELIPE HOFFA: [INAUDIBLE] MINHAZ KAZI: For example, I
will put in a comparison here. So I can take the same thing. I will check the URL and take
this one, copy it, paste it. So I have two things side
by side for the same URL. And maybe I will compare
with a different website. FELIPE HOFFA: Any website? MINHAZ KAZI: Any
website, anyone? FELIPE HOFFA: Any suggestion. MINHAZ KAZI: Google dot-- FELIPE HOFFA: Google.com, yes. MINHAZ KAZI: [INAUDIBLE]
dot google.com. And the users can do this. We have our-- this is the
user just went into this. They can view the same thing. So you have side-by-side
comparison of your website with any other
website that you pick. And this is
real-life usage data. It's not even like sample data. This is total Chrome
anonymous usage statistics. FELIPE HOFFA: And in
this case, the connector is using your credentials. MINHAZ KAZI: Yes. FELIPE HOFFA: Like-- and user
can create the new instance of the connector. The credentials are
privately stored there? MINHAZ KAZI: Yes, the
credentials we're using is on a service account. So whoever created the
connector is bearing the cost. So the users who are
using this connector, they don't have to worry about
having a Big Query account, or they don't have
to worry about cost, because they're not
incurring any cost. FELIPE HOFFA: And you
don't worry about the cost either because you built
all of these cutting layers. MINHAZ KAZI: Yes, so we
can have a look at the cost now if we go back. Initially, we were
incurring this much cost and this much data usage
for this connector. But if we implement
the new model for caching, what
ends up happening is it's the same 500 users. But you get less than one
query for dashboard view because you have
everything cached. And your query size
becomes 2 gigabytes, so it's less than 1 terabyte
of Big Query data usage, which is in the free tier. And it's less than 10
gigabytes of Firebase usage, with is, again,
in the free tier. So your cost becomes zero. That's how you can
encapsulate all of this and create a complete solution. But what about permissions? Oh, good thing is if you
want to create new community connectors, we
have this Codelab. You can go to this
Codelab and learn how to create your own community
connector using Apps Script. It should take you about
15 minutes to 30 minutes. And you can create
connectors to any Web API or external services. FELIPE HOFFA: Yes, and
it's an amazing Codelab because Minhaz wrote it. So let's talk a little
bit about permissions because we have private data. We have enterprise data. There are levels where we
want to secure our data. So what can we do about it? MINHAZ KAZI: And I believe
Big Query has some permissions inside of-- FELIPE HOFFA: Exactly. Big Query has the
Google permissions model of who can read, write,
or modify tables. And you can make your data
available at that level. You can, if you go to the next
screenshot, you can see here, like, you can have users by
email, or create Google Groups to make easy to manage
them, or with any domain, or for all of them to get
the user for public data. Or even you can
create views that allow people to see your data
but only through a view, which is a pretty cool model. But you don't even need to
show your data on Big Query if you do it on Data Studio. MINHAZ KAZI: Yes,
Data Studio also has several layers
of permissions. If you create a
Data Studio report, you can share it with
other people just using the drive permissions. So you can share it with
a selected group of users. You can share it within
your entire organization. You can also share it with
everyone who has the URL. And you can share it either
for viewing permissions or for editing permissions. You can also take this
report and embed it. So let's say you shared
this with everyone, and then you can take-- that report and embed it in a
Medium post like Felipe did. You also get permissions
at the data source level. So when you create a data source
and you add the data source to a report, you can put in
either viewer's credentials or owner's credentials. What that means is if
I create the report and I put in owner's
credentials and then I shared that report
with Felipe and Felipe views that dashboard,
Felipe will see my data. But if you enable
viewer's credentials, when Felipe views
that dashboard, Felipe will see Felipe's
data and I will see my data. FELIPE HOFFA: And I
can control if people want to see my data that I share
with them or if I can only-- they can only see data they
have the permission to see. The permissions get passed all
the way through Data Studio into your source. MINHAZ KAZI: Yes,
and the good thing about community connectors
is that using community connectors, you can implement
any kind of permissions model that you have. If you have your own permissions
and own credential setup, you can also flow that
through community connectors and add that as a layer
on top of all of this. FELIPE HOFFA: Nice. MINHAZ KAZI: That
is pretty cool. FELIPE HOFFA: That's
how our partners create their own connections. MINHAZ KAZI: Yes,
most of our partners have their own credentials
which will flow through this. And then they will-- they basically know which
user is using the connector. And then they can show that
user their specific data. We went through all
of these four data sets-- the Wikipedia page
views, Stack Overflow trends, Google Analytics for Firebase,
and Chrome UX reports. That was pretty interesting
for us, I guess. FELIPE HOFFA: Yep. Hopefully, it was
interesting for you, too. Let's go to our takeaways. So Big Query is
really, really awesome to analyze terabytes of data. MINHAZ KAZI: And Data
Studio is more awesome because it lets you dashboard
all of your data for free. FELIPE HOFFA: Exactly. But if you want to
put them together, you want some rules in between. MINHAZ KAZI: Yes, there
needs to be some something in between-- the caching
layer, something in between. FELIPE HOFFA: Exactly. And there are ways that you can
do this that Materialize will-- you will have soon and
will be pretty easy to use. MINHAZ KAZI: And you
can use Cloud SQL. You can use
community connectors. Adding a caching
layer will let you avoid having, like, exploding
number of queries in Big Query. FELIPE HOFFA: Yeah. MINHAZ KAZI: So
that is basically the end of our presentation. If you want to know more about
Big Query or Data Studio, you can view the subreddits. You can ask questions
on Stack Overflow. FELIPE HOFFA: If you want
to know more about Minhaz, follow him on Twitter. MINHAZ KAZI: If you can't
get enough of Felipe, follow him on Twitter. FELIPE HOFFA: Your
feedback is our data. MINHAZ KAZI: And we love data. FELIPE HOFFA: Yes. Thank you very much.
I really wish BigQuery had more documentation tailored to non-developers. I'd love to start leveraging it when I build out my GDS reports/templates but there has been a really steep learning curve to get started.