Scaling Interactive and Insightful Dashboards with Data Studio and BigQuery (Cloud Next '18)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

I really wish BigQuery had more documentation tailored to non-developers. I'd love to start leveraging it when I build out my GDS reports/templates but there has been a really steep learning curve to get started.

👍︎︎ 1 👤︎︎ u/north2future 📅︎︎ Aug 04 2018 🗫︎ replies
Captions
[THEME MUSIC] FELIPE HOFFA: Are you ready for this? AUDIENCE: Yeah. Yes, excellent. Welcome to-- MINHAZ KAZI: We can't hear you. Are you ready for this? [APPLAUSE] Awesome. FELIPE HOFFA: Thank you for that, [INAUDIBLE].. So welcome to the Secrets of Scaling Interactive and Insightful Dashboards with Data Studio and BigQuery. Pretty long title. I'm Felipa Hoffa. I've been at Google for the last seven years. MINHAZ KAZI: Felipe is the developer advocate for BigQuery, and I'm Minhaz Kazi. I'm the developer advocate for Google Data Studio, and I've been here for a year. FELIPE HOFFA: Yes, welcome to Google, Minhaz. MINHAZ KAZI: Thank you. It's been exciting. FELIPE HOFFA: Happy to have you here. So a lot of people know me because I've been doing BigQuery. I'm being the developer advocate [INAUDIBLE] for the last five years. I love analyzing data. I love visualizing it. And we have a lot of partners that help us visualize this data, but there has always been an empty space in my life until Data Studio showed up. True story. MINHAZ KAZI: Yes. Data Studio is Google's business intelligence slash virtualization platform that anyone can use for free to build scalable, insightful dashboards. And well, BigQuery makes Felipe happy, Data Studio makes me happy, and together we would like to make the audience happy. So here, we would like to answer this question for you. How can you get your big data ready to visualize and share with the world? And the answer to that is-- FELIPE HOFFA: Let's combine BigQuery and Data Studio. But if you want to combine them, there are certain rules, as when you want to combine me and Minhaz on a stage. MINHAZ KAZI: Yes, it doesn't always work. Sometimes, it does. Sometimes, it doesn't, and you have to follow certain rules. FELIPE HOFFA: That's right, so we want to share with you our best secrets when using both together. MINHAZ KAZI: One thing is that everything we talk about here today are all information as of July 2018, so if you are looking at this recording at a later date, please know that new information might be available. So yeah. FELIPE HOFFA: That's right, so YouTube people, the principles we're going to show you here remain, but the relationship with BigQuery and Data Studio will all improve, as my relationship with Minhaz. MINHAZ KAZI: Hopefully. FELIPE HOFFA: Hopefully, yes. So let's get started. Let's do some examples. MINHAZ KAZI: So we can look at this dashboard that you built a few days ago. FELIPE HOFFA: If anyone likes football, I made this dashboard with all the goals that happened last month. What I love about this dashboard is that I was able to put it on the middle of a Medium post. MINHAZ KAZI: So you have a dashboard in the medium of a Medium post? FELIPE HOFFA: Exactly, and it's interactive. So let's say you want to see the goals from your favorite team, which would be? MINHAZ KAZI: Chile. FELIPE HOFFA: Chile. Yes, I am from Chile, but Chile did not play this year. MINHAZ KAZI: Oh, OK. FELIPE HOFFA: Give me another one. MINHAZ KAZI: I don't know. Brazil? FELIPE HOFFA: Brazil, cool. So this is interactive. You can show your favorite team, and these were the goals of Brazil. MINHAZ KAZI: Felipe, there are all these dots on the right side. What is the one on the left side? FELIPE HOFFA: Oh, the one on the left is an own goal. Belgium says obrigado, but let's continue. MINHAZ KAZI: Sure, so this is one dashboard that you built in Data Studio from the creator and published it. People are using it. We have a few other examples. FELIPE HOFFA: Mm-hmm, Wikipedia. Who knows Wikipedia? If you don't know it, look for it on Wikipedia. How many pages does Wikipedia have in a month or this year? MINHAZ KAZI: Billions? FELIPE HOFFA: Millions? Billions? MINHAZ KAZI: Maybe. FELIPE HOFFA: Yeah, so in this shared table, I have all of the page views for 2018. And this table has a summary of page views per hour, and this table has, already, 31 billion rows, more than one terabyte of data. And visualizing this will be pretty interesting, and we're going to do that next. MINHAZ KAZI: What else are we doing to look at today? FELIPE HOFFA: Stack Overflow. You know Stack Overflow? MINHAZ KAZI: Yes. FELIPE HOFFA: Yes. MINHAZ KAZI: That's where I spend most of my time. FELIPE HOFFA: [INAUDIBLE],, so again, a lot of us spend a lot of time on Stack Overflow, and I want to see the most current trends. Where are all the page views going? And again, a publisher dashboard, a medium, it's interactive. Let's look for your favorite tag. MINHAZ KAZI: Angular? FELIPE HOFFA: Angular? Angular.js? MINHAZ KAZI: Let's take the-- FELIPE HOFFA: One of the top tags? Is it going up or down? MINHAZ KAZI: Show of hands-- who here thinks Angular is going up? And who here thinks Angular is going down? FELIPE HOFFA: Oh, come on. Angular.js is going down. But wait a minute-- right below Angular.js, there is a tag. MINHAZ KAZI: Different tag called Angular. FELIPE HOFFA: Angular, yes. MINHAZ KAZI: Let's try that one out. FELIPE HOFFA: So the Angular team switched tags. MINHAZ KAZI: Everyone is right. FELIPE HOFFA: Exactly, everyone is right. It's going down and up. But Angular as a technology is going up and to the right. MINHAZ KAZI: Cool. FELIPE HOFFA: Which is pretty good. MINHAZ KAZI: So we have also other datasets, like Google Analytics for Firebase. If you have your own data for Google Analytics for Firebase for your app, you can visualize that by putting the data in BigQuery and pulling out that into Data Studio and having a dashboard. FELIPE HOFFA: Exactly. So it's pretty good. You can also use these techniques for private data, and we're going to show you a little bit of that. MINHAZ KAZI: Yes, and the last dataset that we have a very interesting one. It's the Chrome UX report. The Chrome UX team has anonymized data up for all the Chrome users who-- for you can basically see, for five million different URLs, the actual website usage performance, like how fast does it take to-- for that website to load. And you have that data for five million different URLs, and you can look at it from different countries, by different connection types, and by different devices. FELIPE HOFFA: So this is millions and millions of observations, anonymized and shared with BigQuery by the Chrome team, and you can look at three million websites? MINHAZ KAZI: Five million. FELIPE HOFFA: Five million websites. MINHAZ KAZI: And you can have a dashboard that looks like this. FELIPE HOFFA: [INAUDIBLE],, and here, you can see how the Google's homepage performance changed through time. MINHAZ KAZI: Yes, so from October 2017 to February, most of the times it was 80% people had a fast performance for the first contentful paint, but then something happened in March. It went down, and then it slowly went back up again in June 2018. FELIPE HOFFA: There was a regression. Someone fixed that pretty fast. But the fun thing is, on one hand, it's not easy to write these queries, because how do you query anonymized data ? It's a little different. But you can make everything easy for users with a Data Studio report. MINHAZ KAZI: Yes, and we will show you how you can see the same thing for your own website. FELIPE HOFFA: Cool. MINHAZ KAZI: So, Felipe, these are all big data sets. FELIPE HOFFA: Mm-hmm. MINHAZ KAZI: I have a question for you. What is big data? FELIPE HOFFA: Do you know what we call big data at Google? MINHAZ KAZI: No. What? FELIPE HOFFA: We call it data. MINHAZ KAZI: I see what you did there. FELIPE HOFFA: Yes. MINHAZ KAZI: All right, I can get behind that. FELIPE HOFFA: We'll have data. We deal with huge datasets all the time. MINHAZ KAZI: So what do you use to deal with data? FELIPE HOFFA: We have amazing tools inside Google. We share them with the world. One of them is BigQuery. We use it to analyze terabytes of data. Everyone here is familiar with BigQuery? OK, so many hands. I think you might be at a Google conference. So it's-- BigQuery's awesome. You can analyze a lot. You can use SQL. It's scalable. It's always on. It operates with your favorite data science tools, and you can share data. Everyone has a free terabyte for queries every month, and it's ready for you to try. MINHAZ KAZI: Awesome. So maybe we can see an example? FELIPE HOFFA: Yeah, let's look at an example of BigQuery. Page views of Wikipedia-- we were talking about how many page views we have in a month. Oh. MINHAZ KAZI: Oh. FELIPE HOFFA: This is how we keep our website secure, with a little [INAUDIBLE]. And we are here on the new UI for BigQuery. For those that have not tried it, it's pretty cool. MINHAZ KAZI: So we-- FELIPE HOFFA: Let's compare what page got more page views? Which [INAUDIBLE] should we compare? MINHAZ KAZI: We were talking about the World Cup. Maybe we can look at France and Croatia. FELIPE HOFFA: OK, let's compare France and Croatia. MINHAZ KAZI: We have this query saved. We'll open up the query. FELIPE HOFFA: The query's ready there. Cool. MINHAZ KAZI: And run the queries. FELIPE HOFFA: Which one got more page views-- France or Croatia? MINHAZ KAZI: Who here thinks France got more page views? FELIPE HOFFA: Croatia? MINHAZ KAZI: All right, Croatia? FELIPE HOFFA: We have the results there, and Croatia got eight times more views-- 5 million views. France was less than a million. MINHAZ KAZI: And this says data for last one month around. FELIPE HOFFA: Exactly. So the question now is, is this related to the World Cup or not? MINHAZ KAZI: We don't know that, because we have to look into the data. This is just one number. If you can do this quickly maybe by day or by hour, that would give us more insight. FELIPE HOFFA: Let's try to find some correlation. So can you visualize? MINHAZ KAZI: So let's try to see the query. FELIPE HOFFA: OK, how-- MINHAZ KAZI: We will run the query for-- by hour. FELIPE HOFFA: Let's get the results hour by hour. MINHAZ KAZI: And we'll run this query in the Query for France and Croatia. FELIPE HOFFA: We have the results hour by hour here. MINHAZ KAZI: So the problem with this is these are a lot of numbers, and from here, you don't really know what's going on. You can't compare. This is where you need visualization. FELIPE HOFFA: Here comes the question. How do we visualize the results? What's the easiest way? MINHAZ KAZI: Well, you use Data Studio. FELIPE HOFFA: Let's use Data Studio. MINHAZ KAZI: So Data Studio is Google's, like I said, VI and visualization platform that lets you build engaging and-- engaging reports and dashboards. It works like other GSuite applications like Sheets or Docs or Slides, so you can cooperatively build dashboards. It's a turnkey publishing platform, which means you just turn the key, and it works. That's it. It has built in permissions model, and it is easy and completely free to use. FELIPE HOFFA: Free. MINHAZ KAZI: Free. FELIPE HOFFA: Cool. MINHAZ KAZI: [INAUDIBLE] FELIPE HOFFA: Let's try it out. MINHAZ KAZI: All right, so we will take the same query you had or this one. So copy this query. And go into Data Studio, create a new data source. And there are a lot of characters in Data Studio that will let you connect to different data sources. So we will connect to Big Query, create a custom query, take our project. FELIPE HOFFA: And we choose a project, the building project. MINHAZ KAZI: And we will paste the query we had. And we were looking at only two countries. FELIPE HOFFA: Let's look at all of the countries. MINHAZ KAZI: All the countries in the world FELIPE HOFFA: OK, let's do that one. MINHAZ KAZI: Let's do that. So-- FELIPE HOFFA: And every country that participated. Let's see which one got more page views hour by hour. MINHAZ KAZI: Yes. So we will [INAUDIBLE] this. And we know that this is hour by hour. So we'll pick this hourly. FELIPE HOFFA: Cool. MINHAZ KAZI: And create dashboard. So what Data Studio is doing now is creating a live connection to Big Query and pulling in the data. FELIPE HOFFA: All right, so, yeah, Data Studio has thousands of connectors? MINHAZ KAZI: Almost there-- hundreds. FELIPE HOFFA: Hundreds. And one of them is a connector to Big Query. MINHAZ KAZI: Right. FELIPE HOFFA: And people can keep writing more connectors until we have samples. MINHAZ KAZI: Yes. FELIPE HOFFA: Cool. This is the-- now we're visualizing the results of this query. MINHAZ KAZI: We'll make a line chart. FELIPE HOFFA: A timeline, hour by hour. And Data Studio is now going all the way to Big Query to get the results. MINHAZ KAZI: [INAUDIBLE] again and then pulling in the data. So here, you can see the results for all the countries. FELIPE HOFFA: OK, but I only see 10 countries. MINHAZ KAZI: Data Studio is just picking you 10 to begin with. You can just change this number. You can put in any number of countries you want. For example, I'll just put in a filter here. FELIPE HOFFA: Yeah, show me just France and Croatia. MINHAZ KAZI: All right, so filter is here. Review it again so it's now again populating the value for the filters. We will pick Croatia and France. FELIPE HOFFA: France. We already know that Croatia got more page views, but when did they get the page views? MINHAZ KAZI: Yes. FELIPE HOFFA: This is taking some time because Data Studio is going all the way to Big Query, analyzing our huge table. MINHAZ KAZI: Every time we make the changes, it will make a new query and pull in the data again from the query. So as you can see, Croatia got these huge spikes at different times. And let's see at what time. This is around 9:00 PM. And this is UTC. FELIPE HOFFA: UTC. So it's when the games were playing. So-- MINHAZ KAZI: This is basically when the semifinal happened and the final happened. So all the spikes Croatia is getting is exactly when either they were winning or they're playing the finals. FELIPE HOFFA: Exactly. So correlation is not causation, but at least we can see that that's when Croatia was getting all the page views. And probably they were getting more page views than France because less people know what is Croatia. But now they do. Oh, that's why they go to Wikipedia. Now, this was cool, but a little slow, because every time we're going to Big Query, we're doing a round trip. So can you make it fast? MINHAZ KAZI: You can. There are different ways of making it fast, and we are introducing a new feature in Data Studio shortly. It's not available right now, so this is like a sneak peek preview. We already have a feature called Explore that lets you look at data in a more visual way. And you can drag and drop and visualize it. And along with that, you'll be able to materialize the data. Or whatever data set you're pulling in, you can take that-- Data Studio can take the data and materialize inside GCS and pull in data from there. And we can probably show a quick of that. FELIPE HOFFA: Let's show a quick demo of that future feature. MINHAZ KAZI: So we will go back to Big Query and we will run this query inside the query. FELIPE HOFFA: Again, the new Big Query UI-- try it out, now on beta, open for everyone. MINHAZ KAZI: Running the query. And I believe the query ran. And then you have this button called Explore in Data Studio. Once you think this button, it will take you immediately from Big Query into Data Studio with the query that you just selected and let you explore this data. And we can try to recreate what we did earlier. So we will just get rid of that record count. And we will make this-- FELIPE HOFFA: Edit by hourly. By default, it makes it daily. But let's look at things hourly. MINHAZ KAZI: Time series chart and [INAUDIBLE] views. And this should recreate what we did in earlier but in the Explore view. So it's pulling in the data. And what will happen is once you do this, you'll get a nice button here that will let you materialize the data set. So you'll be able to click the button. And then that button will basically take that data and put it in Data Studio. And then you'll be able to query it very fast from within Data Studio. FELIPE HOFFA: Yes, and you will stop doing the round trips to Big Query. MINHAZ KAZI: So where it takes maybe five or four seconds right now, it will probably take less than a second, half a second, to materialize. So that's how basically Materialize would work. It would work like just any other data source. FELIPE HOFFA: Yes, I can't wait until we get the Materialize button. Will it have some limitations? MINHAZ KAZI: It will. So it's a feature still being developed. It's not available right now. The workflow probably might be a bit different. There will be a 100-megabyte limit initially, so because the data will be materialized in cloud storage. You might also get some limitations around scheduling. You'd have to manually update the data, or there might be scheduling options available where you can set up for that data to be updated every maybe one day or hourly. So we're still looking into it. FELIPE HOFFA: So that a great feature. You can write your queries. It will materialize it. You will schedule them. But what if I need a solution now? MINHAZ KAZI: So you want to do scheduling today. FELIPE HOFFA: Yes. MINHAZ KAZI: I believe you have a solution for that. FELIPE HOFFA: Yes, because as I showed you before, we have published these dashboards online and we wanted a solution that didn't go through Big Query every time. So let's show them. This is a Google spreadsheet developed by Andy Lopez, one of our teammates. And in this sheet, I just can copy any query. And behind it, we have App Script. MINHAZ KAZI: Who here knows about App Script? Awesome. So if you don't know about App Script, App Script is Google's scripting language that is a subset of ES5 JavaScript. You can code on the web UI, or you can also code locally and push the code into the web UI. It works very well with other Google technologies like Docs, Sheets, Slides, and Google APIs, and services like Big Query. FELIPE HOFFA: Exactly. So with a few lines of code, I can get App Script to query Big Query. And with no code, I can schedule App Script to go there every hour, every day, whenever I want to run it. MINHAZ KAZI: Awesome. So what do you do with this App Script thing? FELIPE HOFFA: [INAUDIBLE] I scheduled it to work every hour and I added three lines of code. So App Script would materialize the results of my query into Google Cloud Storage. This is all it takes to materialize to Google Cloud storage. And then Data STudio can read the data straight from Google Cloud Storage. MINHAZ KAZI: Yes, that is true. So it will work with scheduling. However, there would be some problems with that. One is you still get limited by that 100-megabyte dataset in cloud storage. And this data is not indexed. It's like a flat file. FELIPE HOFFA: Exactly. You get pretty fast results with small files, but bigger files will not give you indexed results. So for that, there are more solutions. You can connect Data Studio to more sources. In this case, for a bigger dashboard, I use CloudSQL. Let me show you, then, the dashboard. MINHAZ KAZI: Sure. So this is a dashboard that you haven't published yet? FELIPE HOFFA: Exactly. I would publish it soon. Let me show you a preview. This is, again, Stack Overflow. But in this case, I'm looking at the trends question by question for every question in Stack Overflow so you can see what is trending, what are the most important questions right now, what questions are not needed anymore. And this one had the questions for Big Query. One of the top questions of this quarter and two quarters ago was getting the application credential for Python. Another big question is, how come we cast in Big Query. Some people are lost. They go to Stack Overflow and they keep going there. On the other side, you can see the trending question, things that are new questions for our users, how to move-- MINHAZ KAZI: Straight to date. FELIPE HOFFA: Yeah, straight to date in Big Query. And the other side are questions that are not that important now. And as you can see, this is a lot of data. So put it in a database that gives me index of result is a good way to serve this. MINHAZ KAZI: So what is this trend line on the left side? FELIPE HOFFA: Oh yeah, a couple of tricks when using Data Studio. In this case, I managed to get a [INAUDIBLE].. You can change the tag for anything else-- let's say, TensorFlow. MINHAZ KAZI: TensorFlow? All right. FELIPE HOFFA: Or JavaScript. Let's see if JavaScript is trending up or down on-- MINHAZ KAZI: Let's do JavaScript. FELIPE HOFFA: OK, anything. We can just jump anywhere here. And you can see that JavaScript is not getting any more views, but you can get this little thread here. And you can also see those little lines there in each line. MINHAZ KAZI: How do you do those? FELIPE HOFFA: Oh, those we build using just some SQL magic to-- and if you want to see how we made them, we did this as a collaborative project on Stack Overflow. So we have-- looked for [INAUDIBLE] in Big Query, you will find my answer and [INAUDIBLE] and Mikhail's. We got there. MINHAZ KAZI: They don't want to know about SQL. They just want to drag and drop and make beautiful visualizations. FELIPE HOFFA: Yeah, but this is a pretty cool trend line here. MINHAZ KAZI: So can you tell us what you exactly did behind us? How did you set up your data pipeline? FELIPE HOFFA: Oh, yes. Let's go back to how we made this with CloudSQL. Basically, I had Big Query. And in Big Query ran my huge analysis reducing the data. I store it in cloud storage. That can be imported into CloudSQL, which is MySQL in this case. And then Data Studio can read it. MINHAZ KAZI: OK, so you have this whole pipeline where you are getting data from Big Query, putting it in cloud storage. And from cloud storage, it's getting imported into CloudSQL. And Data Studio just uses its native connector to CloudSQL to pull in the data. And the data is fast because it's indexed. And data studio has a native connector to CloudSQL. You don't have to set up the firewall rules. It just works out of the box. FELIPE HOFFA: Exactly. So it's pretty easy. And then I could put any other catch in the middle. CloudSQL was just pretty easy for me for this case. MINHAZ KAZI: OK. FELIPE HOFFA: What about private data? Because all of these, we've been looking at public datasets. But I want to bring these principles also to my own data. MINHAZ KAZI: Sure. We can do that. Like, we looked at Stack Exchange data, Wikipedia data. But you might have private data where you want to use the data with yourself. Or maybe it's an organizational data where do you want multiple people within the organization to look at the same data. From there, the question also becomes that, hey, can I use templates? I have my private data and I want to create a template where all my internal users can use the same template with my data. Or can I create a template where different private users can use the same template with their data? And the thing is, yes, it is possible. Data Studio lets you use all the reports as templates. And we have a Template Gallery. If you go to datastudio.google.com/gallery, there's a huge gallery with all different kinds of templates available. And we will look at one example. So this is the example for our Google Analytics for Firebase. Let's say you have your Google Analytics for Firebase data in Big Query. FELIPE HOFFA: [INAUDIBLE]. So this is a sample visualization with someone else's data. MINHAZ KAZI: Yes, this is like sample data. And you have your own dataset in Big Query. And you want to use that dataset with this dashboard. So the way you do it is you'd need the name of your project. So I will just take an example because I don't have my own Firebase project. FELIPE HOFFA: That's the name of the project you want to visualize, one you have permissions to. MINHAZ KAZI: I'll go back and all I have to do is use-- click-- click on this Use Template. And I will change and create a new source. I know the data is in Big Query and I know it's a shared project. If you're doing it for your own project, you might have to go through your My Projects and go through that flow. And I'll pick a project, the shared project name. The data set is Analytics and the table is Events. FELIPE HOFFA: Boom. MINHAZ KAZI: Click on Events and click Connect. And that should be it. FELIPE HOFFA: Exactly. You created a connector for the pretty template. And now any template that you look-- use data that you might have the similar shape, you can use it for your own data with Data Studio. MINHAZ KAZI: Yes. So this is the sample data set we have on Big Query. It's using that sample data set. But I just connected that data set to this template. And it's working. FELIPE HOFFA: Cool, and this is working. And it's connecting straight to Big Query. And now we're in a similar place. Like, this is great for a data analyst. MINHAZ KAZI: Yes. FELIPE HOFFA: But if I want to publish this-- MINHAZ KAZI: So if you want to make this scalable, for an analyst, what would happen is let's say analyst would probably create a direct connection from Big Query to Data Studio. So you write a query. You create a dashboard and you have the direct connection. And for the analyst, analyst this can create a dashboard like this, for example, based on Chrome UX dataset. But they would incur very little cost, depending on how many queries they're doing. So if one analyst does 300 completely unique queries, so it doesn't use any of the cache in Big Query or Data Studio, it's completely unique queries, and on average, 30 gigabytes of query size, they will use 9 terabytes of big data, Big Query data. And then that would cost them $40. FELIPE HOFFA: Yeah, for an analyst, pretty cool. Now you are querying 300 times, 30 gigabytes of data. That might be a big table, but the price is reasonable. Everything is working fine. But then you want to publish this. MINHAZ KAZI: That's where you might face some difficulties, but-- because let's say you take the same dashboard. You put it on a website, embed it somewhere in your Medium post. And then suddenly, hundreds of people are looking at it. So lets just assume, hypothetical scenario, 500 visitors are coming to your website, There are four different queries in your dashboard and each user is looking at the dashboard six times-- again, around 30 gigabytes of size. You end out with 350 terabytes of data usage in Big Query. FELIPE HOFFA: So the first thing you have to do when you start playing with this is-- MINHAZ KAZI: You have to have some kind of caching there. FELIPE HOFFA: Yes, or turn on? MINHAZ KAZI: Well, yes, that's a good point. First line of defense is this. FELIPE HOFFA: Yes turn on your cost controls in Big Query so you will not wake up to any surprise. These are pretty easy to use. You just turn them on and everything-- you will not wake up to any surprise because your queries will be contained. But them I still want to put this dashboard online and I want to have all my costs contained. MINHAZ KAZI: There is a way to do it. FELIPE HOFFA: Mm-hm. MINHAZ KAZI: So what we could do is everything that we just looked at, you have this caching mechanism. You have templates and you have scalability. We can put all this together into one solution. And that is, how can your users go from complex data to a scalable, cost-effective dashboard in five clicks? By using community connectors. FELIPE HOFFA: Exactly. With a community connector, we will be able to encapsulate all of this logic without having the users need to know all of these tricks. MINHAZ KAZI: Yes. Community connectors are, again, written in App Script. They are connectors for Google Data studio. You basically define a few functions and then add your connector to Data Studio. And then Data Studio can pull the data through the connector. And you can implement your own logic for caching or permissions. You can add all of that. FELIPE HOFFA: Yeah, with App Studio, you have the flexibility to build any logic you want. MINHAZ KAZI: Yes. For example, we took the Chrome UX data set. From the Chrome UX data set, you can potentially build a dashboard that looks like this. But we wanted to give the user more flexibility. So what we did is we created a workflow that looks kind of like this. We have the initial dataset in Big Query. So every month, we do an initial level of aggregation, and we create a smaller Big Query table. FELIPE HOFFA: Exactly. You go from the 30 gigabytes table to-- MINHAZ KAZI: Less than 2 gigabytes. FELIPE HOFFA: Less than 2 gigabytes-- that's much better. MINHAZ KAZI: Yes, so that is our first level of cache. Then for every URL, once per month, we cache that URL's data into Cloud Firestore, which again has a free tier. And from Cloud Firestore, every time someone wants to look at a dashboard containing that URL, we pull in that data into Apps Script Cache. Apps Script Cache is not permanent. It will be alive maximum from maybe a few minutes to six hours. So if you're using the connector to connect to that data, you can get very fast performance, but it's not persistent. So you'll have to pull it again from Firestore. So we take all of that. We take the cache data and we add it to a custom template. And then we have this community connector that you can use in Data Studio. FELIPE HOFFA: Yeah, that so Data Studio connects to the connector. You encapsulate a lot of the logic there, and it goes on every level of cache that you want. MINHAZ KAZI: Exactly. FELIPE HOFFA: But as a user, I want to know my own websites. How do I use this? MINHAZ KAZI: So if you're the user, you don't see any of this. You don't have to worry about any of this. All you get is one URL. So you go to this URL. And we have hyperlinked it. We go to this URL. That's it. You get a new interface where you have to add in your website. So let's try Wikipedia. FELIPE HOFFA: Wikipedia or any of your favorite websites. We have 5 million URLs there. MINHAZ KAZI: So we'll add Wikipedia. We will add these origins to be modified. Connect, allow. FELIPE HOFFA: And this is a public URL where you are right now. MINHAZ KAZI: Yes, this is [INAUDIBLE].. And so what this is doing is it's going into Big Query, pulling in the data. So right now, it went into Big Query, pulled in the data, put it in Cloud Firestore, pulled the data from Firestore, put in Apps Strip Cache. And it's now ready to create the dashboard. So we create the report, and we already have a template attached to it. So the user makes-- puts in the URL, makes five clicks. And they end up with their own dashboard for the URL. FELIPE HOFFA: Ooh. MINHAZ KAZI: And this is their own dashboard. It's not like a shared dashboard that different people are viewing. It's their dashboard. They can go in. They can change things, edit things. They can do whatever basically they want. FELIPE HOFFA: [INAUDIBLE] MINHAZ KAZI: For example, I will put in a comparison here. So I can take the same thing. I will check the URL and take this one, copy it, paste it. So I have two things side by side for the same URL. And maybe I will compare with a different website. FELIPE HOFFA: Any website? MINHAZ KAZI: Any website, anyone? FELIPE HOFFA: Any suggestion. MINHAZ KAZI: Google dot-- FELIPE HOFFA: Google.com, yes. MINHAZ KAZI: [INAUDIBLE] dot google.com. And the users can do this. We have our-- this is the user just went into this. They can view the same thing. So you have side-by-side comparison of your website with any other website that you pick. And this is real-life usage data. It's not even like sample data. This is total Chrome anonymous usage statistics. FELIPE HOFFA: And in this case, the connector is using your credentials. MINHAZ KAZI: Yes. FELIPE HOFFA: Like-- and user can create the new instance of the connector. The credentials are privately stored there? MINHAZ KAZI: Yes, the credentials we're using is on a service account. So whoever created the connector is bearing the cost. So the users who are using this connector, they don't have to worry about having a Big Query account, or they don't have to worry about cost, because they're not incurring any cost. FELIPE HOFFA: And you don't worry about the cost either because you built all of these cutting layers. MINHAZ KAZI: Yes, so we can have a look at the cost now if we go back. Initially, we were incurring this much cost and this much data usage for this connector. But if we implement the new model for caching, what ends up happening is it's the same 500 users. But you get less than one query for dashboard view because you have everything cached. And your query size becomes 2 gigabytes, so it's less than 1 terabyte of Big Query data usage, which is in the free tier. And it's less than 10 gigabytes of Firebase usage, with is, again, in the free tier. So your cost becomes zero. That's how you can encapsulate all of this and create a complete solution. But what about permissions? Oh, good thing is if you want to create new community connectors, we have this Codelab. You can go to this Codelab and learn how to create your own community connector using Apps Script. It should take you about 15 minutes to 30 minutes. And you can create connectors to any Web API or external services. FELIPE HOFFA: Yes, and it's an amazing Codelab because Minhaz wrote it. So let's talk a little bit about permissions because we have private data. We have enterprise data. There are levels where we want to secure our data. So what can we do about it? MINHAZ KAZI: And I believe Big Query has some permissions inside of-- FELIPE HOFFA: Exactly. Big Query has the Google permissions model of who can read, write, or modify tables. And you can make your data available at that level. You can, if you go to the next screenshot, you can see here, like, you can have users by email, or create Google Groups to make easy to manage them, or with any domain, or for all of them to get the user for public data. Or even you can create views that allow people to see your data but only through a view, which is a pretty cool model. But you don't even need to show your data on Big Query if you do it on Data Studio. MINHAZ KAZI: Yes, Data Studio also has several layers of permissions. If you create a Data Studio report, you can share it with other people just using the drive permissions. So you can share it with a selected group of users. You can share it within your entire organization. You can also share it with everyone who has the URL. And you can share it either for viewing permissions or for editing permissions. You can also take this report and embed it. So let's say you shared this with everyone, and then you can take-- that report and embed it in a Medium post like Felipe did. You also get permissions at the data source level. So when you create a data source and you add the data source to a report, you can put in either viewer's credentials or owner's credentials. What that means is if I create the report and I put in owner's credentials and then I shared that report with Felipe and Felipe views that dashboard, Felipe will see my data. But if you enable viewer's credentials, when Felipe views that dashboard, Felipe will see Felipe's data and I will see my data. FELIPE HOFFA: And I can control if people want to see my data that I share with them or if I can only-- they can only see data they have the permission to see. The permissions get passed all the way through Data Studio into your source. MINHAZ KAZI: Yes, and the good thing about community connectors is that using community connectors, you can implement any kind of permissions model that you have. If you have your own permissions and own credential setup, you can also flow that through community connectors and add that as a layer on top of all of this. FELIPE HOFFA: Nice. MINHAZ KAZI: That is pretty cool. FELIPE HOFFA: That's how our partners create their own connections. MINHAZ KAZI: Yes, most of our partners have their own credentials which will flow through this. And then they will-- they basically know which user is using the connector. And then they can show that user their specific data. We went through all of these four data sets-- the Wikipedia page views, Stack Overflow trends, Google Analytics for Firebase, and Chrome UX reports. That was pretty interesting for us, I guess. FELIPE HOFFA: Yep. Hopefully, it was interesting for you, too. Let's go to our takeaways. So Big Query is really, really awesome to analyze terabytes of data. MINHAZ KAZI: And Data Studio is more awesome because it lets you dashboard all of your data for free. FELIPE HOFFA: Exactly. But if you want to put them together, you want some rules in between. MINHAZ KAZI: Yes, there needs to be some something in between-- the caching layer, something in between. FELIPE HOFFA: Exactly. And there are ways that you can do this that Materialize will-- you will have soon and will be pretty easy to use. MINHAZ KAZI: And you can use Cloud SQL. You can use community connectors. Adding a caching layer will let you avoid having, like, exploding number of queries in Big Query. FELIPE HOFFA: Yeah. MINHAZ KAZI: So that is basically the end of our presentation. If you want to know more about Big Query or Data Studio, you can view the subreddits. You can ask questions on Stack Overflow. FELIPE HOFFA: If you want to know more about Minhaz, follow him on Twitter. MINHAZ KAZI: If you can't get enough of Felipe, follow him on Twitter. FELIPE HOFFA: Your feedback is our data. MINHAZ KAZI: And we love data. FELIPE HOFFA: Yes. Thank you very much.
Info
Channel: Google Cloud Tech
Views: 24,837
Rating: 4.9157896 out of 5
Keywords: type: Conference Talk (Full production);, pr_pr: Google Cloud Next, purpose: Educate
Id: KG8CpA3m1q4
Channel Id: undefined
Length: 35min 11sec (2111 seconds)
Published: Wed Jul 25 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.