CARTER PAGE: Thanks
everyone for coming. I know it's late in
the day after a couple of very busy sessions. If any of you want to go
grab a beer and come back, that's cool too. It's been going on for a while. But anyway, thank you very much for joining us. I'm Carter Page. I'm the Lead Manager for
Bigtable at Google, which includes the Cloud
Bigtable team and we have George McMullen, who's
the Senior Director of Products at iCharts. GEORGE MCMULLEN: Hi, everybody. CARTER PAGE: And
the two of us, we're going to be presenting about
BigQuery federated queries on Cloud Bigtable and
how iCharts uses it. So cutting straight
to the chase, we just announced this
morning that you can now run high-speed SQL queries on
unstructured data in Bigtable. So you don't need to hook
up your Hive or anything else like that. You can just connect BigQuery,
and now you've got SQL. You can basically turn Bigtable
into another virtual table in BigQuery, and do
everything you can do with BigQuery on top of it. So the advantages of
using this approach is that you can query row ranges. You can select by row key. Basically, you scope your queries to row keys in Cloud Bigtable when you're doing queries through BigQuery, and you can get very fast results, as long as you're kind of ranging over that data. If you do full table
scans over terabytes, it's not going to be
as fast as BigQuery is. And it's not exactly what this
connector is optimized for, but you can do it in batch mode. I'll talk a little bit
more about the performance tradeoffs, why this is,
et cetera as we go along. The important thing is
that it supports SQL. It supports both the legacy
SQL and standard SQL. There are some performance optimizations that are coming for
the standard SQL in the next couple of weeks. And I'll also talk
about that a little bit. And all the time while
you're running BigQuery against a live real time
Cloud Bigtable database, you can still use
the Cloud Bigtable interface for the
low latency reads and writes that it provides. And by connecting
Bigtable to BigQuery, it also connects Bigtable
to the BigQuery ecosystem. So anything that can read
from a BigQuery table can now read from
a Bigtable table. Here's the agenda. Basically I'm going
to run through real quick how BigQuery works. And then I'm going to go into
some detail about how Bigtable works. I'm going to talk then
specifically about the BigQuery Bigtable connector, and
some of the tradeoffs. Like how you use it, when you
use it, when not to use it. Then we're going to
get in and do a demo. We're actually going
to set up the connector and run BigQuery
against a Bigtable. And then iCharts is going to
come in and show how they plug into this connector as well. So stepping back a little
bit, the broader roadmap here. Google Cloud Platform has seven
different storage products ranging from a memcache all
the way up to a data warehouse. And what we're going
to be focusing on today are these two, about Cloud
Bigtable and BigQuery. These two databases
are what we refer to as our analytics databases. And they have different
strengths and weaknesses. There's tradeoffs for them. So what Cloud
Bigtable is designed for is for being able
to scale to petabytes and beyond, and be able to
provide very fast single value reads and writes. Spot reads and writes in single-digit milliseconds, even out on the long tail, like the 99th percentile and beyond. What BigQuery is specialized
for is the ability to take terabytes of data
and do aggregation across it. And we can do aggregation
analysis and return results in just a few seconds. The inverse of those is
not actually true, though. BigQuery cannot do very
fast individual spot reads and writes. And while you can do
scans over Cloud Bigtable, it's not optimized for it in
the same way that BigQuery is. And so large scans will
end up taking minutes. And I'll talk again a
little bit why that is. But the basic gist of it is
that the architecture of the two systems are different. So Cloud Bigtable is row based. All of your data is basically
stored one row at a time. And so the files that underlie
Bigtable are sets of rows. So one file contains
a set of rows. BigQuery, on the other hand, is the inverse: each column is actually a set of files. So if you have 20 different
columns in BigQuery, and you read two different
columns from the same row in BigQuery, you're reading
from different files. And I'll explain
a little bit more why that affects the
relative performance characteristics of the two. So quick dive into
how BigQuery works. So fully managed, no
ops, data warehouse. When you hear data
warehouse, store schema, those sorts of things, that's
where your BigQuery comes in. Petabyte scale, it's super,
super parallelized for speed. If you ask a
question of it, we'll spin up tons and tons of
workers to analyze your data, and charge you just for
the second that it's up, give you the response
back, and spin them down. It has the convenience of SQL. So you don't need
software developers to ask simple
questions of your data. And it's durable
and highly available, with similar characteristics to Bigtable. The basic Dremel architecture
for its first decade-- I think this was published
in a paper in the late 2000s-- and basically what you have is sort of like an on-the-spot MapReduce. You essentially have these leaf nodes that will each be assigned to a
chunk of data in BigQuery. And they will go
analyze it, they will aggregate the results,
send it up to the next layer, to the level of mixers, which
in turn will aggregate again, and send it up to the top mixer. In 2015 we developed the
Dremel X architecture, which simplified and
actually sped it up where you really just
have one master talking to all the shards to
aggregate things up. And underneath, these talk
to this columnar storage, which in 2016, we replaced
the previous columnar storage with Capacitor, which is
faster, especially when dealing with encryption at rest that
we do for everything at Google. And as I mentioned here,
the important thing about having a columnar
store when you're doing large aggregate queries is-- say you have a table that
has 100 different columns and you want to be able to
aggregate the values in column A, based on the values in column
J. BigQuery will basically rip through those two columns
and will spread those out and aggregate them, and
return results back. If you did the same thing on Bigtable, however, Bigtable would be reading every single row as it goes along, just because of the way it's stored. And that's by design, in terms of what they're intended for.
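Just to make that concrete, here is a minimal sketch of that kind of query in BigQuery SQL. The dataset, table, and column names (my_dataset.wide_table, a, j) are made up for illustration; the point is that only the two referenced columns get read, however many columns the table has.

```
-- Aggregate column a, grouped by column j. BigQuery only touches the
-- files for these two columns, not the other 98 columns in the table.
SELECT
  j,
  SUM(a) AS total_a
FROM my_dataset.wide_table
GROUP BY j;
```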
So I'll go into how Bigtable works. There are some slides that were in some of the other Bigtable talks today. I just want to gauge here how much I need to go into a deep dive on this. How many people have been
to other deep dive Bigtable sessions today? There's enough people
who haven't, so I'm not going to rush through this. So Cloud Bigtable, it's a
fully managed NoSQL database. Basically kicked off the NoSQL
revolution in the mid-2000s. It led to a bunch of fun open
source solutions that imitated it-- that either took direct clones of the architecture described in the paper, or were heavily influenced by it. This includes HBase, Cassandra,
Hypertable, the NSA built one-- I think it's called Accumulo. And it's basically inspired
a whole family of databases out there. The basic important
characteristic about it is it's incredibly scalable. So it can scale petabytes,
hundreds of petabytes, whatever. And it's very, very low latency. So even with large data,
even if you're pounding it with a million QPS you get
very tight low latency, even at the tail. Single digit
milliseconds typically. It supports
sequential row scans. So the row keys are
stored sequentially. Think of it like
alphabetical order. And this allows you to
do some clever things in terms of optimization. So for example,
one of the examples we'll give in the demo is the idea of stock ticker data over time. And so if you prefix your
row key with a stock ticker, you can tell it, hey I want to
read the data for this stock ticker during this
window, and it will only read that data there.
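As a quick illustration of what such a row key scheme can look like-- this symbol-then-date format is the one the demo later in this talk appears to use, so treat it as an example rather than a requirement:

```
GOOG_20040819   <- ticker first, then date: all GOOG rows sort next to each other
GOOG_20040820
GOOG_20161031
MSFT_20161031   <- a different ticker lands in its own contiguous block of rows
```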
The other thing that's nice about it is, under the covers, because they're stored
together-- so like, we're going to look at the
stock ticker GOOG, for Google-- when you start reading the first
entry, the first row for that, it's going to go
to the file system, and they're stored together in the same files. It's already going
to have pulled that file into memory into the
block cache of the file system, and so it's actually
very fast doing scans. Not quite as fast as BigQuery
for the reasons outlined, but still pretty good. And then the other thing
is it learns and adjusts to access patterns. So as you have diurnal patterns during the day, where you're going hot or you're going cold on different areas of the data,
Bigtable will learn about this and will rebalance. And I'll show how this works. So here's basically what
a Bigtable looks like. Conceptually, you could
have millions of columns. So it can be a very wide table. You want a single row to be less than 100 megs. But apart from that, you have no constraint on the number of columns you throw into a row. It's a sparse table,
meaning that you could have a billion rows,
and each of the billion rows could have a different
million columns than each of the other rows,
and it doesn't affect the space. You only pay for the row column
intersections that you use. You'll notice that the
row keys are sequential. That was the lexicographic
or alphabetical ordering of the row keys I mentioned. And then at the top, there's
this notion of column families. In this case it's
follows and friends. And that's the only
schema definition that happens in Bigtable. You'll define a column family. You may define many column families. And then the columns themselves
are actually not defined until insertion time. So when you make
a mutation, that's when that column first appears. And it may only
appear in one row, or it may appear in many rows. That's really up to you,
the application developer. So there's a lot of
flexibility, particularly if you're ingesting things
with unpredictable schemas. This is one of the
reasons that it's a great fit for
IoT, where you can get some kind of
surprising properties from different devices.
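Here's a minimal sketch of that flexibility using the cbt tool, which also shows up later in the demo. The table, family, row keys, and column names are invented for this example, and cbt assumes your project and instance are already configured, for example in a .cbtrc file:

```
# The column family is the only schema you declare up front.
cbt createtable devices
cbt createfamily devices readings

# Columns come into existence with the first mutation that writes them,
# so different rows can carry completely different columns.
cbt set devices sensor-42#20170309 readings:temperature=21.5
cbt set devices sensor-17#20170309 readings:battery_pct=87
```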
And the last thing I'll point out in this slide is that the database
is three dimensional. So you have your row,
you have your column, and then we'll store whatever
number of versions you want. You could store
them forever, or you could put a TTL or max versions to kind of manage your data size. So this is the Bigtable
architecture in a nutshell. The left is kind of a
deep dive into what's inside our tablet server. And I won't go into
the details of that. That would take a while. If you're curious go read
the OSDI paper from 2006 on Bigtable, which goes into gory
detail in terms of how Bigtable works, and how it operates
and scales the way it does. The important thing
to notice here is you've got your
Bigtable client, and it's talking to a
pool of tablet servers. And each tablet server will
represent a certain amount of data that's
stored in Colossus, which is our file system. And the important thing there is the data is not actually stored on the tablet servers. So they're essentially stateless. They just reference and serve the data out of Colossus. And it's our high
speed networks that allow this separation to
happen without any performance degradation. Then we have a master that does coordination. And we have Chubby,
which is kind of like-- if you ever
heard of ZooKeeper, it's kind of the same thing. So when you connect to
Cloud Bigtable from outside, you talk to a load
balancer proxy. And then it will actually send
you to the Bigtable nodes. We're not going to talk
about the load balancer proxy right now. It's kind of abstracted out,
as far as you're concerned. And we'll talk about
how things work inside. So we talked about
the nodes being separated from the storage. And this allows you
to do two cool things. So one is that you could easily
move data around essentially. So imagine you have
a situation where you have this node on
the left that's getting a lot of access patterns. Maybe it contains data for a
certain region of the country that just woke up. And what Bigtable will do
is it will recognize this in a matter of
seconds, and then it will send a signal to that
node to unload the data. And then it will send a
signal to the other node to load the data and
start serving this. And this happens in a
couple of seconds, sometimes hundreds of milliseconds. It's practically imperceptible. Whoever happened to be
reading or writing right at that moment might get a little
bit of latency on the very long tail, like four
nines or something like that. But for all intents and
purposes, it's transparent. And this is happening
all the time. It's very aggressive
at rebalancing to keep things even. The other nice thing is you can
resize too kind of on the fly. So if you need to
throw more nodes on, unlike Cassandra
or HBase, you're not also throwing more disk
with the nodes themselves. You can spin up
and spin down nodes to increase your throughput,
increase your QPS. You can do that for a batch. And then after the
batch is done, spin it back down again if you like.
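As a hedged example of what that resize can look like from the command line (the instance and cluster names are placeholders, and the exact flags may differ by gcloud version):

```
# Scale up before a batch job...
gcloud bigtable clusters update my-cluster --instance=my-instance --num-nodes=30

# ...and scale back down once the batch is done.
gcloud bigtable clusters update my-cluster --instance=my-instance --num-nodes=3
```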
And because it keeps the balancing nice and even, and because
the nodes actually don't talk to each
other, you don't get that kind of exponential price that typically comes with distributed systems, with operating distributed systems. And so it goes in a straight
line from 3 nodes to 30 nodes. By the way, so we kind
of give a guidance of 10,000 QPS per node. And that's kind of what
we show on the console. And that's based off of a load
test of 1-kilobyte payloads, mixed read/write 50/50. If you have very large rows
that are like 100 megs, you're going to get
much lower than this. If you have much smaller
rows that are 10 bytes, you'll get much, much
greater than this. So this is just a rule of thumb. The most important
thing to do if you're trying to actually
figure out to scale it is run your own load against it. The important point from
this is that you can trust Bigtable to scale linearly. So as you keep increasing
up, the number of nodes will increase your ear load
throughput proportionally. And in fact, we had a customer
last year who did a load test, just to kind of
prove they could-- and we want to prove we could-- where he ran 3,500 nodes
and got 35 million QPS. Now the thing that's really
cool about 35 million QPS is they were actually
persistent and durable QPS. These were two disks. If you had turned off the power
right in the middle of it, you wouldn't have
lost a single bite when it came back up again. So its performance is comparable
to an in-memory database. So the connector here. So as I mentioned, each
database has its own tradeoffs, so because BigQuery's
column base, it allows you to query
terabytes in seconds, and aggregate that
data really quickly. Cloud Bigtable being row based,
and because of other parts of its architecture,
gives you single digit millisecond latency. So very fast. To use both, sometimes people
want both aspects of this, and that's why we
built this connector, you previously had
to dual ingest. In terms of building
a system, maybe using a Pub/Sub with a data flow,
riding into both systems hoping to stay kind of in sync. And then sending one
program to one or the other, depending on what
your use case is. This Federated connector
that we're releasing today, basically replaces
the column aisle store that's at the bottom of
the BigQuery diagram, and just puts Bigtable in there. So you can basically
tell BigQuery either go to your native store,
or you can point it to the Bigtable APIs. And it basically treats
Bigtable as an application would talking to Bigtable. The benefits of this is you
get SQL on the Cloud Big table, and that's pretty cool. Because if you have a simple
question about your data, previously you would have
to either go instruments, something like Hive. Or you'd have to go, have
a developer write a program and run a MapReduce
and hopefully get that answer to you
in the next few hours or something like that. Now you can do ad hoc queries. You can keep them canned. People who are not developers
can actually can ask questions about the data that's in there. Once you have this
connector in place, you can use the same tools
that work against BigQuery. And that's why we
have iCharts up here. And they're going to show
off a little bit about what that allows you to do. No table partitioning
is required. So if you're familiar
with BigQuery, BigQuery does full table scans. And if you're doing something
like warehousing all your data over time, you'll want
to partition your data into days or weeks or something
like that, so that you're not reading five years
of data when you're just curious about two days
or something like that. Bigtable solves this by
providing row ranges. And rather than defining
it ahead of time, you define your row
range is at query time. And it will just select the
data that you requested, and return that. And so as long as you're
keeping the row ranges small, the query times will be very
fast, one or two seconds. And now you only
have-- for the people who this is a good
connector for, you only are storing your
data in one place, and that simplifies
things a bit too. BigQuery is seeing the same
real time data that Bigtable is. The trade off, as
I mentioned before, is full table scans
are going to be slower. If you still want to do lots of
ad hoc full table scans all day long and return in really,
really quick times, you may still need to
do this dual ingest. If you're doing smaller
subsets of the day, you're probably going be just
fine with a connector on top of Bigtable. And the other thing
I want to point out is that Bigtable
connector is read-only. So unlike BigQuery, you
can read out of a BigQuery, get some results, read it
into another BigQuery table. You can't do that with Bigtable. However, you can
do everything else you could do with a read-only
connector on BigQuery. You can join a Bigtable read
with data on a BigQuery table. You could even join two
Bigtable columns together, or two Bigtable tables together. So let's stop talking
and start typing. Let's see, can we switch
over to the laptop. So I have an instance here. It is called GCP demo. And I have it sitting
in US central 1c. And this is important because
BigQuery resides in regions. It resides in the US and
it resides in Europe. And because of the amount
of data that it's reading, you need to co-locate your
Bigtable in the same region. So here we have it in
the zone us-central1-c, which is in the same region
as the BigQuery US region. So let's take a look
at the data here. We're going to use
this tool called CBT. If you use Bigtable, and
you haven't used CBT before, then you should grab it. It's a GCloud component,
and it makes it really easy to do simple admin things
for the command line. It's a lot easier than using the
HBase admin tools or whatever. And so here I just basically said, read the first row of the data in this table, ticker, starting with the first row that matches the prefix GOOG.
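Roughly, that cbt invocation looks like this-- reconstructed from the description, with the project and instance assumed to come from a .cbtrc file or -project/-instance flags:

```
# Read one row from the 'ticker' table, starting at the first row key
# that matches the prefix GOOG.
cbt read ticker prefix=GOOG count=1
```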
So we see actually the opening day on the stock market for
Google, on August 19, 2004, and it closed at
$50,000 that day. We've got a high and
a low and an open. The volume there is 0. Some of the data in here is
based on what year it was or whatever. Both the volume and the open can sometimes be 0. So it's just an artifact
of the data essentially. But if you
wanted to actually start looking at more of this data,
using this interface quickly becomes very clumsy and
hard to actually parse. So what we want to do is
actually make a BigQuery table so we can actually do
something interesting with it. So let's see. So we're going to create
a new dataset first. We're going to call it demo. And as I mentioned before,
we're going to put it in the US. So it's co-located
with the Bigtable that we have running
and US central 1c. So click OK. And that's great. It's pretty fast. So then what we have to do
is we have to tell BigQuery how to talk to Bigtable. The trick here is Bigtable
has no notion of typing. It's all binary data. Bigtable doesn't know if you're
storing an integer or a string or a movie or some
ciphertext, or whatever. You just store bytes in it. So we need to tell BigQuery, when it's reading from certain columns, what the type of that data is. And the way we do
that is the only slightly complicated aspect of the connector: you just define a JSON file. If you get this, everything
else is brain dead simple. So this JSON file
starts with a format. And you have a reserved term
for the format for Bigtable. So you put in all caps Bigtable. And then it will be
looking for a source URI to define where
that Bigtable is. So we point to
the Bigtable API-- to googleapis.com/bigtable. We point to the project
name, project ID. We point to the instance ID,
and we point to the table ID. Then we set up our options. So the first option that's here
is actually pretty important. It's read row key as string. So 90% of the time,
probably more, you want to use strings as
your row keys on Bigtable. It gives you the most
kind of flexibility, particularly if
you're concatenating a bunch of different
types of data together. The important thing in this
case is if you give BigQuery a string row key and you say,
my row keys are all strings, then BigQuery will know
how to be able to select between individual ranges. And that's the key to actually
having a fast connector. So when you're selecting between two particular rows, you're reading 100 rows or 1,000 rows or 10,000 rows or whatever-- not the whole billion rows that are in that table. The second thing, only
read latest, you usually want this set to true. Now if you recall, Bigtable is
a three-dimensional database. So each column or
row intersection can have any number of versions. Unless it's interesting for historical purposes, usually you just want the latest one. But if you don't, you can change those defaults; you typically want this set to true. And then you actually define
out your column families. In the simple example we
were working with today, I have one column family
just simply called T. And so we specify the
column family ID there. And then I specify a type here,
which in this case is a string. And in this case, I'm
saying that all the data in this column family
happens to be a string. That doesn't have
to be the case. You could either set
a default, and then override in columns
that are different. Or you could set each column
to be a different type. You could say this is a
string, this is an integer. But it's important to
set the correct type so BigQuery knows how to
correctly parse your data, or how to turn those bytes
into the correct data you are looking at. And then here is the
individual columns. And you can see these match
up with what we saw before, looking at the
raw Bigtable data: symbol, close, date, et cetera.
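Putting those pieces together, a table definition file along these lines is what's being described. The project name is a placeholder, the rest is reconstructed from the walkthrough, and the exact key names should be checked against the BigQuery external table documentation:

```
{
  "sourceFormat": "BIGTABLE",
  "sourceUris": [
    "https://googleapis.com/bigtable/projects/my-project/instances/gcp-demo/tables/ticker"
  ],
  "bigtableOptions": {
    "readRowkeyAsString": true,
    "columnFamilies": [
      {
        "familyId": "t",
        "type": "STRING",
        "onlyReadLatest": true,
        "columns": [
          { "qualifierString": "symbol" },
          { "qualifierString": "date" },
          { "qualifierString": "open" },
          { "qualifierString": "close" },
          { "qualifierString": "high" },
          { "qualifierString": "low" },
          { "qualifierString": "volume" }
        ]
      }
    ]
  }
}
```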
So the way we actually create the new virtual table is fairly simple. We use another
command line tool. This is BigQuery's GCloud
component called BQ. And again, if you
use BigQuery, you should probably have
this tool, because it makes your life a lot easier. We'll call the MK command
to make a new table. And we specify the project
ID of the BigQuery. They're coincidentally the same
place, but just pointing out: this is not the Bigtable project,
this is the BigQuery project. We specify the external
table definition, the JSON file that
we just looked at. And then you say the
new dataset that we created, demo, and then a dot. And then the name of
the new BigQuery table. Note that's not the name
of the Bigtable table that's specified inside
of JSON file itself. We're specifying the
name of the virtual table that we want to have
queryable inside of BigQuery.
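Assembled, the command is roughly the following; the project ID and the definition file name are placeholders reconstructed from the description:

```
# Create a BigQuery table demo.bq_ticker backed by the Bigtable table
# described in the external table definition file.
bq mk --project_id=my-bigquery-project \
    --external_table_definition=ticker_def.json \
    demo.bq_ticker
```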
And we do that. And after a long time, it's done. So we go back to BigQuery here. And we'll refresh. And generally, it takes
about five seconds. So pretty fast. And now we have the
BQ ticker table here. Let's take a look
at the schema here. And you can see it's basically-- you've got a schema that has a lot more kind of layers in here. But the things you
generally care about are kind of like you have
your t.symbol and t.close. In fact, what we'll do here is we'll grab the close price on Halloween. So t.close.cell.value from bq_ticker where the rowkey equals GOOG_20161031.
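In legacy SQL that query looks roughly like this. The nested cell.value path is reconstructed from the spoken description of how the connector exposes Bigtable cells, and legacy SQL flattens the repeated records for you:

```
SELECT t.close.cell.value
FROM [demo.bq_ticker]
WHERE rowkey = 'GOOG_20161031'
```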
I'll run this. And so we get a closing price of $785, basically, on that day. But you know, this is BigQuery. You can already do spot lookups inside of Bigtable itself, so that's not very interesting. So you can do some more
interesting sorts of things. Let's look at the min and
max for the entire month of last October. So because the
values are strings, you want to cast them to floats. We're going to indicate that the
min of the low for the month is going to be called low, the
max of the high is called high. And we're going to give
it a row key range. It's going to be
between this and this. So we could even do a prefix. You could have basically
a row key prefix, and just say it goes up to 2016-10, and it will basically do the same thing.
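A sketch of that query, again in legacy SQL, assuming the t family also carries low and high columns as the raw row shown earlier suggests:

```
SELECT
  MIN(FLOAT(t.low.cell.value))  AS low,
  MAX(FLOAT(t.high.cell.value)) AS high
FROM [demo.bq_ticker]
WHERE rowkey BETWEEN 'GOOG_20161001' AND 'GOOG_20161031'
```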
So we do that, and we find out that the low for the whole month was
$770, the high was $816. But usually like when
you're plugging this into reporting
engines, you want to be able to do more
interesting things. Like build histograms of data. So let's look at a histogram
of the highest high and the lowest low of each month
ever since Google went public. So you run that, and
after 1.3 seconds we've got our answer back. We've got basically the
range for each month, for all 151 months that
the stock has been trading. So that's just showing
the raw functionality. But I think what
gets interesting is when we can actually
turn this into some kind of visualization. And for that, George is going
to come up and show us iCharts. So can we switch back
to the slides please? GEORGE MCMULLEN: Awesome. Thanks. CARTER PAGE: Thanks, George. GEORGE MCMULLEN: It's
pretty incredible stuff. Hey, everybody. I'm not going to do a demo yet. Another demo. So just for anybody who doesn't
already understand or know what iCharts does,
we help our folks like yourselves
understand the trends that are in massive,
massive amounts of data so that it can help you
answer questions and make decisions and drive actions
that can drive your business. And so some of the things
that our customers really like about us is the
simplicity, the ease of use. It's a drag-and-drop
type of interface. And it's not to say
that we are trying to hide any of the power. We actually do give
you some of the power as well to do some really native
stuff within our interface. And I'm going to show
you some of that. Also, multiple
levels of aggregation so you can drill down into
the details of the data, while still seeing those
higher levels of aggregation in a hierarchy that
you define on the fly within the interface. And something else that's really
important is single sign on. Being able to use Google OAuth
to get into our platform. And that's really
important for folks who don't want to have
another username and password. It simplifies the
whole data management. And it also enables
embeddability, whereby you can use our visualizations
or the entire reporting system within the products that
you're already using. Just simple example is
if you have a dashboard, you could put that
into Google Sites. Or you put a chart
in Google Slides. So how do we work? We start at the
visualization layer. To us, everything needs
to be a visualization because that's the goal. Those are the questions that
you're trying to answer. Those are the decisions
that you try to make. And so that's where we start. Except for this demo, I'll be
going into a little bit more of the meat and the details, and
peeling back the covers to show you how it's all working. And then what our
system basically does is it takes that
visualization configuration, and creates a dynamic query that
includes all the calculations and aggregations, segmentation, filtering, windowing, all that wonderful
stuff, and parses it out into the APIs like
Google BigQuery's API. Gets all the data
back, pushes it back to the user where they
can then interact with it. So before I get into the demo,
I've got a bit of a confession to make. We've never worked with
Bigtable at all before. We've never done any kind
of integration or anything. Yeah, I know, right? But that's one of
the reasons why we're so excited about this. Because now we can offer
this to the customers that we have that
have Bigtable data. And then they can access
all that information alongside their
BigQuery information in one single place. And that also helps
with the simplification of data management. And they can also query on it
using SQL, using the language that they already kind of know. These analysts already know SQL. And most importantly,
I think really is the fact that they can
take this unstructured data that they have in one
system, and combine it with structured data that
they have in another system, and see it all cohesively
in a single visualization and report. But I think the most important
thing for us as a company, is that we didn't
have to do any work. So I really like that. On with our demo. If you could switch
back over here. And a little bit
also about iCharts. We run on standard SQL. As Carter mentioned,
they're still doing some query optimization. So things might run
a few seconds slower. Might have to wait a couple of
seconds for things to load up. But one of the things that
Carter just showed, and I'm going to pick it up
where he left off is that query that was showing
a histogram of the Google stock ticker over months. So I did a little conversion
over to standard SQL. It was really simple to do. Just a few lines that
needed to be changed. Plenty of documentation on that. And I'm already logged into
iCharts through Google OAuth. Just refresh the screen. Make sure that I'm still here. And I have access to all the
data sets and all the projects that I already had access to. So my BigQuery-- there's that
demo data set that he created. Here's the stock
ticker data set. And I can actually use
that query as a data set. I could drive directly
by the tables themselves, but I wanted to use
this particular use case as an example. So I'm just going to paste
the query in there. I've already done
this a few times. Practice makes perfect. I'm going to go
create this data set. Actually you know what I'll
also do is I'll take this query and run it in the
BigQuery console as well. So you kind of see that-- show options. Disable legacy SQL. Get rid of cached results also. So right now it's basically
running the same query. Actually, I think
it returned already. I'll hide the options. It didn't take that long at all. So we see the same
exact data that we saw in Carter's original demo. It's there. So now I'm in our editor,
and we have the fields that were defined in that query. So we have the month,
the low, and the high. And I could basically
take this and replicate the same exact output
that Carter showed us, and that I just showed
you the BigQuery console. It's going in. It's basically getting
every single month out of that dataset, piping
it through BigQuery's API, and then going down
into the Bigtable through the Federated system. And now it's going to
basically do the same thing and get the highs. And it's also aggregating
it at the same time. So if it were kind of
row-level information, not preaggregated
through a query, then you'd basically
be able to see a sum. Or you can also choose
things like a count, an average, min, max, distinct. Or you could do other things,
which I'll show in a moment. So this is great. It's pretty much the same thing. But I promised a
visualization, so I'm going to show visualization instead. And we'll see how basically
all of this, which is kind of difficult to
interpret just by reading it, a few rows, but
this is much easier. And you could obviously see
hey, if I invested in Google in the beginning of time, I'd
be doing pretty well right now. And if you had done
that, you'd be at 10x your original
investment, maybe even more. But most people, they don't
have just one stock symbol in their portfolio. They didn't just put all
their eggs in one basket. They're diversifying, they
may have multiple portfolios. And if you're building
an application that is for portfolio trading,
or some kind of trading application or
brokerage application, you're dealing with thousands
or even more portfolios that contain hundreds of thousands
of ticker symbols in them, with billions and
billions of trades. Not just purchases, but
every single time you've made a trade, that's being recorded. And so that's kind
of the next use case that I wanted to show you
really about how can we combine this information
here that's coming from Bigtable with some other
data which would represent my portfolio within BigQuery? And I'll just take a
look at BigQuery here. Show you what my
portfolio looks like. And hide this. It's a basic table. Portfolio name,
ticker, purchase date, when I purchased the stock,
how many shares I purchased, what was the price
that I purchased it. So I've got another query below. And you could do this
all within our interface, but I think it's important
here to kind of show the core SQL since
this is a little bit more of a technical audience. Basic SELECT command. Taking the information
from my portfolio, taking some of the information
from the ticker table that's driven from Bigtable, doing the FROM on the portfolio, joining it just like you would
join any two tables together, which is great because
these two tables are in different systems. And just joining it on the cell. And then we basically have
a little bit of optimization here where you're
saying, the row key is between the time that I purchased the stock and the present time. And it's doing some
preaggregation for us as well, just to
make things fun.
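The shape of that join, in standard SQL, is roughly the following. The table and column names are reconstructed from the description, and the per-month price calculation from the demo is left out to keep the sketch simple; the point is the join between the native portfolio table and the Bigtable-backed table, with a row key range per holding:

```
-- For each holding, count the trading days of ticker data since purchase.
SELECT
  p.portfolio_name,
  p.ticker,
  COUNT(*) AS trading_days_since_purchase
FROM demo.portfolio AS p
JOIN demo.bq_ticker AS bt
  ON bt.rowkey >= CONCAT(p.ticker, '_', FORMAT_DATE('%Y%m%d', p.purchase_date))
 AND bt.rowkey <= CONCAT(p.ticker, '_', FORMAT_DATE('%Y%m%d', CURRENT_DATE()))
GROUP BY p.portfolio_name, p.ticker
```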
If I could copy and paste this, I can even use this as a data set. And it just basically creates a user-entered query here. Give it a name,
portfolio by month. Create a data set. I'll also show you what it
looks like on BigQuery itself. Give you an example of
what happens on query. Hide options. Wow, that was fast too. So here, now I have-- we're
seeing a portfolio, all the tickers, when I purchased
the shares, how many shares I purchased, the purchase
price, the price, and the month together. So price is actually the
price during that month. And you know what
I'm going to do, is I'm actually going to do a
little something extra here. Because the price of
the shares currently, versus the number of shares
that I have, is great. But I actually want to
see what the value is that is currently
existing in my portfolio, or how it's changed over time. And that's really
straightforward, simple calculation. I'm just going to multiply purchased shares by price. And certainly, you could have
created a view to do this. But I think it's more
important to actually do it directly within
the dataset layer within our semantic layer. Because if this were a
more complex query that was put onto a
preaggregated dataset, then you'd lose the capabilities
of filtering and drilling down into the data. So it's more important to have
that directly within our layer, than trying to do it
directly within the data. So I'm just going to copy
that, and edit our data set. And just going to call it value. Value. Going to give it a float. Paste in my thing. And you know what? I'm actually also going to
do a little extra formatting just so it looks good. So now I could create a pivot
table directly on this data. So what I'll do is I'll take
the portfolio name that I have, and what it does
is goes directly into BigQuery automatically,
gets all the portfolio information out,
and then maybe I want to see how each portfolio
has evolved over time. Take that, put that
into the columns. And you know what? I'll also add it to the filter
since my trading started in 2014, but maybe I'm only interested in the past six months. So I'll just pick a date
range-- February to October sounds nice. And then I'll take the value,
which is the calculation that I entered in to our
semantic layer, and I'll add it
right within there. So now it's actually going
in, creating that query that breaks it down by portfolio
name over the trading months, and multiplying the current
price for each individual month by the number of shares. So I could see each
individual portfolio and how they've evolved over time. But this is great. But maybe I want to get in
a little bit more detail. What about each
individual stock ticker? So I'll just drag
this into our rows. And it will perform the same
kind of query over again, but drilling into each
individual stock ticker. And now we see
energy, every stock ticker, all the individual
amounts, financial. And what it's doing here is we
have the individual amounts, but we also have the higher
level aggregation based off of that hierarchy
that I defined. If there was some
other data here, I could switch
these things around. Or what I can also do is I
could take the trading month, and maybe I don't want to
see it on the columnar level. I want to basically put
that up to the rows, and show it in kind of a
really more hierarchical level. So I could see now for
the month of October 2016, which is the first month
that I filtered by, I'm seeing that the
total value is $30,000. I'm not doing too bad. I don't know how to double
click on your laptop. And I can see every
single month over time. Cool. But again, what
does this look like? When I just switch this
over to a line chart, or maybe even an area chart,
we can actually see how this has evolved over time. So this is the past six months. Maybe I just want to see it
through from the beginning, all the way back into 2014. Back in the day. This might be a little
bit easier to read. I'm looking at my portfolio,
and I'm doing pretty well. Maybe only 6x. Not as good as if I had just invested in Google from the beginning. But I'm seeing also
there's a little dip. And I'm wondering where
that's coming from. I can tell that somewhere in February or April, this dip happened. Where is that coming from? Because it looks
pretty drastic here. But if you get into the
details, what we might see is a little bit of
a different story. A more detailed story. So I'm actually going
to take the portfolio and take the same exact
chart, and segment it by the portfolio of
every single trade. It goes in, creates
the query that says give me the current value
of every one of my stocks in every single one
of my portfolios for all the months
that I want to display. So what I'm actually seeing
here is that it's not such a massive drastic drop. What I'm actually seeing is
that the stocks in my health portfolio and even a
little bit of my stocks in my financial
portfolio have experienced a bit of a downtrend over the course of a year. And at the same time, I've
got a nice big increase in my tech stocks portfolio. So there was just kind of a
slow moving downward trend, and a nice sharp upward
trend for a couple of things. And these are the exact types of questions that people have that
until now, they wouldn't be able to dynamically answer. And especially with data that
comes from multiple sources that they want to
combine into one. This is pretty
groundbreaking stuff. Could we switch over back
to the presentation mode? So you can try it
out if you want. Go for it. iCharts.net/Googlepivot. It's available now. Again, we use standard SQL. All that good optimization
is coming real soon. You could ask Carter about it. But really I want to
thank Carter and the team for creating this. It's incredible. I appreciate it. Thanks. CARTER PAGE: Thanks, George. It's a really cool tool. What's really neat is
he didn't do anything. He just pointed it at BigQuery, and can drag and drop and get a lot of information on
the data that's already there. Where if I wanted to do
that with a MapReduce, I'd be programming
for about two hours to try and get the same results. So quick recap. So as of today you can analyze
Bigtable data using BigQuery. It is row range optimized
for the legacy SQL. The range optimization for
standard SQL is coming soon. That's why his queries were
taking 8 seconds and not 2 seconds. That was our fault, not his. In a couple of weeks
the standard SQL will also be as lightning
fast as the legacy SQL. And that in short you get the
best of both analytical data stores by using this connector. So thank you. There is a URL here that
you're welcome to go take a look at the documentation. It will show you how
to set up the connector and play around with it. And it will be interesting
to see what you find. It's in beta right now. So we're always eager
to hear your feedback. That is the end of
our presentation, but we have some time
for questions for anyone who's not running off
for beer right now. But if you are
running out for beer, I do not begrudge
you, because I would.