AMY KRISHNAMOHAN:
Thanks, everyone. My name is Amy Krishnamohan. I do product marketing
at Google Cloud. I'm here with really exciting
guest speakers today, Joe and Seph from Kolide
and 6 River Systems. Today we're going to
talk a little bit more about how you can build
a SaaS application on cloud-based
infrastructure. So SaaS has changed
a lot, right? In an on-prem, hardware-based environment,
having single tenancy was really expensive but very secure. But cloud
infrastructure made a really drastic
change by making these
instances easily available at your fingertips. So SaaS is obviously
leading the change in that area. So here I'll
introduce our guests. Joe, would you like
to take it from here? JOE HUGHES: Hey, I'm Joe Hughes. I'm director of devops
at 6 River Systems. 6 River Systems is
a robotics company. When you get things shipped
to you from the internet, we have a big hand
in making sure that those things can get to
you in two days or faster. And I was previously
at a company called Amazon Robotics in that
space, and you probably know them as Kiva. They solve the same problem. So that's our robot. His name is Chuck. And essentially
Chuck leads a worker around a warehouse using kind
of self-driving technology, identifying objects
in the way and taking that person to fulfill
your order of toothbrushes or whatever you happened
to order off the internet. This is what those
warehouses look like. So this isn't your Home
Depot or your Target store. This is thousands and
thousands of products, million-square-foot warehouses. And so we have algorithms
in the cloud that help us batch these
orders and get efficiencies for our customers and
make sure that Chuck or the mini Chucks that
are in these warehouses coordinate together and make
life better for our customers. So if you're not familiar
with warehouse robotics, traditionally what
you would be doing is carrying around
a big cart weighing upwards of 50 to 100 pounds. Now you can follow
Chuck around and have a more enjoyable workday. Basically, this is
just a quick look at our high-level architecture. And this is kind of what Amy was
alluding to: we have a Google Kubernetes Engine cluster-- Kubernetes is a
container scheduler-- where we create what
are called namespaces, which basically isolate
customers from one another. We actually run these namespaces
on dedicated nodes, which is dedicated
compute, and we actually have a dedicated SQL
primary and a read replica for every single customer. So that means that
data from one warehouse never gets mixed with another.
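To make that layout concrete, here is a minimal sketch of a per-customer namespace pinned to a dedicated GKE node pool. The customer name, workload name, and image are hypothetical; the `cloud.google.com/gke-nodepool` label is the one GKE applies to every node in a pool.

```yaml
# A namespace isolating one customer's objects (hypothetical name).
apiVersion: v1
kind: Namespace
metadata:
  name: customer-acme
---
# A workload pinned to that customer's dedicated node pool.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: warehouse-api            # hypothetical workload
  namespace: customer-acme
spec:
  replicas: 2
  selector:
    matchLabels:
      app: warehouse-api
  template:
    metadata:
      labels:
        app: warehouse-api
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: customer-acme-pool   # dedicated compute
      containers:
      - name: warehouse-api
        image: gcr.io/example/warehouse-api:1.0   # hypothetical image
```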
So that's a little bit about what I do and what my company does. I'm going to hand
it over to Seph to kind of fill you in
on what Kolide does. JOSEPH SOKOL-MARGOLIS:
Hi, everyone. I'm Seph. I'm an SRE at Kolide,
which does sort of infrastructure analytics
and security analytics. Previously I've been at
Fastly, Twitter, ActBlue. I like infrastructure as
code, data simplification, stuff like that. Kolide builds an endpoint that
hangs out on devices, primarily laptops, and collects
analytics about that and sends it into
our cloud where we do some analysis on it. We launched Cloud, which is
our SaaS offering, about a year ago now. And in that, each customer has
an isolated namespace, very similar to what Joe is doing. And each namespace has its own
little stack of application-server and
data-processing elements, and those call back to
dedicated Cloud SQL instances. We kind of did that
because it reduces the noisy-neighbor problems. It also shifts the
burden of data isolation from application developers
to infrastructure. We dedicate
infrastructure resources to ensure data segregation,
and then we can take it out of the application. We do share our edge ingress. So edge is shared, and
it then comes in and gets sort of pushed into the back. We use Kubernetes
quite extensively. We follow the operator pattern. So each customer has a
custom resource, and then an operator
goes and reads that resource, spawns the namespace,
and creates the deployments. Each one brings things up,
brings up Cloud SQL. We started running into issues
around the number of namespaces and things that can be in a
single Kubernetes cluster, so we have multiple
Kubernetes clusters now. We found that some
of the spin-up times were a little
slow, so we started precreating unallocated
customers and then, on sign up, would allocate them. We are very soon launching
a new product next week which moves away from the
single-tenant architecture and starts revisiting
multitenant designs. So that uses foreign keys
for customer data isolation, and this is enforced
at a data layer by making sure there are
effectively SQL WHERE clauses all over. So it's basically
our data library, and we could shard that by
customer to separate back ends. So I think that's all I've got prepared.
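As a rough illustration of that data-layer enforcement, with hypothetical table and column names, not Kolide's actual schema:

```sql
-- Every row carries a foreign key back to the customer that owns it.
CREATE TABLE devices (
  id          bigserial PRIMARY KEY,
  customer_id bigint NOT NULL REFERENCES customers (id),
  hostname    text   NOT NULL
);

-- The data library refuses to issue a query without the tenant scope.
SELECT id, hostname
FROM devices
WHERE customer_id = $1;    -- the WHERE clause that is "all over"
```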
AMY KRISHNAMOHAN: Great. Thanks. So before we get
started, there is a Dory open in your mobile app. So if you have any questions,
feel free to submit a question. We'll take the
questions at the end. And so there are
many ways that we can define the multitenant
environment, right? Everybody defines it
in very different ways. So how would you
define multitenant in the cloud environment? JOSEPH SOKOL-MARGOLIS: I guess
I have the mic, so I'll start. I think there are a whole lot
of ways to think about that. We run in Google's cloud,
so inherently Google's cloud is multitenant. Probably almost everyone in
this room is in Google's cloud. But we run on our own
Kubernetes cluster, but our Kubernetes cluster
is also multitenant because many of our customers
are in the same Kubernetes cluster, so our Kubernetes
cluster is multitenant. But each daemon or pod inside-- and Kubernetes workloads
are grouped by pod. So a pod is like a set
of linked containers. And each pod is single tenant. So only one customer
is in a pod. And for our underlying database,
we use Cloud SQL. We have one Cloud SQL
instance per customer. So that's also a very
clearly single tenant. So it kind of depends
on where you look at it. I would generally
describe our cloud product as single tenant, since
Google Cloud being multitenant isn't really very interesting,
but our application servers and our database servers being single tenant is. JOE HUGHES: Yeah, I don't
really have much to add to that. I would say that a lot
of it also comes down to your business problem. And for us, when
we considered what we should build we
were like, well, we have long lead times
with our customers, and they may or may not
want us to upgrade them. And so what is this
architecture called? And it's called single
tenant because we have really tight control over
how our customer environment is managed. That works well
for us, but if you were building a
social-media platform or something like
that, it's probably not the right
architecture for you. AMY KRISHNAMOHAN: So you're
defining single tenant as having a customer
per pod, and then also each single customer would have
a dedicated database instance. JOE HUGHES: Yeah. And we go a step
further than Seph, where we actually have dedicated
Google Cloud Platform nodes, and Kubernetes will only
schedule a customer's pods and workloads
onto that customer's dedicated nodes. Whereas I know Seph,
as he spoke about, his is more multitenant where
a customer's pod can end up on any node, and
they may move around. Whereas ours, we try to
protect our customers by having dedicated compute. Kubernetes is so flexible
that it makes it easy for Seph to do his thing and for me to
do it a little bit differently. AMY KRISHNAMOHAN: So when you
designed your architecture, what was your biggest
consideration? JOE HUGHES: For us, it's really
about customer stability. I think everyone here
would be really angry if the socks that
were guaranteed to come to you in two days
were there two weeks later. And if we caused that
because our system was down for some reason
for some amount of time, then our customers
aren't going to trust us. So we have to really
guarantee that customer A-- you know, the
noisy-neighbor problem, we can't even let it enter
the equation for our system. JOSEPH SOKOL-MARGOLIS:
We had two major drivers for going single tenant
for our last product. One of them was to reduce
noisy neighbor a lot. So again, one customer
does a lot of stuff, basically starves the
resources of other people, so we started
splitting up resources. The other reason is just
around data isolation. Again, it makes it very easy
for an application developer to just not even think about it. You can be like, oh,
there's no way I can select data for the wrong customer. It's just that the app server
doesn't have access to that. So that was a big driver for us. AMY KRISHNAMOHAN: I see. So going back more
on the database side, have you guys ever tried
putting multiple tenants in a single database, like
maybe separating them by key or by table, things like that? JOSEPH SOKOL-MARGOLIS:
So we do actually. So I mentioned we're
launching a new product. Our new product has
a shared database. So each customer has their
foreign-key constraints. Every piece of data is
linked to a customer ID, and there are WHERE
clauses all over. But again, it's just
this tradeoff of, are you doing the
data isolation work in your application
at the SQL level or at an infrastructure level? So you can go both ways. JOE HUGHES: So our
company, another factor in why we haven't looked into
that situation was just time. We knew that there
would be tradeoffs, and that it's more
developer time to do those sorts of things. And so where we can
shift complexity away from our developers and
shift it onto Google, which is what we were
really doing, that made it an easy call for us to
just continue with that model. JOSEPH SOKOL-MARGOLIS:
Actually, as I think about this, one of the ongoing tradeoffs for
us around number of instances is just straight up
number of instances. When we had 10
instances, it was easy. But as the number of tenants we
have increases and increases, we're suddenly now managing-- or Google is managing many,
many, many Cloud SQL instances. And it mostly works great,
but as a single person managing them, it
gets much harder for me to track down problems. So the number of things
to pay attention to-- I don't know. AMY KRISHNAMOHAN:
That's interesting. So now that we
are talking about instances: when
your customers create their account in
the application, I'm sure they want
instant creation, right? But creating the database
instance or the node or the pod takes time. So how do you manage
that latency issue? JOSEPH SOKOL-MARGOLIS:
I'll start. So we have-- sign up
is available online. So people can just kind of
go, hey, I want to sign up. And we used to have a slightly
asynchronous sign-up flow. People would click Sign Up. We'd send them an email, and
then they'd click their email. And behind the scenes while
that email was being delivered, we would spin up a database. We'd spin up a tenant. And we had a couple
of issues with that. One of them is that we wanted
to close that loop even tighter. We didn't want people to
have to wait for their email. And the other one
is occasionally instance creation would
fail, and we never really built failure handling. So we solved that by
preallocating things. So we had a little daemon that
runs in the background that says make sure there are roughly
30 free tenants at any given time. So we just sort
of have headroom. And then when someone
signs up it says, hey, give me the next
free one, gets a free one. Then it basically says, this
now belongs to this company, and then it's used. And then a little while later
the daemon goes, oh hey, look, I need to create a new free one. So we've sort of
tackled that loop by having a precreated model. AMY KRISHNAMOHAN: How do
AMY KRISHNAMOHAN: How do you come up with that 30 unallocated instances? JOSEPH SOKOL-MARGOLIS:
Just kind of made it up. AMY KRISHNAMOHAN:
It's a random number. JOSEPH SOKOL-MARGOLIS:
I looked at what do we get from a day of sign ups? What's our peak? How much headroom do we
want to have lying around? I picked a number. We used to adjust it more
often, but now it's just-- AMY KRISHNAMOHAN: No AI and
machine learning prediction? JOSEPH SOKOL-MARGOLIS:
No, not on that one. JOE HUGHES: For us,
it's a little bit easier because you can't just go
online and sign up for Chuck. You have to engage
with our team, and there's a long
integration period where you have to integrate
with customers' systems. So a 10-minute creation
time is lightning fast for our industry, and we're
really happy with that. AMY KRISHNAMOHAN:
So it seems like both of you are using
Kubernetes right now. So how do you use Kubernetes
in your SaaS applications? JOE HUGHES: You saw a little
bit of that in my slide. I think Kubernetes is
really great because of its flexibility where
you can isolate things by namespace, and there's Istio. We have a lot of customers
that are very concerned about security. There's potentially
personal information in some of the things
that we handle. And so that isolation with
namespaces and encryption by default, which the Istio
service mesh kind of gives you, are all really
powerful tools for us that we just get by
using the platform. So we really reach for
all of those things to make it easier to manage. As Seph kind of
alluded to, we've grown our customer
base quite a bit, and so there's a nontrivial
number of SQL instances to manage, and there's
a nontrivial number of namespaces and pods. But that's where Stackdriver
and a lot of the other products that Google offers really
come in to help you. But yeah, I would say that
the continuous release of new features in
Kubernetes has made it a really great thing for us. JOSEPH SOKOL-MARGOLIS:
Yeah, we love Kubernetes. It's great. I sort of think of it as it's
a nice abstraction for defining workloads. So we define workloads, and
then Kubernetes just kind of makes them happen. And we heavily use namespaces
as our isolation model. We also have really done a
lot with the Kubernetes operator model, which is a pattern
CoreOS declared a while ago. And that means we
create custom resources. Resources are like an internal
Kubernetes thing, a pile of YAML. And the resource basically
says this is a customer. They should get
this many resources. They're big, medium,
small, and make them go. And then we have
an operator that comes along, reads
all the resources, and creates the namespace,
creates all the pods, creates the Cloud SQL instance. And so yeah,
Kubernetes is great, and I really like
the operator pattern. It lets me take my knowledge
of how to create and manage these things and
translate it into code that just runs and synchronizes
itself and kind of just goes in the background and
fixes, fixes, fixes, fixes.
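To picture that pile of YAML, a customer resource of the sort Seph describes might look something like this; the API group, kind, and fields are hypothetical, not Kolide's actual definition:

```yaml
# One customer, declared as a custom resource the operator reconciles.
apiVersion: tenants.example.com/v1    # hypothetical API group
kind: Tenant
metadata:
  name: acme-corp
spec:
  size: medium                   # big / medium / small, mapped to allocations
  cloudsqlTier: db-custom-2-7680 # illustrative Cloud SQL machine tier
# The operator watches Tenant objects and creates the namespace, the
# deployments, and the Cloud SQL instance to match each spec.
```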
AMY KRISHNAMOHAN: And what other tools are you using together with
Kubernetes-- Terraform? JOE HUGHES: Yeah, so as you
might have guessed from Amy, we use Terraform. We're actually on the
journey to where Seph already is with the operator pattern:
we've watched it closely and kind of seen people
apply it successfully. Again, we're in
an industry where we have to be a little
bit more cautious. So we have to kind of
watch things and make sure that they mature in the way
that we really think we can make a long-term bet on them. Kubernetes is a no-brainer. The operator pattern, it
is a pattern, and so for us we had to see people
successfully apply it. And there's a lot of great
examples out there now and a lot of great stories
from people like Seph who have applied
it, so that's where we're going to be going next. AMY KRISHNAMOHAN: So Joe, you
touched on security. In the multitenant environment,
how do you isolate tenants to avoid any security concerns? Is there any specific
thing that you can share? JOE HUGHES: Yeah, I
think just by the nature of having the single tenancy
you alleviate almost all of that concern. There are certain
customers where they want that
data to be on-prem, and that's a whole other
thing that you have to address. But in a multitenant
application, again, I think you really have
to look at your business case. I've worked for companies that
were under HIPAA compliance. And so in a
multitenant application that has to have
HIPAA compliance, you have to have
row-level encryption. And then you have to
reach for some tools like Vault, which is a
HashiCorp product that allows you to have encryption
as a service or other things like that. Because it's not just
about access control-- if someone were to access
the data, they shouldn't even be able to decrypt it. And so business case
is what you really need to think about in a
lot of these situations. JOSEPH SOKOL-MARGOLIS:
Yeah, I'm not sure I have much to add
on the single-tenant side. It is a great answer for
very strong data isolation. That said, if you
go multitenant, you're basically
pushing that burden to the application layer. In our multitenant world, we
do it via the database adaptor, which just enforces that, oh, this is a call for data?
this is a call for data? It had better have
an appropriate WHERE tenant equals x clause on it. And frequently to
handle the needs of really sensitive
customers, you end up sharding by customer. So you end up in a
bit of a hybrid where most things are multitenant,
but this one is special. AMY KRISHNAMOHAN: How about
the noisy-neighbor problem? How do you do the--
perform the isolation? JOSEPH SOKOL-MARGOLIS:
Well, single tenant makes that pretty easy. In Kubernetes, you still have
to think about it a little bit. So our pods are scheduled
onto the same nodes. So we have to make sure
that our pod resource allocations are correct. So each of our pods is allocated
so much CPU and so much RAM, and then Kubernetes
will just enforce that. And that's how we approach
the noisy-neighbor problem.
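Those allocations are ordinary Kubernetes resource requests and limits on each container; a minimal fragment with made-up numbers:

```yaml
# Container spec fragment: Kubernetes schedules on the requests and
# enforces the limits, which is what caps a noisy tenant.
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi
```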
JOE HUGHES: Yeah, I kind of gave it away earlier, but we actually had a-- we were doing what Seph
did, and a bug in our code essentially was causing
cascading failures where we were bringing
down an entire node, and there were multiple
customers on that node. And so our tech-support team
had a lot of angry phone calls all at the same time. So I ended up pulling over
to the side of the road and doing a lot of
work on my laptop on the side of the highway. So after that we were
like, OK, in the future when we have more resources
and our company is more mature and we can really get a lot
of great people like Seph and other people like
that, we can attack this. But for now it's easier,
again, to push that complexity to Google, say
we're going to have node pools for every
single customer, and we're going to set up the
Kubernetes rules to make sure that no two customers can
end up on the same node so if something does happen
and we roll out a bug like that again, it's going to
be completely isolated. That is not a long-term
solution, probably, but for now it's something
that we can live with, and it's helped us continue
to build our business.
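The "Kubernetes rules" Joe mentions can be expressed as a taint on each customer's node pool plus a matching nodeSelector and toleration on that customer's pods. A sketch with hypothetical names, assuming the nodes are tainted `customer=acme:NoSchedule`:

```yaml
# Pod spec fragment for one customer's workloads.
spec:
  nodeSelector:
    customer: acme            # only schedule onto acme's dedicated nodes
  tolerations:
  - key: "customer"
    operator: "Equal"
    value: "acme"
    effect: "NoSchedule"      # other customers' pods lack this toleration,
                              # so they can never land on acme's nodes
```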
JOSEPH SOKOL-MARGOLIS: I've actually sort of got a-- we run into a certain
problem around this which maybe is funny if people
want to hear about Kubernetes problems. AMY KRISHNAMOHAN: No. JOSEPH SOKOL-MARGOLIS:
No, it's great. What we've seen is Kubernetes
itself can be a little weird. It's not quite a noisy neighbor,
but it's almost similar. The underlying Kubernetes
resource requirements per node are not small. So they're actually substantial. Each node has a
given load, so you have to be cognizant of that. And the other thing we've run
into-- which, I don't know, I giggle about it, but it's
been a bit of a pain point-- is some operations we do, if
we go to, like, upgrade every one, right, we're basically
running a redeploy on every one of our tenants. And what we found is that
the Kubernetes master can get overwhelmed with
that very easily, and then everything breaks. So it's very bad when your
Kubernetes master breaks. So to sort of approach
that scaling and kind of noisy neighbor, we basically
just rate limit what we do. So we found that we have to
be very deliberate in how we roll out upgrades because
otherwise the Kubernetes master just tries to
do everything at once, and then it melts. AMY KRISHNAMOHAN: So how many
pods do you put per node? JOSEPH SOKOL-MARGOLIS: This is
another problem we've run into. Running in Google,
Kubernetes has a limit of 110 pods per
node, which most people are OK with because you
normally run about 30, and this is kind
of a pod resource to node capacity question. A lot of our services
are very small, and we have a large number of
tenants, so we pack them a bit. But what I've found is
when we exceed about 50, things start going wonky. It's just there's a lot of
auto scaling in the Kubernetes system, and a lot of the scaling
around Kubernetes metrics collection and logs collection
is based on the idea that you will only have
about 30 pods per node. So as you start pushing
that higher and higher, you run into these points
where things you didn't expect could break would break. AMY KRISHNAMOHAN: I see. So you recommend about 30? JOSEPH SOKOL-MARGOLIS: Yes. I would recommend not exceeding
30 to 50 pods per node. AMY KRISHNAMOHAN: So do you
have any scaling experiences? Like maybe on Black Friday, I'm
sure you guys are very busy. JOE HUGHES: Yeah, so as all of
you would probably recognize, our customers can have
something like a 20x peak where everyone wants to order
something on Black Friday, and they want it the next day. And so for us, that's
where Kubernetes really shines for us also is,
OK, we all of a sudden got 20 times the orders
that our system normally needs to ingest. And so being able to both scale
vertically and horizontally inside Kubernetes gives us
the flexibility to kind of not have to worry
about that as much. We obviously take a
lot of precautions and we try to be ready for
that, but as a technology, it's really enabled us to,
again, do more with less. And that's really what I
think a startup is all about: you need to keep your
most talented people focused on the product and not focused
on, oh no, we need more nodes. Click the button more. And that's where
Kubernetes comes in.
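For the horizontal half of that, the standard Kubernetes mechanism is a HorizontalPodAutoscaler; a generic sketch, not necessarily 6 River's actual configuration:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-ingest           # hypothetical workload
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-ingest
  minReplicas: 2
  maxReplicas: 40              # headroom for a ~20x Black Friday peak
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```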
JOSEPH SOKOL-MARGOLIS: Maybe sort of thinking a bit about
that-- though this might be a future question. I don't know. Sorry, Amy. AMY KRISHNAMOHAN: It's OK. JOSEPH SOKOL-MARGOLIS:
One of the things that helps us a lot
in handling sort of momentary bursts in traffic
is we have a predominantly asynchronous pipeline. We're trying to reason
about things on a minute to couple-of-minute timeline. So if a big burst
comes in, it's kind of just going to hang out in a
queue until we can process it. So as long as our
queues have capacity, it's OK if our processing
is a little bit lagging. And I can't speak for Joe, but
I bet there's something similar. JOE HUGHES: We're getting there. You think about logistics
and these companies, they're often very old. And so we're integrating
with systems that expect transactions, for example. So we don't care that we gave
you 2 million things to do. We're just going to sit here
and wait for you to tell us if you did 2 million things. So a lot of our
infrastructure needs to just really be able to absorb that. We do some batching
and everything else, but it's a problem that
is difficult to solve. And the infrastructure can't
always solve that problem, but good design where you build
queues and everything else can help. AMY KRISHNAMOHAN: Now let's
move on to customizability. So you guys have
single-tenant environments, so how similar are
the tenants' characteristics? And if one tenant needs
very different characteristics, how do you customize
those things? JOE HUGHES: Yeah, that's
a really good question. For us, it's kind of around
the size of our system. So customer A has 8 Chucks
and customer B has 50. We put those numbers in,
and out the other side pops, oh, well they should get
this size of database, or they should get
this size of node pool or this amount of resources
because we know exactly how a Chuck's going to behave. It's a little bit
more difficult if you have users or other things
that you can't really predict. But luckily, we kind of control
both sides of the equation there. So it can be really
prescriptive for us. JOSEPH SOKOL-MARGOLIS:
We probably have about maybe five
variables per customer. I don't remember offhand,
but it's about five. And we implement that by
environment variables. So our application
says, hey, what's the value of this
environmental variable, and it behaves a
little differently. And then we just control those
in our Kubernetes operator pattern. Our custom resource
definition has some settings. The operator pushes those
in as deployment variables, and then it all just kind
of follows from there. AMY KRISHNAMOHAN: That's great. So now let's move on to
cross-tenant environment. So when you need
to do analytics in the cross-tenant environment,
how do you do that? Are you using any data
warehousing environment? JOSEPH SOKOL-MARGOLIS:
We don't have that need, so we haven't done it. AMY KRISHNAMOHAN: None. JOE HUGHES: We do,
as you would imagine, really want to know about
all of our customers, and so it is an
interesting problem for us. And we rely on BigQuery a lot. And so all of our robots
are sending information through Cloud IoT. That gets pumped into BigQuery. And then we also have
some data pipelines that go and go to all of our
SQL instances and scrape data, shove it into BigQuery. So we have one place where
we can look at all of that. And we're lucky enough to be
involved with kind of an alpha where Cloud SQL is able to be
queried by BigQuery directly. And so we'll be happy
to kind of get rid of that scraping situation. But it's really important. That is something
that is a good thing to reason about is if you're
going to have that need. Seph doesn't, but
we definitely do because we need to
find efficiencies across all of our
tenants, and so that's really important for us. AMY KRISHNAMOHAN: I see. And how do you forecast your
demand, like capacity planning? I know you talked
about the 30 instances, that you keep it in
your back pocket. JOSEPH SOKOL-MARGOLIS: I
mean, our capacity planning is relatively simple. We sort of look at what
our marketing is like and are we going to a
conference, do we expect sign ups, that kind of thing. And then we just kind of say,
eh, about this, and we do it. And overestimate
it a little bit. Don't think too hard. We're still a very
small company. We don't have this. I guess the other sort of
implicit piece of that is we don't have to buy hardware. We're just saying, hey, Google,
instead of 30, give me 32. It's no different to me. It's a minute either way. JOE HUGHES: Yeah,
I think we also have a pretty nice luxury
where we kind of know. Our customers even
tell us how many orders they expect on
their Black Friday. And so we prepare for that by
doing a lot of load testing, and that really
helps us understand, uh oh, this needs to
auto scale this way, or we need to prepare
our environment this way. So a little easier
for us because we're dealing with large
enterprise businesses. AMY KRISHNAMOHAN: Do you
guys do any stress testing in your environment? JOE HUGHES: Yeah, actually
one of the cool things that we do in Kubernetes-- as
you probably can imagine, if we need to test
a hundred robots, we can't really just go with a
hundred robots to a warehouse and deploy them. That's kind of a lot of money. So we actually
run our simulator. So we run the same
code that runs in the robots in Kubernetes. We scale it up to
silly things that wouldn't be practical
in the real world, and we are then
testing our system the same way that it
will be actually stress tested in the real world. So it's another place
where Kubernetes is super useful for
us and another reason why it's a really great
platform in general. JOSEPH SOKOL-MARGOLIS: Yeah,
we do something pretty similar actually. I mean, no robots for us, but
we run an endpoint agent, and I can just build the
endpoint into containers and spin a hundred,
a thousand of them up in a Kubernetes cluster and
just point them somewhere. So we basically
do the same thing. And it's a really easy
way to just sort of have a really elastic-- oh, you know, cool. Let's point
hundreds of these at it then. Well, do we have
enough capacity there? That gives us a good
understanding of the number of endpoints that
correspond to what size pods and how many pods. What does our horizontal
scaling look like? AMY KRISHNAMOHAN: So
now let's actually move back to a database question. So databases are stateful. So how do you use
them together with Kubernetes? Are you putting the database
in a Kubernetes pod? JOSEPH SOKOL-MARGOLIS: So
we don't run our databases inside Kubernetes. We use Cloud SQL. It's great. It just runs. We don't think
very much about it. The way Kubernetes
talks to Cloud SQL, it's got a couple
of intermediaries. So we use the Cloud
SQL Proxy which is designed to
proxy between things like Kubernetes and Cloud SQL. Google distributes it. You just kind of tell it to go. And we run Postgres,
which means we also need some additional
connection pooling. So each of our
customer namespaces has Cloud SQL Proxy instances
and PgBouncer instances as an additional
connection-pooling layer there.
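A sketch of that plumbing: the application talks to a local Cloud SQL Proxy sidecar, which tunnels to the customer's Cloud SQL instance (PgBouncer, omitted here for brevity, would sit in between as the pooling layer). The namespace, app image, and instance name are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: customer-acme             # hypothetical customer namespace
spec:
  containers:
  - name: app
    image: gcr.io/example/app:1.0      # hypothetical app image
    env:
    - name: DATABASE_URL               # the app connects to the local proxy
      value: postgres://app@127.0.0.1:5432/app
  - name: cloud-sql-proxy
    image: gcr.io/cloudsql-docker/gce-proxy:1.16
    command: ["/cloud_sql_proxy",
              "-instances=my-project:us-east1:customer-acme=tcp:5432"]
```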
JOE HUGHES: We actually started running databases inside of containers, again
being more cautious. Postgres on Cloud SQL was in
beta, and we were like, well, we'll just wait. And so we waited. In that time, we
had decent success. Kubernetes is a pretty
stable platform. Using persistent
volume claims, which allows you to have a disk
that follows your pod around, we had minimal problems. But certainly the
effort was not zero. I spent a lot of
time tuning that. When we had the need
for a read replica, that was a couple weeks out of
my time to make that a thing. And since moving away
from that model-- and then there were
also the problems. And so the problems are like,
uh oh, an alert just went off. I'm running out of disk space. Got to go in there and
get more disk space. That's not a problem
on Cloud SQL. That just auto-scales
right up for you. And so that was
actually the main thing that I was running away from. When that finally came out of
beta, I was like OK, great. No more disk-space problems,
better alerting, and looking forward to the day where
auto scaling on Cloud SQL takes that out of the
equation for me as well. AMY KRISHNAMOHAN:
So whenever you have to create an
instance in Cloud SQL, are you using Kubernetes
or Terraform to automate that whole process? JOE HUGHES: Yeah, so our whole
thing is driven by Terraform. I've been using Terraform
for quite a few years. It's a great product. And what it allows
me to do is create what's called a
Terraform module, which is much like what Seph was
alluding to for an operator, but that module represents
what is a customer, and that customer is a
Cloud SQL instance, a DNS entry, a namespace, and
all this other stuff. And then there's this
tool called Atlantis which allows you to
manage all of these things through GitHub pull requests. And so essentially an
upgrade in our world is a GitHub pull request,
and then Atlantis runs Terraform and makes
all those things happen. So it's really elegant
for us, but there's always room for improvement
in that sort of thing.
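A skeletal sketch of the customer-as-a-module idea; the resource names and variables are hypothetical, not 6 River's actual module:

```hcl
# modules/customer/main.tf -- everything one customer needs, in one module.
variable "customer" {}
variable "region" {}
variable "db_tier" {}
variable "dns_zone" {}
variable "ingress_ip" {}

resource "google_sql_database_instance" "primary" {
  name             = "${var.customer}-primary"
  database_version = "POSTGRES_11"
  region           = var.region
  settings {
    tier = var.db_tier
  }
}

resource "google_dns_record_set" "app" {
  name         = "${var.customer}.example.com."   # hypothetical domain
  managed_zone = var.dns_zone
  type         = "A"
  ttl          = 300
  rrdatas      = [var.ingress_ip]
}

resource "kubernetes_namespace" "customer" {
  metadata {
    name = var.customer
  }
}
```

An upgrade then becomes a pull request that changes a module input, and Atlantis plans and applies it.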
JOSEPH SOKOL-MARGOLIS: Yeah, for us it's all driven by
our operator, which is a daemon that we wrote. It runs in Kubernetes, but
it isn't part of Kubernetes. It could run anywhere we put it. We just run everything in
Kubernetes, so that too. And it goes, it calls
the API for Cloud SQL, creates it, waits
for it to show up, makes sure it does
what it wants, and then it creates
everything else. AMY KRISHNAMOHAN: All right. So I think we are almost
at the time for Q&A, but I want to ask
the last question. How many DBAs do you
have in your team? JOSEPH SOKOL-MARGOLIS:
Just me, really. JOE HUGHES: I would
say zero only because I don't consider myself a DBA. AMY KRISHNAMOHAN: That's great. JOE HUGHES: I am the guy
that's on call, I guess. AMY KRISHNAMOHAN: OK. Awesome. Thank you. [MUSIC PLAYING]