[LOGO MUSIC PLAYING] JAMES MALONE: Good
afternoon, everyone. Welcome. I'm James Malone. I'm a product manager
with Google Cloud. And I'm joined today-- FENG LU: Hi. Good afternoon. I'm Feng Lu. I'm a SWE with Google. And I've been working on
Composer since its inception. JAMES MALONE: And
today we're going to cover Google Cloud Composer. First, before I begin, Cloud
Composer recently went GA. And there was a lot of work
from Googlers to make it happen. So to all of the
Googlers, thank you. But above and beyond that, Composer is a labor of love of the open source community. And we wanted to take a
second before we begin today to say thank you to
everyone who has participated in the development of
Apache Airflow, who's given input, who has
written code, who's used it. Truly, thank you, and
we're excited to be a part of the Airflow
community now with Composer. The first thing that
we want to do today is give an overview of Composer. So Composer's a new service. It's built on Apache Airflow and
we'll also talk about Airflow. So we want to just level
set and talk about Composer, why we did it, what it does. After that, we're
going to go in depth and talk about Apache Airflow. We're going to look at a DAG. We're going to run through
a demo of Cloud Composer. And then, next steps
on how you can get involved with both Airflow
and also Cloud Composer. So before Composer,
there were a few ways to create and schedule
and manage workflows on Google Cloud Platform. And to be totally honest,
they were not the best at all. So on one end, you
had very low cost, very easy, but pretty inflexible
and not so powerful solutions. Mainly, just putting
something in a crontab, scheduling it, and
letting it run. On the more complicated
end, you had customers that were developing
really complicated frameworks to schedule, orchestrate,
and manage things on Google Cloud Platform. Really powerful, but also took
a team of engineers to do. In our opinion, none of
this is an ideal solution. Because people are focusing on
things that are not really what they're trying to do. They're developing
orchestration engines. They're developing
description languages. They're not focusing on
what they set out to do, which was just
having a workflow run and monitoring that
workflow from time to time. So we thought it should be easy
to create, manage, schedule, and monitor workflows across
Google Cloud Platform. I call out all of those
steps individually, because they're all really
important parts when you think about the
lifecycle of a workflow. It's not about just scheduling
something and letting it run. It's not about just
monitoring something. I mean, we really wanted to
think about the whole process holistically together
and find something that would allow somebody
to create this workflow, to schedule it to run, to
look at how it's running, and then manage that workflow
based on what's happening. So we came up with
Cloud Composer. Cloud Composer, if you
didn't catch last week, because we didn't really
make a lot of noise because Next was this
week, Cloud Composer just recently went GA. So it's generally available
in a couple of regions inside of Cloud Platform. And it's based on
Apache Airflow. So the best summary
of Cloud Composer is it's managed Apache Airflow. So there's a few
things that we really wanted to tackle with Composer. First, we wanted an end
to end GCP-wide solution to orchestration and workflows. Second, it was really
important to us that Composer works both inside
of Cloud Platform and outside. To be totally blunt, if
we developed something that just worked
inside of Google Cloud, it really would be
missing the mark. Because hybrid cloud,
multi-cloud, and just not locking people in are
all a fact of life. So if we developed something
that was proprietary, we thought, from the
outset, we would be failing. We wanted it to be really easy. We don't want a really
complicated workflow system. Because then people
are just wrangling with the infrastructure,
the description language, and that just sucks. It's not a great use of time. It was also really
important to us that it's open source,
for a few reasons. Again, we don't want
you to be locked in. We want people to be able
to look under the hood, see what's happening. We also wanted people to be
able to contribute and to be part of a larger community. A common question we've
gotten since we launched Composer into alpha and then beta and GA is: what's the difference between Composer as a managed Apache Airflow service and just running Airflow on my own on a VM or a set of VMs? So there's a few things
that we tried to tackle. First, we wanted a seamless
and integrated experience with Cloud Platform. That means that Composer
is available inside of our command line tooling. There's a Google API. It's not just a
standalone thing that feels like a separate product. Second, we wanted security. With Cloud Composer,
you have IAM. You have audit logging. It's not just a
separate product. It acts and feels, from a
security and auditability perspective, the same as
any other Cloud product. We also wanted it to be easy
to use when things weren't going quite the way you expect. So when you're developing with
workflows or running workflows, some pieces of that workflow may
not always run exactly the way that you want. So Stackdriver integration
was very important to us. And it's a first class
citizen with Composer. We also wanted to make it really
easy to manage your Airflow deployment. So that doesn't just mean
creating and deleting your Airflow deployment. It means doing things like
setting environment variables, or installing and
maintaining Python packages. Things that you could
conceivably do on your own, but aren't a value add when
you're doing it on your own. Because we've built
on Apache Airflow, there's a core set of support
for several products inside of Google Cloud Platform. So for example, there's
Dataflow operators. There's BigQuery operators. There's Cloud Storage operators. And the support for Google
Cloud Platform products is expanding with
each Airflow release. And that's a core part of what
the Airflow team is doing, either inside of the
Airflow team themselves, or with other teams
inside of Google. It's expanding the breadth
and depth of support for Cloud Platform
products inside of Airflow. Really importantly, Airflow
supports a whole host of things outside of GCP. So it supports services. You can go talk to things
like Slack or JIRA. It supports technologies. You can go call REST APIs. You can go hit FTPs. All of the support
for non-Google things is absolutely usable
within Airflow and also Cloud Composer. And we didn't want
to break or limit what you can do outside
of GCP with Composer. So all of the cool
things that you can do with Airflow outside
of GCP are very usable. Since our GA happened late
last week, not a lot of people may have noticed. So we wanted to call out
a few things that just launched last week with our GA. We launched support for a couple
of new regions inside of GCP. There's expanded
Stackdriver logging, which you will see today in the demo. There is expanded
Cloud IAM roles. And we took a
bunch of fixes that will appear in future versions
of Airflow or new additions, and we backported them to
the Cloud Composer release. We try not, really, to modify
the Cloud Composer Airflow version too much from the
mainline version of Airflow itself, but once
in a while, there are fixes, tweaks,
additions that we backport. Our general philosophy
is unless there's a JIRA associated with it, we
won't inject it into Composer. Because again, we don't want
to make Composer a black box. It's just not a
value add, and it's an alternative form of lock in. Since not a lot of people
may be familiar with Airflow or may have just
heard of Airflow, we wanted to quickly
cover Airflow, some of the core concepts,
just to establish a baseline. We're really excited
about Airflow. We love Airflow. And we want other people to
love Airflow and get excited about Airflow. If you are totally
unfamiliar with Airflow, it's an open source project. It's incubating in the
Apache Software Foundation. It's been around
for a few years. I think it's fair to say
it's become one of, if not the leading open source
package to create and schedule workflows. Airflow is really interesting,
because all of your workflows are code. They're Python code. So they're highly approachable. You can do a lot of
different things. And you'll see part of
that in the demo today. You can do things like
programmatically generate workflows, which is really cool. One of the questions we
faced, especially initially when we made a bet on
Airflow about a year ago was, why Airflow? What we really wanted
to do with Composer is tie the strengths of Airflow
as an open source package and open source community with
the strengths of Google Cloud Platform. So Google Cloud
Platform, we were very good at running
an infrastructure, creating services, adding
layers of security, auditability, maintaining costs. But as I mentioned,
we didn't have a workflow and
orchestration solution. Airflow did have a really
strong workflow orchestration solution. It had a whole
bunch of connectors for services inside
of and outside of Google Cloud Platform. It already had a description
language and a defined set of APIs. So we wanted to join
the two as a union inside of Cloud Composer. As we've developed
in Cloud Composer, we've also started contributing
back to the Airflow community. The KubernetesExecutor
is a good example-- and I'll talk about it a
little bit later in the deck-- of something that is really
interesting to us, something that we're very interested in. There's a few key
concepts in Airflow you may want to be
aware of if you've never used Airflow in depth before. First, all your
workflows are graphs. So workflows are
a series of tasks and the tasks can
fit into a graph. And that can be a
very simple graph. It can be just one node,
which is a bash script that just says, tell the time. It can be a really
complicated graph, which we'll show you
an example of our build system for Airflow,
which is a good example of a complicated graph. And a graph has a
series of tasks. Tasks are essentially steps,
something that happens. Maybe you're
running a SQL query, running a BigQuery query. Airflow, in those tasks,
has operators and sensors. An operator is essentially something that tells something to do something. So a good example would be a BigQuery query operator, which tells BigQuery to go run a query. There's also the concept of a sensor in Airflow, which is essentially a binary wait until something happens. When it's true, it proceeds.
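As a rough sketch of that distinction (the bucket, object path, and table here are made-up placeholders, and the GCS object sensor shown is the contrib one that ships with Airflow 1.x):

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from airflow.contrib.sensors.gcs_sensor import GoogleCloudStorageObjectSensor

with DAG('operator_vs_sensor_sketch',
         start_date=datetime(2018, 1, 1),
         schedule_interval='@daily') as dag:

    # Sensor: repeatedly checks a condition and waits until it becomes true.
    wait_for_file = GoogleCloudStorageObjectSensor(
        task_id='wait_for_input_file',
        bucket='my-input-bucket',        # illustrative bucket
        object='incoming/data.csv')      # illustrative object path

    # Operator: tells another system (here, BigQuery) to go do something.
    run_query = BigQueryOperator(
        task_id='run_daily_query',
        bql='SELECT COUNT(*) FROM `my_dataset.my_table`',  # illustrative table
        use_legacy_sql=False)

    # Only run the query once the file has arrived.
    wait_for_file >> run_query
```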
Airflow itself also has a lot of really interesting, deep functionality that we evangelize people to use inside of Composer. It's also another
reason why we thought that Airflow is a great bet. You can do things like define connections and have your workflows use certain connections. You can set SLAs on your workflows and see what's meeting SLA and what's not. You can do things like pass information between tasks. There's a lot of really interesting and advanced functionality inside of Airflow.
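For instance, a minimal sketch of the SLA and task-to-task hand-off pieces (the task names, the returned value, and the one-hour SLA are all illustrative):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator


def compute_row_count():
    # The returned value is automatically pushed to XCom as 'return_value'.
    return 42


with DAG('xcom_and_sla_sketch',
         start_date=datetime(2018, 1, 1),
         schedule_interval='@daily') as dag:

    push_count = PythonOperator(
        task_id='push_count',
        python_callable=compute_row_count)

    # Pulls the upstream value via XCom; must finish within one hour of the
    # scheduled time, or Airflow records an SLA miss.
    report_count = BashOperator(
        task_id='report_count',
        bash_command='echo "row count is '
                     '{{ ti.xcom_pull(task_ids=\'push_count\') }}"',
        sla=timedelta(hours=1))

    push_count >> report_count
```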
Generally, we get a lot of questions of whether Airflow can do x, y, or z. Often the answer is yes. Airflow actually
can do x, y, and z. And with that, I'm
going to turn it over for a look at Airflow in depth. FENG LU: Thank you, James. So with that being said,
we do want to-- next, we're going to review some of the product details, the way we constructed the product, and some of the design decisions that we have made, so you know where Composer's capabilities are. So in Cloud Composer, we introduced this new
concept called an environment. It's really very similar to a Kubernetes cluster or a Cloud Dataproc cluster. Essentially, it means a
collection of managed GCP resources that gives you
the functionality needed to run Apache Airflow. Inside a single GCP project, you could create multiple Composer environments. And all the environments are integrated with Google Cloud Storage, [INAUDIBLE] Stackdriver logs, as well as Cloud [INAUDIBLE]. So the way to interact with the
product is the following: you could use the Cloud SDK. You could use [? Pantheon. ?] You could use the REST API. Functionality-wise, all three methods are equivalent. However, I do want to point out one difference in the Cloud SDK that makes it convenient for Composer users, so that you don't have to manage two sets of command line tools. There are the Composer command line tools, but at the same time, there are also the Airflow command line tools. So what we did in the product is we sort of tunnel Airflow commands through the Composer gcloud commands. So that in a single place, you
can manage both your Composer environment at the same
time you can interact with the Airflow environment. Earlier I mentioned that a
Composer environment is really a collection of GCP resources. So here, I'm going to give
you like a zoom one level in, and then explain
how and why we decided to use the following GCP
resources to construct the Composer service. At a very high level, you
notice there are two projects. One is called a
customer project. The other one is called
a tenant project. The customer project is probably the one you're familiar with: you interact with GCP by creating a GCP project. That's the customer project. The tenant project is a new concept. Really, it's the same as an ordinary GCP project. It's just the case that
this tenant project is managed and owned by Google. As we walk through the
detailed architecture, we explain why we decided
to make this design decision that a portion of the resources actually lives inside the tenant project. Airflow itself, if you look at the way it's been constructed, has a microservice-like flavor. You have an Airflow web server. You have an Airflow scheduler, an Airflow metadata database. Those components just
naturally map to the wide range of GCP services we offer. So for example, I'll start with the Airflow scheduler and the Airflow worker. If you look inside the Kubernetes cluster, we decided to host both the worker and the scheduler inside a Kubernetes cluster. The reason we do
that is so it allows you to conveniently package
your workflow application dependencies. Because essentially,
all your tasks can run inside containers. The worker and the scheduler communicate [INAUDIBLE] through the [INAUDIBLE] executor setup. And then moving next, we
have this Cloud SQL proxy. And then, we decided to host the Airflow metadata database inside the tenant project, so that only the service account you use to create a Composer environment has access to the metadata database. It's really for
enhancing the security, as we believe that the Cloud SQL, or the Airflow database, houses all the valuable metadata information regarding your workflows. Think about if you have
connection credentials stored in a database. You obviously don't want
anyone in your project to be able to access that credential information. Walking down the right side, we have the Airflow
web server, which interacts with the database to surface all the workflow information. And now, we decided to host the web server inside GAE so that it's publicly accessible. You don't need to have
the clumsy proxy set up to be able to access
the web server. But we do realize
at the same time, you don't want to
make your web server open to anyone on the internet. So as a result, we collaborated
with another service in Google Cloud-- it's called
identity-aware proxy-- so that only authorized
users would be able to access the web server. Later on in the session, we'll give you a sense and a feel of how that works. We also make it extremely easy
for you to configure access to web server. It's exactly the same way as how
you would configure IAM policy. Moving left, we have GCS. We use GCS as a
convenient place for you to stage your DAGs. Deploying a new
workflow to Composer is as simple as dropping
a file into a GCS bucket. We understand that
there's a need, sometimes, to stage your workflow artifacts. And we also make it a managed service for you so that your artifacts are nicely
staged in the GCS bucket, which later allows you to sort of
retrieve those artifacts back out. Finally, everything's
being-- all the interactions, all the logs are being
streamed to Stackdriver logs. Airflow has a very-- the UI itself comes
together with task logs. But you can't really
find out what's happening if, for example,
there's an Airflow worker crash, or the Airflow scheduler itself runs into an exception. At that point, you sort
of lose the visibility on what's happening. That's why we decided that
it makes a lot of sense to offer this additional
Stackdriver logging capability. With the architecture--
and now I'm going to explain a little
bit about workflows. Because not everyone is sort
of familiar with workflows, or with the way Airflow expresses workflows. So at a very high
level, a workflow consists of a
collection of tasks and their interdependency
relationships. So in this case, for example,
you have some files in HDFS. And then, whenever you have a new [INAUDIBLE] addition to HDFS, your
workflow-- you may want to kick off your workflow
such that you will copy the file from HDFS to GCS. Once it's in GCS, or
Google Cloud Storage, maybe you want to trigger
a BigQuery operator that does something, load the data
in, and then subsequently maybe run some query, and make
the result available, maybe via a [INAUDIBLE]
notification. So a few things you probably
have noticed in this example workflow description,
there are tasks associated with workflows. For example, you want
to run a BigQuery job. There are also dependencies. You want to wait until the
data is available in GCS and then you start
your BigQuery job. There's also a component
of triggering something. You know, I have a file
that all of a sudden now appears in my on-prem HDFS. Then that triggers my workflow. So those are some of the
elements or building blocks in Airflow workflows. I'm going to start with
actually give everyone an example how you can build
a very simple workflow that consists of two or three tasks. So really, Airflow, unlike other sorts of orchestration and workflow solutions, expresses workflows as code. So instead of having this
giant configuration file expressing your workflows,
you write your workflow as Python code. Roughly speaking, there
are about five steps for you to define a
workflow in Airflow. The first thing
is the imports. From the Airflow package, you import the DAG, which is the acronym for directed acyclic graph, which is really another way of saying workflow. And then you try to
import all the operators that you're going to
use in a workflow, in this case, a
BigQuery operator. And then also
trigger rules, which allows you to specify
[? intertask ?] relationship. Once you have all
import statements ready, the next step is for you
to define all the arguments to your task or
to your workflow. You probably noticed that
there's a default DAG args. This is really a
very convenient thing that's provided in Airflow. If you have common arguments for a number of your tasks, instead of specifying those arguments on each and every single Airflow task, you can specify them at the beginning of your workflow, and they're passed in automatically by the Airflow DAG model. Imagine if you were going to do this with a configuration file-- that's probably harder, because you would probably have to copy and paste a lot of duplicated
lines of code. Once you have all the
workflow data specified, now you're going to
define your workflow. There you give it a name, you give it a schedule interval, and you pass in the default args you defined. Now within the DAG, you
start to define your tasks. In this case, we have two tasks. You have BigQuery operator task. You also have a BigQuery
to GCS operator. As James mentioned
earlier, Airflow has a wide range of
supported GCP operators, both the BigQuery operator as well as the BigQuery-to-Cloud Storage operator. They are all
available in Airflow. Once you specify all the
tasks and define all the task arguments, the next
step is actually for you to chain them up
by giving some dependency. In this specific
example, the line simply says, hey, I want to run bq_airflow_commits_query first. After that, I run the export-to-GCS task, as simple as that.
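Putting those five steps together, the workflow described here would look roughly like the sketch below (the dataset, table, and bucket names are placeholders, and the query is only an approximation of the one in the demo):

```python
import datetime

from airflow import models
from airflow.contrib.operators import bigquery_operator
from airflow.contrib.operators import bigquery_to_gcs
from airflow.utils.trigger_rule import TriggerRule

# Step 2: arguments shared by every task in the workflow.
default_dag_args = {
    'start_date': datetime.datetime(2018, 7, 1),
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=5),
}

# Step 3: the DAG itself, with a name and a schedule interval.
with models.DAG(
        'bq_demo',
        schedule_interval=datetime.timedelta(days=1),
        default_args=default_dag_args) as dag:

    # Step 4a: ask BigQuery to run a query and save the result to a table.
    bq_airflow_commits_query = bigquery_operator.BigQueryOperator(
        task_id='bq_airflow_commits_query',
        bql="""
            SELECT commit
            FROM `bigquery-public-data.github_repos.commits`
            WHERE 'apache/airflow' IN UNNEST(repo_name)
        """,
        use_legacy_sql=False,
        destination_dataset_table='my_dataset.airflow_commits')  # placeholder

    # Step 4b: export that table to a Cloud Storage bucket.
    export_commits_to_gcs = bigquery_to_gcs.BigQueryToCloudStorageOperator(
        task_id='export_airflow_commits_to_gcs',
        source_project_dataset_table='my_dataset.airflow_commits',
        destination_cloud_storage_uris=['gs://my-bucket/airflow_commits.csv'],
        export_format='CSV',
        trigger_rule=TriggerRule.ALL_SUCCESS)  # the default rule, shown explicitly

    # Step 5: chain the tasks -- run the query first, then the export.
    bq_airflow_commits_query >> export_commits_to_gcs
```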
Now, the reason why we decided to go with this route in Airflow and Composer is really that there are a lot of nice things you can do once a DAG or workflow can be specified as a program. It gives you version control. It allows you to dynamically generate DAGs. It also gives you support for templates, so that you have a [INAUDIBLE] of your workflow and are then able to dynamically instantiate your workflows.
Like I said, the first nice thing with DAGs as code is that Airflow DAGs support Jinja templating. So you can specify a templated command. And on every single run, you are able to reconfigure that task.
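A tiny sketch of that, using Airflow's built-in {{ ds }} template variable (the DAG and task names are illustrative):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG('templating_sketch',
         start_date=datetime(2018, 1, 1),
         schedule_interval='@daily') as dag:

    # bash_command is a templated field: {{ ds }} is re-rendered on every
    # run, so the same task definition is reconfigured for each execution.
    print_run_date = BashOperator(
        task_id='print_run_date',
        bash_command='echo "processing data for {{ ds }}"')
```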
Likewise, think about needing to generate 1,000 tasks, or 10,000 tasks, of the same type. It's probably fairly tedious to do that in a configuration language. So with DAGs as code, a simple two lines of code will give you 1,000 tasks.
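For example, a loop along these lines (with a throwaway echo command standing in for real work) stamps out a thousand tasks of the same type:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG('generated_tasks_sketch',
         start_date=datetime(2018, 1, 1),
         schedule_interval='@daily') as dag:

    # Two lines that generate 1,000 near-identical tasks.
    for i in range(1000):
        BashOperator(task_id='process_shard_{}'.format(i),
                     bash_command='echo processing shard {}'.format(i))
```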
The third thing is that Airflow naturally matches the programming model, or the software development process, where you have modules and submodules. So here in Airflow, you have workflows and sub-workflows. They're called DAGs and subDAGs.
So in the example on the left side, you could have many tasks; or, if you realize that you want to create reusable DAGs, what you can do is just package those tasks into a subDAG and include that subDAG in the [INAUDIBLE] DAG.
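A hedged sketch of that packaging idea with Airflow 1.x's SubDagOperator (the parent and child names and the table list are made up; note that a subDAG's ID has to be '<parent_dag_id>.<task_id>'):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.subdag_operator import SubDagOperator

default_args = {'start_date': datetime(2018, 1, 1)}


def load_tables_subdag(parent_dag_id, child_task_id):
    """A reusable sub-workflow: one load task per table."""
    subdag = DAG(dag_id='{}.{}'.format(parent_dag_id, child_task_id),
                 default_args=default_args,
                 schedule_interval='@daily')
    for table in ['users', 'orders', 'events']:   # illustrative tables
        BashOperator(task_id='load_{}'.format(table),
                     bash_command='echo loading {}'.format(table),
                     dag=subdag)
    return subdag


with DAG('parent_workflow',
         default_args=default_args,
         schedule_interval='@daily') as parent_dag:

    # The whole sub-workflow shows up as a single task in the parent DAG.
    load_all_tables = SubDagOperator(
        task_id='load_tables',
        subdag=load_tables_subdag('parent_workflow', 'load_tables'))
```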
So with all that, I'm going to give a Composer demo. In that demo, we're going to show you how to interact with the product. And also, for example, we'll cover what that looks like: How do you query a workflow status? How would you trigger workflow execution from an external system? All right. Now we're going to
switch to the demo. JAMES MALONE: I'll turn
on the screen mirroring. And then switch
to the demo here. This is also known as the
game of how quickly do we know Chrome OS, all right. Screen mirroring is on. So hopefully, we can
switch to the demo now. And wait. There we go. Chrome to the rescue. FENG LU: As you can probably
tell, I'm not a Chrome user. [LAUGHTER] All right. JAMES MALONE: Sorry about that. OK. There we go. Cool. FENG LU: So we have-- let's go back. We have a very simple
and nice Cloud Console interface for you to interact
with the environment. So to create a
Composer environment, here's what you need to do: specify a name-- Google Next. Specify the number of nodes you want, like three nodes or 10 nodes, however many nodes you want. Which location you
want to deploy-- that's an [INAUDIBLE] one. You have the option to
define a machine type. If you do feel that
you're going to have some CPU-intensive
workflows, you can try to configure your Composer
environment with some more powerful machines. And likewise, you can also
specify network and subnet. This is for the case when you
need to have a shared VPC, or even reuse some network you have defined in your project. There's one thing
I want to call out in the configuration is that
we do allow you to provide your own service account. You don't have to rely on the
default Compute Engine service account. Instead, you can supply any service account you have. This allows you to
sort of restrict the possible set of services
that your Composer environment can possibly interact with. We do allow you to have Airflow
configuration overrides. For example, if you want
to increase your DAG load timeout value from 100
seconds to 200 seconds. This way you can specify some
of the Airflow configuration overrides. Once you have all the parameters
input, just hit create [INAUDIBLE]. Yeah. That's all you need to do to bring up a Composer environment. While waiting for the
environment to be created, right now it takes a while. Because as I mentioned, we
host the Airflow web server inside GAE, and it simply
takes GAE 10-plus minutes to get the application
deployed for you. So we have pre-created a
Composer demo environment. So we can take a look. There's the service account. There's the name. You've got a view of all
the details pertaining to your environment. Earlier I mentioned that we
use GCS to deploy workflows. So let's click into
the DAGs folder. We have one simple
DAG, the bq demo, which is the example I just walked everyone through. Deploying a new DAG is as simple as copying a file
into this GCS bucket. Meanwhile, if you want to
interact with the service and interact with the Airflow web UI to understand what other workflows are there-- you don't need to run through a clumsy proxy setup. A simple click gives you
the Airflow web UI. This is the demo DAG
I just mentioned. It's really, like,
two or three tasks. I just want to
mention to everyone this is not to
say that you can't create more complicated DAGs. James mentioned earlier
that within Google, we have this complicated DAG that
really helps us run CI/CD, so that we make sure that all changes submitted to Airflow upstream will not break GCP operators. The other part I want to
show to everyone is earlier, I mentioned that the web
server is backed by IAP and only authorized users can access the web server. So I'm going to open an incognito window. All right. Just give me one second. James, I need your magical
power to restore my demo page. JAMES MALONE: No worries. AUDIENCE: [INAUDIBLE] FENG LU: Cool. Thank you. JAMES MALONE: We
also have somebody who knows the Chrome
keyboard commands well. You'll need to copy
and paste that. You should be set. FENG LU: So what
I'm trying to do is I try to log in
with my personal Gmail. JAMES MALONE: And he uses
two factor authentication [INAUDIBLE] idea. FENG LU: Yeah. JAMES MALONE: Yeah. FENG LU: I'm sorry. I forgot to bring my phone. [LAUGHTER] But trust me, it works. I notice that like a few of you
are taking photos of this now. Please try it offline. I guarantee that you won't be able to access
this server, this website. Cool. So the other thing
I mentioned earlier, because we have this
IAP-protected web server, it gives you the guarantee
that only authorized users can access your DAGs. It opens up a lot
of possibilities. So here we're going to have
a demo where you actually try to trigger DAG
execution from a GCF, a Google Cloud Function. You're writing a Cloud Function-- really, it doesn't matter whether it's GCF or a different computer, as long as you have the necessary IAM credentials. Once you are added to the IAM policy of Composer, you are able to remotely trigger the execution of a DAG. So I'm going to
test this function. What it does is,
behind the scenes, this function will try to call a URL, as Airflow also hosts its web-- excuse me-- Airflow also hosts its API server in the same application as the web server. So behind the scenes, what this really does is send a RESTful request to the web server I just showed you.
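Stripped down, the function amounts to something like this sketch -- the web server URL and token are placeholders, and it assumes you have already minted an OpenID Connect token that IAP will accept for this environment; the endpoint is Airflow 1.x's experimental REST API:

```python
import requests

# Placeholders: the Composer-hosted Airflow web server URL and an OpenID
# Connect token authorized for the environment's IAP client ID (how you
# mint that token depends on where this code runs; see the IAP docs).
AIRFLOW_WEB_SERVER = 'https://example-tp.appspot.com'
IAP_ID_TOKEN = '...'
DAG_ID = 'bq_demo'

# Airflow 1.x exposes an experimental REST endpoint for creating DAG runs.
response = requests.post(
    '{}/api/experimental/dags/{}/dag_runs'.format(AIRFLOW_WEB_SERVER, DAG_ID),
    headers={'Authorization': 'Bearer {}'.format(IAP_ID_TOKEN)},
    json={})  # optionally include run configuration in the body
response.raise_for_status()
print(response.json())
```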
Let's do a page refresh. Now you see a task being triggered. And this is the manual run ID generated by Airflow. So this is what I mean
that, once you have the DAG, you could trigger
the DAG anywhere. It gives you a
lot of flexibility so that you don't necessarily
need to have access to gcloud, don't necessarily need to have access to a proxy, or even the Google Cloud Console. The fastest way to
[INAUDIBLE] this, while we're waiting for this-- I just realized this particular
DAG execution has already completed-- manual-- at this time. So the other thing I
want to show to everyone is, in this GA release, we
also support the Kubernetes pod operator. So in the past, it has been a pain for users to manage dependencies. As James mentioned, we do make it convenient so that you can install any Python packages. But sometimes if
your dependencies live outside of Python,
then what can you do, right? That's why in this
one, we realized that-- we make it a very
convenient way, backport the Kubernetes pod
operator in Airflow 1.10, which was just released
a couple of weeks back. But it also will make it
just work out of box for you. So you don't have
to worry about it-- what's the service
[INAUDIBLE], what's the credential, all that stuff. So as a result of
that, I'm going to show you another demo,
which will really just-- Yeah. Let's just wait one or two
minutes for the DAG to show up. This is what I mentioned, that
Airflow has this directory scan interval, which allows you
to specify how often you want to scan for new changes,
new files, or new workflows in your DAG folder. Again, there's
some default value. But through
Composer, we gave you the option to override that
default configuration value. While waiting for the
workflow demo to show up, we can look even more into
details about what this Airflow UI offers to you. You have different
views of your DAG. There's the graph view, the tree view. You can also see the
code at any time, just in case you need
to sort of switch back and forth between the graph
representation of your workflow and the code representation. You could also conveniently add
connections to your workflows. Keep in mind that Airflow does store all the connection credentials in the metadata database. And Airflow also conveniently offers you this web UI for you to mutate
your connections. Any time you can also see all
your configuration that comes together was this
particular installation. As I mentioned,
those configuration can be later on changed
through the manager service. And let's come back now. We do notice that, as we added
this Kubernetes pod example, it's pretty straightforward. Let's take a look at the code. So as I mentioned, you
have all the-- just to recap-- you have all the
import statements at the top. And then you have all your
definition, all your workflow data that you're going to
[INAUDIBLE] in your workflow. Now you define your DAG. After that, you
define your task. So in this case,
we have two tasks. The task in the
first one is really like a bash operator [INAUDIBLE]. The second one is what I mentioned-- you have a Kubernetes pod operator, which allows you to, in this case, pull a Perl image and compute the value of pi.
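As a sketch, that pod task is presumably along these lines (the DAG, task, and pod names are illustrative; the perl one-liner is the classic compute-pi example):

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

with DAG('k8s_pod_sketch',
         start_date=datetime(2018, 1, 1),
         schedule_interval='@daily') as dag:

    # Runs an arbitrary container image as a pod in the environment's
    # cluster, so the dependency (Perl here) never has to be installed
    # on the Airflow workers themselves.
    compute_pi = KubernetesPodOperator(
        task_id='compute-pi',
        name='compute-pi',
        namespace='default',
        image='perl',
        cmds=['perl'],
        arguments=['-Mbignum=bpi', '-wle', 'print bpi(2000)'])
```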
At any time, you could-- this is the Airflow web UI where you can inspect the log output. Sometimes there might be a delay
in how soon those logs appear. This is where the Stackdriver
logging is helpful. So let's try to look at the logs
that's viewed at the same time by the worker. As you notice, we have
this pod launcher. That's actually related
to launching the pod. So the nice thing we added
into the Stackdriver log is that we organized
the Stackdriver log, and then provide labels
for you to access the task. So I'm going to-- here's what I'm going to do. I'm going to filter out
all logs pertaining to this specific task. There you go. You see all the logs that
are being generated for you. While, at the same time, it
may take Airflow a while before it has the
logs populated for you. And now the task is done. We can take a look
at the output. As you'll notice, you'll
start a [? GKE ?] pod. And then just wait for that
[? GKE ?] pod to be done. At some point--
wait, let's see-- I'll try to scroll over. There we go. There's a job
pending, job running, and then it prints
the value of pi. And then finally, you'll
get a task successful. So with that, that
concludes the demo session. Feel free to try
the service offline. Like I said, I promise that
you can't access my web server. All right. Quick recap on the demo-- we showed you how to create
a Composer environment, the various ways to interact with the environment, how you monitor workflows, how you deploy workflows, and how you trigger a workflow from an external source. And then how to inspect your workflow status with Airflow logging, as well as Stackdriver logging. Back to you, James? JAMES MALONE: Excellent. And just to add on, the security
that we use for the Identity-Aware Proxy is the same
like the Google Cloud console. So we are paranoid
about security. So the two factor authentication
is a good example of that. It's a very core security. So you would have
seen a rejection, and that rejection is handled
very low down in the stack. So I want to talk about
next steps in terms of our involvement
with Airflow, in terms of where Composer
is going, in terms of how you can get started. There is a lot of Composer
questions that we get. And we just want to go
ahead and answer some of the most common questions. Please if you have
questions, I'll have details on
how you can bug us. Please bug us. We're here as a resource. We are not shy of questions. We love input. We're here to
collaborate with people. These just happened
to be the questions that we get 99% of the time. There's questions on can
people install their own Python packages, Airflow operators,
Python-specific things? The answer is yes. Inside of the Cloud storage
bucket that contains your DAGs, you can either add
custom Python modules-- that's the interesting thing
about everything being code-- there's also specific folders to
add plugins for Airflow itself. There's questions on whether
we touch the environment after it's been created. And the answer is no. One of the soft spots
on Airflow right now is how changes are handled
version to version. Right now we think it's
best that we don't change your environment's
version or version of any of the components
once you deploy it. That may change over time
as the Airflow community, as Composer matures. Will Cloud Composer be
offered in more regions? Yes, we are actively
working on it. GA is a good example of that. Is there a graphical
way to create DAGs? Very, very common
question-- the answer is no. But this is something of
extreme interest to us and also of interest to the
Airflow community. So the answer is no,
but I would expect that it will happen at
some point in the future. Which version of Python
can be used with Composer? This is probably the most
Composer-specific question that we get. Right now it's 2.7. We are actively working
on Python 3.5 support. You can probably expect that
in a future Composer release. Future directions
for Cloud Composer-- so we showed off
some of the work we've done with Kubernetes. The intersection of
Kubernetes and Airflow is exceedingly
interesting to us. Google, in terms of the
Composer team and other teams, have been involved in work
for Kubernetes and Airflow. The KubernetesExecutor is
a good example of that. It's not the last example
or the end of the line, I think, in terms of
Airflow and Kubernetes. We're also working on
additional operators. So we want to support
additional API surface area coverage of the products that
are already inside of Airflow, so things like Dataproc
or BigQuery, Dataflow. We also want to expand support
for new products inside of Airflow that
are GCP products. Third, resource
usage-- so right now, you can create your Composer
environment with a fixed size. You can also resize
that environment. We're also thinking
of ways that we can increase the elasticity
of that environment, based on workflows that are
executing on that environment. Much like a lot of
our managed services, we want to tightly
constrain the resources that you're using
for an environment to what's actually going
on in that environment. There's a ton of
different things that you can do to get involved
with either Airflow, if you don't like Composer, and that's
totally OK, or Composer itself. Airflow, the Apache website for
Airflow's a really good place to get started. There's links to
the mailing list. They have pretty
active mailing lists. There's a lot of
information for how to get involved in that community. In terms of Composer, we
have our documentation for the product. There's also a Google group
mailing list that you can join. Please ask questions
on that mailing list. There's also a Stack Overflow
tag that we look at as a team. So if you have Composer-specific
questions, please use that tag. Because the more questions
that you ask with that tag, the less likely it is that
other people will need to hunt and peck for that
information over time. And we just can't anticipate
all of the questions that might come up over time. As a plug, there
is a Meetup group, if you happen to be local
to the Bay area for Airflow. There is going to be an
event in September, which is going to be hosted by
the Cloud Composer team at the Google Sunnyvale office. So if you are local,
please sign up. We just put up the
Meetup event for that. So it's something to check
out if you're curious. With that, thank you all
very much for being here. We sincerely appreciate it. And again, thank you to everyone
who used Composer and gave us feedback. We're open to questions. We have five minutes. If you have questions,
please come up to the mic. Happy to answer any
questions you guys have. [APPLAUSE] [LOGO MUSIC PLAYING]