About a year ago I told you about the $100,000 GCP Prize from the Google bug bounty program, awarded for the best vulnerability found in the Google Cloud Platform. Now, a year later, it’s time again! And the prize was higher too: $133,337. In this video I will tell you the story and technical details of a new amazing winning bug. It’s basically a Server-Side Request Forgery attack, but with the impact of a remote code execution inside of Google. So let’s meet this year’s winner from Uruguay!

“So my name is Ezequiel Pereira.”

This is him on our call when we talked about his bug, but at the time, he didn’t know yet that he had won the $133,337. I pretended to just be curious about his blog post: “This sounds really insane and I was wondering if you would be interested in making a video together.” He said yes, and so he started explaining his bug.

“They classified it as RCE, because an attacker could potentially execute code by calling some internal google endpoints. And those internal google endpoints, seeing that the request comes from the cloud deployment manager, might allow the attacker to do actions that an external user shouldn’t be allowed to do. So it is equivalent of RCE. I never got to the point of actually executing code on google, especially because they cut me off. Because they were treating this as an incident.”

So his bug allowed him to basically perform arbitrary requests inside of the trusted Google network. And he could potentially reach critical endpoints. We will soon talk about this internal network, because it’s really fascinating. This bug combines so much knowledge about Google internals. But as he said, he never got far enough to exploit it further. Google knows what impact this special server-side request forgery has, and classified it as a critical remote code execution.

“If you get into the internal network, and are able to issue requests, it’s like RCE.”

That’s not true of every SSRF, though.
Internally at Google, requests between services have to be authenticated. But in this case the source of the request was authenticated, apparently with high privileges. So in this case, being able to issue requests is like RCE.

That’s why they also treated it as an incident. This is a bug that needs to be investigated, because maybe somebody else, an actual attacker, used this. Of course Ezequiel also got the regular bug bounty when he reported it, so how much was it?

“You have a rewards table, and you can see they always pay 31k for RCE issues in their main google products.”

What Ezequiel didn’t know was that I had the Google VRP team on standby, ready to join the call at any moment. I acted surprised that somebody wanted to join the call, and I let them all in.

“We concluded that you won the top prize, which is $133,337. We wanted to surprise you in person.”

“Woohoo.”

“Thank you. Thank you very much.”

Congratulations!

“Well I didn’t really expect to win. I thought like maybe the firebase bug, that like sent tons of notifications would win.”

So I guess Ezequiel needs to update his blog post with the facts! It’s not just 31k. It’s a total of $164,674. Damn! But now I hope you are ready to hear the story about this bug.

<intro>

First of all, there are so many Google Cloud Platform products that as a bug hunter you might get overwhelmed, not knowing which target to pick or where to look. So I’m curious how he ended up researching this particular area.

“Google Cloud is huge. I can assure you that I haven’t even touched like 70% of it. 70% of google cloud is completely unknown for me. I’m no google cloud expert. I just go through the documentation. And if I kinda understand something I begin looking into it. I researched App Engine a lot, because it’s easy to use, easy to understand. And it is really interesting.”

App Engine is one of the big main products of Google Cloud.
Basically it allows you to host web applications.

“So while looking at App Engine flexible environment, I stumbled upon Deployment Manager. So here for instance I deployed something on App Engine flexible environment, and you can see calls to the Deployment Manager API. And you can see like the user is gae api prod.google.com service accounts. Well, I decided to look into it because if it was being used by App Engine, it will probably be used by other GCP products. And when something is used internally at Google, even if it is a public issue like deployment manager, they sometimes hide internal settings, internal stuff, that if they are not well protected they could be exploited by an attacker. And that’s what happened here.”

So the “Cloud Deployment Manager” is one of those many products that he didn’t know. But then he saw in the logs that App Engine uses it internally to manage the servers. And I think it’s very clever how he thinks about Google products that Google also uses internally.

“For instance I think every, or almost every, withgoogle.com website is really an app engine application. And sometimes you can see that the website is integrated with the other google services. So you begin to wonder, if it is running on App Engine, how is it connected to internal google stuff. Or for instance they use GCE, Compute Engine a lot too. So while using them, sometimes they need to do internal stuff. So sometimes they build into the public tool internal stuff, that they just hide somehow from the public. Because they are only meant for internal users. If you pay attention sometimes documentation also references internal stuff. And you say, what does this mean in the documentation?! And it doesn’t make sense, because well, it is intended only for googlers.”

I think this is a very valuable tip for aspiring Google bug hunters and probably the most important takeaway of this video.
Approach the target with the mindset that if a product is also used internally by Google, maybe there are undocumented internal features exposed that could be exploited. And he saw that the Deployment Manager was used by App Engine. So he decided to hunt for bugs there.

“So I knew nothing about Deployment Manager. I didn’t understand what it was for. Even right now I am not really sure why it exists or how it is useful for a developer. But yeah, I had to read the deployment manager documentation like 4-5 times until I kind of got the idea what it was for.”

Let me try to give you a brief summary of what the Deployment Manager is. It all starts with a configuration or template describing some resource. Here, for example, a compute instance, so a basic server, being deployed in a US datacenter. It also has a hard drive attached, with a Debian image on it, and an external network interface with external NAT. So this is a whole machine description.

Now you can take that and send it to the Deployment Manager, and it will then set up this server for you. So you can kinda imagine this like a docker-compose file, or maybe a Kubernetes Deployment object. This is just a Google Cloud Deployment Manager config.

Now let’s think about App Engine, which is used to host web applications. When you deploy an app, it seems that App Engine uses the Deployment Manager to describe the server where that app will be running.

“I began playing like creating my own templates. And creating resources. And looking how it works. Looking at the different features. For instance you know there are two public versions of deployment manager. You have the v2 version and v2beta version. So I looked at the difference of the two versions.”

And one of those differences is TypeProviders. It’s a very confusing name, but it’s important to understand.
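
To make the machine description from the summary above a bit more concrete, here is a rough sketch of what such a config could look like when written as a Python template (as far as I know, Deployment Manager accepts Python and Jinja templates next to plain YAML). The project ID, zone, machine type, image and resource name are all made-up placeholders, not the values shown in the video.

```python
"""Sketch of a Deployment Manager style template for one Compute Engine VM."""
import json

# All concrete values below are illustrative assumptions.
PROJECT = "my-project"          # hypothetical project ID
ZONE = "us-central1-f"          # stands in for "a US datacenter"
COMPUTE_URL = "https://www.googleapis.com/compute/v1"


def generate_config(context=None):
    """Return the resources the template asks Deployment Manager to create."""
    instance = {
        "name": "example-vm",
        # The type field says what kind of resource this is.
        "type": "compute.v1.instance",
        "properties": {
            "zone": ZONE,
            "machineType": f"{COMPUTE_URL}/projects/{PROJECT}/zones/{ZONE}/machineTypes/f1-micro",
            # The attached hard drive, booting a Debian image.
            "disks": [{
                "deviceName": "boot",
                "type": "PERSISTENT",
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    "sourceImage": f"{COMPUTE_URL}/projects/debian-cloud/global/images/family/debian-11",
                },
            }],
            # The external network interface with external NAT.
            "networkInterfaces": [{
                "network": f"{COMPUTE_URL}/projects/{PROJECT}/global/networks/default",
                "accessConfigs": [{"name": "External NAT", "type": "ONE_TO_ONE_NAT"}],
            }],
        },
    }
    return {"resources": [instance]}


if __name__ == "__main__":
    print(json.dumps(generate_config(), indent=2))
```

Running the file only prints the config as JSON; nothing is deployed. The point is just that the whole server is one declarative description with a type and some properties.
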
So in a Deployment Manager config file you have a type field, and that describes what kind of resource or server you want. In this case it’s a compute.v1.instance. But that’s something Google Cloud specific. What if your company uses, besides Google Cloud, also its own datacenter with machines you want to manage too? TypeProviders can be used for that.

“A type provider exposes all of the resources of a third-party API to Deployment Manager that you can use in your configurations. These types must be directly served by a RESTful API that supports Create, Read, Update, and Delete (CRUD).”

So as long as you can provide a simple HTTP REST API for your datacenter that implements things like creating or deleting a server, you can define a TypeProvider describing your API. Then you can use Deployment Manager, referencing your own type, to talk to your datacenter’s API. This way you can manage all your cloud resources in one place.

Anyway. Here is an example API request to create your own type provider with the v2beta version. The most important part here is the descriptorURL. It points to a JSON file, and this JSON file is like a Swagger API definition. This is what actually describes your API endpoints, where you implement your create or delete resource stuff. The options field is also interesting. You can see here that it defines an Authorization header. It’s obviously important that when you implement your own API to manage your servers, the API has some form of authentication. And in this example you send a Google OAuth token along, which you can then check. But now let’s send this request.

“So you can see the operation was completed. So the type provider was created. And if I go to my server I can see that this IP connected to my HTTP server and retrieved the descriptor document for my fake API. And it provided the access token I told it to provide.”

And now maybe you can already see where this is going.
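
As a rough sketch of the create request described above, the snippet below only builds the URL and JSON body and does not send anything. The project ID, provider name and descriptor URL are made-up placeholders, and the field names follow my reading of the public v2beta type provider docs, so treat them as assumptions rather than the exact request from the video.

```python
import json

# Hypothetical values for illustration only.
PROJECT = "my-project"
DESCRIPTOR_URL = "https://api.example.com/openapi.json"


def type_provider_request(name, descriptor_url, project=PROJECT):
    """Build (url, body) for creating a type provider via the v2beta API."""
    url = (
        "https://www.googleapis.com/deploymentmanager/v2beta/"
        f"projects/{project}/global/typeProviders"
    )
    body = {
        "name": name,
        # Deployment Manager fetches this document to learn about the API.
        # This is the attacker-influenced URL the rest of the story is about.
        "descriptorUrl": descriptor_url,
        "options": {
            # Ask Deployment Manager to attach an OAuth token to its requests.
            "inputMappings": [{
                "fieldName": "Authorization",
                "location": "HEADER",
                "value": '$.concat("Bearer ", $.googleOauth2AccessToken())',
            }],
        },
    }
    return url, body


if __name__ == "__main__":
    url, body = type_provider_request("my-datacenter", DESCRIPTOR_URL)
    print("POST", url)
    print(json.dumps(body, indent=2))
```

The interesting part from a bug hunter's view is that both the descriptor URL and the headers sent along are under the caller's control.
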
The bug Ezequiel found is a server-side request forgery attack. And here we control a URL that the Deployment Manager sends a request to. So is it as simple as, for example, pointing this at localhost, or some other internal IPs or hostnames?

“If I try to create a type provider that talks to an internal server, like server side request forgery, it will try at first to create the type provider. But it will fail. It will say error processing request. Error fetching URL localhost. Error excluded ip. It won’t let me do internal requests just like that. I tried like setting my own domain that will like point to an internal service. You see that here it failed on the creation of the type provider, so I also tried like setting my own domain that at first points to a valid service, and once the type provider got created I tried changing my domain to an internal server to see if maybe I could bypass it that way. I’m not an expert on all of this. So maybe someone looks at this, tries something and finds a way to get SSRF through here. But I was not able to.”

Huh. That would have been too easy, right?! There is more to come. As a mental exercise, try to think about what you would try next. Or just try to guess where this is going. This is what I do when trying to figure out if I could have found this bug too.

“Then I moved on, and some days later I decided. Ok. Maybe I can find an internal method used by the Deployment Manager. Because remember. Google when using these public tools internally, sometimes they hide internal stuff inside. And sometimes there are internal hidden methods in the api. So I know a way to list all the API methods, even undocumented ones. And it is through the metrics page here in the cloud console. And funnily enough it doesn’t only show the public methods. But also some internal ones. If there are. So here you can see for instance the GET operation method in the v2 version.
But looking at this, I noticed that you have v2 and v2 beta. But also here for instance you have dogfood and you have alpha version. And those versions are not documented publicly. And I said, Ok I’m going to look into them.”

Mindblowing. Like a detective finding small puzzle pieces. So let’s see what happens when you try to send requests to those different versions.

Btw. look at my face during the call. I’m in total awe right now.

“I can get an operation on the v2beta version. I can also do it on the v2 version. Let’s see what happens on the alpha version. Can I call a method on the alpha version? Yes I can. Can I call a method on the dogfood version? Yes I can. If I try a version that doesn’t exist. No, it doesn’t default to a public version. It just says not found. Now I know alpha and dogfood are real versions.”

And every Google bug hunter should get excited when they read dogfood. Here is a Google blog post about testing from 2014 describing their concept of dogfooding:

“Google makes heavy use of its own products. Because we use them on a daily basis, we can dogfood releases company-wide before launching to the public. These dogfood versions often have features unavailable to the public but may be less stable.”

Now it’s not necessarily a security issue that you can access a dogfood version publicly. But if it’s a less stable test version, maybe with bugs, there is a higher chance for it to have security-relevant bugs too. So it totally makes sense to now go after this dogfood version of the API and see if there are new features that are not in the public release and that could be exploited.

“So I began looking into the requests. This method is called list types. It tells you the built-in type providers of deployment manager. So for instance here at first you can see that deployment manager is able to manage spanner instances. I was looking into this. I scrolled here.
Said okay, all of this sounds like stuff that is already documented. Until I got here! I was looking at the builtin types and suddenly I found with the dogfood version there is something with googleOptions. This is not documented. This is not in the public versions. So I was wondering what is it doing here. There I found one difference with the public api. Yeah, I looked into it and said, if it is on the builtin type providers of deployment manager, maybe I can set it also on my own type providers.”

So Ezequiel had just found an undocumented internal googleOptions field and was wondering if he could set it on his own Type Provider, and whether it would do something interesting.

“As soon as I found this, I was really interested in it. Especially because I saw this. GSLB target. And I know that GSLB is the internal Global Service Load Balancer of Google. And if you read the SRE book you can see that it might let you send requests to internal servers.”

The Google SRE book he mentioned is really cool. It has been on my reading list for MANY MANY years. But I cannot read books, so I never did. Even though I haven’t read it, I know it’s amazing. Because as Ezequiel just said, you can learn about some cool internal Google stuff. And in this case, the Global Software Load Balancer (GSLB) is important.

In the chapter about “The Production Environment at Google” you can read:

“Our Global Software Load Balancer (GSLB) performs load balancing on three levels. Frontend, services and internal remote procedure calls. The frontend handles your typical DNS queries for domains like google.com. But internally Google uses their own system. Service owners (so basically developers) specify a symbolic name for a service, a list of BNS addresses of servers [...]. GSLB then directs traffic to the BNS addresses.”

So internally Google uses BNS addresses to identify servers.
Further down we get an example of how an HTTP request to a Google service is handled.

“first, the user points their browser to shakespeare.google.com. To obtain the corresponding IP address, the user’s device resolves the address with its DNS server. This request ultimately ends up at Google’s DNS server, which talks (internally) to GSLB. As GSLB keeps track of traffic load among frontend servers across regions, it picks which server IP address to send to this user. The browser connects to the HTTP server on this IP. This server (named the Google Frontend, or GFE) is a reverse proxy that terminates the TCP connection. The GFE looks up which service is required (web search, maps, or, in this case, Shakespeare). Again using GSLB, the server finds an available Shakespeare frontend server, and sends that server an RPC containing the HTTP request. [...] the frontend server contacts GSLB to obtain the BNS address of a suitable and unloaded backend server.”

So as you can see, the Global Software Load Balancer is inside the Google network. That means, if you somehow can send requests to a service with their BNS address, you are really deep inside of Google. And you might be able to send really critical requests to very important internal servers.

And now coming back, here you have a GslbTarget field which sounds like you can maybe specify a BNS address. Those addresses are not necessarily secret, but they are also not really public. Though sometimes they “leak” out. You can find some of them appearing in logs or API responses. But Ezequiel also had this funny story to share.

“There is this screenshot that I wanted to show you. This screenshot is from a website. An internal google website. Not internal. Uhm. It was a webpage that was exposed publicly by mistake. Until yesterday. Yesterday they blocked access to it. But here you can see that you have GSLB addresses.
This is an example of how someone might find GSLB addresses just by luck.”

So fascinating, right? All those puzzle pieces slowly coming together. And we are slowly getting to the vulnerability. So we just found this GslbTarget value, and there is this idea: maybe the domain you specified here is overwritten by the GslbTarget, and so you can use this request from the Deployment Manager to send requests internally.

“So for instance here I’m trying to breach the corporate issue tracker api. So this is the symbolic name for the issue tracker api. The issue tracker api is issuetracker.corp.googleapis.com. So usually you don’t have access to that. Okay I’ll try specifying this on the gslbTarget. So these values, I set this to true and this false. Just because through trial and error I saw those values worked. So I set them to that. And Transport again, I saw harpoon being a transport value. So I couldn’t get SSRF this way, because as you can see, my server got hit by the deployment manager request. But my server is not inside gslb. So the request did not go through the internal load balancer. So I didn't get SSRF.”

Good idea. But it didn’t work. Would you have given up at this point? I might have. But Ezequiel had another idea.

“I suspect it has to do with Transport here. Because well it doesn’t make sense for the other values to have anything to do with going through GSLB or not. But I didn’t know what values to put here. Should I put “internal”? I tried bruteforcing it, but I just didn’t get it right.”

Okay, maybe Ezequiel just has to find the correct transport method, and then Deployment Manager will honor the gslbTarget address and send a request to this internal service. The problem is, this is an enum field. So you need to know the exact name of the correct transport. That’s why he tried to bruteforce it.

“But at the time I didn't know what value it could be. And I tried bruteforcing it.
Like internal, or corp, or whatever. And I couldn’t get it working. I couldn't get SSRF. It would always keep my own server and not the internal google server that I wanted. And once you know what value goes there it’s really obvious. But at the time it wasn’t obvious for me. So I went through like, I spent weeks stuck here. And one day finally I got an idea. I wanted to use protocol buffers.”

Wait for it. This is so smart!

Protocol buffers, or protobuf, is a binary serialization format for data. It’s like JSON, just not readable by humans. It’s binary data. JSON is what normal people on the internet use. And protobuf is like what cool people use. And it’s from Google. Thus tons of Google services use, or at least support, protobuf instead of JSON. Or to be more precise, many Google APIs do not only support your basic boring boomer HTTP/1.1, with your neat easily readable HTTP headers. They also support gRPC, which is a protocol using protobuf over HTTP/2. And HTTP/2 is also binary. I wonder how many of you have actually heard of or worked with HTTP/2 before.

“So if you know protocol buffers, they are a serialization format that google uses a lot. And in that format enumerations are encoded to binary as numbers. Instead of strings. So for instance here instead of specifying HARPOON or OAUTH or GOOGLE, I would just have to specify their enumeration number.”

This is so clever! In your human-readable JSON, you would have to know the exact name of the enum value. But in binary protocol buffers, data is tightly packed, and for an enum with just a few options, you don’t want to waste space and store long strings. Because protocol buffer definitions are compiled and shared between client and server, you can just encode the value as a number. AND THEN YOU WOULDN’T HAVE TO KNOW the exact name for the transport. You can just try out all of them. And so Ezequiel tried to interact with this API via gRPC.
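
To see why numbers make the guessing so much easier, here is a minimal sketch of how a protobuf enum field ends up on the wire: a tag byte with the field number, followed by the enum value as a varint. The field number used for the transport below is purely hypothetical, and the encoder only handles non-negative values.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F          # lowest 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_enum_field(field_number, enum_value):
    """Encode one enum field: tag (field number, wire type 0) plus value."""
    tag = (field_number << 3) | 0    # wire type 0 means varint
    return encode_varint(tag) + encode_varint(enum_value)


# Hypothetical field number for the transport enum. The real number is not
# given in the transcript.
TRANSPORT_FIELD = 1

# Instead of guessing names like "internal" or "corp", just count upwards.
candidates = {n: encode_enum_field(TRANSPORT_FIELD, n) for n in range(5)}

if __name__ == "__main__":
    for number, raw in candidates.items():
        print(number, raw.hex())
```

Each candidate is only two bytes here, and none of them requires knowing what the enum value is called.
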
Or actually he tried to use a trick to more easily work with it. Google sometimes supports protobuf over HTTP/1, instead of HTTP/2.

“One way of looking at that, is like for instance specifying alt parameter here I want proto. But it says, proto over http is not allowed for this service, in this case deployment manager. So in this case google disallows this fallback. So I cannot use protocol buffers on deployment manager. Or that’s what I thought at first. Another thing about google is that their staging environments are very often accessible. So I said, okay if I cannot use proto over http on the production environment of deployment manager. Can I somehow maybe access the staging environment? So through experience again I know that in some APIs to access the staging environment you just need to prepend staging_ in the version name. Just like that. Okay I did this. I called the staging environment, and it worked. I invoked a method on the staging environment. It says my type provider does not exist, because of course it exists on production not staging. So let’s create it on staging.”

Oh my gosh, he found another hidden version!

“So here, look. In the staging environment of deployment manager, I am able to use proto over HTTP. And well this is binary, I don’t understand. There is another way to get a protobuf response, that is through the content type. Content-Type: application/x-protobuf. And again, the same.”

But wait, it gets crazier.

“Something else I wanted to mention is that Google APIs don’t only serve from googleapis.com, they also sometimes serve from client6.google.com. But if you try to get a protocol buffer response from google.com, it will freak out. Because it says, request unsafe for trusted domain. I mention this because the way google bypasses this when they need to use client6.google.com, is through a header X-Goog-encode-response-if-executable=base64. So I run this.
And I have the response in protocol buffers encoded in base64. If you don’t have the protobuf definition of a protocol buffer message, luckily protoc, the protocol buffers compiler tool, has an option to decode the raw message and just give you the field numbers. And here I have the protocol buffer message decoded. And instead of the field names, I have the field numbers here.”

So what you can do now is craft a Create-TypeProvider request with this protobuf encoding, changing the numbers to the different transport values you want to try. And once it is created, you can then use the JSON API to list all TypeProviders and get the actual JSON name of the value. And this way he figured out that the transport name has to be set to GSLB. Who would have guessed.

And so now he created a TypeProvider targeting the internal corporate issue tracker, specifically the REST API discovery endpoint, just to prove he can reach it.

“So there is a method in deployment manager to list the types a specific type provider handles. And okay. This is what the issue tracker has. And this is live. The deployment manager right now sent a request to the issuetracker API through GSLB and got its discovery document and processed it, and it’s showing us the results of what it found. This is the bug. I’m able to create typeproviders that talk to internal endpoints through GSLB, in this case an internal API. But it could be any endpoint.”

Wow! Of course Google has fixed this bug now. But this is such an amazing bug to me, because it involved so many small puzzle pieces. So many small tricks that have to come together. I learned a lot about Google internals from this, and hopefully it can help future bug hunters too.

Ezequiel, congratulations again on winning. You really deserved it.

But of course there were more cool submissions for the GCP Prize.
So head over to the blog about this year’s winners, and learn more about the other amazing bug hunters and their findings.
