Creating a federated schema for a global company by Shane Myrick, SSE at Expedia Group

Captions
All right, hello everyone. My name is Shane and I work at Expedia Group, and today I'm going to talk about how we went through a GraphQL journey creating a federated schema. We've already heard today about what a federated schema means, what that process looks like, and some of the tooling that's going to be coming from Apollo to create new GraphQL schemas in a federated way. We came about this approach a little bit before Apollo had all this tooling, so we've been talking with Apollo through this process about how we've done our schema, and we've really aligned in some of the areas. You might see some overlap here, but I just wanted to talk about some of the challenges that we've had over the past two years doing GraphQL.

First off, I did say Expedia Group, not Expedia. Expedia Group is actually a larger company, and we own a bunch of different brands you might have heard of. Some of them are Expedia, Orbitz, Travelocity, HomeAway, Vrbo, Hotels.com, and Hotwire. All these different brands are under the umbrella of Expedia Group, and we actually have over 25 different brands that all come together. Over the 22 years of our company, all these different brands have built lots of different technologies to provide products to our travelers. We obviously have things like hotels and flights, we have activities you can book, car rentals, and vacation rentals so you can rent housing. All these different products were built by teams often in separate areas. We have teams across the globe in different countries and different offices, and communication across a company this large is something we have to deal with.

The whole company can be divided up into basically two large separate organizations. On the left here we have our travelers, who interact with our site, and on the right we have our suppliers, so that would be the hotels and the airlines that have to give us our data. So really we have
the shopping teams, which provide the customer experience, and then we also have our supply teams. Today I'm just going to be talking about the shopping side of things, so mostly anything you've interacted with on our sites, what you've probably seen powering those UIs.

The scope for just the shopping teams is that we have over 500 developers across the globe, and that's just developers. That doesn't include our product managers, our engineering managers, or our designers. We are a very large company managing these products that you interact with, so communication is very difficult, especially when you go across the globe and across time zones. We have to have meetings at different times with India. I work in Seattle, and that's not a very friendly time zone overlap; I've taken lots of 7:00 a.m. meetings.

We also have lots of API teams that power these different products. We have different APIs to power the shopping products like hotels, flights, booking, and customer data, and each of those APIs is managed by an entire team. Then we have the front-end clients or apps that are actually consuming that data. We have over 50 different web pages that you might interact with, plus our mobile apps on iOS and Android, and we're onboarding new apps every day. All these different apps have teams that manage them, and they all have to consume this data.

So really, we started off this journey by taking a look at our customer experience. This wasn't really about GraphQL. It started with us looking at what the site looks like today and noticing that there's a discrepancy. As a traveler coming to any of our brands, you might see different colors and different data depending on whether you're using the mobile app or the website. They're all kind of similar: they're all using yellow and blue on Expedia and other colors on different brands, but there are still some minor discrepancies. So our first actual push for doing this kind of change was that we
wanted to align the customer experience, so things like the filters and sort selection. Here on the left we have the filters as pills that come up at the top of the screen, whereas on the mobile app they're at the bottom. You might think, sure, that's fine, but really it should just be the same across all our apps and all our designs. Often you might even get discrepancies in the data we show. Here we have this concept of a message, which would tell you that, hey, you're booking a hotel in Maui and it's pretty heavily booked, so you might want to book soon. That data doesn't appear anywhere on our mobile sites, because they're using a different API to get the hotel data. So we wanted to first align the customer experience, which also requires that we align the data.

So how did we get to this state of all these different experiences? Well, we have our different domain teams on the bottom here: hotels, flights, cars, et cetera. At the top are all the clients they have to talk to: our progressive web apps on web, our native apps, and new Alexa and Google Home chatbots. Just like Matt talked about earlier, we had all these connections, and all the connections between the clients have to be managed not only on the clients but also on the service teams.

As a client, if I want to use some new data or expose a new feature to a traveler, I first have to know that this data even exists at Expedia. I have to know that there's a team somewhere producing this data, because again, we're a very large company and you might not know that that team is working on this product. But let's say you do find them and you find their product. Now you have to work with their documentation. Maybe they're doing their REST API one way, maybe they're using JSON, maybe they're using XML. Everything's different depending on the API you're interacting with. And then as a service owner you also have to do all this client management.
If you have to deprecate a REST API, or an endpoint, or even just an individual field on your API, it can be very difficult to talk to every single client that's using your API and send out these emails saying, hey, we're going to turn it off, you'd better stop using it. This cross-communication across a global company is very difficult.

We started to reduce some of this effort when we moved to a pattern known as a BFF, or backend for frontend. We started doing this for our mobile websites, specifically on the hotels page. The BFF gave us one entry point: the client talks to this single endpoint, and it takes care of talking to all those different services. So now the client can be pretty dumb. It doesn't have any business logic in it, and the data returned from the BFF is exactly what that client needs to power a UI. But if we wanted to roll this pattern out for everyone, we would essentially have to create a BFF for every single client, and that really doesn't fix anything, because now we just have another service layer in between all the clients. We really needed a BFF for everyone. And again, if we want to align on the traveler experience, the UI, we also want to align on the data we're showing them.

That's where GraphQL came in. We realized GraphQL is exactly what we needed to power this experience. If we have one entry point for our clients to talk to, they don't have to worry about all these client connections and managing communications with these external teams. And as a service owner, you can just expose your data in the schema, and any client who needs that data can consume it. You don't have to worry about managing these deprecations: you can just deprecate the field in the schema and talk with the clients, and we have our operation registry to look at exactly who's using it.

But as I mentioned before, the hotels team and the flights team are massive
teams. I looked just the other day at the hotels API, and we have had over 150 developers contributing in just the past few months. So there is no way that we can have every single team contribute into a single schema. This is where, almost immediately from the start, we knew we had to work with a federated schema model. In the middle here we have our hotels service. The hotels team can work on their own GraphQL service and expose the schema, because they really understand the hotels domain and know exactly the products and data they need to provide. All the different teams can provide their individual GraphQL microservices, and then the GraphQL gateway at the top can look at these different services, stitch them together, and still expose a single schema for our clients. But now the teams don't have to worry about conflicts from working in a single codebase. This is what's known as a federated model, which again a lot of the Apollo developers have talked about here. We're not quite using the same tooling that they've mentioned, but I'm going to talk about the tooling that we are using.

So how did we go from the REST API model to this GraphQL service? Well, it wasn't in one fell swoop. We started with hotels, and then we started bringing new clients and new data on board as they got familiar with GraphQL and as we felt the need to start exposing that data. This required some technology changes. Our existing stack was very heavily powered by Java. Most of our REST APIs are built on Java, and we have lots of Java libraries that are still shared across all those APIs, so we needed to keep using those libraries. A lot of these REST APIs are built with different technologies: some teams might be using Spring Boot, some teams might be using Dropwizard. If we were going to be creating these individual microservices, it was going to be complicated if everyone got to
use their own technology. Everyone was also doing different logging and metrics. Teams were using different libraries and logging in different ways: some were using Splunk to log their data, others were using Grafana, all these tools you might have heard of. Everyone was doing something different. We realized that if we were really going to create this small microservice level to expose data, we wanted teams to have the simplest way to onboard onto that architecture, and that required standardizing some of our tooling.

Since we're pretty heavy on Java, we decided that we would use graphql-java. This is a server implementation of the GraphQL spec, so it allows you to expose a GraphQL schema using Java services. Then we actually built with Kotlin on top of that. Kotlin is a great language with some great features. In addition to creating the server, we built a GraphQL server SDK, if you want to call it that. We take care of standardizing everything that a developer would normally do to start up a new service and put that into a template. So now you go to this template, you fork it, and then you just write your simple query exposing your business logic. You don't have to worry about how the server starts up, what server technology we're using, or what logging, monitoring, and metrics libraries we're using. Everyone's going to do it the same way, and if we scale this across the company it benefits everyone, because now everyone's using the same tooling.

Also, since we now have this SDK for how everything is built and coded, we want to standardize our CI/CD process too. All these tools come together to make sure that everyone is doing the same thing, because again, working in a large-scale company like Expedia Group, we want to have the best tooling available, and if there is something wrong with the tooling and one person from the community fixes it, then everyone
gets the benefits.

Speaking of the tooling, we actually open sourced one part of the SDK, and that is graphql-kotlin. Right now, if you were to create a graphql-java service, you have to define your schema, either with a schema file or using their other Java approach for defining the schema in code, and then you have your resolvers, which implement that schema. We noticed there's a lot of duplication, because now the types have to match, and it gets a little complicated if you miss something. Yes, that can be checked at build time, but we thought: Kotlin is a great language, it has strong types, and it has the concept of nullability just like GraphQL, so what if we could generate the schema from your existing code? That's exactly what we do. graphql-kotlin looks at a function you expose, looks at the arguments and the return type, and using reflection generates the schema from your existing code. That's how we enable developers to not worry about the server or the SDK or any of that. They just have to write a function, which is a query, and that function gets exposed in the schema. So feel free to check us out on GitHub. Again, it's graphql-kotlin.

Since we're doing schema stitching and we have all these different services, one issue we had was with conflicting types when we end up merging the schema. If I'm on the hotels team and I define a generic type like Coordinate, and that gets merged and exposed in the global schema, and now the flights team wants to have some type called Coordinate, like a GPS coordinate, that's going to conflict when we do our schema stitching. So we wanted to standardize some of the types that we're using across all our services. This is where we have the concept of a shared type. We created a
simple directive, @shared, and this allows us to look at the schema when we're doing the stitching and know that when the services are exposing this type, it's actually coming from the SDK and it's valid. It's not going to be a type conflict, and it's going to look exactly the same across all the services. We didn't just do this for basic types like coordinates. We did it for things like our date and our time. So now if you're providing a date, it's going to be in exactly the same format. You no longer have to worry: it's a date string, so is it month, day, year, or is it day first? What's the format? Everyone's going to use the same date object with these well-defined types. The same thing goes for how we expose our money objects. Everyone's going to get the same format, because we require everyone to pass in a locale, so our SDK can take care of localization and our money is always going to look the same across the site. And again, since we're trying to standardize the way our site looks, we have shared types around UI elements, things like images. An image should always have a full URL and the text that we have for accessibility, so every image exposed on our schema will always be the SDK image type.

So how do we know that these types don't conflict in the schema? Well, like I said, we're using schema stitching. This was our first approach to schema stitching. We had our individual microservices down here at the bottom, and we have to get them somehow to the gateway. So we used the Apollo schema stitching code that was talked about earlier, and we said, hey, let's just put this code in the gateway. We'll run a polling operation, and every ten seconds we'll look at the services. If there's an update to a schema, we'll merge the schema back together and redeploy the gateway. This worked, but we obviously ran into quite a few problems. Polling takes a lot of resources, and we might have missed
something if there was a misdeployment in, say, one AWS region but not the other. So we pretty quickly realized we needed to change this model.

We now have a new model where, at build time, when an approved change is made to a schema, a service like the hotels service pushes its schema to an S3 bucket. That S3 bucket is watched by a Lambda which has our schema stitching code. Every individual service can have its own bucket, or just some location that we know about. The Lambda is notified whenever a bucket update is made, gets the latest version, stitches together the schema with our code, and pushes it out to another S3 bucket, which is the global schema bucket. So now we have a history of the schema as well, and the gateway can be notified with a push model instead whenever there is a new schema.

We can also add more information into this pipeline. Since we build the schema at build time, we can publish metadata into the S3 bucket: what SDK version the service was using, what the Git hash was, and which other Expedia-specific libraries we were using when we built this schema. That gives us more information whenever we do our introspection or look into what services are doing.

Also, since we are using Apollo tooling, whenever we do the stitching we push the schema into Apollo Engine, primarily just the schema registry. Apollo provides a couple of tools: the operation registry, the schema registry, and the client registry. Really, any schema registry is going to be invaluable, whether it's Apollo's or one you build on your own. For us, having a schema registry has allowed us to do some of the validation that's been mentioned already in some of these talks. We have this history and this knowledge of exactly what is in the schema. So a property, or a hotel, here has all these fields, and we know exactly who's querying it
because everyone has to go through the gateway. We know exactly how many times per minute this query is being used, what operations are being used, and exactly which client names are using this field. So if someone tries to check in a change to a schema, and that schema change would either break a client or conflict with a type name in an existing schema, we can fail it on a PR check in GitHub. This is just invaluable for us, because before we were trying to build this on our own, and as soon as we started using Apollo Engine, this was it. We wanted to have these checks in the pipeline, and now we don't have to worry about checking at build time or failing the deployment. Instead we can do it at PR time.

Now that we have all these technology changes and we want to start bringing teams on to this concept of GraphQL, it's going to take a while. It took a while for us to start teaching everyone these new concepts, which meant a culture change in the company. Previously, teams were pretty siloed, so they used to work on their own. The hotels team created their own APIs for shopping, but they also created their own web apps, so they never talked with the flights team, because they just worked on hotels. Flights just worked on flights, and so on. No one ever had this cross-communication. But if we're going to be communicating across a single schema, that needs to change.

So how do we make this change across our company culture? The first thing we did is establish these schema managers. I work on the team of schema managers, and our goal is to build out the tooling that we're using for the company. We've been working with Apollo to make sure that our teams get all the benefits and tools they need to build the schema. But we're also the gatekeepers, per se, of the overall graph, the one schema at
the top level. So as we're bringing more and more teams on to this global schema, we need to make sure they understand the process and what GraphQL means to them. We need to coordinate with them and make sure that they're following best practices. After they start onboarding and maybe start creating a schema, if they start doing something that we've seen before in another team, we can be the glue between the two teams. Now flights and hotels can talk to each other, because maybe they're actually following the same patterns. Maybe we don't have a shared type that they're using right now, but maybe there's a new shared type that we can check into the SDK, and that would be really helpful for everyone. So schema managers have been a great step for us in changing the culture.

Also, as the managers, we need to define the rules of the schema. One of our big decisions from the start was that we're going to document everything. Everything is going to be written down in a single location, so that all these teams, as they're onboarding, and again these are huge teams, sometimes 20 or 30 people, have a single location where they can go and look at what decisions are being made and what it means to use GraphQL at Expedia. But it doesn't just help the teams that are consuming this documentation. It helps us, the schema managers, because I have teammates who work in London or Chicago, and I'm not always going to be able to make a meeting with someone from India. So if they're going to be meeting with other teams who are onboarding, we as the schema managers need to have a consistent message and make sure everyone's doing the same thing. So documentation has been really, really important for us.

Another part is that we're doing schema reviews. Say we've talked with a team and they know they're good to onboard on
to GraphQL. They got on the hype train and they want to do GraphQL, let's go. We make sure we actually meet with them first and ask: is your data actually valuable to the schema? Because it might not be. I'll be talking about this a little bit on the next slide, but not all the data needs to be in GraphQL. You don't need to expose your entire database in your GraphQL schema, so we want to make sure any data we expose is actually going to be used by the client.

Then, if we're having this schema review and they've already created the schema, we want to go over the overall structure and make sure they're doing good schema design. That means we have a meeting with them, but not just the schema managers and the team creating the schema. Since we're all contributing to one graph, we open up these schema reviews to the entire company. We want the hotels team to be in the meeting with the flights team to say, hey, this is a flights schema, let's review it. You as a hotels contributor or developer don't know anything about flights, but that's perfect. We want someone who knows nothing about the domain to pose questions like: is this the best design? I don't understand what that field means. And if I, as a developer who works at Expedia, don't know what that field means in flights, how is someone else who is going to be using this schema going to understand it? So we actually bring in people without the domain knowledge to question their decisions.

This creates a community across the whole company too. No longer are the schema managers alone in charge of creating this great schema for everyone to use. It's to everyone's benefit. We want everyone to be invested, just as we are, in making sure we create the best possible schema.

And the best possible schema, for us, actually starts with the clients. Matt talked about this earlier: you could take a data-first
approach, but for us, since we're trying to align on the UI first, everything should be coming from the clients. That means the client looks at their design and creates the perfect schema. They have no knowledge of what data is provided or what service capabilities are there. They're just looking at the design and asking: what data does the traveler actually need to book this product, to interact with this product? That should be the schema, at least as a basic structure. Any data that the customer doesn't see doesn't need to be in the schema. From there, you can work with the service teams to adapt the schema to meet the needs. Again, we want to make sure that only the data that's going to end up being seen by travelers is exposed in the schema.

Yes, GraphQL has this option where clients can select only the fields they need, but we start to question that need. If only a single client out of these 50 clients is requesting a field, maybe they shouldn't be using that field. Why are they different? What's the point of this data if only one client is using it? Maybe we can start aligning these clients to use the same fields. Now, you might have some clients, like a watch app, that obviously aren't going to show all the same data as your mobile app, and that's where GraphQL can be powerful. But we still want to make sure the data is just the minimal amount we need to power a UI.

So, taking a look at an example of what that might look like, here on the left side is a mock example of what our hotels API used to look like. It used to return the user data (yes, the user data was in the API, for legacy reasons) and a list of hotels. So on a property search you just get the list of hotels and some user information. On the user, the user is either logged in, authenticated, or they're not, and every hotel might have this concept of a member deal. Not
every hotel is available to have this member deal. So in the schema, if you are an API developer creating the schema powering this API, what you might do is say, okay, we're going to simplify this. No longer does the client have to check two booleans to say, if the user is logged in and this hotel is available for member pricing, then we're going to show some element on the UI. We're going to simplify that into just one check. We'll do the business logic on the server, with less business logic on the client, so we'll turn it into a single boolean: display the member-only badge. Great, that seems like a benefit for the client.

However, if we take a look at the client, the client actually knows that there is another concept we might show to the user. On the top here is this yellow badge, "Members save more". That's the member-only deal badge. But this other hotel has the concept of a shortage badge: there are only eight rooms left at this hotel, so we're going to offer you a discount. The client has to know which one to show, because they're in the same location. So which badge takes priority if the member deal is there but the hotel also has a shortage? Right now the client has that business logic. And if one client still has to do this check, every single client that's using the schema also has to do that check, and that's business logic we're trying to remove from our clients.

So instead we have this idea of an offer badge. The client just looks at that badge. It might be null, but if it's there, they take the icon and the text and put them on the screen, and on the server we can define the business logic of what takes priority. Yes, this is a fairly UI-heavy example, but really this is just the concept of trying to reduce the business logic on the client, so you can
apply this to any GraphQL schema.

All right, so if you were to take anything away from this talk, here are the three things that we found valuable at Expedia Group when doing our GraphQL migration. First, we wanted to standardize the tooling to meet our developers' needs. The developer experience is going to be key when you're using GraphQL. GraphQL doesn't directly benefit your customers. The customer doesn't care if you use GraphQL or REST; the data just shows up. But GraphQL builds these better developer experiences, as we've seen with some of the client tooling and some of the server tooling, and if your developers work better, then they can build features faster, which benefits your customers. So in the end, we're still making sure that everything works for the developers in order to help our customers. Second, we make sure that we build a client-first schema. We want the schema to be designed well and designed for clients. Third, creating a community around that schema creation has been invaluable for us. The amount of communication we have to do with teams, with emails and Slack channels, is too much for a single team of schema managers to handle, so we really are striving for this community of GraphQL at the company.

Lastly, for us, GraphQL has not just been a way to standardize the data. It's been a way to standardize our business. No longer do we have the question of, the hotels team has this API and this data, and they have this UI. For us now, the clients just say: I have this graph, and I want to answer these questions that the traveler has, like what hotels are available near the Space Needle, or what hotels are in the customer's cart. These are questions that your travelers and customers need answered, and it should be up to the graph to answer them. It doesn't matter how you build the services underneath that graph. You're still just trying to create a
better experience for your customers.

All right, that's all I have today. If you want to follow Expedia Group, you can follow us on social media, and you can definitely keep up with our GraphQL journey. I'll be posting more publicly about GraphQL and how we're using it at Expedia. We also have our careers page down here. If you want to follow me on social media, I'm just at Shane Myrick everywhere. Here are some links, and thanks everyone. [Applause]
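Editor's sketch: the shared-type check described in the schema stitching part of the talk can be illustrated with a small merge routine. This is a minimal sketch under stated assumptions, not Expedia's or Apollo's actual stitching code. `TypeDef`, `TypeConflictError`, and `stitch` are hypothetical names, types are reduced to a name plus a tuple of fields, and the `shared` flag stands in for the shared directive that the talk says comes from the SDK. The rule assumed here is that two services may expose a type with the same name only if both mark it as shared and the definitions are identical; anything else is reported as a conflict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeDef:
    """Simplified stand-in for a GraphQL type definition (hypothetical)."""
    name: str
    fields: tuple          # e.g. (("latitude", "Float!"), ("longitude", "Float!"))
    shared: bool = False   # True when the type carries the shared directive


class TypeConflictError(Exception):
    """Raised when two services define the same type name incompatibly."""


def stitch(service_schemas):
    """Merge per-service type lists into one mapping of type name -> TypeDef.

    service_schemas maps a service name to the list of TypeDef it exposes.
    """
    merged = {}
    owners = {}  # type name -> first service that defined it, for error messages
    for service, types in service_schemas.items():
        for type_def in types:
            existing = merged.get(type_def.name)
            if existing is None:
                merged[type_def.name] = type_def
                owners[type_def.name] = service
                continue
            both_shared = existing.shared and type_def.shared
            if both_shared and existing.fields == type_def.fields:
                continue  # identical shared type, safe to deduplicate
            raise TypeConflictError(
                f"type {type_def.name!r} from {service!r} conflicts with "
                f"the definition from {owners[type_def.name]!r}"
            )
    return merged


# Usage: both services reuse the same shared Coordinate type, so stitching succeeds.
coordinate = TypeDef(
    "Coordinate", (("latitude", "Float!"), ("longitude", "Float!")), shared=True
)
hotels = [coordinate, TypeDef("Hotel", (("name", "String!"), ("location", "Coordinate!")))]
flights = [coordinate, TypeDef("Airport", (("code", "String!"), ("location", "Coordinate!")))]

schema = stitch({"hotels": hotels, "flights": flights})
print(sorted(schema))  # ['Airport', 'Coordinate', 'Hotel']
```

A check of this kind could run wherever the stitching happens, so that a conflicting type name is reported before the merged schema reaches the gateway.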
Info
Channel: Apollo GraphQL
Views: 4,299
Id: MuD3TAP0D9Y
Length: 29min 31sec (1771 seconds)
Published: Tue May 28 2019