Getting the best out of Spring Cloud, Kubernetes, and Istio by Magnus Larsson

Video Statistics and Information

Captions
[Music] Okay, welcome everybody, time for the first session of Jfokus, and this will be about a lot of cool stuff, I hope. So please help me welcome Magnus Larsson. Thank you. Is there some volume? Okay, good. Right.

Okay. I'd like to start by saying that developing a single microservice is very simple. Handling a system landscape of cooperating microservices can be a bit harder, and getting them ready for production in terms of scalability, robustness, resilience and such things can actually be really challenging. Fortunately, a number of open source tools have evolved during the last few years that can help us with such challenges: Spring Cloud, Kubernetes and Istio. Each tool is great by itself, but unfortunately none of the tools can handle all challenges, so they have to be used together. And when you start to use them together you will realize, first, that they have functional overlaps, meaning that two or all three tools can be used to handle a specific challenge, and you need to decide what tool to use. And when you try to use them together, it's not always apparent how to use them in an optimal, or at least a good, way. So this is basically what the presentation is about: how to get the best out of these tools when using them together.

I would like to start by reminding us why we want to use microservices, so we don't forget that, and then I want to go into the challenges that come with cooperating microservices and look into the tools and how they have evolved during the last few years. After that I would like to focus on the functional overlaps and reason about how to handle them and how to decide, and then end the presentation with a demo where I would like to show how the tools can be used together in a good way. Before I start I would like to say a few words about myself, and specifically my experiences with microservices. I'm working for Callista Enterprise, a Swedish consultancy company with an office here in Stockholm and one office in Gothenburg, where I'm from.
I've been working with distributed systems since 2008, and for the last six years I've been helping customers to develop microservices. In most cases it's been about providing public APIs on top of existing functionality in their system landscapes, and the microservices have been deployed in the cloud, either on Azure or Amazon, and they have been run using a managed service, either for containers or for functions. A while ago I started to write a blog series on how to develop microservices based on my experiences, and one and a half years ago I was contacted by a publisher, Packt Publishing, and they asked me to write a book on the subject based on this blog series. So in September last year this book was published, and most of the material in this presentation comes from this book; specifically, the runtime environment for the demonstration is based on the instructions in the book.

Okay, so why do we want to break up our monoliths into small microservices? From my perspective it's about two things. First, it's perceived to be easier to scale a microservice than a big monolith, and it's also expected to be faster to release new versions of microservices than of a complex and hard-to-test monolith. To make this work, the microservices have to be designed as autonomous components, based on a shared-nothing architecture, meaning that they are absolutely not allowed to share data in databases. Instead, if they want to share information, they have to do that through formalized interfaces, either synchronous APIs or by sending messages asynchronously to each other. And we have to remember and understand that cooperating microservices form a distributed system, and a distributed system comes with inherited complexity, so let's look into that. I will now go through my top ten challenges that come with cooperating microservices, based on experiences from customer projects. The first, and maybe the most obvious,
one is: how can we keep track of all the microservices? We need some type of discovery service. And if we want to publish public APIs to be consumed outside of the microservice landscape, we typically need some kind of edge server, a gateway, to route external requests to the proper microservice. When it comes to configuration, we need something that can contain all configuration for the microservices and that also can help us to push out changes of the configuration to the microservices. The other way around, we need something that can collect log output from the microservices and store it centrally, to make it possible to analyze the log output afterwards. We also need something that can help us to monitor the health of the microservices in the system landscape, and if it detects an unhealthy microservice it must be able to replace it with a new, fresh instance. We also want to understand the internal state of the microservices, for example how much hardware resources they are consuming, and, from the other side, we want to be able to understand what traffic flows the microservices take part in: what traffic is flowing through our system landscape. And if we can observe that traffic, we for sure also want to manage and control the traffic. For example, if we want to add a new version of a microservice, we don't want any downtime when upgrading to the new version; instead we want to be able to move traffic bit by bit to the new version, and if we recognize a problem with the upgrade we want to be able to very quickly switch the traffic back to the old version. And if we observe unexpectedly slow response times in what we call a call chain that goes through the microservices, we want to be able to track down and see exactly the response times for each part of the call chain, to see where the response time problem is, using distributed tracing. And finally, we need good tools
to handle temporary network problems, so that they don't affect too large a part of the system landscape and the effects of temporary problems are minimized. I'm thinking of tools such as retries, timeouts and circuit breakers that help keep the system landscape up and running even if there are some temporary problems. So these are the most frequent challenges that I meet when I help customers building system landscapes of cooperating microservices, and for each challenge there is a required capability: we need something that can handle each challenge. And, I forgot to say, this is definitely nothing that you want to build on your own; it's much better to use the open source tools that have evolved during the last years, so let's look into that.

The first tool I learned about was Spring Cloud, when it was released back in 2014. Spring Cloud comes with a number of application libraries that you build into your microservices, and then you start a number of Spring Cloud specific services in your system landscape. The application libraries communicate with these services, and they provide you capabilities for an edge server, discovery, configuration and resilience mechanisms such as a circuit breaker. This was, at least to me, totally revolutionary when it came out back in 2014, but we have to remember that it is a bit restricted in its use: it's mainly meant for microservices that are based on Java and that already use the Spring Framework. A year before, Docker initiated the container revolution, and Google launched its open source container orchestrator Kubernetes, so let's look into that a bit. When it comes to containers, I assume that you all know about containers already: they provide an isolation level similar to a virtual machine, but totally without the overhead that comes with a virtual machine, and that has made Docker very popular to use for developing and testing, for example, microservices. But when you get to
a production environment, in most cases it is not sufficient to have only one server that runs Docker; you need a cluster of servers that run the Docker engine, and to manage such a cluster you need something known as a container orchestrator. Kubernetes is, I would like to say, by far the most popular container orchestrator. From a very high level of abstraction, a Kubernetes cluster looks like this: a couple of the nodes in the cluster are dedicated to being master nodes, and their responsibility is to keep the cluster up to date. They do that by allowing a user, an operator, to define a desired state, and that is stored in a database. Then the master nodes, in an infinite loop, compare the desired state with the actual state in the cluster, and if they detect any differences it's up to the master nodes to take actions to move the actual state towards the desired state. For example, if a container dies, or a worker node goes offline, it's up to the master nodes to detect this and launch containers on the remaining worker nodes to move the actual state towards the desired state. A year later Google was finished with version one of Kubernetes, and at that time they gave Kubernetes away to a standardization organization, CNCF, the Cloud Native Computing Foundation, and that made the adoption of Kubernetes very fast. It became very popular: for example, all major cloud vendors today have offerings with a managed Kubernetes service, for example Azure, Amazon and Google, and software companies such as Red Hat and VMware have packaged Kubernetes into their solutions for customers that run their workload on-premise, so that they easily can start up a Kubernetes cluster on-premise as well. In total, from my perspective, this has made Kubernetes some kind of de facto standard for container orchestration as of today.
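The desired state described above is defined declaratively. As a minimal illustrative sketch (the names and the image are not from the talk), a Kubernetes Deployment asking for three instances of a service could look like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: product
spec:
  replicas: 3                     # desired state: always three instances
  selector:
    matchLabels:
      app: product
  template:
    metadata:
      labels:
        app: product
    spec:
      containers:
        - name: product
          image: example/product:1.0   # illustrative image name
```

If a pod dies, the control loop on the master nodes notices that only two instances are running and starts a third one on a remaining worker node.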
A few years ago the concept of a service mesh was introduced. There exist a number of open source implementations; the one with the highest velocity, from my perspective, is Istio. The main reason for that is probably that it is backed by a few large companies: IBM, Google, Red Hat and Lyft are doing a lot of work with Istio. A service mesh, from a conceptual perspective, looks like this: the principle is that at runtime you inject a proxy component into each microservice instance, and all incoming and outgoing traffic goes through that proxy. Then you have a control plane that controls the proxies and tells them how they should work, and it also observes all traffic that flows through the proxies. A service mesh typically provides an ingress gateway for handling incoming traffic to the service mesh and an egress gateway for handling outgoing traffic from the service mesh. This makes it possible to observe the traffic in a very good way, obviously, but it also makes it possible to inject, at runtime, capabilities for things like security, resilience and traffic management, and that makes a service mesh very compelling, to me at least.

So this is, in summary, the evolution that has occurred for these tools over the last years, and I just want to remind you that all these tools are 100% based on open source; I find that very good. Let's go back to the required capabilities and map each tool to them. If we start with Spring Cloud, in grey, it can cover the upper capabilities, as you see here. When it comes to Kubernetes, as a container orchestrator it's very strong on service management, but it can also handle discovery, edge and configuration for the microservices. And when it comes to a service mesh such as Istio, it's strong on the lower right side, on monitoring, observability and traffic management, but it also can handle tracing, resilience and edge
capabilities. So, as you see, there is one required capability not yet covered, and that's log analysis. If you have an existing log analysis tool in your system landscape you can typically use it for the microservices as well; if not, and you are running on Kubernetes, I find the trio of Elasticsearch, Fluentd and Kibana a very good solution. Maybe you have heard of the ELK stack before, but here I have replaced Logstash with Fluentd, which I think is a better tool for collecting log records when you run on Kubernetes.

So we have now covered all required capabilities, but as you can see there are some overlaps: for some capabilities, two or all three tools can handle them, so let's look into how to choose. Here I have extracted the capabilities that have overlaps, and I have marked which tool can handle which capability. Now the question is how to reason, how to decide what tool to use for which capability. One simple way is of course to take a capability and break it down into the features that you think are required for that capability. Let's look, for example, at the edge server, which all three tools can handle. Here I have listed a number of features that customers frequently require from an edge server. When it comes to security, you want to be able to protect your APIs using OAuth or OpenID Connect, and you want to be able to set up TLS-protected communication automatically, where you use a tool such as Let's Encrypt, or something like that, to automatically create, provision and rotate TLS certificates for you in the edge server. When it comes to routing, maybe you want to be able to do both URL-based routing and host-based routing, and of course it would be very convenient if you can observe the traffic, and also manage the traffic, all the way from the edge server, not only within the microservices in the service mesh. Given this set of required features, you can map them to what the tools actually can handle.
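As a sketch of the routing features just listed: in Istio, host- and URL-based routing through the ingress gateway is expressed with a Gateway and a VirtualService. Host names, paths and the certificate secret below are illustrative, not from the talk:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: edge-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: edge-certs   # TLS cert, e.g. provisioned and rotated for you
      hosts:
        - "api.example.com"          # host-based routing
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: edge-routes
spec:
  hosts:
    - "api.example.com"
  gateways:
    - edge-gateway
  http:
    - match:
        - uri:
            prefix: /product-composite   # URL-based routing
      route:
        - destination:
            host: product-composite
```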
In this case you can clearly see that Istio's ingress gateway seems to be the best choice as the edge server, given this feature set; you can of course have your own set of required features and do a similar mapping that corresponds to your requirements. Another way to reason is to take it from a more principled level and say: I prefer to be able to add a capability at runtime, using a platform such as Kubernetes or Istio, over having to build an application library into my microservices, the way you do when you use Spring Cloud. If you add it at runtime, you are not dependent on the language or framework used in the microservices, which you are when you use Spring Cloud. There are two exceptions to this principle in my own experience. First, when it comes to handling trace IDs for distributed tracing: each microservice involved in a call chain is responsible for taking the trace ID from incoming requests and putting it on the corresponding outgoing requests, and that is nothing that you can do from the outside yet; no one outside of the microservice can guess which incoming and outgoing requests are correlated, so it has to be done on the inside, and to get help doing that you need an application library, you cannot do it from a platform. Secondly, when it comes to implementing resilience mechanisms such as timeouts, retries and circuit breakers, my experience is that the best results come from using an application library and writing your own code for the error handling, for example what to do when a circuit breaker opens up and you get a fast fail; then you get a better solution compared to injecting such features at runtime. Given that reasoning, you can end up with the following decisions: for example, when it comes to service discovery and handling configuration for the microservices you use Kubernetes features; for the edge server you use
Istio. When it comes to distributed tracing you have a shared responsibility between a library that comes with Spring Cloud and Istio, and finally, for resilience, you use a library pointed out by Spring Cloud. For those of you who work with these tools and now wonder exactly which features in each tool I am referring to, I have here a very detailed table with those names; I don't want to spend time going through it, you will have it in the presentation afterwards. Instead, let's go back to the capability mapping. Here I have omitted the overlaps that I did not choose, so now I get a much cleaner map; it's only in the area of distributed tracing that I have an overlap, where Spring Cloud and Istio share responsibility, on the right side.

Okay, so how would this look at runtime? Well, let's go to our demo system landscape and see how the tools will be used. Today's demo landscape consists of four microservices: one composite service that aggregates information from three other microservices that contain data in databases, so one that contains data about product information, one for recommendations and one for reviews of products, and they store the data in a mix of MongoDB and MySQL. If we deploy this to Kubernetes, the microservices will run as containers, or as pods as they are called in Kubernetes lingo, and Kubernetes will provide capabilities for service management and service discovery, and it will hold the configuration for the microservices. If we then deploy a service mesh, meaning Istio in this case, it will at runtime inject Istio proxies into each microservice, and we will put an edge server, Istio's ingress gateway, in front, and we will use Istio's control plane to handle traffic management and also inject security features into the microservices, as I will show you shortly. The control plane will collect traffic information, and we can observe the traffic flows using a tool
called Kiali, in Istio, and we can use another tool called Jaeger to show call chains from distributed tracing. Here we have also added the Spring Cloud library Sleuth, known as the detective, that will help the microservices to automatically take trace IDs from incoming requests and put them on the corresponding outgoing requests. It can handle both synchronous and asynchronous requests, and a mix of them, so it's a very helpful library.

Next, when it comes to monitoring, Istio comes with two already well-known monitoring tools, Prometheus and Grafana. Prometheus can collect and store time-series based metrics, and Grafana can visualize them. Prometheus understands the Kubernetes runtime structure, so it can find new containers, or pods, at runtime, but it needs help to understand on what port and what path it can collect the metrics from, so we need to add some annotation metadata in the pod configurations where we tell Prometheus where it can find the metrics, like the example here. When it comes to logging, we will use Fluentd to collect log records from our microservices, we will store them in Elasticsearch, and we will use Kibana to visualize the collected log records. Fluentd can, in the same way as Prometheus, understand the Kubernetes runtime, so it can find new log files automatically, and it can also enrich the log records that it reads from the log files with Kubernetes metadata, such as the name of the worker node, the namespace the microservices were deployed in, and the name of the container that the microservice runs in, and that of course makes it much easier to understand the log records afterwards. Spring Cloud Sleuth also has a part in this: it can take the trace ID that is valid when a log record is written and enrich the log record with this trace ID, and that is very good.
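The annotation metadata for Prometheus mentioned above could, as a sketch, look like this in the pod template. The port value is illustrative, and the annotation keys follow a common convention that must match the Prometheus scrape configuration; the path is Spring Boot's actuator endpoint:

```yaml
template:
  metadata:
    labels:
      app: product
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "4004"                    # illustrative management port
      prometheus.io/path: "/actuator/prometheus"    # Spring Boot actuator endpoint
```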
Then we can find correlated log records from different microservices participating in the same request, or call chain, using this trace ID; I will show it in a demo in a minute. And then, for the final part, resilience, we will use the library pointed out by Spring Cloud, named Resilience4j. You see these arrows from Prometheus and the control plane down to the composite service; they actually apply to all microservices, and also to the edge server, the ingress gateway, but I have not drawn those arrows because the picture would be so cluttered, so I have only drawn them for the composite, and you have to remind yourself that they go to all microservices.

Okay, so this is how a runtime environment can look when you use these tools together, and now I would like to demonstrate how they actually can work. Could that be of interest, or do you need early lunch? No? Okay, so I have to leave PowerPoint, let's see. Yes, here I'm running a low-volume load test; it submits approximately one request per second to this demo landscape. The reason why I have such a low volume is to make it possible to simply follow the output with your eyes. Now it would be interesting to see what traffic this results in in the demo landscape, so let's jump over to Kiali, the tool in Istio that can show the actual call chains that currently go through the service mesh. It simply reads information from the control plane and then visualizes it like this; it's quite neat. Here we can clearly see that the traffic is coming in from the outside into the edge server, Istio's ingress gateway, and it routes the requests to the product composite service, which makes calls to the three microservices that hold data, the recommendation, review and product services, and they read the data from the databases, MongoDB and MySQL. You can see some statistics here: the average response time currently is 47 milliseconds, and
you can see the padlock here: it indicates that I have given instructions to the control plane to automatically protect all communication using mutual TLS, so it will create, provision and, when required, rotate certificates for each of these components automatically, and lock down the communication with HTTPS. That is of course only applicable within the service mesh, the certificates are not valid outside, but it's of course very useful if you want to protect your communication on the inside, for example in Kubernetes.

That was the observability; let's move on to log analysis. Here I'm using Kibana, and it shows log records from Elasticsearch. I'm viewing log records for the last five minutes, and I have an automatic refresh rate of five seconds. We can see up here to the left that we have approximately 3000 log records coming in per five minutes; that means ten log records per second on average. Now the question is: how can I find my log records? Well, let's start by making our own request. I have prepared one here: here is a URL to the edge server that I can reach from my computer, here is the path that will route the request to the composite microservice, and here I ask for information regarding the product with product ID 114. To get authorized I also supply an access token, as a standard bearer token. Let's make this request. There, it's done, it took 60 milliseconds, and a lot of output is shown on the screen. We can see that we got information back regarding product 114; we can see the name of the product here, and I recognize the name, it is the name that is stored in the database, so we got data from the database, and then we get some aggregated information that we don't care about right now. Okay, so how can we now find the related log records from this request? Well, we can for sure start to search for log output that contains product ID 114, and yes, we got one hit. We can see that someone has written this, but who was that? Well, we can use
the information that Fluentd added to our log records, in this case the container name. We see that the name is "comp", the short name for the composite service, so it's the composite service that has written this log record. Okay, but have any of our other microservices written log records? Well, we can use the trace ID that Spring Cloud Sleuth added; we can filter on that one, so we remove the search on the business key here and only search for the trace ID, and then we get four log records, one from each microservice. These log records come from the execution that I did a second ago. This is to me a very nice way to find your log records among thousands and millions of other log records.

Let's move on to distributed tracing. Here I'm using Jaeger to visualize call chains, and I can ask for all call chains that have involved the composite service, the 20 last ones. Here I can see them in time order, they come in at approximately one request per second, and the response times of the call chains vary between 15 and 30 milliseconds. I can take one at random and see exactly what happened in that call chain: I can see a request coming into the ingress gateway, it delegates to the composite service, and the composite service calls the product, recommendation and review services, and you can see the exact number of milliseconds taken for each request. Very important here: you can see that the requests from the composite service to the three other services were done in parallel, there was no sequential processing, so it made three concurrent requests and then waited for the responses, and that of course shortens the total elapsed time. What's quite interesting here is to see that the product service starts its processing a bit after the other two, approximately one millisecond later. Why is that? Keep that question in your mind and you will see the answer shortly. What could be very interesting is to see the call chain exactly for the
request I made. So if we take the trace ID from the log output that we found in the log records, go back to Jaeger, and ask for a call chain related to this trace ID, we will now get exactly the call chain corresponding to the request I did earlier on. Now you wonder: how can I know that? They all look the same. Well, I can actually dig down here and get some metadata. Here I have the URL that initiated this call chain; you can see that it was initiated by someone requesting information for the product with product ID 114, and that was me, no one else in this cluster asked for that product ID, so I know that this is my call chain. This, I think, is also very valuable: to be able to work with the tools together in this way.

The final demo is about monitoring time-series based metrics. Prometheus will collect hundreds and thousands of metrics from our microservices, and I have chosen one of these metrics: it shows the state of a circuit breaker that I have placed in the composite service, and it monitors calls to the product service. Currently it's quite boring, it has the value of zero, meaning that everything is fine: the circuit is closed, allowing requests to go through. So now I want to introduce problems, network-related problems: I want to introduce a delay in the calls to the product service, and I can do that by using Istio, the service mesh, and its traffic management. Here I tell Istio to handle calls to the product service by adding a fixed delay, in this case one millisecond, and that was why the product service started a bit later than the other two microservices: due to this delay. But I can increase this significantly, so that we can notice it, so I increase it now to three seconds and ask the control plane to fix this for me, and it talks to the proxy in the product service to add a delay of three seconds. Why you would like to do that, we can discuss later.
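The delay injection just described is done with Istio's fault-injection support in a VirtualService; a sketch (service names are illustrative) could look like this:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: product
spec:
  hosts:
    - product
  http:
    - fault:
        delay:
          percentage:
            value: 100        # delay all requests
          fixedDelay: 3s      # the three-second delay from the demo
      route:
        - destination:
            host: product
```

Applying this manifest tells the control plane to reconfigure the proxy in front of the product service, with no change to the service itself.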
But anyhow, what's important here is that the load test now reports a delay not of three but of two seconds, and that is because I also have a timeout in the composite service that says that after two seconds we will not wait for a response. After a couple of timeouts the circuit breaker says: okay, we have a problem, there have been several failed requests in a short time period, so I will open the circuit and not let requests through, meaning a fast fail as soon as anyone tries to call the product service. Therefore I get these short responses, and then after a while I get long responses again, and that is when the circuit breaker goes to the half-open state, allowing a few requests to go through to see if the problem has been resolved; if not, it opens the circuit again, and so it continues. Let's try to make a new request here, and here I only extract the name from the response, to not get too much output. We can see that we get a fast response, 37 milliseconds, but the data is no longer the data from the database; instead, this data comes from the fallback logic in the composite service, something that it does when it can't get information from the product service, so it could be a cached value, a hard-coded value, whatever.

So how does it look in the monitoring tool? Now things have happened here: Prometheus has picked up new states from the circuit breaker, going between 1 and 2, meaning open and half-open, meaning that we have a problem with this circuit breaker, and Grafana is configured to notice this, so it sends out an alert; you can see the alert in the alert list as well. I have also configured Grafana to send out a mail using SMTP to warn an operator, so here I have a mail server, also deployed in the demo landscape, and here I can see that I have got a mail that warns that this circuit breaker is now open. Okay, so now let's simulate that the temporary network problem is resolved, so we lower the delay to one millisecond again.
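The two-second timeout and the circuit breaker behaviour described here come from Resilience4j. As a sketch, assuming its Spring Boot integration, the configuration in application.yml could look something like this (the instance name and thresholds are illustrative, not the exact values from the demo):

```yaml
resilience4j:
  timelimiter:
    instances:
      product:
        timeoutDuration: 2s            # give up waiting after two seconds
  circuitbreaker:
    instances:
      product:
        slidingWindowSize: 10          # evaluate the last ten calls
        failureRateThreshold: 50       # open the circuit at 50% failures
        waitDurationInOpenState: 10s   # stay open before probing again
        permittedNumberOfCallsInHalfOpenState: 3
```

The error handling, such as what fallback value to return on a fast fail, is still written as application code.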
After a while the circuit breaker notices that the communication is okay again, by going to half-open and submitting a request, and it says: okay, I got the response back, and after a while it closes the circuit. So I can see that I have short response times again, 47 milliseconds, and I got a response back that I recognize as the value stored in the database. So the system landscape has self-healed from the temporary problem: it handled the problem while it was occurring by opening the circuit, and then it recognized that the problem is gone, so it closes the circuit. I find it extremely important to have this kind of features, specifically in a large-scale system landscape of cooperating microservices. Now we can wrap up by looking into the monitoring tool again: here we can see that the state has gone down to zero again, meaning that the circuit is closed, and Grafana has noticed this and sends out an OK alert, so to say, and we got a new mail telling us that everything is fine again, you can continue with your morning coffee, you don't need to rush away. That's good.

Okay, so it's time to summarize. Microservices promise that they are easier to scale and faster to release new versions of compared to a monolith; that's good. But cooperating microservices form a distributed system, and with that comes inherited complexity, and the challenges that come with that complexity can be handled with open source tools such as Spring Cloud, Kubernetes and Istio. You have to be aware that they come with some functional overlaps: you need to make up your mind what to use for what capability or challenge, and if you know how to use them together they can be set up to work really well together. If you now are interested in setting up a runtime environment similar to this demo landscape, and maybe you think I did not respond to all your detailed questions on how to do that, of course I recommend my book on the subject, which at a very detailed level
comes with hands-on instructions for how to do that. And if you want to discuss any in-depth questions that are interesting from your perspective, you can come over to our booth on the balcony; I will be there for the rest of the conference, together with a couple of my colleagues. Yeah, that's it, I'm done with the presentation. Are there any questions?

[Question] If I wanted to use Kafka in such an environment instead of RabbitMQ, how would you integrate that? I'm using another library that I didn't talk about, Spring Cloud Stream, which has an abstraction over messaging, so it has two binders, one for RabbitMQ and one for Kafka. It's actually a separate chapter in my book that covers exactly that question, thank you for asking.

[Question] So the question was: is it always true that different microservices are not allowed to use the same data or database? As long as they don't share data through the database, I find it okay. So if they use the same database but, for example, different schemas, it's totally okay. What's also important here is that I'm talking about different types of microservices, not different instances of the same type of microservice: of course, multiple instances of the same type of microservice can share the same data, but not different types. Is that a good answer? Exactly, yeah: they should not share the same tables, but they could share the same database instance, from a cost or operations perspective.

[Question] Yes, you can, but you should not, in my experience. So the question was: should you use distributed transactions with microservices? Even if it's technically feasible, I strongly recommend not to use them; instead you should design your system to handle eventual consistency. Yes, if you're coming from a monolith, used to one single database and one big transaction that handles everything, of course it's not very obvious how to do it, that's really true, if you are used to thinking in distributed
transactions. Eventual consistency comes from design, I would like to say, not from tooling: you have to really think about how you update your data, when you can read the updated data again, and how much you depend on the data being consistent with the update you did one millisecond ago.

[Question] Do you have any experience with Istio and other protocols, for instance gRPC? Not yet, I have only been using it with HTTP. It would be very interesting to also use it for databases and messaging systems, and gRPC if you use that, of course, but I have not yet. They are working hard on it; there's a constant evolution of the protocols that Istio supports. Any other questions, or do we have lunch time? Okay, so then I think we wrap up, thank you. [Applause] [Music]
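The Spring Cloud Stream binder abstraction mentioned in the Q&A means that switching between RabbitMQ and Kafka is largely a configuration concern; a sketch (the binding and destination names are illustrative):

```yaml
spring:
  cloud:
    stream:
      defaultBinder: kafka       # or rabbit; the application code stays the same
      bindings:
        output:
          destination: products  # topic (Kafka) or exchange (RabbitMQ)
```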
Info
Channel: Jfokus
Views: 1,525
Keywords:
Id: Ib4CS_j45mY
Length: 45min 9sec (2709 seconds)
Published: Sun Feb 16 2020