How low can you go? Ultra low latency Java in the real world - Daniel Shaya

Reddit Comments

I'd be interested in people's feedback on the topics I raised in this presentation.

Particularly if you agree that there is no sense in multi-threading low-latency applications when you could achieve scalability by striping the JVMs instead.

πŸ‘οΈŽ︎ 32 πŸ‘€οΈŽ︎ u/DanielShaya πŸ“…οΈŽ︎ Dec 20 2018 πŸ—«︎ replies

That was a really interesting talk, thank you for posting.

πŸ‘οΈŽ︎ 6 πŸ‘€οΈŽ︎ u/reapy54 πŸ“…οΈŽ︎ Dec 20 2018 πŸ—«︎ replies

Most microservice discussions assume JSON over HTTP(S) or similar. It was interesting to hear about the use of shared memory for microservice communication in low-latency scenarios.

πŸ‘οΈŽ︎ 1 πŸ‘€οΈŽ︎ u/lurker_in_spirit πŸ“…οΈŽ︎ Dec 23 2018 πŸ—«︎ replies
Captions
Welcome to "How low can you go?" Now, how do we know we're amongst geeks here? Well, first of all, Jack Shirazi of the Java Performance Tuning newsletter is in the house. Secondly, it's Halloween and you're all here listening to a Java talk; haven't we got anything more exciting to do than that? In terms of trick-or-treat, it's a treat for me to be able to be here presenting to you, and we might learn a few tricks about low-latency programming. Before I start, I'd like to thank the LJC for organising the event, and in particular I'd like to thank Dominique, who I corresponded with. I looked recently: there were no fewer than 51 emails to get this event organised. Unlike low-latency messages, each of these messages needed a considerable amount of attention, so thank you to Dominique as well.

So what is this talk about? First of all, it's about ultra-low-latency programming. What is ultra-low-latency programming? I'm going to define it as the area where we're seeing response times of less than 100 microseconds, and, most importantly, where you care even about your outliers: you care about the one in 10,000, the four nines. We're going to look at this through the lens of three questions. First: is Java an appropriate choice for a low-latency system? Second: what specific development techniques do we have to employ when it comes to low-latency systems? And third: do microservices have anything to offer low-latency systems?

I want to start with a story about the low-latency industry. This might be familiar to you if you've read the book Flash Boys. In June 2010, Spread Networks laid a straight, direct cable between Chicago and New York. No mean feat: it was 827 miles long, it cost 300 million dollars to lay, and if you want to use it, it will cost you 150,000 dollars a month. Why go to all that trouble? Because it's three milliseconds, three thousandths of a second, faster than the competition. Now, just remember that a GC pause can easily last over three milliseconds. So imagine someone from your organisation has just shelled out 150,000 dollars a month and you've got latency spikes of three milliseconds, completely negating that.

Secondly, another anecdote. I was working for a market-making company and we had a bug in our system: every so often there was a latency spike. How did we know about it? We were sitting on the trading floor, as you often do when you're a low-latency developer, and every time there was a spike, a trader started yelling. Why did he start yelling? Because we had been arbitraged to the tune of thousands of dollars, and this happened regularly. It took me two months to get to the bottom of why it was happening, and that's a whole different story, quite an interesting one. The moral of the story is: latency can cost you money.

Finally, in finance, low latency is a huge business, and it goes all across the board. In hardware you can buy overclocked boxes, fast network switches, acceleration cards, measurement devices, timing-synchronisation devices; it's massive. Then there's colocation: if you write really fast code, one of the benefits you can get is being as close to the exchange as possible, and to get that benefit you have to pay the exchange a large amount for rack space next to their servers.
Software as well: developing and buying specialist software is expensive, and you have to pay developers a premium for low-latency experience, which is very important for all of us.

Now, why should you get into low latency? I was the sort of child who, whenever there was an electronic gadget around, would start taking it apart; I wanted to know how it worked. One of the recurring themes of this talk is that there's an awfully long way between when you write a line of code and it getting executed by the hardware on your machine. You have to open that process up and know how it works, and the more you know about it, the better a low-latency programmer you will become. I enjoy that sort of thing, and it's one of the reasons I enjoy this area of technology. Secondly, you get to play with all the new toys. You're not stuck on an old version of Java and an old version of J2EE on an old server; you tend to get the new toys first, because you're the one who can make the most difference. And thirdly, if none of those tick your boxes, you've got the possibility of a highly paid job at the very least.

That's the end of my preamble. How many of you were programming Java 1.0? Three-quarters of you. So you remember the days when Java was a Noddy little language for writing applets, and I find it remarkable that we've gone from the days of applet writing to where we are now, where people consider it a good choice for a low-latency system. A bit more about me: I've been working with Java for 20 years, and in real-time and HFT systems for 10 years, on a variety of different applications. My career has taken me through software-development startups and bigger companies; I worked on open source for Chronicle (you should all check out Chronicle Software if you're in the low-latency space), and now I work as a low-latency Java consultant to investment banks.

Before we get on to the first question, I don't want to take anything for granted about what people know or don't know, so here's a quick introduction to timescales. A millisecond is a thousandth of a second, a microsecond is a millionth of a second, and a nanosecond is a billionth of a second. What about the speed of light? Light travels roughly one foot per nanosecond, and that's in a vacuum; if you're going through a fibre-optic cable, which we always are, it's about two-thirds of that. So if you want to go from London to New York, it's going to take you tens of milliseconds. Now, you might think a nanosecond is not very much time, but consider this: if you've got a CPU running at four gigahertz, you're performing roughly four cycles, four instructions, per nanosecond, and with the parallelism and pipelining in the CPU you can get roughly a hundred CPU clock cycles of compute power in the time light travels a foot. If you've got lots of feet between you and where you want your message to go, the speed of light really does start to matter. We might think we're very clever here, talking in nanoseconds, the bee's knees of low latency, but the physicists just laugh at us: in neutrino physics they measure in units called nanobarns, which are 10 to the minus 33 square centimetres, which really puts us to shame.
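As a back-of-the-envelope check on those numbers, here is the arithmetic as a trivial Java sketch. The 4 GHz clock, the one-foot-per-nanosecond figure and the two-thirds fibre factor are the talk's approximations, not measurements:

```java
// The talk's approximations: a 4 GHz clock, light travelling ~1 ft/ns in a
// vacuum, and ~2/3 of that speed in fibre.
public class Timescales {
    public static void main(String[] args) {
        double clockHz = 4e9;              // 4 GHz CPU
        double lightFootSeconds = 1e-9;    // ~1 ns for light to travel one foot
        double cycles = clockHz * lightFootSeconds;
        System.out.println("Plain cycles while light travels a foot: " + cycles); // ~4
        // With pipelining and superscalar execution the talk estimates ~100
        // cycles' worth of useful work in that same nanosecond.
        double fibreFeetPerNs = 2.0 / 3.0; // fibre is ~2/3 the speed of a vacuum
        System.out.println("Feet of fibre covered per ns: " + fibreFeetPerNs);
    }
}
```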
Nevertheless, let's start with the first question: is Java a good choice for low-latency programming? Anyone recognise this? "Premature optimization is the root of all evil", one of the most quoted sayings in our industry. Anyone heard it before? Yes, most of you. Anyone know who it's attributed to? Of course, it's Knuth, and he said it in a paper called "Structured Programming with go to Statements", which came out in 1974, so this is nothing new. The idea behind it is: don't worry too much about how fast your code runs; first get it functionally correct. Does anyone agree with this? Put your hand up if you do. Most people agree. But let's look at what he actually said, because that little snippet is taken out of context. What he actually said was: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." And in my experience, what happens in most systems is that you want to tune the latency-critical path of your software, which can be just a small part of your system: that's his 3%. Anyway, back to the question: should we build and then optimise, or should we build with optimisation at the forefront? That's a really important question, and we have to know the answer.

Let me digress before I answer it. This is a really fast bike. I'm a bit of an amateur cyclist, and that's the sort of bike I'd like to own one day. I actually have a reasonably good bike, and I could customise it: carbon-fibre aerodynamic wheels like the ones in the picture, an electronic gear set, a better hub, more aerodynamic clothing, and maybe, if all of that worked out really well, I could add an average 5 kilometres an hour to my ride. I'm not very fit, but a Tour de France rider could maybe bring their average up from 30 kilometres an hour to 40. But say we wanted to average 100 kilometres an hour: it wouldn't matter what sort of bike we had, how much we customised it, or how fit we got. For that, we'd need a car. It's exactly the same with programming languages and frameworks: you need to understand, before you start, what is possible within your domain.

So I'm going to look at a very simple system: a DMA (direct market access) application. Your client sends you messages, and everything in the box is our application, made up of three separate processes. A FIX parser parses the incoming messages, maybe does some validation, and turns each one into an object. That's passed on to an OMS, an order management system, which does some risk checks and consumes market data. If everything's OK, it passes the order to another process containing a FIX engine, which turns it back into a FIX message and sends it off to an exchange. What I'm measuring is from when the packet hits one edge of our system to when the packet leaves the other edge. Let's look at how fast we can go in different languages. These are ballpark figures; I don't want to start any flame wars. In Java, if you're really good and you have really good hardware, you can aim for around ten microseconds, and those will be your fastest response times. If I was using C or C++, I'd probably be looking at that same ballpark. If I was using an FPGA for the same system, I could go down
by an order of magnitude and be looking at around the single-microsecond mark. And if I was really quick and used an ASIC, which we'll talk about, I could get to about 400 nanoseconds; there are people who claim even faster, but those are the sort of ballpark figures. If you want to use Python, just forget about it; you're not at the races, I'm afraid.

Now, there are particular challenges in using Java; I know Java is not the cure for everything, so let's look at them. Start with the elephant in the room: garbage collection. Java is a managed language, running within a container, the JVM, and every so often we have to do garbage collection: the world comes to a stop and we can pause for an arbitrary amount of time. We're going to have to address that. Then there's the warm-up: Java is slow when you first start it up, so write off the first 10,000 iterations. You need a way of priming your system that doesn't take down your bank or hedge fund by sending horrible orders to the market; you have to be careful how you do it, but you do have to do it. Thirdly, there's unpredictable compilation. I've found in my experience that, for whatever reason, Java sometimes deoptimises bits of your code and recompiles them, perhaps because a branch prediction went wrong; it can be unpredictable and cause small amounts of latency. And here's a really big one: you don't have control over memory layout. You can't do the equivalent of a C struct; you can in C#, but you can't in Java. They have been talking about putting value types into the language for years and years, and I'm still hoping for them to arrive very soon. If you want to play with the low-level concepts, there's a slightly unnatural way of programming: you use Unsafe, and Unsafe is not meant to be safe. Finally, certain CPU instructions are simply not available to you in Java.

What about C++? Well, there's a really good reason why Java has replaced C++ as the most used programming language in finance: it's way more productive and just easier, there are more developers, and the whole environment is richer now than C++'s. You want to spend your time developing, not pointer chasing. That's why C++ may not be your first choice, even though it doesn't suffer from a lot of the problems we saw on the previous slide. And if you want to be really, really fast, you don't use C++ anyway: you'd use an FPGA. FPGAs are the up-and-coming technology; they're becoming ubiquitous amongst banks, and I've just been told you can even get FPGAs on Amazon's cloud services. So the space between the utility of Java and the speed of FPGAs is constantly contracting, and C++, although there will always be a place for it, is slowly but surely being squeezed out.

How about programming in hardware? Let me spend a brief moment on that. An FPGA, for those who don't know, is a field-programmable gate array: a chip that you can basically program. It's not as fast as an ASIC, an application-specific integrated circuit, which is a one-off production. The problem with these is that, although they're very fast, they're still difficult to program. On an FPGA, a typical compile will take
about eight hours, so it's slow to market. One of the things you want with an ultra-low-latency system is to make a change in your code and deploy it pretty much immediately; you don't want to wait hours and hours for that. As I said, they're also very expensive. Interestingly, I found this graph of when you should produce an ASIC; it's aimed at people producing washing machines and the like. When you go over about 400,000 units, it's worth doing an ASIC rather than an FPGA, that is, worth doing a production run of your own specific chip rather than using a programmable one. We're talking about a run of maybe one or two, so imagine the price of producing your own ASIC. I don't doubt there are hedge funds with their own ASICs; I've never come across one, but the speed with which some people are reacting in the market today is pretty fast.

Anyway, we're not really talking about that. I want to tell the Aeron story, something I heard from Martin Thompson (that's him over there). Three really clever chaps developed a new ultra-fast, low-latency messaging system. Originally they wrote it in Java, and then they ported exactly the same algorithms to C++, C# and Go. Which one do you think was fastest, given that they're very good developers and they ported the same algorithms? Anyone want to guess? Of course, the Java one was the fastest. Well, the Java one was the fastest at first, but then they did a round of optimisations, and here's the real shocker: which one came out faster after that first round? Go, you think? No, it wasn't Go, and it wasn't C++; it was C#, and the reason C# gets the nod over Java is that you can optimise the memory layout. Now, Martin is convinced that if they spent more time and effort on optimisation, C++ would be the fastest, and that sort of makes sense. However, the question is: where should you spend your time? How much time do you want to spend squeezing more and more out of your language, as opposed to improving the rest of your code: the reliability, the measurements, the testing, the algorithms? But it shows where Java is now.

This slide will be familiar to Jack Shirazi, who's sitting with us; it's taken from one of his newsletters, the Java Performance Tuning newsletter, which you should all sign up for. We categorise Java programs into four categories by their target latency: (a) seconds, (b) hundreds of milliseconds, (c) tens of milliseconds, and (d) under a millisecond, and you have to take different approaches when aiming for those different latency targets. I'm only looking at category (d), under a millisecond. I'm not sure about all of the techniques listed, because I've tried some and found some have better effects than others, but the really big one is: you are not allowed any GC. If there's one thing you take away, it's that.

So here are a couple of slides about not creating garbage. You don't want to force people to allocate, and one of the ways you can avoid that is by designing really good APIs. I'm going to highlight a few, and this one is from Chronicle Map. For those who don't know, Chronicle Map is a memory-mapped map: it lets you write to the Map interface but stores the data in a memory-mapped file, so it can be shared amongst different JVMs.
Normally you'd use the get method on a map. The problem with get here is that because the object is being deserialised from off-heap memory, you'd have to create a new object every single time you called it. So what they've given you is this getUsing method. It takes in a value object, in this case a StringBuilder, and hydrates it, so you just keep repopulating the same object and you're not doing any allocation. You should always consider that pattern when writing your own code as well.

And here's something else: strings are probably the biggest cause of allocation in Java, but you have an option, you don't have to create strings. String implements an interface called CharSequence, so you can often work with CharSequences instead of Strings. For example, say you're reading off a file or a TCP stream: you can take your byte buffer, find the beginning of your word and its length, and call a set method, and at no point have we created a String. Only if you call toString do you create a String, and you wouldn't call toString; you'd just use this in place of the String. Here's what you can do with it. Say I've got a piece of text and I want to count all the unique words in it, so I've got a method called countUniqueWords. Instead of taking a String parameter, forcing the user of my API to create a String, I take a CharSequence instead. What am I going to do with the CharSequence? Well, there's a very nice library from a guy called Roman Leventov, someone I worked with at Chronicle; it's called Koloboke. Koloboke has produced its own maps, and one of them comes from HashObjLongMaps, which creates an object-to-long map for you. With this map you can call withKeyEquivalence, which means that when I want to look up a word, to see if my map contains it, I can use the CharSequence, and it will use equality over the CharSequence rather than the String. The first time I put a word into the map, unfortunately, I do have to create the String, because I need the String as the key in the map. But subsequent to that, 99% of the time, I go down the other branch of the code and just use addValue: if I hit the word again, no String creation. Have a look at Koloboke if you want to get into this.
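To make that pattern concrete, here is a minimal sketch of allocation-conscious word counting with a CharSequence key. The Koloboke API names (HashObjLongMaps, withKeyEquivalence, Equivalence.charSequence(), addValue) are recalled from the open-source library rather than quoted from the talk's slides, so verify them against your Koloboke version:

```java
import com.koloboke.collect.Equivalence;
import com.koloboke.collect.map.hash.HashObjLongMap;
import com.koloboke.collect.map.hash.HashObjLongMaps;

public class WordCounter {
    // A map keyed by CharSequence equality, so a StringBuilder can probe it.
    private final HashObjLongMap<CharSequence> counts =
            HashObjLongMaps.<CharSequence>getDefaultFactory()
                    .withKeyEquivalence(Equivalence.charSequence())
                    .newMutableMap();
    private final StringBuilder probe = new StringBuilder(); // reused, never reallocated

    // Called with a slice of a buffer; no String is created on the hot path.
    public void countWord(CharSequence text, int start, int len) {
        probe.setLength(0);
        probe.append(text, start, start + len);
        if (counts.containsKey(probe)) {
            counts.addValue(probe, 1L);        // word seen before: no allocation at all
        } else {
            counts.put(probe.toString(), 1L);  // first sighting: the one unavoidable String
        }
    }
}
```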
So the summary for question one: Java is a reasonable choice of programming language for a low-latency system if you're looking at greater than about 10 microseconds. If you're looking below that, my recommendation is don't even start with Java, because it will be really difficult to get there.

OK, question number two: how should we approach software development when it comes to low-latency systems? Again, let's digress. There was a podcast, I can't remember the name of it now, which got famous Java personalities to come and talk about what they were doing, and as an icebreaker they always asked: is computer science a science? So I'm going to ask that now. Those who say yes, put your hand up. That's most of you. Those who say no? OK, most of you think it's a science. Well, it's one of these semantic arguments, isn't it? Let's define what a science is, and then we can know what we're talking about. A science seeks to explain phenomena through theory, hypothesis and experiment, in an effort to ascertain natural laws. Is that what most of you think you're doing all day long? Really? I would say we're more in the engineering field, and that's why, even though we're called computer scientists, we're also called software engineers, aren't we? Engineering builds one level above: it seeks to apply natural laws to the solution of practical problems, which is probably more like what we do every day. But here's my argument: when it comes to low-latency programming, it is a science, because you have to use the skills of hypothesising, measuring, and then explaining what you observe. Low latency requires a scientific approach, and these are the steps you need. You come up with a hypothesis: "I think my code is going to run faster because I'm going to do this", and you have no idea whether that's the case or not. You can't test it in JUnit; you have to build a fairly elaborate experiment, probably testing your whole system. You can't just look at one piece of code in isolation with a JMH benchmark or something if you really want to measure it properly. Only then can you draw conclusions.

Why is that? It's something I mentioned earlier: our code travels a very long way from what we write in Eclipse or IntelliJ to what actually runs on the hardware, and it's very hard to reason about. I'll tell you something that happened to me. I was working on a system in a bank and I thought: here's a perfect opportunity for an optimisation, a really good way to make a name for myself. I did make a name for myself in the end, but not for good reasons, because of this. There was quite an expensive computation being done in a tight loop over and over again, and I said: why don't we just cache these values? So I spent some time writing an efficient data structure to store all the cached values, and then we ran it, and it turned out to be three times slower using the cached values than when it computed them each time. The reason is that I'd made the classic mistake of not understanding that operations on the CPU are practically free compared with memory access: memory access takes orders of magnitude longer. So even with a relatively expensive computation, it can be faster to run it than to fetch data into your CPU, especially if the data is out in the level 3 cache or, even worse, in main memory. That's why I say we need a scientific approach when it comes to low-latency programming.
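Here is an illustrative sketch of that compute-versus-memory trade-off. This is a naive timing loop, not a rigorous benchmark (use JMH for real measurements), and the specific formula and sizes are invented for illustration:

```java
// Illustrative only: a lookup table far larger than the L3 cache can lose to
// recomputation, because a cache-missing memory access costs far more than a
// few arithmetic instructions.
public class CacheVsCompute {
    static final int N = 1 << 24;                 // 16M entries (~128 MB of doubles)
    static final double[] table = new double[N];
    static { for (int i = 0; i < N; i++) table[i] = compute(i); }

    static double compute(int i) { return i * 1.0000001 + (i >> 3); } // cheap arithmetic

    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random(42);
        int[] idx = new int[1_000_000];
        for (int i = 0; i < idx.length; i++) idx[i] = rnd.nextInt(N); // random = cache-hostile

        double sink = 0;
        long t0 = System.nanoTime();
        for (int i : idx) sink += table[i];       // fetch each value from memory
        long lookupNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int i : idx) sink += compute(i);     // recompute each value instead
        long recomputeNs = System.nanoTime() - t0;

        System.out.println("lookup=" + lookupNs + "ns recompute=" + recomputeNs + "ns " + sink);
    }
}
```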
Now, quickly: what is a real-time system anyway? Here are some ways you can characterise one: it could be about raw speed, a function of your throughput, your mean latency, your slowest latency, how predictable your latency is, or even a function of your reliability. And these are the general classifications for real-time systems. There's hard real-time, where your slowest response is what really matters, the be-all and end-all: if you've got a weapons system or a pacemaker, you do not want your slowest response ever to exceed a certain threshold, or there could be terrible consequences. The exact opposite of that is first-past-the-post, and that would be, say, a cycling team, where you don't care which cyclist wins: from the team's point of view, you just want one of your cyclists to win, even if some of them are really slow or drop out completely; as long as one is the fastest, that's fine. Then there's web real-time, which means that as long as the user doesn't notice, it's still real time; users may not notice certain spikes and may notice others. And then we get to what most low-latency systems are, which is soft real-time: we do care about all the latencies, but one particularly slow outlier is not going to kill us, and we don't want to sacrifice our mean latency too much for the very furthest outliers.

Here's a typical low-latency graph; anyone working in low latency should have seen this sort of curve. They always have this hockey stick shape: it looks pretty nice up until the 99th percentile and then goes shooting off at the end. This is something I measured when I was working on Chronicle FIX, and we always measure this way because we care about the outliers. The problem is that as you go up the latency graph, the issues get harder and harder to diagnose and fix, because a problem might only happen one run in ten thousand. How on earth are you going to fix something you only see one time in ten thousand? It's really hard, and it requires specialist tools and specialist experience.

Now, coordinated omission is really important. How many people have done this? You have a system and you want to time it. The first thing you do is long start = System.nanoTime(); that's your start time. You wait till the end, then compute time = System.nanoTime() - start. Ah, that's how long it took. But you might have a system that looks like it's running really well when in fact what you've got is an overcrowded train station. What do I mean by that? When you take a train somewhere, you experience latency from the time the train was supposed to have left (note: supposed to have left) until the time you arrived, not from the time your train actually left. That's the danger of taking the start time the way I described at the beginning: you have to be really careful to measure from the time the operation should have started, and too many of our systems and test harnesses don't do that. This is exactly what happens: you get one spike, and that one spike causes a knock-on effect (that's what this line is) on hundreds and hundreds of further measurements until things return to normal. That's what happens in a train station. If you've used Thameslink over the past year, you'll feel like you're somewhere along that line. And what effect does it have on your percentiles? This is taken from Gil Tene, who works for Azul Systems, and this is what it looks like uncorrected: oh, it looks like your percentiles are doing rather well. But no, they're not; they're actually over there. So always remember to correct your measurements for coordinated omission.
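A minimal sketch of coordinated-omission-aware timing, as described above: charge each operation from when it should have started, not from when it actually started. The sendAndAwaitReply() call is a hypothetical stand-in for the system under test:

```java
public class PacedLoadGenerator {
    public static long[] run(int iterations, long intervalNanos) {
        long[] latencies = new long[iterations];
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            long intendedStart = start + i * intervalNanos;  // the train timetable
            while (System.nanoTime() < intendedStart) { }    // busy-wait until due; no
                                                             // feedback from the last reply
            sendAndAwaitReply();
            // If a spike delayed us, this "train" left late: measuring from the
            // intended start captures the knock-on queueing delay too.
            latencies[i] = System.nanoTime() - intendedStart;
        }
        return latencies;
    }

    private static void sendAndAwaitReply() { /* hypothetical system under test */ }
}
```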
One thing you should always do is document your latency requirements. I've been in many a dispute between software vendors and clients about what exactly was meant by "low latency". The very worst type of requirement is "the messages should take 100 microseconds". What does that even mean? A real-time contract has to have at least these components. Activity A: be really specific about what you're measuring. Running as part of system S: is it going to run standalone, or within a bigger system? If it runs within a bigger system, that's going to affect you. At a throughput of so many events per second: always consider your throughput; by the way, higher throughput doesn't always slow you down, sometimes it speeds you up, because it keeps your software hot. For a duration T: how long are we going to run the test, and will there be bursts within it? On hardware H: define exactly what hardware. With an overclocked box (I was using a 4.8 gigahertz overclocked box just the other day) you get dramatically different results than on a two-and-a-half gigahertz machine. How is your operating system going to be configured? Will you have admin rights to change critical things? Will you have network acceleration, Solarflare or Mellanox cards? Then: your latency shouldn't exceed a certain value at a certain percentile. That's really important, and you have to go all the way through the percentiles. Also specify whether you're correcting for coordinated omission or not, and how you're going to time it: a Corvil device with packet capture, or some sort of printouts, or HdrHistogram, or something like that. Here's an example of a contract we produced at Chronicle FIX: you can see we were very specific about the throughput, the duration and the type of box we were running on, and we went all the way through the percentiles, right up to the worst case. I would recommend never putting the worst case in, because, as I say, it's really hard to get your worst case down, but it was in there.
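For the timing options mentioned above, here is a short sketch of recording and reporting percentiles with Gil Tene's HdrHistogram library. The API shown (Histogram, recordValue, getValueAtPercentile, outputPercentileDistribution) is the library's public API to the best of my knowledge, but treat this as a sketch rather than a definitive usage:

```java
import org.HdrHistogram.Histogram;

public class LatencyRecorder {
    // Track values from 1 ns up to 1 s, with 3 significant digits of precision.
    private final Histogram histogram = new Histogram(1_000_000_000L, 3);

    public void record(long latencyNanos) {
        histogram.recordValue(latencyNanos);
    }

    public void report() {
        System.out.println("50th    : " + histogram.getValueAtPercentile(50.0) + " ns");
        System.out.println("99th    : " + histogram.getValueAtPercentile(99.0) + " ns");
        System.out.println("99.99th : " + histogram.getValueAtPercentile(99.99) + " ns");
        // Full distribution, with recorded values scaled from ns to us:
        histogram.outputPercentileDistribution(System.out, 1000.0);
    }
}
```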
So, for ultra-low-latency development, here's what you will need. When my kids were young they used to have these comics, and in the comic there was always a thing you could make using sticky-backed plastic and washing-up liquid bottles, and before you started, they always told you exactly what you would need, to make sure there'd be no disappointment at the end if you were missing one of the parts. So what do we need, if you want to be in the sub-20-microsecond range at the three nines? First, one powerful server, with admin rights, to run the application. Second, another server to drive the test harness; it's always much better if you can replicate what's going to happen in the real world. You'll need a very fast switch or a cross-connect between the boxes. You'll need packet capture of some description: how are you going to measure the traffic in and out? Ideally you want to measure right at the edge, which helps with coordinated omission, before traffic even gets through the TCP layer; a standard industry mechanism nowadays is a Corvil device, but you could use Wireshark if you capture your packets. And you'll need a really good test harness that produces your percentile data and graphs. Ideally you automate all of this, so you're not running it manually every time: you click a button and the whole thing runs.

Here are some tips on measuring. Always measure, as we've said, from edge to edge, and decouple your test harness from the actual application: you don't want a feedback loop. Don't say "I'll only send the next request when the last one's completed"; say "I'll keep sending them in at a regular rate, whatever latencies the application experiences." You need to be able to view the percentile data, so have some nice graphs, especially tracking it over time. And as well as your percentile data, you want to view your time-series data. This is critically important. Say you see a rather large spike at your 99th percentile. It could be that all those points of latency are at the very beginning of your run; what does that tell you? That you haven't warmed up properly. If they come at regular intervals, maybe you haven't optimised for Nagle's algorithm or something like that. If they're clumped, that could be something else again. Being able to view your latency across time is something I've found very important.

Now, have any of you read The 7 Habits of Highly Effective People? Always the same people. One of the things it talks about is working within your circle of influence, and this is so important when it comes to low latency: work out what you can possibly be responsible for. We never build directly on hardware; we're always building at least on the operating system, and usually on top of a framework. So benchmark your minimal application on your framework. If you're using, say, Apache Spark or Spring or whatever framework is on offer nowadays, write a null implementation and see how long it takes (see the sketch after this paragraph): look at the latency of that null application. If, at the three nines, it's 500 microseconds, that means you will never be able to go below that. Sometimes I see people build their whole application, see 500 microseconds at the three nines, and start optimising and optimising, only to realise it's nothing to do with their application: it's what they're running on. So always work out what you are responsible for.
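A minimal sketch of that null-implementation baseline idea. The Handler interface is a hypothetical stand-in for whatever message-handling hook your framework exposes:

```java
// Run this empty handler through the SAME framework, harness, network and
// hardware as the real application. Any latency measured here belongs to the
// stack, not to your business logic, and is the floor you can never beat.
public interface Handler {
    void onMessage(byte[] message);
}

class NullHandler implements Handler {
    @Override
    public void onMessage(byte[] message) {
        // deliberately empty
    }
}
```

If the null handler already shows 500 microseconds at the three nines, no amount of tuning your business logic will take the full system below that on this stack.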
Secondly, if you are responsible for the framework, then measure your operating system, because if the operating system is not properly tuned and has latency spikes of its own, you're not going to be able to go faster than those. So the summary for question two: be scientific when developing low-latency code.

Now, the third question: microservices for low-latency programming, something I'm quite into at the moment. Is the term "low-latency microservices" an oxymoron, or is it the way forward? When I say oxymoron, that's what I thought when I first approached this: surely low-latency microservices break up our process and introduce complexity, so something that used to be one process in one JVM is now in three JVMs; can that possibly work as fast? However, the idea of messaging and low-latency microservices predates computing; it's in fact incredibly ancient, and here's an ancient system you might recognise: our brain. Our brain is a low-latency, microservice, event-driven system. Unfortunately its latencies are pretty high: I think those figures are milliseconds, compared to our low-latency systems where we measure in microseconds. But that's interesting, and I think (in fact I know, because I've built some) that you can build low-latency microservices, and they can be really efficient.

So what do you need? What are the features? The first is fast transport between the microservices. If you're doing TCP calls between your microservices, just forget it; you're going to have to use shared memory. How fast is shared memory? Look at a couple of the open-source projects out there. With Chronicle Queue, an echo test (sending a byte from microservice A to microservice B, having it acknowledged and sent back, so the full round trip) can be accomplished in half a microsecond. With Aeron you can do it in 250 nanoseconds, which means a single hop on Aeron is only around a hundred nanoseconds. Even if you're targeting sub-10 microseconds for your whole application, a hundred nanoseconds is negligible, so the cost of sending messages between the microservices is pretty negligible if you do it correctly.

Second: make them single-threaded. You have this beauty of having separate microservices; make each one single-threaded and get the benefit of a CPU pinned to your microservice. There's been an ongoing debate between large JVMs and small JVMs. When Jack and I started programming in Java, we all wrote small JVMs, because we didn't have 64-bit JVMs and the maximum heap size was two gigabytes, so we wrote lots of small ones. Then 64-bit came along and people went crazy: there are JVMs out there with 128 or 256 gigabytes of data, with up to two-hour garbage collection times, and a million threads, and we all had to be experts in multi-threading, with our CVs full of "look what I can do". The answer is no: go back to your small JVMs, use single-threaded microservices, and you get masses of benefits. Firstly, there's no synchronisation: when I've done this, the word "synchronized" doesn't appear anywhere in my code. If I want object pooling, I don't need complicated strategies of this thread taking and that thread putting back; the single thread puts and takes from the object pool. If I need to scale out, all I do is partition my data and add more microservices.

Hog the CPU. Someone once described hogging the CPU like this, and it will be familiar to those of you with children: you're sitting in a car and your child asks "are we there yet?", and you say no, and they ask "are we there yet?", no, "are we there yet?", no, and they keep going until you say yes, and even that doesn't stop them. That's exactly what we do when we hog the CPU: that's what we have to do while waiting for a call on our shared memory. While waiting for someone to write to us, just spin, keep spinning, and make sure no one else gets that CPU; block all the interrupts from that CPU too. That's important as well.
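Here is a sketch of that combination: shared-memory messaging between JVMs plus a busy-spinning reader, using Chronicle Queue. The API names (singleBuilder, acquireAppender, createTailer, writeText, readText) are recalled from the open-source library, so verify them against your version; both ends are shown in one process for brevity, whereas in production each side would be its own pinned, single-threaded JVM:

```java
import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.ExcerptTailer;

public class SharedMemoryEcho {
    public static void main(String[] args) {
        // Both microservices open the same memory-mapped queue files on disk/tmpfs.
        try (ChronicleQueue queue = ChronicleQueue.singleBuilder("echo-queue").build()) {
            ExcerptAppender appender = queue.acquireAppender();
            ExcerptTailer tailer = queue.createTailer();

            appender.writeText("ping");              // microservice A writes

            String msg;
            while ((msg = tailer.readText()) == null) {
                // Busy-spin on the pinned CPU rather than yield or sleep:
                // a descheduled thread is expensive to wake up again.
            }
            System.out.println("received: " + msg);  // microservice B reads
        }
    }
}
```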
Also, record all the inputs and outputs of your microservice; make sure they're recorded in shared memory so you don't pay a huge penalty, but record everything. Having all your inputs and outputs recorded is really important when you're talking about low latency, because you need to be able to replay what happened in production, say in your test system, and try different things out. How else are you going to find your one-in-ten-thousand latency spike if you haven't recorded your inputs and outputs? And map your microservices onto your hardware. Again, you have to know what your hardware destination is going to be. Your microservices shouldn't cross NUMA regions: if you have a NUMA region with 8 CPUs, work out which of your microservices will run on each of those CPUs. You've really got to think about that when you design your microservices. By the way, you have to do all of that even if you don't use microservices, but if you do it, you can have really effective microservices.

I want to end with the hardest areas in low-latency systems, things you need to consider right from the beginning. The first is wait-free data structures. Does anyone know the difference between lock-free and wait-free? A lock-free data structure means that some thread can always proceed: there will be no sleeps in your system. That's really important for low latency, because once your thread gets put to sleep, for whatever reason, it can take a long time to wake it up again. Wait-free means that not only is it lock-free, but you can finish your operation within a bounded number of cycles: you know how long it's going to take, you're not just going to keep spinning and spinning. You want to write your data structures that way, and how you do that is another talk in itself; they can be quite difficult to write, but fortunately there's quite a lot of open-source material available. Then there's resilience and high availability, which go together with failover if you want a hot-hot system. We want to pass all our messages across shared memory, but we don't want any message to get lost, so how do we deal with that? If we wait for each message to go from one machine to another, and check that it's actually been replicated, over TCP or even UDP, to a different machine, it will take forever. So we need a strategy; we've got to work out, with our client, with our business, how we deal with resilience. One good approach is to make your endpoints responsible for keeping state. But you do have to worry about this. And finally, creating good measurement and a good test harness can also be pretty difficult.

Let's summarise. We asked three questions, and these are the three answers. One: Java is a reasonable choice, I think. Two: when it comes to low latency, be scientific. Three: if microservices are adapted for low latency, they can be a very good pattern for low-latency systems. I'll take questions now, I think we have a little bit of time, or afterwards, and if you want to contact me, probably the best way is on LinkedIn: look up Daniel Shaya. Thank you.
Info
Channel: London Java Community
Views: 27,787
Rating: 4.9306359 out of 5
Keywords: #java #meetup #lowlatency
Id: BD9cRbxWQx8
Length: 55min 31sec (3331 seconds)
Published: Fri Dec 14 2018