Java and Performance: Biggest Mistakes

So before I get started, I want a quick show of hands: who of you consider yourselves a Java developer, developing code every day? Awesome. What are the others doing? What are you doing? I'm pointing at you, you didn't raise your hand. Just a hobby? Well, it's a good habit. Any testers in here? Any performance engineers, performance testers? Cool. Anybody responsible for operations, for running an application, like we heard before from Gravity, being afraid of deploying something in case something goes wrong? Cool, nice. Anybody outside of Java? Does anybody dare to go towards the Microsoft space? It's a good stack; you should not be afraid of it. Me neither: I run my stuff on Windows, and I'm still using IE as my main browser. By the way, can you hear me in the back, or do I need to speak up? Fine, okay, perfect.

So I want to talk about Java and performance, the biggest mistakes that I've seen. The reason why I think I can talk about this: for the last 15 years I've worked in the performance space. I started with a company called Segue (not the Segways); we built load and performance testing tools. I was a tester on the SilkPerformer testing tool. Not many of you raised your hands when it was about performance engineering, but back then we were mainly competing against LoadRunner, which is now an HP product. Segue later got acquired by Borland, which might be a company you're more familiar with. I did this for seven years: I started as a tester, then became a developer, later got promoted to architect, and then at some point my CTO said: as a developer you're doing an okay job, but I think it's better if you help people use our tools, break applications, and help them figure out why they break. Because back then, about 10 to 15 years ago, load
testing was not only challenging in terms of generating load, but even more challenging in terms of understanding the results of these tools. My most exciting adventure was helping the German unemployment office run a large load test against a new system they had to implement because of a change in law. I was asked to run a load test with a hundred thousand concurrent users, and it was very exciting for me. Unfortunately the whole adventure stopped after 30 minutes, because we broke the app with a hundred concurrent users. And it was very hard for us to tell them what the problem was.

That's why later on I followed my former CTO, who said: we broke so many apps with our load testing tools, and load testing is becoming a commodity. Nowadays if you want to load test something, you go to one of the load testing services in the cloud and run as much load as you want for little money. So he founded dynaTrace, because he said: I want to be able to trace requests from a load testing tool through a Java application, and actually be able to tell you why it is slow. Is it a bad database statement? Is it because you have misconfigured certain frameworks and they are throwing millions of exceptions that you never see? Is it that you're allocating too much memory and therefore the garbage collector kicks in all the time? That's why he founded this company, which later got acquired by Compuware, which you may know from different fields; they used to be big in mainframe, which is something most of us have probably only read about in books. At least I see some faces that are probably my age, and maybe even a little younger. Anyway, and this is kind of my circle of life, since the end of last year we are back to being dynaTrace, and as you can see I wear it with pride on my shirt. That's pretty much all I want to tell you about my history company-wise. But these 15 years allowed me to see a lot of applications out there, and what frustrated me about it is
that every time they failed, they failed for similar reasons. So my mission now is to speak in front of people (and drink beer, obviously, because that's a nice incentive) and tell you about the problems that I see, so that you don't make the same mistakes again. Unfortunately I assume the code you've created, maybe yesterday, maybe a year ago, suffers from some similar performance problems.

Now, a little thing I want to tell you. You've heard me talk for two or three minutes and you can probably guess that my accent is not from here. Does anyone want to take a guess where I'm from? Close. Austria, yes, here we go. So I'm from Austria, and typically when I'm in this part of the world and mention that I'm from Austria, I very often get the feedback: "Oh, Sydney is such a lovely city." Then I typically have to remind people that there are no kangaroos in Austria; I think the only kangaroo we have is in the zoo in Vienna. Then I say: where we're really from is this place, a very small country in the heart of Europe, and 60% of the country is covered by mountains, by the Alps. And then people typically say: "Ah, now I know what you're talking about: The Sound of Music." Who has seen The Sound of Music? A lot of you. If I asked the same question in Austria, I would not get a single hand. Just an FYI, spoiler alert: I think only the people who make a lot of money from the tourists who come to Austria know about it. It's still very beautiful there.

So instead of singing in the Alps, we actually have our Conchita. I'm not sure if you've heard about her; I think she was on Letterman and some other shows. She won the Eurovision Song Contest last year, which was a big event; it's the largest competition for singers. She won it last year, we couldn't believe it, and that's why we hosted this event last
weekend in Austria, because the winner always hosts it the next year. And I assume you're from Germany, based on your accent? Germans and Austrians typically have a love-hate relationship, a little bit like big brother and little brother: we like each other a lot, but we don't really like to admit it. Since last Saturday we have something more in common: we are the so-called zero-pointers, because we're the only two nations out of 27 that achieved zero points in the Eurovision Song Contest last weekend. So congratulations. I'm not sure if you followed it: Australia did pretty well. Australia is obviously not part of Europe, but they had been asking for 60 years to be part of that competition, and this year we finally let them compete. They ended up fifth or sixth, so they did pretty well.

These are some other things we would like people in the world to know about Austria. You know this guy; he ruled this state for a little while. I'll let you judge if he did a good or a bad job. So we have the Terminator. We have our "Herminator", our ski star. We have Felix Baumgartner, who did the supersonic space jump from the space capsule. I saw some Red Bull cans over there: an Austrian product, even though the guy basically stole the idea from a business partner in Thailand. Our national dish is the Vienna schnitzel, which is breaded veal. We are very famous for our pastries and also for drinking a lot of beer, and this is exceptionally good beer. And those of you who have seen The Sound of Music know the scenery there is really beautiful.

By the way, we are hiring, and all of these places are within an hour's drive of our office. So in case you want to relocate to Austria, we are searching for Java engineers. How's that? No, you don't have to. Who said that? Yeah, we have a lot of international folks in the office now. so hi my
crater it's... yeah, I think we have something comparable to a green card, where if you are needed it's easier, I guess, but it's a general European thing. Single? Haha. So yeah, beautiful ladies.

All right, I'm not here to give you a geography lesson, nor am I here to show you a blank slide. The reason I'm here is that I think nobody in here, whether you're a developer, a tester, or responsible for operations, wants something like this to happen. If you're responsible for a website, for gravity4.com or your own services, or for other big services out there that people want to use, and it goes down (this is a screenshot of Apple's developer portal being down), that is not a good thing, because it means you get called in: you broke something and you're responsible for fixing it. I guess the only people who are happy if Apple goes down are those who work for Google or Microsoft. Fun fact.

So we don't want this, nor do we want things like this. Is anybody doing mobile development? Do you like your mobile apps to crash? Exactly. This is what happened. I'm European, so when I talk about football I mean the game where you hit the ball with your foot. This was the FIFA World Cup app: one week before the World Cup last year it crashed for 80% of Android users when they opened the app and refreshed the initial screen to see the latest news. The reason was a memory leak in an outdated Java library they used to display the list elements. Your app crashing for 80% of your users is not a good thing.

Why don't we like this? Because it leads to things like this: the famous war rooms. War rooms look like a party like this one, a big table with people around it, not with beers but with sodas, and the pizza is the same, but it's not a fun environment. Has anybody been in a war room? You're still smiling, but I'm sure you probably didn't enjoy the fact that you were there and
potentially it also leads to things like this: press coverage, but the kind your marketing department doesn't want you to have, like "FIFA scores an own goal with buggy mobile app" or "Facebook goes down, sets off panic". Or delight, depending on how you see it; delight probably for Twitter, because then the whole social life of Facebook users moved over to Twitter. And also things like this: a screenshot of the Google Play Store page for the FIFA World Cup app. I know it's a little hard to read, especially for the people in the back, but everything circled in red has not-nice things to say about that app. Those of you who raised your hand for mobile: if you have a great idea, invest a lot of time, push it out there, and get this feedback, that is not good, because nobody will use your app; they'll just go to another app that delivers a similar service.

Yes, we know all that. The question is: why does this happen? You know Kevin Spacey in House of Cards, I'm sure some of you have seen it; he also asked himself that question in the beginning: why wasn't he put in the position he deserved? I told you I started as a tester and then became a developer, and I can relate very well to this situation, and I'm sure you can too. I know it's a little hard to see for the guys in the back. I remember when I was a tester I did something every developer was scared of: I found things that kept him from doing the cool stuff, from developing new features, because I found bugs and filled up his inbox. And the other thing: whenever I found a bug (and I'm sure you get a lot of bug reports, or rather defect reports), it was very hard to reproduce. So you typically say you found a problem, and the other side says: well, I can't reproduce it, it works on my machine. Typical scenario; I'm sure you can relate to that. Why are you testing on Internet Explorer?
Nobody uses IE. It works on my Safari, on my Chrome. So we're explaining a situation to somebody who needs to fix it, but they have a totally different view, because I'm blocking them. Later, when I became a developer, I was on the opposite side and I hated the testers, but I could relate to them because I had been a tester before.

Because we are scared of these bugs they find (I know it's a little hard to see in the back), the closer we get to release, the more it feels like this for a developer: we're chased by the testers. And if we're very close to release and have no chance anymore, then we do things like this, like the guy from the Dos Equis commercial: "I don't always test my code, but when I do, I do it in production." Who of you can afford to do testing in production? Okay, nobody. Who has done this before: delivered code knowing that there are issues in it? Every day, I know, because we're on deadlines and we have to push this thing out, and we just hope and pray that we get time later on to fix it.

The problem is that this attitude leads to 80% of development time, in the US alone last year, being spent on bug fixing and not on creating cool new features. There's another number: 75% of code ever written never makes it to production, which also means a lot of bugs never make it to production, but still, it's amazing. And last year alone in the U.S.
bugs and defects contributed to 60 billion in defect costs. If Amazon goes down, if Facebook goes down, they lose a lot of revenue, so most of this is not actually your labor cost sitting in the war room, but lost revenue and lost reputation.

Now what's interesting, and this is my transition to where I want to show you some technical stuff and not just marketing blah blah or things you already know: I work with a lot of customers, a lot of companies out there, and as I said in the beginning, I'm very frustrated that every time I do a performance analysis on code, 80% of the problems I find are caused by only 20% of the problem patterns. It's always the same things. That's why I'm here: I want to tell you some of these stories so that you can avoid them, so that you can go back to your workplace tomorrow (or, if you're really too excited, maybe tonight), try to find these patterns, and make sure you don't put them in, or haven't put them in already.

So what I'm telling you: stop waiting for bug reports. It's time to look behind the scenes. There's more going on than you think, and things might be a little different than you think. I want to show you six situations: why certain things failed and how to avoid it. I'm also a very big fan of metrics, because metrics tell me what the status is, and not just a good feeling. Right now I have a good feeling; if I drink five more of these I'll probably have a different feeling about it. Metrics never lie, if you measure the right things and know what you're measuring. And if you automate it, you can prevent a lot of things from happening without doing a lot of mental work.

By the way, question to the back: can you see anything of this, or is it just a blur? And you can still hear me? Awesome.

I always start my scenarios with an image that kind of reflects what this
scenario is about. This guy here was a very, let's say, creative engineer, and he did something that we all like to do because we think we can do it better: reinvent the wheel. I'm sure most of you Java developers think you can write the next, better Hibernate, or if you're a web developer, the better jQuery. But it doesn't really make sense. It makes sense to use the existing components that are there and build something new on top of them. A lot of people do this, but what I often see is that people blindly reuse existing components and frameworks without even knowing what they're doing.

The first example happened to me. Last year I took over a position within dynaTrace. We have a free trial product: you can go to our website, download the product, and play around with it. My boss said we need to push it out more, we want to tell people that we have tools. So I said: if I take over that role, I want to be able to go to a meetup like today, you give me your business card, I invite you to that free trial, and later on I also want to track how you're doing. So I need a report that tells me how many people I have invited and what their status is. I asked for something very simple, nothing fancy. I'm the only guy who looks at that report, so it doesn't need to look pretty. What it tells me is: I invited one, two, three people, this is their status, I am the guy who invited them, they already created a license key, and all that stuff.

I asked the developer who implemented the back-end service. All his life he had implemented back-end services; he had never implemented a website front end. But I asked him, and I pushed him, because I said: in two weeks from now I'm at a conference and I want this feature available. So you, as a developer, if I ask you to do something that you've never done before and you don't
want to tell me that, what do you do? Asking people would be one option. But I heard something else; somebody said something about Google. You google it. So my colleague googled "HTML report based on two SQL tables". What he found was a sample project on GitHub using Hibernate, which is a cool thing. He implemented it, and everything was super fast: this report took milliseconds to generate. But every day, every week, and every month it became slower, until at some point last summer I couldn't use this product anymore. The report took minutes to show me three lines. I went to him and said "It doesn't work." He said "It works on my machine."

Who of you has used Hibernate? Who of you knows how to look at the SQL statements generated by Hibernate? Good, a similar number of hands. So what we did was a simple thing: we looked at the SQL statements executed by his code to generate the three lines of the report. It executed 4,000 SQL statements to generate three lines, because the sample he used was a for loop. In the for loop he was iterating through every single object; every time he accessed an object, Hibernate loaded it; and then he put an if statement in the loop saying: if this user was invited by me, print it out to HTML. Hibernate is a great framework, but as all of you who raised your hand know, there are different ways of solving that problem in Hibernate, by having it execute only a single statement that queries exactly the objects I've invited. But he reused a framework that he had no clue about, and I didn't give him the time to actually look into it closely.

Even for those of you who raised your hand, who know how to look at the SQL statements: please do it more frequently, because you do not want to miss things like this. Because this is, whether it's
Hibernate, whether it's Spring, or on the .NET side the Entity Framework: there are a lot of ORM APIs out there, and you can make the same mistakes in every technology and every framework. People simply forget about it because they're put under pressure, then it's close to release, and nobody has time anymore. This is something that should not happen. And the reason he couldn't reproduce it on his machine is that he had a sample database with ten rows. A very simple thing.

Another example, and I hope I don't gross you out with this. What does this tell us? I guess it's a functional toilet, but the architect who came up with that plan probably didn't make the right decisions, right? It works well for guys; for ladies it's a little harder. Looks like Australia. What I'm saying is: not every architect makes good decisions. And this happened to me, to take a personal example. I told you in the very beginning that I started at my first company as a tester, then got promoted to developer and later to architect. So I was the architect for a new project, and I was never really educated in architecture. I had no clue about it, but I was put in charge and had to make decisions, and I'm sure I made wrong decisions. I'm not sure how it is with you: if you're starting in a start-up, you are now responsible for everything and need to make the tough decisions. But I didn't learn architecture at university; I didn't go through a five-year apprenticeship to learn how architecture should really be done. I don't know about you, but I did not, and I made mistakes.

Same thing here. This is an example from a very large company that we all know: two letters, British, sells petrol. They had a project where, fifteen years ago, an intern created an online room reservation system for the meeting rooms, to say: I want to see my five to ten meeting rooms and pick a time slot to reserve. Fifteen years ago it worked well for one
location. It became very popular, and the project the intern started is now used across the whole organization worldwide. The developer is obviously no longer there; somebody else took over the code. Now they have the problem that rendering these pages, which show the five to ten meeting rooms in a location, takes between one and two minutes in production. They see high garbage collection time. The first assumption of the engineers who took over the project was: it's probably bad garbage collection tuning. It has to be, right? Because if garbage collection is high, it must always be garbage collection tuning. They also pointed fingers at the database guy, because they said: we looked at the code that renders these five to ten rooms and it's so simple, it has to be the database.

In order to prove that point (and I'm sure you've done this yourself: if you want to get some metrics, some proof, if you want to do your own tracing and monitoring, you build in your own logging), they took a timestamp before loading the data and a timestamp after loading the data, and printed the difference to a log file in production. Their proof point was: on average we see 45 seconds being spent in that load-data step per office, so the problem has to be in the database.

The guy who came to me was the database admin. He said: this is the situation I'm faced with, but my database tools tell me every single SQL statement executes faster than one millisecond. It's the same thing as earlier with tester versus developer, now with developer versus database guy: they explain the same situation from totally different perspectives, using different tools to measure performance.

So again we looked closer into what actually happened. I know this is maybe a little hard to see, but this is what we call a dynaTrace transaction flow. We were looking at what actually happens when somebody creates that report to show me these five
meeting rooms. The request comes in at Apache, goes over to Tomcat, and then goes over to, in their case, a Sybase database. Those of you sitting in the front, even though you've maybe never seen this screen before, may make out some numbers that should make the problem very obvious. Can anybody spot a number here? These are the calls between tiers, and this is the time spent in each tier. Exactly. What do you read there? Exactly: they had 24,889 calls to the database in order to display five or ten rooms. It's a little excessive, just a little.

It's a very similar problem pattern to what I told you before with my Hibernate example. First of all, we showed them all the database statements. This is a classic; I call it the N+1 query problem: the same database statement called over and over again, just with a different ID. Also, does anybody use Sybase? Okay, one guy in the back; ah, long ago. Sybase has this very interesting thing: if you take a database connection out of your connection pool, it always sets the security context on it, a "set client name" with your current user. We saw that this was called 12,444 times. So basically what the code did was take out a database connection, execute a single SQL statement, and put it back, and it did this 12,444 times for a single request to show five rooms. Very efficient. Excessive, to use your word. What's also interesting is the individual SQL statement execution time as seen from the application: this was the proof that every SQL statement really was faster than one millisecond. So the database guy was right.

Now the question is: why did it come to this? What we see here is what we call the method hotspots. Our product and other profiling and tracing tools, whatever tool you use, whether you use dynaTrace,
whether you use New Relic, AppDynamics, or, as I heard, Wily (even though I think Wily is really a first-generation tool), or your profiler: these tools tell you which methods actually take a long time. What we could see here was that most of the time was spent in Hashtable. We found out that the intern who initially developed that system had no clue about databases, but he had learned something in school about hash tables. For every single request he loaded the complete database into a big hash table. Yeah, now we all laugh about this, but think about it: how much of the code that you are now responsible for is your own code? It can happen to you too. It's a big company.

So he loaded everything into a hash table. What we see here is the reverse call stack, so we can see where these methods are called from. Back then this guy created his own entity framework: he had a room object and an abstract entity object, and basically his whole implementation was to first load everything into memory, so that the data source of the entity framework was the hash table, and then everything was disposed again after the request. That's why they saw such high garbage collection. Makes sense? I hope he's still alive, and maybe he learned his lesson; hopefully it's better now. He was an intern back then, he had no clue.

So, lessons learned. First of all, the custom measuring they put in, in case you've ever put in something like what I showed before with the two timestamps: their measuring was actually wrong. Why? Because they had such a high garbage collection impact. If the garbage collector kicks in between two method calls, you're measuring wall-clock time, and they did not account for the garbage collection time. There are tools out there that can tell you how much time is really spent on CPU or in garbage
collection. So that's one thing. They also just measured the overall time, but never had the idea to look at the number of SQL statements being executed. Also, for you: never assume that you know what code is actually doing if you inherited it. And obviously, for the intern: learn SQL, and don't think hash tables are the solution to your database problems. Makes sense? I know you're shaking your head, but it is happening, and this is a big company.

I told you I'm a big fan of metrics. The number of SQL executions: you should know how many SQL statements are really executed in the code you're developing. Also the number of identical SQL statements, and there are two use cases for that. If the same SQL statement is executed multiple times in a single request, you can probably optimize that SQL; that's the N+1 query problem. If you look at a longer time frame and the same statement is called every time, that's a good candidate for caching. I see a lot of scenarios where frameworks store configuration properties in a database, and every single time somebody needs to load the framework or needs something, the framework goes to the database and loads the same value again, even though it never changes. So cache data.

And this is another one I didn't mention explicitly before: connection acquisition time. Do you know what that is? Exactly. A lot of people look at connection pool utilization. If you have a connection pool of 50 and you have a hundred percent utilization, what does that mean? No free connection. Is that a good or a bad thing? Why? Exactly: if there's no next request, if there's no 51st user that needs a connection, it's a perfect system. If nobody needs the 51st connection, then it would be okay. That's why I'm telling you: looking at the connection utilization is a good thing, but
actually looking at the time it takes to get the next connection out of the pool (how long does it take me to acquire, I'm not sure that's the right word, a connection from the pool) is the more critical measure. Sometimes pools are highly optimized, and then 100% utilization is actually perfect; I agree with you. If you only need what's in the pool, then even if it's a hundred percent utilized, as long as nobody else needs the next one, that's fine. So this is an additional metric I tell people to look at: do we actually need to wait in order to get the next connection? For the web developers out there (I know there are not as many web developers here): if you look at your browser diagnostics tools, every browser has a set of physical connections to download resources from a domain. There we speak about the wait time: how long does the browser need to wait for the next physical connection to become available to download, say, the seventh or eighth image from the same domain? That's similar; you're waiting for a resource. So please remember these metrics, and maybe tomorrow when you go home, think about: how can we measure this? Because I'm interested in it; I want to know if I make one SQL statement or 1,000.

Next one. You call it cheating; I call it multi-threading. This came in recently, and I call it "parallelize or paralyze". This was an organization with a monolithic application: everything was running in a single app container. Then the architect had the idea: we need to do SOA, we need to rip everything apart and make it service-oriented so that we can scale any service as we like. They did; it was a financial application. This is the transaction flow. I know it's hard to see, especially from the back, but again: the request comes in here, goes to Apache, goes to the first Java server, goes into a cluster of eight, goes to this database and also to an
external service, Siebel in this case. Now here are the interesting numbers: twelve minutes for a single web request under high load, and that's for logging on to your online bank. Everything used to be in a local container, and when they split everything apart, they defined certain interfaces that now became callable via RMI. Now they had 64 RMI calls from one JVM to the next, and 130 calls to Siebel on 13 different background threads. Because what they actually did was rip everything apart, and the individual services that were now hosted in this cluster had to go to Siebel multiple times, every time they were called, to get some current state information or whatever information was needed. But they could not share the data anymore. The services used to be called in a single JVM where they could share the data; now they were not able to share it anymore, and they had to multiply all these calls.

Now what I like (and I know it's very hard to see, but I'm going to share these slides): this is what we call a PurePath. This is the actual execution of the code that we were able to trace. If you have a distributed application, we show you that the call starts here and then is bounced off into multiple background threads and multiple JVMs. So we show you really everything in a single trace; we call it a PurePath. What I really like about it is that we follow calls across RMI boundaries: this is the call in one thread, making the call and going over to the next JVM, and here we also see the different thread IDs. This is how we could figure out in their case that they were executing every RMI call asynchronously. And because everything was then pushed back to Siebel, and Siebel was totally overloaded with ten times, or however many times, the traffic, everything was basically blocked. So they were blocking threads, they were
consuming much more bandwidth on the network obviously they decided to actually be able to host the different services in two different data centers therefore you have network traffic to consider so what I'm telling you the architect that made that decision to rip everything apart should not just think I can rip everything apart you need to really think about a new architecture SOA doesn't mean ripping things apart on currently defined interfaces also what they had they were using Oracle Coherence and now Coherence was also pounded many more times because these back-end services were all of them going into Coherence and here the top hotspot was actually Object.wait and the way this view works we show you what is the top method and if you click on it here you see who is calling it and it was very interesting to see that the Coherence API basically spent most of the time in waiting because Coherence was overloaded so they just basically by making an architectural decision came up with a totally different problem a much bigger beast yeah there's more on this multi-threading example I'm flipping through this what I want to tell you with this multi-threading it's very easy for you to monitor how many threads you really use I hope you all know how to get access to the JMX metrics of your containers or in .NET your Windows performance counters that .NET is exposing or whether you work on other platforms and runtimes so look at the threads in the different containers also I know it's a little hard to see here with the bright light but if you have a web server any type of web server Apache nginx IIS always take a look at the threads how many are idle how many are busy I've seen very crazy things here so metrics because I told you a lot of metrics the number of threads your code actually uses and also calls to APIs how often do I call into Coherence do I call it once or after an
architectural change I call it a hundred times because I split everything apart makes sense by the way I know it's actually really warm in here at least to me are you also warm yeah you can control how long I go I have three more examples but I don't want to bore you so is that okay keep going okay but you know just tell me next example who knows this guy Joffrey for those of you that don't know him from Game of Thrones he is the evil kid I would say that they put on the throne I think there's good kings and bad kings he was deployed on the throne and I think he was a bad king and he deserved justice oh he got the justice as with bad kings that do bad rulings there can also be a bad deployment so deployments that go bad in this case this is an example of an eBay-like platform I say eBay-like because there is one use case where people like you can put products on there to sell them and people like you that can buy them this was two weeks before Thanksgiving this company ran a large-scale load test with LoadRunner testing both scenarios people like you putting something on there and then looking at the report how much did you sell at the end of the day and people like you that browse and buy something and what they had in the test environment two weeks before Thanksgiving the use case of you at the end of the day figuring out how many items did you sell hard to see it took 42 seconds to get that report and they made 1,600 database calls so similar pattern again now the interesting thing is this team said we have two weeks before Thanksgiving and sorry you are not as important as he is because he makes the real money and you can wait 42 seconds for your report because you only look at it once a day they deployed this into production with very much confidence that everything else will be good in production the same load the same use case for you now the report took five almost six minutes and what was very interesting is there were three times more database
statements executed so who can give me a hint on what the difference is how can this happen how did it happen when they actually executed a similar amount of load here a different version here much slower more database statements the data so they took the data of the year before and they haven't read the papers that much more is happening online they didn't anticipate it that was not the reason but could be as well so one thing was that there was a different data set and they had a data-driven performance problem so they were just executing more database statements but what was very interesting with this one so what we always do when we look at performance if you look at an application there is a lot of stuff happening in there and we always try to put it into layers so the top layer here says web server the light blue means I/O in testing most of the stuff was happening in I/O on the web server which makes sense for a web application because if you look at the website most of the content is static images JavaScript files CSS and that's all I/O that the web server is doing the next thing was Hibernate web services JDBC and their own code base the same thing in production looked like this first thing is Hibernate next thing is Java class loading then custom code and XML processing no time spent anymore on the web server it is not a hotspot and the interesting thing is why this was so different could be explained by again looking at this view that we've seen before the top method here the slowest method was java.lang.Class.getInterfaces that was reflection and it was called by Hibernate's FieldInterceptionHelper so what they did they went on Google searched for FieldInterceptionHelper performance problem and the first thing they found was a JIRA ticket saying well-known performance bug in this version of Hibernate why the difference different version so operations decided on the eve before Thanksgiving that they need to
upgrade the production libraries so you're laughing about this now but this is a company that most of you probably all know it happens because they were scared they said we need to go to the latest versions and therefore we just upgrade so basically the whole load testing that they did was bogus because they were testing a totally different application and the other thing that was interesting the XML stuff it was also class loading but they also switched to a different XML parser because somebody read this parser is much better than this one stupid I mean for us stupid things but these things happen and it's a reminder for everybody that even though this seems like common sense people make these mistakes say it again this should be different yeah I mean but the thing is you should only put stuff into production that you really tested and that includes your own code and everything around it that's the whole thing yeah you know that's not true as you said it's not true in this case they had a regression in Hibernate in the version that they deployed so metrics time spent in API how much time do you spend in Hibernate how much time do you spend in reflection and how many calls do we have to the API these are always metrics if you start doing this and checking these metrics along your pipeline before you check in code when you do your unit tests your functional tests your performance tests and you look at these metrics you can automatically identify these regressions that might not be obvious yet but if you look at these metrics they can tell you that next example there's two more one more after this and then I'm done what does this tell us boardroom right this is a room similar to this a boardroom but the people in here all earn about ten times more than we do all together and the guy in the front here says I think we may need a mobile strategy a couple of weeks before Thanksgiving when he heard in the news that everybody's
making business online these days so we need a mobile app and here the guy in the back who is tweeting and facebooking said sorry did you say something kind of only half hearing it and then basically what often happens is push without a plan so I'm not sure if this happened to you but somebody has a great idea and they have to have it yesterday so please implement it and in this case this is an example that happened to a very large soft drink company they are sponsoring the Super Bowl and their marketing team had the great idea they said when there's parties like this going on around the globe because it's the biggest single sports event in the world I believe then what we want people to do we want them to do things like this they should have parties and drink our soda and then take selfies and they should upload the selfies to our website so when later on they see our commercials everybody goes on our website and then we show them the last 400 selfie uploads in a 20 by 20 grid okay that was their idea to interact with their users it's a cute little great idea if the screen is big like this but if this is your mobile strategy 20 by 20 then this is still a good idea but you probably should rethink it especially how you implement it and I'm showing you one screen of a browser diagnostics tool with two metrics and the two metrics should tell everybody that this is not going to work in order to download that cool mobile app web app mobile website I had to download 20 megabytes on my iPhone 4 and it was 434 individual images because the way it was implemented I had to download every single image of the 20 by 20 as an individual file and not in 20 by 20 pixels but like in whatever it's 400 times 400 and then my poor little thing had to scale it down okay exactly that was my reaction too so obviously a great idea but the implementation was not the best if you do something like this any ideas how would we do this better I may also I
would create what I would do nobody knows the last selfie upload so every five minutes I run a batch job and take random selfie images and create three images a big one for the big screen one for my laptop and one for the small device then I save 399 round trips and a lot of data here and a fun fact here as you know I'm not from here this is my Austrian business phone data roaming in the US is 15 euros per megabyte so in order to download this when I'm abroad it's 300 so right 300 euros that's $350 for a page it doesn't really make sense to me yeah makes sense and because I don't always want to pick on the people in the US but also the Europeans fifa.com so soccer football their mobile website the landing page of the mobile website the largest element on the page was the favicon with 370 kilobytes okay the favicon for those that don't know it's the little 32 by 32 icon on the top left of your browser or if you pin the page to your home screen it's the little logo that you see 370 kilobytes it was the full high-resolution image that somebody probably took from the marketing department or the graphics department and just converted into an icon it's not a good thing and so what I want to tell you if you are a web developer and I'm also talking to testers a lot there's no excuse anymore to not look into these metrics before you check in code this should not happen not to companies like this not to your company either so things you want to check the number of images on a page number of redirects is also a good one some other stores and dates and the size of resources okay where are the web developers okay only we come on don't be shy are you already falling asleep there's more beer later on if you stick it through the whole thing so please use your diagnostics tools and make sure these things don't happen last example another football Super Bowl from last January from this year not sure if you remember but Kia had a
really cool commercial 90 seconds I had no clue about this 90 seconds to put up the commercial nearly 12 million dollars why would somebody like Kia invest that much money what's the business hope a lot of people see them what should be the next thing that they should do they should go to the website so half of America that is half drunk at the Super Bowl parties should go to kia.com and figure out how cool their cars are and the next day when they are sober buy one okay so they're spending a lot of money to create the largest traffic problem in the year and they should be prepared for that right so what they did is what I call not reading a child deployment and I hope for everybody of you that has production applications you have some type of production monitoring where you know how fast is your website is it available when the website is down you want to see some things like this so we from dynaTrace we have a synthetic monitoring solution as well but you can use whatever you want and what you see on this screen is our availability chart for kia.com throughout the Super Bowl okay it was a hundred percent available and then the ad went on the air and it dropped down to zero so that means they spent a lot of money for me having a great story so what happened is that they had such a great video and they put the video on their homepage so that means all of the people had to download the cool video content from a CDN from a different provider but still everybody else that also sponsors the Super Bowl had the same CDNs they had the same third-party components on there that means if everybody's putting Facebook and Twitter on the same page at the biggest event in the year then they have an impact they are impacted the most and if you are depending on these then it also brings down your site that's what happened here large content cool content but too much now I have a different example an alternative GoDaddy what you see here is not
availability but it's the response time of GoDaddy during the Super Bowl it was about between one and a half and two and a half seconds which was pretty good yeah what was interesting is one hour before the Super Bowl the response time improved 4x how does that work so what say it again good idea would be an option what we first thought is probably the website already crashed and they're just delivering like a funny hey come back later that's what we thought what actually happened we went to the website and what they said if we spend millions of dollars in marketing money and we want people to go to GoDaddy they should see what we sell they don't need to know how many Facebook friends like us they don't need to know what's going on on Twitter they don't need to see the high-resolution images they can see some static version of the website that's what they did they said for the main landing pages we create a scaled-down version of the website they ran the whole thing on the same hardware four times faster and were available 100% of the time so if you have a software where you have traffic spikes like this then you need to consider these strategies and one hour after the Super Bowl they flipped back the switch and said full-blown version again because now the major traffic is over of course you can yeah it's like eating more medicine if you are sick no I know you're right this is a more cost-efficient solution here all right it would be an option to scale up or scale out so the interesting thing is the key metric that told me what the difference was the number of domains GoDaddy went down from 26 domains where they downloaded the content to eight so they got rid of 18 third-party domains during that four to six hours they were no longer depending on Facebook Twitter Google everything they got rid of it and also the total size was much much smaller so what have you learned today besides me being from Austria I hope you learned a
couple of metrics and I would really appreciate if you could go home tonight but tomorrow when you go back to the office pick one of these metrics and try to figure out how you can measure it on your local machine so before you check in code you know that you're not executing a hundred SQL statements or that you don't have 500 images on that page then you tell your peers and then you tell your test automation engineers that they should figure out how to automatically capture these metrics in all the tests that you are executing automatically so every time you're executing a test these metrics are automatically checked because what we want to do is we want to automate as much as possible right because doing many things manually all the time is boring and if you have a bad day like maybe tomorrow if you have too many beers manual things are not as good as today but an automation tool never makes a mistake because an automation tool is not drunk and here's my point I think it's one of my last slides that I have who has unit tests that are executing on a build-to-build basis awesome what does the unit test tell you you broke a leg yeah you broke the functionality because typically what the unit test tells you is functional correctness right so if you have build seventeen and two tests and everything is green we're good build eighteen something fails you know you broke something you go back you look at your revision log and say what did I change I'll fix it and hopefully everything will be green and what I tell you now I told you a list of 10 15 20 metrics why not try to figure out a way how to automatically hook up your test automation with tools that can tell you the number of SQL statements being executed the number of exceptions being thrown the time spent on CPU the
number of images on that page all these metrics if we capture them first of all it gives you more visibility in what's really going on if something breaks and you see hey there were 45 exceptions being thrown in Hibernate that I have not seen before because I only looked at my assertions from my xUnit test then this gives you a better hint and if you fix everything based on the best intentions and then the metrics tell you well but now we're executing six times as many SQL statements and we are spending twice as much time on CPU if you push this code forward you know you need to buy more database servers so in the ideal world you fix this right away to come back to functional green and what I call architectural green yeah yeah exactly I agree with it so your point is for unit tests CPU doesn't make sense it's more for a longer running test and you need to figure out a way how to accurately measure real execution time that's one of the examples I brought in the beginning where the developers put in their own measuring code it's impacted by a lot of things yeah so not every metric makes sense for every test yeah yeah exactly we wrote a blog so we use our own product in our own development team and we actually wrote the blog about how they actually do it internally so we have some warm-up tests to make sure that the system is stable and then measure but for most of these metrics number of SQL statements it doesn't matter if it's JITed or not or if it runs on a virtual machine these are hard metrics does it make sense okay and I guess you can figure it out if you have a build pipeline then you can use all of these metrics as an additional quality gate so if you're responsible for your build pipeline do not only use the result of your JUnit tests or whatever tests you use as functional quality gates but look at these
metrics as well and block this build if you know you've just introduced an architectural regression because you're executing five times as many SQL statements if you know this already there's no need to waste time in capacity testing because they will find the same problem this is basically reducing what's called unplanned work or wasteful time it's wasteful because we already know it all right you're still awake at least most of you as I can see I hope at least that something of this was new for you not only repetition I do this on a day-to-day basis so I help people all the time and I know I mean I can look at this data and I know what's going on within a couple of minutes that's also why I offer what I call performance clinics so in case tomorrow or next week you decide to actually look at things like this instead of you going through the pain and figuring out all these metrics on your own I'm offering what I call an online performance clinic every other week so by the way these links I put them all up on Meetup in case you're interested so I do a webinar they are also recorded and put on my YouTube channel so you can look at things like how I do diagnostics of multi-threaded applications and stuff like that how I hook up JUnit with diagnostic tools and also in case you decide because the tool that I present this is my only marketing thing dynaTrace is a production monitoring performance tool so we are monitoring applications in production not only in production but also in test and development so all the use cases the screenshots were from dynaTrace and the cool thing for you is you can sign up for a free trial you get 30 days on a distributed app and after 30 days the tool stays free for you on your local machine in case you want to test it out so in case you said why do I waste time on an evaluation if after 30 days nothing works anymore after 30
days it keeps working on your local machine that means you can analyze anything on your local machine as long as you want and if you have no idea what the data tells you I have a Share Your PurePath program where you can actually send me the data and I'll send you back a PowerPoint similar to what you saw today with bullet points saying this is a problem this and this is a problem so there's a benefit for you and a benefit for me because I have new blog material for next year and maybe you invite me back again next year and I say these are the top performance problems from San Francisco then we can do a competition yeah all right that's it and I hope it was useful if there are any questions let me know or feel free to escape but grab a beer that's still left somewhere any questions what do you do actually so yeah good question so for your case so he is asking about garbage collection what's the right approach is it about tuning or is it about being smarter with object creation we actually have an online performance book with one of my colleagues he's an expert in Java memory tuning it's a free book it's on book.dynatrace.com it covers all of this also with use cases and on our blog we also have a lot of examples exactly yeah if your object churn rate is really high if you're constantly creating and destroying objects exactly yeah and I'm sticking around for a little longer so those of you that really want to leave feel free in case you don't want to remember the URL wait until I post it I also have some cards feel free to take them there's some information on there and I'm really happy to find out new problem patterns all right thank you
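A minimal sketch of the quality gate idea from the talk, not code shown by the speaker: compare the metrics captured for the current build against the previous build and block the build when one of them grows too much. The class name MetricsGate, the metric names, and the 1.5x threshold are hypothetical choices for illustration; how the numbers get captured (a tracing tool, a JDBC wrapper, test listeners) is left open. The thread count helper uses the standard JMX ThreadMXBean, which is one way to read the thread metric mentioned earlier.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: an "architectural" quality gate on top of functional test results.
final class MetricsGate {

    // Returns one message per metric whose current value exceeds baseline * maxFactor.
    static List<String> findRegressions(Map<String, Long> baseline,
                                        Map<String, Long> current,
                                        double maxFactor) {
        List<String> regressions = new ArrayList<>();
        for (Map.Entry<String, Long> entry : baseline.entrySet()) {
            Long now = current.get(entry.getKey());
            if (now == null) {
                continue; // metric was not captured for the current build
            }
            if (now > entry.getValue() * maxFactor) {
                regressions.add(entry.getKey() + ": " + entry.getValue() + " -> " + now);
            }
        }
        return regressions;
    }

    // Live thread count of this JVM, read through the standard JMX thread bean.
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        // Illustrative numbers only, loosely following the talk's example
        // (six times as many SQL statements, 45 new exceptions).
        Map<String, Long> build17 = new LinkedHashMap<>();
        build17.put("sqlStatements", 12L);
        build17.put("exceptions", 0L);

        Map<String, Long> build18 = new LinkedHashMap<>();
        build18.put("sqlStatements", 72L);
        build18.put("exceptions", 45L);

        List<String> regressions = findRegressions(build17, build18, 1.5);
        System.out.println("live threads: " + liveThreadCount());
        if (!regressions.isEmpty()) {
            System.out.println("architectural regression, block the build: " + regressions);
        }
    }
}
```

In a build pipeline the same check could run after the functional tests, so a build that is functionally green but executes many more SQL statements than the previous one still gets stopped before capacity testing.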
Info
Channel: InfoQ
Views: 58,643
Keywords: Andreas Grabner, SF Java, Java, common mistakes, biggest mistakes, examples, code, applications, tutorial, gravity 4, Programming Language (Software Genre), Java (Programming Language), Performance
Id: IBkxiWmjM-g
Length: 62min 48sec (3768 seconds)
Published: Tue Jun 16 2015