Avoid Using Regular Expressions!

Captions
So in 2016 and 2019, Stack Overflow and Cloudflare each had a major outage, with roughly half an hour of downtime, and all of that was caused by a single bad regular expression. Yes, you heard me right. Regular expressions can be very harmful: they can cause a lot of issues and, most importantly, they can take your servers down for half an hour, even at big companies like Cloudflare or Stack Overflow. So in this video I'm going to explain how regular expressions can be that bad and what regular expression denial of service is, and, most importantly, we're going to see how a server that uses a bad regular expression can be taken down with a simple malicious HTTP request.

Because of how dangerous regular expressions can be, many companies have faced huge outages and downtime just because developers put together a bad regular expression. Start with Stack Overflow: on July 20th, 2016, the whole of Stack Overflow and Stack Exchange was down for 34 minutes, and all of that was because of one teeny tiny regular expression that made all the web servers go down at the same time, which of course meant a huge loss for the company and a lot of issues for a lot of developers. 34 minutes is absolutely outrageous.

Or take 2019: Cloudflare had the same issue with another very small regular expression that took them down for almost 27 minutes. This brought down the whole of Cloudflare's HTTP and HTTPS serving: the core proxy, the CDN and the WAF functionality, which is the firewall that Cloudflare's services rely on. All of this was because of a single regular expression that backtracked enormously and exhausted the CPU used for HTTP and HTTPS serving, and this caused the whole thing to go down. And this confirms that regular
expressions can be written the wrong way and bring whole big companies, services and websites down in literally seconds.

So let's take this really simple, very innocent-looking regular expression and see how it is in fact very bad and can cause the catastrophic backtracking issue. For our regex here, we have this simple group which matches a+, so it matches one or more "a" characters and puts them inside a group. Then it repeats the same thing for the group: there is another plus on the group itself, so once it matches the A's inside the group, it can go out of the group and match the group as many times as it needs. So an input like this is perfectly valid: if you just type A's, as many as you want, that's fine. It runs in about 0.1 milliseconds and seven steps. Pretty fast, pretty performant, all good.

The bad thing is what happens if you introduce an input that has A's at the start of the string and another character at the end, let's say x. When I add the x here, you see the number of steps goes up exponentially: it took 24,576 steps to finish running this regular expression, which took almost a second. That's crazy to think about, because this is a backtracking issue and I only have a few A's. Let's add a couple more A's. I add one here: 49,000. Add another one: 98,000. It's roughly doubling with every extra "a", so it's growing exponentially. Let's add another "a", and there you go: now it has hit the catastrophic backtracking error. This is literally the kind of bug that caused the downtime for Cloudflare, Stack Overflow and many other packages, services and websites.

Now, to better understand backtracking,
let's go ahead and use the debugger. If I click on this, it brings up a debugger that shows how the regular expression runs in slow-motion mode. I click play here and it starts and shows me exactly how it works. It tries to match the whole string, and it backtracks once, and it just keeps backtracking over and over: it backtracks three times now, then it's going to backtrack four times, and it keeps going. That's because of how the pattern is structured here: we have a plus inside the parentheses, which matches within the group, and of course one outside, so I'm telling it that it can match this a+ group one or more times. It's like a nested loop, like when you write a nested loop in JavaScript or Java, and it's going to cause a lot of issues. This will run for more than a hundred thousand steps. And if you run this on a server, on a small server, it's going to cause a lot of issues: some temporary downtime, other users not being able to send their requests and get a response in time, and of course some timeouts.

Now, backtracking isn't something new. This is how most regex engines execute a regular expression, because this is how they are able to match a string in a variety of ways: if you put a plus on something, or a question mark so you can match zero or one, yada yada yada, this is basically how it works. But we have to be super careful with the way we write our regular expressions. We have to know how to write regular expressions that run well and, most importantly, we have to test our regular expressions before we deploy them.

Now, when you have a bad regular expression running on your server, like Cloudflare did, and you have so many concurrent users, like millions of
concurrent users or concurrent HTTP requests at the same time, and that particular regular expression takes forever to complete, you're going to have a ReDoS. ReDoS simply stands for regular expression denial of service. It's the same idea as DoS (denial of service) or DDoS (distributed denial of service), but this is a type of denial of service that is caused by regular expressions, or regexes. This is a cyber security attack that a lot of hackers tend to use nowadays to cause denial of service on websites that are using bad or vulnerable regular expressions.

Now, to better understand how ReDoS actually works, I have a simple Express server here, running on Node.js of course, and I have a simple controller for adding a new blog post to the database. This controller accepts an HTTP POST request with a body of title and content. It's pretty simple. But let's try the malicious input we explained a couple of minutes ago: you have a couple of A's repeated, and then at the end you have something else. And for the regex, you have a+ in a group and another plus on the group. So let me save here. I'm already running the server, so let's go back. Remember, here we only have 20 repeated A's. This is not going to be very harmful, because we don't have a lot of A's, so it's not going to backtrack a lot or take a lot of time, since the CPU is of course very powerful. Let's go and try it out. If you go back here, it runs and returns in 74 milliseconds. That's pretty good. Now let's go back and add a decent number of zeros in
here, to the count of repeated A's. Go back, send another request, and boom: now the server has sort of crashed. As you can see, it's literally hanging. It's sending the request and never getting a response back, because that regular expression is still running; it hasn't finished yet. And because Node.js runs your JavaScript on a single thread, if you have that regular expression running in one of your controllers, as we do here, it's occupying the whole thread that Node.js has, which means the whole server is occupied and has become completely unresponsive. Now if I go to another route, for example port 9000, the homepage here, say I'm a Cloudflare user trying to hit the homepage, I click send and the homepage is hanging too. It's not only that particular route: the homepage is hanging as well, because the whole server is hanging, and that's exactly what happened to those companies.

Now, you're probably thinking that this regular expression, the a+ one, is pretty naive and simple, and really, where would somebody use something like that example? You're probably saying this could never happen to me, right? No, you're wrong. Take this regular expression below, which seems a little more complicated but is actually doing something very simple: all it does is trim the whitespace from a post. This is Stack Overflow's bug; this is what they were using when they had the outage in 2016. It was simply detecting whether a whitespace character, or a whitespace-like Unicode character, was present at the beginning or at the end, and if it was, it trimmed that whitespace. Instead of using other functions to trim whitespace, it was using this bad regular expression.
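The pattern itself is not legible from the captions, so it is not reproduced here. As a minimal sketch of the alternative being alluded to (trimming without a regex), the helper below strips leading and trailing whitespace with two index scans. The names isTrimmable and trimLinear, and the choice to also strip U+200C (zero-width non-joiner), are assumptions made for illustration; this is not Stack Overflow's actual code.

```javascript
// Hypothetical helpers for illustration only.

// True for a single whitespace character, or for U+200C, which
// String.prototype.trim() does not treat as whitespace.
function isTrimmable(ch) {
  return ch === '\u200c' || ch.trim() === '';
}

// Walks inward from both ends exactly once, so the work is linear in the
// input length and there is nothing for an engine to backtrack over.
function trimLinear(text) {
  let start = 0;
  let end = text.length;
  while (start < end && isTrimmable(text[start])) start++;
  while (end > start && isTrimmable(text[end - 1])) end--;
  return text.slice(start, end);
}

console.log(JSON.stringify(trimLinear('  hello world \u200c')));
```

A long run of whitespace in the middle of the input is never examined more than once here, whereas a backtracking engine may retry an anchored trailing-whitespace pattern from every position in that run.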
And of course, this caused it to go down for 34 minutes. So there is always a real-world scenario for it.

Now, for more of a real-world example, here we have http-cache-semantics, which is very popular, with more than 18 million weekly downloads on npm. It's now on version 4.1.1, and the prior version, 4.1.0, had a ReDoS: it literally had the regular expression denial of service bug, so any server using this package could be brought down by a header that hits the bad regular expression. It's a simple package: all it does is parse the Cache-Control header on each request and response and tell you whether the response is cacheable or not, yada yada yada.

I have a very solid example here of how this works. Let me just comment this one out because we don't need it. The http-cache-semantics package uses this regular expression to split the header. The header looks something like this: Cache-Control, then a colon, then the value, and each Cache-Control directive is separated from the next one by a comma. So it uses this regular expression to say: if there's any whitespace around the comma between them, I can use this regex to split the value and get the parts. This seems pretty nice, and it works absolutely fine if you get a good, small Cache-Control header. But if you hit the edge cases and get a much longer Cache-Control value with whitespace in between, so instead of having a comma here you have a whitespace character repeated something like 7 million times, well, now you've got a problem. Now if
I save the changes here, the server reloads and everything's good. Let's go back to the POST request: here I have the route that adds a new blog post, which is of course the vulnerable route that has the bad regex. Let's run this one: send a request, and it hangs, absolutely hangs. And if I try to access another route like the homepage, it hangs as well. Now the server is completely stuck: it's fully occupied and cannot accept any other requests.

Now you're probably wondering: you have this particular script inside the server, and who's the idiot who would put this vulnerable script inside their server? Let me tell you something: this is just for demo purposes, a proof of concept. But if you have a server or an application that uses http-cache-semantics, the vulnerable version before the fix, an attacker can bring the server down and make it unresponsive and time out with a simple request. For example, we have this code here, and all it does is use CachePolicy, which is imported from http-cache-semantics, which means our server is using http-cache-semantics. Of course I'm installing the vulnerable version, 4.1.0; 4.1.1 is where the fix is. It takes the request and the response, and it takes the headers from both. Now, as an attacker, I can exploit this: I can send a bad Cache-Control header with a lot of spaces in it and bring this server down. So what I did here is I went ahead and created a hacker script,
a JavaScript file that uses a package called autocannon, which lets you run thousands of concurrent connections or HTTP requests, as many as you want, against a single server. For example, I'm going to run a thousand connections against the post route with a bad Cache-Control header. For the bad Cache-Control header, I'm using repeat to produce a thousand whitespace characters and putting them between max-age and must-revalidate, and this is going to tie up the server. I can't put a million characters in a single header, because HTTP servers limit how large request headers can be. What I can do is give each header a thousand whitespace characters and run a thousand connections, so it's like a thousand concurrent users all sending the same bad payload to the server, which is going to make the server unresponsive. So let me save the script and go back here. I'm using autocannon, as I said, so I'm going to run node against the hacker script,
the JavaScript file. Before actually running it, let's go back and send a request to see if the server works fine. It does: it returns success, and it returns it in about 6 milliseconds, which is pretty fast. The homepage works fine as well. Now I run the script, which is going to take quite some time, and I try to access the server again. There you go, the server is struggling: I send a request and it's literally unresponsive; it's going to time out. If we try the homepage again, the same thing happens. That's crazy. And here it came back: it literally came back after 10 seconds, and the same for the homepage. I think that's just because the hacker script finished or something, but I can always add more characters or more concurrent connections, like 10,000 connections, and this will make sure the server is always occupied, which means a denial of service.

So that's how bad regexes can be: used the wrong way, they can absolutely bring your servers down. So the best advice I can give you guys is to always think twice about using regular expressions. As appealing as they look, and as good as the functionality and the developer experience they provide are, they can be just as harmful: they could be exploited by a third-party attacker, or it could just be a bug that brings the whole system down. There are a lot of dangerous possibilities.

In my opinion, the best way to tackle the catastrophic backtracking issue is to always use a regex testing tool like regex101.com, where you can put in your regular expression, test it, see whether it has any backtracking issues, and try it against different possible inputs to see whether it holds up. And always, always avoid putting nested stuff in here, like
the plus: it's always a risky thing, and especially if you do it in a nested way, with the group as we have here, that's pretty bad. There's also a really good article on how you can tackle the catastrophic backtracking issue and how it works, explained briefly by RegexBuddy. I'm going to leave a link in the description below, so you can read it and learn how to fix your regular expressions and make sure they're good. So anyway guys, thanks for watching, hope you enjoyed it, and catch you all, hopefully, in the next ones.
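The advice about testing a pattern before deploying it can also be sketched in code. The snippet below is a rough harness, assuming Node.js and assuming the demo pattern is ^(a+)+$ (the captions only describe it as a+ in a group with another plus on the group). timeRegex is a hypothetical helper, and the input sizes are kept small on purpose so the script finishes quickly.

```javascript
// Times one regex against inputs of growing length. Each input is a run of
// "a" followed by "x", the non-matching shape used in the demo.
function timeRegex(regex, sizes) {
  return sizes.map((n) => {
    const input = 'a'.repeat(n) + 'x';
    const start = process.hrtime.bigint();
    const matched = regex.test(input);
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    return { n, matched, elapsedMs };
  });
}

const sizes = [14, 16, 18, 20, 22];

// Assumed form of the demo pattern: a quantified group inside a quantifier.
const nested = timeRegex(/^(a+)+$/, sizes);

// Accepts exactly the same strings, but without the nested quantifier.
const flat = timeRegex(/^a+$/, sizes);

for (let i = 0; i < sizes.length; i++) {
  console.log(
    `n=${sizes[i]}  nested: ${nested[i].elapsedMs.toFixed(2)} ms  ` +
      `flat: ${flat[i].elapsedMs.toFixed(2)} ms`
  );
}
```

On a backtracking engine such as the one Node.js uses by default, the nested timings should climb steeply as n grows while the flat ones stay close to zero. Exact numbers depend on the machine, so none are shown here.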
Info
Channel: CoderOne
Views: 4,512
Keywords: regex, cloudflare, stackoverflow, cloudflare outage, cloudflare redos, redos, regex dos, ddos, regex denial of service, regex DOS, regex backtracking, regex vulnerability, regular expression
Id: 7fXu_SToVrw
Length: 16min 18sec (978 seconds)
Published: Mon Oct 09 2023