Format String Exploit Troubleshooting Over Twitter - bin 0x11 b

Captions
Sometimes people write to me because they are stuck in a CTF challenge and they hope I can help them. In this case it’s a person who is apparently trying to exploit a format string vulnerability. There is always a bit of a conflict for me because I want to help people, but it takes A LOT OF TIME to do so. You will see how much time it costs, even with a simple challenge like this one. And to be honest, I would rather invest this time in making videos for thousands of people than in helping one individual person. But when this person wrote to me, I just happened to be standing on the platform, waiting for my train, and had nothing better to do. And actually, in the end, helping this person exploit this vulnerability and then turning that into a video that showcases their mistakes and shares my thought process makes it a very valuable learning resource for a wider audience.

So let’s head into the DMs. BTW, the voice actor helping me for this episode is none other than John Hammond!

“I'm trying a ctf and I need your help it's about print string vulnerables and stack bleeding. I have managed to leak everything but the flag. Here is the server and port to connect with netcat”.

They also share a screenshot. They called it a “print string vulnerability”, but given the description and the screenshot, I think it’s more commonly known as a format string vulnerability. If you don’t know what a format string vulnerability is, please watch my old binary exploitation episode about it. Anyway. As I said, I was outside on my phone, so I couldn’t check the challenge with netcat myself. So my first response was, “what’s the question?”, because I’m already annoyed that this person didn’t share important details. Please watch my videos raging about that BEFORE you send me questions. Because this text and screenshot contain SO LITTLE INFORMATION that I have to guess a lot.
And my guesses could be wrong, which would help nobody and would be very frustrating for all sides. Though this person got lucky: it is such a simple challenge that I was pretty confident that I knew what was going on.

But just to make sure, and to teach people to ask proper questions, I’d rather poke back. So I ask “what’s the question?”. I mean, there was no question. The person just stated that they leaked everything but the flag.

I know it’s a format string vulnerability. Apparently we have to leak the stack to get the flag, and in between the task description, I see leaked environment variables. HOME, PATH, PWD and some more. This makes sense, because environment variables are stored on the stack, at the beginning of the stack.

So apparently this person was already successfully exploiting the format string vulnerability and leaked stack variables. But they didn’t leak the flag. The question for me now is: how was this output generated? There are two possibilities. This could be a script decoding the output and turning it into printable text, or the person used %s in the format string exploit to leak strings. That’s why I asked for more details. I don’t want to guess what you did. Show me what you did.

And so they sent me their script.

“Here is my python code. What am I doing wrong? I connect to the challenge server. Then iterate over a few offsets, and send a format string input with %s to it.”

Based on this code and the screenshot I was actually 100% sure about what was going on.

So let me explain what I was thinking. Let’s start with a recap of how the format modifier %s works, compared to for example %d or %x. When you write code like printf("%x", value); then the compiled assembly code will place the value on the stack as a literal value. There are some 32-bit and 64-bit differences: the 64-bit calling convention passes the first few parameters via registers. But that doesn’t matter here, later values are taken from the stack.
Thus %x or %d literally prints the value found on the stack (or in a register). But %s is different. printf("%s", string); expects a POINTER to an array of characters. And so printf takes the value on the stack as an address, goes to that address, and prints the characters there.

Which totally makes sense seeing this screenshot, because there are pointers to those environment variables on the stack. The reason why is not surprising. The third parameter of main() in a C program is a pointer to an array of environment variables. So the code that sets up your program and calls main will place pointers to those strings on the stack.

So going back to the script: if you now use %s to print strings at different stack offsets (which is done here with the dollar sign $ notation), eventually you might reach such an environment pointer, and printf follows it and prints the string. Which explains the evidence I am seeing. But keep in mind, I’m only this confident in my guesses because this is a very simple challenge that I have done many times. With a slightly more complex challenge, I would have no clue what is going on.

Anyway. If you have a bit of experience with format string exploits, you might even have identified their mistake already. Instead of using %s to leak strings, you should use something like %x to literally leak the values on the stack. You will leak them as hex values, and so you have to decode them first, but you leak the actual values on the stack.

Anyway. Now that I know the most likely solution, I could just tell them, but I don’t like to do that. Especially when I don’t know if this is some kind of active CTF. I’m not your CTF cheat code. I want people to learn, so I like to have a back and forth conversation. And that’s why helping people costs time. So I ask, where is the flag? Because even though my solution idea might work, maybe the task of the challenge is completely different.
Maybe you need to get remote code execution. Again, not enough information given.

But they respond with: “FLAG curly braces”.

To which I respond: “Not what is the flag. WHERE is the flag? What’s the goal? Do you have the binary or is it blind exploitation? Is the flag in memory somewhere? Or do you need remote code execution?”

“Here is the post. It’s the only info i’ve been given”
Stack bleeding - this program will echo anything you write. Can you exploit this?

“Mmhh okay. “stacking bleeding” sounds like the flag should be on the stack”.

Now we can actually go into the technical details. I respond that “I have an idea to try”, because now I think I know their problem, but first I have another question.

“What exactly does %s do?”. I ask this because I want them to carefully think about what they do, so they can figure out their own mistake. Basically I am looking for the answer I just explained: that %s takes a pointer from the stack and prints the string at the target location. It doesn’t print the actual value on the stack.

But I get the response “just halts it if I remember correctly. Doesn’t echo”.

“I mean. Your exploit uses %s. So what do you try to do with that? What does that format string do?”

“So if I go up the stack using smth like %2$s if it has a string it echos. it if not seems to halt / doesn't echo”

“Okay. Can you describe to me how %2$s prints a string? I know the question might be a bit confusing. I try to get to a specific detail and difference between %s and for example %x. Why does %s not work sometime? Why does it break? And why does %2$s not break”.

You can see me here hinting again at what I want. I directly ask what the difference is. I want them to tell me that %s follows a pointer on the stack, so if a value on the stack is not a valid address, then printf will fail. It will break. But %x would not fail, because it literally prints that value.

But it didn’t work yet.
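
The difference between %x and %s that these questions are fishing for can be modeled in a few lines of Python. This is only a toy sketch, not real printf: the addresses, the fake memory contents and the helper names `fmt_x` and `fmt_s` are made up for illustration, and the "crash" is simulated with an exception.

```python
# Toy model: NOT real printf, just the lookup logic of %N$x versus %N$s.
# All addresses and strings below are invented for illustration.

# Fake process memory: address -> NUL-terminated bytes stored there.
MEMORY = {
    0xFFFFDF10: b"HOME=/home/user\x00",
}

# Fake stack as printf would see it: one word per positional argument.
STACK = [
    0x41414141,  # raw data ("AAAA"), not a mapped address in this model
    0xFFFFDF10,  # pointer to an environment string
]


def fmt_x(stack, n):
    """%n$x: print the n-th word itself, as hex."""
    return format(stack[n - 1], "x")


def fmt_s(stack, memory, n):
    """%n$s: treat the n-th word as an address and print the string there."""
    address = stack[n - 1]
    if address not in memory:
        # A real process would typically segfault at this point.
        raise MemoryError(f"invalid read at {address:#x}")
    return memory[address].split(b"\x00")[0].decode()


print(fmt_x(STACK, 1))          # 41414141
print(fmt_x(STACK, 2))          # ffffdf10
print(fmt_s(STACK, MEMORY, 2))  # HOME=/home/user
try:
    fmt_s(STACK, MEMORY, 1)
except MemoryError as err:
    print("crash:", err)        # crash: invalid read at 0x41414141
```

In this model %x never fails, because it only formats the word it finds, while %s only works when the word happens to be a valid pointer.
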
“So basically %i$s tells it where on the stack to try to print a string from , the author told me bleed the "stack" so I think it's a play on words. Why %s breaks it sometime I have no idea”

“Did you watch my videos about format strings?”

I know it’s a bit unfair to say that, nobody watches all my videos, but those are all topics that I have covered. At least I think so. And so sometimes it makes me a bit sad when I get questions that would be answered if people remembered EVERYTHING I said in videos. I know it’s a bit unfair. But… urgh… Anyway.

Next, I try to ask the same thing in a slightly different way, because I’m trying to figure out how to ask so that their brain responds to it properly and realizes their mistake.

“Does %s print strings from the stack?”

This is a bit of a “trick question”, because the answer is kinda yes or no, depending on how you interpret it. But I was thinking about it this way: %s follows pointers on the stack and prints the strings they point to. So those strings could be ANYWHERE in memory. They don’t have to be on the stack. If you realize this, you would figure out your mistake.

“I did but there's a few”. So I guess they did watch my videos. Also, I just checked my scripts, and in episode 0x1e I briefly mentioned it:

“Also because we already know it will be about a format string vulnerability, we can try to inject some characters such as %x, but again, nothing happens. We could also try %s, because if you remember, it will take values on the stack as the address location of a string, thus if values on the stack do not point into valid memory, the program should crash, which would be another indication of a format string vulnerability. But nope. Also doesn’t do anything.”

But I understand, there is so much information in so many videos, you might not remember that. And I guess I wasn’t super clear either. So. Alright.
BUT there are also TONS of other format string exploit tutorials online too. And literally the first result when I search for “format string exploit what does %s do” writes this.

“The printf in the second line will interpret the %s%s%s%s%s%s in the input string as a reference to string pointers, so it will try to interpret every %s as a pointer to a string, starting from the location of the buffer (probably on the Stack). At some point, it will get to an invalid address, and attempting to access it will cause the program to crash.”

You can see, you can help yourself a lot if you google around. And so I was hoping this person would explore my questions further, even if it means they need to research more. Anyway.

“I know it prints the string at %i$s if there's one there it seems”

And this screenshot shows that %s seems to crash the process. Which makes sense. Probably the first address on the stack (or in the first register) is an invalid address.

Okay. Clearly the person doesn’t catch what I am trying to get at. So let me try to ask differently again, and this time start even earlier, with the basics.

A: “Ok let me ask differently. What are strings? What is a string in C?”

B: “A char array”

A: “Cool okay! So let’s say you have the char array “AAAABBBBCCCCDDDD” on the stack. And you print 1) %s 2) %2$s. What do you think it will print? Just try to have an educated guess”

Again this is a bit of a tricky question, because the answer for both is: it will crash. But I deliberately ask it this way, because I figured they would say the first one prints maybe AAAA and the second one BBBB. Which would show me they have a wrong understanding of %s.

But their answer surprised me:

B: “1. Would do nothing 2. Would print that string because 2nd offset is it's address?”

A: “Aha! Okay!!!! I mean, that was COMPLETELY wrong! But you said something important! 1 and 2 would break. BECAUSE %s looks for an address.
%s takes the address and prints the string at that location. %s would try to interpret “AAAA” as an address. And %2$s would interpret “BBBB” as an address (on 32bit at least)”

B: “Interesting”

A: “So. How would you print the actual value on the stack? And not interpret that value as an address to a char array”

B: “So what do we can do if it thinks they are addresses if they are strings I can get the address %x”

Oh, we are getting there slowly! I think they almost got it. Now I wanted to check their understanding with another question.

A: “Let’s do the same example. AAAABBBBCCCCDDDD is on the stack. What would it print: 1) %x 2) %2$x?”

B: “1. Prints the address 2. Prints that addresses contents?”

Ah crap. That is completely wrong. “I think you misunderstand what 2$ does”

B: “I thought that meant where on the stack”

A: “YES! Exactly! So why do you say that 2) prints the address content? Why does 1) and 2) do something different? I mean you just said it says WHERE on the stack it prints the value. So it sounds to me like 1) and 2) do the same. Except they differ in WHERE on the stack. They don’t differ in WHAT they will print. Print a value.”

B: “Because 1 is just the first thing on the stack printf gets and 2 is where”

A: “No that’s wrong. And that’s not what you said yourself”

But then I stare at it again and I realize that there is a miscommunication happening. “Ahh no yes it’s correct what you say. I see the confusion about the word ‘where’.” They thought I was focusing on: `here it’s only %x`, and `here is the dollar offset notation`. So they thought I was asking about the meaning of the dollar. But no, I just wanted to know what the expected printf output under these conditions is.

So I clarify. “%x prints the first value on the stack. And $ decides where. So %2$x is the SECOND value on the stack. Maybe it’s the third. I don’t know how to count. But that’s not important right now.”

B: “Yes but it's %x hex and we need a string”.
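
The AAAABBBBCCCCDDDD thought experiment can be checked with a short Python sketch. It assumes a 32-bit little-endian target and, as a simplification, that the string starts exactly at the first positional argument (the transcript itself notes the real offset may be different). The helpers `leak_x` and `hex_to_text` are hypothetical names used only here.

```python
import struct

stack_bytes = b"AAAABBBBCCCCDDDD"

# View the 16 bytes as four little-endian 32-bit words, the way a 32-bit
# x86 printf would pick them up one argument at a time.
words = struct.unpack("<4I", stack_bytes)


def leak_x(n):
    """What %n$x would print if `words` were the positional arguments."""
    return format(words[n - 1], "x")


def hex_to_text(hex_word):
    """Turn leaked hex digits back into the characters they encode."""
    return bytes.fromhex(hex_word).decode("ascii")


print(leak_x(1))               # 41414141
print(leak_x(2))               # 42424242
print(hex_to_text(leak_x(1)))  # AAAA
print(hex_to_text(leak_x(2)))  # BBBB
```

With %s, the same two words would instead be used as the addresses 0x41414141 and 0x42424242, which is why both %s and %2$s are expected to crash in this example. Because each word here consists of four identical characters, the byte order inside a word is not visible yet.
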
Uhmmmm…. mh! I think they now understood the problem. But they still have some confusion left. They think their goal is to leak the flag, the flag is a string, so it must be %s. Because %x would give hex values. So now I try to show them that it doesn’t matter how it’s represented.

A: “What is 0x41414141 as a string?”

B: “AAAA”

A: “So……? If you can easily convert hex to string? Why do you care that it’s hex and not a string?”

B: “It should tell us the string in hex”

A: “Why? Why do you care - if PRINTF converts the bytes to readable text for you, or if YOU take the hex and convert it yourself? It’s all about leaking values from memory. And that seems the goal here”

B: “Because %x is hex so we convert to string from hex”

At this moment I thought, why are you repeating the same thing? Didn’t we just realize that it doesn’t matter if you leak hex values or strings? So I got a bit grumpy.

A: “Why do you care??? Look. When “AAAABBBBCCCC” is a STRING on the stack. Then you can’t leak it with %s. I explained why. But you can use %x!!!”

As you can see, I basically just stated the solution now.

“The output will be in hex. But if you do that, the leaked hex bytes are still part of that string? So why do you care that the leak is in hex. The values you leaked are from a string. Now you just have to convert them back”

B: “I don't care if it's hex, you said so above, and I was replying to that and it went down from there”

Urgh, oops, okay. Miscommunication again. I thought they still didn’t want to use %x because they wanted a string. But they were just stating a fact. My bad. That’s the issue with messages, sometimes context is lost. And so I got a bit impatient. Whatever. It’s forgotten. Let’s move on. “Oki. So what’s the problem now?”

B: “I have to test it”

A: “I’m waiting”

5 minutes later I get this screenshot. “Kinda seems like just addresses”. I think around this time I was also back at home. So I didn’t immediately respond.
15 minutes later they added:

B: “ye so far as i keep going up I'm not seeing anything that looks like text when i put it into hex editor”

Mh. It’s sometimes not easy to recognize a flag in hex output. So I thought I would suggest that they convert the expected solution string “FLAG” to hex, because then they have the hex values they need to look out for, and might be able to quickly identify it. Thus I ask them:

A: “Can you convert FLAG to hex?”

But in the meantime, I was also back at the PC, and I could now check those bytes myself. Of course I have to type them and can’t copy them, because it’s a screenshot. And I immediately saw some bytes that looked like text. Check out my “ey! Look for patterns” video if you want to know how I knew that. Anyway. Those first bytes translate to “ ehTssap”. And that is printable text! And maybe it could be the string “The pass”. The byte order, so the endianness, is f’ed up, and when we fix the 4-byte endianness, you can see it too!

By the way, I’m using this CyberChef tool for this. In an exploit you would of course write that stuff in python code, but it’s great for playing around with data like here.

Mh… but it looks like FLAG does not appear in it. “The password ...”. So maybe that was their issue. Maybe they saw this, but not the flag. And at this point I also thought maybe it’s not the whole leak. Sometimes, when a format string vulnerability appears in combination with a buffer overflow, you might leak your own format string, because your input overwrites the stack, rather than leaking the original bytes you want. And I noticed that those bytes at the end here are basically “%x”. So I thought this is what might have happened here. Maybe it’s not the complete leak yet. The solution to that is to not have a long %x string like here, but to have short format strings, and use the dollar offset to leak individual values. And so I said:

A: “Ahh I see your problem.
What do you get when you convert those hex values? What kind of string snippets do you see?”

At this moment I wanted them to recognize that they did leak some actual text, but also that there is this issue with the buffer.

But look at the screenshot they sent: “went from %1$x - %100$x and didnt see anything hex that looks like a string”. But that output makes no sense. You saw me just converting some values. BUT the difference between this and that is that they said they ran the script again to iterate through offsets. So these bytes are not the ones from here, but from them running their script. So OF COURSE I wanted to see that actual script output.

A: “Can you show me that output? The screenshot only shows %x”

B: “sure”.

And now this screenshot shows all offsets typed by hand in one input. Goddammit. That’s not what I meant. If there really is that issue with the buffer overflow destroying the stuff on the stack, then this doesn’t help. We need the output from the script that does it in a loop with short format strings. So I respond:

A: “a few mistakes you make there. Why don’t you use the program you showed me? Change the s to a x?”

And now they deliver the output of that script.

B: “ok, the issue with it seems to skip numbers. beside the hex number is the i number in the for loop”

And that screenshot is indeed weird. Why are there only a few outputs? I’m getting confused too. So now I need more info.

A: “can you show me more? maybe copy it to pastebin or secret github gist or something”

I get this pastebin with the output. And indeed some outputs are missing, which is really weird. But some hex snippets still look like printable strings. So I ask:

A: “and have you looked at some of the output there? any printable string snippets?”

To which they respond with this screenshot, and YEAH, this looks kinda good. At least the initial “The” in wrong endian is there.
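
Fixing the 4-byte endianness of %x output can be sketched in Python as follows. The two example words are reconstructed from the “ ehTssap” decoding mentioned in the transcript, not copied from the screenshots, and the sketch assumes each %x leak is one 32-bit little-endian word. The function names `words_to_text` and `as_leaked_hex` are hypothetical.

```python
def words_to_text(hex_words):
    """Decode a list of %x outputs, restoring memory byte order per word."""
    out = bytearray()
    for word in hex_words:
        # int() also copes with %x having dropped leading zeros.
        value = int(word, 16)
        # On little-endian, the lowest byte of the word comes first in memory.
        out += value.to_bytes(4, "little")
    return out.decode("ascii", errors="replace")


def as_leaked_hex(text):
    """How a 4-character snippet would look in %x output (little-endian)."""
    return format(int.from_bytes(text.encode("ascii"), "little"), "x")


leaked = ["20656854", "73736170"]  # naive hex decode gives " ehT" and "ssap"
print(words_to_text(leaked))       # The pass
print(as_leaked_hex("FLAG"))       # 47414c46
```

The second helper covers the earlier hint about converting “FLAG” to hex: on a little-endian target the pattern to look for in the %x output would be 47414c46 rather than 464c4147.
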
A: “doesn't this sound interesting?”

B: “sound interesting i guess…”

COME ON! YOU ARE BASICALLY THERE. PLEASE WALK THE LAST STEPS NOW.

A: “‘The ... is: ... 0v3_’ i mean... sounds to me like it's part of the solution. I could imagine that the complete string would be something like: The flag is: FLAG{...”

So as you can see, I was still expecting the FLAG string to appear in the output, and it wasn’t there, which meant we had some other bugs. Maybe the reason for that is that the script itself is buggy. So I wrote:

A: “now we need to just figure out why some outputs are buggy”

B: “it is, need to fix the python script first make it a bit easier”

So I kept thinking about this bug, and I needed a few more experiments to drill down to the potential issue. First of all I wanted to know if the issue only happens when the script executes, or if it also happens by hand.

A: “does it ALWAYS fail to print those?”

Apparently offset 8 didn’t work, so now I’m specifically asking: “can you do by hand: %8$x”

B: “something like that i think the issue is the remote server expects input to give broken pipe i mean 2nd input like a space like you have to put in what you want it to echo then hit enter then enter again to exit and I think that's why my script fails”

Because they didn’t do the test and send me the results, I thought, “goddammit. I do it myself then”. So in the meantime I just tried by hand to print the 8th offset. And it worked!

A: “I can print it no problem. so something might be broken with your script or so”

B: “i think the 2nd part is the server is waiting for another enter and the script isn't giving it”

A: “yeah dunno.” Those explanations sound a bit weird to me. But I also thought that it doesn’t matter, because “you can also now solve it by hand”. If the script fails, just do the %x by hand! Please just solve it now, so we are done with this!
But I couldn’t help myself, so I did look at the script again, and thought about it. And then I realized what the issue might be. “maybe you try to recv() too fast. server is not fast enough with responding. try to add a time.sleep(0.5)”.

20 minutes later, no response. “so. any result?”

B: “offset 6-20 I get this. If I clean it I get this. Which is that”.

And they even sent a screenshot along! And look at that! This is readable text! You just have to fix the byte order, the endianness. So I give some hints:

A: “well... you are getting close. can you fix the endianess? " ehT" should obviously be "The "”

B: “I thought it was endianess when I saw it because it was backwards”

Very good! You know about endianness, glad I don’t need to explain that. “so fix that”

And then they sent this. “%15$u6s4t_bf0rm0v3_ I_l is:wordpassThe” “ill try again”.... Uhm, yes please.

“yea, i don’t know, but the flag has always been FLAG{} so I don't think its it”

Sure, but it doesn’t matter. You are clearly leaking something that seems important for the challenge. So I ignored this and mentioned again: “You are dealing with the endianess wrong”

And now they fixed it: “The password is: I_l0v3_f0rm4t_bu6s�%15$”

They probably used some code to swap the endianness. Very good. And then they write thanks for the help. But I wasn’t sure if they had solved it now, so I asked: “is the flag FLAG{I_l0v3_f0rm4t_bu6s}? did that work?” After spending so much time on this, I need the satisfaction of having it solved now!

“It did but everyone iv done before had FLAG{...} idk where it was in that one”

Urgh, okay. Well. It’s solved! “maybe not the best challenge design ;)”, when the FLAG{} always appeared before, but whatever.

They solved it. Congratulations. Clap clap.

Btw, maybe you noticed this (I only noticed it when making this video): they actually had the solution already at this output. Which was about 50 minutes after we started writing.
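
The suggested fix of waiting before recv() can be sketched as a small loop over short format strings. The person's actual script is not reproduced in the transcript, so everything here is an assumption: `leak_offsets` is a hypothetical helper, `FakeEcho` is a stand-in for the remote service so the sketch runs offline, and the offsets and words in the demo are invented. It is also unknown whether the real service needs a fresh connection or an extra newline per input.

```python
import re
import time


def leak_offsets(send, recv, offsets, delay=0.5):
    """Send one short %N$x per offset and collect the replies.

    `send` and `recv` are callables, e.g. sock.sendall and
    lambda: sock.recv(4096) for a connected socket.
    """
    results = {}
    for offset in offsets:
        send(f"%{offset}$x\n".encode())
        # Give the server time to answer before reading, as suggested above.
        time.sleep(delay)
        results[offset] = recv().decode(errors="replace").strip()
    return results


class FakeEcho:
    """Offline stand-in for the service: answers %N$x from a fixed list."""

    def __init__(self, words):
        self.words = words
        self.pending = b""

    def send(self, data):
        match = re.fullmatch(rb"%(\d+)\$x\n", data)
        index = int(match.group(1)) - 1
        self.pending = format(self.words[index], "x").encode() + b"\n"

    def recv(self):
        reply, self.pending = self.pending, b""
        return reply


fake = FakeEcho([0x20656854, 0x73736170])
print(leak_offsets(fake.send, fake.recv, range(1, 3), delay=0))
# {1: '20656854', 2: '73736170'}
```

Keeping each payload to a single short %N$x also matches the earlier advice to avoid one long %x string, so that the input itself takes up as little of the leaked region as possible.
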
I was just too lazy to type the bytes, and apparently they were too.

So as you can see, really trying to help people takes time. We started talking around 12, and it was solved around 3pm. I plotted all the messages on this timeline here, just so you can visualize how much back and forth this is. It’s just messages, but you can’t do anything productive in parallel. You constantly get distracted by those messages. So this was me spending like 2 hours or more on this, and in the end I only helped one person. I hope you understand why I don’t have the time for everybody and why I generally prefer to spend those two hours working on videos for everyone instead.

But then there was this idea to turn this conversation into a video, and now hopefully you learned something too. Which leads me to the following question. In my imagination a video like this one is a really good learning resource, because troubleshooting and working through issues is the best way to learn. Much better than just me telling you how something works. So please let me know if this kind of video style is actually engaging, or if it’s boring, and whether you want to see more videos like this, or rather not.
Info
Channel: LiveOverflow
Views: 51,118
Keywords: Live Overflow, liveoverflow, hacking tutorial, how to hack, exploit tutorial, pwnable, format string, binary exploitation, john hammond, format vuln, printf, leaking stack, help, mentor, guidance, tutorial
Id: F6UerHkVdLA
Length: 24min 58sec (1498 seconds)
Published: Thu Feb 25 2021