Forensic Investigation of Emails Altered on the Server | SANS DFIR Summit 2019

Captions
[Music] Thank you for the intro. I'm excited to talk today about investigating emails altered on the server. Just a little background: I'm the founder at Metaspike, where we develop digital forensics software for the cloud with a focus on forensic email preservation. I'm also a forensic email examiner, generally a forensic examiner who does a lot of email investigations, so email forensics is near and dear to my heart. This presentation is actually partially based on a real-life scenario that I encountered, so I'm going to give you a little bit of backstory on that first before we get into the slides.

So let's go back in time by a few years, let's say five years, and meet a forensic examiner. Let's call him Jim. Jim is a great guy, a very experienced forensic examiner, and he's retained by a law firm to forensically preserve some emails. They tell him, hey, we need you to connect to this AOL mailbox and collect a few relevant emails, not the whole thing. This is not really a forensic investigation at this point, just a disclosure exercise, so we want you to collect these emails; they might be important later on. So he does that. What he does is he sets up his Outlook, gets the credentials from the end user, connects to the AOL email account, lets Outlook do its thing and sync the emails, and gets a local Outlook data file, an OST file, filled with emails from the mailbox. The next thing he does, because he's instructed not to preserve the entire mailbox, is take that OST file and put it into his favorite investigative tool. He runs some keyword searches there, identifies those few emails that are really relevant to the legal action, exports those out of the forensic tool as EML files, and discards the original Outlook data file. So he's done with the preservation, and he tells the legal team that they're good to go.

A few years pass, and now the authenticity of those emails gets called into question, so Jim's asked to
testify now about those emails, and he's in a bit of a pickle because he's made a few mistakes. Number one, he did his email preservation work with Outlook, which is a general-purpose email client. It allows two-way sync, so he had the potential to make some changes to the original evidence there. Number two, he collected the whole mailbox in an Outlook data file, but he didn't keep it; he just kept a subset of it and discarded the original items that he collected. Number three, the items that he has left, the EML files, are not an identical copy of those emails as they were transmitted from the server, because they've been through some conversions already. When he got them into Outlook, Outlook converted them, and then he got them into the forensic tool and exported them again to EML format. So they've been through a couple of conversions, and they are a little bit different from what they used to look like.

So Jim goes on the stand and testifies. His opinion is that he collected these emails directly from AOL; the end user, his ultimate client, does not work for AOL and does not have access to AOL's servers, so he has no way of really altering the emails that sit on the server. He says he kept good chain of custody records and he hashed the emails, so he is confident that nobody has changed those emails since he collected them, and he is confident that the end-user client does not have the technical sophistication to change those emails on the server. So he says these emails are authentic, there's no way they could have been changed, I found them sitting on the server.

But there are a couple of mistakes here in Jim's reasoning. Number one, let's think about his assumption that the emails are unalterable on the server. That's not right, because an email server is designed precisely for that type of interaction. You can connect to an email server with an API like the Gmail API or the Exchange Web Services API, or through a protocol like IMAP, and in all cases there are
methods, API methods or IMAP commands, that allow this type of modification of emails on the server. You can import messages to the server, copy your messages around, modify them, change their flags. So there's a lot of functionality built in that allows you to make changes to email messages. You don't have to have special tools; all you need is an email client, really. So that's where he makes the first mistake.

In this talk I'm going to cover what an end user could do to very easily alter emails that are on the server; as a forensic examiner, what you can do to preserve those emails correctly so you have all the information you need to investigate them; and finally, what you should look at to determine if there is something off with these emails, something that doesn't add up.

So let's start with an example. I picked a very simple email with a couple of lines in the message body that talks about proceeding with the transaction. It was apparently sent in July 2019 from a Yahoo email account to a domain called proximity comm, and the mail exchanger records of that domain point to GoDaddy, so the email business is handled by GoDaddy in this case. What I did was connect to that GoDaddy account and pull down the whole mailbox. I have a copy of it in my Outlook, and I opened up this very relevant message. Let's say that I have some motivation to change this message. Let's say that proceeding with the transaction doesn't really go very well with my narrative in this case, so I want to change it to say that whoever sent this message said to cancel the transaction instead. If I wanted to do something like that with something as simple as Outlook, I would go to the Actions menu and say Edit Message, point to where that part of the message body is, change it, and save. Done.

So let's think about what happened and where we are now. This hasn't changed the
message on the server yet. What I have now is a local copy of this message in my Outlook, in an OST file, and when I instructed Outlook to edit it and save it, my local copy changed. Outlook has actually inserted a bunch of artifacts into the message body that have to do with Outlook's formatting of the message body, and it added some message headers that say that this message has been sent by Outlook. But it hasn't pushed anything to the server yet.

What happens when you let Outlook go through the send and receive cycle, when you let it sync with the server? If you listen to the network and capture the packets with something like Wireshark, for example, and take a look at what it's doing in the background, it issues a few commands back to back. First it issues an IMAP APPEND command. The APPEND command is something you would use to add a message to one of the folders on an IMAP server. You would issue the APPEND command with some optional arguments: you could pass a flags argument to tell it what IMAP flags you want the end result to have; you could pass an optional timestamp argument, which affects the internal date of the resulting message; and finally you would pass a message literal, which is the actual contents of the message that you want to append. So essentially what Outlook did is it took the message that I modified, passed that as an argument to the APPEND command, and added it to the end of the Inbox, because I was in the Inbox. Then it sends a UID STORE command to set the \Deleted flag on the old message. So it's marking my old message for deletion; it hasn't deleted it permanently yet, it has just set a flag there. Right after that it issues a UID EXPUNGE command, which permanently deletes that message and purges it from the server.

If you're like me, you might be thinking: could I get my hands on the old message? Because that would be pretty powerful.
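
The APPEND, UID STORE, UID EXPUNGE sequence described above can be looked for in a list of client-side IMAP command lines, for example lines reassembled from a packet capture. The following is only a rough sketch: the sample lines, the tags, UID 915, and the function name `find_replace_sequences` are hypothetical and were not shown in the talk, and real traffic (literals, pipelining, TLS) would need more careful parsing.

```python
import re

# Client command patterns (tag, then command). These are simplified assumptions.
APPEND_RE = re.compile(r'^\S+\s+APPEND\s', re.IGNORECASE)
STORE_DELETED_RE = re.compile(
    r'^\S+\s+UID\s+STORE\s+(?P<uids>\S+)\s+\+FLAGS(?:\.SILENT)?\s+\(?[^)]*\\Deleted',
    re.IGNORECASE,
)
UID_EXPUNGE_RE = re.compile(r'^\S+\s+UID\s+EXPUNGE\s+(?P<uids>\S+)', re.IGNORECASE)


def find_replace_sequences(client_lines):
    """Return (append, store, expunge) line indexes for each APPEND that is
    followed by a UID STORE setting the Deleted flag and a UID EXPUNGE of
    the same UID set."""
    hits = []
    for i, line in enumerate(client_lines):
        if not APPEND_RE.match(line):
            continue
        store_idx, uids = None, None
        for j in range(i + 1, len(client_lines)):
            store = STORE_DELETED_RE.match(client_lines[j])
            if store and store_idx is None:
                store_idx, uids = j, store.group("uids")
                continue
            expunge = UID_EXPUNGE_RE.match(client_lines[j])
            if expunge and store_idx is not None and expunge.group("uids") == uids:
                hits.append((i, store_idx, j))
                break
    return hits


# Hypothetical command lines, in the order the client sent them.
SAMPLE = [
    'a1 SELECT "INBOX"',
    'a2 APPEND "INBOX" (\\Seen) "17-Jul-2019 10:15:00 -0700" {1024}',
    'a3 UID STORE 915 +FLAGS (\\Deleted)',
    'a4 UID EXPUNGE 915',
]

if __name__ == "__main__":
    print(find_replace_sequences(SAMPLE))
```

A hit is only a lead for closer review: ordinary edits and moves made by a mail client can produce the same pattern.
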
The end user claims that the message about cancellation is the one and only true copy of the message. If you could dig up that old copy that says proceed with the transaction and present them side by side, that's pretty powerful evidence that some forgery took place. As it turns out, if you're dealing with a plain IMAP server like Yahoo or AOL or GoDaddy, then the server doesn't have any mechanism, no API methods or IMAP commands, that would allow the end user or the forensic examiner connecting as a client to retrieve that purged message.

But what if you're dealing with a more sophisticated server, like an Exchange server? The good news is that on an Exchange server there's the Recoverable Items folder. This is a special folder. If you connect to a mailbox, you'll see that a mailbox has multiple subtrees. One of the subtrees is the interpersonal messaging (IPM) subtree, which holds the visible folders that you see, like the Inbox. Then there's the non-IPM subtree, and one of the items there is the Recoverable Items folder. This is used by some of the services like Litigation Hold, In-Place Hold, and single item recovery. One of the folders there is the Deletions folder, which holds the soft-deleted items, essentially items that got deleted from the Deleted Items folder or got shift-deleted directly from the mailbox to bypass the Deleted Items folder. In this case, if you look there, you would see the original copy of our altered message. For clarity, what we did here is we have a message that got changed on an Exchange server through IMAP, and then we examined the Recoverable Items folder and found the original copy, before the modifications, in the Deletions folder. This is the new name for the Exchange dumpster, essentially. So this is pretty powerful, and it's actually a very good data point to include in your email preservation. You wouldn't get the Recoverable
Items folder through regular email clients like Outlook or by connecting through IMAP, but you can access it through an API like the Exchange Web Services API, and there's good reason to include that in your standard Exchange preservation workflow, for Deletions, DiscoveryHolds, and things like that.

In this case I changed just a little bit of the message body, but what if I wanted to make some more substantive changes to my message? What if I wanted to backdate it, for example? I don't want to just change the message, I want to roll it back a couple of years to better fit the time frame of my litigation. One of the ways you could do that is, instead of using Outlook, you could pull down a MIME version of that message, put it in a text editor, and start editing away. You can just change the dates manually. In this case I'm setting them back to January 2018. The dates are in a few places, so I'm covering all those bases. I'm now setting the origination date in the header, and then I'm going to scroll down a little bit to get to the message body too, and change that to cancel. In this case I'm not using Outlook; I'm going to use Thunderbird and manually upload it to the server. I'm just going to drag my new message over to the Inbox.

So what happened now? Thunderbird did essentially the same thing: it issued that APPEND command and added this message to the end of my mailbox, the Inbox in this case. But it didn't do the rest; it didn't set the \Deleted flag and purge the old message just yet. If the bad guy wanted to do that, he could go and delete that message himself, purge it, and erase copies of it. Now that we've gone through this, we have successfully changed the message on the server, and it looks like this. If we go through GoDaddy's webmail interface, we now have the fake date that points to January 2018, and we have the message body that, instead of proceed
with the transaction, says please cancel. So if somebody were to come after him and look at this in GoDaddy's webmail on the server, it looks very much like a legitimate message, but it's been altered through just a couple of simple steps.

So let's think now about what we can do to get to the bottom of this: how we would effectively preserve this mailbox to get the information that we need, and what we would look at to detect this kind of malicious behavior. As far as preservation goes, there are three key aspects. First, you need to preserve as close to the original format as possible. When you connect to the server through the Gmail API, the Exchange Web Services API, or IMAP, the message comes over the wire in MIME format, RFC 5322, so you want to have a good copy of that message. In Jim's case he doesn't have that, because he's already run it through some conversions, and we'll see in a little bit how this affects our examination. If we don't have that good, pristine version, then there are some things that we just can't do. Second, in addition to the message itself, you want to get server metadata. I'm going to talk more about this in a little bit, but essentially it is metadata about the message that lives outside of the message. And finally, you want context. If somebody says, hey, get this message and authenticate it, you don't want to preserve just that message. You want to get its neighbors, ideally that whole folder or perhaps the whole mailbox if you can, so that you have some context when you're examining, and you can see whether your target messages make sense when you look at them in the context of their neighbors.

So, server metadata. What I mean by this is information about the messages that is not inside the message, not within the MIME message itself, but kept on the server. Because of that, a lot of times it's not acquired along with the messages, so you have to make an
effort to go and capture that, and either bundle it with your messages or keep it on the side, but have a record of it so you can do your examination.

Before I talk about server metadata, I like to think about the levels of access we get to email evidence. In some cases you might start off with just a printout of an email. That happens to me a lot; it's sad but common. You might have the email message in native or near-native format, so essentially in the original format in which it was created, or maybe some type of conversion that's derived from there. In some cases you might have access to the email account on the server, like Jim: somebody gives you credentials for that mailbox and says, hey, you can connect to this and collect emails there. And finally, you might have access to the entire mail server. It might be a friendly situation where your client allows you to sit in front of their server, you have administrative access to the server, and you can examine the logs and preserve the whole thing if you wanted to. In the last two of those scenarios you would have an opportunity to preserve the server metadata.

For example, if we think about an IMAP server, a few things come to mind in terms of server metadata: the unique identifier (UID) message attribute, the IMAP flags, and the internal date message attribute. The unique identifier message attribute is a very cool artifact because it's a 32-bit integer and it's assigned in an ascending fashion. That's key. When a new message gets added to the mailbox, it gets the largest UID number, so that puts things in perspective when you look at the messages in context. If you have a message that has a larger UID, that tells you, hey, this must have been created here after the other ones. UIDs are not necessarily contiguous, so there can be gaps in the numbers. What that means is, if you have a message that got
deleted from the middle, the server doesn't make an effort to remove those gaps and compress the items. There are gaps and you can see them, so that gives you some clues as to what might have been deleted from where.

Internal date is again a server-side attribute. It's not something you find within the message; it's on the server, and you can think of it as something along the lines of a file system creation timestamp. Usually it indicates the date and time when a message was created in a folder. Some factors affect this. For example, if you get the message directly from SMTP, so a message that's arriving normally as an email message to the mailbox, then you would expect the internal date to mirror the time of final delivery. That would be the last trace field in the message header, the last field that says Received and has a date and time; the server would usually take that and use it as the internal date of the incoming message. If you migrated this message somehow, for example if you issued the IMAP COPY command, which is similar to APPEND but, instead of taking a message literal as an argument, takes the ID of an existing message and copies it from one folder to another, then the internal date message attribute will be preserved: the IMAP server would take the date of the original message and apply it to the new message. In the case of the IMAP APPEND command, which is what we used, it depends. If you issue the APPEND command without a timestamp argument, then your internal date would reflect when you issued that APPEND command. If you issue the command with a timestamp argument, then that timestamp would become the internal date of the new message. We're going to see how that plays into our investigation.

So let's take a look at our message and the metadata we have on the server surrounding this message before the manipulation. I have the message here, at least the beginning part of it, and when I look at the server, so I put the
UIDs and the internal dates side by side. When I sort by UID, I see that they're in sequential order, there are no gaps in that neighborhood, and when you look at the internal dates, they're also in chronological order. Everything is lining up and life is good. If you look at the highlighted parts, you see the last trace field up there, which is the time of final delivery, and that matches my internal date on the server. So this is good.

Now let's look at what happens after manipulation. After I changed the dates and times and uploaded this back to the server, you see that the UID changed: the message got a larger UID, so I have some gaps there, and this message got UID 931, which is higher than its neighbors. But the internal date is in 2018, and it matches the last trace field at the top of the message. So there's a discrepancy there. You would expect that if this message has the largest UID here, it should also have the latest internal date, or something is off.

The other alternative is that we issued the IMAP APPEND command without a timestamp argument, and then it would look like this. Now the internal date matches the time when we issued the APPEND command, but it no longer matches the last trace field, so it no longer reflects the alleged time of delivery of this message. So there's a discrepancy here too. If you look at the server metadata in isolation, the UIDs are sequential and the internal dates are in chronological order, so it looks good within the server by itself. But when you compare that to what you see in the message, the dates in the message and the dates on the server about the message are conflicting. You can tell right away that something happened to this message: at the minimum it got moved or migrated, if not modified somehow.

So when we talk about server metadata, I don't want you to think that
this is limited to IMAP only. All kinds of servers have server metadata. In the case of Gmail, for example, you would get things like thread IDs, which is what Gmail uses to decide on conversation threads. There are label IDs, the IDs of the labels that are applied to each message. There are history records, which are pretty interesting. These facilitate easier syncing for email clients: they are records that indicate the changes that took place in the mailbox, and they're associated with the IDs of the messages and labels that are affected by those changes. This is pretty powerful, because you could look at some history records and try to correlate them with your message, and if the dates of that history activity don't line up with the apparent date of your message, then that could be a problem as well. And finally, in Gmail there is also the internal date, which is similar to IMAP's internal date.

Same thing with Exchange. On an Exchange server you would get a large number of server metadata fields, all these MAPI properties. Depending on how you preserve from Exchange, some of those properties might be in your MSG file or PST file or OST file. In some cases, if you got the message through the Exchange Web Services API as a MIME message, then those properties would not be in it, and you would acquire them separately. You want to make sure you get what's relevant to your case.

So now that we've talked about server metadata, let's talk a little bit about message data and metadata. We looked at the server side; these are the traditional artifacts that you find within the message, more commonly looked at by forensic examiners: things from the message header like the trace fields, the References field, DKIM signatures, and MIME boundary delimiters, and then the message body itself. So let's take a look at our altered message's header and see if we can spot anything strange going on here. I'm going to start from the bottom
and go upwards a little bit. At the bottom I see a field called Content-Length. This is a field that I like, so let me give you a brief intro on it. Content-Length indicates the byte size of the payload of the message. If the message looks like the one in the figure, where the white part is the header and the yellow part is the message body that follows the header, then the Content-Length is the number of bytes in the payload, including the encoded attachments and everything. One tricky thing about this is that when you're counting the bytes to do your calculation, you want to make sure you count the line endings in Unix format rather than Windows format, because if you count them as two characters each, that's going to throw off your calculation.

So let me go back to this message and look at Content-Length. We see 700, which tells me that this message should have 700 bytes in its payload, starting right after that Content-Length marker. But when we do the calculation, we see that it's actually 688. Why did this happen? Well, what we did was go in there and remove some characters from the message body. It originally said proceed with, and we changed that to cancel, so we replaced a 12-character string with a 6-character string. But the discrepancy here is 12 characters, not 6. How did that happen? It happened because there were two versions of the message body, the text version and the HTML version, and we replaced both. So two replacements, six bytes each, made a 12-byte change in the payload. We see one discrepancy here right away, and it's kind of a red flag. But if you think about it, the Content-Length field is actually pretty vulnerable, because it's just a plain number there, and if the bad actor has any knowledge of how this works, they could very easily modify that number along with the message body. There is nothing that prevents them from doing it.
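
The Content-Length comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the payload is taken to be everything after the first blank line, line endings are normalized to LF before counting as the talk suggests, and the function name `content_length_check` and the sample message are hypothetical. Exactly which trailing bytes a given provider counts is not covered in the talk, so a difference of a byte or two may need a closer look before it is treated as a finding.

```python
import re


def content_length_check(raw: bytes):
    """Return (declared, actual) payload sizes for a raw MIME message.

    declared: integer from the Content-Length header, or None if absent.
    actual:   number of payload bytes after the header/body separator,
              counted with Unix (LF) line endings.
    """
    normalized = raw.replace(b"\r\n", b"\n")
    header, _sep, body = normalized.partition(b"\n\n")
    match = re.search(
        rb"^Content-Length:[ \t]*(\d+)[ \t]*$",
        header,
        re.IGNORECASE | re.MULTILINE,
    )
    declared = int(match.group(1)) if match else None
    return declared, len(body)


if __name__ == "__main__":
    # Hypothetical message whose body is 13 bytes with LF line endings.
    original = (
        b"From: sender@example.com\r\n"
        b"Content-Length: 13\r\n"
        b"\r\n"
        b"proceed with\r\n"
    )
    altered = original.replace(b"proceed with", b"cancel")
    for label, message in (("original", original), ("altered", altered)):
        declared, actual = content_length_check(message)
        print(label, declared, actual, "mismatch" if declared != actual else "ok")
```

In the talk's example the same arithmetic gives 700 declared against 688 counted, a 12-byte gap from two 6-byte reductions.
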
It's useful in those cases where nobody notices it and you know about it. Yahoo and AOL actually use this field, so it's good to know about it.

Another thing that we spot here, which is a lot more powerful, is the DKIM signature. To give you a quick intro, the DKIM signature is essentially a signature that's calculated by the sending entity, in this case yahoo.com. What Yahoo did here is take the hash of the message body and a selection of header fields, like To, From, Subject, and Date, for example, and sign them with its private key. We see that signature there. What we can do as the recipient or examiner is fetch Yahoo's public key from DNS records and do the reverse to verify the signature. If the signature verifies, then we have some confidence that this message may not have been altered since Yahoo signed it. But in this case it fails. It fails because we made a bunch of changes: we changed the body, which is part of the DKIM signature, and we changed the Date field, the origination date. So all in all, the DKIM signature is not valid anymore. This is pretty powerful, because the bad actor would not have Yahoo's private key, so they would have no way of recalculating this signature. One thing they could do is remove this header field to cover their tracks, if they know about it. If the message is within the time frame where Yahoo had started using DKIM signatures, then you would notice the absence of it, and that would be a red flag in itself: hey, how come this message is from Yahoo but there is no DKIM signature? But if they roll it back so far that Yahoo didn't use DKIM signatures back then, that's a bit of a different story.

Something else here, and this got a little bit shifted on the screen, so you should look a little bit higher than that, to the left: there's a number there. If you look at the Content-Type header
field there, there's a section, the boundary, that goes Part, and then there's a number, and there's another number. That's an epoch date, so there's a hidden date there. Let's think about why that happens. There's a boundary delimiter there because the MIME message has different sections: it has the text message body and the HTML message body, and in some cases it might have attachments. To separate those MIME parts there are MIME boundaries, and when the server creates a boundary, it wants it to be unique, because if that boundary appeared elsewhere in the message body, it would throw things off. So when servers are creating those MIME boundaries, some of them like to include the current date. In this case the Yahoo server included the current date with millisecond precision in the boundary delimiter. That's a pretty good clue, because it points to the sent date of our original message before the manipulation. This would be something hard for the bad actor to explain: how come there's a date from 2019 there, and nowhere else in the message do we see that?

When you look further, there are two more, again shifted a little bit. We see the same date in the Message-ID header field and in the References field. It's about the same number, but one millisecond later; this one ends with 410. It was created when the server assigned the internet message ID to that message, which is again something that should be unique, so there's benefit to using the current timestamp there.

Let's see if there might be anything else that I'm missing. One more place where this epoch date is hidden is the DKIM signature, not where it's pointing right now, it's shifted a little, but where it starts with t, the t tag. The t tag is a number again; that's the timestamp when the DKIM signature was calculated, and it's supposed to match
this received date in the header. In this case it doesn't match anymore because we modified that received date. So we've got a few flags here. If we scroll down further to the message body, again we have a few boundary delimiters there, all with the same epoch date. So things are not looking very good for this bad guy; we've got quite a few clues from the altered message body.

If you think about DKIM verification, you might think, hey, this is something that the email service provider would do; how can I incorporate this into my workflow? Very easily. You could use a free tool like Mail::DKIM::Verifier for Perl, or if you're using Python you could use dkimpy, or you could do it manually. If you read the spec, it's actually pretty easy to do the calculation yourself: pull the DNS record for the public key and verify the DKIM signature.

While we're talking about artifacts in the message header that are kind of hard to change, I'm going to talk a little bit about one of my favorites, the Conversation Index. This is a value that Microsoft uses for email threading. In this case we're seeing it in hexadecimal format, but you could also encounter it in the message header in Base64 format. There's a 22-byte header, followed by zero or more five-byte child blocks. Let's talk about why this could be helpful. If you break it down, it looks kind of like this: the red part is the header, and the following parts in different colors are the child blocks. This tells me that this looks like an email thread with four messages: there's the parent message and there are three children. If you look at the header and parse it out, there is a FILETIME value buried in there with pretty good precision, 100-nanosecond precision. That FILETIME represents when the base message, the initial message that started the email thread, was submitted. After that there's also a GUID. So the GUID is useful because it's
meant to be unique, obviously, and this is helpful because a lot of times when a bad actor wants to forge a message, they start off with a template message. They take one existing message, make some changes to it, and then try to pass it off as a legitimate message. So if you have a message with a GUID that matches another, seemingly unrelated message with the same GUID, then that's kind of a red flag. If you wanted to detect that while examining a message, you can scan the mailbox, pull out the GUIDs from all the messages, and see if any of them match your GUID. If they do, then you would want to look at them more carefully to see why that is the case: are they in any way connected, or did somebody take that message and make some changes to it?

The child blocks that follow have some time differences. They give you some idea as to how much time has passed from the submission date of the header message or the sent date of the previous child message, and that allows you to create a framework, a skeleton, of what the email thread should look like. We would look at the second child, the third child, and when you put it all together it would look kind of like this: there's the header message, then the first child with its time offset, then the second and the third. These times are calculated in local time, so that's something to keep in mind: they're vulnerable to time differences. If your computer's time is a little bit off, you would see that reflected in those times. The time offsets can also be negative, which is another reason why you want to look at those time differences, because if somebody's time is significantly off, that could cause a negative time difference in the Conversation Index.

So would I rely on this single-handedly for authenticating a message? No, because there are a few instances where this may not match exactly what you're seeing in the message itself. For example, if you're replying to a long thread
and you just clear out everything in the message body and start fresh, then what you see in the message body would not match the Conversation Index. So it's not enough to say this message is fraudulent. But if it is lining up perfectly, it would increase my confidence that the message is legitimate, and likewise if it's not matching, it would decrease my confidence a little bit.

So, takeaways from this presentation. I want you to think about how to preserve your messages. They should be preserved in the original format, or as close as possible, so you can do those calculations with the DKIM signatures and the Content-Length. In Jim's case, because those things had changed so much, he could not do those calculations anymore, and he had no way to fully authenticate the emails. It's good to capture server metadata along with the message metadata so you can do a full analysis. And acquire and examine in context: don't fall into the trap of examining just the one message that's important. Always capture it at least with its immediate neighbors, or with the full folder, so you get good visibility into how your messages look in their environment. [Applause] [Music]
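
The Conversation Index breakdown from the talk (a 22-byte header followed by five-byte child blocks) can be sketched as a small parser. The byte layout used below is an assumption taken from the commonly documented format, not something shown in the slides: the first six header bytes are read as the high-order bytes of a FILETIME, the next sixteen as the GUID, and each child block as one delta-code bit plus a 31-bit time delta whose low 18 or 23 bits were dropped. The function name `parse_conversation_index` and the returned keys are hypothetical. How a child's delta should be anchored (to the header time or to the previous child) is left to the examiner.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def parse_conversation_index(raw: bytes) -> dict:
    """Split a Conversation Index into header time, GUID, and child deltas."""
    if len(raw) < 22 or (len(raw) - 22) % 5 != 0:
        raise ValueError("expected a 22-byte header plus 5-byte child blocks")

    # Assumed layout: bytes 0-5 are the top six bytes of a 64-bit FILETIME
    # (100 ns ticks since 1601-01-01 UTC); the low 16 bits are not stored.
    ticks = int.from_bytes(raw[0:6], "big") << 16
    header_time = FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

    # Bytes 6-21 hold the GUID; kept as raw hex to avoid byte-order guesses.
    guid_hex = raw[6:22].hex()

    child_deltas = []
    for offset in range(22, len(raw), 5):
        word = int.from_bytes(raw[offset:offset + 4], "big")
        delta_code = word >> 31            # top bit selects the scale
        delta = word & 0x7FFFFFFF          # remaining 31 bits
        shift = 23 if delta_code else 18   # dropped low-order bits (assumed)
        child_deltas.append(timedelta(microseconds=(delta << shift) // 10))

    return {
        "header_time": header_time,
        "guid_hex": guid_hex,
        "child_deltas": child_deltas,
    }


if __name__ == "__main__":
    # Synthetic value built for illustration only (not from a real message).
    when = datetime(2019, 7, 17, 12, 0, tzinfo=timezone.utc)
    ticks = int((when - FILETIME_EPOCH).total_seconds()) * 10**7
    sample = (ticks >> 16).to_bytes(6, "big") + bytes(range(16))
    sample += (137329).to_bytes(4, "big") + b"\x00"  # roughly one hour
    print(parse_conversation_index(sample))
```

If the value comes from a message header in Base64 form, it would first be decoded with `base64.b64decode`. Comparing `guid_hex` across all messages in a mailbox is one way to look for the template reuse described in the talk.
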
Info
Channel: SANS Digital Forensics and Incident Response
Views: 4,306
Keywords: digital forensics, incident response, threat hunting, cyber threat intelligence, dfir training, dfir, learn digital forensics, learn computer forensics, forensic data, forensics artifacts, free digital forensics, free computer forensics
Id: yaZ3HHyle3w
Length: 34min 15sec (2055 seconds)
Published: Mon Jun 22 2020