Email Header Analysis and Forensic Investigation

Captions
Welcome to a special episode of 13Cubed. This one's a bit different because it isn't part of the Introduction to Windows Forensics, Memory Forensics, or Malware Analysis series, nor is it a 13Cubed short. In this episode, we're going to be taking a look at a topic to which many of you have likely been exposed, and that is email header analysis and email forensics.

In the coming sections, we're going to be taking a look at two different emails, both of which are purportedly from Apple. We'll take a look at the associated headers in Sublime Text 3, which happens to be my editor of choice, and the first thing I'm going to show you is a plugin that I wrote a while back that will color the various header fields to make them much easier to analyze. As we start looking at those two emails, we'll concentrate on some key areas that we can use to determine whether or not they are actually from who they claim to be from. We'll also cover some new terms, or at least terms that are likely new to some of you, along the way, and we'll learn about SPF and DKIM and even cover DMARC in brief. So I'm excited to get started. Let's jump in.

Okay, bear with me for a moment, because before we go any further we need to define some basic email terminology. Let's start with MUA and MTA. The MUA is the mail user agent. This is the client application running on your computer that is used to send and receive mail. Examples include Apple Mail, Microsoft Outlook, and Mozilla Thunderbird. An MTA is a mail transfer agent. The MTA's job is to accept messages from a sender and route them along to their destinations. Examples include Sendmail, Postfix, and Microsoft Exchange.

Now, back in the old days, there was a protocol in wide use called POP3, or Post Office Protocol 3, whose job was to allow a mail user agent, an MUA, to talk to a mail transfer agent, an MTA, for the purposes of downloading messages intended for the recipient. The protocol is still in use today but has largely been replaced with another protocol
called IMAP, or Internet Message Access Protocol. The main advantage of IMAP was that the mail could remain on the server instead of being downloaded and then removed from the server, as it was with POP3. This meant that your mailboxes would appear consistent across multiple devices. For example, if you checked your email account from your work PC and then later checked the same account from your home PC, the messages within your mailboxes would be in sync.

There are a couple of other terms you should also be familiar with, and we'll be covering them in more detail later in the episode. The first is SPF, or Sender Policy Framework. In a nutshell, SPF defines a mechanism by which an organization can specify a server or list of servers that are allowed to send email on behalf of its domain. The information is published via DNS within a text, or TXT, record type. For example, as you can see here, 13cubed.com has an SPF record that specifies zoho.com as being authorized to send email on that domain's behalf. If an email fails an SPF check, that can be an easy mechanism we can use to detect spam.

The second term I want you to be familiar with is DKIM, which stands for DomainKeys Identified Mail. DKIM also uses DNS text records, but in a different manner. The owner of a domain such as 13cubed.com would generate a public/private key pair and publish the public key within such a record. Mail servers sending mail on behalf of 13cubed.com would hash the message body of a given message and then encrypt the hash with the corresponding private key, which only those servers would have access to. This creates a signature. The recipient could then later obtain the public key via DNS and decrypt the signature. The recipient would then calculate that same hash, and if the hash matched the value within the decrypted signature, confidence would be high that the message originated from the sending domain and was unaltered in transit.

Okay, so now that we've defined some basic terminology, let's take a look
at that plugin I mentioned earlier. It's hosted on Package Control, located at packagecontrol.io, the website you're looking at now, and yes, there will be a link in the description below. Here you'll find hundreds of plugins available for Sublime Text. In the big search blank, simply type "email header" and it should immediately filter the results. Here is the plugin we're interested in. It's simply called Email Header, by Richard Davis. Let's click on this, and as you can see from the description, this is a Sublime Text 3 syntax highlighting plugin for email message headers. Nothing too exciting, but if we scroll down and look at the screenshot, you'll notice that it does make it quite a bit easier to look at the contents of an email header and understand what's what. You'll notice that certain header fields and values are highlighted.

So, as lovely as this screenshot is, let's go ahead and switch over to the real Sublime Text 3 and take a look at the first email that we're going to be analyzing. We'll take a look at how this plugin works and discuss which colors mean what, and then we'll jump right in and start analyzing our first message header.

As you can see, I have a file entitled Apple Email.eml opened within Sublime Text 3. The plugin was written such that it is automatically applied to any file with the .eml extension. If you look at the bottom right of the screen, it does indeed say Email Header, and this was automatically selected when I opened a file with that extension. If, however, I opened an email header that had any other extension, such as .txt, or maybe I just pasted one into Sublime Text 3, all I have to do is click here and choose Email Header, and the syntax coloring will be applied.

By the way, the colors are going to vary depending upon the color scheme you're using within the editor. For this particular example, I'm using the Dark Neon color scheme. Anything in lime green is going to be an RFC-defined email header field. More about that in a
moment. Anything in magenta is going to be an X header field, and anything in the pale yellow color is going to be an IPv4 or IPv6 address. Again, these colors will not be consistent across all of the different color schemes that you can choose from. The important thing to note, though, is that there will be three different colors representing these three different types of data within our email headers.

Now, before we continue our analysis, let's take a quick detour over to this IANA website that I want to show you, because I want you to see just how many RFC-defined header fields there are. We certainly aren't going to be taking a look at all of them, but rather we're going to be focusing on the ones that will appear in nearly every email that you're going to be analyzing. So here's the website, the link for which will be in the description below. As I mentioned, this is maintained by IANA, or the Internet Assigned Numbers Authority, and it serves as a summary of all of those RFC-defined standard message header fields.

Now, keep in mind that these are the standard headers. You've likely seen X headers, which are prefixed with "X-". Put simply, X headers are headers that are experimental or are an extension of those standard email headers. It's common practice for mail providers to add these for spam filter information, authentication results, tracking, and more. We are absolutely not going to cover every possible header you may encounter. As I page down through here, you will notice that there are a ton of them. Like other 13Cubed episodes, we're instead going to focus on the TL;DR approach. In other words, we're going to focus on just what you need to know to analyze a basic email header, and we'll concentrate on the most common headers that you're likely to encounter.

Okay, so we're back in Sublime Text 3, and for reference, here's what the email we're analyzing actually looks like. As you can see, it at least appears to be somewhat legit, though we won't know for sure until we complete our
analysis. Let's start with the Received header, which will always be present. The first thing you need to keep in mind is that the topmost Received header is the most recent and the one closest to the destination, whereas the first, or bottommost, Received header is the one closest to the source. So we'll almost always want to start at the bottom and read toward the top. The number of these Received headers will vary depending on the MTAs the message traverses in transit from source to destination. In this case, we have three.

Taking a look at this example, we can see that the first Received line shows the message originated from rn2-msbadger08104.apple.com and was received by mr11p00im-bulken001.me.com. You'll also note the IP address in brackets. Apple is allocated the entire 17.0.0.0/8 block of IPv4 address space, by the way. You can see this with whois. So seeing a 17.x address at least gives me a little more confidence that it did indeed originate from Apple. The next Received header shows receipt from 17.133.183.76 by 17.133.183.36, again both within the Apple-allocated address space. And finally, the last Received header shows what we can assume are internal Apple mail server hostnames with no associated IP address information. These Received headers are among the most important information you'll focus on when analyzing emails.

Moving along, what about SPF-related information? Well, in this case, we see none. In our second example, we will see SPF referenced, and we'll talk more about it then. What about DKIM-related information? Well, that we do have. Let's break it down and see if we can understand this section.

The v= field is the version of the DKIM signature, and for now this should always be set to 1, which it is. The a= field is the algorithm that is used to generate the signature. rsa-sha256 is a common value, though you may see the less desirable rsa-sha1. This is not
preferred due to weaknesses associated with that algorithm. In this case, we have SHA-256.

The c= field is known as the canonicalization algorithm. This governs how modifications that may be present within the email, such as changes to whitespace or line wrapping, are handled. It's not uncommon for mail servers to make small modifications to a message while it's in transit, and this field specifies whether or not such modifications will be acceptable. A value of relaxed means that these kinds of common variations in the message will be acceptable and will not invalidate the signature. This algorithm removes all whitespace at line endings and replaces each run of whitespace within a line with a single space. Extra empty lines at the end of the message body are also removed. A value of simple means that these variations will not be tolerated and that the signature will be invalidated. In this case, we have relaxed.

d= is the domain owned by the sender, and s= specifies the selector. Together, these are used to locate the public key via a DNS text record. In this case, the domain is insideapple.apple.com and the selector is insideapple0517. t= is an optional field and is the signature timestamp, which should indicate the time the message was sent. The format is Unix epoch time. The value 1565283669 converts to Thursday, 8 August 2019, 17:01:09 UTC. bh= is the body hash, which is computed based upon the hashing algorithm in use and then encoded in Base64. And finally, b= is the hash computed from the header fields specified within the h= tag. In this case, that would be Date, From, To, Message-ID, Subject, and Content-Type. This is one of the most important fields here and is also known as the DKIM signature itself. It's also encoded in Base64.

Okay, great. So now you know the anatomy of a DKIM signature. What do you do with your newfound knowledge? Well, you could impress your friends, or maybe use this powerful
information to acquire a mate. Or you could just verify the information you see and determine whether or not the DKIM signature is indeed valid. Yeah, let's go with that option. Now, you could do this manually, and in the description you'll find a link to a fantastic article by Metaspike that explains this entire process. By the way, check out that website, because they make some really cool email forensics software. Please note, however, that in our example I have redacted the email address and replaced it with jdoe@mac.com. It was indeed sent to a mac.com address, but if you try to manually verify it, you will get a DKIM mismatch. I do encourage you to play around with this, though, and experiment with your own messages.

All right, moving along, let's talk about the Message-ID field. These are generated by the first MTA traversed by the message, which is sometimes referred to as the mail submission agent, or MSA, even though we didn't previously cover that term. Now, an MTA can perform the functions of an MSA, but an MSA could also be a standalone server without full MTA functionality.

Bear with me, because I'm going to get into the weeds a little bit on this one. According to RFC 5322, called Internet Message Format, this field, quote, "provides a unique message identifier that refers to a particular version of a particular message. The uniqueness of the message identifier is guaranteed by the host that generates it. This message identifier is intended to be machine readable and not necessarily meaningful to humans. A message identifier pertains to exactly one version of a particular message; subsequent revisions to the message each receive new message identifiers."

Okay, so finding a repeating Message-ID within the same email system could be an indication of forgery, right? Because they should be unique and should not be reused. Now, it's also important to note that the RFC only states that Message-IDs should be globally unique and does not specify how they are constructed. The RFC notes that
several algorithms could be used to generate the Message-ID and recommends, quote, "a combination of the current absolute date and time along with some other currently unique (perhaps sequential) identifier available on the system (for example, a process id number)" on the left-hand side of the @, and then on the right-hand side we would have the domain name or domain literal IP address of the host on which the message identifier was created.

As such, different hosts generate different Message-ID formats. The reason is, of course, twofold. Number one, the RFC doesn't require the use of any specific algorithm, and number two, even if it did, the last time I checked there are no RFC police. So just because an email provider is supposed to be compliant with an RFC doesn't necessarily mean that it is. Now, this could be helpful in detecting forged emails. For example, if a message was purportedly sent via a specific mail system but the Message-ID format within the message did not match the known format, something may be amiss. Lastly, the Message-ID could also be used to subpoena the email service provider in an attempt to obtain additional information about the sender of the message, for example, the IP address used to send the message and the customer associated with that address allocation during the time the email was sent.

Okay, two more items to cover. Next up is Return-Path. In short, this is also known as the envelope sender address or bounce address, and it is not the same as the From address. From can be set to any arbitrary value, and while it will usually match the Return-Path, you may see a mismatch in forged messages.

And lastly, let's talk about those X headers. As I mentioned, these are experimental or extensions of the normal RFC headers, and they can vary greatly. There's nothing to stop a mail provider from creating an X-Something header to use for internal tracking or other administrative purposes, so always take a look and see if you find anything interesting or of value. Perhaps one of the most widely
used of these fields is X-Originating-IP, but again, the name could vary. This is sometimes used to store the IP address of the machine sending the message, which, as you can imagine, can be quite helpful. As an example, Google does not populate this field within outgoing email messages, but Office 365 does, at least as of this recording. As a side note, Office 365 also populates X-OriginatorOrg, which will contain the tenant's verified domain name.

Okay, so now that we know the basics, what do you think about this message? Is it legit? Well, yes, indeed it is.

But now let's jump over and take a look at our second and last example, also purportedly from Apple, and here's what that email looks like. This one looks a little odd, so let's investigate further. If we compare it to our last, legit Apple email, starting with the bottommost Received header, we immediately notice something odd. I'm fairly confident that Apple doesn't use rosepointapartments.com within its mail infrastructure. Let's look at the next Received header. It appears we now transition from the rosepointapartments.com mail provider, which apparently is webhosting.com, as you can see from this dig DNS query showing the MX, or mail exchanger, records, to what appears to be Office 365 infrastructure associated with the recipient of the message. The same is the case with the two Received headers that follow this one. Additionally, there are no 17.x addresses to be found within any of these.

What about SPF-related information? In this case, unlike the previous message, we do see mention of SPF, but it clearly says spf=none. While Apple does publish SPF records, as again seen with this dig DNS query, we didn't see any reference to that within the legit message we previously analyzed. That said, take note of the three include statements you see here, showing the SPF records specified for the previous email's sending domain, insideapple.apple.com. We can expand any of these and get the specific IP CIDR blocks, as you can see
here within the _spf-txn.apple.com record, the first include. Notice that they are all within the 17.0.0.0/8 allocation, as expected. By the way, notice the ~all, or -all as you may sometimes encounter. This indicates a soft fail or a hard fail condition, respectively. Soft fail means that the message will be accepted and marked if it doesn't match the record, whereas hard fail means the message should be rejected if it doesn't match. You'll see a tilde in most situations.

What about DKIM-related information? Well, in this case, there is none. The Message-ID format looks to be different from the legit Apple email as well, and of course it clearly reflects a non-Apple domain. What about Return-Path? Well, in this case, it appears to be apple@rosepointapartments.com, which is clearly not an official Apple bounce address. And lastly, what about the X headers? Well, in this case, we have no X-Originating-IP or X-OriginatorOrg. They don't appear to be present.

So what do you think about this message? Is it legit? Yes, totally. But no, I'm kidding. It's not legit. It's completely, obviously fake, and of course in this case they didn't even really try very hard. But I have seen some fake emails that really go the extra mile and spoof a lot more of what you see here. So this one was admittedly an easy example, but at least you're getting that repetition of what to look for when you analyze these headers. And remember: starting at the bottom and going up, those Received header fields are among the most important things that you're going to be analyzing over and over and over again.

By the way, one last note as we wrap up here. DMARC, also known as Domain-based Message Authentication, Reporting, and Conformance, is an extension of both SPF and DKIM. It can be configured such that an action can be taken based upon the results of the SPF and DKIM validations. I'll include a link in the description if you'd like to know more about it. Pretty sure if it didn't get a mention here, I'd hear about it in the comments. So, in
closing, please remember that technically everything within an email message can be forged or otherwise tampered with. The only headers you can truly trust are those placed by MTAs you trust, such as those under your administrative control. That said, DKIM provides a cryptographic mechanism by which you can verify the integrity and authenticity of a message and is a noted exception to that rule.

I hope you've enjoyed this look at email header analysis and email forensics. Be sure to subscribe to the channel if this kind of content interests you, and check out the episode guide at 13cubed.com/episodes for a complete listing of all of the content. As always, thank you for watching, thank you for subscribing, and I'll see you in the next episode.
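
The bottom-up reading of Received headers described in the episode can be sketched in Python with the standard library. This is only a rough illustration: the sample message, its example.net and example.org hostnames, and the helper names received_hops and bracketed_ips are assumptions made up for the sketch, not the headers shown in the video. Only the 17.0.0.0/8 block and the two 17.133.183.x addresses come from the transcript.

```python
import ipaddress
import re
from email import message_from_string

# Apple's IPv4 allocation, as mentioned in the episode.
APPLE_BLOCK = ipaddress.ip_network("17.0.0.0/8")

# Hypothetical message: the topmost Received header is the most recent hop.
SAMPLE = (
    "Received: from relay.example.net (relay.example.net [17.133.183.36])\n"
    "\tby mx.example.org with ESMTP; Thu, 8 Aug 2019 17:01:12 +0000\n"
    "Received: from origin.example.net (origin.example.net [17.133.183.76])\n"
    "\tby relay.example.net with ESMTP; Thu, 8 Aug 2019 17:01:10 +0000\n"
    "From: sender@example.net\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def received_hops(raw_message):
    """Return the Received headers ordered from source to destination."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received") or []
    # Headers are stored top to bottom, so reverse to read bottom-up.
    return list(reversed(hops))


def bracketed_ips(hop):
    """Extract the IP addresses that appear in [brackets] within one hop."""
    ips = []
    for candidate in re.findall(r"\[([0-9A-Fa-f:.]+)\]", hop):
        try:
            ips.append(ipaddress.ip_address(candidate))
        except ValueError:
            continue  # bracketed text that is not an IP address
    return ips


if __name__ == "__main__":
    for number, hop in enumerate(received_hops(SAMPLE), start=1):
        for ip in bracketed_ips(hop):
            print(number, ip, "in 17.0.0.0/8:", ip in APPLE_BLOCK)
```

An address inside 17.0.0.0/8 only adds a little confidence, as noted in the episode; Received headers added before the message reached an MTA you trust can still be forged.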
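
The DKIM-Signature tags walked through in the episode can be pulled apart with a few lines of Python. This is a sketch under stated assumptions: the domain, selector, timestamp, and signed header list are the values read out in the transcript, while the c=relaxed/relaxed form and the bh= and b= values are placeholders, and the function names are hypothetical. It does not verify the signature; it only splits the tags, builds the DNS name where the public key would be published, and converts the t= timestamp.

```python
from datetime import datetime, timezone

# Tag values from the transcript; bh= and b= are placeholders, not real data.
SAMPLE_SIGNATURE = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=insideapple.apple.com; "
    "s=insideapple0517; t=1565283669; "
    "h=Date:From:To:Message-Id:Subject:Content-Type; "
    "bh=PLACEHOLDERBODYHASH=; b=PLACEHOLDERSIGNATURE=="
)


def parse_dkim_tags(value):
    """Split a DKIM-Signature value into a dict of tag -> value."""
    tags = {}
    for part in value.split(";"):
        # Partition on the first "=" only, since Base64 padding also uses "=".
        name, sep, tag_value = part.partition("=")
        if sep:
            # Folding whitespace inside a value is not significant.
            tags[name.strip()] = "".join(tag_value.split())
    return tags


def key_record_name(tags):
    """DNS name of the TXT record holding the public key."""
    return f"{tags['s']}._domainkey.{tags['d']}"


def signed_at(tags):
    """Convert the optional t= tag (Unix epoch seconds) to a UTC datetime."""
    return datetime.fromtimestamp(int(tags["t"]), tz=timezone.utc)


if __name__ == "__main__":
    tags = parse_dkim_tags(SAMPLE_SIGNATURE)
    print(key_record_name(tags))
    print(signed_at(tags).isoformat())
    print(tags["h"].split(":"))
```

The record name follows the selector._domainkey.domain convention, so a TXT lookup on that name is where a verifier would look for the key.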
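
The ~all versus -all distinction from the SPF discussion can also be shown in a small Python sketch. The record below is a made-up example (the include target _spf.example.com is hypothetical), and the evaluation is deliberately simplified: it only checks ip4 mechanisms and the final all qualifier, and it does not follow include statements or perform any DNS lookups, which a real SPF check would need to do.

```python
import ipaddress

# Hypothetical SPF record for illustration only.
SAMPLE_RECORD = "v=spf1 ip4:17.0.0.0/8 include:_spf.example.com ~all"

QUALIFIERS = {"+": "pass", "-": "hardfail", "~": "softfail", "?": "neutral"}


def parse_spf(record):
    """Parse the ip4, include, and all terms of an SPF record."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    parsed = {"ip4": [], "include": [], "all": None}
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        if name == "ip4":
            parsed["ip4"].append(ipaddress.ip_network(value, strict=False))
        elif name == "include":
            parsed["include"].append(value)
        elif name == "all":
            parsed["all"] = QUALIFIERS[qualifier]
    return parsed


def ip4_result(parsed, sender_ip):
    """Simplified result: match ip4 blocks, otherwise fall back to 'all'."""
    address = ipaddress.ip_address(sender_ip)
    if any(address in network for network in parsed["ip4"]):
        return "pass"
    return parsed["all"] or "neutral"


if __name__ == "__main__":
    spf = parse_spf(SAMPLE_RECORD)
    print(ip4_result(spf, "17.133.183.76"))
    print(ip4_result(spf, "192.0.2.10"))
```

With ~all, a sender outside the listed blocks yields a soft fail, so the message is typically accepted and marked; with -all, the same sender yields a hard fail.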
Info
Channel: 13Cubed
Views: 67,356
Keywords: forensics, digital forensics, DFIR, how to analyze an email header, how to analyze an e-mail header, how to read an email header, how to read an e-mail header, how to read email headers, how to read e-mail headers, email header analysis, e-mail header analysis, email forensics, e-mail forensics, message forensics, trace an email, trace an e-mail, MUA, MTA, MSA, SPF, DKIM, DMARC, Message-ID field, Message ID field, Metaspike, Hillary Clinton email, Hillary Clinton e-mail
Id: nK5QpGSBR8c
Length: 22min 59sec (1379 seconds)
Published: Mon Jan 13 2020