Basic Email Analysis

Despite ever-increasing security, it appears that email accounts are breached more often now than ever before. Quite possibly this comes down to the user as opposed to the email service itself. The focus of this article includes distinguishing between spam sent from an unknown account and illegitimate emails sent from a known account; a look at some of the technologies in place to reduce spam; and potential courses of action in the case of unauthorized account access.

What is an email anyway?

Let us start at the beginning – we all use email every day, but sometimes we gloss over how it works. An email, in its simplest form, is just a text file. This file is usually broken into parts: a header (which contains information such as the sender’s email address, the recipient’s email address, the subject, etc.), a body (which might have more than one part, e.g. one section in plain text and another in HTML), and possibly an attachment (usually not plain text).
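As a rough illustration of the header/body split described above, the following sketch uses Python’s standard `email` package to read a small message. The addresses, subject and boundary string are made up for the example.

```python
from email import message_from_string

# Hypothetical two-part message (plain text plus HTML)
RAW = """\
From: alice@example.com
To: bob@example.org
Subject: Hello
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain

Hi Bob
--XYZ
Content-Type: text/html

<p>Hi Bob</p>
--XYZ--
"""


def describe(raw):
    """Return a few headers and the content type of each body part."""
    msg = message_from_string(raw)
    headers = {name: msg[name] for name in ("From", "To", "Subject")}
    # walk() also yields the multipart container itself, so skip it
    parts = [p.get_content_type() for p in msg.walk() if not p.is_multipart()]
    return headers, parts


headers, parts = describe(RAW)
print(headers)
print(parts)
```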

Each section of an email is separated by a boundary – a unique string that should not be found elsewhere in the email. Attachments are encoded in base64 (which tends to increase the size by about one third), which helps to maintain the integrity of the data (a binary file which, as text, contains uncommon characters might otherwise be damaged as it passes through various systems). The base64 string is also typically split into 76-character lines.
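The size increase and the 76-character line length can be checked with a short sketch. It uses `base64.encodebytes` from the Python standard library, which wraps its output in this way; the payload is arbitrary test data, not a real attachment.

```python
import base64


def encode_attachment(data: bytes) -> str:
    """Base64-encode binary data, wrapped into lines of at most 76 characters."""
    return base64.encodebytes(data).decode("ascii")


payload = bytes(range(256)) * 3          # 768 bytes of arbitrary binary data
encoded = encode_attachment(payload)
lines = encoded.splitlines()
encoded_size = sum(len(line) for line in lines)

print(max(len(line) for line in lines))  # longest line length
print(encoded_size / len(payload))       # growth factor, roughly 4/3
```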

How does an email get from the sender to the recipient?

On the sending server, there is a special application running that is called a mail server (e.g. postfix, sendmail, exim, qmail, etc.). This program typically performs several functions, the main one being the dispatch or delivery of mail. Most of these servers are capable of modifying the email headers to some extent, either directly or by calling external scripts (more on that later).

The sending server will determine the domain of the recipient (from the To header). If the email is being sent to a domain that the sending server serves, it will deliver the mail internally. Otherwise, the mail will be sent to a mail server running on the recipient’s domain. Getting to the recipient domain is much like getting to a website – components of the domain name are looked up in the DNS, from right to left, until the entire domain has been parsed and a final address obtained (for mail, the domain’s MX records identify the servers that accept its mail).

Once the email arrives at the server that will deliver it to its final destination, the mail server will look up the username, and deliver it to that individual’s mail box. This can be as simple as saving the file in the correct folder on the server. Some programs will also update an index of the user’s emails to speed things up when the user requests a listing of the emails.

Validating email integrity

Now, the basic model of how email moves sounds simple enough – look up the domain, send to the correct server, and the server puts the email in the correct mailbox. However, getting to the domain is hardly a direct process. Much like delivering ‘snail mail’, an email usually passes through many servers on its way to its final destination. This is not uncommon, as almost all data on the internet will pass through multiple servers on its way to and from its destination. With an email, one of the problems is that headers are sometimes modified on the way through. While in some cases this can be a good thing (e.g. providing a list of all the servers the email has passed through), in other cases it can be quite bad (altering a header for nefarious purposes). Likewise, the contents of an email can also, theoretically, be modified in transit.

One approach to mitigating this problem is to encrypt the email using PGP or S/MIME. However, this is not typically available through mainstream email services. (Gmail does, however, have a lab that allows verification of PGP signatures, which might be a step towards encrypted email.)

The notion that the email received might not be the same as the email sent raises a question of trust and credibility. How can one be certain that the email was not tampered with en route? One feature that has become increasingly common on large mail servers is to ‘sign’ the email with an encoded string. Essentially, a hash (a numeric representation, which differs with the smallest change to the input) is generated for the email body and some headers, and is then encrypted on the sending server. A public key is made available (as a DNS entry) to allow a receiving server to decrypt the hash. Only the sending server, though, can encrypt the hash, using its private key. The receiving server will then calculate the hash of the email body and headers, and if it matches the decrypted hash, is able to verify the integrity of the email.
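The hashing half of this scheme can be sketched with the Python standard library. This is only an illustration of how a hash reveals tampering: it leaves out the private-key signing step entirely, and it does not normalise the body the way real signing schemes do before hashing. The message text is invented.

```python
import base64
import hashlib


def body_hash(body: str) -> str:
    """Base64-encoded SHA-256 digest of a message body."""
    digest = hashlib.sha256(body.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


original = "Please pay invoice 1001.\r\n"
tampered = "Please pay invoice 1002.\r\n"   # a single character changed

sent_hash = body_hash(original)             # computed by the sending server
print(sent_hash == body_hash(original))     # True: body arrived unchanged
print(sent_hash == body_hash(tampered))     # False: body was altered
```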

There are two common implementations of this: DKIM (DomainKeys Identified Mail) and DomainKeys. In both of these implementations, the essential outcome is that the server, and by proxy the sender, is taking responsibility for the contents of the email (since its validity can be proven). DomainKeys is used by Yahoo; however, most new adopters favour DKIM. DKIM, which is based on DomainKeys, is used by Gmail and Yahoo, and is gaining popularity with other servers. One thing to keep in mind, though, is that these methods do not verify the recipient – it may be possible to alter the recipient without modifying the rest of the email, and have the email successfully validated.

A DKIM signature looks like the following:

DKIM-Signature: v=VERSION; a=HASHING_ALGORITHM; c=CANONICALIZATION_ALGORITHM; d=DOMAIN; s=SELECTOR; t=TIMESTAMP; bh=ENCODED_STRING; h=LIST_OF_HASHED_HEADERS; b=ENCODED_STRING

A DomainKeys signature looks like the following:

DomainKey-Signature: a=HASHING_ALGORITHM; q=QUERY_TYPE; c=CANONICALIZATION_ALGORITHM;
  s=SELECTOR; d=DOMAIN;
  h=LIST_OF_HASHED_HEADERS;
  b=ENCODED_STRING

The DKIM public key record, published in the DNS under the selector, can be retrieved with:

dig +short -ttxt SELECTOR._domainkey.DOMAIN
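Both signature headers are lists of tag=value pairs separated by semicolons, and the s and d tags together give the DNS name queried above. A minimal sketch of that parsing follows; the function names are hypothetical, and the domain, selector and encoded strings are placeholders rather than a real signature.

```python
def parse_tags(header_value: str) -> dict:
    """Split a 'tag=value; tag=value' list into a dict."""
    tags = {}
    for item in header_value.split(";"):
        name, sep, value = item.partition("=")
        if not sep:
            continue  # skip empty items, e.g. after a trailing semicolon
        # Values may be folded across lines, so drop all whitespace
        tags[name.strip()] = "".join(value.split())
    return tags


def key_record_name(tags: dict) -> str:
    """DNS name under which the signer's public key is published."""
    return f"{tags['s']}._domainkey.{tags['d']}"


# Hypothetical signature value; the bh and b strings are placeholders
SIGNATURE = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=mail2024;\r\n"
    "  h=from:to:subject; bh=cGxhY2Vob2xkZXI=;\r\n"
    "  b=c2lnbmF0dXJl\r\n  cGxhY2Vob2xkZXI="
)

tags = parse_tags(SIGNATURE)
print(tags["a"], tags["h"])
print(key_record_name(tags))
```

Splitting on the first `=` only matters here because base64 values can themselves end in `=` padding.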

In both cases, there are some additional tags that may be included, and some that may be omitted. Certain sites (e.g. Gmail and Yahoo) will identify a signed email. Gmail displays a ‘signed-by’ line if you ‘show details’ (for DKIM signed messages), while Yahoo places an icon (email with a key) beside the sender’s name (for DomainKeys signed messages).

The results of checking DKIM and DomainKeys signatures can be found in the ‘Authentication-Results’ header, which usually resembles the following:

Authentication-Results: MAIL_SERVER  from=SENDER_DOMAIN; domainkeys=DOMAINKEYS_RESULT (DOMAINKEYS_NOTES); dkim=DKIM_RESULT (DKIM_NOTES); spf=SPF_RESULT (SPF_NOTES) smtp.mail=SENDER_USERNAME
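To pull the individual results out of such a header, a simple pattern match is usually enough for a quick look. The sketch below is an assumption-laden shortcut rather than a full parser for the header; the sample value and server names are invented.

```python
import re

METHODS = ("dkim", "domainkeys", "spf")


def auth_results(value: str) -> dict:
    """Map each known check (dkim, domainkeys, spf) to its reported result."""
    pattern = r"\b(%s)=([a-z]+)" % "|".join(METHODS)
    return dict(re.findall(pattern, value.lower()))


# Hypothetical header value
SAMPLE = (
    "mx.example.org; spf=pass (sender IP is 192.0.2.1) "
    "smtp.mail=alice@example.com; dkim=pass (signature verified) "
    "header.d=example.com; domainkeys=neutral (no signature)"
)

results = auth_results(SAMPLE)
print(results)
```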

Verifying the sender

Since email headers are simply lines of text, they can easily be forged. It is possible to create an email specifying any sender (Return-Path header), regardless of whether the sender is on the domain of the mail server, or even exists. This is one way in which you can receive an email seemingly originating from your own email address, despite not having sent it.

Clearly this is a significant pathway for spam emails. Keeping in mind that any computer can be set up to send email (you just need to run a mail server, after all), being able to specify any sender is quite a problem.

One defence against this is to check which servers are allowed to send mail for a given domain. The idea is that each server has an (IP) address, and a list of accepted addresses can be specified in the DNS. In some cases, it is desirable to restrict the acceptable addresses to a single address (e.g. for a small website); in other cases a range of addresses may be required (e.g. a large site); and in some cases no restrictions will be set, since mail is handled for many, varying domains. Of course, some sites which should set restrictions do not, but that does not reduce the effectiveness of this method substantially.

These restrictions are called Sender Policy Framework (SPF) records. Typically the SPF record is not shown to the email recipient but, as it is a DNS TXT record, it can be obtained easily enough. For instance, Gmail’s SPF record can be viewed by running the following:

dig +short -ttxt _spf.google.com
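Once a record has been retrieved, checking an address against it is mostly a matter of comparing IP ranges. The sketch below handles only plain `ip4:` entries and ignores every other SPF mechanism (such as `include:`, `a`, `mx` and `redirect`), so it is a simplification. The record and addresses are made-up documentation values, not Gmail’s.

```python
import ipaddress


def ip4_networks(spf_record: str) -> list:
    """Collect the networks named by plain ip4: entries in an SPF record."""
    networks = []
    for term in spf_record.split():
        if term.startswith("ip4:"):
            # A bare address without a /prefix becomes a single-host network
            networks.append(ipaddress.ip_network(term[4:], strict=False))
    return networks


def is_listed(ip: str, spf_record: str) -> bool:
    """True if the address falls inside any ip4: range of the record."""
    address = ipaddress.ip_address(ip)
    return any(address in net for net in ip4_networks(spf_record))


# Hypothetical record
RECORD = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.7 ~all"

print(is_listed("192.0.2.55", RECORD))   # True: inside the /24 range
print(is_listed("203.0.113.9", RECORD))  # False: not listed
```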

Email servers that check SPF will typically add a new header (‘Received-SPF’) to the email, specifying the result of the SPF check and what was checked.

The structure of this header is as follows:

Received-SPF: RESULT (MAIL_SERVER: SPF_NOTES)

SPF_NOTES typically includes the sender’s domain and a message stating whether or not the sender’s address is permitted to send mail for that domain. For instance:

MAIL_SERVER: domain of SENDER_EMAIL designates IP_ADDRESS as permitted sender
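Following the structure shown above, the result and the notes can be separated with a small sketch like this one. The function name and sample value are hypothetical, and real headers may carry additional fields after the parenthesised notes, which this sketch ignores.

```python
import re


def parse_received_spf(value: str):
    """Return (result, mail_server, notes) from a Received-SPF value."""
    match = re.match(r"\s*(\w+)\s*(?:\(([^)]*)\))?", value)
    result = match.group(1).lower() if match else None
    comment = match.group(2) if match and match.group(2) else ""
    # The comment has the form 'MAIL_SERVER: SPF_NOTES'
    server, _, notes = comment.partition(": ")
    return result, server, notes


SAMPLE = ("pass (mx.example.org: domain of alice@example.com "
          "designates 192.0.2.1 as permitted sender)")

result, server, notes = parse_received_spf(SAMPLE)
print(result, server)
print(notes)
```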

Hotmail/MSN/Live.com use a variation on SPF called Sender ID, which accomplishes a similar task, albeit through a somewhat different implementation.

Common Spam filtering techniques

There are a few fairly common techniques implemented to flag spam, although the spam filtering implemented by large mail servers is becoming increasingly personalized. Some of these include:

Blacklists – a maintained list of domains/addresses which are known to have historically sent large volumes of spam
Whitelists – list of senders who are never to be flagged as spam (e.g. individuals in contact lists)
Algorithms – scripts (e.g. SpamAssassin) running on a server to calculate the probability of an email being spam, based on predetermined rules
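How the three techniques above might combine can be sketched as follows. The rules, weights and threshold are invented for illustration and are not taken from SpamAssassin or any real filter.

```python
# Each rule: (description, test on the message, score). All hypothetical.
RULES = [
    ("subject in all caps", lambda m: m["subject"].isupper(), 1.5),
    ("mentions 'free money'", lambda m: "free money" in m["body"].lower(), 2.5),
    ("many exclamation marks", lambda m: m["body"].count("!") >= 3, 1.0),
]
THRESHOLD = 3.0


def classify(message: dict, whitelist=(), blacklist=()):
    """Return ('spam' or 'ham', score) for a dict with from/subject/body."""
    sender = message["from"].lower()
    if sender in whitelist:
        return "ham", 0.0            # whitelisted senders are never flagged
    if sender in blacklist:
        return "spam", float("inf")  # blacklisted senders are always flagged
    score = sum(points for _, test, points in RULES if test(message))
    return ("spam" if score >= THRESHOLD else "ham"), score


junk = {"from": "promo@bulk.example", "subject": "WIN NOW",
        "body": "Free money!!!"}
note = {"from": "bob@example.org", "subject": "Lunch?",
        "body": "See you at noon."}

print(classify(junk))
print(classify(note))
print(classify(junk, whitelist={"promo@bulk.example"}))
```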

Different sites differ in their marking of spam emails. Some add additional headers (e.g. X-AntiAbuse, X-Spam, etc) to mark an email as spam, while others do not modify the headers. Yahoo will add a ‘bulk’ header (X-YahooFilteredBulk) if it finds that the originating IP is blacklisted.

Tracing an email

Each server through which an email passes should add a ‘Received’ header to the email. This allows the recipient to trace the email all the way from the sender. Each server typically writes its own domain and IP address, the server name/domain and IP address it received the email from, the protocol used (e.g. SMTP, ESMTP, HTTP, etc.) and the time. Given the originating server and the IP address of the sender, it should be possible to trace the complete path an email has taken. Additionally, each server domain should map to the IP address specified.
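Since each server adds its ‘Received’ header at the top of the message, reading them from the bottom up gives the path in the order travelled. A rough sketch follows; the host names and addresses are invented, and the pattern assumes the common ‘from HOST ([IP]) by HOST’ layout, which not every server follows.

```python
import re
from email import message_from_string

# Hypothetical message with two hops
RAW = """\
Received: from mail.sender.example (mail.sender.example [192.0.2.10])
\tby mx.recipient.example with ESMTP; Mon, 1 Jan 2024 10:00:05 +0000
Received: from laptop.local ([203.0.113.5])
\tby mail.sender.example with ESMTP; Mon, 1 Jan 2024 10:00:01 +0000
From: alice@sender.example
To: bob@recipient.example
Subject: Hi

Body
"""

HOP = re.compile(r"from (\S+).*?\[([0-9a-fA-F.:]+)\].*?by (\S+)")


def hops(raw: str) -> list:
    """Return (from_host, from_ip, by_host) tuples, earliest hop first."""
    msg = message_from_string(raw)
    path = []
    # The newest header comes first, so reverse to follow the journey
    for value in reversed(msg.get_all("Received", [])):
        flat = " ".join(value.split())  # undo header line folding
        match = HOP.search(flat)
        if match:
            path.append(match.groups())
    return path


for hop in hops(RAW):
    print(hop)
```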

To verify that the domain matches with the IP, you can run:

dig +short DOMAIN

Alternatively (on Windows), the output of both ping and nslookup will provide the IP address.
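The same check can be scripted. In this sketch the lookup function is passed in as a parameter, so the example can run without network access using a stand-in resolver; `domain_matches_ip` and `fake_resolver` are hypothetical names, and the default lookup is `socket.gethostbyname_ex` from the Python standard library.

```python
import socket


def domain_matches_ip(domain, claimed_ip, resolve=socket.gethostbyname_ex):
    """True if the domain resolves to the IP address named in the header."""
    try:
        _name, _aliases, addresses = resolve(domain)
    except OSError:
        return False  # lookup failed, so the claim cannot be confirmed
    return claimed_ip in addresses


def fake_resolver(domain):
    """Stand-in for a real DNS lookup, for demonstration only."""
    return (domain, [], ["192.0.2.10"])


print(domain_matches_ip("mail.sender.example", "192.0.2.10",
                        resolve=fake_resolver))   # True
print(domain_matches_ip("mail.sender.example", "203.0.113.5",
                        resolve=fake_resolver))   # False
```

Leaving out the `resolve` argument performs a real DNS lookup instead.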

Distinguishing between a breached account and spam

Consider a scenario where you have received an email from a known contact, but that contact did not send the email. The question arises as to whether the email is simply spam (forged headers), or whether your contact’s account was breached.

A quick glance at the From, Return-Path, and Sender headers, as well as at the Received headers, should identify whether the email originated from the domain found in your contact’s email address. For sender addresses on most large sites, you should be able to verify that SPF and DomainKeys/DKIM have passed by looking at the Authentication-Results header. If they have not, it is a fairly safe bet that this is simply a case of spam – someone obtained your contact’s email address and sent an email in their name – annoying, but reasonably harmless. Unfortunately, the odds are against this: while it is easy enough to obtain one email address, it is harder to link your contact’s address with yours. Certainly not impossible, but the probability isn’t great.
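The ‘quick glance’ at the sender headers can be approximated in a few lines. The sketch compares the domains in the From, Return-Path and Sender headers. A mismatch is only a hint, since some legitimate mail (mailing lists, for example) also uses different domains. The messages and function names are invented.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_domains(raw: str) -> dict:
    """Map each sender-related header that is present to its domain."""
    msg = message_from_string(raw)
    domains = {}
    for name in ("From", "Return-Path", "Sender"):
        value = msg.get(name)
        if value:
            address = parseaddr(value)[1]
            domains[name] = address.rpartition("@")[2].lower()
    return domains


def domains_disagree(raw: str) -> bool:
    """True if the sender headers name more than one domain."""
    return len(set(sender_domains(raw).values())) > 1


SUSPECT = """\
Return-Path: <bounce@bulk.example.net>
From: Alice <alice@example.com>
Subject: Hi

Body
"""
PLAIN = """\
Return-Path: <alice@example.com>
From: Alice <alice@example.com>
Subject: Hi

Body
"""

print(sender_domains(SUSPECT))
print(domains_disagree(SUSPECT), domains_disagree(PLAIN))
```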

On the other hand, it is quite possible that your contact’s email account was breached – someone gained unauthorized access to that account. Certainly, this could occur in a number of ways – a virus on your contact’s computer is a possibility, or simply an easy password that was used on other sites. Additionally, keep in mind that a breach of one of your email accounts can often lead to other accounts being breached (using the ‘Forgot your password’ links available on many sites).

In many cases, your contact could simply check their sent items, and might find the email there – that is a sure sign that the account was breached (however, since sent items can be deleted, not finding the email in the sent items doesn’t prove anything).

Some suggestions for dealing with a breached account

Check the sent items on the account. There is some chance that the emails are still present there. If they are, you might be lucky enough to find an IP address in the headers (typically ‘Received’) of the email, which you might be able to trace somewhat.
If the account is a Gmail account, check the activity log: at the bottom of the page there is a line that reads ‘Last account activity… Details’. Clicking Details should provide a listing of the last 10 accesses to the account. Check that the IP addresses seem similar (i.e. they belong to the account holder’s ISP), and if they are dissimilar, run a WHOIS on them (even dissimilar IP addresses might belong to the same ISP). If you find an access that you have reason to believe is unauthorized, note down the information (time, type, IP, etc.).
If you confirm (or suspect) a breached account:
1. Change the password to the account
  - Should be, at the very least, 8 characters, preferably around 10
  - Should not be a dictionary word, name, or other common string
  - Consider drawing a pattern on the keyboard or using ‘leet’ speak (i.e. 1337 speak) if you feel you will not remember a purely random password
2. Verify that no information (e.g. alternate email address, secret question, etc) has been changed
3. Try to determine how access occurred (e.g. is there a virus on your machine, do you use the same password somewhere else, etc.). There are a few likely scenarios:
  1. A virus (e.g. a keylogger) running on a computer used to access the account was able to capture the password
  2. A very weak/easy to guess password
  3. A password used on another, less than reputable site
  4. Cookies that were hijacked (a recent Firefox plugin has popularized this)
  5. Data was captured over an insecure connection (e.g. public computer)
4. Report it to the email provider (might be a challenge)
5. Compile a list of sent emails that you did not send and notify the individuals in question
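The password guidelines in step 1 can be turned into a simple check. This is a bare sketch: the list of common strings is a tiny made-up sample, whereas a real check would use a proper dictionary and a much larger list of known passwords.

```python
# Hypothetical, deliberately tiny sample of common strings
COMMON = {"password", "letmein", "qwerty", "12345678", "football"}


def password_problems(password: str, common=COMMON) -> list:
    """Return the guidelines from step 1 that the password fails."""
    problems = []
    if len(password) < 8:
        problems.append("shorter than 8 characters")
    if password.lower() in common:
        problems.append("common word or string")
    return problems


print(password_problems("qwerty"))
print(password_problems("t4ble-L4mp-9"))
```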

Just how much information is encoded into an email?

I recently received an email from someone using a Gmail account through Thunderbird. A quick look at the ‘original message’ showed a) that Thunderbird was used, b) the version of Thunderbird and Windows, c) the sender’s IP address (which easily provided the name of the ISP and the city of origin), d) the local time for the sender (and time zone), and e) that the message was sent over SSL.

A similar email sent through Yahoo’s webmail provides some information (IP address and the fact that the email was sent ‘via HTTP’), but no information about the operating system. Gmail’s webmail, however, does not include the sender’s IP address.

Moreover, for this particular email, the sender wrote the email in MS Word, and then pasted and sent it using Thunderbird. Due to the nature of the HTML created by MS Word, a brief scan over the body of the email revealed a few other bits of information: a) the sender has MS Word on their computer, b) the version of MS Word present, c) a temporary path, which includes the sender’s Windows username, and d) the language that was set in MS Word.
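Some of the header-level details mentioned above (the mail client and the sender’s time zone) can be read with the Python standard library. The message below is invented, the `User-Agent` and `X-Mailer` headers are only present when the sending client adds them, and the sketch assumes the Date header carries a numeric UTC offset.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message headers
RAW = """\
Date: Mon, 1 Jan 2024 09:30:00 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:115.0) Thunderbird/115.6.0
From: alice@example.com
Subject: Hi

Body
"""


def client_details(raw: str) -> dict:
    """Return the mail client string and the sender's UTC offset in hours."""
    msg = message_from_string(raw)
    sent = parsedate_to_datetime(msg["Date"])
    return {
        "client": msg.get("User-Agent") or msg.get("X-Mailer"),
        "utc_offset_hours": sent.utcoffset().total_seconds() / 3600,
    }


details = client_details(RAW)
print(details["client"])
print(details["utc_offset_hours"])
```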
