Gmail Retention and Your Privacy


By Google’s own count, more than 5 million companies now use Google Apps for Business. These include Fortune 500 companies, educational institutions, government bodies, and others. Each of these organizations will have multiple accounts with, potentially, thousands of users who frequently sign in to Google Mail.

I’m sure none of this is news to anyone. I’ve been a Gmail user since the closed beta started back in 2004 and will continue to use it for the foreseeable future. In my humble opinion it is the best email experience around.

At work we are often asked to collect webmail accounts for users embroiled in some legal matter or other. We are typically provided with the user’s credentials and then download the entire account using forensic software.

In the last couple of years we’ve seen this kind of request spike as more individuals and organizations move to a more central and (I hate to say it) cloud-based infrastructure.

While conducting some research on Gmail a few days ago I happened upon a feature with which I was unfamiliar: Google Apps Vault. I’m going to assume you haven’t heard of this either, so let me give you Google’s own explanation:

“Google Apps Vault is an add-on for Google Apps that lets you retain, archive, search, and export your organization’s email for your eDiscovery and compliance needs.”

This sounds incredibly simple. It allows an authorized administrator to search, filter, and export the mailbox of any user in the organization without the need for that user’s credentials. For companies such as ours this is wonderful. Users will often stonewall us when asked for credentials, making collection of mailboxes virtually impossible. Other times we are left waiting for hours, or even days, for the user to grant us access to their account. This can make for some very tight deadlines. It appears that Google Apps Vault can really make a difference to the way we conduct webmail collections.

The service costs $5 per user per month, but Google offers a 30-day trial.

As I was keen to try the service I immediately signed up for the trial for testing. Once signed up I was asked to visit ediscovery.google.com to use Vault.

The tool itself is simple to use. You can set up retention rules, create legal holds, set up various legal matters on a custodian-by-custodian basis and also generate reports. This is certainly eDiscovery made easy.

I decided to run some tests against my own hosted email account. I started a new matter and then searched for all email in my own account from the last week. I chose to export all of the results.

Google offered four files for download:

  1. CSV of the results
  2. XML file with the email metadata (including Gmail tags) for each message
  3. Zipped MBOX with the resultant messages
  4. MD5 file with hashes for the above three items
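Checking the downloads against the supplied hashes is a sensible first step in any collection. The following is a minimal sketch in Python, not Vault’s own tooling. The function names are my own, and because the exact layout of the MD5 file is not described here, the sketch assumes you have copied each expected hash out of that file by hand.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a large zipped MBOX does not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_export(expected):
    """Compare each file's computed MD5 with the expected value.

    `expected` maps a file path to the hex digest taken from the
    checksum file (hypothetical input; copy the values in yourself).
    Returns a dict of path -> True/False.
    """
    return {
        str(path): md5_of_file(path) == digest.strip().lower()
        for path, digest in expected.items()
    }
```

Calling `verify_export` with a dictionary of file paths and expected hashes returns one True or False per file, so a mismatch on any of the three downloads is easy to spot.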

I downloaded each of these and loaded the MBOX file into mboxview (http://mbox-viewer.sourceforge.net/) for review.
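If you would rather script the review than use a viewer, Python’s standard `mailbox` module can read an unzipped MBOX file. This is a rough sketch only: the function name and the file path you pass in are placeholders of mine, and it simply lists the Date and Subject headers of each message.

```python
import mailbox


def list_messages(mbox_path):
    """Return a list of (date, subject) header pairs from an MBOX file."""
    # create=False stops the module from creating an empty file
    # if the path is wrong.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            (str(msg.get("Date", "")), str(msg.get("Subject", "")))
            for msg in box
        ]
    finally:
        box.close()
```

The header values come back exactly as stored in the file, so nothing is interpreted or normalized at this stage.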

Here is where I started to see some items of interest. Bear in mind that this was all run retrospectively which means that I hadn’t set up any retention rules or placed a hold on my account. This was simply run “as is”.

The first thing that I noticed was that there were several items that I had deleted. These were not items that I had simply placed in the Gmail “Trash” but email messages that had been permanently (or so I thought) deleted. I’m not naive enough to think that once you delete something it is gone forever (I work in digital forensics after all), so I was not surprised to see these deleted items, especially from a tool such as this given its overall purpose. I did, however, find something that completely shocked me.

Last week I composed an email. I was trying to address a delicate situation, and finding the right words took several attempts. I wrote and rewrote the email several times before finally giving up and discussing the issue face to face. The message was never sent and was immediately discarded from the “Drafts” folder in my Gmail, so imagine my horror when the downloaded MBOX contained this message. Not only that, but nineteen different iterations of the email were saved and available for download from my account. Each iteration had a slightly different timestamp. As I reviewed them in sequence I could see where I had written something, then gone back and changed it upon review. So I had nineteen iterations of an email message that I had never sent or actively saved in my account.

Moreover I found that this was not only the case for unsent and discarded email. This was indeed the same for every email message that I had sent during that week. I could trace how each one of my emails was formed and edited before sending them on to their eventual destination. In one instance I found over 40 versions of the same email message.
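Counting iterations like these by hand gets tedious quickly. The sketch below shows one way to tally them from the exported MBOX. It assumes, and this is only an assumption, that every iteration of a message carries the same Subject header and a parseable Date header; I have not confirmed how Vault actually links the versions, so treat the grouping key as a stand-in.

```python
import mailbox
from collections import defaultdict
from datetime import timezone
from email.utils import parsedate_to_datetime


def group_iterations(mbox_path):
    """Group message timestamps by Subject, oldest first.

    Assumption: iterations of one email share a Subject header.
    Messages without a usable Date header are skipped.
    """
    groups = defaultdict(list)
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            subject = str(msg.get("Subject", "")).strip()
            raw_date = msg.get("Date")
            if raw_date is None:
                continue
            try:
                when = parsedate_to_datetime(str(raw_date))
            except (TypeError, ValueError):
                continue
            # Treat dates without a zone as UTC so all values compare cleanly.
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            groups[subject].append(when)
    finally:
        box.close()
    return {subject: sorted(dates) for subject, dates in groups.items()}
```

Any subject with more than one timestamp is a candidate for a message that was saved in several versions, and the sorted list gives the order in which those versions were written.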

Can you imagine a company executive writing an email message on a sensitive topic and then reviewing it several times before sending? Can someone really be held responsible for something they’ve written if those portions of the message were never actively saved or sent to any recipients? That argument is certainly for the courts to decide, but it should worry everyone else in the meantime. The potential consequences are quite staggering. Why does Google need to keep this data? Why does unpublished/deleted material need to be discoverable?

Now, I understand that Gmail for businesses is handled somewhat differently from the ‘standard’ Gmail account, but I have to wonder: if Google stores this data for business accounts, is there any reason for it not to store the data from standard, free accounts? What data exists on Gmail servers that you thought was long gone?


3 responses to “Gmail Retention and Your Privacy”

  1. Your post definitely makes one wonder what is out there that can be retrieved that you may have never thought possible. By the same token, I had read some time ago (I forget when or where exactly), that terror suspects were possibly writing instructions in cloud based email services, never sending them, but leaving them in drafts for their minions to read and act upon, and I suppose possibly delete later without a trace of it being sent or read. Is it possible that Google made those “discarded” drafts available for just that, or a similar reason?
