This article discusses the benefits of using backup and archive tapes as an evidential source in investigations, litigation and regulatory enquiries. The document outlines the difference between backup and archive data, why tape is used and the challenges it can pose to a modern IT department. It then discusses the benefits of using a non-native restoration solution, the reduction in cost that this delivers and its other uses.

Overview
Electronically Stored Information is becoming central to virtually all types of corporate investigation, be they litigation, regulatory or audit driven. The adoption of email as the standard communication and document distribution mechanism, coupled with its exponential growth, has increased focus on stored data like never before.

When conducting an investigation, where you look for that information can be critical to both the success and cost of the project. Accessing data from the ‘live’ computer systems is the most obvious place to start. However, where an investigation requires copies of emails and/or documents from the past – usually anything more than 3 months old – then live computers cease to provide good returns.

Where older emails and documents are required, then the best – and often only – place to look is on backup and archive tapes. These tapes are routinely used by virtually all businesses to retain archive copies of their critical business information – often for many years. They provide complete and regular snapshots of a business, with copies of all recorded communications and documents, over an extended timeframe.

Using conventional methods familiar to an IT department, tape can be difficult and expensive to process and is often not considered as an evidential source. This can be especially so in litigation where the perceived cost is often cited as being disproportionate to the value of the information likely to be uncovered.

However, there is an alternative way of accessing information stored on tape that can quickly and cost effectively produce the information being sought. Non-native restoration means that data can be restored directly from tape without needing to use the backup software that initially created the tapes. Furthermore, the non-native solution can remove the need to restore tapes in any sequence and allows individual tapes from a set to be restored.

This means that much of the cost and complexity of accessing data on tape is removed, making legacy and archive data much more attractive as an evidential source.

Tape use in the IT Infrastructure
Ever since the introduction of the delete button there has been a need to keep copies of data stored on electronic systems. Today, the need to keep copies of critical business data is more important than ever, with companies generating ever increasing amounts of information every day. In particular, the adoption of email as the de facto interpersonal communications tool has led to a massive increase in the number of documents being created and in the amount of electronic storage space needed to keep these.

The regulatory landscape is vastly different in the post-Enron era, sharply focussing the need to retain records and ensure that they can be easily accessed.

There are two fundamental reasons that a business keeps copies of critical data. Firstly, regular backups should be taken because a business needs to know that if some unplanned incident interrupts normal working, be it an accidental deletion of a file, a system crash, component failure or a catastrophic event, then they will quickly and efficiently be able to recommence operations using very recent copies of that critical data.

The other reason for taking copies of business data is to retain them, long-term, as archive copies for legal and regulatory purposes. Essentially, archive copies of data are full backups taken at set periods (weekly, monthly, quarterly etc.) which are then stored away.

In both of these cases, once the copy of the data is taken, the tapes are usually kept in a storage location away from the main area of operation. This could be a separate building or could be a different location completely. The reason for doing this in the case of backup data is to ensure that the copy is protected from damage if the main site is hit by a flood, fire or other invasive incident. In such a case the backup could quickly be retrieved and be used to get the business operational in a very short space of time. In the case of archive data, the tapes are moved to a separate site, both for protection and also because they can take up a considerable amount of space.

To retain either backup or archive data, a storage medium is required that is capable of holding high volumes of data, can be reused regularly over a long period, can easily be transported and is sufficiently low cost to allow multiple copies to be made and retained long term. Magnetic tape meets all of these requirements providing a highly durable, portable, low cost bulk storage medium.

In summary, tape is routinely used to retain copies of all business documents and communications that are stored long term and away from the main place of business.

What Types of Tapes Exist Within Your Organization?
Making a copy to tape of the business documents and information stored on computer uses three components. The actual configuration and complexity of these systems can vary greatly depending on the scale of the company and the volumes of data involved, but essentially these components are a tape, a tape drive and backup software.

The backup software is used as an interface between the computer itself and the tape drive and copies the information from the computer's disk to the tape in a form that makes it possible to read back – or restore – when required.

This process, with such straightforward roots, has been complicated in a way that probably only the IT industry can achieve.

As the market for backup and restore products started to grow, the IT industry spied a cash cow that it simply could not ignore and many different manufacturers brought out a stream of new and different formats of tapes, drives and backup software. As the volume of data that needed to be backed up also grew – especially as email gained a hold as a communication standard – the pace of new product development and release accelerated. Companies needed – and were actively encouraged by their IT suppliers – to reduce the time spent backing up data (the “backup window”) to limit the time data was unavailable due to this process.

Since many of these products were produced by companies in direct competition with each other there was virtually no commonality between them – and often a single company would release software that had no ability to use the data created using earlier versions.

In order to chase an ever narrower backup window, users would often switch software or tape technology regularly – with little or no consideration to the need to access the data stored on an ever growing pool of tapes stored off-site. When new backup technology – drives, tape or software – was employed, the old was often retired without thought.

The last 10 years or so have also seen an appetite for corporate mergers and acquisitions greater than in any previous period. This means that companies have regularly joined together and in the process have merged their archive and legacy data – almost exclusively stored on tape.

What all of this means is that there are millions of tapes stored by companies in archives that hold data that is still relevant and still within retention periods, making it absolutely relevant to current litigation and other investigations. These tapes have been created using a huge number of different and incompatible systems which have long since been retired by their owners – making vast swathes of this data pool inaccessible.

Additionally, since the creation of tapes and their movement off-site is a back-office function often carried out away from the main place of business then users are often completely unaware of their existence. This means that they will often take only sufficient steps to remove evidential traces from their own systems, oblivious to the fact that regular copies that can evidence their actions over time are being retained and stored for years.

Where an investigation needs to be covert, may involve a member of the IT staff or needs to be carried out without being general knowledge, then the fact that tapes are routinely stored off-site at a third party vault means they can easily be collected and investigated without anyone outside of the immediate investigation team ever knowing.

Where current or recent email is required, convention dictates that an image of the ‘live’ email system should be taken and interrogated, and this is a reasonable course. However, the most recent backup might be easier and more cost effective to restore. Moreover, if imaging is intrusive and causes too much disruption to the operation of the business, or the investigation needs to be covert or low key, then, as above, tape that has been stored for disaster recovery (DR) purposes presents an ideal alternative.

Accessing Data Off Tapes in Legal Scenarios
A typical scenario might be that the legal team in a large financial institution requests a single copy of emails sent or received by each of 10 individuals during the period 1st January 2002 to 31st December 2004. The only mails that are required are those that contain any of 35 words or phrases and the mails must be returned to legal within 20 days. All that is known is that the mail is in Exchange 5.5 and is spread across 3 servers which were backed up using a Veritas product and Arcserve, and that the tapes include DLT III, LTO and DDS formats. A total of 350 tapes exist across the required period.

This would pose multiple challenges to virtually every corporate IT department.
Do we have all of the skills necessary to carry this out in house?
Are the various drives available to process the tapes?
Can we make sufficient spare storage available to hold terabytes of extra data?
Can we get that volume of data off the tapes in a short space of time?
Do we have the software, licences and skills to rebuild the old Exchange?
How do we search through this volume of data to locate all of the mailboxes?
There will be huge numbers of duplicate mails, how do we cope?
How can I search for specific words and phrases?
Most importantly – where is the budget coming from?

Non-Native Data Restoration
The key to cost effectively accessing this evidence-rich media and of removing proportionality as a counter-argument lies in adopting an unconventional approach.

Rather than assuming that the only way of restoring from the tapes is to use the original – or native – environment, eMag Solutions have developed a Non-Native Restoration Engine that removes the need to employ any of the original hardware, software or computer system when restoring legacy data.

Over 20 years of development has delivered a solution that can restore directly from tapes with no consideration for the tape type, history or backup systems used. Additionally, this restore engine can selectively restore just files that contain emails or the document types in which you are interested.

Additional features mean that the engine can recognise the dates on which tapes were created. This quickly identifies which tapes are likely to hold the information you need, and those tapes can then be restored without needing to know their sequence or any additional system information. Since the data is being restored outside of its original environment, the need to use any set passwords is also removed.
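
To make the idea of narrowing a tape set by creation date concrete, the following is a minimal sketch only and not a description of how any particular product works. It assumes each tape's creation date has already been read, and the `Tape` record, the tape labels and the `grace_days` parameter are all hypothetical. The date range is taken from the scenario described earlier.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Tape:
    label: str     # hypothetical barcode or volume label
    created: date  # creation date read from the tape


def candidate_tapes(tapes, start, end, grace_days=0):
    """Return tapes created between start and end (plus a grace period), oldest first."""
    latest = end + timedelta(days=grace_days)
    hits = [t for t in tapes if start <= t.created <= latest]
    return sorted(hits, key=lambda t: t.created)


# Illustrative data only: three tapes, one created before the period of interest.
tapes = [
    Tape("DLT-0001", date(2001, 11, 30)),
    Tape("LTO-0042", date(2003, 6, 30)),
    Tape("DDS-0107", date(2005, 1, 7)),
]
selected = candidate_tapes(tapes, date(2002, 1, 1), date(2004, 12, 31), grace_days=14)
print([t.label for t in selected])  # ['LTO-0042', 'DDS-0107']
```

The grace period reflects the assumption that a backup taken shortly after the end of the period may still hold mail from within it; whether that holds depends on the backup schedule in use.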

The engine incorporates a sophisticated system that, as files are restored from the tapes, recognises if a particular file or email has already been restored and then removes the duplicate (retaining a record of its existence).
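
One common way to detect duplicates of this kind is to compare a hash of each item's content. The sketch below illustrates that general technique under stated assumptions; it is not the vendor's implementation, and the function name, source labels and sample data are hypothetical. It keeps the first copy of each item and logs where every later copy was found.

```python
import hashlib


def deduplicate(items):
    """items: iterable of (source, content_bytes) pairs.

    Returns (unique, duplicate_log), where duplicate_log records each
    discarded copy alongside the source of the copy that was kept.
    """
    seen = {}           # content digest -> source of the first copy
    unique = []
    duplicate_log = []  # (source of duplicate, source of retained copy)
    for source, content in items:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicate_log.append((source, seen[digest]))
        else:
            seen[digest] = source
            unique.append((source, content))
    return unique, duplicate_log


# Illustrative data only: the same message appears on two monthly tapes.
restored = [
    ("tape-2002-01/msg-17", b"Subject: Q3 forecast\n\nNumbers attached."),
    ("tape-2002-02/msg-17", b"Subject: Q3 forecast\n\nNumbers attached."),
    ("tape-2002-02/msg-18", b"Subject: Lunch\n\nNoon?"),
]
unique, duplicate_log = deduplicate(restored)
print(len(unique))      # 2
print(duplicate_log)    # [('tape-2002-02/msg-17', 'tape-2002-01/msg-17')]
```

Hashing raw bytes only catches exact copies. Real email often differs in transport headers between copies, so a practical system would probably normalise messages before hashing; that step is omitted here.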

Only emails from people in whom the investigation is interested need be retained and these can then quickly be searched for specific words, phrases or combinations of both. Furthermore this communication record can be presented pictorially to show patterns and volumes of email traffic to quickly profile the way a user communicates and to show irregular or unusual contacts.
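
As an illustration of filtering by person and then by search term, here is a minimal sketch. It assumes the emails have already been extracted into simple records; the `Email` class, the addresses and the search terms are all hypothetical, and matching is a plain case-insensitive whole-word search rather than anything more sophisticated.

```python
import re
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    recipients: list
    subject: str
    body: str


def build_matcher(terms):
    """Compile one case-insensitive pattern that matches any term as whole words.

    The word-boundary anchors assume each term starts and ends with a letter or digit.
    """
    pattern = "|".join(r"\b" + re.escape(t) + r"\b" for t in terms)
    return re.compile(pattern, re.IGNORECASE)


def responsive(emails, custodians, terms):
    """Keep emails sent or received by a custodian that contain any search term."""
    matcher = build_matcher(terms)
    wanted = {c.lower() for c in custodians}
    hits = []
    for m in emails:
        parties = {m.sender.lower(), *(r.lower() for r in m.recipients)}
        if parties & wanted and matcher.search(m.subject + "\n" + m.body):
            hits.append(m)
    return hits


# Illustrative data only.
emails = [
    Email("alice@example.com", ["bob@example.com"], "Q3 forecast",
          "Please keep the side letter off the books."),
    Email("carol@example.com", ["dave@example.com"], "Lunch",
          "Side letter? Never heard of it."),
    Email("bob@example.com", ["alice@example.com"], "Re: Q3 forecast",
          "Understood."),
]
hits = responsive(emails, ["alice@example.com"], ["side letter", "off the books"])
print([m.subject for m in hits])  # ['Q3 forecast']
```

Combinations of terms (for example, two phrases that must both appear) could be handled by running several matchers and intersecting the results.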

This means that you can receive for review just a DVD or disk with a single copy of each email or document which meets your very specific search criteria.

The time and cost that this saves can be massive, removing any financial argument that the other side might use to prevent this superb information source being used.

The Cost Benefits of Non-Native Restoration
Using non-native means of restoring data from tape can significantly reduce the cost of accessing archive data both in terms of the time it can save and the resource and skill required to deliver the required information. The areas where the use of non-native solutions can avoid cost include:

Software & Licences: The data being restored may have been created using software that has since been replaced or retired or may have been an older version of a product still in use. The user performing the restore may also opt to create a new system separate to the main ‘live’ IT environment to receive the data. In these cases, the user may have to purchase new software or additional software licences before this can be done.

Hardware: Similarly, the archive data may be on a type of tape that has been superseded and is no longer in use. In this instance the user would have to locate and purchase suitable drives – and enough of them to restore large volumes in a specific time. If the decision is made to create a separate restore system, then additional servers and systems may also have to be found or purchased.

Storage: There is likely to be a very large volume of data that has to be restored before any specific information can be found. It is unusual for a company to have this amount of spare storage capacity available, so it will normally have to purchase new capacity, which is likely to be unused once the restore project is complete.

Systems: Using conventional methods, the environment in which the data was created will have to be rebuilt. This means that an appropriate (for example) Exchange or Notes environment will have to be created to receive the data. It is also probable that new software or utilities will have to be purchased to assist with things like the selection of mail for certain people, removal of duplicate mails and documents and the selection of mails that contain key words and phrases.

Skills: The restoration of archive data and the extraction of specific information for use in possible legal proceedings require a skill set not usually present in corporate IT departments. Additionally, it is likely that the continual upgrade process that companies engage in will have seen changes to the hardware, software and operating environments in use. All of this means that additional skills will have to be acquired or brought in before a restore project can commence.

Time: When a request is made to access electronically stored information and produce selected emails or documents it is likely that the results will be needed within a defined timeframe. To identify the data sources, build a suitable restore system, recreate the original environment, restore the data, search through the results and identify the responsive data will take significantly longer than the time allowed. The cost to the client of that time and of missing deadlines will also be significant.

Other Uses for Non-Native Restoration
Non-native restoration provides a means to liberate data that would otherwise be stored on tape in an inaccessible format, making the stored information available. This document has concentrated on the use of such information in litigation and regulatory enquiries, but rapid and cost effective access to this media has a number of other key business benefits.

Internal & External Audit: It is now routine for Audit Teams to fully embrace the availability of retained data within their remit. They will now insist on seeing not only retention policy information, but also evidence that all data retained by an organisation can be accessed on demand and that these access methods are regularly exercised. Using a conventional approach, this means that systems capable of restoring from all tapes held within an organisation need to be retained and maintained along with the necessary hardware, software, licences and skills. This will include all tapes on-site, in storage and those that have been inherited via M&A activity.

The non-native solution from eMag not only provides immediate access to data on virtually any tape, it also replaces the fixed overheads of maintaining legacy access with an operating cost incurred only when needed.

Backup Software Replacement or Retirement: When an organisation changes the corporate backup system, or inherits one via M&A, it is often difficult or impossible to consolidate the existing archive data into the new system. As a result, licences for the older or retired system may need to be retained along with catalogue and backup servers etc. to ensure that access is retained in case the data is ever needed.

The non-native solution from eMag overcomes the problems normally associated with a change in backup software. Not only does the system provide immediate access to data backed up in the older or retired format, it also provides the ability to interrogate and adopt management of the catalogue of that system. This means that all aspects of that retired system can be removed.

E-Mail Vault Population: An increasingly popular method of helping ensure compliance of corporate email systems is to implement an email vault. These help retain all future email messages in line with retention policies and also remove some of the storage overheads associated with email. However, what these systems cannot do is to apply those policies to emails that already exist or that are held in archive.

eMag provide the means to directly access email held on archive tapes and then produce a copy of each unique email – removing duplicates – and present this for ingestion into the email vault of choice.