This article provides practical advice on European electronic discovery projects for litigation support and IT professionals.
Collection
From a technical perspective, there is little difference between collecting electronic documents in the U.S. and in the EU. Products such as EnCase® can be used to create either a physical (bit-by-bit) or logical (file-by-file) collection of a hard drive. The primary differences are typically cultural. In some EU countries, a representative from the client’s HR department, the local Works Council, the government, or all three may be present while data collection is performed.
In many EU countries, the collection schedule must be planned around normal business hours. Holiday, evening and weekend access typically must be scheduled far in advance, and may not be permitted at all depending on the country. In practice, this limits data collection to about six useful hours per day. Improving collection efficiency therefore starts early in the planning process: a project manager from the legal team must work with the client to identify all IT staff who may be needed to support the collection effort.
One of the issues with EU electronic discovery is how non-English characters are interpreted by processing and review applications. Computer terms have a way of glazing over the eyes of legal professionals, but three of them, ASCII, ANSI and Unicode, are important to understand: they are character encodings, the methods by which computers translate human-readable characters into the numbers they store.
Prior to Unicode, data was stored and displayed using either ASCII or ANSI. ASCII, a 7-bit code defining 128 characters, was used on the first IBM PCs as the standard for English character sets. ASCII could be extended to support non-English characters, but there was no standard for doing so, so the same stored value could represent different characters on different computers. This made it very difficult to send electronic documents internationally. ANSI (the name commonly given to the Windows code pages, such as Windows-1252 for Western European languages) later provided standard extensions of the ASCII character set for non-English characters. By installing the appropriate language pack on a computer, ANSI made international electronic document transfer possible, provided the sender and recipient used the same code page.
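To see why unlabeled 8-bit text was so hard to exchange, consider a single byte. The minimal sketch below, using only Python's built-in codecs, decodes the same byte value (0xE4, chosen arbitrarily) under three legacy code pages: `cp437` (the original IBM PC character set), `cp1252` (the Western European Windows "ANSI" code page) and `cp1251` (the Cyrillic Windows code page). The choice of byte and code pages is an illustrative assumption, not something drawn from any particular matter.

```python
# The same stored byte means different things depending on which legacy
# code page the reading program assumes. 0xE4 is an arbitrary example.
RAW = bytes([0xE4])

def decode_all(raw: bytes, code_pages: list[str]) -> dict[str, str]:
    """Decode the same raw bytes under each named code page."""
    return {cp: raw.decode(cp) for cp in code_pages}

if __name__ == "__main__":
    for cp, text in decode_all(RAW, ["cp437", "cp1252", "cp1251"]).items():
        print(f"{cp}: {text}")   # cp437: Σ, cp1252: ä, cp1251: д
    try:
        RAW.decode("ascii")      # 7-bit ASCII defines only 0x00-0x7F
    except UnicodeDecodeError as err:
        print("ascii: no character for this byte:", err.reason)
```

A German "ä" saved on a Western European machine would therefore appear as a Cyrillic "д" if the reader's machine assumed the Cyrillic code page, which is exactly the ambiguity Unicode was designed to remove.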
Unicode is an open standard managed by the Unicode Consortium, a non-profit organization working to provide a universal numeric representation for every character of every language on the planet. For example, Unicode allows a person whose default computer language is English to receive an e-mail written in German and see the German characters displayed correctly. Here are a few obscure, and not so obscure, items to know about Unicode:
Microsoft Windows NT, Windows XP and Vista support Unicode
Microsoft Windows 95, 98 and ME had limited Unicode support
Microsoft Office 97: Microsoft Word, Excel and PowerPoint supported Unicode; Access and Outlook did not
Microsoft Office 2002/XP added Unicode support for Access and within message bodies in Outlook
Microsoft Office 2003 provided full Unicode support in Outlook 2003 as long as the user was on a Microsoft Exchange 2000 e-mail server
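To make Unicode's "universal numeric representation" concrete: every character has exactly one code point, written U+ followed by a hexadecimal number, and encodings such as UTF-8 or UTF-16 (the form Windows uses internally) store that number as bytes. The sketch below uses only Python's built-in `str` and `bytes` types to print the code point and both byte encodings for a few German, Greek and Russian characters; the sample characters are arbitrary.

```python
# Unicode gives every character one code point; UTF-8 and UTF-16 are two
# ways of storing that number as bytes.
def describe(text: str) -> list[tuple[str, str, bytes, bytes]]:
    """Return (character, code point, UTF-8 bytes, UTF-16LE bytes) per character."""
    return [
        (ch, f"U+{ord(ch):04X}", ch.encode("utf-8"), ch.encode("utf-16-le"))
        for ch in text
    ]

if __name__ == "__main__":
    for ch, code_point, utf8, utf16 in describe("üßΣд"):
        print(ch, code_point, utf8.hex(" "), utf16.hex(" "))
    # Mixed German, Greek and Russian text survives a UTF-8 round trip intact.
    sample = "Grüße aus Αθήνα und Москва"
    assert sample.encode("utf-8").decode("utf-8") == sample
```

For instance, "ü" is U+00FC, stored as C3 BC in UTF-8 and FC 00 in UTF-16LE. Unlike the legacy code pages, the code point is the same on every machine, so no guessing is needed.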
So what does this mean to the litigation professional? First, the technology used to process non-English electronic documents must be able to ingest documents created under the ASCII, ANSI and Unicode standards, correctly extract their text and metadata, and store this information in a Unicode-compliant database. Second, the indexing and query engine must support Unicode in order to execute searches. Finally, the review application must be able to store and display Unicode characters.
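Unicode support in the index involves one more subtlety. Unicode allows some characters to be written in more than one way: "ü" can be a single precomposed code point (U+00FC) or a plain "u" followed by a combining diaeresis (U+0308). Both display identically, but a naive comparison treats them as different strings, so a keyword search could miss a document. A common remedy, assumed here rather than taken from any particular review platform, is to normalize both the indexed text and the query to the same form (NFC) with Python's standard `unicodedata` module. The function names below are hypothetical.

```python
import unicodedata

def search_key(text: str) -> str:
    """Caseless, NFC-normalized form used for both indexing and querying."""
    return unicodedata.normalize("NFC", text.casefold())

def contains_term(document: str, term: str) -> bool:
    """True if term occurs in document once both are normalized."""
    return search_key(term) in search_key(document)

if __name__ == "__main__":
    composed = "Pr\u00fcfung"       # 'ü' stored as one code point, U+00FC
    decomposed = "Pru\u0308fung"    # 'u' followed by COMBINING DIAERESIS
    print(composed == decomposed)               # False: the raw strings differ
    print(contains_term(decomposed, composed))  # True after normalization
```

Because `str.casefold()` also maps "ß" to "ss", a query for GRÜSSE matches a document containing Grüße under this scheme.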
Because EU business routinely crosses borders, it is common to find multiple languages within a single e-mail message as a thread is replied to and forwarded. The review team therefore needs to identify documents by primary language in order to assign them to the correct reviewer. Most technologies that identify languages within documents use one of the following methods.
For speed, the technology scans a limited number of lines in the document and assigns a single language code to the whole document. For better accuracy, the technology identifies a dominant, or primary, language: the language that accounts for the greatest percentage of the text in the document. Ideally, the technology should also identify all languages found within a document and store them in a searchable field.
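The difference between the fast approach, the primary-language approach and a full language list can be illustrated with a deliberately simplified identifier. The sketch below guesses each line's language by counting common function words ("stopwords") and then reports the dominant language (measured here by number of lines, a simplification of "percentage of use") plus every language detected. The tiny word lists, the scoring rule and the function names are hypothetical; commercial tools typically use statistical models, and nothing here reflects any particular product.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical, deliberately tiny stopword lists, for illustration only.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "we", "please"},
    "de": {"der", "die", "und", "ist", "nicht", "wir", "bitte"},
    "fr": {"le", "la", "et", "est", "pas", "nous", "merci"},
}

def line_language(line: str) -> Optional[str]:
    """Guess one line's language from which stopword list it hits most."""
    words = re.findall(r"\w+", line.lower())
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def language_profile(document: str, max_lines: Optional[int] = None):
    """Return (primary language, sorted list of all languages found).

    max_lines limits how many lines are scanned (the fast approach);
    None scans every line (the more accurate approach).
    """
    lines = document.splitlines()[:max_lines]
    counts = Counter(lang for lang in map(line_language, lines) if lang)
    primary = counts.most_common(1)[0][0] if counts else None
    return primary, sorted(counts)
```

For a thread whose first line is an English request and whose next two lines are German replies, `language_profile(thread)` returns `("de", ["de", "en"])`, while `language_profile(thread, max_lines=1)` returns `("en", ["en"])`: the fast scan both mislabels the primary language and misses German entirely.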
Searching
Diligence must be applied when designing and executing keyword searches. Besides the need for vetting by someone fluent in the language, a number of details can make a significant difference in accuracy. As in English, alternative spellings of basic keywords should be tested. For example, in German, test terms with “ss” in addition to “ß” or “ue” in addition to “ü.” In languages such as French, test alternatives to characters such as “à” or “è” by using “a” or “e.” Because of spelling differences, American English speakers must design British English keywords with the same diligence as German or French ones. A British English document might state “I recognise the theatre by the harbour,” while the same sentence in American English would read “I recognize the theater by the harbor.”
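The character substitutions described above are mechanical enough to generate automatically before a fluent reviewer vets the list. The sketch below is one way to do it with Python's standard `unicodedata` module, assuming a hypothetical German substitution table and stripping combining accent marks for French-style variants. It does not cover British/American spelling differences, which need a word list rather than character rules.

```python
import unicodedata

# Hypothetical substitution table for German transliterations; extend per language.
GERMAN_ALTERNATES = {"ß": "ss", "ü": "ue", "ö": "oe", "ä": "ae",
                     "Ü": "Ue", "Ö": "Oe", "Ä": "Ae"}

def strip_accents(term: str) -> str:
    """Drop combining marks, e.g. 'à' -> 'a', 'è' -> 'e' (ß is unaffected)."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def german_transliterate(term: str) -> str:
    """Apply conventional German substitutions, e.g. 'Grüße' -> 'Gruesse'."""
    return "".join(GERMAN_ALTERNATES.get(ch, ch) for ch in term)

def keyword_variants(term: str) -> list[str]:
    """Return the term plus its transliterated and accent-free forms."""
    term = unicodedata.normalize("NFC", term)
    variants = [term, german_transliterate(term), strip_accents(term)]
    return list(dict.fromkeys(variants))  # drop duplicates, keep order
```

For example, `keyword_variants("Grüße")` yields Grüße, Gruesse and Gruße, and `keyword_variants("théâtre")` yields théâtre and theatre. The generated list is a starting point for the fluent reviewer, not a replacement.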
It can be confusing how dates and times are handled by a computer. October 12, 1954 can be displayed on a U.S. English computer as 10/12/54, but on a Spanish computer in the European format of 12/10/54. Date and time filtering on metadata, such as the sent date and time of an e-mail, is fairly simple, because the processing technology should standardize all date and time metadata to GMT (Greenwich Mean Time). Searching for dates within the body of a document, however, may require that both date formats be used to ensure documents are not missed.
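Both halves of this advice can be sketched with Python's standard `datetime` module: generating month-first and day-first renderings of one date for body-text searches, and converting a timezone-aware timestamp to UTC (interchangeable with GMT for this purpose) before metadata filtering. The list of formats and the function names are assumptions for illustration; the formats that matter in a real matter are the ones actually found in the data.

```python
from datetime import date, datetime, timedelta, timezone

# Illustrative formats only; real matters need the formats actually found in the data.
BODY_FORMATS = ["%m/%d/%y", "%d/%m/%y", "%m/%d/%Y", "%d/%m/%Y", "%d.%m.%Y"]

def body_date_variants(d: date) -> list[str]:
    """Render one date in U.S. (month-first) and European (day-first) styles."""
    return list(dict.fromkeys(d.strftime(f) for f in BODY_FORMATS))

def to_utc(timestamp: datetime) -> datetime:
    """Convert a timezone-aware metadata timestamp to UTC (GMT)."""
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must carry its original UTC offset")
    return timestamp.astimezone(timezone.utc)

if __name__ == "__main__":
    print(body_date_variants(date(1954, 10, 12)))
    # A message sent at 14:30 Central European Time (UTC+1).
    sent_cet = datetime(2008, 3, 5, 14, 30, tzinfo=timezone(timedelta(hours=1)))
    print(to_utc(sent_cet))
```

`body_date_variants(date(1954, 10, 12))` produces 10/12/54, 12/10/54, 10/12/1954, 12/10/1954 and 12.10.1954. Note that 12/10/54 will also match U.S. documents referring to December 10, 1954, so date hits in document bodies still need review.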
Designing searches for currency values sometimes requires a great deal of patience and iterative testing. Some currency identifiers are simple to find in searches, such as USD, $, GBP, £, EUR or €. However, French francs are typically identified with the letter “F,” as in 131,51 F, and Danish kroner with the letters “kr,” as in kr 131,51. Searching for the characters “F” and “kr” alone will yield an unusable number of false positives. (The French franc was replaced by the euro in 2002.)
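One way to tame the false positives is to search for the currency marker only when it sits next to a properly formatted amount. If the search engine supports regular expressions (an assumption; many review platforms offer only keyword and proximity syntax), patterns like the hypothetical ones below, written with Python's standard `re` module, require European-style numbers (period or space thousands separators, comma decimals) adjacent to "F" or "kr".

```python
import re

# A European-style amount: 1-3 digits, optional groups of three separated by
# a period or space, then a comma and two decimal places (e.g. 1.250,00).
AMOUNT = r"\d{1,3}(?:[. ]\d{3})*,\d{2}"

# Amount followed by "F" (French francs), e.g. "131,51 F".
FRENCH_FRANCS = re.compile(rf"\b{AMOUNT}\s?F\b")
# "kr" or "kr." followed by an amount (Danish kroner), e.g. "kr 131,51".
DANISH_KRONER = re.compile(rf"\bkr\.?\s?{AMOUNT}\b")

def find_amounts(text: str) -> dict[str, list[str]]:
    """Return currency amounts found in text, keyed by currency code."""
    return {
        "FRF": FRENCH_FRANCS.findall(text),
        "DKK": DANISH_KRONER.findall(text),
    }
```

On the text "Total: 131,51 F. Voir kr 1.250,00 og kr. 99,95. Kristian, kr alone, Form F-12.", this returns 131,51 F as francs and kr 1.250,00 and kr. 99,95 as kroner, while ignoring "Kristian", the bare "kr" and "F-12". Any such pattern should be tested iteratively against a sample of the actual data.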
Also, did you notice that the currency values in these examples used a comma as the decimal separator? Most currencies use the same decimal and thousands separators as the numbers in the country, but this is not always true. In some parts of Switzerland, a period is used as the decimal separator for Swiss francs (Sfr. 131.51), while a comma is used as the decimal separator everywhere else (131,51).
Searching for specific numeric values is slightly less complicated than searching for currency, but still requires fortitude. Consider the simple thousands separator. In the U.S., this character is a comma (,); in Germany, it is a period (.). Thus one thousand and one is displayed as 1,001 in the U.S. and 1.001 in Germany. In Sweden, the thousands separator is a space, so the same number is displayed as 1 001.
The decimal separator is equally interesting. In the U.S., this character is a period (.), while in Germany, it is a comma (,). Thus one thousand one and eight tenths is displayed as 1,001.8 in the U.S. and 1.001,8 in Germany.
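When building searches for a specific number, it helps to generate every regional rendering explicitly rather than rely on the reviewer's memory. The sketch below hard-codes the separator conventions described above (plus Sweden's decimal comma), so it does not depend on which locales are installed on the machine running it; the `CONVENTIONS` table and function names are hypothetical and would need an entry for each country in the matter.

```python
# (thousands separator, decimal separator) per convention.
CONVENTIONS = {
    "US": (",", "."),
    "DE": (".", ","),
    "SE": (" ", ","),
}

def render(value: float, decimals: int, thousands: str, decimal: str) -> str:
    """Format a number with explicit separators, independent of the OS locale."""
    us_style = f"{value:,.{decimals}f}"  # e.g. '1,001.8'
    # Swap via a placeholder so the two replacements do not collide.
    return (us_style.replace(",", "\x00")
                    .replace(".", decimal)
                    .replace("\x00", thousands))

def number_variants(value: float, decimals: int = 0) -> dict[str, str]:
    """Render one value under every convention in CONVENTIONS."""
    return {name: render(value, decimals, t, d)
            for name, (t, d) in CONVENTIONS.items()}
```

`number_variants(1001.8, 1)` returns 1,001.8 (US), 1.001,8 (DE) and 1 001,8 (SE). Documents may also contain a non-breaking space (U+00A0) instead of an ordinary space as the thousands separator, so a search for the Swedish form may need both.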
Processing and Review
In some situations, the court in the local EU country will only allow potentially responsive, relevant documents to be transferred to the U.S. The court may require that processing and document review be conducted either within the country or at least within the EU. This may also be part of a consent agreement, whereby the employee agrees to the collection only as long as items agreed upon as “private” are removed prior to transfer to the U.S. This situation is typically managed with a hosted solution using reviewers in the EU, or with a war room set up within the country.
For small numbers of custodians, it can be efficient to perform first-pass review for personal information/privacy, relevance and privilege within the country using local, native-speaking attorneys. This ensures that cultural nuances of language usage, such as slang, are recognized. When first-pass review is performed onsite, an attorney with the authority to make immediate decisions regarding document coding must be present. To improve the defensibility of the onsite activities, maintain complete and precise documentation of how onsite processing, searching and review were performed.
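Where only responsive, non-private documents may leave the country, the selection and its documentation can be produced in a single step. The sketch below assumes a hypothetical export of review coding with `responsive` and `private` flags per document (real review platforms name and structure these fields differently). Alongside the list of documents cleared for transfer, it records how many were withheld and why, with a UTC timestamp, as a summary that could be kept with the onsite documentation.

```python
import json
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Doc:
    doc_id: str
    responsive: bool  # hypothetical review coding fields
    private: bool

def select_for_transfer(docs: list[Doc]) -> tuple[list[str], dict]:
    """Keep responsive, non-private documents; count why the rest stay behind."""
    reasons = Counter()
    keep = []
    for d in docs:
        if d.private:
            reasons["private"] += 1
        elif not d.responsive:
            reasons["not_responsive"] += 1
        else:
            keep.append(d.doc_id)
    log = {
        "run_at_utc": datetime.now(timezone.utc).isoformat(),
        "reviewed": len(docs),
        "transferred": len(keep),
        "withheld": dict(reasons),
    }
    return keep, log

if __name__ == "__main__":
    docs = [Doc("D1", True, False), Doc("D2", True, True), Doc("D3", False, False)]
    keep, log = select_for_transfer(docs)
    print(keep)
    print(json.dumps(log, indent=2))
```

Private status is checked first so that a document coded both responsive and private is always withheld, matching the consent condition that private items be removed before transfer.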
Bringing Data Back to the U.S.
Unless the amount of data is very small, the data will need to be either physically shipped via a reliable carrier or personally couriered. While not quite as safe as a courier, a shipping carrier can often be used to physically ship data from the EU to the U.S. Although it is uncommon, packages can be lost, or delayed for short periods in U.S. Customs. If time is not on your side, a courier must fly from the EU to the U.S.
In a pre-9/11 world, the primary concern when physically transporting data collected in Europe was the risk of damaged media. The new additional risk is seizure. Among the many tools provided to the Department of Homeland Security to combat terrorism is the power of U.S. Immigration and Customs Enforcement (ICE) to seize and search computers and documents transported into the U.S. A number of recent news stories have highlighted travelers having their computers seized on entry into the U.S.
There are two primary steps to reduce transportation risk. First, ensure that at least one additional set of the media remains within the EU at the office of your firm, local counsel or vendor. Second, ensure that the media, whether shipped or carried by a courier, is accompanied by documentation from the law firm.
Translation
Machine language translation is often useful to help English-only members of the U.S. legal team with substantive issue coding. A limited number of vendors can perform machine language translation of acceptable quality for document review. An even smaller number of vendors have document review applications that allow the reviewer to submit a block of text for just-in-time translation. Regardless of the approach used, machine language translation is a literal translation with accuracy similar to the early days of OCR.
The benefit of using machine language translation is that it can significantly reduce the number of exceptions that must be reviewed. Regardless of which method the legal team uses, have a vendor process a sample of data from your matter, and have the review team evaluate it for acceptability, before committing to any one solution.
Documents that will be presented in depositions or as exhibits in court should be translated by translators who are certified by the American Translators Association and are also court certified. This provides the highest quality of translation, ensuring that language nuances such as idioms are translated as they are used rather than through a clumsy and inaccurate literal translation.
This article is only a brief summary of EU electronic discovery. In any matter involving international electronic discovery, legal teams should retain local counsel. If the U.S. legal team does not have experience with EU electronic discovery, a consultant or vendor with expertise and references in this arena may be critical to executing a defensible and efficient discovery project.