Mobile Device Forensic Collections

Mobile devices are everywhere. Within the last decade or so, smartphones have become not only a staple of our society but arguably a necessity. There are two major players in the smartphone market: Android and Apple. Android devices hold the majority share, followed by Apple. Other platforms, such as Windows phones and BlackBerry, make up a very small share of the total.

Forensic tools used for mobile device collections include Cellebrite, EnCase, Axiom, and Oxygen. Preserving data from a mobile device can be very different from preserving data from a computer, and forensic software and hardware companies such as Cellebrite specialize in tools for mobile device forensic collection and analysis.

There are a few different types of collections you can perform on mobile devices. Physical collections are bit-by-bit preservations of the device, including the unused space. Logical collections preserve only the user data and not the free space of the phone. File system collections preserve just the file system; they are very similar to logical collections but may pull some different data. The last resort is a manual collection, which is often used when a forensic tool can’t gain access to the device because of its age or operating system restrictions. A manual collection consists of going through the phone by hand and literally taking pictures of each screen. It is not very efficient, but it works when nothing else does.

The data you can pull from a mobile device can be incredibly beneficial to a case. Some of the most sought-after data sets include messages, call logs, contacts, and photos, but a smartphone can yield much more, including location data, calendar events, notes, and website or application credentials. As such, mobile device forensic collections are becoming a staple in litigation discovery. To learn more about how Lexbe can assist with your next Mobile Device Forensic Collection, contact sales@lexbe.com.

Legal Departments Are Increasingly Relying on AI in eDiscovery

The 12th annual Law Department Operations (LDO) survey found that effective use of technology in law departments has been slow to accelerate. Artificial intelligence has been growing rapidly in several markets but is only beginning to reach the legal market, where its use is expected to keep growing. In general, AI has the potential to introduce new sources of growth, changing how work is done and reinforcing the role of people in driving business growth. As Reese Arrowsmith, vice president, head of legal operations for Campbell Soup Co., states, “I do see AI and other technology advances having an impact on the way lawyers work. New technology will augment ways of working and may impact who completes legal work. I also see the legal industry utilizing data analytics to make decisions and to predict outcomes.”

One survey revealed that nearly half of respondents said their law department does not use AI at all. When law departments do use AI, it is most likely for e-discovery and document review: 27% of respondents sometimes use it in these areas, and 7% always use it.

To download the full report, click here

Understanding your eDiscovery Index and how it finds (or misses) evidence

How your eDiscovery platform parses and organizes your electronically stored evidence can be the difference between finding or missing that smoking gun. Or worse, unwittingly handing a smoking gun to opposing counsel. Pulling back the curtain on how an eDiscovery platform ingests electronically stored documents and makes the text within documents searchable reveals hidden places where evidence may be hiding. This article explains indexing and breaks down the types of search indexes used in eDiscovery software platforms, discusses the pros and cons of each, and offers solutions to ensure that you never miss crucial evidence.

Indexing occurs during the upload of your documents to your eDiscovery review platform. A number of processes run that separate and organize your data. The text, in particular, is extracted from your documents and filtered into a database or index. When you enter a search query, your software does not review each document looking for the word; that could take hours or days. Rather, your software refers to the index (just as you would in a textbook) to quickly pull the relevant documents for your review. The process by which the text is extracted from your documents and placed into that index is critical to the quality of search results.
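To make the textbook analogy concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration of the general idea, not how any particular eDiscovery platform is built; the document IDs, sample text, and function names are all hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, term):
    """Look the term up in the index instead of rescanning every document."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical extracted text, keyed by made-up document IDs.
documents = {
    "DOC-001": "Load growth model for the November forecast",
    "DOC-002": "Budget review meeting notes",
    "DOC-003": "November budget summary",
}
index = build_index(documents)
print(search(index, "november"))  # documents containing "november"
print(search(index, "budget"))    # documents containing "budget"
```

The index is built once at upload time, so each query is a single lookup. The flip side is that any text the extraction step never saw is simply absent from the index and cannot be found by search.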

There are two basic types of index used in eDiscovery software platforms: an OCR index and a text-based (also called native extraction) index.

OCR stands for Optical Character Recognition. In this process, your electronically stored documents are either scanned images to begin with or are converted from a native document to an image through a virtual print driver. Specialized OCR software then recognizes alphanumeric text patterns on that image. For example, an uploaded Word doc would be “printed” within the software engine, and the text that appears on that virtual print would be lifted off the page and indexed.

Text-based indexing is also called native extraction indexing because, instead of processing the document as a printed page, it looks at all of the underlying code and data within the document. Where OCR sees the document as a print, text-based indexing lifts the hood and extracts all of the computer-embedded text in a file, including data that you do not see on the page, such as comments.
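As a rough sketch of what “lifting the hood” can mean, the Python example below reads text directly out of a file's internal parts. A .docx file is a ZIP package in which the body text lives in word/document.xml and reviewer comments live in word/comments.xml. The package built here is a stripped-down stand-in with made-up content, not a valid Word file, and the function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the XML parts inside a .docx package.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def make_sample_package():
    """Build a minimal stand-in for a .docx in memory (not a valid Word file)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        "<w:t>Quarterly forecast</w:t>"
        "</w:r></w:p></w:body></w:document>"
    )
    comments = (
        f'<w:comments xmlns:w="{W_NS}"><w:comment><w:p><w:r>'
        "<w:t>Check the November data</w:t>"
        "</w:r></w:p></w:comment></w:comments>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", body)
        package.writestr("word/comments.xml", comments)
    return buffer.getvalue()

def extract_text(package_bytes, part_name):
    """Return the text runs (w:t elements) stored in one XML part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read(part_name))
    return [node.text for node in root.iter(f"{{{W_NS}}}t")]

package_bytes = make_sample_package()
print(extract_text(package_bytes, "word/document.xml"))  # visible body text
print(extract_text(package_bytes, "word/comments.xml"))  # comment text not on the printed page
```

A virtual print of this file would normally show only the body text, which is why an approach that reads the underlying parts can surface the comment as well.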

The pros of one indexing approach are the cons of the other and vice versa. Specifically, an OCR-based index may miss hidden fields, such as hidden columns on an Excel spreadsheet, while a text-based index would not. Conversely, a Native extraction-based index will not read (index) the text on an image, including scanned or PDF’d documents, where an OCR index will.

This is an example of a native PowerPoint document. When you receive this doc as a .ppt file an OCR-based index would create a virtual print of each slide and lift any text that appears on that print for indexing. The embedded images with text, like this chart titled “Load Growth Model”, would have all text that appears on the chart indexed. Speaker notes, however, like this one regarding “November Data”, could be missed as notes do not normally show on a print, by default.

Conversely, a native extraction-based index would only recognize the .jpg title of the image of the chart and index that file name as text. It cannot “read” an image (as OCR can) and so none of the text appearing on the chart would be indexed. It would, however, pick up the speaker notes regarding November Data. When you search for the company name “CAISO” an OCR-based Index would retrieve this document but a Native Extraction-based index would not. When you search for “November Data” the Native Index would retrieve this document, but an OCR index would miss it. If you were to perform a Boolean search for “CAISO AND November Data” neither index alone would return this document as responsive as it would only see one term or the other.
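The PowerPoint scenario can be sketched in a few lines of Python. The two dictionaries below are hypothetical index contents chosen to mirror the example (the chart text is visible only to OCR, the speaker notes only to native extraction), and `search_all` is an assumed helper that implements a Boolean AND within one index.

```python
# Hypothetical index contents: each index maps a term to the set of
# documents in which that index saw the term.
ocr_index = {"caiso": {"slides.ppt"}, "load": {"slides.ppt"}}
native_index = {"november": {"slides.ppt"}, "data": {"slides.ppt"}}

def search_all(index, terms):
    """Boolean AND: return documents that contain every term in this index."""
    hits = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*hits) if hits else set()

# Single-concept searches succeed in one index each.
print(search_all(ocr_index, ["CAISO"]))
print(search_all(native_index, ["November", "Data"]))

# The combined AND search comes back empty from both siloed indexes.
print(search_all(ocr_index, ["CAISO", "November", "Data"]))
print(search_all(native_index, ["CAISO", "November", "Data"]))
```

Because each index holds only part of the document's text, the intersection is empty in both, even though the document contains every term.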

Some modern eDiscovery software providers offer both indexes, but the indexes are siloed, so you would have to run your entire search twice, once through each index. This not only doubles your search time but still leaves you vulnerable to missing evidence when you use Boolean searches to narrow results. Some eDiscovery vendors will instruct you to write additional language into your ESI order in an attempt to mitigate the loss of potential evidence. Unfortunately, the more complex an ESI request, the more likely it is that mistakes will be made and evidence missed.

Lexbe has solved this false ‘index dilemma’ by creating the first concatenated eDiscovery search index, our Uber-Index℠. At ingestion, documents are run through both OCR and Native extraction indexing simultaneously. Then the OCR and Native-Extracted indices are compiled into one single, searchable database. All text is captured by these two complementary processes, and all evidence is searchable.
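The general idea of combining two indexes into one searchable structure can be illustrated with a short Python sketch. This is not Lexbe's implementation, which the article does not describe in detail; the index contents, file names, and function names are hypothetical.

```python
def merge_indexes(*indexes):
    """Union the document sets of several term -> documents indexes."""
    merged = {}
    for index in indexes:
        for term, doc_ids in index.items():
            merged.setdefault(term, set()).update(doc_ids)
    return merged

def search_all(index, terms):
    """Boolean AND: return documents that contain every term in the index."""
    hits = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*hits) if hits else set()

# Hypothetical per-method results for two files.
ocr_index = {"caiso": {"slides.ppt"}, "invoice": {"scan.pdf"}}
native_index = {"november": {"slides.ppt"}, "data": {"slides.ppt"}}

combined = merge_indexes(ocr_index, native_index)
print(search_all(combined, ["CAISO", "November", "Data"]))
```

With the term lists unioned per document, a Boolean AND can match terms that were captured by different extraction methods, and the search runs once rather than twice.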

Additionally, Lexbe offers an integrated translation feature which is also included in our Uber Index for seamless search in either language. Whether you opt for Lexbe to perform your document translation or upload your own translated docs, our software will tie the original doc to the English translated one for integrated search and document review.

Finally, Lexbe also performs an advanced metadata extraction at ingestion for precision searches. Details such as the author of a document are extracted and will be searchable.

Features OCR Index Text-Based Index Lexbe Uber Index
Embedded Text
Charts
Budgets
Scanned Docs
Hidden Cells/Sheets
Comments
Tracked Changes
BCC Field
Meta-Data Extraction
Translated Text

With the Lexbe eDiscovery platform, your search is faster and more complete than with any other index on the market. For more information on how indexing works watch our webinar Best Practices to Avoid Missing Evidence in Large Document Reviews, part of the Lexbe eDiscovery Webinar Series.

9 Tips for Creating Searchable PDF Documents for Review

The searchable PDF (portable document format) is becoming increasingly relevant to legal professionals in discovery, document review, and related litigation matters. One of the main drivers of this trend, in addition to the format's popularity in corporate environments, is the requirement in many jurisdictions that pleadings and motions be filed in PDF. Fortunately, there are many low-cost options that allow firms and organizations to create PDFs inexpensively, and newer versions of Adobe Acrobat support Bates numbering and legal redaction. Below are nine best practices that will help you maximize the searchability and benefits of using PDFs in discovery.

1. Choose the ‘Text-Under-Image’ Option: When scanning a document, you may be presented with different options for types of PDF files. If available, you will usually want to choose the option that applies optical character recognition (OCR) to make the document text searchable. This can be implemented in different ways depending on your specific hardware and software, including a “make searchable (apply OCR)” option, or “text-under-image” or “searchable PDF” file type options. With any of these, your scanned document will be text-searchable within the Acrobat viewer and many other programs designed to search PDF files. The other type of PDF you could choose is called an “image-only PDF”, which is not text-searchable. When viewing a PDF file, you can tell whether it is searchable by looking for the ‘select tool’ on the top bar in Acrobat Reader; its presence indicates that the file is text-searchable.

2. Get the Resolution Right: When scanning images to PDF for litigation purposes, 300 dpi (dots per inch) is a safe option. Scanning at a lower resolution (e.g. 200 dpi) can work well and will produce a smaller file, but legibility can suffer with smaller fonts (e.g. 6 pt. in financial documents). OCR quality can also suffer at lower scan resolutions. The trade-off is that higher scan resolutions result in larger file sizes. Scanning above 300 dpi usually does not appreciably increase the readability of a document or its OCR quality.

3. Scan to B&W, Grayscale or Color: For litigation review purposes, ‘Black & White’ is often a good option, particularly with good quality originals, and creates a much smaller file than a grayscale or color scan. Color or grayscale may be required for photos (which do not display well with a ‘black and white’ setting). For some documents, a color scan may be critical to understanding the document, such as some charts (e.g. in PowerPoint presentations) or CAD (computer aided design) documents. Color and grayscale scans will be larger than B&W scans.

4. Watch the Other Settings: Scanners will often have a number of other settings that can help improve scan quality and OCR. These include ‘deskew’ (rotates any page that is not square with the sides of the scanner bed, to make the PDF page align vertically), ‘background removal’ (whitens nearly white areas of grayscale and color input), and ‘edge shadow removal’ (removes dark streaks that occur at the edges of scanned pages, where the scanner light is shadowed by the paper edge). ‘Deskew’ will help with OCR accuracy, while ‘background removal’ and ‘edge shadow removal’ can improve readability, but can sometimes impair OCR accuracy. For important documents, it’s best to run some tests.

5. Get a Quality OCR Program: Not all OCR is created equal. The quality of optical character recognition varies substantially based on the quality of the program and the settings chosen when running it. Programs often have a ‘fast’ and a ‘slow’ mode, with the slow mode usually delivering better quality OCR. Some programs will auto-rotate pages when necessary; others will not, and will produce OCR errors as a result.

6. Pay Special Attention to the Numbers: One secret of OCR programs is that they routinely rely on dictionaries to recognize the text of particular characters. This works pretty well with words (if they are in the dictionary), but doesn’t help with numbers or other arbitrary characters not in a dictionary. Expect to see lower quality OCR in financial reports and other number-intensive documents.

7. Make Sure Your Litigation Support Software Really Supports PDF: Many legacy litigation software systems were designed around files saved as TIFFs (tagged image file format), an older type of file format that does not support integrated text as part of the file as PDF does. These older software systems usually have added some support for PDFs, but often the integration with PDF is incomplete and some features are not supported with PDF files.

8. Do Redactions the Right Way: Redactions can be tricky in PDF, and this has been a primary reason why TIFF has survived as a popular format in legal matters. A trap for the unwary is that it is possible in a PDF file to redact text on the image of a document and still have the redacted text be searchable! In a text-under-image PDF file, the redaction must be done on both the text and the image. This problem has been fixed in the latest version of Acrobat (Acrobat Professional 8 or 9), and this program can be used for PDF redactions. Third-party redaction tools are available as well. Many practitioners play it safe with redacted documents by printing, marking out by hand, and rescanning. This method is manual but foolproof, and works well if the number of documents to be redacted is limited.

9. Be Specific in Discovery Requests: Litigators are increasingly asking that documents produced in response to discovery requests be provided in electronic form as PDFs. If you do this, be specific as to the matters above. In particular, be sure to specify that the scan resolution be 300 dpi and that OCR be applied. You may also wish to ask what OCR software is used and what settings will be applied. To be non-specific is to invite an adversary to return documents scanned at 150 dpi without OCR, which may be unsearchable, illegible and unintelligible!
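The file-size trade-offs in tips 2 and 3 follow from simple arithmetic, sketched below in Python. The example assumes a US Letter page (8.5 x 11 inches) and the common bit depths of 1 bit per pixel for black and white, 8 for grayscale, and 24 for color. It computes raw, uncompressed sizes only; real PDF files are compressed and will be smaller, so treat the numbers as relative comparisons rather than expected file sizes.

```python
LETTER_INCHES = (8.5, 11.0)  # assumed page size for this example
BITS_PER_PIXEL = {"bw": 1, "grayscale": 8, "color": 24}

def raw_scan_bytes(dpi, mode, page_inches=LETTER_INCHES):
    """Uncompressed size in bytes of one scanned page at the given settings."""
    width_px = round(page_inches[0] * dpi)
    height_px = round(page_inches[1] * dpi)
    return width_px * height_px * BITS_PER_PIXEL[mode] // 8

for dpi in (200, 300):
    for mode in ("bw", "grayscale", "color"):
        print(f"{dpi} dpi {mode:<9} {raw_scan_bytes(dpi, mode):>12,} bytes")

# Pixel count grows with the square of the resolution.
print(raw_scan_bytes(300, "bw") / raw_scan_bytes(200, "bw"))
```

Going from 200 to 300 dpi multiplies the raw data by 2.25, and moving from black and white to grayscale or color multiplies it by 8 or 24 respectively, before compression.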

Embracing the Power of the Cloud in eDiscovery

Cloud infrastructures provide an unrivalled opportunity to control eDiscovery costs without sacrificing quality, functionality, or security. Firms without large litigation support departments can instantly increase their discovery and review capacities with on demand/SaaS eDiscovery – often allowing them to take on cases they were previously unable to.

When firms make the decision between bringing litigation support and eDiscovery functions in-house or looking for scalable external solutions, there are several questions whose answers weigh heavily on the decision. How big is the firm? Do caseloads support the newly built capacity? Is there capital available to cover the high fixed costs and overhead associated with establishing or expanding an internal support department? Smaller firms consider these questions and often find themselves caught between opportunities to grow and take on larger cases and constraints imposed by the costs associated with establishing the necessary internal infrastructures.

Cloud-based eDiscovery technologies provide another option: firms can immediately access massive eDiscovery capacity, expert litigation support professionals, and advanced review capabilities, and pay only for what they actually use. Cloud solutions like Lexbe eDiscovery Platform can support document-intensive litigation from collection through to trial. All you need to use most web-based or cloud eDiscovery tools is a browser and an internet connection.

The Cloud is a term that has developed to describe scaled, non-local storage of data. That is, instead of saving a file to a computer’s local hard drive, one can save that file to a centrally managed and expertly maintained hard drive that lives in a highly secure and professionally staffed server facility. The connection between the local computer and files in the cloud is the internet. One major benefit of cloud infrastructure for legal professionals is accessibility. Because your files are stored in the cloud, litigation teams can access them from any internet-capable device. Storing files in the cloud means that being separated from a computer or phone doesn’t really separate you from your data.

In addition to keeping ESI within reach, the cloud also keeps your data out of the hands of unauthorized persons. Data centers specialize in creating highly secure environments, with a wide range of measures employed to ensure redundant storage and encrypted access. The largest banks in the world rely on the security of the cloud, whose security features are practically impossible to replicate on a local computer. When you hear criticisms of cloud security, it is important to remember that “cloud” is a general, descriptive term and that not all cloud providers offer the same security protocols. When considering cloud software, it is critical to ask who the infrastructure partner is to ensure credibility. For instance, Lexbe eDiscovery Platform and services run on AWS SOC III Servers, which were named the sole “leader” in service provider security in a recent Forrester Research report.

In an eDiscovery context, firms of all sizes have taken advantage of the accessibility and security benefits offered by the cloud. For small and mid-sized firms, these benefits are much more stark and also include substantial cost reductions and efficiencies. It is simply not financially expedient or necessary for smaller firms to pay for expensive in-house installation of software and servers, in addition to hiring additional litigation support staff to manage them locally. There are better options available.