If you’re working in litigation, I’m sure you’ve frequently wondered how to get your vendor to conduct the most effective keyword searches and not break the bank. How can you find information that might be critical to the case? We’ve learned that keyword searching is an art as much as it is a science. Every project has some kind of resource limitation, so we have developed search strategies to make the most of real-life budgets, time and computing power.
Keyword searches for a typical e-discovery production yield more predictable results because the searches are conducted on complete documents and files that remain intact on the system. However, many of the cases we work on involve data that has been deleted and that requires computer forensic techniques to recover. Simply searching the ‘unallocated space’ of the hard drive (where deleted documents reside) can be helpful, but often retrieves far too much information to be useful. This is because the data is no longer organized as individual files. It’s like hunting through a landfill in search of a penny.
Let’s look at a sample case involving Bill Smith. Bill Smith works for LP Corporation and is suspected of embezzling funds. Counsel requests information such as Office documents, Acrobat files, emails, and web activity. An initial search for keywords such as “Smith”, “LP Corporation”, and “bill@lpcorporation” would return several hundred thousand hits when run across deleted and regular files. If we limit the search to only saved files, valuable information may never be found. However, when searching the ‘unallocated’ part of the hard drive, we might see hundreds of thousands of hits, too many to review. In Bill Smith’s case, we have 678,354 hits that might represent deleted documents, fragments of documents, emails and web activity. This data is all in unallocated space and can only be retrieved using forensic techniques.
Many folks simply ‘carve’ through unallocated space to resurrect any dead files. This can result in a high number of corrupted or irrelevant files. How do we avoid this problem? We apply keywords as we’re recovering deleted files, restoring only those that contain a relevant term. This technique provides us with live files that contain relevant keywords. These are now much easier to search than the landfill of unallocated space.
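To make the idea concrete, here is a minimal sketch of keyword-guided carving in Python. It is an illustration only, not the tool we use: the function name carve_with_keywords, the single PDF header/footer signature, and the sample data are all assumptions. It also assumes the keyword appears as plain text inside the carved bytes, which is often not true of real files (PDF content is frequently compressed), and it ignores fragmentation.

```python
import re
from typing import List

# Illustrative signature only: real carvers handle many file formats.
PDF_HEADER = b"%PDF-"
PDF_FOOTER = b"%%EOF"


def carve_with_keywords(unallocated: bytes, keywords: List[str]) -> List[bytes]:
    """Carve header-to-footer byte ranges, keeping only those with a keyword."""
    patterns = [
        re.compile(re.escape(k.encode("utf-8")), re.IGNORECASE) for k in keywords
    ]
    kept = []
    pos = 0
    while True:
        start = unallocated.find(PDF_HEADER, pos)
        if start == -1:
            break  # no more candidate files
        end = unallocated.find(PDF_FOOTER, start)
        if end == -1:
            break  # header without a footer: treat as unrecoverable here
        end += len(PDF_FOOTER)
        candidate = unallocated[start:end]
        # Restore the candidate only if it contains a relevant keyword.
        if any(p.search(candidate) for p in patterns):
            kept.append(candidate)
        pos = end
    return kept


# Made-up stand-in for a chunk of unallocated space.
blob = (
    b"\x00\x00junk"
    b"%PDF-1.4 wire transfer approved by Bill Smith %%EOF"
    b"\xff\xffnoise"
    b"%PDF-1.4 cafeteria menu for Friday %%EOF"
    b"tail"
)
recovered = carve_with_keywords(blob, ["Smith", "LP Corporation"])
print(len(recovered), "of 2 carved candidates kept")
```

Filtering at carve time means the irrelevant candidate is never written out as a file, so it never has to be indexed or reviewed later.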
These live files can then be loaded into a forensic application. They are much easier to deal with, and we can run additional searches on them, producing information that may have been missed on the first pass. We keep searching and filtering down by relevant criteria until we come closer to finding the needle in the haystack. For example, we might search for all the documents containing Bill Smith and then, from that set, eliminate all those that have no relevance to embezzlement. Narrowing the search criteria further reduces our hits, so that the original 678,354 hits are now 1,200.
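The narrowing process can be sketched as a series of filter passes, each applied to the survivors of the previous one. The Python below is a simplified illustration under stated assumptions: the helper names filter_by_terms and narrow, the sample documents, and the embezzlement-related terms are all hypothetical, and documents are treated as plain text strings rather than files inside a forensic application.

```python
from typing import Iterable, List, Tuple


def filter_by_terms(documents: List[str], terms: Iterable[str]) -> List[str]:
    """Keep documents containing at least one term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return [d for d in documents if any(t in d.lower() for t in lowered)]


def narrow(
    documents: List[str], passes: List[List[str]]
) -> Tuple[List[str], List[int]]:
    """Apply each pass in order, recording how many documents survive."""
    counts = [len(documents)]
    for terms in passes:
        documents = filter_by_terms(documents, terms)
        counts.append(len(documents))
    return documents, counts


# Made-up recovered documents for illustration.
docs = [
    "Bill Smith approved a wire transfer to an offshore account",
    "Bill Smith ordered lunch for the team",
    "Quarterly invoice from a supplier",
    "bill@lpcorporation forwarded an altered invoice",
]

# Pass 1: who the document involves. Pass 2: hypothetical case-specific terms.
passes = [
    ["bill smith", "bill@lpcorporation"],
    ["wire transfer", "offshore", "altered invoice"],
]

remaining, counts = narrow(docs, passes)
print(counts)  # document count before filtering and after each pass
```

Recording the count after each pass shows how much every criterion contributes, which helps decide whether a further pass is worth the review time.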
By only restoring deleted information that contains relevant keywords, we dramatically reduce the amount of work performed. By turning deleted information into live files, we can then easily search and filter them, yielding a compact, highly relevant set of data. This technique allows us to work more efficiently and save valuable computing and financial resources.



