Make Sense of Ever-Growing Amounts of Data with Analytics

The sheer volume of data is exploding, so much so that a puny one gigabyte of storage on one lowly flash drive could produce up to 100,000 pages of printed material. The data generated worldwide already has surpassed enough printed pages to stretch from the sun to Pluto and back. And by 2020, the world’s data could boom to 44 times what it is now.

If those statistics aren’t startling enough, John Hartman has more. The greatest percentage of all that data comes in electronic formats. Worse yet, some 50 to 70 percent of any organization’s data is redundant, outdated or trivial data that has no economic value.

The data explosion makes data analytics no longer a luxury for attorneys involved in litigation, but an absolute necessity, maintains Hartman, the principal of TERIS Texas. With analytics, he notes, lawyers can take fewer actions and still get more results.

“Having that much data in your pool and the cost associated with handling that data from a litigation standpoint requires you to have some knowledge about that data, and analytics provides information from the data,” Hartman said.

Hartman addressed the issue of using analytics to manage data in recent presentations to corporate attorneys, law-firm attorneys and judges at Legal Tech Texas and the Memphis/Mid-South Chapter of the Federal Bar Association 2015 Annual Seminar. His topic was “Applying Analytics: Effective and Efficient Data Management in Litigation and Beyond.”

Despite the data boom, Hartman notes, only 28 percent of organizations say they are attempting to institute practices to better “govern” their data, and only 10 percent say their efforts have been effective.

“If you’re not using methods/technology on the front end to give you information about your data and if we go back to statistics that imply the majority of corporate data is either riddled with ROT (redundant, outdated or trivial) or not structured in a way that allows for precise recall, you find yourself in a poor position to best control related expenses, budget, make strategic decisions and defend one of your most valuable corporate assets: your data,” he said.

So where is an efficient, effective attorney to start in discovery?

Three ‘buckets’ of discovery analytics

Hartman sees three types, or buckets, of discovery analytics for attorneys to consider, either alone or in combinations:

  • Structured analytics. These tools group documents based on their similarities in text and organization, putting a basic structure around and providing information about the unknown data. Grouping email threads and identifying near duplicates are examples of structured analytics. Think of it as a basic portal to information, providing statistics about and organization for unorganized data.
  • Concept analytics. These tools expand beyond typical keyword searches to retrieve related items, helping attorneys find the meaning of content within a dataset. Concept clustering, concept searching and keyword expansion are examples of concept analytics. Such analytics are helpful in positioning a legal case, understanding potentially relevant material and determining a more accurate budget that will minimize costs while keeping a case defensible.
  • Predictive analytics. These tools start with an attorney weeding out unneeded documents from a subset of documents, then training the computer, usually in several iterations, to mimic the decisions and statistically “predict” how to properly code the rest of the documents. Such analytics are especially useful with very large data sets, where putting attorneys’ eyes on everything would be cost-prohibitive.
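
As a rough illustration of the near-duplicate identification mentioned under structured analytics, the sketch below compares documents by their overlapping three-word sequences ("shingles") and scores each pair with Jaccard similarity. This is one common textbook approach, not necessarily what any commercial discovery tool does. The function names, the sample documents and the 0.4 threshold are all assumptions made for the example.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(docs, threshold=0.4):
    """Return (id, id, score) for each pair scoring at or above threshold.

    The threshold is an illustrative assumption, not an industry standard.
    """
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    pairs = []
    for (id_a, set_a), (id_b, set_b) in combinations(sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Hypothetical sample documents: two versions of a memo and an unrelated note.
docs = {
    "memo_v1": "the quarterly report is due on friday please send your numbers",
    "memo_v2": "the quarterly report is due on monday please send your numbers",
    "lunch": "are we still on for lunch tomorrow at noon",
}

pairs = near_duplicate_pairs(docs)
for id_a, id_b, score in pairs:
    print(f"{id_a} ~ {id_b}: {score:.2f}")
```

Comparing every pair, as this sketch does, is only practical for small sets. Tools working at the scale of hundreds of thousands of documents would need a faster way to find candidate pairs.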

Using such analytics, attorneys can obtain dashboard-like overviews, including the number of documents that are originals, duplicates or near-duplicates; the top terms in the documents; and the top senders and recipients of documents. Analytics also can help attorneys zoom in or zoom out for data-map views of the results, just as Google Earth users can zoom in on specific geographic locations or zoom out to see a broader, less detailed context.

“The bottom line is, the brute-force approach is no longer feasible,” Hartman said. The cost savings in discovery and the success in defensibility make analytics a necessity, not a luxury, today.

Three ways to use analytics

Hartman also sees three areas of litigation where analytics can be applied:

  • Pre-case assessment. Before filing or responding to a case, lawyers can use analytics to investigate the data and determine a “go or no-go” strategy. They can explore the concepts and communications among people involved, and determine whether more data needs to be collected. As they better understand the material involved, they can more accurately set their budgets for the case.
  • Review. Increasing speed and accuracy helps manage the most expensive part of discovery. Clustering documents with similar characteristics can help reviewers look at the documents more efficiently. Following email threads can minimize the time spent looking over entire sets of emails. Mass tagging of duplicates reduces the number of documents needing review.
  • Production. Lawyers can use the same analytics tools they used on their client’s data to gain insight into the opposing side’s data. They can sort the opposing side’s data into categories, run concept searches to find any “smoking guns” and gain the ability to review data better than the opponents did.

To illustrate the ROI and time savings that analytics can generate, Hartman points to what he calls a very typical case study. A litigation starts with the potential review of some 254,000 documents. Hash-value “de-duping” has already occurred during the initial data processing, but that is not the same as the content-based exact-duplicate identification that analytics provides. Hash-value de-duping matches only files whose hash values are identical; content exact duplicates are identified by text that matches exactly, regardless of file type, such as a PDF version of a Word document with the very same content. Running a basic analytics tool identifies those content exact duplicates, so a lawyer looking at a document can see whether it has any and bulk tag all of them with one keystroke. In this case, what would have been a review cost of more than $203,000 came down to about $181,000, and what would have been more than 35 days of review time came down to some 29 days, a savings of roughly $22,000 and six days.
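
The difference between hash-value de-duping and content-based exact-duplicate identification can be sketched in a few lines. The example assumes each document's text has already been extracted by an earlier processing step. The field names (`raw`, `text`), the sample documents and the choice of SHA-256 are assumptions for illustration and do not describe any particular vendor's tool.

```python
import hashlib
from collections import defaultdict


def file_hash(raw_bytes):
    """Hash the file's bytes: identical only when the files are byte-for-byte equal."""
    return hashlib.sha256(raw_bytes).hexdigest()


def content_hash(extracted_text):
    """Hash the extracted text: identical whenever the text matches exactly."""
    return hashlib.sha256(extracted_text.encode("utf-8")).hexdigest()


def duplicate_groups(docs, key_func):
    """Group document ids by key and keep only groups with more than one member."""
    groups = defaultdict(list)
    for doc in docs:
        groups[key_func(doc)].append(doc["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical documents: the Word file and its PDF export differ as files
# but carry the very same text. The byte strings are stand-ins, not real files.
docs = [
    {"id": "contract.docx", "raw": b"DOCX-bytes", "text": "Payment is due in 30 days."},
    {"id": "contract.pdf", "raw": b"PDF-bytes", "text": "Payment is due in 30 days."},
    {"id": "notes.txt", "raw": b"Call the client.", "text": "Call the client."},
]

hash_dupes = duplicate_groups(docs, lambda d: file_hash(d["raw"]))
content_dupes = duplicate_groups(docs, lambda d: content_hash(d["text"]))

print("Hash-value duplicates:", hash_dupes)
print("Content exact duplicates:", content_dupes)
```

Here the hash-value pass finds nothing, because the two contract files differ as files. The content pass groups them, and that group is what a reviewer could then bulk tag in one action.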

Using analytics obviously has several benefits. In particular, Hartman points out its ability to reduce the number of files that need to be reviewed—a critical benefit, considering that discovery can cost about $20,000 per gigabyte of data, with review eating approximately $18,000 of this number. The use of analytics also increases consistency across the review process, compared to review by multiple human beings. The tool also helps law firms in their marketing efforts, since technology gives small firms a bigger footprint than they otherwise would have and helps them market their services to new clients.

TERIS plays a variety of roles in helping its clients with analytics. The firm provides analytics as a service, and it also can consult with firms considering buying their own analytics software. Because TERIS partners with the top software providers instead of developing its own analytics software, Hartman says TERIS serves as an unbiased, technology-agnostic partner to the legal community.

New Federal Rules of Civil Procedure will make analytics even more important. The new rules address the need for proportionality, in which the understanding gained from documents is expected to be proportional to the dollars spent on examining them. Analytics, Hartman says, will help lawyers avoid breaking their clients’ budgets and avoid clogging the courts with weak cases.

The bottom line

Hartman offers these takeaways for attorneys seeking to get their arms around the explosion in data and make better use of analytics:

  • Get in the game. Anyone not already using the many analytics tools available is behind.
  • Test everything. Corporate clients, especially, should formally evaluate discovery software before buying it, including understanding upcoming advances and the technology roadmaps. Once they have the technology, they must commit to keeping it current.
  • Dig deep. Look at the long term, not just at how a minimal level of technology can get attorneys through a short-term deadline.

“We have so much data, and we have lawyers who are banging their heads up against the wall, and clients who are trying to control costs, and courts who are trying to get proportional,” Hartman said. “In corralling this data monster, analytics is a tool that becomes something you really need to have in your tool belt.”