The Case For and Against “Big Data” – Why Big Data and AI Won’t Replace Dispute Resolution Professionals

by Olof Heggemann

In this article we present several reasons why advanced databases and generative AI haven’t yet been able to replace the subjective expert judgment of lawyers when assessing the likely outcomes of disputes. We discuss the issues in relation to an example dispute, showing the difficulty of using such databases, or using generative AI models, even when there is seemingly good data available.

In a recent article, we discussed how to assess a simple legal dispute and put a figure on the overall value of the case.

To make such an assessment, we, as counsel, had to estimate the probabilities of a full win, a partial win, or a loss with both parties’ legal costs to pay. This is naturally a subjective assessment.
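
As a rough illustration of how such probabilities turn into a single figure, the sketch below computes a probability-weighted case value. Every number in it (the probabilities, the partial-win amount and the cost exposure) is a hypothetical placeholder chosen only to show the arithmetic, not a figure from the earlier article.

```python
# Hypothetical scenario table: (label, probability, net result for the claimant).
# None of these figures come from the article; they only show the arithmetic.
scenarios = [
    ("full win", 0.50, 1_000_000),
    ("partial win", 0.30, 400_000),
    ("loss, paying both parties' costs", 0.20, -300_000),
]


def expected_value(scenarios):
    """Return the probability-weighted average of the scenario outcomes."""
    total_probability = sum(p for _, p, _ in scenarios)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * amount for _, p, amount in scenarios)


case_value = expected_value(scenarios)
print(f"Expected case value: {case_value:,.0f}")
```

The result is only as good as the subjective probabilities fed into it, which is exactly why the question of where those probabilities come from matters.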

A large body of research shows that subjective assessments can be volatile, prone to error and affected by biases. So, with the advent of AI and Big Data – is now not the time to do away with the subjective, error-prone human assessments?

Relying on Big Data

For at least 15 years there have been services that collect data from courts and published arbitration awards. Some examples in the US are Lex Machina, Trellis, Westlaw Edge and Fastcase / Docketalarm. Internationally there is also Wolters Kluwer Dispute Resolution Data, and in the UK there’s Solomonic. For arbitration there is Jus Mundi. All of these provide extensive information about how and when a case was resolved, and sometimes information from the judgments or awards about the case outcome.

As these databases improve, will their data soon become the main guide for decisions on whether to settle or continue with litigation or arbitration?

Let us do a thought experiment and use our previous case as an example.

Example case

We act for the claimant, a supplier that wants to get paid 1 million for a piece of machinery installed for a customer. The defendant, the customer, argues that the delivery did not meet the specifications and that, even if it did, the price ended up unreasonably high.

In our thought experiment, we assume access to a complete database of court judgments and decisions in our jurisdiction. This database also has advanced capabilities to analyze the pleadings and arguments submitted to the court.

To identify relevant comparison cases, we would start by applying a search filter to show only disputes involving hardware installations like ours. Setting up such a filter may be challenging, but it should be technically feasible using current tools and some input from our case description or documents. While most databases lack this level of filtering, some do offer fairly detailed case-type categories, and we could perhaps find our case type there. Once we have our set of relevant cases, we would examine them to answer the question: “In how many of these cases did the claimant win?”

Here we encounter our first problem. This question and its answer do not help us much.

In our example, the supplier is the claimant because payment was not made in advance. Had the customer paid upfront, the dispute would likely look the same—but with roles reversed: the customer as claimant, seeking to recover its payment. The cases returned by our query will include suppliers acting as both claimant and defendant. From our query, we might observe that claimants win less often than defendants, but this insight alone is of limited use.

So, we adjust our filter, and our database is sophisticated enough to correctly identify which party is the supplier of technical hardware and which is the customer. We then refine our query to ask: how often do suppliers win their cases?

But challenges remain. In cases with both claims and counterclaims, what does a “win” mean if both sides succeed, and the claims cancel each other out? Did both win—or neither? This is difficult to assess, both technically (in identifying all claims, counterclaims, and set-off defences) and from the parties’ perspectives. If the defendant’s goal was simply to neutralise the main claim, then a zero-payment outcome may be seen as a win—though this intent won’t be visible in the database. Only the party knows that.

To make it easier, we change the filter and only look at the cases without counterclaims.

Now we get another small piece of information that can prove more helpful. We could perhaps find that it’s more difficult for suppliers to win compared to customers (“win” here meaning receiving the full amount claimed when acting as claimant or successfully refuting a claim when acting as defendant).
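
The difference between the two queries can be sketched in a few lines. The records and field names below are invented for illustration, and a single true/false flag for "claimant won" is a simplification of the definition of a win given above.

```python
# Hypothetical records; field names and values are invented for illustration.
cases = [
    {"claimant_role": "supplier", "claimant_won": True,  "counterclaim": False},
    {"claimant_role": "supplier", "claimant_won": False, "counterclaim": False},
    {"claimant_role": "supplier", "claimant_won": False, "counterclaim": True},
    {"claimant_role": "customer", "claimant_won": True,  "counterclaim": False},
    {"claimant_role": "customer", "claimant_won": False, "counterclaim": False},
    {"claimant_role": "customer", "claimant_won": True,  "counterclaim": False},
]


def supplier_won(case):
    """The supplier wins as a successful claimant or as a successful defendant."""
    if case["claimant_role"] == "supplier":
        return case["claimant_won"]
    return not case["claimant_won"]


def win_rate(records, won):
    """Share of records for which the predicate `won` is true."""
    records = list(records)
    if not records:
        return float("nan")
    return sum(won(c) for c in records) / len(records)


# Keep only the cases without counterclaims, as in the refined filter.
no_counterclaims = [c for c in cases if not c["counterclaim"]]

claimant_rate = win_rate(no_counterclaims, lambda c: c["claimant_won"])
supplier_rate = win_rate(no_counterclaims, supplier_won)
print(f"Claimants win {claimant_rate:.0%}, suppliers win {supplier_rate:.0%}")
```

On the same set of cases, the two questions give different answers, because the supplier sits on different sides of the dispute depending on who paid first.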

Knowing that suppliers are generally worse off than customers can be incorporated into our overall assessment (assuming the data is representative, but let’s get to that later). This is somewhat useful information. But what does this data tell us about the actual outcomes of cases after the judgment?

Favorable judgments do not always mean favorable outcomes

Let’s say the data shows that the supplier fully wins 30% of the similar cases identified. This information won’t tell us what happened after the judgment. It’s not uncommon for defendants to dispute claims because their finances are already poor, or would become so if they lost. If the defendant declares bankruptcy after the judgment, or if the amount cannot be recovered, a clear win recorded in the database can in fact be a clear financial loss for a claimant supplier. It is also not uncommon for judgments to be settled at a discount to avoid an appeal process. Not all of those 30% are wins in the end.

It can also be that legal costs were so high that they erased the value of the win. These costs may or may not be fully reflected in the court documents. If our hypothetical database included reliable data on legal costs, we could perhaps try to estimate the “net win”. But even that is difficult, as some costs may be covered by insurance and not represent an actual loss. This is information that is often impossible to extract from court data.
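
To make the idea of a "net win" concrete, here is a minimal sketch. The function, its parameter names and all amounts are our own assumptions for illustration; a real calculation would need data that court records rarely contain.

```python
def net_outcome(awarded, recovery_rate, own_costs,
                costs_recovered=0.0, insured_costs=0.0):
    """Net financial result for a claimant after judgment.

    awarded          amount granted in the judgment
    recovery_rate    fraction of the award actually collected (0 to 1)
    own_costs        the claimant's own legal costs
    costs_recovered  costs actually reimbursed by the other side
    insured_costs    part of own_costs borne by an insurer
    """
    if not 0.0 <= recovery_rate <= 1.0:
        raise ValueError("recovery_rate must be between 0 and 1")
    collected = awarded * recovery_rate
    out_of_pocket = own_costs - costs_recovered - insured_costs
    return collected - out_of_pocket


# A "full win" on paper where the defendant turns out to be insolvent.
paper_win = net_outcome(awarded=1_000_000, recovery_rate=0.0, own_costs=250_000)

# The same award settled at a 30% discount to avoid an appeal,
# with part of the legal costs covered by insurance.
discounted = net_outcome(awarded=1_000_000, recovery_rate=0.7,
                         own_costs=250_000, insured_costs=100_000)

print(f"Paper win, nothing collected: {paper_win:,.0f}")
print(f"Discounted settlement:        {discounted:,.0f}")
```

Both cases would appear as the same clear win in a database of judgments, yet one of them leaves the claimant worse off than not suing at all.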

So, our database and queries may tell us something about cases that resulted in a clear win or loss in court—but not what happened afterwards. But what about judgments that fall somewhere in between a clear win and a clear loss?

Outcomes between a full win and a full loss are very difficult to define

In some cases, in our hypothetical database, the supplier may have claimed 1 million but only been awarded 200k. Is that a win when only 20% was granted? Perhaps not. But it’s not uncommon to claim a high amount, knowing the final award will likely be much lower. In some situations, recovering 20% could be a major success—while in others, it may clearly signal a loss.

All outcomes between a clear win and a clear loss are hard to define. We could group cases by how much of the claimed amount was recovered, but whether these outcomes count as a win, or even a “good outcome”, can vary significantly between dispute types, and especially from the perspective of each party, including our client.

But could we at least use the clear wins and losses to estimate the likelihood of success? Can our data give us a reliable picture of that? Probably not.

Data on judgments and awards show only a limited set of cases, and those cases are likely not representative

Settlements are very common. For a study on the likelihood of settlements in different jurisdictions, see Settlement Around the World: Settlement Rates in the Largest Economies.

That article shows that even trained academics have a very hard time estimating settlement rates from court data. Court databases often report settlements as “case withdrawn” or abandoned, and it’s difficult to know how many of those cases actually settled (this challenge is discussed in detail in the article). Estimates of the US settlement rate range from 20% to 67% (far below the 90% figure that is a commonly heard truism). Furthermore, our perfect database will never be able to account for the cases that almost went to court but settled just before. If we looked at those, perhaps the 90% estimate would make sense.

The actual outcome of each settlement is often not public. Some of the databases mentioned above include limited information on settlement outcomes, manually gathered from parties or court records. But even in our hypothetical complete database, such outcomes would still be hard to classify as a “win” or “loss”, or even a “good” or “bad” result. They almost always fall into the grey zone of “in-betweens” and may include provisions like: “Defendant agrees to pay 500k, and the claimant may collect the installed machinery.” The value of such a settlement is difficult to define.

Our hypothetical database with complete coverage could perhaps give us a better understanding of how likely it is that cases similar to ours settle compared to other categories. This can be useful information – but it’s still far from what we need to understand our case, and to determine the value of our case.

To understand what that value is, and what a good settlement might be, we first need to know how strong our case is. So, can we at least look at cases that ended in a clear win or loss, and use those to assess the strength of our case?

In the UK, about 77% of cases settle (according to the article mentioned above). But the cases that result in a judgment are probably not just any random 23% of all cases. It’s likely these didn’t settle because they were particularly complex, difficult to assess, or involved defendants who couldn’t afford to pay—this perhaps being the real reason for the dispute. In some cases, one or both parties may simply have been extraordinarily unwilling to negotiate. These cases are likely quite different from the 77% that did settle.

Trying to predict the likelihood of winning by looking only at the 23% of cases that did not settle would be akin to asking the tallest 23% of kids in school whether they are good at basketball. The answers would say something about those kids, but they would not be representative of the larger group. What group would our case be in? It’s difficult to know.
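
A small simulation can show how far a non-random 23% may drift from the whole. It rests on two assumptions of ours that the article does not establish: that claim strengths are skewed towards weak claims, and that the cases closest to a 50/50 call are the ones that fail to settle. With other assumptions the distortion could point in the opposite direction.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

N_CASES = 20_000
JUDGED_SHARE = 0.23  # share of cases reaching judgment (the UK figure cited above)

# Assumption: each case has a "true" claimant win probability,
# skewed towards weak claims (Beta(2, 5) has a mean of about 0.29).
strengths = [random.betavariate(2, 5) for _ in range(N_CASES)]

# Assumption: the hardest-to-assess cases (closest to 50/50) do not settle.
by_uncertainty = sorted(strengths, key=lambda p: abs(p - 0.5))
judged = by_uncertainty[: int(N_CASES * JUDGED_SHARE)]


def simulated_win_rate(probabilities):
    """Draw one outcome per case and return the share of claimant wins."""
    wins = sum(random.random() < p for p in probabilities)
    return wins / len(probabilities)


all_cases_rate = simulated_win_rate(strengths)
judged_rate = simulated_win_rate(judged)

print(f"Win rate if every case were decided: {all_cases_rate:.2f}")
print(f"Win rate among judged cases only:    {judged_rate:.2f}")
```

Under these assumptions the judged cases show a noticeably higher claimant win rate than the full population would, simply because of which cases end up in front of a judge.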

Data affect data (which will affect data) …

Even if we built the perfect database, its very use would alter the landscape it tries to describe.

Let’s say our new database launches at the end of 2026. It somehow manages to overcome the earlier challenges—clearly identifying relevant cases, categorizing wins and losses, and capturing what proportion of the claim was recovered in the grey areas in between. It also shows the net result after legal costs, and whether any amount was collected.

In this ideal scenario, the database might reveal that only 20% of cases brought by suppliers result in a win—or what we might define as a “good outcome”, such as recovering more than 60% of the claim amount after costs. And let’s assume the dataset is large enough for the results to be statistically meaningful.

At this point, we haven’t even started looking at the strength of the arguments or the evidence in our case. But with this data—can we at least get a general sense of how difficult the case might be?

Let’s consider what we could learn if we had access to such a database.

As our incredible database becomes widely used, many counsel will likely start advising suppliers to avoid litigation or arbitration with customers, since so few cases result in a good outcome. This would likely lead to fewer claims brought by supplier claimants, and fewer claims disputed in court by suppliers acting as defendants.

However, the cases that counsel perceive as exceptionally strong would likely still be brought to court, and these strong cases should now result in good outcomes more often than before. Over time, the data would reflect this shift, suggesting supplier cases are not as difficult as once believed. That, in turn, might encourage more suppliers to pursue claims, again altering the data.

In short, the data won’t stay still.
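
The loop can be sketched as a toy model. The behavioural rule below (how selective counsel become in response to the published win rate) and the uniform spread of case strengths are invented purely for illustration; only the 20% starting point comes from the scenario above.

```python
# Assumption: supplier cases have win probabilities spread evenly from 0 to 0.4,
# so that the win rate is 20% when every case is brought.
MAX_STRENGTH = 0.4


def filing_threshold(published_rate):
    """Assumed behaviour: the lower the published win rate, the stronger a
    case must look before counsel advises bringing it."""
    return min(max(0.5 - published_rate, 0.0), MAX_STRENGTH)


def win_rate_of_filed_cases(threshold):
    """Mean win probability of the cases above the threshold (uniform strengths)."""
    return (threshold + MAX_STRENGTH) / 2


published = 0.20  # the 20% figure from the ideal scenario
history = [published]
for _ in range(6):
    threshold = filing_threshold(published)
    published = win_rate_of_filed_cases(threshold)
    history.append(published)

print([round(rate, 3) for rate in history])
```

In this toy version the published win rate jumps up once counsel become selective, then swings back as the better figures encourage weaker claims. Anyone reading the database in a given year sees a number that already reflects how earlier readers reacted to it.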

Furthermore, if our database makes it easy to see how challenging disputes are for suppliers, it’s likely that new contracts will be drafted to better protect the supplier’s position. That, in turn, would make older cases, decided under different contract terms, a misleading guide to new disputes.

When we finally get the perfect database that can provide our seismic shift, its foundation will soon start to shift like sand.

Summary on Big Data

Except for very standardized cases that can be easily categorized, “big data” will not replace the hard thinking required to assess disputed claims. The obstacle is not immature technology but the difficulty of collecting, annotating and, most importantly, interpreting data that is constantly changing. As an expert in the field of assessing legal disputes puts it:

Whenever we have:
* An unstable system OR
* Not “enough” data
we need to use subjective, Bayesian probabilities to make predictions.

— John Celona, Winning at Litigation Through Decision Analysis: Creating and Executing Winning Strategies in any Litigation or Dispute, Springer, 2016, p. 126.

This does not mean that advanced databases are irrelevant. Reliable data can still provide crucial insights, such as:

* how long it typically takes to resolve specific types of disputes in certain courts (whether by judgment or settlement)
* the probability of settlement, and how that compares to other categories of disputes
* information on legal costs
* which lawyers have been engaged in similar cases
* which judges have handled comparable matters

With trustworthy information, we can identify broad trends, and shifts in those trends, across different types of disputes. For example, if we first assess our case based on its merits and then learn from data that “suppliers tend to face more difficulties than customers in these types of disputes”, that could prompt us to revise our assessment along these lines:

“I initially thought the case was strong, but now that I see these disputes are generally more challenging, perhaps I should adjust my expectations.”

This type of reasoning is the foundation of Bayesian thinking: updating assessments in light of new information.
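
One simple way to express such an update in numbers is a Beta-binomial model, sketched below. This is our own illustrative choice rather than a method prescribed by the sources cited here, and every figure in it is hypothetical, including the `relevance` weight that discounts database cases because they are never fully comparable to ours.

```python
def update_win_probability(prior_wins, prior_losses,
                           observed_wins, observed_losses, relevance=1.0):
    """Beta-binomial update of a win-probability estimate.

    prior_wins / prior_losses        pseudo-counts encoding the initial,
                                     merits-based assessment
    observed_wins / observed_losses  outcomes of comparable cases in the data
    relevance                        assumed weight (0 to 1) for how comparable
                                     those cases are to ours
    """
    if not 0.0 <= relevance <= 1.0:
        raise ValueError("relevance must be between 0 and 1")
    wins = prior_wins + relevance * observed_wins
    losses = prior_losses + relevance * observed_losses
    return wins / (wins + losses)


# Initial view: "the case is strong", encoded as 7 wins to 3 losses (70%).
prior_mean = 7 / (7 + 3)

# Data: suppliers won 3 of 10 comparable cases, counted at half weight.
posterior_mean = update_win_probability(7, 3, observed_wins=3,
                                        observed_losses=7, relevance=0.5)

print(f"Before the data: {prior_mean:.0%}, after the data: {posterior_mean:.0%}")
```

The data pulls the estimate down without overriding the merits-based view. How far it pulls depends on the weight we give it, which is itself a subjective judgment.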

Even if the data offers valuable insights such as those above, it still needs to be interpreted in our context. A modest chance of winning a high-value case might be worth pursuing. Conversely, even a strong claim might not make business sense if legal costs are high and the counterparty may be unable to pay. Then again, it might, if the aim is to put pressure on a competitor.

What the data alone can’t answer is the most important question: What is the value for us in pursuing litigation or arbitration?

And this is the crucial question.

Databases and smart searches with generative AI

Advanced databases have been around for a long time but have not yet created the seismic shift that many expected or hoped for.

Databases with judgments combined with generative AI models (such as ChatGPT) are already becoming great sparring partners for lawyers wanting to discuss the strengths and weaknesses of a case, identify new arguments or choose between approaches. Smarter search functionality is getting better at finding similar cases for counsel to read (or for the AI model to summarize), which can make it easier to understand how similar cases have turned out. But looking at the aggregated results of many cases in an automated manner quickly runs into the challenges mentioned above.

This is good news for all dispute resolution professionals and mediators out there. There is no doubt that technology has radically changed the research process involved in assessing a case, but the careful thinking required to assess the best course of action in litigation or arbitration will remain beyond reach for even advanced models and big data.

We love talking about disputes, assessments, and clearly visualising risk.
Get in touch: info@eperoto.com
