Serial: A Game of Numbers

Every analyst has their methods: take a poll, measure social media buzz, weigh various key factors. Rootclaim’s analysis of Serial: Who killed Hae Min Lee provides a good example of how the Rootclaim system uses hard numbers in order to “calculate reality” – to determine mathematically which hypothesis is the most likely.

Rootclaim recently took on one of the most controversial criminal convictions: that of Adnan Syed, sentenced to jail for murdering his ex-girlfriend Hae Min Lee. Syed’s conviction has been featured in the podcasts Serial and Undisclosed, and followers of the case have debated the minutiae on forums such as Reddit. Until now, discussion forums have focused on how a few particular pieces of evidence proved one hypothesis or another. Rootclaim has put together the first concerted effort to gather all the relevant information into one cohesive analysis.

It’s All about the Numbers

In order to “calculate reality,” Rootclaim lists the most plausible hypotheses, gathers all of the relevant evidence, and then examines how likely every piece of evidence is under each hypothesis. Most importantly, Rootclaim requires inputs – the likelihood of each piece of evidence.

In the case of Hae Min Lee’s murder: if Adnan is innocent, what’s the likelihood that his friend Jay Wilds would lie and say that he helped Adnan bury Hae? And if Jay is telling the truth, what’s the likelihood that he would change so many details in his story?

If Adnan killed Hae, what’s the likelihood that he would bury her around 7:00 PM that night (as the prosecutor claimed)? Wouldn’t he be more likely to bury her late at night? Of course, an improbable detail such as a 7:00 PM burial doesn’t mean that a hypothesis is wrong – it just makes that hypothesis much less likely.
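The questions above all have the same shape: how likely is this piece of evidence under each hypothesis? One common way to combine such answers is Bayes' rule, sketched below in Python. This is a minimal illustration, not Rootclaim's actual model: the function name `update_beliefs`, the hypothesis labels, and every number are hypothetical placeholders, and the sketch assumes the pieces of evidence are independent of one another.

```python
def update_beliefs(priors, evidence_likelihoods):
    """Combine prior probabilities with per-evidence likelihoods using Bayes' rule.

    priors: dict mapping hypothesis name -> prior probability.
    evidence_likelihoods: list of dicts, one per piece of evidence, each
        mapping hypothesis name -> P(evidence | hypothesis).
    Assumes the pieces of evidence are independent given each hypothesis.
    """
    beliefs = dict(priors)
    for likelihoods in evidence_likelihoods:
        # Weight each hypothesis by how well it explains this evidence.
        beliefs = {h: p * likelihoods[h] for h, p in beliefs.items()}
        # Renormalize so the probabilities sum to 1 again.
        total = sum(beliefs.values())
        beliefs = {h: p / total for h, p in beliefs.items()}
    return beliefs


# Hypothetical placeholder numbers, chosen only to show the mechanics.
priors = {"hypothesis_a": 0.5, "hypothesis_b": 0.5}
evidence = [
    {"hypothesis_a": 0.8, "hypothesis_b": 0.2},  # fits hypothesis A better
    {"hypothesis_a": 0.1, "hypothesis_b": 0.4},  # fits hypothesis B better
]
posterior = update_beliefs(priors, evidence)
print(posterior)
```

With these placeholder inputs the two pieces of evidence pull in opposite directions and roughly cancel out, which is why no single piece of evidence settles the question on its own.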

But how do you evaluate those statistical likelihoods? And what if you calculate the numbers incorrectly?

Garbage in, Garbage out

Even if you have the right tools, the wrong inputs will produce a faulty output. Your accountant won’t be able to calculate your taxes correctly if you didn’t keep your records correctly. Similarly, Rootclaim needs to make sure that the calculated inputs are as accurate as possible. So how do we ensure reliable inputs?

Historical Data

The most accurate numbers come from records of historical events, data from similar cases, and the results from supervised tests. Therefore, we prefer to use this data whenever it is available. For example, Hae was strangled, whereas less than 2% of all murders involve strangulation. On the other hand, approximately 27% of serial killers murder their victims via strangulation – making this more likely if a serial killer was involved.
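To see how much weight those two figures carry, one can compare them as a ratio. The Python sketch below uses only the percentages quoted above and treats "less than 2%" as exactly 2%, an assumption that makes the result a lower bound. The variable names are illustrative.

```python
# Figures quoted in the text; 2% is an upper bound, so the ratio is a lower bound.
p_strangulation_given_any_murder = 0.02
p_strangulation_given_serial_killer = 0.27

likelihood_ratio = (
    p_strangulation_given_serial_killer / p_strangulation_given_any_murder
)
print(f"Strangulation is at least {likelihood_ratio:.1f}x more likely with a serial killer")
```

On these figures, strangulation is roughly 13 to 14 times more likely when a serial killer is involved than in murders generally. That shifts weight toward the serial killer hypothesis without proving it.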

Six of One, Half a Dozen of the Other

Sometimes there are multiple paths to calculate the same figure. The FBI has data on murders by intimate partners (15% boyfriend) and by friends (23%). But should an ex-partner (such as Adnan) be calculated as part of the “boyfriend” category or just as a friend?

“Facts are stubborn things, but statistics are pliable.”

― Mark Twain

One way of calculating the percentage of murders by an ex-partner is to treat him as one of the victim’s many friends – but an ex-partner is many times more likely to be the killer than the average friend.

A better way of calculating this same percentage is to look more carefully at the facts surrounding partner murders. Approximately 70% of partner murders occur after the victim has left the relationship – meaning that most partner murders have been committed by an ex-partner.

No Good Answers

Sometimes there is no reliable data from past cases – so we need to estimate based on what we do know. Even then, Rootclaim strives to use relevant data in order to make an informed calculation. For example, what is the likelihood that a known murderer (such as Roy Sharonnie Davis) would live in Woodlawn (where Hae went to school)?

While there isn’t a single statistic to cite, we can use a formula to calculate the likelihood. In 1999 there were approximately 35,000 residents in Woodlawn. The violent crime rate in Woodlawn is approximately 1.7 times the national rate, and the national murder rate in 1999 was approximately 6 murders per 100,000 people. Young women (18-24 years old) constitute only approximately 5.7% of homicide victims. Therefore, expanding to a five-year period to account for the likely pool of active murderers, and assuming the same 1.7 multiplier applies to homicides, the calculation is: (6 murders per year / 100,000 persons) x 1.7 (elevated local rate) x 5 years x 35,000 (population of Woodlawn) x 5.7% (young women) = approximately 1 murder – and, therefore, one murderer – of a young woman in Woodlawn in the preceding five-year period.
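The arithmetic in this estimate can be checked with a few lines of Python. This is only a sketch that multiplies the approximate figures quoted above; the variable names are illustrative, and applying the violent crime multiplier to the murder rate follows the text's own assumption.

```python
# All figures are the approximations quoted in the text.
national_murder_rate = 6 / 100_000  # murders per person per year (1999)
local_multiplier = 1.7              # Woodlawn rate relative to the national rate
years = 5                           # window for the pool of active murderers
population = 35_000                 # Woodlawn residents in 1999
share_young_women = 0.057           # young women (18-24) as a share of victims

expected_murders = (
    national_murder_rate * local_multiplier * years * population * share_young_women
)
print(f"Expected murders of young women in {years} years: {expected_murders:.2f}")
```

The product comes out just above 1, matching the rounded figure in the text. Because every input is approximate, the result is best read as an order-of-magnitude estimate.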

Two Heads Are Better Than One

Rootclaim relies on input from the crowd – to identify mistakes and suggest more accurate estimates. When more people get involved, everyone benefits by pooling their knowledge. One contributor may know more about forensics, while another may be up to date on social research. Together, we can harness the power of the crowd in order to calculate reality.

Visit Rootclaim to suggest evidence and help calculate the probabilities in the analysis of Serial: Who killed Hae Min Lee, to contribute to one of our other analyses, or to suggest a new analysis.