BOOK REVIEW: How big data exposes everyday lies

Nov 23 2017 06:00
Ian Mann

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are, by Seth Stephens-Davidowitz

AUTHOR Stephens-Davidowitz is a data scientist, someone employed to analyse and interpret complex digital data. He has worked at Google, which hired him after learning about the strength and accuracy of his data research into racism.

His exploration of data has led to fascinating revelations about mental illness, human sexuality, child abuse, abortion, advertising, religion and health. The datasets enabled by the digital explosion offer perspectives on all manner of issues that were simply not available a couple of decades ago.

The microscope made it possible to see that there is more to a drop of pond water than we thought, and the telescope showed us there is so much more to the night sky than we imagined. Digital data similarly reveals that there is more to human behaviour and society than we thought, and often very different to what we thought.

“One of the primary goals of this book… is to provide the missing evidence of what can be done with Big Data—how we can find the needles, if you will, in those larger and larger haystacks,” the author explains.

In the past we might have suspected something; now, using Big Data, we can prove it, or show that the world works in precisely the opposite manner.

The author’s grandmother frequently emphasised the importance of couples having common friends as a key factor for their marital success, as it was in hers. Is this sound advice?

A team of computer scientists recently analysed the biggest dataset ever assembled on human relationships, Facebook, to answer this question. What the data showed was that having a common core group of friends is a strong predictor that a relationship will not last. Having separate social circles may actually make relationships stronger.

So why did grandmother believe just the opposite of what is true? People tend to exaggerate the relevance of their own experience. We give far too much weight to certain data points – ourselves. Similarly, we tend to overestimate the prevalence of anything that makes for a memorable story. Consider, for example, whether more people in the OECD countries die from terrorist attacks or from drowning in bath tubs. (The answer is bath tubs!)

Four unique powers of Big Data

The author claims four unique powers of Big Data.

The first power of Big Data is, obviously, new data – data that could not be understood in small quantities.

The second power is being able to provide honest data. In the digital age, people still hide their thoughts, prejudices and desires from themselves and from other people. This is the origin of the book’s title, “Everybody Lies”. However, people’s internet searches, for example, when aggregated with their anonymity protected, are accurate and honest reflections of their thoughts.

We can also zoom in on small subsets of people - the third power of Big Data. For example, are people sick with the flu more likely to make flu-related searches? Which searches most closely track housing prices? If, for example, searches for schools in a district increase, we can expect housing price changes.

The fourth power is the ability to run many causal experiments with Big Data. What types of crucial information will make the stock market move? In the US, one answer is the monthly unemployment rate. Financial institutions do whatever they can to maximise the speed with which they receive, analyse, and act on this information, and make buy or sell decisions. Today, once the labour statistics are released, the market will move in less time than it takes you to blink your eyes.

By analysing Big Data, we are also able to identify information of real value even if it is not explained. The size of a horse’s heart, and particularly the size of the left ventricle, is the single most important predictor of a horse’s success.

In the same vein, horses with small spleens earned virtually nothing. And the horse’s pedigree is a far less reliable predictor of success than we used to believe. This realisation will eventually affect the price of pedigreed horses.

Based on its Big Data analysis, Walmart identified a strong positive correlation between the sale of strawberry Pop-Tarts and impending hurricanes. In like manner, the quality of a wine can be explained simply by the weather during the growing season, and less by a host of other factors we have become used to considering.

If your goal is to predict which wine will excel, what products will sell, which horses will win, you don’t need to be concerned with why your model works. “Just get the numbers right,” Stephens-Davidowitz recommends.

Big Data comes in many forms – not only numbers, but text and even images. Traditionally, when academics or businesspeople want data, they conduct surveys.

Do newspapers influence readers’ left or right political leanings, or do readers’ leanings influence the newspaper? Using Big Data, researchers can show that, just as supermarkets identify what ice cream people want and then fill their shelves with it, newspapers identify the viewpoints people want to read and fill their pages with them.

The influence relationship is in the opposite direction to what many thought. But the two big data sets, how people vote in a district, and which papers they read, don’t lie.

Pictures are also data, as we see from the changing ways people have posed. Researchers studied 949 scanned yearbooks from American high schools from 1905 to 2013. From these they were able to create an “average” face out of the pictures from every decade. The image data showed how Americans, particularly women, started smiling in photos.

People originally thought of photographs as paintings for which you posed for hours. Holding a smile would have been impossible. When Kodak began associating photos with happiness, being photographed smiling was how people wanted to show others what a good time they were having.

This is the stuff of science, not pseudoscience. In the past, the world’s most famous linguists analysed individual texts; today they can reveal patterns across billions of books. Yet the methodologies taught to graduate students in psychology, political science, sociology and business have been virtually untouched by the digital revolution.

This book demonstrates how much they have missed. 

Readability:  Light --+-- Serious
Insight:        High -+--- Low
Practical:      High --+-- Low

  • Ian Mann of Gateways consults internationally on leadership and strategy and is the author of Executive Update. Views expressed are his own.
