Wednesday, November 10, 2010

Stephen Baker's "The Numerati"

This book is a fun read. It discusses the trend toward data collection and data mining to understand humans. The author gives examples of this technology applied to the workplace, shopping, politics, terrorist hunting, medicine, and online dating. It will acquaint you with the technology, its possibilities, and its dangers. It won't give you enough information to understand how the technology works or to really appreciate where, how, and to what extent it can be applied.

The writer's style is breezy. He intrudes his own persona into the narrative. I especially loved the bit about how he cajoled his wife into signing up with him for a dating service to see if that service could spot them as "potential mates". He sheepishly had to admit it didn't, because he had put down a preference for a younger woman. Only after he corrected this did his wife show up in the list of candidates. Whoa! I'm a little shocked that he would admit to this. I can only imagine how much boxing around the ears he got from his wife. Especially when she discovers that the whole world now knows this about him.

He uses the term "Numerati" to cover those who collect and analyze data, build models, and predict human behaviour. Here's a bit from the book:
... today's Numerati are plowing forward, with an eye on us. They're already stitching bits of our data into predictive models, and they're just getting warmed up. In the coming decade, each of us will spawn, often unwittingly, models of ourselves in nearly every walk of life. We'll be modeled as workers, patients, soldiers, lovers, shoppers, and voters. In these early days, many of the models are still primitive, making us look like stick figures. The ultimate goal though, is to build versions of humans that are just as complex as we are -- each one unique.
The book builds the story from chapter to chapter of how we are being mapped, reduced to numbers, and made predictable. But almost as a throwaway line at the very end of the book, an alternative view emerges, one that I endorse from my experience in computer modeling. Here is a bit of a conversation he had with a friend who has a doctorate in computer science, about this vision of the Numerati:
...he explains that once he too dreamed of modeling the world but has since concluded that math, while powerful, is flawed.


"Ever heard of garbage in, garbage out?" His point is that mathematicians model misunderstandings of our world, often using the data at hand instead of chasing down the hidden facts.
There is a nugget of truth in this statement, but it is mangled. The data isn't garbage. It is simply unstructured, unqualified, and never completely understood. The "interpretation" that the Numerati apply is purely statistical, so they will never be able to filter out the misleading and untrue. They simply hope that with large enough piles of data their analysis will approximate ever more closely the real underlying person. But people are maddeningly unreliable. Humans learn. Once they know they are being "inspected" they change behaviour. Fads are a good example. Once the trendsetters realize that the "trend" is becoming too popular, they abandon it and start a new one. The data is not "static" because people change. So the data is unreliable, ambiguous, misleading, changeable, and sometimes just wrong. At best you get a crude approximation from this.

The techniques of data analysis, data mining, etc. work best for things that are dumb and static. They will be very useful in medicine, e.g., in looking at diseases and genetics, but they will never be definitive in creating a model of a "shopper" or a "voter" or anybody else who can change their mind. You just can't pin people down.

I do recommend that you read the book. It is enjoyable. It will teach you some things. But take it with a grain of salt.
