
Prospering in the knowledge economy means better datafying human behaviour

11th Jun 2015

One of the six steps to gaining insight advantage is creating data, which follows on from asking the right questions. Once you have defined the questions you should be asking – not defaulting to just asking those that can be answered – the challenge is to uncover or create the data that will deliver the insights required.

The process of creating data has recently acquired the ugly name datafication, popularised in Viktor Mayer-Schonberger and Kenneth Cukier’s book Big Data. Wikipedia defines datafication as a ‘modern technological trend turning many aspects of our life into computerised data and transforming this information into new forms of value’. This presumes an entirely digital phenomenon, yet the process of creating data so that we can better understand our world has existed since the creation of the first numerical systems in Babylon in the second millennium BC.

And it has been a (if not the) primary driver of economic and social development, particularly since the Enlightenment. More data means better insight, which means better decisions, whatever the area – politics, business, healthcare or scientific research.

The expression creating data is perhaps a little misleading, because it is more a case of structuring the miasma of information that surrounds us into a format that yields the understanding we seek. Hence datafication, for all its inelegance, is the more accurate description. All the data we might ever need exists somewhere; it just needs to be discovered (or uncovered) and structured so that it is amenable to analysis. Looked at in this light, datafication has a long and illustrious history, including:

  • Map creation – from the ancient maps of Babylon, Greece and Asia to current satellite navigation systems.
  • Accounting – starting with the double-entry bookkeeping first used in 13th Century Florence through to the sophisticated financial ratio analysis that is common now.
  • Experimentation – Galileo employed experiments in the late 16th Century for scientific purposes and they remain integral to both R&D and data-driven marketing today.
  • Statistical inference – structuring distributions to describe them in terms of measures of central tendency (mean, median, mode) and of dispersion (standard deviation, interquartile range) – has its origins in the 17th Century and underlies modern predictive analytic techniques.
  • Graphical visualisation – from the charts of William Playfair (18th Century) to the interactive visualisations that are becoming increasingly common.
  • Market research – from Likert scale surveys for capturing degrees of belief or opinion (1930s) through conjoint analysis for determining utility (1970s) to computational text mining for sentiment determination and root cause analysis (2000s).

Maps provide a good example of the datafication process. The creation of a basic map involves defining a point according to longitude and latitude, then calculating altitude relative to sea level – essentially decomposing a point into its measurable dimensions and measuring along them. To this can be added further measures, classifications (according to measures) or categorisations – average temperature and rainfall, prevailing wind strength and direction, land usage, population density, birth and mortality rates and average income level, among many others.

The above process gives meaning to geographic space. Insight is generated through applying structure - identifying how that space can be defined, capturing the data and organising it in a way that delivers understanding. 
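As a toy illustration of this layering – all values, field names and thresholds below are invented, not taken from the article – a single map point can be decomposed into its measurable dimensions, with categorisations then derived from the measures:

```python
# Hypothetical map point: core dimensions plus layered-on measures.
point = {
    "latitude": 51.5074,           # degrees north
    "longitude": -0.1278,          # degrees east
    "altitude_m": 11,              # metres above sea level
    "annual_rainfall_mm": 601,     # an added measure
    "population_density_km2": 5700,
}

def categorise_density(density_km2):
    """A categorisation derived from a measure (thresholds illustrative)."""
    if density_km2 < 150:
        return "rural"
    if density_km2 < 1500:
        return "suburban"
    return "urban"

# Attach the derived category to the point.
point["settlement_type"] = categorise_density(point["population_density_km2"])
print(point["settlement_type"])
```

The point of the sketch is the shape of the process, not the numbers: measures come first, and categories are functions of measures.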

Six steps for structuring unstructured information into data

The process of structuring information requires six steps:

  1. Identification – of the relevant measurable dimensions and categories into which information can be decomposed.
  2. Extraction – of the data or features for measuring or categorising.   
  3. Classification – grouping data, whether by interval (0-100m, 100-200m above sea level, etc.) or consolidating low level categories into higher level ones (e.g. wheat fields as arable, potato fields as horticulture, etc.).
  4. Indexation – across classes and categories to create relative measures, also over time to identify degree of change.
  5. Summarisation – of the key insights identified by indexation.
  6. Interpretation – of the insights, adding meaning through suggesting potential causes with recommendations on further investigation or action to be taken.
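A minimal sketch of how the middle steps might look in practice, using invented elevation readings for two survey years (all names and figures are hypothetical, and interpretation is deliberately left to the analyst):

```python
# Identification: elevation above sea level (metres) is the chosen dimension.
# All readings below are invented for illustration.
readings = {
    2014: [45, 120, 180, 260, 95, 310],
    2015: [50, 130, 175, 270, 110, 330],
}

def classify(value_m):
    """Classification: group readings into 100 m interval bands."""
    lower = (value_m // 100) * 100
    return f"{lower}-{lower + 100}m"

def band_counts(year):
    """Extraction and classification: count readings per band for one year."""
    counts = {}
    for v in readings[year]:
        band = classify(v)
        counts[band] = counts.get(band, 0) + 1
    return counts

def index_change(base_year, new_year):
    """Indexation: each band in the new year relative to the base year (base = 100)."""
    base, new = band_counts(base_year), band_counts(new_year)
    return {band: round(100 * new.get(band, 0) / n) for band, n in base.items()}

# Summarisation: report only the bands that moved.
for band, idx in sorted(index_change(2014, 2015).items()):
    if idx != 100:
        print(f"{band}: index {idx}")
```

Indexing against a base year makes change visible at a glance – a band with index 150 has grown by half – which is exactly the summarisation step's job.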

Structuring human behaviour using digital technology

Interpreting human behaviour has long been an important skill for sales people – identifying which customers in a store could be persuaded to buy with a little support, for example, or who is likely to be a shoplifter. Even this can be automated in the digital world. To illustrate how, let’s take an example that is richer in the dimensions that can be datafied: video of a suspect being interviewed by the authorities in relation to a crime, where the ultimate aim is to identify potential lies through inconsistencies (between body language and words) or signs of high stress. Being video-based, this process lends itself to digitisation and automated algorithmic assessment, following solution training and rules definition by experts.

Human behaviour provides many measurable dimensions in the form of facial expression, vocal elements and body movement. 

  • Expression changes can be quantified (frequency, length of time), categorised according to muscles used then classified as to their meaning (contempt, disgust, anger, fear, surprise, happiness, sadness). Facial asymmetry – which increases with tension – can also be defined geometrically.
  • A similar process can be applied to eye movements (up left, up right, etc.), while blink rates – a potential stress indicator – can also be quantified.
  • Facial colouring can also be decomposed into degrees of red, green and blue (0 to 255 in each case), with changes from baseline – to stress-induced pallor or embarrassment-induced blushing, for example – all quantifiable.
  • Vocal amplitude (decibels), pitch (hertz) and tonal quality (irregularities in pitch and changes in amplitude known as jitter and shimmer) are also measurable.
  • Body posture can also be described in terms of angle (forward, backwards or upright) and curvature (slumped, straight). Breathing rate can be tracked via the frequency of fine shoulder movements.  Arm and leg movements (face touching, foot-tapping) are also categorisable and quantifiable.
  • The final dimension is language – speed of speech, gap length between phrases and repetitions of words, terms or concepts.
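A hedged sketch of how two of these indicators – blink rate and the red colour channel – might be quantified against a per-subject baseline. All readings, baseline figures and the two-sigma threshold are invented for illustration:

```python
def z_score(value, baseline_mean, baseline_std):
    """How many standard deviations a reading sits from the subject's baseline."""
    return (value - baseline_mean) / baseline_std

# Hypothetical baselines (mean, standard deviation) established during
# neutral questioning.
baseline = {
    "blinks_per_min": (17.0, 4.0),
    "red_channel": (140.0, 10.0),   # 0-255 colour scale
}

# Hypothetical readings taken during a stressful question.
observed = {"blinks_per_min": 33.0, "red_channel": 128.0}

# Flag any indicator more than two standard deviations from its baseline.
flags = {}
for indicator, value in observed.items():
    mean, std = baseline[indicator]
    flags[indicator] = abs(z_score(value, mean, std)) > 2

print(flags)
```

Measuring each subject against their own baseline, rather than a population norm, is what lets an automated system distinguish an unusually rapid blink rate from someone who simply blinks a lot.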

With suitably accurate camera and audio equipment and sufficient analytical processing, feature extraction and measurement can be entirely digitised. Categorisation and classification require the involvement of analysts in the first instance, but once the groupings are defined the process can be automated. With the classification and measuring complete, indexing becomes a computational task of identifying normal ranges and correlations between abnormalities across indicators. Summarisation involves highlighting these abnormalities in tabular or graphical form. Interpretation then translates the information into recommendations. In the first instance this would be handled by analysts, but over time the implicit rules they follow could be codified and programmed into a rules engine with natural language generation capabilities.

This use case may seem extreme but that is deliberate to highlight the full possibilities for how something as seemingly abstract as human behaviour can be datafied with the smart application of digital technology. And even if the precise data to answer a specific question cannot be found, a good proxy usually can.  

Finding the data that enables a hypothesis to be tested is what social scientists call an identification strategy, and success in this area distinguishes the great from the good. In his book Adapt, Tim Harford writes: “While Steven Levitt is famous to a wider audience as the Freakonomics researcher who did the research about the drug dealers and Sumo wrestlers, to other economists he is famous for the brilliance of his identification strategies.”

Asking the right questions requires the very human skill of curiosity, and success in answering them requires the equally human skill of imagination – art to go alongside the science. Levitt has both in abundance – hence his ability to amaze with insights that others would not even look for. And any organisation that wishes to prosper in the knowledge economy needs to do the same. Is your company ready for Freakobusiness?

Jack Springman is head of customer analytics at Sopra Steria.

