Does social media data misrepresent human behaviour?

2nd Dec 2014

Scientists at McGill University, Montreal and Carnegie Mellon University in Pittsburgh have published a paper warning of the potential “flaws” associated with mining social media data for marketing insights.

Although aimed primarily at alerting academics to the growing volume of research based on social media data, the paper, published in the Nov. 28 issue of Science, also serves as a cautionary note for brands increasingly using social data to decipher trends and make predictions.

The research, led by computer scientists Derek Ruths and Jürgen Pfeffer, found that social media data often misrepresents large sections of society, creating a “population bias” that means data gleaned from social platforms is likely to contain errors.

Ruths and Pfeffer noted that although different social media platforms tend to attract different types of users, researchers rarely correct for the distorted picture these skewed populations can produce.

Only 5% of over-65s use Twitter, for example, while Pinterest’s audience is dominated by women aged 25 to 34. The researchers also highlighted how platform design itself channels human behaviour – Facebook, for instance, offers a “Like” button but no “Dislike” button.

As a result, with studies claiming to be able to “predict everything from summer blockbusters to fluctuations in the stock market”, Ruths and Pfeffer believe more must now be done to validate the quality of social media insights:

“Such erroneous results can have huge implications: thousands of research papers each year are now based on data gleaned from social media. Many of these papers are used to inform and justify decisions and investments among the public and in industry and government,” says Derek Ruths, assistant professor in McGill’s School of Computer Science.

“A common assumption underlying many large-scale social media-based studies of human behaviour is that a large-enough sample of users will drown out noise introduced by peculiarities of the platform’s population. These sampling biases are rarely corrected for, if even acknowledged.”
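That point can be illustrated with a toy simulation (the population figures here are hypothetical, except for the 5% Twitter statistic cited above): no matter how large the sample drawn from a skewed platform, the estimate converges to the platform’s rate, not the true population rate.

```python
import random

random.seed(42)

TRUE_OVER_65_SHARE = 0.20      # hypothetical share of over-65s in the general population
PLATFORM_OVER_65_SHARE = 0.05  # the Twitter figure cited in the article

def platform_sample(n):
    """Draw n users from the platform, where over-65s are under-represented."""
    return [random.random() < PLATFORM_OVER_65_SHARE for _ in range(n)]

# Growing the sample shrinks random noise, but the systematic gap
# between the platform and the population never closes.
for n in (100, 10_000, 1_000_000):
    estimate = sum(platform_sample(n)) / n
    print(f"n={n:>9,}: estimated over-65 share = {estimate:.3f} "
          f"(true population share = {TRUE_OVER_65_SHARE})")
```

Even at a million users, the estimate settles near 5% rather than the population’s 20% – the bias is systematic, so more data only makes the wrong answer more precise.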

Another issue Ruths and Pfeffer highlighted was the growing number of spammers and bots on platforms like Twitter, which are able to “masquerade as normal users” but are often mistakenly incorporated into measurements and predictions of human behaviour.

In August, a document filed with the US Securities and Exchange Commission revealed that up to 8.5% of Twitter’s daily users post tweets automatically, with no human intervention.

“The common thread in all these issues is the need for researchers to be more acutely aware of what they’re actually analysing when working with social media data,” Ruths added.
