Avoid hearing damage: How to measure the reliability of social listening
Nowadays, virtually every brand you’ve heard of is monitoring social media feedback. Some businesses rely on a manual approach, requiring staff to monitor multiple social media sites. Others use automated ‘listening’ tools that track brand mentions and sentiment through interactive dashboards. It’s an interesting observation that 78% of companies say they have dedicated social media teams, but only 26% integrate social media fully into their marketing strategies.
This shows a large gap between businesses (i) recognising the importance of social media and (ii) having sufficient trust in the data to make business decisions. As analytics start to go beyond simple measurement, e.g. counting brand mentions or increases in followers, trust is becoming increasingly important. In this article, I will look at the criteria for evaluating the reliability of social media analytics, particularly when this information is being used to tailor marketing campaigns or make other critical business decisions.
What’s in a name?
One major issue concerns the accuracy of detecting brand mentions. Most companies that offer social media monitoring rely on one of two strategies: (i) restricting the search to hashtagged mentions, which gives high precision (finding only relevant mentions) at the cost of many missed mentions, or (ii) unrestricted keyword search, which can generate numerous false positives.
The graph above illustrates the ‘reach’ of Swatch during the month of April on Twitter. Reach is defined as the calibrated ratio of brand mentions with respect to the total number of posts over a given time period. The graph plots the reach of ‘Simple Reach’ versus ‘Curated Reach’, where ‘Simple Reach’ is based on keyword mentions. Content curation is the process of filtering potential brand mentions by requiring appropriate contextual clues (positive or negative). For example, a true mention of Swatch should contain some reference to watch, time, strap, or the activity of wearing.
A spike can be observed on April 24th for ‘Simple Reach’: this can be attributed to discussions regarding a new “colour swatch” released by a cosmetic brand.
This is an example of misleading analytics caused by the lack of proper content curation. Brand names are quite often ordinary words (such as the detergent brand All), so some context checking is clearly required.
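The difference between the two curves can be illustrated with a minimal sketch. It is not any vendor's actual method: the clue lists, the function names and the sample posts are assumptions made up for illustration, and the ratio computed here is a raw, uncalibrated one rather than the calibrated reach plotted in the graph.

```python
import re

# Clue lists are illustrative assumptions, loosely based on the Swatch example.
POSITIVE_CLUES = {"watch", "watches", "time", "strap", "wearing", "wear"}
NEGATIVE_CLUES = {"colour", "color", "palette", "fabric"}

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(post):
    """Lowercase a post and return its set of word tokens ('#' and '@' are dropped)."""
    return set(TOKEN_RE.findall(post.lower()))


def is_simple_mention(post, brand="swatch"):
    """Keyword match only: any post containing the brand word counts."""
    return brand in tokenize(post)


def is_curated_mention(post, brand="swatch"):
    """Keyword match plus context: needs a positive clue and no negative clue."""
    words = tokenize(post)
    return (
        brand in words
        and bool(words & POSITIVE_CLUES)
        and not (words & NEGATIVE_CLUES)
    )


def reach(posts, is_mention):
    """Raw ratio of matching posts to all posts in the period."""
    if not posts:
        return 0.0
    return sum(1 for p in posts if is_mention(p)) / len(posts)


# Made-up sample posts for one day.
posts = [
    "Love my new Swatch, the strap is so comfy",
    "Check out the new colour swatch from this lipstick line",
    "Running late again, need coffee",
    "#Swatch watch sale this weekend",
]

simple_reach = reach(posts, is_simple_mention)
curated_reach = reach(posts, is_curated_mention)
print(simple_reach, curated_reach)
```

On this toy sample the keyword-only count includes the cosmetics post while the curated count does not, which is the same effect as the April 24th spike. A production system would need far richer context handling than two word lists.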
Do the numbers make sense?
The graph above also illustrates the need for informative metrics. Some vendors are choosing to calibrate raw mention counts into meaningful, normalised indexes. This type of index has the advantage of being relatively stable with respect to modest day-to-day fluctuations, so significant changes are easily discernible. It also facilitates comparison across brands, time intervals, content sources and demographics. The index should take into account several features such as share of voice, sentiment and sudden spikes. Recently there have been efforts to validate such metrics by attempting to correlate social media trends with hard data, such as the movement of Dow Jones industry indexes. For most businesses, the ultimate validation comes when they see positive outcomes of marketing or advertising campaigns that can be attributed to strategy recommendations based on such analytics.
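The article does not specify how such an index is built, so the following is only one possible sketch. It assumes a daily share-of-voice series, indexes each day against a trailing baseline (100 meaning "typical") and flags spikes with a simple z-score. The function names, the threshold and the sample figures are all hypothetical.

```python
from statistics import mean, pstdev


def share_of_voice(brand_mentions, category_mentions):
    """Fraction of all category mentions that belong to the brand."""
    return brand_mentions / category_mentions if category_mentions else 0.0


def normalised_index(history, today):
    """Scale today's value against the trailing mean; 100 means a typical day."""
    baseline = mean(history)
    return 100.0 * today / baseline if baseline else 0.0


def is_spike(history, today, threshold=3.0):
    """Flag a day whose value lies more than `threshold` standard deviations from the mean."""
    sigma = pstdev(history)
    if sigma == 0:
        return today != mean(history)
    return abs(today - mean(history)) / sigma > threshold


# Made-up share-of-voice values for the previous seven days.
history = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.10]

quiet_day = normalised_index(history, 0.105)   # close to 100
busy_day = normalised_index(history, 0.20)     # well above 100
print(round(quiet_day), round(busy_day), is_spike(history, 0.20))
```

Because every brand is scaled against its own baseline, index values can be compared across brands or sources even when raw mention counts differ by orders of magnitude. Sentiment and other features mentioned above are left out of this sketch.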
What about accuracy?
Of course, an index is only as good as the data that goes into it. Apart from the correct tagging of brand mentions, the accuracy of automatically added metadata, such as sentiment, may be questioned. Datasift, an aggregator of social media content, claims a sentiment analysis accuracy of 70%. While sentiment analysis accuracy will never reach human performance, it can still add valuable insight. Sentiment analysis is best used to analyse trends in public perception, particularly sharp upticks or downturns. More recently, some vendors have started capturing different nuances of sentiment, for example (i) sentiment associated with customer service, product quality, price etc., and (ii) intensity, that is, extremely positive or extremely negative sentiment, both of which may require social outreach. Apart from sentiment, accuracy issues can also arise when relying on demographic data such as age, gender and location.
All sources are not alike
The first two content sources that come to mind when discussing social media analytics are Twitter and Facebook. Both are high-velocity, high-volume sources, but despite these similarities they require different types of handling. Data on Twitter is for the most part publicly viewable and accessible; Facebook data, on the other hand, is subject to numerous privacy and other policy restrictions.
A recent study by the Pew Research Center classified six types of communities observed in social media. Of the six, two are relevant to the topic of source selection: (i) Tight Crowds, representing highly interconnected people discussing focused topics (including brands) in a conversational manner, and (ii) Brand Clusters, large disconnected groups of people all independently describing their experiences and opinions.
The first group is reflected in sources such as review sites, specialised discussion forums, blogs, private Facebook pages, etc. It’s important to consider these sources for the quality of comments, as well as potential lead generation. The second group is reflected in Twitter users: the volume of data here is useful in aggregate analytics such as share of media, sentiment, demographics etc. In other words, different sources contribute to different analytics.
All samples are not alike
Depending on which content sources are selected, the next issue relates to sampling methodology. This applies to high-velocity, high-volume content sources, such as Twitter, where processing the entire feed is prohibitively expensive, and sampling is necessary.
Marketers sometimes ask whether analytics are based on processing the entire Twitter firehose. It is not necessary to consume the entire firehose; a statistical sample of 10% of the firehose, known as the Decahose (about 50 million posts per day), is sufficient to generate analytics reliably. This broader pipeline permits discovery of socially trending phrases, emerging memes etc.
Other analytics vendors rely on data feeds generated through keyword searches. For example, one could “pull” only those posts associated with a particular hashtag or keyword. While this generates far less data, it does not permit discovery of trends. When computing analytics based on demographics, sampling rates again pose an issue. As an example, only 1% of Twitter data is location stamped; for location-based analytics, it is necessary to ensure that sufficient samples have been obtained for the period of analysis.
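A back-of-the-envelope check shows why the location issue matters. The sketch below takes the 1% location-stamping rate from the text; everything else is an assumption for illustration, including the hypothetical brand that attracts 5,000 mentions per day in the sample and the target of 1,067 posts, which corresponds to roughly a ±3% margin of error on a proportion at 95% confidence under simple random sampling.

```python
import math


def geotagged_per_day(mentions_per_day, geotag_rate=0.01):
    """Expected number of location-stamped brand mentions per day."""
    return mentions_per_day * geotag_rate


def days_needed(target_samples, samples_per_day):
    """Whole days of collection needed to reach the target sample size."""
    return math.ceil(target_samples / samples_per_day)


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated from n samples."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical brand: 5,000 mentions per day in the sampled feed.
per_day = geotagged_per_day(5_000)
collection_days = days_needed(1_067, per_day)
error_after_one_day = margin_of_error(round(per_day))

print(per_day, collection_days, round(error_after_one_day, 3))
```

Under these assumptions a single day yields only about 50 usable posts, far too few for a regional breakdown, and around three weeks of data are needed before a location-based percentage becomes reasonably stable. Location-stamped posts are unlikely to be a truly random sample of all users, so the real uncertainty is probably larger.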
I have presented several criteria for evaluating the reliability of social media analytics in this article. This is not meant to be a checklist for businesses evaluating different vendors; rather, it is meant to raise awareness and to call for more transparency in the methodology used to generate analytics. The importance of these issues will vary with the size of your business and its capacity to tailor marketing and advertising strategies based on such information. Larger enterprises that rely on daily or weekly analytics reports for business and marketing intelligence should pay closer attention to reliability issues.
A recent blog post in the WSJ titled “Analytics and Big Data; the new Kale?” questioned whether analytics was just a passing fad that would soon be abandoned. Its conclusion was that, like kale, analytics has nutritional value, but only if treated as a hard science rather than as a fad. To that end, there is a need for well-documented and justifiable methodology to build confidence among the customers who consume this data. The real value of social media analytics may come when it is integrated with traditional, transactional business data, including sales figures.
Rohini Srihari is chief scientific officer at SmartFocus.