Giving meaningful context to social media influence with Microsoft Cognitive Services

Rachel Weil

December 7th, 2017

Background

In understanding social media, context is key. Current social media analytics can tell us what topics are trending, but they don’t provide insight into the tenor of the conversation or who is engaged in it. We recently teamed up with Atlanta-based social marketing startup m-oracle for a two-day hackfest to develop a solution that maps out context-aware social media influence using Azure Functions and the Cognitive Services Topic Detection and Sentiment Analysis APIs.

Problem statement

Social media influence can be measured by existing tools and services. However, the factors they consider, such as number of followers or number of posts shared by others, can lack important context. An athlete might have written a widely shared post about a brand of shoes, but was the content positive or negative? m-oracle is developing BindexT, a software-as-a-service product that addresses the need for massively scaled, intelligent, and context-aware social media insights.

BindexT is meant to encompass multiple social media platforms, factors, and data points. As a first step, our team looked at solving a smaller problem: how can we develop a solution for finding, analyzing, and displaying Twitter influencer data that goes beyond a “just-the-numbers” approach?

Solution and steps

We first wanted to identify Twitter users who are influential within a given subject matter domain and assign them an influencer score based on factors such as number of followers and engagement. Specifically, we wanted to discover:

What specific topics are Twitter influencers within a given domain talking about most frequently?
For each of these topics, is that conversation mostly positive, mostly negative, or varied?
How can we bring these analytics into a useful, searchable web portal?

Today, identifying Twitter influencers, their top conversational topics, and the tenor of that conversation is largely a manual process. Automating some or all of this process through machine learning could improve both the results and the discovery time for social media marketing operations.

Our solution included the following components. Each entry notes what the component does and why that technology was selected.

Azure Machine Learning: To identify Twitter influencers in certain domains. This was selected because a preexisting Microsoft solution was available. (See below.)
Cognitive Services Topic Detection API: To identify the top three subtopics mentioned by influencers.
Cognitive Services Sentiment Analysis API: To identify the minimum, maximum, and average sentiment of tweets associated with the top three subtopics.
Azure Table Storage: To store subtopic and sentiment data for each influencer. After considering other options, Azure Table Storage was selected as a simple storage solution for proving out the technology for a quick proof-of-concept without having to define a schema.
Azure Functions: To automate the API calls and storage processes based on time triggers. We chose to write our Azure Function in Node.js given our developers’ familiarity with JavaScript.
Azure Linux VM: To serve a website that could search and display data from Azure Table Storage. We considered a PaaS solution (such as an Azure Web App) for serving the site due to the easy maintenance and deployment, but we ultimately chose to host the site on a Linux VM for the prototype, given m-oracle’s familiarity with Linux on Azure and their preexisting prototype site built with a LAMP stack.

Twitter influencer identification

Luckily for us, a team at Microsoft recently published a ready-to-deploy solution for identifying Twitter influencers using Azure ML Studio. This project gave us exactly what we needed: an up-to-date list of Twitter influencers within certain topic domains such as “basketball.”

Topic detection

Once we’ve determined which Twitter users are influential within a given domain, we want to know what subtopics they post about most frequently.

Microsoft Cognitive Services expose prebuilt machine learning algorithms as simple REST APIs. To use the Topic Detection API for this project, we compiled at least 100 tweets into a single JSON object and sent it to the REST endpoint. Because processing can take a few minutes, this call returns a second endpoint to poll rather than the full results. Once the results were ready, we sorted them by frequency and retained the top three topics for each influencer.

function topicAnalysis() {
    
    // Send tweets (documents) to Topic Detection API
    requestObj({
        url: "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/topics",
        headers: { "Content-Type": "application/json", "Ocp-Apim-Subscription-Key": textkey },
        method: "POST",
        body: JSON.stringify({ "documents": documents })
    }, function (err, res, body) {
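        // Stop here on a failed request; `res` is not usable in that case
        if (err) {
            context.log(`ERROR // Topic Detection request failed: ${err}`);
            return;
        }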
        let topicendpoint = res.headers.location;
        context.log(`WORKING // Sending tweets to Cognitive Services topic analysis API...`);

        // Check for results
        let endpointCycle = setInterval(callTopicEndpoint, 30000);
        function callTopicEndpoint() {
            requestObj({
                url: topicendpoint,
                headers: { "Ocp-Apim-Subscription-Key": textkey },
                method: "GET",
            }, function (err, res, body) {
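                // Skip this polling cycle if the request itself failed; the next cycle retries
                if (err) {
                    context.log(`ERROR // Could not reach the topic results endpoint: ${err}`);
                    return;
                }
                let result = JSON.parse(body);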

                // If results are ready, retrieve topics and scores
                if (result.status == "Succeeded") {
                    documents = [];
                    for (let i = 0; i < result.operationProcessingResult.topics.length; i++) {
                        if (result.operationProcessingResult.topics[i].score >= mincount) {
                            topics.push({
                                topic: result.operationProcessingResult.topics[i].keyPhrase,
                                score: parseInt(result.operationProcessingResult.topics[i].score)
                            });
                        }
                    }

                    // Sort topics by frequency of mentions
                    topics.sort(function (a, b) {
                        return b.score - a.score;
                    })

                    // Get the top topics and send to a function to handle sentiment analysis and table storage
                    // Guard against fewer qualifying topics than requested
                    for (let i = 0; i < Math.min(toptopic, topics.length); i++) {
                        pullTweets(twitterhandle, tweetcount, topics[i].topic, sentAnalysis)
                    }

                    clearInterval(endpointCycle);
                }

                // If results are not yet ready, log a message.
                else if (result.status == "Running") {
                    context.log(`WORKING // Crunching the numbers. This could take several minutes...`) 
                }

                // If the API returns something else, log an error message.
                else {
                    context.log(body);
                    context.log(`Something went wrong. :(`);   
                }
            });
        }
    })
}
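
The function above assumes a `documents` array has already been assembled from an influencer's tweets. As a minimal sketch of how that step might look, the hypothetical helper below (not part of the hackfest code) maps tweet objects to the `{ id, text }` shape the request body expects. It assumes each tweet object carries its content in a `text` property. The `MIN_DOCUMENTS` constant simply mirrors the "at least 100 tweets" batch size described above.

```javascript
// Hypothetical helper: build the "documents" array posted by topicAnalysis().
// Assumes each tweet object exposes its content in a `text` property.
function buildDocuments(tweets) {
    return tweets
        // Drop tweets with missing or empty text
        .filter(function (tweet) {
            return typeof tweet.text === "string" && tweet.text.trim() !== "";
        })
        // Give every remaining tweet an id that is unique within the request
        .map(function (tweet, index) {
            return { id: String(index + 1), text: tweet.text.trim() };
        });
}

// Assumed batch-size guard, mirroring the "at least 100 tweets" described above
const MIN_DOCUMENTS = 100;

function hasEnoughDocuments(documents) {
    return documents.length >= MIN_DOCUMENTS;
}
```

Checking `hasEnoughDocuments(documents)` before calling `topicAnalysis()` would avoid sending a batch that is too small to analyze.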

Sentiment analysis

Once we found the top three topics for a given Twitter influencer, we queried that user’s most recent 200 tweets on the topic and sent each corresponding tweet to the Sentiment Analysis API. We retrieved the minimum, maximum, and average sentiment associated with each topic and then stored these values in a separate table within Azure Table Storage. These API calls and the subsequent data storage were kicked off from within Microsoft’s serverless architecture platform, Azure Functions. Our Azure Function was set to run on a recurring timer.
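
The minimum, maximum, and average calculation can be sketched roughly as follows. This is an illustrative helper rather than the code written at the hackfest. It assumes the Sentiment Analysis response has been parsed into an array of objects that each carry a numeric `score`; the function name and the shape of the return value are our own choices for the example.

```javascript
// Hypothetical helper: reduce per-tweet sentiment scores for one topic
// to the minimum, maximum, and average values stored for that topic.
// Assumes each entry looks like { id: "1", score: 0.87 }.
function summarizeSentiment(sentimentDocuments) {
    const scores = sentimentDocuments
        .map(function (doc) { return doc.score; })
        // Ignore entries without a usable numeric score
        .filter(function (score) { return typeof score === "number" && !isNaN(score); });

    // Nothing to summarize if no tweet was scored
    if (scores.length === 0) {
        return null;
    }

    const total = scores.reduce(function (sum, score) { return sum + score; }, 0);

    return {
        min: Math.min.apply(null, scores),
        max: Math.max.apply(null, scores),
        average: total / scores.length,
        count: scores.length
    };
}
```

A wide gap between `min` and `max` for a topic suggests a varied conversation, while values clustered near one end suggest a mostly positive or mostly negative one.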

Web front-end

Finally, we built a web front-end that allowed users to search data stored in Azure Table Storage for certain keywords. The website would display relevant Twitter influencers with their associated subtopic and sentiment ratings.

Conclusions

Following the hackathon, we recognized that while this solution helps prove out a central cognitive technology, our data storage and serverless architecture may need to evolve to encompass additional technologies to scale.

Because the Topic Detection API does not return a full list of which documents correspond to which topics, we needed to get the topics first, then query the user’s tweets for those topics using the Twitter API. However, the Twitter API has its own limitations around how public users’ timelines can be queried. The solution we implemented was to create a table that temporarily stored a given user’s tweets while the Topic Detection API analyzed them. Once the topics were returned, we queried those stored tweets for the top topics, sent the matching tweets to Sentiment Analysis to receive per-topic ratings, and then deleted the table storing the tweets.
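
The "query those stored tweets for the top topics" step could look something like the sketch below. It is a simplified, in-memory illustration that assumes the stored tweets have already been read back from the temporary table into an array of objects with a `text` property. The function name and the case-insensitive substring match are assumptions made for the example, not necessarily how the hackfest code matched tweets to topics.

```javascript
// Hypothetical helper: pick out the stored tweets that mention a given topic.
// Assumes `storedTweets` is an array of objects with a `text` property,
// already loaded from the temporary table.
function tweetsMentioningTopic(storedTweets, topic) {
    const needle = topic.toLowerCase();
    return storedTweets.filter(function (tweet) {
        // Case-insensitive substring match on the tweet text
        return typeof tweet.text === "string" &&
            tweet.text.toLowerCase().indexOf(needle) !== -1;
    });
}
```

The matching tweets for each top topic can then be sent on for sentiment analysis before the temporary table is deleted.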

The code created during the hackfest is available on GitHub and may be useful to those interested in analyzing the content and sentiment of social media posts. It may also be of interest to those looking to see examples of using Azure Functions and Azure Table Storage in Node.js. Here is a link to that source code as well as some additional resources that may be helpful for others developing similar solutions.

Computing Influence Score for Twitter Users: An Azure ML Jupyter Notebook containing a complete solution written in Python
Microsoft Cognitive Services Text Analytics APIs: An overview of the Text Analytics services, including Topic Detection and Sentiment Analysis
Twitter Influencer Score Using Machine Learning: A video walkthrough of computing an influencer score via Azure ML

Ideas on how this prototype could be extended? Questions about the code? Feel free to leave comments below or reach out to the BindexT developers via GitHub.