Sentiment Analysis of QAnon Drops

(Source: https://www.adl.org/qanon)

Motivation

Learning a new concept in any field can be a daunting task. A typical approach is to find an application that is relevant, or at least interesting, to you, the learner. In teaching myself unsupervised machine learning, I was intrigued by QAnon drops: anonymous internet posts that have been shaking up the socio-political world. Are there interpretations that ML can make on QAnon drops? Do these drops lean toward a specific sentiment? Let's find out!

The Project

With that motivation, I set out to build a Python-based, unsupervised approach to describing the sentiment of QAnon drops. In the following write-up we will use word2vec and K-means clustering to implement a quick-and-dirty method of extracting positive or negative sentiment from these posts.
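At a high level, the idea is to learn word embeddings from the cleaned posts and then cluster the embedding space into two groups that we interpret as positive and negative. The sketch below illustrates that idea only; the toy corpus and variable names (tokenized_posts, model, kmeans) are placeholders rather than code from this project:

from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# Toy tokenized corpus standing in for the cleaned drops
tokenized_posts = [['truth', 'memes', 'destroys', 'fake', 'news'],
                   ['court', 'order', 'preserve', 'data', 'goog']]

# Learn word embeddings from the (toy) corpus
model = Word2Vec(sentences=tokenized_posts, vector_size=100, min_count=1, workers=4)

# Cluster the embedding space into two groups, later interpreted as positive / negative
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
kmeans.fit(model.wv.vectors)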

Tech and the Code

Python 3+, pandas, NumPy, scikit-learn, Gensim, and Matplotlib

Acknowledgements

There are a few extremely helpful resources I’d like to call out before we begin:

  1. Curated QAnon Drop dataset in JSON format
  2. Great resource for an intro on K-means clustering
  3. Excellent article on unsupervised sentiment analysis

Brief Primer on QAnon Drops

Back in 2017, anonymous, politically motivated posts began popping up on the image board 4chan and later 8chan. These posts, or “drops,” detailed a clandestine organization of high-ranking individuals associated with pedophilic sex cults, among other nefarious practices. The true author remains a mystery to this day. The drops have garnered international attention and have motivated a sizable swath of the population. Whether or not any of it is true is beyond the scope of this article; however, the drops are very interesting to analyze from a data science standpoint. For more info on QAnon, check out the Wikipedia article.

Data Cleansing

QAnon drops appear, at a cursory glance, fairly random, but they do follow similar themes throughout. On the surface, they are riddled with misspellings and grammatical quirks:

“Court order to preserve ALL data sent to GOOG?\nThink GOOG+ / Gmail / etc.\nComms Cleanup?\nThe More You Know…”

“There is TRUTH in MEMES.\nTRUTH that DESTROYS the FAKE NEWS narrative. \nHOUSE OF CARDS.\nQ”

In order to glean some sort of meaning, we need to clean up the data a bit by removing this “noise”.
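The cleaning code below assumes the curated JSON dataset from the Acknowledgements has already been read into a dictionary named imported_text. One minimal way to do that (the file name here is a placeholder, not from the original dataset) is:

import json
import pandas as pd

# Load the curated drop dataset; 'qanon_posts.json' is an assumed file name
with open('qanon_posts.json', 'r') as f:
    imported_text = json.load(f)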

from gensim.parsing.preprocessing import remove_stopwords

# Build a one-column DataFrame of cleaned post text
corpus = pd.DataFrame(columns=['text'])
for i in imported_text['posts']:
    if 'text' in i:
        # Strip newlines/tabs and the 'Q' signature, lowercase, and remove stop words
        corpus.loc[len(corpus.index)] = remove_stopwords(
            str(i['text']).replace("\n", "").replace("\r", "").replace("\t", "").replace('Q', '').lower()
        )

# Drop test posts
corpus = corpus[~corpus['text'].str.contains("test")]
# Remove links
corpus['text'] = corpus['text'].str.replace(r'https\S+', '', regex=True)
corpus['text'] = corpus['text'].str.replace(r'http\S+', '', regex=True)
# Drop post replies (posts quoting others with ">>")
corpus = corpus[~corpus['text'].str.contains(">>")]
corpus.dropna(inplace=True)

# Tokenize each post into a list of words
corpus['text'] = corpus['text'].apply(lambda x: str(x).split())

Next Step

Now that we have a good grasp on the data, let's extract some word embeddings with word2vec… See you in Part Two!
