Sentiment Analysis of QAnon Drops

Recap

To catch up: we are discovering sentiment in QAnon drops using an unsupervised (k-means clustering) approach. In my previous post, we imported our data into a pandas DataFrame and did some initial data cleaning.

What happens now

In this article we take the logical next step: translating the QAnon drops into a numerical format that the clustering model (discussed in the next post) can work with. To do this, we are going to use word2vec, a word-embedding model available in Python.

A little on Word2vec

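Most machine learning models learn poorly, or not at all, from raw text. Because they rely on numerical calculations, the text first has to be converted into numbers. Word2vec does this by learning a vector for each word from the contexts it appears in, so that words used in similar contexts end up with similar vectors.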
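To make "numerical data" concrete, here is a minimal sketch of the simplest possible encoding: giving each word an integer id and a one-hot vector. This is not what word2vec does (it learns dense vectors in which related words sit close together), and the names vocab and one_hot are made up for this illustration.

```python
# Toy illustration only: one-hot encoding, a baseline that word2vec improves on.
sentences = [
    ["this", "is", "an", "example"],
    ["this", "is", "another", "example"],
]

# Assign each distinct word an integer id, in order of first appearance.
vocab = {}
for sentence in sentences:
    for word in sentence:
        vocab.setdefault(word, len(vocab))


def one_hot(word):
    """Return a vector with a single 1 at the word's index (hypothetical helper)."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


print(vocab)           # {'this': 0, 'is': 1, 'an': 2, 'example': 3, 'another': 4}
print(one_hot("an"))   # [0, 0, 1, 0, 0]
```

Every pair of one-hot vectors is equally far apart, so this encoding carries no notion of meaning. That gap is what a learned embedding is meant to fill.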

Building / Training

We are using Gensim’s implementation of the word2vec algorithm.

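from gensim.models import Word2Vec

# Toy example of the expected input: a list of tokenized sentences.
# In the real pipeline, corpus_list would hold the tokenized drops.
corpus_list = [
    ["this", "is", "an", "example"],
    ["this", "is", "another", "example"],
]
# sg=1 selects skip-gram; min_count=3 ignores words seen fewer than 3 times,
# so the toy corpus above is too small to train on with these settings.
v_model = Word2Vec(corpus_list, min_count=3, vector_size=1000, workers=4, window=5, epochs=40, sg=1)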
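Before training, it can help to see how much of the vocabulary min_count will discard, since Gensim ignores words whose total frequency is below that threshold. The sketch below reproduces that filter with a plain counter; surviving_vocab and toy_corpus are hypothetical names used only for this illustration, and the function is a stand-in for the idea rather than Gensim's own code.

```python
from collections import Counter


def surviving_vocab(corpus, min_count):
    """Words that occur at least min_count times across the whole corpus."""
    counts = Counter(word for sentence in corpus for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}


toy_corpus = [
    ["this", "is", "an", "example"],
    ["this", "is", "another", "example"],
]

print(sorted(surviving_vocab(toy_corpus, min_count=2)))  # ['example', 'is', 'this']
print(sorted(surviving_vocab(toy_corpus, min_count=3)))  # []
```

On the two-sentence example no word appears three times, so a threshold of 3 only makes sense on a corpus the size of the full set of drops.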

Interpretation

We can check whether the trained embeddings make sense by picking a few words and calling most_similar() on the trained vectors, which are available through the model's .wv attribute. By default this returns the 10 words whose vectors have the highest cosine similarity to the query word. Here are some examples:

print(v_model.wv.most_similar('corruption'))

[('with', 0.7566554546356201),
('surrounds', 0.7193422913551331),
('pure', 0.7051680088043213),
('society', 0.6776847243309021),
('dealing', 0.6197190880775452),
('temptation', 0.6144115328788757),
('allowed', 0.6123470067977905),
('everywhere', 0.6070465445518494),
('excuse', 0.6066375970840454),
('favor', 0.5992740988731384)]

print(v_model.wv.most_similar('investigation'))

[('cops', 0.7476568818092346),
('ongoing', 0.7167799472808838),
('evidence', 0.7045204043388367),
('argument', 0.6977810263633728),
('sabotage', 0.6782168745994568),
('memos', 0.676296591758728),
('appointment', 0.664799690246582),
('introduce', 0.6562026143074036),
('page', 0.6557939052581787),
('signers', 0.6513379216194153)]
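The scores next to each word are cosine similarities between word vectors. As a rough sketch of what such a ranking involves, the code below computes cosine similarity by hand and sorts the results. The three-dimensional vectors are made up for illustration (they are not taken from the trained model), and this most_similar function is a simplified stand-in, not Gensim's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, topn=10):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up vectors, purely for illustration.
toy_vectors = {
    "investigation": [1.0, 0.0, 0.0],
    "evidence": [0.8, 0.6, 0.0],
    "favor": [0.0, 0.0, 1.0],
}

ranking = most_similar("investigation", toy_vectors)
print(ranking)  # "evidence" ranks first (about 0.8), "favor" last (0.0)
```

A score near 1 means the two vectors point in almost the same direction, and a score near 0 means they are unrelated in the embedding space.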

Next Step

Now that we have our corpus in numerical form, we can cluster the word vectors into positive and negative sentiment groups using k-means. See you in Part Three!
