Sentiment Analysis of QAnon Drops


In the previous two posts we imported a corpus of QAnon drops, split it and extracted a vector-space embedding of the individual words.

What Happens Now

In this part we will take our newly vectorized data and pass it into a k-means clustering model. Our hope here is to split the words into two clusters, one of positive and one of negative sentiment. We will then use each word's “sentiment score” to build the overall sentiment of each drop.

K-means Implementation

from sklearn.cluster import KMeans

# corpus_vec: the word vectors from the previous post
trained_k_means = KMeans(n_clusters=2, max_iter=5, random_state=True, n_init=40).fit(corpus_vec.astype('double'))

K-means Interpretation

Let’s take a look at the plot.

# Print the 100 words most similar to each of the two cluster centers
print(trained_vec.wv.similar_by_vector(trained_k_means.cluster_centers_[0], topn=100, restrict_vocab=None))
print(trained_vec.wv.similar_by_vector(trained_k_means.cluster_centers_[1], topn=100, restrict_vocab=None))
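To make the lookup above less of a black box: gensim's `similar_by_vector` ranks vocabulary words by cosine similarity to the vector you pass in. Below is a minimal pure-Python sketch of that idea. The helper names and the 2-D toy vectors are made up for illustration; it is not gensim's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest_words(center, vectors, topn=2):
    """Return the topn words whose vectors point most nearly the same way as center."""
    ranked = sorted(vectors, key=lambda w: cosine(center, vectors[w]), reverse=True)
    return ranked[:topn]

# Toy 2-D "embeddings" (hypothetical values, not from the QAnon corpus)
vectors = {"good": (1.0, 0.1), "great": (0.9, 0.2), "bad": (-1.0, 0.0)}
top = nearest_words((1.0, 0.0), vectors)
print(top)
```

Reading the top words for each center is how you judge which cluster looks "positive" and which looks "negative", since k-means itself assigns the cluster numbers arbitrarily.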
import numpy as np
import pandas as pd

# Weight words in clusters based on distance to center and populate pandas dataframe
sentiment_corpus = pd.DataFrame(trained_vec.wv.index_to_key)
sentiment_corpus.columns = ['key']
sentiment_corpus['vectors'] = sentiment_corpus['key'].apply(lambda x: trained_vec.wv[f'{x}'])
sentiment_corpus['cluster'] = sentiment_corpus['vectors'].apply(lambda x: trained_k_means.predict([np.array(x)]))
sentiment_corpus.cluster = sentiment_corpus['cluster'].apply(lambda x: x[0])
# Cluster 0 is treated as negative (-1), cluster 1 as positive (+1)
sentiment_corpus['cluster_signed'] = [-1 if i==0 else 1 for i in sentiment_corpus['cluster']]
# Let's determine HOW positive or negative each word is...
# (inverse of the distance to the nearest cluster center, times the cluster sign)
sentiment_corpus['min_value'] = sentiment_corpus.apply(lambda x: 1/(trained_k_means.transform([x.vectors]).min()), axis=1)
sentiment_corpus['distance'] = sentiment_corpus['min_value'] * sentiment_corpus['cluster_signed']
positive_word_count = len(sentiment_corpus[sentiment_corpus['cluster_signed'] == 1])
negative_word_count = len(sentiment_corpus[sentiment_corpus['cluster_signed'] == -1])
print(f"Number of positive words: {positive_word_count}")
>> Number of positive words: 1627
print(f"Number of negative words: {negative_word_count}")
>> Number of negative words: 1467
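The weighting used for each word boils down to: find the nearest cluster center, take one over that distance, and attach the cluster's sign. Here is a small stand-alone sketch of that arithmetic in plain Python. The function name, the toy centers, and the sign mapping are assumptions for illustration only.

```python
import math

def signed_weight(vector, centers, signs=(-1, 1)):
    """Return sign * (1 / distance) for the cluster center nearest to vector.

    Note: a vector sitting exactly on a center would divide by zero.
    """
    distances = [math.dist(vector, c) for c in centers]
    nearest = min(range(len(centers)), key=distances.__getitem__)
    return signs[nearest] * (1.0 / distances[nearest])

# Two made-up 2-D cluster centers: index 0 is "negative", index 1 is "positive"
centers = [(0.0, 0.0), (4.0, 0.0)]
close_to_negative = signed_weight((0.5, 0.0), centers)  # 0.5 away from center 0
close_to_positive = signed_weight((4.0, 2.0), centers)  # 2.0 away from center 1
print(close_to_negative, close_to_positive)
```

Words that sit close to a center get a large magnitude, while words far from both centers contribute very little, which is the intent of the inverse-distance score.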
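# Look up the signed score of every word in each drop
score_df['split'] = corpus_df['text'].map(lambda x: [sentiment_corpus[sentiment_corpus['key'] == j]['distance'].values for j in x])
# Drop words that are not in the vocabulary (empty lookups)
score_df['split'] = score_df['split'].map(lambda x: [j for j in x if j.size > 0])
# Unwrap each one-element array into a plain number
score_df['split'] = score_df['split'].map(lambda x: [j[0] for j in x])
# The drop's score is the sum of its word scores
score_df['line_score'] = score_df['split'].map(lambda x: sum(x))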
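The per-drop score is just the sum of the scores of the words we know about. As a rough sketch of the same logic without dataframes, here is a dictionary-based version. The function name and the word scores are invented for the example.

```python
def score_drop(tokens, word_scores):
    """Sum the scores of known tokens; unknown tokens contribute nothing."""
    return sum(word_scores[t] for t in tokens if t in word_scores)

# Made-up word scores (sign = cluster, magnitude = inverse distance)
word_scores = {"great": 2.0, "win": 0.5, "attack": -1.5}
drop = ["great", "win", "unknownword", "attack"]
total = score_drop(drop, word_scores)
print(total)
```

If the dataframe filtering turns out to be slow on a large corpus, one option is to build such a dictionary once with `dict(zip(sentiment_corpus['key'], sentiment_corpus['distance']))` and look words up in it, since a dictionary lookup avoids scanning the whole dataframe for every word.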

QAnon Drop Insight

Here’s the part where everything comes together. We have a dataframe populated with the sentiment values for each QAnon drop. Let’s print some figures and plot…

# Total
positive_drops_count = len(score_df[score_df['line_score'] > 0])
negative_drops_count = len(score_df[score_df['line_score'] < 0])
print(f"Number of positive drops: {positive_drops_count}")
>> Number of positive drops: 1283
print(f"Number of negative drops: {negative_drops_count}")
>> Number of negative drops: 1777
(Positive Sentiment)
(Negative Sentiment)

Next For You

Go crazy: tune your hyperparameters and validate your results against data with known sentiment labels! Dive into some metrics. Or use your own corpus of something completely unrelated!
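If you do have a handful of hand-labeled examples, one simple metric to start with is how often the sign of the score agrees with the label. A minimal sketch follows; the function name, the scores, and the labels are all hypothetical.

```python
def sign_accuracy(scores, labels):
    """Fraction of items whose score sign matches a +1/-1 label.

    Items with a score of exactly 0 are skipped, since they carry no sign.
    """
    pairs = [(s, l) for s, l in zip(scores, labels) if s != 0]
    if not pairs:
        return 0.0
    hits = sum(1 for s, l in pairs if (s > 0) == (l > 0))
    return hits / len(pairs)

# Made-up drop scores and hand-assigned labels (+1 positive, -1 negative)
scores = [1.2, -0.4, 0.3, -2.0]
labels = [1, -1, -1, -1]
accuracy = sign_accuracy(scores, labels)
print(accuracy)
```

A result near 0.5 would suggest the clusters are not capturing sentiment at all, and a result well below 0.5 would hint that the positive and negative clusters are swapped.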

Next For Me

I’d like to run the model against an uncleaned (or less cleaned) dataset to attempt to capture more of the “QAnon-ness”.

Thank You!


