Domain of One’s Own: A Corpus Study, Part 3 – Sentiment Analysis
This is the third of four posts taking a deep data dive into a collection of over 300 articles written about Domain of One’s Own. Part 1 and Part 2 explored who has been writing about DoOO, the most common words and phrases used by those authors, and how the use of some terms changes over time. Part 4 will be published Thursday, exploring emergent topics in the DoOO corpus. In this post, we briefly explore a sentiment analysis of words and authors in the corpus. All data, visualizations, and code for this project can be found in its repository on GitHub.
How do people feel about Domain of One’s Own? Well, sentiment analysis is a tricky aspect of natural language processing. It requires a lot of nuance (and coding) to fully account for context when determining sentiment algorithmically. But we can have some fun looking at rough estimates, as long as we take the results with a grain of salt.
For this sentiment analysis, I’m going to use a common sentiment lexicon developed by Bing Liu, et al., which tags a subset of English words as either positive or negative. I’ll then make article-level and author-level classifications by simply counting the relative proportions of positive and negative words. There are problems with using such a simple approach: words carry different sentiments depending on context, and the algorithm will count “not happy about” as positive because of the word “happy,” for example. But it can give us some rough and interesting results to play with and, if so desired, can prompt a closer reading of some texts in the corpus to gain insights we otherwise might have missed.
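The counting approach above is simple enough to sketch in a few lines. Here is a rough Python illustration (the project’s actual code is in the GitHub repository and may differ); the tiny word lists are illustrative stand-ins for the full Bing Liu lexicon, which tags several thousand English words:

```python
from typing import Iterable

# Illustrative stand-ins for the Bing Liu positive/negative word lists.
POSITIVE = {"awesome", "love", "brilliant", "excellent", "happy"}
NEGATIVE = {"critical", "disruptive", "impossible", "breaking", "issue"}

def net_sentiment(tokens: Iterable[str]) -> float:
    """Score a text as (positive hits - negative hits) / total words."""
    words = [t.lower() for t in tokens]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)
```

Note how this sketch reproduces the failure mode described above: `net_sentiment("not happy about this".split())` comes out positive, because “happy” is counted with no regard for the negation.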
To start with, here’s a chart of the sentiment analysis of each piece in the corpus (linked rather than embedded, due to its size).
Somewhat more interesting is this chart of the overall sentiment of the works of each author in the corpus (normalized for each author’s individual word count).
In my opinion, this doesn’t give us much interesting or useful information without looking at specific pieces, but it could prompt us to look at some things more closely. For example, only three authors have a net-negative sentiment. Is this characteristic of their writing in general? Something specific about the piece(s) in question? A problem with the method of calculating sentiment? All of the above?
Somewhat more useful (and enlightening when attempting to answer those previous questions) is an analysis of which words in the corpus tend to lead to positive and negative sentiment results. Here is a word cloud of the most common positive- and negative-tagged words in the DoOO corpus.
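The tallies behind a word cloud like this are just frequency counts restricted to lexicon-tagged words. A minimal Python sketch (again, the illustrative word lists here are hypothetical stand-ins for the full lexicon, and the project’s actual code may differ):

```python
from collections import Counter

# Illustrative stand-ins for the Bing Liu positive/negative word lists.
POSITIVE = {"awesome", "love", "reclaim"}
NEGATIVE = {"critical", "disruptive", "issue"}

def top_tagged_words(tokens, lexicon, n=10):
    """Return the n most common lexicon-tagged words with their counts."""
    hits = (t.lower() for t in tokens)
    return Counter(w for w in hits if w in lexicon).most_common(n)

tokens = "Reclaim is awesome but disruptive and disruptive again".split()
```

Feeding counts like these into any word-cloud library, sized by frequency and colored by tag, yields the kind of visualization shown here.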
A few things immediately stand out to me. First, the word reclaim jumps out as the most prominent positive-tagged word in the corpus. Not surprising, given that the two most prolific authors are the co-founders of Reclaim Hosting and that many institutions use Reclaim to power their DoOO program, but striking nonetheless. What also jumps out to me is the number of negative-tagged words that are likely positive or neutral in this corpus: cloud, issue(s), critical, disruptive, disruption, impossible (maybe?), rhetoric (!), breaking (maybe?), fall (probably the season), and punk. So maybe those net-negative articles use these terms a lot? It’s also easy to see the enthusiasm surrounding DoOO in this corpus by looking at words like awesome, love, brilliant, excellent, etc. in this sentiment word cloud.
Again, this kind of sentiment analysis doesn’t give us a lot of nuanced insight on its own, but it can give us some directions we might want to look for a more detailed analysis in the future. Stay tuned for the final installment of this corpus study on Thursday, in which I walk through a topic analysis of the DoOO corpus.
Featured image by chivosol.