Domain of One’s Own: A Corpus Study, Part 1 – Words and Voices

This is the first of four posts taking a deep data dive into a collection of over 300 articles written about Domain of One’s Own. In this post, we’ll explore who has been writing about DoOO and what kind of vocabulary is used to describe the program. Part 2 explores the most common phrases used by those authors, and how the use of some terms changes over time. Part 3 briefly explores a sentiment analysis of words and authors in the corpus. Part 4 takes a look at emergent topics in the DoOO corpus. All data, visualizations, and code for this project can be found in its repository on GitHub.

What do people talk about when they talk about Domain of One’s Own? Which voices are most prominent? How has the conversation shifted over time? What’s missing from the conversation?

Over the past couple months, my colleagues and I have been exploring the history of Domain of One’s Own, here at UMW and elsewhere. As part of that effort, Lee Skallerup Bessette curated a list of over 300 articles and blog posts about Domain of One’s Own. This list includes the blogs of current and former UMW faculty and staff, faculty and staff at institutions with similar programs, and even a few articles from the mainstream media. So in my emerging role as the office data wrangler, I wrote a few scripts that take Lee’s list of articles, scrape the web for the content of those articles, clean and transform that data, and then analyze it, looking for emergent patterns that we might not have otherwise found. This is the first in a series of four posts in which I’ll walk through some of the more important and interesting findings from that corpus analysis.

All the code behind the visualizations and stats in this post can be found in the project GitHub repository. Web scraping is done in Python. Statistical analysis and visualization are done in R using tools from the tidyverse.
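To give a feel for the "clean and transform" step, here is a minimal Python sketch of turning a scraped page into a list of words. It is not the project's actual script (that lives in the repository). The names `TextExtractor`, `html_to_text`, and `tokenize` are made up for illustration, and it assumes the HTML has already been downloaded into a string.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags from an HTML string and collapse runs of whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def tokenize(text):
    """Lowercase the text and split it into words, keeping simple contractions."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())
```

Real pages need more care than this (navigation menus, comments, and curly apostrophes, for a start), so treat it as a starting point rather than a finished cleaner.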

Who’s talking about Domain of One’s Own?

We make no claims that our list of articles is complete, but Lee spent a lot of time diving deep down rabbit holes to assemble what seems to me to be a fairly comprehensive and representative list of pieces about DoOO. So let’s see what’s included.

Here are the most prolific authors, by word count, in the corpus. All authors shown here have contributed at least 500 words to the corpus.

DoOO stats visualization: number of words per author

Not surprisingly, Jim Groom has written a lot about Domain of One’s Own! Let’s take him out of the mix temporarily, so we can get a better look at the rest of the list.

DoOO stats visualization: words per author, no Jim Groom

Now we can see that Tim Owens (another former DTLTer and Jim’s co-founder at ed-tech startup, Reclaim Hosting) and Audrey Watters (one of the most important critical voices in ed-tech and a guest at UMW later this spring, where she’ll speak and lead a design sprint on activist ed-tech) have said a lot about DoOO. Martha Burtis (director of the Digital Knowledge Center at UMW, former director of DTLT, and co-conspirator with Jim and Tim in the early days of DoOO and UMW Blogs) and Michael Feldstein (educational technologist and co-publisher of e-Literate) occupy the next tier. After that, a number of familiar voices follow in a “long tail” of diminishing word counts. For the most part, the authors outside of the top five have written a single piece or two short pieces, while the top five have each written multiple pieces and/or delivered a keynote address on DoOO, hence the tiered drop-offs.

It’s not particularly surprising to me to see three key figures from DTLT’s past in the top four, with the two who left DTLT to pursue their work at Reclaim Hosting full time occupying the top two spots. However, Jim’s dominance of the corpus (over 70% of the total words!) will require some care in the following analyses, so we don’t accidentally attribute trends in Jim’s writing to the community as a whole.
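For anyone who wants to run this kind of tally on their own list of articles, here is a rough Python sketch of the per-author word counts and the "share of the corpus" figure. The analysis in this post was done in R, so this is an illustration rather than the code behind the charts. The function names and the `(author, text)` input format are assumptions, and words are split naively on whitespace.

```python
from collections import Counter


def words_per_author(articles):
    """articles: iterable of (author, text) pairs -> Counter of words per author."""
    totals = Counter()
    for author, text in articles:
        totals[author] += len(text.split())  # naive whitespace word count
    return totals


def share_of_corpus(totals, author):
    """Fraction of all words in the corpus written by one author."""
    total = sum(totals.values())
    return totals[author] / total if total else 0.0


def prolific(totals, minimum=500, exclude=()):
    """Authors with at least `minimum` words, largest first, minus any excluded."""
    return [
        (author, count)
        for author, count in totals.most_common()
        if count >= minimum and author not in exclude
    ]
```

Passing `exclude=("Jim Groom",)` to `prolific` gives the "take him out of the mix" view, and `share_of_corpus(totals, "Jim Groom")` gives the kind of percentage quoted above.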

What are people saying about Domain of One’s Own?

Let’s take a look at the vocabulary people are using as they discuss DoOO. Here are the most common words in the corpus (stop words removed).

DoOO stats visualization: most common words

It’s not surprising that words like domain, web, students, faculty, learning, umw, digital and the like round out the top. It’s also not surprising that WordPress shows up, as it’s the most common app installed in DoOO hosting accounts. (It’s also the only app used as part of UMW Blogs, so it was the easiest way for our students and faculty to transition from UMW Blogs to UMW Domains.) It’s also probably not surprising, though worth noting, that reclaim is a common word in this corpus, used both as a proper name (Reclaim Hosting) and as a verb (“reclaim your domain!”). Let’s flag this word for when we explore changes over time…

This list is just a raw count, however, which means that roughly 70% of the words behind it come from Jim Groom’s blog posts. So does this just represent Jim’s way of describing DoOO, or is it a more general trend?

Here are the top words in Jim’s corpus.

DoOO stats visualization: Jim Groom's most common words

And here are the top words in the rest of the corpus.

DoOO stats visualization: Most common words in non-Jim_groom corpus

There are a few subtle differences. For example, Jim seems to talk about UMW more than others. And while Jim talks more about back-end technical topics (domain, web, hosting, etc.) with domain being the most common term, the rest of the corpus seems to emphasize the bureaucratic (university) and the pedagogical (students, education, learning), with students being the most common term. That’s not at all to say that Jim has nothing to say about pedagogy (see the prominence of DS106 in his corpus, the course he co-taught with Martha Burtis and others in which they first experimented with giving students domains at UMW). But he does seem to write about the technical aspects of building DoOO more than others, understandable as he was instrumental in building it at UMW, and doubly understandable as he, Tim, and the Reclaim Hosting team manage much of the back-end for the universities they work with. Also, keep in mind that this is a single-word analysis, not a topic model (that’s coming in Part 4).

I also find it interesting that the word reclaim shows up more prominently in Jim’s corpus, while the word create (the name of the DoOO-like program at an increasing number of schools, like the University of Oklahoma and Middlebury College) is more prominent in the rest of the corpus. But since I’m suspicious that this dynamic may have changed over time, let’s flag that for further investigation, too…
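The three rankings above (whole corpus, Jim only, everyone else) are the same stop-word-filtered count run on different subsets. Here is a small Python sketch of that idea. The charts themselves were made in R, so this is only an illustration. `STOP_WORDS` here is a tiny toy list, not the stop-word list actually used, and `top_words` is a hypothetical helper name.

```python
import re
from collections import Counter

# Toy stop-word list for illustration only; a real analysis would use a much
# longer, standard list.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it",
    "that", "for", "on", "with", "as", "their",
}


def top_words(texts, n=10, stop_words=STOP_WORDS):
    """Return the n most common words across texts, ignoring stop words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts.most_common(n)


# Splitting the corpus is just a matter of filtering before counting.
posts = [
    ("Jim Groom", "The domain and the web hosting of a domain"),
    ("Someone Else", "Students and the learning of students"),
]
jim_only = top_words(text for author, text in posts if author == "Jim Groom")
everyone_else = top_words(text for author, text in posts if author != "Jim Groom")
```

Counting the two subsets separately is what keeps one very prolific author from drowning out everyone else in the ranking.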

These kinds of comparative studies can get tedious when looking at two sets of word rankings. So let’s use another visualization tool that makes the differences much more stark. The following graphic maps word frequency in two dimensions, with the Y axis representing the probability that a given word will occur in Jim’s posts and the X axis representing the probability that that word will occur in the rest of the corpus. The diagonal line represents equal probability. Words close to the line occur with similar frequency in Jim’s writing and the rest of the corpus. Words towards the upper left are more uniquely characteristic of Jim’s writing, while words towards the bottom right are more uniquely characteristic of the rest of the corpus. (Note this is a very high-resolution image, so you’ll want to click on it to open it full-size.)

DoOO stats visualization: comparing the word frequency in writings by Jim Groom and the rest of the corpus

From this graph, we can see that Jim is a pretty positive guy! Words like awesome, amazing, brilliant, and cool sit far to his side of the line. Others seem more concerned with university bureaucracy than Jim, with words like standards, bottlenecks, program, administrative, academia, and adoption sitting far to the non-Jim side of the line. Jim also talks more about DS106 than others (which makes sense) as well as (now former) UMW employees who have played a significant role in DS106 (Alan (Levine)) and DoOO (Tim (Owens)). There’s also more about pedagogy and education on the non-Jim side, and more about distributed content that is aggregating and syndicating (big ideas in open-source and “IndieWeb” publishing) on Jim’s side. (Note that which words pop up on the graph is somewhat random. In previous iterations of this graphic, words like Martha (Burtis), (Jeff) McClurken, punk, and indie showed up on Jim’s side, further emphasizing both Jim’s aesthetic and the role that other UMW staffers like Martha and Jeff have played in the history of DoOO.)
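The coordinates in a plot like this are just each word's share of the total words in each group. Here is a minimal Python sketch of that calculation, under a few assumptions: the graphic itself was built in R, the function names are invented for illustration, and this version simply drops words that appear in only one group (the post doesn't say how the original handles them).

```python
import math
from collections import Counter


def proportions(words):
    """Map each word to its share of all words in the list."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def compare(group_y, group_x):
    """Return {word: (x, y, log2(y / x))} for words found in both groups.

    x and y are the word's relative frequency in group_x and group_y.
    A log ratio of 0 puts the word on the diagonal; a positive value puts it
    on the group_y side (upper left), a negative value on the group_x side.
    """
    py, px = proportions(group_y), proportions(group_x)
    return {
        word: (px[word], py[word], math.log2(py[word] / px[word]))
        for word in py.keys() & px.keys()
    }


# Toy word lists, not real corpus data.
jim = ["awesome", "awesome", "domain", "web"]
rest = ["awesome", "domain", "students", "students"]
points = compare(jim, rest)
```

In this toy example, awesome is twice as frequent on the Jim side (log ratio 1.0) while domain sits exactly on the line (log ratio 0.0). Using proportions rather than raw counts is what makes the two groups comparable despite their very different sizes.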

Another word worth following up on is Woolf, which sits pretty far to the right of the line. The name, Domain of One’s Own, is a reference to Virginia Woolf’s A Room of One’s Own, and that connection with feminist history is important to a program like DoOO. Not only is DoOO meant to empower students in new ways, especially as writers (or media creators more generally), but DoOO was founded at Mary Washington, historically a women’s college, where more than two-thirds of the students are women.

So I looked up who uses the word Woolf in this corpus. There are five authors: Debra Schleef (12 times in her piece “Who’s Afraid of Domain of One’s Own”), Audrey Watters (6 times over several pieces), Martha Burtis (5 times in “Coding, Serendipity, and Domain of One’s Own”), Jim Groom (3 times over two pieces), and Jon Udell (1 time in his WIRED piece, “A Domain of One’s Own”).
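A lookup like this is easy to script. Here is a small Python sketch, with the caveat that it is an illustration and not the code used for the numbers above. `uses_by_author` is a hypothetical helper, and the sample articles are toy text rather than anything from the corpus.

```python
import re
from collections import Counter


def uses_by_author(articles, term):
    """Count how often each author uses `term` (case-insensitive).

    articles: iterable of (author, text) pairs. Because the text is split on
    anything that isn't a letter or digit, a possessive like "Woolf's" also
    counts as a use of "Woolf".
    """
    term = term.lower()
    counts = Counter()
    for author, text in articles:
        hits = sum(1 for word in re.findall(r"[a-z0-9]+", text.lower()) if word == term)
        if hits:
            counts[author] += hits
    return counts


# Toy data for illustration only.
sample = [
    ("Author A", "Woolf wrote about rooms. Woolf's room was her own."),
    ("Author B", "No mention here."),
    ("Author A", "More on woolf in a second piece."),
]
woolf_counts = uses_by_author(sample, "Woolf")
```

Calling `woolf_counts.most_common()` then lists the authors from most to fewest uses, which is the shape of the list above.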

So while it looks on this visualization like Jim uses it less than the rest of the corpus, that’s just because it’s an infrequent word. In fact, only three people use the term more than he does, and 20 out of 27 uses of Woolf come from UMW folks, which seems fitting given the history of DoOO here. That said, there is a notable duality to the way of One’s Own is used: it both connects to feminist ideology and expresses an “indie” feel. But that duality should be unpacked at length and with great nuance, not as an aside in this post.

DoOO stats visualization: Comparing word frequency in writings by UMW authors and non-UMW authors

When we compare UMW authors to non-UMW authors, the results only change slightly (the UMW authors category is still dominated in raw word count by Jim Groom). But a couple interesting changes emerge. Largely due to the inclusion of Martha Burtis along with Jim Groom in the UMW corpus, the term DS106 becomes the most uniquely UMW term in the corpus. DTLT also rises in prominence, and the importance of syndication among UMW authors (a common faculty request at UMW when assigning domain work to their students) is marked as well. Institutional bureaucracy still stays prominent on the other side of the line. (Perhaps why I love working here so much?!)

This is just the beginning of what data analysis of this corpus can tell us about Domain of One’s Own and how it’s being used in various institutions. In the next post in this series, I’ll walk through two-word phrases, institutions mentioned, and how the usage of some terms changes over time. Then in Part 3, I’ll unpack some sentiment analysis of text in this corpus, followed by a topic model of this corpus in Part 4. So stay tuned for more! And if you just can’t wait, take a look at the GitHub repository, where you can download data and code and play around with it yourself…

Update: If you’ve written something about Domain of One’s Own and would like to add it to our corpus, please email me at kshaffer@umw.edu. We’ll publish a follow-up study with the expanded corpus.

Featured image by Marcin Ignac (CC BY-NC-ND).