What’s a Sysadmin to Do? — Avoiding Digital Detritus on a Blogging Platform Older than All of My Kids

We have a lot of blogs.

I don’t just mean those of us in DTLT — we do have a lot of blogs. I mean the University of Mary Washington. UMW currently has over 2500 active domains in our Domain of One’s Own program (and almost 3500 domains all-time), and that’s to say nothing about how many blogs and websites are on those domains and their subdomains.

But before there was Domain of One’s Own there was (and still is) UMW Blogs. After three years of DTLT staff and a few UMW faculty experimenting with blogs in and out of class, UMW Blogs launched in 2007 — a WordPress installation that allowed any student, faculty, or staff member to get their own subdomain (like mygreatblog.umwblogs.org) and WordPress site, administered by DTLT. Since then, the 600 blogs of 2007 grew to over 11,000 blogs and 13,000 users in 2018!

Maintaining these platforms can be quite a challenge. Domain of One’s Own is more individualized, stable, and flexible, and we get great support from our hosting company, Reclaim Hosting. But maintaining our legacy system, UMW Blogs, at the same time seems to alternate between idyllic bliss and mass panic. It’s not very heavily used, but when something goes wrong, it goes really wrong, bringing down every site on the system. And with a number of sites that haven’t been updated since the twenty-aughts, there are many outdated themes and plugins that are poised to cause such problems.

Of course, we can’t just pull the plug. Some of our faculty and students are still using UMW Blogs, and many of the sites no longer being maintained are important to our institution and its history — whether it’s an innovative (for its time) course website, an awesome student collaboration, or an important piece of institutional history. Even while we encourage students and faculty now to use Domain of One’s Own, we want to ensure the UMW Blogs system works and those important pieces of our institutional history don’t become digital flotsam.

With that in mind, we’ve embarked on a major project over the past few months to ensure the stability of our legacy system and the long-term preservation of UMW’s digital history. On my last day as a member of DTLT 🙁 I’m going to chronicle some of those efforts, both for the benefit of the UMW community and for those at other institutions who find themselves (or soon will) in a similar situation.

Digging deep

UMW Blogs has some really interesting stuff on it! A group of students (some of whom are now UMW employees) catalogued the historical markers throughout the Fredericksburg/Spotsylvania area, mostly from the Civil War, providing important historical context. A student wrote love letters to his girlfriend at another university regularly for several months, leaving her coded messages and invitations to dinner dates (“don’t forget the coupon!”). (I’m not linking to that one because, while it is public, the couple may not remember that it is still public… more on that issue later.) Two colleges on campus hosted their Faculty Senate sites there (no longer public). Student government leaders (and campaigns) hosted sites on UMW Blogs. And there are historical sites from many student clubs, activists, and research groups. And who can forget Ermahgerd Sperts! Or my nomination for best username: umwblogs.umwblogs.org.

That said, we also have about 700 sites that were last updated on the same date they were created. (“Hello, World!”… and nothing since :/ ) We also have a number of sites that have “broken” since they were last maintained — mostly using themes and plugins that have not been updated to retain compatibility with upgrades to the WordPress core platform. And sites that, while valuable to some at the time, have been neither updated nor visited in a loooooong time.

In fact, we identified over 5000 blogs on the platform that were last updated in 2015 or earlier, are not administered by any current UMW community members, and have either not been visited at all in the last two years or have been visited fewer than 100 times in the time period for which we have analytics. That’s essentially half the platform that is inactive and no longer providing benefit to users, but which is also open to vulnerabilities or “bit rot” that can cause problems for the active sites.

Here’s the thing, though. Some of the inactive sites we identified are also important pieces of institutional history.

So I did my data-science thing and identified a list of blogs that meet all of the following criteria:

  • The blog has not been updated since before Jan 1, 2016.
  • None of the blog administrators are current members of the UMW community.
  • The site has either not been visited at all in the last two years, or has logged fewer than 100 visits all-time.
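
For anyone wanting to run a similar audit, the three criteria above can be sketched in a few lines of Python. This is only an illustration, not the script we actually ran: the `Blog` record, its field names, the sample data, and the idea of matching administrators by email address are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set

CUTOFF = date(2016, 1, 1)   # "not updated since before Jan 1, 2016"
VISIT_THRESHOLD = 100       # "fewer than 100 visits all-time"


@dataclass
class Blog:
    # Hypothetical record; these field names are assumptions for this sketch.
    domain: str
    last_updated: date
    admin_emails: List[str]
    visits_last_two_years: int
    visits_all_time: int


def is_removal_candidate(blog: Blog, current_members: Set[str]) -> bool:
    """Return True only if the blog meets all three criteria."""
    stale = blog.last_updated < CUTOFF
    # No administrator is a current community member (matched by email here).
    no_current_admin = not any(
        email.lower() in current_members for email in blog.admin_emails
    )
    # Either unvisited in the last two years, or under the all-time threshold.
    rarely_visited = (
        blog.visits_last_two_years == 0
        or blog.visits_all_time < VISIT_THRESHOLD
    )
    return stale and no_current_admin and rarely_visited


# Made-up sample data, for illustration only.
current_members = {"prof@example.edu"}
blogs = [
    Blog("old-class", date(2010, 5, 1), ["alum@example.com"], 0, 450),
    Blog("active-course", date(2018, 2, 1), ["prof@example.edu"], 900, 5000),
    Blog("old-but-owned", date(2012, 9, 1), ["prof@example.edu"], 0, 20),
    Blog("old-but-popular", date(2011, 3, 1), ["alum@example.com"], 300, 8000),
]

candidates = [b.domain for b in blogs if is_removal_candidate(b, current_members)]
print(candidates)
```

A list like this is only a starting point. As described next, the candidates still need a human review before anything is archived or removed.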

And then Lee, our two student aides Bethany and Stefanie, and I went through the entire list to identify sites important to our institutional history, as well as course websites that are less than five years old. (Some courses are offered every three or four years, and having relatively recent course websites live can be useful for faculty and students, even if they haven’t been visited in the last couple of years.) These are sites that we either think should be kept on the platform or, more likely, that we think would be good candidates for UMW Libraries’ new Digital Archive. The latter will create a flat-file archive (no databases or dynamic content) that will be far more future-proof and less likely to just break one day. (You’ll likely be hearing more about that in the future!!)

Now, we didn’t visit all 5000+ blogs manually! Rather, we looked carefully at the metadata: site titles, the person(s) attached to the sites as administrators, the administrators’ email addresses, and the dates the sites were created and last updated. This gave us a really good idea of whether the site was created by a student or faculty member, and whether it was a course website, collaborative student project, personal blog, etc. We identified almost 300 sites from this collection that we did check manually, often consulting with each other about them, before deciding on the 62 of these 5000+ sites that were important to keep public or submit to the Digital Archive.

What to do with all these blogs?!

In the end, we determined that of the 11,333 blogs on the UMW Blogs platform, 6012 of them were important to keep active (including about 50 which would best serve the UMW Community by being frozen and publicly archived before “bit rot” and broken plugins bring them down). The other 5321 blogs, many of which were important in their time, are ready to be removed from the platform. Now, we’re not talking about just deleting them! We are working with Reclaim Hosting to create a flat-file archive and a WordPress XML export of each of those blogs, which DTLT will retain for probably 2 years before permanently deleting them. We are also preparing to email the administrators of those sites to let them know our plans so they can download their content before we remove anything from the platform (or, worst-case scenario, ask us to email them the backup archive after we purge the platform). But ultimately, it is important for the health of the platform to streamline the database and focus on supporting the more recent and active sites.

There’s another issue that has been pressing on our minds, mine especially. UMW Blogs does not have a Terms of Service like Domain of One’s Own does. And while past members of DTLT (none of whom are administering the platform anymore) told users that their UMW Blogs sites would be hosted basically forever, that presents a major data ownership and privacy issue. The internet is a different place than it was in 2007. According to Paul Mason, the entire internet in 2007 was smaller than Facebook is today! And that’s to say nothing of the changing ways in which we view our personal data, even our public creative work, since GamerGate, Ferguson, and Cambridge Analytica. And as the birthplace of Domain of One’s Own, UMW (and DTLT in particular) has focused more and more on the ownership aspect of writing and working on the web: empowering students to make critical decisions about what they put on the web, what they don’t put on the web, and what they delete from the web.

We’ve received a number of requests over the past couple of years from alumni for us to remove their blog from UMW Blogs, or a specific post they created on a faculty course site, or even specific comments they left on a classmate’s blog as part of an assignment. We are well aware of the vulnerabilities that working in public can create, as well as the ways in which we as people change and grow, leaving behind aspects of the (digital) identity that we once shared with the world.

And so, beyond the need to streamline the platform, we think it’s important for the sake of our former students and faculty that we take the initiative to remove old content from our public platform, and pass it along to them so they can decide what of it should be public and where it should be hosted.

So over the next few weeks, after everything is archived locally and before anything is deleted from the platform, DTLT will be reaching out to those former students, faculty, and staff, letting them know our plans, and providing them the opportunity (and documentation) to export their data and preserve it publicly or privately, in a place of their choosing.

This not only helps those currently on the platform have a better experience, but it helps our former community members once again reflect critically on their public digital identity and take a bit more ownership over their data and what’s done with it.

I hope that helps explain a bit of the what and the why of what we’re doing here. As a proponent of “digital minimalism,” I often tell my students and colleagues that what we delete is as important a part of curating our digital identity as what we publish. And our freedom to delete increases our freedom to experiment. As the attention economy and algorithmically driven content discovery have radically changed the internet since the early days of UMW Blogs, it’s worth rethinking both what we as an institution hold onto, and what we as individuals decide to keep in public venues.

This process has been more than just streamlining a platform to make it run more efficiently. It’s been a time for us to reflect on these things as a unit. And hopefully it will encourage others to do the same.