Thursday, August 5, 2010

History Sifting

Good news: Google still retains archives of Usenet postings going back to the mid-1980s. This means that my early Internet presence/history is still out there.

Bad news: Google's search utilities for those posts are godawful. If I enter the same search parameters ten times, I get back ten slightly different sets of search results. Each result set differs in size and composition, with unpredictable overlap. Even more fun, a search for my author name across ALL newsgroups returns fewer results than the same author name searched within a specific list of newsgroups. So one must first do a high-level search to find as many as possible of the newsgroups one has posted to, then run more specific author/newsgroup searches in order to maximize results.

Worse news: apparently, in (what I can only assume was) an effort to cut down on SPAM address-harvesting possibilities, someone went through and munged posting addresses. For example, my primary PSU-era posting ID of "THJ100@PSUVM.PSU.EDU" has been munged to "THJ...@PSUVM.PSU.EDU". Not only does this mean I can't search for any articles posted under my old posting IDs, but Google's search tools also reject "THJ...@PSUVM.PSU.EDU" as an invalid search parameter.

All of this is compounded by the fact that I've posted under a number of userids over the years, and a similar number of author names. While I generally used my full name in my "From:" line, there was a non-trivial number of posts where I used a variety of noms de plume. Fortunately, I've a pretty good memory, so I remember most of those names, userids, etc., which let me come up with a fairly exhaustive set of search parameters. So, even though Google's search functionality sucks, I've been able to pull down 350+ pieces of my Usenet posting history. However, I've been fairly prolific throughout my entire Internet-posting life (what you see scattered across your screens on a daily basis is nothing new). So 350+ posts from more than a decade of Usenet activity far under-represents my total output.

Oh well. It just proves that, to a degree, the Internet does forget.

At any rate, what do I do with what I have found? There are no good engines for converting flat text files into a blog format, especially not one that preserves the original chronology.

