Saturday, July 22, 2006

Archiving Pebble blogs at ramble.oucs

Note: This article was originally posted in the Connect section on the Educause Web site, at: 
However, this address has since become inaccessible, so the post has been reproduced here as an archive with the same date and approximately the same time. 

RAMBLE was a small JISC-funded project that linked mobile blogs with online learning environments. To practise what we preached, we maintained a project blog with many of the entries written offline and then posted from a handheld device.

We hosted our own blog server called Pebble, a feature-rich multi-user, multi-contributor blog by Simon Brown, written as a Web application in Java and released under an open source license.  Those who have deployed it are invariably impressed (saying typically, "Pebble rocks!") and it keeps getting better; it was well suited for the project because it supported the private blogs that were needed for personal student reflections in addition to public blogs.

When colleagues in the department heard about Pebble, they also wanted a blog.  So we let them hop on board and blog away, even the Director, but we could offer no guarantees of service reliability.  This was - as so often is the case - a service run largely on good will and very little else!  A year or so later, with blog spam escalating at an alarming rate, we were obliged to call it a day, at least until some more resources come along.

But what about the blogs themselves?  The Pebble Web app underlying the RAMBLE blogs was taken offline at short notice and all the blogs vanished immediately together with comments etc.   Although a properly resourced service will not be abruptly terminated, this is a general issue to consider if you are providing hosting arrangements at your institution: if you are not going to maintain a blog server forever, what happens to a blog, say, when a student graduates? 

A first reaction might be to develop export facilities for the student to take the content with them.  Aside from the issue of standard formats for such data and what students can actually do with them (copy and paste is not really a practical option for more than a few entries), there is the perhaps greater issue of context.  Even the relatively few blogs on ramble.oucs had newsfeed subscribers, trackbacks and hyperlinks from other sites to permalinked entries, and the blogs had become established in a variety of contexts, including projects, individual work patterns and daily activities.

Fortunately, Pebble's design is amenable to static archiving under the most popular Web servers: for instance, it has nice URLs, not only .html extensions for the permalinks, but also for calendar dates and so on.

So this was a real boon when it came to creating a usable archive.

Here's a technical summary of the steps taken for anyone interested in the details:

Step 1. Copied the blogs elsewhere temporarily
  1. Installed (deployed) a copy of the same version of Pebble on my Win XP desktop PC, accessed under localhost.
  2. Stopped the Pebble Web app and copied across the Pebble blogs from the original server plus associated data, all of which are contained in the file system, the blog entries being stored as XML files.
  3. Restarted Pebble on my machine
  4. Requested a few final 'farewell' messages from colleagues and posted on their behalf
  5. Tidied up the blog display, removing the comments and trackback decorators and some spam
Step 2. Created the archive
  1. Created a static archive using wget (with options -r -k -l 0, i.e. recursive retrieval, links converted for local viewing, no limit on recursion depth)
  2. Used ReplaceEm to do a recursive search and replace on references to localhost:port/path_to_blogs/, pointing them to
  3. Created a compressed archive (.tar.gz) of the generated files
Step 3. Deployed the archive
  1. We had been running Tomcat under Apache, and ramble.oucs was a virtual host; we removed the tie between Apache and Tomcat on the server (specifically, removed the reference to the blog directories in mod_jk2's file)
  2. Created a blogs directory within Apache's htdocs space for the virtual hosting of ramble.oucs
  3. Copied over and unpacked the .tar.gz file ... et voilà!
  4. Checked the result.  OK.  
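
For anyone wanting to script Step 2 rather than use separate tools, here is a rough Python sketch of the same idea. It is not what was actually run for ramble.oucs (that was wget, ReplaceEm and a tar utility by hand). OLD_PREFIX and NEW_PREFIX are made-up placeholders, since the real port, path and target address are not given above, and the function names and the list of file suffixes are likewise only assumptions for illustration.

```python
import subprocess
import tarfile
from pathlib import Path

# Hypothetical values: the real localhost port/path and the real
# target address are not stated in the post.
OLD_PREFIX = "http://localhost:8080/pebble/"
NEW_PREFIX = "http://blogs.example.org/blogs/"

# Assumption: only these file types contain links worth rewriting.
TEXT_SUFFIXES = {".html", ".htm", ".xml", ".css"}


def mirror(url: str, dest: Path) -> None:
    """Fetch a static copy of `url` into `dest` (requires wget on the PATH)."""
    dest.mkdir(parents=True, exist_ok=True)
    # -r: recursive, -k: convert links for local viewing, -l 0: no depth limit
    subprocess.run(["wget", "-r", "-k", "-l", "0", url], cwd=dest, check=True)


def rewrite_links(root: Path, old: str, new: str) -> int:
    """Replace `old` with `new` in text files under `root`; return files changed."""
    old_b, new_b = old.encode("utf-8"), new.encode("utf-8")
    changed = 0
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        # Work on raw bytes so encodings and line endings are left untouched.
        data = path.read_bytes()
        if old_b in data:
            path.write_bytes(data.replace(old_b, new_b))
            changed += 1
    return changed


def pack(root: Path, archive: Path) -> None:
    """Write `root` and everything below it to a gzip-compressed tar file."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(root, arcname=root.name)
```

In use, one would call mirror() against the local Pebble instance, then rewrite_links() on the downloaded directory, then pack() to produce the file to copy to the server. Working on bytes rather than decoded text is a deliberate choice here: it avoids guessing at character encodings and leaves images and other binary files alone.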
The results are not perfect and there are probably many other viable approaches, but this has been a good result as a great deal has been preserved in context and at least people have been informed about where to read the next random jottings... like here :-)
