Sunday, April 30, 2006

A Research Genealogy Project?

The Mathematics Genealogy Project provides a field to categorise dissertations according to the Mathematics Subject Classification. Seeing how broad the selection is (it covers computer science, for example), I was prompted to wonder about genealogy projects for other subjects. There appear to be a few ideas and initiatives, including Thomas Witten's proposal for a Physics PhD Genealogy project, the High Energy Physics directory, the Software Engineering Academic Genealogy, the Theoretical Computer Science Genealogy and the Notre Dame University academic genealogy, which covers current members of its departments of Chemistry & Biochemistry and Physics.

It's a very fragmented picture, with independently developed systems, very partial coverage of researchers and yet already some duplication. The fragmentation will only get worse as subject disciplines keep growing...

So it makes sense to me to take a fundamentally more integrated view, one that incorporates research in any field. It can also have a richer model that takes into account different kinds of research qualifications, not just PhDs, and different kinds of relationships, not just formal supervisor-student ones, thereby responding to issues raised in the Mathematics PhD in the United Kingdom.

The findings yielded on this broader base will be fascinating, showing among other things how disciplines evolve over the generations, shedding light on questions such as: What happened to descendants of those who studied classics? What did the ancestors of computer scientists research? Many trends can be observed. There's a lot of talk in the UK about lifelong learning, so how about considering lifelong and generational research?

Another aspect that needs attention is the quality of entries. Verifying the information received and compiling the database is a tall order for a single central team, which is the current arrangement at the Mathematics Genealogy Project. It would be better to distribute the workload and make use wherever possible of people with local expert knowledge, suitably authorised to update data in the areas with which they are familiar, whilst allowing for as wide a public participation as possible.

So what's the solution?

I'm quite sure that the biggest consideration is organisational, not technical. It's probably a workflow problem and perhaps can be addressed by appealing to other international networks, most likely business networks. The quality control needs to rest with academic departments and it seems sensible that they should deal with information relating first to their department, then their institution and then neighbouring institutions. So I envisage an international network of genealogy research nodes where public contributions would be submitted through their nearest research node, rather like "contact your nearest reseller."
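To make the "nearest node" idea a little more concrete, here is a minimal sketch of how a contribution might be routed: first to a node for the same department, then to any node at the same institution, then to a node at a neighbouring institution. Everything in it (the Node class, the nearest_node function, the placeholder institutions) is hypothetical and only illustrates the ordering described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A hypothetical research node run by one academic department."""
    department: str
    institution: str
    neighbours: frozenset = frozenset()  # names of neighbouring institutions


def nearest_node(nodes, department, institution):
    """Pick the node that should check a contribution.

    Preference order: same department, then same institution,
    then a node whose institution lists this one as a neighbour.
    Returns None if no node qualifies.
    """
    for node in nodes:
        if node.institution == institution and node.department == department:
            return node
    for node in nodes:
        if node.institution == institution:
            return node
    for node in nodes:
        if institution in node.neighbours:
            return node
    return None


# Placeholder data, invented purely for illustration.
nodes = [
    Node("Mathematics", "University X"),
    Node("Physics", "University Y", frozenset({"University Z"})),
]

same_department = nearest_node(nodes, "Mathematics", "University X")
same_institution = nearest_node(nodes, "Classics", "University X")
neighbour = nearest_node(nodes, "Chemistry", "University Z")
nobody = nearest_node(nodes, "Chemistry", "University W")
```

A real network would need a better notion of "neighbouring" than a hand-written set of names, but the fallback order is the point here.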

A few days ago I attended a presentation by someone who has done work for the World Wide Web Consortium and he reiterated the point that if there's one technical issue affecting software above all others it's scalability. So any proposal probably ought to design and develop a system that distributes the processing (CPU and other resources) as well as the administration, though the computing power need not be distributed per site (big companies typically use a few data centres containing large numbers of rack-mounted PCs). This suggests an application for a parallel computing grid.

I don't know what the implementation itself should look like: it could well be underpinned by a relational database or might even be a special kind of wiki (thinking about how that can really grow rapidly). However, the data model should certainly be given careful consideration. How to deploy it on the Internet? How to authenticate and authorise? Lots of questions will pop up if one investigates further!
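As a starting point for thinking about the data model, here is a minimal sketch that allows qualifications other than PhDs and relationships other than formal supervision, and that can answer a question like "what did the ancestors of computer scientists research?". The class names, enum values and the researchers "A", "B" and "C" are all made up for illustration; this is not how any existing genealogy project stores its data.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Qualification(Enum):
    PHD = "PhD"
    RESEARCH_MASTERS = "research master's"
    OTHER = "other"


class Relationship(Enum):
    FORMAL_SUPERVISOR = "formal supervisor"
    INFORMAL_MENTOR = "informal mentor"


@dataclass(frozen=True)
class Researcher:
    name: str
    subject: str
    qualification: Qualification


@dataclass
class Genealogy:
    researchers: dict = field(default_factory=dict)
    # Maps a student's name to a list of (mentor name, relationship) pairs.
    mentors: dict = field(default_factory=dict)

    def add(self, researcher):
        self.researchers[researcher.name] = researcher
        self.mentors.setdefault(researcher.name, [])

    def link(self, mentor, student, relationship):
        self.mentors[student].append((mentor, relationship))

    def ancestor_subjects(self, name):
        """Collect the subjects of every ancestor of `name`, breadth-first."""
        seen, subjects = set(), set()
        queue = deque(m for m, _ in self.mentors[name])
        while queue:
            current = queue.popleft()
            if current in seen:
                continue  # a researcher can be reached by more than one path
            seen.add(current)
            subjects.add(self.researchers[current].subject)
            queue.extend(m for m, _ in self.mentors[current])
        return subjects


# Invented example: A taught B, and B informally mentored C.
g = Genealogy()
g.add(Researcher("A", "classics", Qualification.PHD))
g.add(Researcher("B", "mathematics", Qualification.PHD))
g.add(Researcher("C", "computer science", Qualification.RESEARCH_MASTERS))
g.link("A", "B", Relationship.FORMAL_SUPERVISOR)
g.link("B", "C", Relationship.INFORMAL_MENTOR)

print(sorted(g.ancestor_subjects("C")))  # ['classics', 'mathematics']
```

The same structure would map onto a relational database fairly directly (one table of researchers, one of relationships), which is one reason the choice between a database and a wiki can wait until the model itself is settled.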
