I circulated the idea of a Research Genealogy Project among a few colleagues, who have offered some comments, giving me a bit more to ponder, particularly the basic question: what is this really for? What purpose does it serve?
My tentative response at the moment is that the long-term goal is to understand higher levels of knowledge, understanding and insight, and how they can propagate, flourish and advance. At a more mundane level, a large body of genealogy data might offer clues to the kinds of conditions that are more likely to lead to successful research activities, which could perhaps be useful for funding bodies.
In terms of a genealogy project based on formal research qualifications, I would focus initially on the relationships rather than the objects. There are many kinds of relationships, and a standard each-way link without any meaning is usually not appropriate: the existing Maths Genealogy project already has some sense of ordering or direction, in which the Professor is generally the one who imparts to the student until the student absorbs and understands.
There are other inputs that could be modelled, ranging from formal instruction to collaboration to influence. Looking back at my own Ph.D. (Use of Formal Methods for Safety-critical Systems), apart from my supervisor, I was given guidance by a few other staff and learnt from quite a number of researchers in the field. For instance, at the start I had to learn from those who had developed the formal theoretical foundations (e.g. the theory of testing equivalences of processes), whilst others provided certain contextual background (the application domain of medical device communications). When it came to applying some new theory, I used some methodologies (that applied safety analysis techniques) that adapted or built on the work of contemporary Ph.D. students. All these informed and influenced me in my own research, but in different ways.
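To make the idea of meaningful, directed links a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the relation names (`SUPERVISION`, `GUIDANCE`, `FOUNDATION`, `CONTEXT`, `METHODOLOGY`), the `Influence` record, and the placeholder identifiers are hypothetical and not taken from any existing genealogy project.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Hypothetical kinds of influence; a real project would need an agreed vocabulary."""
    SUPERVISION = "supervision"
    GUIDANCE = "guidance"
    FOUNDATION = "foundation"
    CONTEXT = "context"
    METHODOLOGY = "methodology"


@dataclass(frozen=True)
class Influence:
    """A directed, typed link: `source` influenced `target` in the given way."""
    source: str
    target: str
    relation: Relation


def influences_on(edges, target):
    """Group the sources that influenced `target` by the kind of relation."""
    grouped = defaultdict(list)
    for edge in edges:
        if edge.target == target:
            grouped[edge.relation].append(edge.source)
    return dict(grouped)


# Placeholder data loosely shaped like the Ph.D. example above.
edges = [
    Influence("supervisor", "my_thesis", Relation.SUPERVISION),
    Influence("other_staff", "my_thesis", Relation.GUIDANCE),
    Influence("testing_equivalence_theorists", "my_thesis", Relation.FOUNDATION),
    Influence("medical_device_researchers", "my_thesis", Relation.CONTEXT),
    Influence("contemporary_phd_students", "my_thesis", Relation.METHODOLOGY),
]

by_kind = influences_on(edges, "my_thesis")
for relation, sources in by_kind.items():
    print(relation.value, "<-", ", ".join(sources))
```

The point of the sketch is only that each link carries both a direction and a meaning, so "who supervised me" and "whose theory I built on" can be asked as separate questions.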
I corresponded with some of these researchers by email, but although it might be interesting to model correspondence between researchers (nice graph theory applications), I can't see how you could dig into these emails in practice, and in any case my correspondents were just a small proportion of the authors who influenced my work.
It's going to be easier if you can work with what has been freely published, which brings us back to the thesis. What if theses could be marked up in such a way that you can extract meaning? Then you could know, for a particular thesis, whose work had provided the foundations and who was doing similar work. This is a task for experts in knowledge representation, retrieval and analysis. Patterns might emerge that show coalescence among some theses, where a lot of researchers tackle a popular topic and related issues; further, some theses may show a lot of interconnectivity not only within subject areas but across subject areas, which might suggest making more explicit particular areas for co-operation and joint conferences. On the other hand, some research may be shown to go out on a limb and have little to do with others. Some nice visuals will make this much easier to see!
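As a rough illustration of the kind of pattern-finding described above, here is a small Python sketch. It assumes, purely hypothetically, that the marked-up theses have already been reduced to a dictionary in which each thesis records a `subject` and a `builds_on` list; the identifiers `T1` to `T4` and the subject names are made up. It groups theses that are linked to one another, picks out those with no links at all, and lists links that cross subject areas.

```python
from collections import defaultdict

# Hypothetical extracted markup: thesis id -> subject area and the theses it builds on.
theses = {
    "T1": {"subject": "computing", "builds_on": ["T2"]},
    "T2": {"subject": "computing", "builds_on": []},
    "T3": {"subject": "medicine", "builds_on": ["T1"]},
    "T4": {"subject": "history", "builds_on": []},
}


def connected_groups(theses):
    """Return sets of theses that are linked, directly or indirectly, ignoring direction."""
    neighbours = defaultdict(set)
    for thesis_id, info in theses.items():
        for other in info["builds_on"]:
            if other in theses:
                neighbours[thesis_id].add(other)
                neighbours[other].add(thesis_id)

    seen = set()
    groups = []
    for start in theses:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbours[node] - seen)
        groups.append(group)
    return groups


def cross_subject_links(theses):
    """List (thesis, built_on) pairs whose subject areas differ."""
    return [
        (thesis_id, other)
        for thesis_id, info in theses.items()
        for other in info["builds_on"]
        if other in theses and theses[other]["subject"] != info["subject"]
    ]


groups = connected_groups(theses)
isolated = [g for g in groups if len(g) == 1]
print("groups:", groups)
print("isolated:", isolated)
print("cross-subject links:", cross_subject_links(theses))
```

With real data the interesting work would lie in the markup and extraction rather than in this grouping step, and a proper graph library would probably be a better choice than hand-rolled traversal.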