Business - Written by Jeff DeChambeau on Tuesday, May 27, 2008 17:07 - 1 Comment
The Archeology of (Programmers’) Social Artifacts

Ok, not quite.

My friend Abram Hindle is doing some fascinating research: he’s working on ‘mining artifacts from versioned software.’ Here’s what that means:

In software development, programmers typically use a version control system, a central database that keeps track of every change made to their program's code. This database keeps copies of all earlier versions of the software as well as the current version (and perhaps some unstable versions for testing).

Abram is writing a piece of software that analyzes a development database like the one described above. It walks through every revision of the project’s code, determines what changed, by whom, and when, and stores that information in its own database. By comparing successive versions of a program under development, Abram’s software can estimate how much time the programmers spent on each part of the program. What’s more, it can even identify which programmers create, or fix, the most errors.
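The post doesn’t say how Abram’s tool is built, so what follows is only a minimal sketch of the general idea in Python. It assumes the revision history has already been exported as a list of commit records (the `Commit` class, its fields, and the sample data are hypothetical), and it uses a common but crude heuristic: a commit whose message mentions “fix” or “bug” is counted as a bug fix. From that it tallies who touched each file and how often, the span of time over which each file was worked on, and how many fixes each person made.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Commit:
    """One hypothetical revision record: who changed which files, when, and why."""
    author: str
    when: datetime
    message: str
    files: list[str]


# Assumption: a crude keyword heuristic for spotting bug-fix commits.
FIX_PATTERN = re.compile(r"\b(fix(es|ed)?|bugs?)\b", re.IGNORECASE)


def summarize(commits):
    """Tally per-file edits by author, per-file activity windows, and per-author fixes."""
    edits_per_file = defaultdict(Counter)  # path -> Counter(author -> commits touching path)
    windows = {}                           # path -> (first change, last change)
    fixes_per_author = Counter()           # author -> commits that look like bug fixes
    for c in sorted(commits, key=lambda commit: commit.when):
        for path in c.files:
            edits_per_file[path][c.author] += 1
            first, _ = windows.get(path, (c.when, c.when))
            windows[path] = (first, c.when)
        if FIX_PATTERN.search(c.message):
            fixes_per_author[c.author] += 1
    return edits_per_file, windows, fixes_per_author


history = [
    Commit("alice", datetime(2008, 5, 1), "Add parser", ["parser.c"]),
    Commit("bob", datetime(2008, 5, 3), "Fix crash in parser", ["parser.c", "main.c"]),
    Commit("alice", datetime(2008, 5, 9), "Refactor main loop", ["main.c"]),
]
edits, windows, fixes = summarize(history)
print(edits["parser.c"])  # Counter({'alice': 1, 'bob': 1})
print(fixes)              # Counter({'bob': 1})
```

Counting commits and measuring the gap between a file’s first and last change are only rough proxies: neither says how many hours anyone actually spent, and a keyword match in a commit message is a guess at intent, not proof of a bug.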

These techniques let programmers be linked socially by the parts of the code they edit, regardless of when each of them made their contributions. Of mapping programming contributions this way, my buddy Phil says, “I think if we did that here at my work, I’d be best friends with a guy who quit 10 years ago.” And he’s right: many companies have version control databases that reach back 5, 10, even 25 years. With this software you could look inside the old code and take a long view of how effective programming teams have been over the years under different management regimes, or simply track the lifetime growth of a given subroutine.
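Here is a minimal sketch, in Python, of that kind of social link. It assumes the history has already been boiled down to a mapping from each file to everyone who has ever edited it (the file names and people are hypothetical). Two people are linked once for every file they have both touched, no matter how many years apart their edits were.

```python
from collections import Counter
from itertools import combinations


def coedit_graph(file_authors):
    """Return Counter((person_a, person_b) -> number of files both have edited).

    file_authors maps a file path to an iterable of everyone who edited it.
    Pairs are stored in sorted order so (a, b) and (b, a) count as one link.
    """
    weights = Counter()
    for authors in file_authors.values():
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical history: "dana" left the company long ago, but shares two files with "phil".
history = {
    "billing.c": ["phil", "dana"],
    "report.c": ["phil", "dana", "lee"],
    "ui.c": ["lee"],
}
links = coedit_graph(history)
print(links.most_common(1))  # [(('dana', 'phil'), 2)]
```

Because the link depends only on shared files, not on overlapping dates, a current employee can end up most closely tied to someone who left a decade earlier, which is exactly Phil’s point.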

This form of data mining isn’t only applicable to software programming, though; it will work with any kind of version controlled document (I’m looking at you, wikis). With a mining program like this, you could examine your company wiki and see — nicely summarized — the types of contributions that each editor makes, how long they take to do it, and where they like to spend their time editing.

All of this has its dark side, though. Programming can be an involved and complicated process, often too complicated to be neatly summarized by a graph. There’s a danger that managers could misread normal programming practice and penalize the programmers who generate the most errors, losing sight of the fact that those same programmers are also generating, by a wide margin, the most code.
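A small worked example of that trap, with made-up numbers: ranking people by the raw number of bugs traced to them singles out the most productive programmer, while normalizing by how much code each person wrote (here bugs per 1,000 lines added, one common but imperfect yardstick chosen for illustration) tells a different story.

```python
def bugs_per_kloc(bugs, lines_added):
    """Bugs attributed to a person per 1,000 lines of code they added."""
    return 1000 * bugs / lines_added


# Hypothetical numbers, for illustration only.
team = {
    "prolific": {"bugs": 40, "lines_added": 50_000},
    "cautious": {"bugs": 10, "lines_added": 5_000},
}

for name, stats in team.items():
    rate = bugs_per_kloc(stats["bugs"], stats["lines_added"])
    print(f"{name}: {stats['bugs']} bugs, {rate:.1f} per 1,000 lines")
# prolific: 40 bugs, 0.8 per 1,000 lines
# cautious: 10 bugs, 2.0 per 1,000 lines
```

Even a normalized rate is a simplification: lines of code vary in difficulty, and some bugs matter far more than others.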

I think that this is just one example of a larger theme: that we’re able to extract useful data from the very process of creating and sharing useful data. I’m very excited to see where research like this goes over the coming months.



1 Comment


Jenn Durley
May 28, 2008 14:15

I have a family member who programs for a major software company. He will attest to management’s misunderstanding of this type of data (something to do with him being the programmer who churns out the most code / bugs). That said, I agree that it is very interesting to think about non-software applications for this type of data mining. Fantasy application: exposing that deadbeat member of a group project whose contribution was the Title Page.
