Business - Written by Jeff DeChambeau on Tuesday, May 27, 2008 17:07 - 1 Comment
The Archeology of (Programmers’) Social Artifacts

Ok, not quite.
My friend Abram Hindle is doing some fascinating research: he’s working on ‘mining artifacts from versioned software.’ Here’s what that means:
In software development, programmers use a central database to keep track of every change made to the code of their software. This database keeps copies of all earlier versions of the software, as well as the current version (and maybe some unstable versions for testing).
Abram is writing a piece of software that analyzes a development database like the one described above. This software looks through every iteration of the project’s code and determines what changes were made, by whom and when, and stores that information in its own database. Based on this data, mined from comparing and contrasting previous versions of in-development programs, Abram’s software is able to figure out how much time the programmers spent on each part of the program. What’s more, the software can even determine which programmers create or fix the most errors.
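To make the idea concrete, here is a minimal sketch of that kind of mining in Python. It is not Abram’s actual software: the commit records, the `changes` table, and the function names are all made up for illustration, and a real tool would read its records out of the version control system rather than a hard-coded list.

```python
import sqlite3

# Hypothetical commit records, invented for this sketch:
# (revision, author, timestamp, file path, lines added, lines removed)
COMMITS = [
    ("r1", "alice", "2008-05-01T10:00:00", "parser.c", 120, 0),
    ("r2", "bob",   "2008-05-02T09:30:00", "parser.c", 15, 4),
    ("r3", "alice", "2008-05-03T14:00:00", "ui.c",     60, 0),
    ("r4", "bob",   "2008-05-04T11:15:00", "ui.c",     5, 20),
]


def build_change_db(commits):
    """Store who changed what, and when, in a separate in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE changes ("
        "revision TEXT, author TEXT, committed_at TEXT, "
        "path TEXT, added INTEGER, removed INTEGER)"
    )
    conn.executemany("INSERT INTO changes VALUES (?, ?, ?, ?, ?, ?)", commits)
    conn.commit()
    return conn


def churn_by_author(conn):
    """Total lines touched (added plus removed) per author."""
    rows = conn.execute(
        "SELECT author, SUM(added + removed) FROM changes "
        "GROUP BY author ORDER BY author"
    )
    return dict(rows.fetchall())


conn = build_change_db(COMMITS)
print(churn_by_author(conn))
```

Lines touched is only a rough stand-in for time spent on each part of a program; it is used here because it is the simplest thing to count.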
These techniques and methods allow programmers to be socially linked by virtue of what parts of the code they edit, regardless of when either programmer makes their contributions and changes. On being able to map programming contributions socially like that, my buddy Phil says, “I think if we did that here at my work, I’d be best friends with a guy who quit 10 years ago.” And he’s right: lots of companies have version control databases that reach back 5, 10, even 25 years. With this software you could look inside the old code and take a long view of the effectiveness of programming teams over the years under different management regimes, or just track the lifetime growth of a given subroutine.
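Here is one way that social linking could be sketched, assuming that “linked” simply means “has edited the same file.” The edit list and the `coedit_links` function are hypothetical, and notice that timestamps play no part, which is exactly why Phil would end up paired with someone who left a decade ago.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (author, file) edit records; when each edit happened is ignored.
EDITS = [
    ("alice", "parser.c"),
    ("bob", "parser.c"),
    ("carol", "ui.c"),
    ("alice", "ui.c"),
    ("bob", "parser.c"),  # repeat edits do not create extra links
]


def coedit_links(edits):
    """Count, for each pair of authors, how many files both have edited."""
    authors_by_file = defaultdict(set)
    for author, path in edits:
        authors_by_file[path].add(author)

    links = defaultdict(int)
    for authors in authors_by_file.values():
        # Sort so each pair is always keyed in the same order.
        for a, b in combinations(sorted(authors), 2):
            links[(a, b)] += 1
    return dict(links)


print(coedit_links(EDITS))
```

Counting shared files is a design choice made for brevity; weighting links by shared functions or by number of edits would be just as defensible.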
This form of data mining isn’t only applicable to software programming, though; it will work with any kind of version-controlled document (I’m looking at you, wikis). With a mining program like this, you could examine your company wiki and see, nicely summarized, the types of contributions that each editor makes, how long they take to do it, and where they like to spend their time editing.
All of this isn’t without its dark side, though. Programming can be an involved and complicated process, often too much so to be neatly summarized by a graph. That is, there’s the danger that normal programming practices could be misunderstood by managers, who penalize programmers for generating errors, all the while losing sight of the fact that those programmers are generating, by a wide margin, the most code.
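A bit of arithmetic shows how easily a raw count can mislead. The numbers below are invented, and defects per thousand lines is just one possible way to normalize, but it makes the point: the programmer with the most bugs can also have the lowest error rate.

```python
def defects_per_kloc(bugs, lines):
    """Bugs per thousand lines of code written."""
    if lines <= 0:
        raise ValueError("lines must be positive")
    return bugs / (lines / 1000)


# Hypothetical totals per programmer: (bugs attributed, lines written)
STATS = {
    "alice": (30, 60000),  # most bugs, but also by far the most code
    "bob": (10, 8000),
}

rates = {name: defects_per_kloc(bugs, lines) for name, (bugs, lines) in STATS.items()}
print(rates)
```

By raw count alice looks three times worse than bob; per thousand lines she comes out well ahead.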
I think that this is just one example of a larger theme: that we’re able to extract useful data from the very process of creating and sharing useful data. I’m very excited to see where research like this goes over the coming months.
1 Comment
Jenn Durley
I have a family member who programs for a major software company. He will attest to management’s misunderstanding of this type of data (something to do with him being the programmer who churns out the most code / bugs). That said, I agree that it is very interesting to think about non-software applications for this type of data mining. Fantasy application: exposing that deadbeat member of a group project whose contribution was the Title Page.