Business - Written by Denis Hancock on Tuesday, October 2, 2007 21:38 - 1 Comment
Mining the blogosphere for varieties of self-expression
Different groups use language differently in order to get their ideas across – one only has to flip back and forth between Jerry Springer and election debates to realize this. OK, this may be a particularly bad example of the point I’m trying to make, but hopefully you get the idea – there is a lot to be learned from how different groups communicate amongst themselves and with others.
Unfortunately, trying to figure out these differences has long faced a constraint common to much academic research – the time and expense of collecting and annotating data. But what if, somehow, there were suddenly millions upon millions of easily accessible communications, from all kinds of people all over the world, unedited and in very natural language, right at researchers’ fingertips?
Well, of course there is something just like that – the blogosphere. In turn, a group of professors (mostly computer scientists, with a Chair of Psychology thrown in for good measure) has published a paper titled Mining the Blogosphere: Age, gender and the varieties of self-expression. Sample size for their research? To quote:
Our corpus comprises over 140 million words of naturally occurring text from randomly selected blogs by men and women from their teens into their forties.
It’s almost absurd to think how big that sample is compared to such tests from the past - they must have felt like Dr. Evil sitting in their research lair getting ready to mine the data… if a trillion dollars were suddenly feasible in Dr. Evil’s quest for world domination… and Austin Powers had something to do with personal pronouns and conjunctions… and the researchers in this case had a lair of some sort. Since that comparison really makes little sense, let’s get right to some of their findings:
older bloggers tend to write about externally–focused topics, while younger bloggers tend to write about more personally–focused topics; changes in writing style with age are closely related. (translation – older folks are interested in what’s going on out there and the younger folks are me me me)
the linguistic factors that increase in use with age are just those used more by males of any age, and conversely, those that decrease in use with age are those used more by females of any age. (translation – I could get myself into trouble with this one so I’ll leave it as an exercise)
There are a variety of others, and if you are interested in this sort of thing it is well worth the read (noting that if you think men use auxiliary verbs more than women, you are sadly mistaken). Personally, I’m more interested in what this blogosphere sample of millions means for academic research: what else could we learn?
1 Comment
Naumi Haque
Hmm… I’m pretty sure I read something recently about the US government using similar technology to monitor Web sites and blogs for terrorist activity. Apparently terrorists have certain language indicators as well.