knowledge management for development

Data Cleansing, “Role Types”, the Main Discussion Group

In my previous post I outlined how the KM4Dev SNA will be conducted. Phase One, the analysis of the discussion groups, has commenced. A fortnight or so ago I was provided with post data for the main group in an XML format. This was quite exciting because I have 10 complete years of data, and two years of incomplete data, to analyse. It was also a daunting task, because before any analysis could be done 10,576 rows and 7 columns had to be cleaned and manipulated! My tool of choice for a dataset of this size, at least for the initial cleaning and manipulation stage, is Microsoft Excel. Excel has some very good capabilities, including the CLEAN function (entered as =CLEAN), which removes the non-printable hidden characters that cause problems in analysis tools.
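
For readers who would rather script this step than use a spreadsheet, here is a minimal Python sketch of the same idea. It is not the workflow used for this analysis (that was done in Excel); the function name `clean` is simply an illustrative choice, and it assumes the troublesome characters are the ASCII control characters with codes 0 to 31.

```python
def clean(text: str) -> str:
    """Drop ASCII control characters (codes 0-31), similar in spirit to Excel's CLEAN."""
    return "".join(ch for ch in text if ord(ch) >= 32)


# Hypothetical subject line containing two hidden control characters.
print(clean("Re: KM4Dev\x00 question\x07"))
```

Note that this also strips tabs and line breaks, so it is best applied to single-cell values such as names and subject lines rather than to whole message bodies.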

The dataset contained 10,354 posts, of which 7,238 were reply posts. Of these, "Anonymous" posted 1,999 replies to 1,374 posts. This represents about 18% of all posts. However, it was necessary to remove "Anonymous" from the dataset, because "Anonymous" is almost certainly not a single person, and leaving them in would distort the results. Similarly, identified pseudonyms, aliases, and duplicate names were removed, along with “self-replies” and posts with no answers. Ultimately this process left 703 identified individuals in the network. These people comprise the node-set for the public bounded or contained network, to which activity and various network measures can be applied.
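
The filtering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Excel procedure actually used: it assumes the posts have already been reduced to (sender, recipient) pairs, that a missing recipient marks a post with no answer, and that known aliases are supplied in a lookup table that maps each alias to one canonical name. The function names and the sample names are hypothetical.

```python
def clean_edges(edges, aliases=None, excluded=frozenset({"Anonymous"})):
    """Return reply pairs with aliases merged and unusable rows dropped."""
    aliases = aliases or {}
    cleaned = []
    for sender, recipient in edges:
        # Posts with no answer carry no reply link, so they add no edge.
        if sender is None or recipient is None:
            continue
        # Merge pseudonyms and duplicate names into one canonical name.
        sender = aliases.get(sender, sender)
        recipient = aliases.get(recipient, recipient)
        # "Anonymous" is almost certainly more than one person.
        if sender in excluded or recipient in excluded:
            continue
        # Self-replies say nothing about ties between people.
        if sender == recipient:
            continue
        cleaned.append((sender, recipient))
    return cleaned


def node_set(edges):
    """Collect every individual who appears in at least one remaining pair."""
    return {name for edge in edges for name in edge}


# Hypothetical sample data, not taken from the KM4Dev dataset.
sample = [
    ("Ann", "Bob"),
    ("Anonymous", "Bob"),
    ("Bob", "Bob"),
    ("A. Smith", "Bob"),
    ("Cara", None),
]
edges = clean_edges(sample, aliases={"A. Smith": "Ann"})
print(edges)
print(sorted(node_set(edges)))
```

Merging aliases before the self-reply check matters: a person replying to their own post under a second name is only caught once both names resolve to the same individual.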

One of the first measures applied was Gloor’s Contribution Index: (messages sent – messages received) / (messages sent + messages received). It is interpreted as follows:

  • If an individual only sends messages and receives none then their contribution index is +1.000.
  • If an individual only receives messages and sends none then their contribution index is -1.000.
  • If the communication behaviour is balanced then the contribution index is 0.000.
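
The formula and its three reference points can be checked with a short Python sketch. The function name is my own, and raising an error for someone with no messages at all is an assumption, because the index is undefined when both counts are zero.

```python
def contribution_index(sent: int, received: int) -> float:
    """Gloor's Contribution Index: (sent - received) / (sent + received)."""
    total = sent + received
    if total == 0:
        # Assumption: treat a person with no traffic as an error, not as 0.
        raise ValueError("contribution index is undefined with no messages")
    return (sent - received) / total


print(contribution_index(5, 0))  # only sends
print(contribution_index(0, 5))  # only receives
print(contribution_index(4, 4))  # balanced
```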

Coupling the index with the frequency of posting allows an individual’s “role type” to be determined as shown below. There are other indices that could be used, including those developed by Derek Hansen, Ben Shneiderman and Marc Smith, but I find Peter Gloor’s Contribution Index sufficient for this stage of the analysis.
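
To make the idea of coupling the index with posting frequency concrete, here is a rough Python sketch. The cut-off values (`min_messages`, `balance_band`) and the descriptive labels are placeholders I have invented for illustration. They are not Gloor's role names, and they are not the thresholds used in this analysis.

```python
def role_type(sent, received, min_messages=10, balance_band=0.33):
    """Assign a placeholder role label from message volume and contribution index."""
    total = sent + received
    if total == 0:
        return "inactive"
    index = (sent - received) / total
    # Hypothetical frequency cut-off: below it, the index is based on too few posts.
    if total < min_messages:
        return "low-frequency"
    # Hypothetical band around zero that counts as balanced communication.
    if index > balance_band:
        return "high-frequency sender"
    if index < -balance_band:
        return "high-frequency receiver"
    return "high-frequency balanced"


for sent, received in [(20, 2), (2, 20), (10, 10), (2, 1)]:
    print(sent, received, role_type(sent, received))
```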

The next diagram shows the results for the KM4Dev main discussion group. The active or key participant group comprises 113 individuals, and deeper analysis shows they are active over almost all the years in the dataset. I still need to do further analysis, but this approach provides a way of partitioning the dataset later on.

A common heuristic that can be used to estimate the size of the network and predict the number of “lurkers” is the 90-9-1 rule. A 2010 study by Dr Michael Wu, using ten years of data from more than 200 online communities, found that:

  • 90% of all users are “lurkers” who don’t actively contribute.
  • 9% of all users are “occasional contributors” providing less than 50% of the content.
  • 1% of all users are “hyper-contributors” providing greater than 50% of the content.

The pie chart below presents data for 2010 and 2011 for the main discussion group.

Using this heuristic, the predicted size of the KM4Dev main discussion group is 2,420 people. Note that there is a very close correlation between Gloor’s Expediters and Wu’s Hyper-Contributors. I don’t know what the membership of the group was in 2011 and 2012, but based on the assignment brief I received it looks pretty close. What do you think?
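
The arithmetic behind the heuristic is simple enough to sketch in Python. The first function assumes that everyone who posts at all falls into the 9% plus 1% contributor share, so total membership is ten times the number of contributors. The second splits a given membership into the three bands. Both function names are my own, and the contributor count of 50 is a made-up input, not a KM4Dev figure.

```python
def estimate_total_members(contributors: int) -> int:
    """If contributors are 10% of members (9% + 1%), the total is ten times larger."""
    return contributors * 10


def split_90_9_1(total_members: int) -> dict:
    """Split a membership figure into the three bands of the 90-9-1 rule."""
    return {
        "lurkers": round(total_members * 90 / 100),
        "occasional contributors": round(total_members * 9 / 100),
        "hyper-contributors": round(total_members * 1 / 100),
    }


print(estimate_total_members(50))  # hypothetical contributor count
print(split_90_9_1(2420))          # the predicted size quoted above
```

For the predicted size of 2,420, the rule implies roughly 2,178 lurkers, 218 occasional contributors, and 24 hyper-contributors.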

In my next post I will provide some time analysis, and after that we will get into the social network analysis proper. I look forward to your discussion, questions, and insights.

Regards Graham

  1. Gloor, P 2006, Swarm creativity: Competitive advantage through collaborative innovation networks, Oxford University Press, Oxford.
  2. Hansen, D, Shneiderman, B & Smith, M 2011, Analyzing social media networks with NodeXL: Insights from a connected world, Elsevier, Burlington.

Tags: Contribution, Gloor's, Heuristic, Index, SNA, Wu's, analysis, network, social

Comment

Comment by Graham Durant-Law on May 14, 2012 at 8:26pm

Hullo Riff,

Thank you for your comments and the numbers. It is always nice to have a deduction confirmed. I will be posting some of the results in the next day or so.

Regards Graham

Comment by Riff Fullan on May 4, 2012 at 3:20am

Hi Graham,

Your SNA work on the KM4Dev community is fascinating, alright! On the size of the community, it is of course a difficult question, partly because there are three (more or less) linked communities: km4dev, SA-GE (Francophone) and Siwa (mostly Latin America), but also within the larger KM4Dev, there is the original email discussion group (membership today of 1386) and the more recent Ning community (was it created in early 2010? Anyway, today it lists 2643 members).....

© 2013   Created by Knowledge Management for Development.
