2006/06/09 20:22

Drawing Networks for Dummies (with Korean Stem Cell Scandal Data)

Fig 1. Social Network among the coauthors for the 2005 Science stem cell paper

This is the era of networks. In academia, the topic spans journals from the American Journal of Sociology to the Biological Bulletin. Search Google News for “network”. As of now (Jun 9 2006), the biggest news all over the world is whether the death of an Al-Qaeda leader would hurt his own “network”. Likewise, the political and cultural implications of networks can hardly be overstated in this global, Internet-connected world.

 

hkimscil is one of those who are very much interested in network analysis. Months ago, he posted an interesting data set on his blog, which was about the relationships among Woosuk Hwang and his coauthors on the scandalously discredited Science paper. I’m not sure why he did not go further and analyze the data. Given the data structure, however, I suspect he might have experienced some problems in doing so (no offense! ^^). The problem is very simple and easily encountered by people who begin to think about network analysis seriously (so did I ^^;). Statistical/mathematical analysis starts from a correct understanding of the data structure. In the case of network analysis, one of the best references on network data is Breiger’s 1974 Social Forces paper - “The Duality of Persons and Groups”.

 

Fig 2. hkimscil’s data

 


Network analysis treats data as a matrix. Raw network data is usually an “incidence” matrix, where rows represent actors and columns represent the groups (or events) the actors belong to. For example, a table with two columns (“NAME” and “2004”) makes a good incidence matrix. In the resulting 32 by 1 matrix, each value (1 or 0) indicates whether the actor belongs to the group of coauthors of the 2004 paper.
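To make the row/column layout concrete, here is a minimal Python sketch that builds an incidence matrix from membership lists. The actor and group names are made up for illustration and are not taken from hkimscil’s data; `incidence_matrix` is a hypothetical helper, not part of any package.

```python
# Hypothetical membership lists: actors and groups are invented for this sketch.
memberships = {
    "actor_a": {"paper_2004", "paper_2005"},
    "actor_b": {"paper_2005"},
    "actor_c": {"paper_2004"},
}


def incidence_matrix(memberships):
    """Return (actors, groups, matrix), where matrix[i][j] is 1 if actor i is in group j."""
    actors = sorted(memberships)
    groups = sorted(set().union(*memberships.values()))
    matrix = [[1 if g in memberships[a] else 0 for g in groups] for a in actors]
    return actors, groups, matrix


actors, groups, A = incidence_matrix(memberships)
print(groups)
for name, row in zip(actors, A):
    print(name, row)
```

Each row is one actor and each column one group, which is the shape the rest of the post assumes.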

 

However, this one-column matrix does not produce any meaningful social network among the actors. The actors are simply divided into two groups: coauthors and non-coauthors. Hence, we need information on more than two groups to produce a network matrix.

 

Another problem comes from columns such as “ORG1, 2” and “STATUS”. I can understand why the columns are there, but they do not form a network matrix. To satisfy the format of an incidence matrix, I changed the data into the following format.

 

Fig 3. Incidence Matrix

 


For further analysis, I limited the population to the 2005 authors. The number of group variables is 21. Using the information in the original data, I recoded “ORG 1, 2” into 7 categories. Likewise, 3 status groups and 10 area groups were constructed.
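The recoding step can be sketched in Python as turning each category of an attribute into its own 0/1 group column. The records and category labels below are invented, and the sketch assumes one value per attribute (the original “ORG 1, 2” columns allow two organizations per person, which this simplification ignores).

```python
# Hypothetical attribute table; names and categories are invented for this sketch.
records = [
    {"name": "actor_a", "org": "univ_x", "status": "professor"},
    {"name": "actor_b", "org": "univ_x", "status": "student"},
    {"name": "actor_c", "org": "hospital_y", "status": "professor"},
]


def recode_to_groups(records, attributes):
    """Turn categorical attributes into 0/1 group-membership columns."""
    # One column per (attribute, value) pair that appears in the data.
    columns = sorted({(attr, rec[attr]) for rec in records for attr in attributes})
    rows = {
        rec["name"]: [1 if rec[attr] == value else 0 for attr, value in columns]
        for rec in records
    }
    return columns, rows


columns, rows = recode_to_groups(records, ["org", "status"])
print(columns)
for name, row in rows.items():
    print(name, row)
```

After this step every column is a group in the incidence-matrix sense, so attributes such as organization or status can be used in the same way as coauthorship.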

 

From the incidence matrix, we can create an “adjacency” matrix, where both rows and columns are actors, and each element is the number of groups the two actors share. If Woosuk Hwang and Sunjong Kim both belonged to all 21 groups, the Hwang-by-Kim element would be 21. The adjacency matrix is the direct input to many network measures.

 

Imagine an i by j incidence matrix A. We can obtain two adjacency matrices from it with simple matrix algebra:

 

            i by i adjacency matrix = A A^T, where A^T is the transpose of A

            j by j adjacency matrix = A^T A, where A^T is the transpose of A

 

In this case, A A^T (the actor-by-actor product) describes the relationships among actors, while A^T A (the group-by-group product) describes the relationships among the groups. The latter might not make substantive sense here. But suppose we study the relationships among NGOs. Data on the memberships of the organizations can then produce the network among them.
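The two products can be checked with a few lines of plain Python. This is only a sketch on a small made-up incidence matrix (4 actors by 3 groups); `transpose` and `matmul` are hypothetical helpers written here for illustration.

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Multiply two list-of-lists matrices."""
    y_t = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_t] for row in x]


# Made-up incidence matrix: 4 actors (rows) by 3 groups (columns).
A = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]

actor_by_actor = matmul(A, transpose(A))   # 4 by 4: groups shared by each pair of actors
group_by_group = matmul(transpose(A), A)   # 3 by 3: actors shared by each pair of groups

for row in actor_by_actor:
    print(row)
for row in group_by_group:
    print(row)
```

Note that the diagonal of the actor-by-actor matrix is simply the number of groups each actor belongs to, so it is usually ignored when drawing ties.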

 

At this point, somebody may suspiciously ask me, “That’s it?” ^^ Don’t worry. This is “for dummies”. There is a very easy way waiting for you.

 

With the Excel spreadsheet open on your desktop, go get UCINET 6, which is generously made available to the public. It is a limited version, but it works very well for data of this size.

 

For the purpose of “drawing networks”, we first need to make an actor by actor adjacency matrix. To do this:

 

  1. In Excel, select and copy the whole data area, including column headers such as NAME.
  2. Go to UCINET and press Ctrl+S. You will see a UCINET spreadsheet, where the first row and first column are shaded in grey. Do not touch anything, and just press Ctrl+V. You will then see the Excel data placed on the UCINET spreadsheet.
  3. Press Ctrl+S to save under any name you want, then Alt+X to exit. The saved file will be named “name.##h”. Keep the file name and path in mind.
  4. In the menu, open “Data > Affiliations …”. Then you will see the following dialogue box:

 


Click on the file-selection button in the first line, and open the ##h file you saved. Then click on “OK”.
 
  5. In the same directory where you saved the data file, you will find an “Affiliations.##h” file. It is the adjacency matrix formatted for UCINET.

 

Then you are perfectly ready for drawing a network.

 

  1. Click on “Draw” on the menu bar, and you will see a big window entitled “NetDraw”. Yes, it is the program for drawing networks.
  2. By clicking on the open icon (not the one with A), open “Affiliations.##h”. You will then see the network shown at the top of this post.
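For readers curious about what a drawing program works from, here is a rough Python sketch that turns an actor by actor adjacency matrix into a list of ties. It assumes, as a simplification, that a tie exists wherever two actors share at least one group; the names and values are made up, and `edge_list` is a hypothetical helper, not a UCINET or NetDraw function.

```python
def edge_list(names, adjacency):
    """List each pair of actors with a positive off-diagonal entry, once per pair."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # skip the diagonal and mirrored pairs
            if adjacency[i][j] > 0:
                edges.append((names[i], names[j], adjacency[i][j]))
    return edges


# Made-up actor-by-actor adjacency matrix (symmetric; diagonal = own group count).
names = ["actor_a", "actor_b", "actor_c"]
adjacency = [
    [2, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]

for source, target, weight in edge_list(names, adjacency):
    print(source, target, weight)
```

Here actor_c shares no group with anyone, so it would appear as an isolated node.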

It would be fun to find out how to change node colors, the shape of the overall network, and so on. I would like to leave this pleasure to you guys!

 

- w