Serendip is an independent site partnering with faculty at multiple colleges and universities around the world. Happy exploring!

Linked and the Discovery of Scale Free Networks

Jessica B's picture

The book begins with a river in Prussia. In the 18th century, there was a series of bridges in Konigsberg that connected the banks of either side of the river to each other an island in the middle. A popular topic of discussion in cafes at the time asked if you could cross each bridge only once and come back to where you started. A man named Euler finally solved this problem. It's not significant that he solved it. The significance is in how he solved it.

Euler abstracted the bridges into a series of nodes and links. The nodes were the landmasses the bridges connected and the links were the bridges themselves. Using this graph, Euler discovered that the only way such a feat is possible is if the nodes with an odd number of links are either the starting or ending point in the traversal. That means there can only be two such nodes. Unfortunately, all the nodes on the Konigsberg bridge graph had an odd number of links so the answer to the problem is no.

This is important because Euler representing a real-life construct and as graph which had inherent properties through which its behavior could be deduced. Considering that graphs are just small networks, this discovery is significant.

Linked discusses "scale free networks," which are networks that consist of a huge number of small nodes connected to a small number of huge hubs via links. Scale-free networks can be found in naturally occurring systems such as the food web and social networks and in man-made systems such as the Web. Despite their dissimilarities, they all share some important properties.

Scale-free networks were discovered in stages.

The path to this discovery began most notably in 1967 when Stanley Milgrim, a noted psychologist for obedience studies, conducted an experiment on social interconnectivity. He sent a packet containing a letter explaining the experiment and several post cards to random people, asking them to try to find some other random person. Each person, if they didn't know the target person, was to send it to someone they knew on a first-name basis who they thought was more likely to know this person. They were also to send a postcard back to Milgrim so he would know how many intermediates the letter passed through. The average number of people it took to find the target person was 5.5, a surprisingly small number when you consider the number of people in the US or even in area (he did this mostly in the midwest). The question was: why were there so few links? The answer is the first discovery.

Discovery #1: While it only takes one link per node to connect all nodes, all systems in nature (and probably all man made systems) have more than one link. This means that you don't have to go through every link to find a certain node.

Duncan Watts, who was working on his Ph.D. at Cornell in the mid-1990s, was interested in how things (in this case, people) form groups. He began with the question "What is the likelihood that two friends of mine know one another?" To bring that into mathematical terms, he created the clustering coefficient. It measures how tightly knit a group is by how many of one person's friends know one another. He theorized that people form into tightly knit groups that are connected by weak links to people outside the group (ex. your neighbor). The interesting thing that was gleaned by this is that:

Discovery #2: Networks are not formed randomly. They are made up of clusters of some sort connected by links between the clusters.

The Web is an example of a network. The webpages are nodes and the links are (you guessed it) links. Some webpages (such as Google) have millions of links coming to them. Other webpages (such as "Alice's page about her adorable cats!!!!!!!!!1) probably don't have any links. Barabasi, who was very interested in the networks of the Web and Internet (and, incidentally, wrote this book), noticed this. How can one reconcile this with the idea of networks as groups connected by links? That model follows a bell curve: the majority of the nodes have about the same number of links. A few have more and a few have less.

Barabasi realized that some nodes are hubs: nodes which have an amazingly large number of links. These hubs are what keep the number of steps between nodes so low, since they are connected to so many of the nodes. Instead of following a bell curve, hubs followed a power law. This is a graph that looks a little like an asymtote. It allows for the majority of nodes to have quite a few links but also allows for a few nodes to have an incredible amount of links. He found this power law and these hubs in all the networks that he had found to be so closely connected (ex. airline routes, as opposed highway systems).

Discovery #3: Hubs exist in scale-free networks; they are what keep the number of steps so low and are described by power laws.

So why do hubs exist? First one needs to realize that networks are not static. They grow over time (think of the Web). This means a node that's been around since the beginning is going to have more links simply because it has had more opportunities to acquire nodes. However, this alone did not result in true hubs. Barabasi (who was doing this research) then added in something else: preferential attachment. This means that new nodes seeking to link with something tend to link with the node with the most links. Humans act according to this, too. Hollywood directors tend to want to hire actors that have already been in a lot of movies. People tend to want to buy the products that are already popular.

Discovery #4: Hubs are formed via growth and preferential attachment.

This model explained how nodes were formed but only on the basis of something having a long time to acquire nodes. What about Google? It's much newer than Yahoo! or AltaVista or other search engines and yet it's dominating in popularity (As I type this, I'm wearing a Google shirt). Barabasi had to add something else: fitness. Each node, no matter what it is, has a certain fitness that indicates how likely it's going to be to get nodes (often despite the first two factors, though they help).

Discovery #5: A new trait given to nodes -- "fitness" -- explains why newcomers can acquire so many links so quickly.

With these four discoveries, the idea of scale-free networks was formed. Like Euler's bridge traversal graph, these networks have certain inherent characteristics that describe what they do. Their nature makes them different from random networks, in many ways more robust, and they should be considered in a different light.

Personal postscript:

The concept of scale-free networks is important. These networks exist in virtually all (possibly all) naturally created networks, be it the food web, the Internet, or actors in Hollywood. Understanding these networks can facilitate a deeper understanding of so many fields and could help other fields by showing them how they could make their own networks more robust (especially regarding electronics and other complex man-made objects). It also shows how the interactions between potentially simple things can result in something complex and elegant.