Centrality Measures¶
Centrality algorithms are used to understand the roles of particular nodes in a graph and their impact on that network. This chapter covers:
- Degree Centrality, as a baseline metric of connectedness
- Closeness Centrality, for measuring how central a node is to the group
- Betweenness Centrality, for finding control points
- PageRank, for understanding overall influence

Example Graph Data: A Social Network¶
Centrality algorithms are relevant to all graphs, but social networks provide a very relatable way to think about dynamic influence and the flow of information. The examples in this chapter are run against a small Twitter-like graph. You can download the nodes and relationships CSV files we’ll use to create our graph:

- Nodes: social-nodes.csv
- Relationships: social-relationships.csv

After creating the database we can read the CSV files using the following Cypher commands:
// Load the nodes
WITH "file:///social-nodes.csv" AS data
LOAD CSV WITH HEADERS FROM data AS row
MERGE (:User {id: row.id});

// Load the relationships (run as a separate statement)
WITH "file:///social-relationships.csv" AS data
LOAD CSV WITH HEADERS FROM data AS row
MATCH (source:User {id: row.src})
MATCH (destination:User {id: row.dst})
MERGE (source)-[:FOLLOWS]->(destination)

Degree Centrality¶
The degree of a node counts its incoming and outgoing relationships, and is used to find popular nodes in a graph. Degree Centrality was proposed by Linton C. Freeman (1979). For example, a person with a high degree in an active social network would have a lot of immediate contacts and be more likely to catch a cold circulating in their network.
The average degree of a network is simply the total number of relationships divided by the total number of nodes. This average can be heavily skewed by high-degree nodes. The degree distribution is the probability that a randomly selected node will have a certain number of relationships.
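To make these definitions concrete, here is a small Python sketch that computes the average degree and the degree distribution. The edge list is a hypothetical toy example (the names merely echo the chapter's users, this is not the CSV data):

```python
from collections import Counter

# Hypothetical toy edge list (source follows target); not the CSV data.
edges = [("Alice", "Bridget"), ("Alice", "Charles"), ("Alice", "Doug"),
         ("Mark", "Doug"), ("Mark", "Alice")]

# Degree = number of incoming plus outgoing relationships per node.
degree = Counter()
for src, dst in edges:
    degree[src] += 1
    degree[dst] += 1

nodes = {n for edge in edges for n in edge}
avg_degree = sum(degree.values()) / len(nodes)   # 2 * |E| / |V| = 10/5 = 2.0

# Degree distribution: P(k) = fraction of nodes with exactly degree k.
counts = Counter(degree.values())
distribution = {k: c / len(nodes) for k, c in counts.items()}
```

Here Alice has degree 4 (three outgoing, one incoming), so `distribution` records that one node in five has degree 4.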

The graph above shows that the average degree is 10, but the median is 2. This illustrates the heavily skewed nature of some networks. (ref: Jacob Silterra)¶
When Should I Use Degree Centrality?¶
Use Degree Centrality if you’re attempting to analyze influence by looking at the number of incoming and outgoing relationships, or find the “popularity” of individual nodes.
Finding the nodes that follow the most other users is easily done in Neo4j.
First, create our graph and give it the name social:
CALL gds.graph.create('social', 'User', 'FOLLOWS')
CALL gds.degree.stream('social')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).id AS name, score AS followers
ORDER BY followers DESC
Note that it is not required to use a graph from the graph catalog; an anonymous projection can be passed directly:
CALL gds.degree.stream({
  nodeProjection: 'User',
  relationshipProjection: {FOLLOWS: {type: 'FOLLOWS'}}
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).id AS name, score AS followers
ORDER BY followers DESC
| name | followers |
|---|---|
| Alice | 4.0 |
| Bridget | 3.0 |
| Michael | 3.0 |
| Mark | 2.0 |
| Charles | 1.0 |
| Doug | 1.0 |
| David | 1.0 |
| James | 1.0 |
| Amy | 0.0 |
It is also possible to store the results as properties on the nodes:
CALL gds.degree.write({
  nodeProjection: 'User',
  relationshipProjection: {FOLLOWS: {type: 'FOLLOWS'}},
  writeProperty: 'followers'
})
MATCH (n:User)
RETURN n.id as name, n.followers as followers
| name | followers |
|---|---|
| Alice | 4.0 |
| Bridget | 3.0 |
| Charles | 1.0 |
| Doug | 1.0 |
| Mark | 2.0 |
| Michael | 3.0 |
| David | 1.0 |
| Amy | 0.0 |
| James | 1.0 |
Closeness Centrality¶
This is a method to detect nodes that are able to spread information efficiently throughout the network. The closeness of a node is the inverse of the sum of its shortest-path distances to all other nodes:

\(C(u) = \frac{1}{\sum_{v=1}^{n-1} d(u,v)}\)

with:

- \(u\) is a node
- \(n\) is the total number of nodes in the graph
- \(d(u,v)\) is the shortest-path distance between the nodes u and v

Definitions can vary according to the database being used. In Neo4j, the implementation for closeness is:

\(C_{norm}(u) = \frac{n-1}{\sum_{v=1}^{n-1} d(u,v)}\)

Here the symbol n has a slightly different meaning: it is the number of nodes in the same component (subgraph or group) as u.
This definition is known as normalized closeness.
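The normalized definition can be sketched in a few lines of Python. The breadth-first search and the tiny undirected path graph below are illustrative assumptions, not the Neo4j implementation:

```python
from collections import deque

def closeness(adj, u):
    """Normalized closeness C(u) = (n - 1) / (sum of shortest-path
    distances from u), with n the size of u's component, following the
    Neo4j convention described above. adj maps each node to its set of
    neighbours (treated as undirected here, a simplifying assumption)."""
    dist, queue = {u: 0}, deque([u])
    while queue:                      # breadth-first search from u
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    n = len(dist)                     # nodes reachable from u (its component)
    total = sum(dist.values())
    return (n - 1) / total if total else 0.0

# Tiny path graph A - B - C: the middle node is closest to everyone.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
closeness(adj, "B")   # (3 - 1) / (1 + 1) = 1.0
closeness(adj, "A")   # (3 - 1) / (1 + 2) = 2/3
```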
To calculate the closeness of our social network:
CALL gds.alpha.closeness.stream({
nodeProjection: "User",
relationshipProjection: "FOLLOWS"
})
YIELD nodeId, centrality
RETURN gds.util.asNode(nodeId).id as name, centrality
ORDER BY centrality DESC;
The output is now:

| name | centrality |
|---|---|
| Alice | 1.0 |
| Doug | 1.0 |
| David | 1.0 |
| Bridget | 0.7142857142857143 |
| Michael | 0.7142857142857143 |
| Amy | 0.6666666666666666 |
| James | 0.6666666666666666 |
| Charles | 0.625 |
| Mark | 0.625 |
The node James has a high closeness score: within its own group it is close to every other node. But when studying the whole graph, its importance seems overstated. For this reason the original formula has to be adapted. An example is the Wasserman and Faust definition.
Wasserman and Faust Definition¶
Stanley Wasserman and Katherine Faust came up with an improved formula for calculating closeness for graphs with multiple subgraphs without connections between those groups.
This slightly modifies the way we use Neo4j: we now have to pass the parameter improved: true.
CALL gds.alpha.closeness.stream({
nodeProjection: "User",
relationshipProjection: "FOLLOWS",
improved: true
})
YIELD nodeId, centrality
RETURN gds.util.asNode(nodeId).id as name, centrality
ORDER BY centrality DESC
or, when using the name social from our graph catalog:
CALL gds.alpha.closeness.stream('social',{improved:true})
YIELD nodeId, centrality
RETURN gds.util.asNode(nodeId).id as name, centrality
ORDER BY centrality DESC;
The results are now different:

| name | centrality |
|---|---|
| Alice | 0.625 |
| Doug | 0.625 |
| Bridget | 0.44642857142857145 |
| Michael | 0.44642857142857145 |
| Charles | 0.390625 |
| Mark | 0.390625 |
| David | 0.25 |
| Amy | 0.16666666666666666 |
| James | 0.16666666666666666 |
The results are now more representative of the closeness of nodes to the entire graph. The scores for the members of the smaller subgraph (David, Amy, and James) have been dampened, and they now have the lowest scores of all users. This makes sense, as they are the most isolated nodes. This formula is more useful for detecting the importance of a node across the entire graph rather than within its own subgraph.
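As a sanity check, the Wasserman and Faust scaling can be reproduced in plain Python. The helper below is a sketch assuming the definition scales raw normalized closeness by the fraction of the graph that is reachable, with n the component size and N the total number of nodes:

```python
# A sketch of the Wasserman-Faust adjustment, assuming the scaling
# C_WF(u) = ((n - 1) / (N - 1)) * ((n - 1) / sum_of_distances),
# where n is the size of u's component and N the total number of nodes.

def wf_closeness(component_size, total_nodes, sum_of_distances):
    n, N = component_size, total_nodes
    return ((n - 1) / (N - 1)) * ((n - 1) / sum_of_distances)

# James sits in a 3-node component of the 9-user graph, with total
# shortest-path distance 3 to the other members:
wf_closeness(3, 9, 3)   # (2/8) * (2/3) = 1/6 = 0.1666...
```

Plugging in James's values reproduces the 0.1666... score from the table above.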
Harmonic Centrality¶
Another variation is harmonic centrality. This deals with the fact that an unreachable node has an infinite distance, which makes the sum of distances infinite and the closeness zero. Instead of taking the inverse of the sum of the distances, the sum of the inverses of the distances is used, so unreachable nodes simply contribute zero.
Note
This is not supported by Neo4j.
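A minimal Python sketch of this idea, operating on a precomputed list of shortest-path distances from a node (hypothetical values):

```python
# Harmonic centrality: sum the inverses of the distances to all other
# nodes; unreachable nodes (distance = infinity) contribute 0 instead
# of breaking the sum.

def harmonic(distances):
    return sum(1 / d for d in distances if d != float("inf"))

harmonic([1, 2, float("inf")])   # 1/1 + 1/2 + 0 = 1.5
```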
Betweenness Centrality¶
Sometimes the most important node in the system is not the one with the most overt power or the highest status. Sometimes it’s the middlemen that connect groups, or the brokers who have the most control over resources or the flow of information. Betweenness Centrality is a way of detecting the amount of influence a node has over the flow of information or resources in a graph. It is typically used to find nodes that serve as a bridge from one part of a graph to another. The Betweenness Centrality algorithm first calculates the shortest (weighted) path between every pair of nodes in a connected graph. Each node receives a score, based on the number of these shortest paths that pass through the node. The more shortest paths a node lies on, the higher its score.

A node is considered pivotal for two other nodes if it lies on every shortest path between those nodes. Pivotal nodes play an important role in connecting other nodes—if you remove a pivotal node, the new shortest path for the original node pairs will be longer or more costly.¶
The score is computed as:

\(B(u) = \sum_{s \neq u \neq t} \frac{p(u)}{p}\)

where:

- \(u\) is the node for which the betweenness centrality is calculated
- \(p\) is the total number of shortest paths between s and t
- \(p(u)\) is the number of shortest paths between s and t passing through u

Illustration of the calculation of \(B(D)\)¶
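The formula can be illustrated with a brute-force Python sketch. The four-node undirected path graph below is a hypothetical example, and the exhaustive path enumeration is only feasible for tiny graphs (Neo4j uses a far more efficient algorithm):

```python
from collections import deque
from itertools import combinations

def shortest_paths(adj, s, t):
    """Enumerate every shortest path from s to t (breadth-first)."""
    best, paths = None, []
    queue = deque([[s]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break                      # all remaining paths are longer
        node = path[-1]
        if node == t:
            best = len(path)
            paths.append(path)
            continue
        for w in adj[node]:
            if w not in path:
                queue.append(path + [w])
    return paths

def betweenness(adj, u):
    """B(u) = sum over pairs (s, t), with s != u != t, of p(u) / p."""
    score = 0.0
    for s, t in combinations(adj, 2):  # unordered pairs (undirected graph)
        if u in (s, t):
            continue
        paths = shortest_paths(adj, s, t)
        if paths:
            score += sum(u in p for p in paths) / len(paths)
    return score

# Hypothetical four-node path graph A - B - C - D (undirected):
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
betweenness(adj, "B")   # 2.0: B lies on the only shortest paths A-C and A-D
```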
Example use cases include:
Identifying influencers in various organizations. Powerful individuals are not necessarily in management positions, but can be found in “brokerage positions” using Betweenness Centrality. Removal of such influencers can seriously destabilize the organization.
Uncovering key transfer points in networks such as electrical grids. Counterintuitively, removal of specific bridges can actually improve overall robustness by “islanding” disturbances.
Helping microbloggers spread their reach on Twitter, with a recommendation engine for targeting influencers.
CALL gds.betweenness.stream({
  nodeProjection: "User",
  relationshipProjection: "FOLLOWS"
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).id AS user, score
ORDER BY score DESC;
which returns:

| user | score |
|---|---|
| Alice | 10.0 |
| Doug | 7.0 |
| Mark | 7.0 |
| David | 1.0 |
| Bridget | 0.0 |
| Charles | 0.0 |
| Michael | 0.0 |
| Amy | 0.0 |
| James | 0.0 |
These results illustrate what we already know: Alice is the most important person in the graph.

Adding a short-circuit in the graph¶
For large graphs, exact centrality computation isn’t practical. The fastest known algorithm for exactly computing betweenness of all the nodes has a runtime proportional to the product of the number of nodes and the number of relationships.
Let’s add a new user “Claire” who binds the two subgraphs together:
WITH ["James", "Michael", "Alice", "Doug", "Amy"] AS existing_users
MATCH (existing:User) WHERE existing.id IN existing_users
MERGE (newuser:User {id: "Claire"})
MERGE (newuser)<-[:FOLLOWS]-(existing)
MERGE (newuser)-[:FOLLOWS]->(existing)

The impact of adding Claire as a “connection” is visible:
| user | score |
|---|---|
| Claire | 44.33333333333333 |
| Doug | 18.333333333333332 |
| Alice | 16.666666666666664 |
| Amy | 8.0 |
| James | 8.0 |
| Michael | 4.0 |
| Mark | 2.1666666666666665 |
| David | 0.5 |
| Bridget | 0.0 |
| Charles | 0.0 |
Let’s move on and delete the node “Claire”:
MATCH(n:User {id:"Claire"})
DETACH DELETE n
PageRank¶
PageRank is an algorithm that measures the influence or importance of nodes in a directed graph. It is computed by iteratively distributing each node’s score (originally based on degree) over its neighbours. Theoretically, this is equivalent to traversing the graph in a random fashion and counting the frequency of hitting each node during these walks.
PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results. PageRank was named after Larry Page, one of the founders of Google.
The intuition behind influence is that relationships to more important nodes contribute more to the influence of the node in question than equivalent connections to less important nodes. Measuring influence usually involves scoring nodes, often with weighted relationships, and then updating the scores over many iterations. Sometimes all nodes are scored, and sometimes a random selection is used as a representative distribution.
For a node (page) \(u\) with \(n\) pages \(T_1, \ldots, T_n\) linking to it, the PageRank \(PR(u)\) is calculated as:

\(PR(u) = (1-d) + d \left( \frac{PR(T_1)}{CO(T_1)} + \cdots + \frac{PR(T_n)}{CO(T_n)} \right)\)

where:

- \(d\) is the damping factor, by default 0.85. This number between 0 and 1 is the probability that a user will continue clicking on links. So which value does Google use? Nobody outside the Google complex knows, but the value recommended in the original paper is 0.85.
- \(1-d\) is the probability that the node/page will be visited directly, without following a link
- \(PR(T_i)\) is the PageRank of node \(T_i\) (which connects to node \(u\))
- \(CO(T_i)\) is the outdegree of node \(T_i\)
- \(T_1, \ldots, T_n\) are called backlinks. These are all nodes that link to the node \(u\).
Iterative Procedure¶
PageRank is an iterative algorithm that runs either until scores converge or until a set number of iterations is reached.

Without the damping factor, it would be possible to get stuck in a so-called sink during the iteration procedure.
A node, or group of nodes, without outgoing relationships (also called a dangling node) can monopolize the PageRank score by refusing to share it (\(CO(T) = 0\)). This is known as a rank sink. You can imagine this as a surfer who gets stuck on a page, or a subset of pages, with no way out.
The damping factor avoids such sinks by introducing a probability of jumping to a random node instead of following a link. When you set d to 0.85, a completely random node is visited 15% of the time.
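The iterative procedure can be sketched in plain Python. The three-page graph is hypothetical, and every page here has at least one outgoing link; real implementations also handle dangling nodes and convergence tolerances:

```python
def pagerank(adj, d=0.85, iterations=20):
    """PR(u) = (1 - d) + d * sum(PR(T) / CO(T)) over pages T linking to u.
    adj maps each node to the list of nodes it links to (outgoing)."""
    nodes = list(adj)
    pr = {n: 1.0 for n in nodes}                  # start every score at 1
    for _ in range(iterations):
        # Rebuild all scores from the previous iteration's values.
        pr = {u: (1 - d) + d * sum(pr[t] / len(adj[t])
                                   for t in nodes if u in adj[t])
              for u in nodes}
    return pr

# A and B link to each other; C links only to A, so nothing flows back
# to C and its score settles at 1 - d = 0.15.
ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["A"]})
```

Because A receives score from both B and C while B receives only from A, the ranking comes out A, then B, then C.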
CALL gds.pageRank.stream({
  nodeProjection: "User",
  relationshipProjection: "FOLLOWS",
  maxIterations: 20,
  dampingFactor: 0.85
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).id AS page, score
ORDER BY score DESC
LIMIT 5
Results:

| page | score |
|---|---|
| Doug | 1.661536349568594 |
| Mark | 1.5517852810554118 |
| Alice | 1.1047215465462075 |
| Bridget | 0.5332017140464881 |
| Michael | 0.5332017140464881 |
Last change: Oct 30, 2023