Game of Thrones (Series 1)

In this case study, the main topic is the first Series. Source : https://networkofthrones.wordpress.com/

Make sure you study the algorithms first before exploring this case study.

The network consists of Characters and Interactions. There are five interaction types. Character A and Character B are connected whenever:

  • Character A speaks directly after Character B

  • Character A speaks about Character B

  • Character C speaks about Character A and Character B

  • Character A and Character B are mentioned in the same stage direction

  • Character A and Character B appear in a scene together

The data for the 8 different series is available on Github .

../_images/got-script-to-network.png

Introduction (using Python this time!)

Connect to the local database (after pip-installing py2neo)

from py2neo import Graph
graph = Graph("bolt://localhost:7687", auth=("neo4j", "xxx"))

Read the spreadsheets into Pandas DataFrame The spreads are available at Edges and Nodes .

df_edges = pd.read_excel('got_series1_edges.xlsx',)
df_nodes = pd.read_excel('got_series1_nodes.xlsx')

Make sure the graph is enmpty

graph.run("MATCH (n) DETACH DELETE n")

Create the nodes

i=0
for idx in df_nodes.index:
   id_ = df_nodes.loc[idx,"Id"]
   label = df_nodes.loc[idx,"Label"]
   cypher="CREATE (a:Person) SET a.id=" + "'" + id_ + "'" + "SET a.label=" + "'" + label + "'"
   cypher= cypher + f'SET a.seed={i}'
   graph.run(cypher)
   i+=1
for idx in df_edges.index:
   src = df_edges.loc[idx,'Source']
   tar = df_edges.loc[idx,'Target']
   weight = df_edges.loc[idx,'Weight']
   season = df_edges.loc[idx,'Season']
   cypher = "MATCH (src:Person {id:"+"'"+src+"'}),"
   cypher = cypher + " (tar:Person {id:"+"'"+tar+"'})"
   cypher = cypher + "MERGE (src)-[r:INTERACT]-(tar)"
   cypher = cypher + f"SET r.weight={weight}"
   graph.run(cypher)

Create Named Graph

We create named graph got1 taking into account the weight of the INTERACT relationship.

CALL gds.graph.create('got1','Person',
{INTERACT:{properties: "weight"}},{nodeProperties:'seed'})

The Influencers

The pagerank algorithm shows the nodes that have the most influence on the network.

CALL gds.pageRank.stream({  nodeProjection: "Person",
relationshipProjection: "INTERACT",  maxIterations: 20,  dampingFactor: 0.85})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).id AS page, score
ORDER BY score DESC
LIMIT 5

Note that the same result can be obtained when using the graph got1, we just created.

CALL gds.pageRank.stream('got1',  {maxIterations: 20,  dampingFactor: 0.85})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).id AS page, score
ORDER BY score DESC
LIMIT 5

The table below illustrates the importance of Tyrion. Tyrion has been called one of the author’s finest creations and most popular characters by The New York Times .

page

score

TYRION

3.2352420920170064

YOREN

2.7232937733980744

VARYS

1.6609751244675761

TYWIN

1.6070904363099554

NED

1.3170987544020603

../_images/tyrion1.jpeg

Tyrion of the house of Lannister

Find the Communities

Running the louvain algorithm requires sufficient memory availability. This estimation of the cost of running the algorithm can be obtained using the estimate procedure.

The following will estimate the memory requirements for running the algorithm: Cypher

CALL gds.louvain.write.estimate('got1', { writeProperty: 'community' })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory

nodeCount

relationshipCount

bytesMin

bytesMax

requiredMemory

126

513

13385

591376

[13385 Bytes … 577 KiB]

  • requiredMemory: An estimation of the required memory in a human readable format.

  • bytesMin: The minimum number of bytes required.

  • bytesMax: The maximum number of bytes required.

There are 24 Communities

CALL gds.louvain.stats('got1')
YIELD communityCount

Louvain is a stochastic algorithm; every time you run it you’ll get different community IDs, and it’s possible that community assignments may change. It uses a heuristic to maximize the modularity of each identified community.

To make Louvain deterministic you can use Seeding.

In the stream execution mode, the algorithm returns the community ID for each node. This allows us to inspect the results directly or post-process them in Cypher without any side effects.

CALL gds.louvain.stream('got1',{ seedProperty: 'seed' })
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).label AS label, communityId
ORDER BY label DESC
LIMIT 5

label

communityId

Ilyn

125

Joffrey

125

Cersei

125

Arya

125

Janos

125

Baelor

125

Bronn

125

Barristan

125

Benjen

125

Kevan

125

The mutate execution mode extends the stats mode with an important side effect: updating the named graph with a new node property containing the communityId for that node. The name of the new property is specified using the mandatory configuration parameter mutateProperty.

The result is a single summary row, similar to stats, but with some additional metrics. The mutate mode is especially useful when multiple algorithms are used in conjunction.

CALL gds.louvain.mutate('got1', { mutateProperty: 'communityId' })
YIELD communityCount, modularity, modularities

The write execution mode extends the stats mode with an important side effect: writing the community ID for each node as a property to the Neo4j database. The name of the new property is specified using the mandatory configuration parameter writeProperty.

CALL gds.louvain.write('got1', { writeProperty: 'communityId' })
YIELD communityCount, modularity, modularities

As a consequence, Cypher code can be run on the graph using the community.

MATCH (n:Person)-[:INTERACT]-(b:Person)
WHERE n.communityId = 117
RETURN n

The Louvain algorithm can also run on weighted graphs, taking the given relationship weights into concern when calculating the modularity.

CALL gds.louvain.stream('got1', { relationshipWeightProperty: 'weight' })
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).label AS label, communityId, intermediateCommunityIds
ORDER BY label ASC

Last change: Oct 30, 2023