1.7 Louvain
The Louvain method is an algorithm to detect communities in large networks. It maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities. This means evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.
The Louvain algorithm is a hierarchical clustering algorithm, that recursively merges communities into a single node and executes the modularity clustering on the condensed graphs.
The algorithm consists of the repeated application of two steps. The first step is a “greedy” assignment of nodes to communities, favoring local optimizations of modularity. The second step is the definition of a new coarse-grained network, based on the communities found in the first step. During this step nodes of the same community are merged into a single node, inheriting all connected relationships. These two steps are repeated until no further modularity-increasing reassignments of communities are possible. Because ties are broken arbitrarily, you can get different results between different runs of the Louvain algorithm.
The main drawback to Louvain is that it is significantly slower that Label Propagation and Weakly Connected Components, and the results can be hard to interpret. The algorithm is sensitive to the weighting scheme used on relationships. A good sign you need to tweak your schema or weighting is when you notice your results include only a single giant community, or every node is in it’s own community of one.
For more information on this algorithm, see:
- Lu, Hao, Mahantesh Halappanavar, and Ananth Kalyanaraman “Parallel heuristics for scalable community detection.”
- https://en.wikipedia.org/wiki/Louvain_modularity
Attention ! Running this algorithm requires sufficient memory availability. Before running this algorithm, estimate memory required for processing your graph.
Organisation of this tutorial section
The following is organised as follows. First we introduce the general syntax for using Louvain function in its different modes as this is possible in Neo4J, namely:
- stream, stats, mutate on named graphs,
- write mode on graphs in the catalogue, and
- anonymous graphs.
Then the tutorial shows, through examples, the use of the Louvain function in its different modes through simple graphs. Finally, it proposes exercices based on the graph representing the content of the famous saga Game of Thrones.
1.7.1 Syntax
1.7.1.1 Stream mode
Run Louvain in stream mode on a named graph.
CALL gds.louvain.stream(
graphName: String,
configuration: Map
)
YIELD
nodeId: Integer,
communityId: Integer,
intermediateCommunityIds: Integer[]
Parameters
ame | Type | Default | Optional | Description |
---|---|---|---|---|
graphName | String | n/a |
no | The name of a graph stored in the catalog. |
configuration | Map | {} |
yes | Configuration for algorithm-specifics and/or graph filtering. |
General configuration of the algorithm
Name | Type | Default | Optional | Description |
---|---|---|---|---|
nodeLabels | String[] | ['*'] |
yes | Filter the named graph using the given node labels. |
relationshipTypes | String[] | ['*'] |
yes | Filter the named graph using the given relationship types. |
concurrency | Integer | 4 |
yes | The number of concurrent threads used for running the algorithm. |
Algorithm specific configuration
Name | Type | Default | Optional | Description |
---|---|---|---|---|
relationshipWeightProperty | String | null |
yes | The property name that contains weight. If null , treats the graph as unweighted. Must be numeric. |
seedProperty | String | n/a |
yes | Used to set the initial community for a node. The property value needs to be a number. |
maxLevels | Integer | 10 |
yes | The maximum number of levels in which the graph is clustered and then condensed. |
maxIterations | Integer | 10 |
yes | The maximum number of iterations that the modularity optimization will run for each level. |
tolerance | Float | 0.0001 |
yes | Minimum change in modularity between iterations. If the modularity changes less than the tolerance value, the result is considered stable and the algorithm returns. |
includeIntermediateCommunities | Boolean | false |
yes | Indicates whether to write intermediate communities. If set to false, only the final community is persisted. |
consecutiveIds | Boolean | false |
yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). Cannot be used in combination with the includeIntermediateCommunities flag. |
Results
Name | Type | Description |
---|---|---|
nodeId | Integer | Node ID. |
communityId | Integer | The community ID of the final level. |
intermediateCommunityIds | Integer[] | Community IDs for each level. Null if includeIntermediateCommunities is set to false. |
1.7.1.2 Stats mode
Run Louvain in stats mode on a named graph.
CALL gds.louvain.stats(
graphName: String,
configuration: Map
)
YIELD
createMillis: Integer,
computeMillis: Integer,
postProcessingMillis: Integer,
communityCount: Integer,
ranLevels: Integer,
modularity: Float,
modularities: Integer[],
communityDistribution: Map,
configuration: Map
Parameters
Name | Type | Default | Optional | Description |
---|---|---|---|---|
graphName | String | n/a |
no | The name of a graph stored in the catalog. |
configuration | Map | {} |
yes | Configuration for algorithm-specifics and/or graph filtering. |
General configuration of the algorithm
Name | Type | Default | Optional | Description |
---|---|---|---|---|
nodeLabels | String[] | ['*'] |
yes | Filter the named graph using the given node labels. |
relationshipTypes | String[] | ['*'] |
yes | Filter the named graph using the given relationship types. |
concurrency | Integer | 4 |
yes | The number of concurrent threads used for running the algorithm. |
Algorithm specific configuration
Name | Type | Default | Optional | Description |
---|---|---|---|---|
relationshipWeightProperty | String | null |
yes | The property name that contains weight. If null , treats the graph as unweighted. Must be numeric. |
seedProperty | String | n/a |
yes | Used to set the initial community for a node. The property value needs to be a number. |
maxLevels | Integer | 10 |
yes | The maximum number of levels in which the graph is clustered and then condensed. |
maxIterations | Integer | 10 |
yes | The maximum number of iterations that the modularity optimization will run for each level. |
tolerance | Float | 0.0001 |
yes | Minimum change in modularity between iterations. If the modularity changes less than the tolerance value, the result is considered stable and the algorithm returns. |
includeIntermediateCommunities | Boolean | false |
yes | Indicates whether to write intermediate communities. If set to false, only the final community is persisted. |
consecutiveIds | Boolean | false |
yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). Cannot be used in combination with the includeIntermediateCommunities flag. |
Results
Name | Type | Description |
---|---|---|
createMillis | Integer | Milliseconds for loading data. |
computeMillis | Integer | Milliseconds for running the algorithm. |
postProcessingMillis | Integer | Milliseconds for computing percentiles and community count. |
communityCount | Integer | The number of communities found. |
ranLevels | Integer | The number of supersteps the algorithm actually ran. |
modularity | Float | The final modularity score. |
modularities | Integer[] | The modularity scores for each level. |
communityDistribution | Map | Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of community size for the last level. |
configuration | Map | The configuration used for running the algorithm. |
1.7.1.3 Mutate mode
Run Louvain in mutate mode on a named graph.
CALL gds.louvain.mutate(
graphName: String,
configuration: Map
)
YIELD
createMillis: Integer,
computeMillis: Integer,
mutateMillis: Integer,
postProcessingMillis: Integer,
communityCount: Integer,
ranLevels: Integer,
modularity: Float,
modularities: Integer[],
communityDistribution: Map,
configuration: Map
Parameters
Name | Type | Default | Optional | Description |
---|---|---|---|---|
graphName | String | n/a |
no | The name of a graph stored in the catalog. |
configuration | Map | {} |
yes | Configuration for algorithm-specifics and/or graph filtering. |
General configuration of the algorithm
Name | Type | Default | Optional | Description |
---|---|---|---|---|
nodeLabels | String[] | ['*'] |
yes | Filter the named graph using the given node labels. |
relationshipTypes | String[] | ['*'] |
yes | Filter the named graph using the given relationship types. |
concurrency | Integer | 4 |
yes | The number of concurrent threads used for running the algorithm. |
mutateProperty | String | n/a |
no | The node property in the GDS graph to which the community ID is written. |
Algorithm specific configuration
Name | Type | Default | Optional | Description |
---|---|---|---|---|
relationshipWeightProperty | String | null |
yes | The property name that contains weight. If null , treats the graph as unweighted. Must be numeric. |
seedProperty | String | n/a |
yes | Used to set the initial community for a node. The property value needs to be a number. |
maxLevels | Integer | 10 |
yes | The maximum number of levels in which the graph is clustered and then condensed. |
maxIterations | Integer | 10 |
yes | The maximum number of iterations that the modularity optimization will run for each level. |
tolerance | Float | 0.0001 |
yes | Minimum change in modularity between iterations. If the modularity changes less than the tolerance value, the result is considered stable and the algorithm returns. |
includeIntermediateCommunities | Boolean | false |
yes | Indicates whether to write intermediate communities. If set to false, only the final community is persisted. |
consecutiveIds | Boolean | false |
yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). Cannot be used in combination with the includeIntermediateCommunities flag. |
Results
Name | Type | Description |
---|---|---|
createMillis | Integer | Milliseconds for loading data. |
computeMillis | Integer | Milliseconds for running the algorithm. |
mutateMillis | Integer | Milliseconds for adding properties to the in-memory graph. |
postProcessingMillis | Integer | Milliseconds for computing percentiles and community count. |
communityCount | Integer | The number of communities found. |
ranLevels | Integer | The number of supersteps the algorithm actually ran. |
modularity | Float | The final modularity score. |
modularities | Integer[] | The modularity scores for each level. |
communityDistribution | Map | Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of community size for the last level. |
configuration | Map | The configuration used for running the algorithm. |
1.7.1.4 Write mode
Run Louvain in write mode on a graph stored in the catalog.
CALL gds.louvain.write(
graphName: String,
configuration: Map
)
YIELD
createMillis: Integer,
computeMillis: Integer,
writeMillis: Integer,
postProcessingMillis: Integer,
nodePropertiesWritten: Integer,
communityCount: Integer,
ranLevels: Integer,
modularity: Float,
modularities: Integer[],
communityDistribution: Map,
configuration: Map
Parameters
Name | Type | Default | Optional | Description |
---|---|---|---|---|
graphName | String | n/a |
no | The name of a graph stored in the catalog. |
configuration | Map | {} |
yes | Configuration for algorithm-specifics and/or graph filtering. |
General configuration of the algorithm
Name | Type | Default | Optional | Description |
---|---|---|---|---|
nodeLabels | String[] | ['*'] |
yes | Filter the named graph using the given node labels. |
relationshipTypes | String[] | ['*'] |
yes | Filter the named graph using the given relationship types. |
concurrency | Integer | 4 |
yes | The number of concurrent threads used for running the algorithm. Also provides the default value for ‘writeConcurrency’. |
writeConcurrency | Integer | value of 'concurrency' |
yes | The number of concurrent threads used for writing the result to Neo4j. |
writeProperty | String | n/a |
no | The node property in the Neo4j database to which the community ID is written. |
Algorithm specific configuration
Name | Type | Default | Optional | Description |
---|---|---|---|---|
relationshipWeightProperty | String | null |
yes | The property name that contains weight. If null , treats the graph as unweighted. Must be numeric. |
seedProperty | String | n/a |
yes | Used to set the initial community for a node. The property value needs to be a number. |
maxLevels | Integer | 10 |
yes | The maximum number of levels in which the graph is clustered and then condensed. |
maxIterations | Integer | 10 |
yes | The maximum number of iterations that the modularity optimization will run for each level. |
tolerance | Float | 0.0001 |
yes | Minimum change in modularity between iterations. If the modularity changes less than the tolerance value, the result is considered stable and the algorithm returns. |
includeIntermediateCommunities | Boolean | false |
yes | Indicates whether to write intermediate communities. If set to false, only the final community is persisted. |
consecutiveIds | Boolean | false |
yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). Cannot be used in combination with the includeIntermediateCommunities flag. |
Results
Name | Type | Description |
---|---|---|
createMillis | Integer | Milliseconds for loading data. |
computeMillis | Integer | Milliseconds for running the algorithm. |
writeMillis | Integer | Milliseconds for writing result data back. |
postProcessingMillis | Integer | Milliseconds for computing percentiles and community count. |
nodePropertiesWritten | Integer | The number of node properties written. |
communityCount | Integer | The number of communities found. |
ranLevels | Integer | The number of supersteps the algorithm actually ran. |
modularity | Float | The final modularity score. |
modularities | Integer[] | The modularity scores for each level. |
communityDistribution | Map | Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of community size for the last level. |
configuration | Map | The configuration used for running the algorithm. |
1.7.2 Anonymous graph
Run Louvain in write mode on an anonymous graph.
CALL gds.louvain.write(configuration: Map)
YIELD
createMillis: Integer,
computeMillis: Integer,
writeMillis: Integer,
postProcessingMillis: Integer,
nodePropertiesWritten: Integer,
communityCount: Integer,
ranLevels: Integer,
modularity: Float,
modularities: Integer[],
communityDistribution: Map,
configuration: Map
General configuration of the algorithm
Name | Type | Default | Optional | Description |
---|---|---|---|---|
nodeProjection | String, String[] or Map | null |
yes | The node projection used for anonymous graph creation via a Native projection. |
relationshipProjection | String, String[] or Map | null |
yes | The relationship projection used for anonymous graph creation a Native projection. |
nodeQuery | String | null |
yes | The Cypher query used to select the nodes for anonymous graph creation via a Cypher projection. |
relationshipQuery | String | null |
yes | The Cypher query used to select the relationships for anonymous graph creation via a Cypher projection. |
nodeProperties | String, String[] or Map | null |
yes | The node properties to project during anonymous graph creation. |
relationshipProperties | String, String[] or Map | null |
yes | The relationship properties to project during anonymous graph creation. |
concurrency | Integer | 4 |
yes | The number of concurrent threads used for running the algorithm. Also provides the default value for ‘readConcurrency’ and ‘writeConcurrency’. |
readConcurrency | Integer | value of 'concurrency' |
yes | The number of concurrent threads used for creating the graph. |
writeConcurrency | Integer | value of 'concurrency' |
yes | The number of concurrent threads used for writing the result to Neo4j. |
writeProperty | String | n/a |
no | The node property in the Neo4j database to which the community ID is written. |
Algorithm specific configuration
Name | Type | Default | Optional | Description |
---|---|---|---|---|
relationshipWeightProperty | String | null |
yes | The property name that contains weight. If null , treats the graph as unweighted. Must be numeric. |
seedProperty | String | n/a |
yes | Used to set the initial community for a node. The property value needs to be a number. |
maxLevels | Integer | 10 |
yes | The maximum number of levels in which the graph is clustered and then condensed. |
maxIterations | Integer | 10 |
yes | The maximum number of iterations that the modularity optimization will run for each level. |
tolerance | Float | 0.0001 |
yes | Minimum change in modularity between iterations. If the modularity changes less than the tolerance value, the result is considered stable and the algorithm returns. |
includeIntermediateCommunities | Boolean | false |
yes | Indicates whether to write intermediate communities. If set to false, only the final community is persisted. |
consecutiveIds | Boolean | false |
yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). Cannot be used in combination with the includeIntermediateCommunities flag. |
1.7.3 Examples
Consider the graph created by the following Cypher statement:
CREATE (nAlice:User {name: 'Alice', seed: 42})
CREATE (nBridget:User {name: 'Bridget', seed: 42})
CREATE (nCharles:User {name: 'Charles', seed: 42})
CREATE (nDoug:User {name: 'Doug'})
CREATE (nMark:User {name: 'Mark'})
CREATE (nMichael:User {name: 'Michael'})
CREATE (nAlice)-[:LINK {weight: 1}]->(nBridget)
CREATE (nAlice)-[:LINK {weight: 1}]->(nCharles)
CREATE (nCharles)-[:LINK {weight: 1}]->(nBridget)
CREATE (nAlice)-[:LINK {weight: 5}]->(nDoug)
CREATE (nMark)-[:LINK {weight: 1}]->(nDoug)
CREATE (nMark)-[:LINK {weight: 1}]->(nMichael)
CREATE (nMichael)-[:LINK {weight: 1}]->(nMark);
This graph has two clusters of Users, that are closely connected. Between those clusters there is one single edge. The relationships that connect the nodes in each component have a property weight
which determines the strength of the relationship.
We can now create the graph and store it in the graph catalog. We load the LINK
relationships with orientation set to UNDIRECTED
as this works best with the Louvain algorithm.
The following statement will create the graph and store it in the graph catalog.
CALL gds.graph.create(
'myGraph',
'User',
{
LINK: {
orientation: 'UNDIRECTED'
}
},
{
nodeProperties: 'seed',
relationshipProperties: 'weight'
}
)
In the following examples we will demonstrate using the Louvain algorithm on this graph.
1.7.3.1 Memory estimation
First off, we will estimate the cost of running the algorithm using the estimate
procedure. This can be done with any execution mode. We will use the write
mode in this example. Estimating the algorithm is useful to understand the memory impact that running the algorithm on your graph will have. When you later actually run the algorithm in one of the execution modes the system will perform an estimation. If the estimation shows that there is a very high probability of the execution going over its memory limitations, the execution is prohibited.
The following will estimate the memory requirements for running the algorithm:
CALL gds.louvain.write.estimate('myGraph', { writeProperty: 'community' })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
Results
nodeCount | relationshipCount | bytesMin | bytesMax | requiredMemory |
---|---|---|---|---|
6 | 14 | 5321 | 580088 | “[5321 Bytes … 566 KiB]” |
1.7.3.2 Stream
In the stream
execution mode, the algorithm returns the community ID for each node. This allows us to inspect the results directly or post-process them in Cypher without any side effects. For example, we can order the results to find the nodes with the highest betweenness centrality.
The following will run the algorithm and stream results:
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).name AS name, communityId, intermediateCommunityIds
ORDER BY name ASC
Results
name | communityId | intermediateCommunityIds |
---|---|---|
“Alice” | 2 | null |
“Bridget” | 2 | null |
“Charles” | 2 | null |
“Doug” | 5 | null |
“Mark” | 5 | null |
“Michael” | 5 | null |
We use default values for the procedure configuration parameter. Levels and innerIterations
are set to 10 and the tolerance value is 0.0001. Because we did not set the value of includeIntermediateCommunities
to true
, the column communities is always null
.
1.7.3.3 Stats
In the stats
execution mode, the algorithm returns a single row containing a summary of the algorithm result. In particular, Betweenness Centrality returns the minimum, maximum and sum of all centrality scores. This execution mode does not have any side effects. It can be useful for evaluating algorithm performance by inspecting the computeMillis
return item. In the examples below we will omit returning the timings.
The following will run the algorithm and returns the result in form of statistical and measurement values.
CALL gds.louvain.stats('myGraph')
YIELD communityCount
Results
communityCount |
---|
2 |
1.7.3.4 Mutate
The mutate
execution mode extends the stats
mode with an important side effect: updating the named graph with a new node property containing the community ID for that node. The name of the new property is specified using the mandatory configuration parameter mutateProperty
. The result is a single summary row, similar to stats
, but with some additional metrics. The mutate
mode is especially useful when multiple algorithms are used in conjunction.
The following will run the algorithm and store the results in myGraph
:
CALL gds.louvain.mutate('myGraph', { mutateProperty: 'communityId' })
YIELD communityCount, modularity, modularities
Results
communityCount | modularity | modularities |
---|---|---|
2 | 0.3571428571428571 | [0.3571428571428571] |
In mutate
mode, only a single row is returned by the procedure. The result contains meta information, like the number of identified communities and the modularity values. In contrast to the write
mode the result is written to the GDS in-memory graph instead of the Neo4j database.
1.7.3.5 Write
The write
execution mode extends the stats
mode with an important side effect: writing the community ID for each node as a property to the Neo4j database. The name of the new property is specified using the mandatory configuration parameter writeProperty
. The result is a single summary row, similar to stats
, but with some additional metrics. The write
mode enables directly persisting the results to the database.
The following run the algorithm, and write back results:
CALL gds.louvain.write('myGraph', { writeProperty: 'community' })
YIELD communityCount, modularity, modularities
Results
communityCount | modularity | modularities |
---|---|---|
2 | 0.3571428571428571 | [0.3571428571428571] |
When writing back the results, only a single row is returned by the procedure. The result contains meta information, like the number of identified communities and the modularity values.
1.7.3.6 Weighted
The Louvain algorithm can also run on weighted graphs, taking the given relationship weights into concern when calculating the modularity.
The following will run the algorithm on a weighted graph and stream results:
CALL gds.louvain.stream('myGraph', { relationshipWeightProperty: 'weight' })
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).name AS name, communityId, intermediateCommunityIds
ORDER BY name ASC
Results
name | communityId | intermediateCommunityIds |
---|---|---|
“Alice” | 3 | null |
“Bridget” | 2 | null |
“Charles” | 2 | null |
“Doug” | 3 | null |
“Mark” | 5 | null |
“Michael” | 5 | null |
Using the weighted relationships, we see that Alice
and Doug
have formed their own community, as their link is much stronger than all the others.
1.7.3.7 Seeded
The Louvain algorithm can be run incrementally, by providing a seed property. With the seed property an initial community mapping can be supplied for a subset of the loaded nodes. The algorithm will try to keep the seeded community IDs.
The following will run the algorithm and stream results:
CALL gds.louvain.stream('myGraph', { seedProperty: 'seed' })
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).name AS name, communityId, intermediateCommunityIds
ORDER BY name ASC
Results
name | communityId | intermediateCommunityIds |
---|---|---|
“Alice” | 42 | null |
“Bridget” | 42 | null |
“Charles” | 42 | null |
“Doug” | 47 | null |
“Mark” | 47 | null |
“Michael” | 47 | null |
Using the seeded graph, we see that the community around Alice
keeps its initial community ID of 42
. The other community is assigned a new community ID, which is guaranteed to be larger than the largest seeded community ID. Note that the consecutiveIds
configuration option cannot be used in combination with seeding in order to retain the seeding values.
1.7.3.8 Stream intermediate communities
As described before, Louvain is a hierarchical clustering algorithm. That means that after every clustering step all nodes that belong to the same cluster are reduced to a single node. Relationships between nodes of the same cluster become self-relationships, relationships to nodes of other clusters connect to the clusters representative. This condensed graph is then used to run the next level of clustering. The process is repeated until the clusters are stable.
In order to demonstrate this iterative behavior, we need to construct a more complex graph.
CREATE (a:Node {name: 'a'})
CREATE (b:Node {name: 'b'})
CREATE (c:Node {name: 'c'})
CREATE (d:Node {name: 'd'})
CREATE (e:Node {name: 'e'})
CREATE (f:Node {name: 'f'})
CREATE (g:Node {name: 'g'})
CREATE (h:Node {name: 'h'})
CREATE (i:Node {name: 'i'})
CREATE (j:Node {name: 'j'})
CREATE (k:Node {name: 'k'})
CREATE (l:Node {name: 'l'})
CREATE (m:Node {name: 'm'})
CREATE (n:Node {name: 'n'})
CREATE (x:Node {name: 'x'})
CREATE (a)-[:TYPE]->(b)
CREATE (a)-[:TYPE]->(d)
CREATE (a)-[:TYPE]->(f)
CREATE (b)-[:TYPE]->(d)
CREATE (b)-[:TYPE]->(x)
CREATE (b)-[:TYPE]->(g)
CREATE (b)-[:TYPE]->(e)
CREATE (c)-[:TYPE]->(x)
CREATE (c)-[:TYPE]->(f)
CREATE (d)-[:TYPE]->(k)
CREATE (e)-[:TYPE]->(x)
CREATE (e)-[:TYPE]->(f)
CREATE (e)-[:TYPE]->(h)
CREATE (f)-[:TYPE]->(g)
CREATE (g)-[:TYPE]->(h)
CREATE (h)-[:TYPE]->(i)
CREATE (h)-[:TYPE]->(j)
CREATE (i)-[:TYPE]->(k)
CREATE (j)-[:TYPE]->(k)
CREATE (j)-[:TYPE]->(m)
CREATE (j)-[:TYPE]->(n)
CREATE (k)-[:TYPE]->(m)
CREATE (k)-[:TYPE]->(l)
CREATE (l)-[:TYPE]->(n)
CREATE (m)-[:TYPE]->(n);
The following will load the example graph, run the algorithm and stream results including the intermediate communities:
CALL gds.louvain.stream({
nodeProjection: 'Node',
relationshipProjection: {
TYPE: {
type: 'TYPE',
orientation: 'undirected',
aggregation: 'NONE'
}
},
includeIntermediateCommunities: true
}) YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).name AS name, communityId, intermediateCommunityIds
ORDER BY name ASC
Results
name | communityId | intermediateCommunityIds |
---|---|---|
“a” | 14 | [3, 14] |
“b” | 14 | [3, 14] |
“c” | 14 | [14, 14] |
“d” | 14 | [3, 14] |
“e” | 14 | [14, 14] |
“f” | 14 | [14, 14] |
“g” | 7 | [7, 7] |
“h” | 7 | [7, 7] |
“i” | 7 | [7, 7] |
“j” | 12 | [12, 12] |
“k” | 12 | [12, 12] |
“l” | 12 | [12, 12] |
“m” | 12 | [12, 12] |
“n” | 12 | [12, 12] |
“x” | 14 | [14, 14] |
In this example graph, after the first iteration we see 4 clusters, which in the second iteration are reduced to three.
1.7.4 Exercise
We will compute the Louvain community structure of our pre-loaded graph.
CALL gds.louvain.stream('got-interactions')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS person, communityId
ORDER BY communityId DESC
The above query returns the name of the Person
and the community ID
they belongs to. If we want to investigate how many communities there are, and how many members there are in each community we can change the RETURN statement:
CALL gds.louvain.stream('got-interactions')
YIELD nodeId, communityId
RETURN communityId, COUNT(DISTINCT nodeId) AS members
ORDER BY members DESC
We can see that there are 1382 communities - 11 with more than one member.
Louvain: weighting
The Louvain algorithm can also run on weighted graphs, taking the given relationship weights into concern when calculating the modularity.
Before we continue we need a graph that was created with the weight
relationship property.
CALL gds.graph.create(
'got-weighted-interactions',
'Person',
{
INTERACTS: {
orientation: 'UNDIRECTED',
aggregation: 'NONE',
properties: {
weight: {
property: 'weight',
aggregation: 'NONE',
defaultValue: 0.0
}
}
}
}
)
If a relationship doesn’t have a weight
property the number specified in defaultValue
will be used as a fallback.
We can then use the ‘weight’ property on the INTERACTS relationship and see what happens:
CALL gds.louvain.stream(
'got-weighted-interactions',
{
relationshipWeightProperty: 'weight'
}
)
YIELD nodeId, communityId
RETURN communityId, COUNT(DISTINCT nodeId) AS members
ORDER BY members DESC
This gives us 1384 communities, 13 with more than one member.
Louvain: intermediate communities
One of the cool things about Louvain is that it is hierarchical clustering algorithm. It identifies communities at multiple levels in the graph: first smaller communities, that then combine to form larger ones.
To retrieve the intermediate communities, you can simple set includeIntermediateCommunities: true
:
CALL gds.louvain.stream(
'got-interactions',
{
includeIntermediateCommunities: true
}
)
YIELD nodeId, communityId, communityIds
RETURN communityId, COUNT(DISTINCT nodeId) AS members, communityIds as intermediateCommunities
We can extract membership in different levels of communities and see how the composition changes:
CALL gds.louvain.stream(
'got-interactions',
{
includeIntermediateCommunities: true
}
)
YIELD nodeId, communityIds
RETURN count(distinct communityIds[0]), count(distinct communityIds[1])
includeIntermediateCommunities: false` is the default value, in this case the `communityIds` field of the result is `null
Can you identify nodes that belong to different communities in the first level of hierarchy, but combine to the same community in the next level?
Louvain: cleanup
To clean up the in-memory graph created during the Louvain exercise you can run the following query
CALL gds.graph.drop('got-weighted-interactions');