Best practices for querying graphs by edge and node attributes in NetworkX

Question

I'm new to NetworkX and am using it for a social network analysis query. By query, I mean selecting or creating subgraphs by attributes of both edges and nodes, where the edges form a path and the nodes carry attributes. The graph is a MultiDiGraph of the form:

import networkx as nx

G2 = nx.MultiDiGraph()
# Node attributes are passed as keyword arguments
G2.add_node("UserA", type="Cat")
G2.add_node("UserB", type="Dog")
G2.add_node("UserC", type="Mouse")
G2.add_node("Likes", type="Feeling")
G2.add_node("Hates", type="Feeling")

G2.add_edge( "UserA", 'Hates' ,  statementid="1" )
G2.add_edge( "Hates", 'UserB' ,  statementid="1"  )
G2.add_edge( "UserC", 'Hates' ,  statementid="2" )
G2.add_edge( "Hates", 'UserA' ,  statementid="2"  )
G2.add_edge( "UserB", 'Hates' ,  statementid="3"  )
G2.add_edge( "Hates", 'UserA' ,  statementid="3"  )
G2.add_edge( "UserC", 'Likes' ,  statementid="3"  )
G2.add_edge( "Likes", 'UserB' ,  statementid="3"  )

Queried with

for node, data in G2.nodes(data=True):
    if data['type'] == "Cat":
        # get all edges out from these nodes,
        # then recursively follow using a filter for a specific statementid
        pass

# or get all edges with a specific statementid,
# then look for endpoints with a node attribute of "Cat"

Is there a better way to query? Or is it best practice to create custom iterations to create subgraphs?

Alternatively (and a separate question), the graph could be simplified, but I'm not using the graph below because the "hates" type objects will have predecessors. Would this make querying simpler? It seems easier to iterate over nodes.

G3 = nx.MultiDiGraph()
G3.add_node("UserA", type="Cat")
G3.add_node("UserB", type="Dog")

G3.add_edge( "UserA", 'UserB' ,  statementid="1" , label="hates")
G3.add_edge( "UserA", 'UserB' ,  statementid="2" , label="hates")

Other notes:

Perhaps add_path adds an identifier to the path created?
iGraph has a nice query feature g.vs.select()

Aric · Answer 1 · 2017-12-27T18:40:06.853

It's pretty straightforward to write a one-liner that makes a list or generator of nodes with a specific property (generators shown here):

import networkx as nx

G = nx.Graph()
G.add_node(1, label='one')
G.add_node(2, label='fish')
G.add_node(3, label='two')
G.add_node(4, label='fish')

# method 1
fish = (n for n in G if G.nodes[n]['label']=='fish')
# method 2
fish2 = (n for n,d in G.nodes(data=True) if d['label']=='fish')

print(list(fish))
print(list(fish2))

G.add_edge(1,2,color='red')
G.add_edge(2,3,color='blue')

red = ((u,v) for u,v,d in G.edges(data=True) if d['color']=='red')

print(list(red))

If your graph is large and fixed and you want to do fast lookups, you could make a "reverse dictionary" of the attributes like this:

labels = {}
for n, d in G.nodes(data=True):
    # group nodes by the value of their 'label' attribute
    labels.setdefault(d['label'], []).append(n)
print(labels)
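
A minimal sketch of how such an index might be used, assuming NetworkX 2.x or later. `build_node_index` and `by_label` are hypothetical names for illustration, not part of NetworkX. The index is only a snapshot, so it would need rebuilding whenever node attributes change:

```python
from collections import defaultdict

import networkx as nx

G = nx.Graph()
G.add_node(1, label='one')
G.add_node(2, label='fish')
G.add_node(3, label='two')
G.add_node(4, label='fish')
G.add_edge(1, 2, color='red')
G.add_edge(2, 3, color='blue')


def build_node_index(graph, attr):
    """Hypothetical helper: map each value of node attribute `attr`
    to the set of nodes that carry it."""
    index = defaultdict(set)
    for n, d in graph.nodes(data=True):
        if attr in d:  # skip nodes that lack the attribute
            index[d[attr]].add(n)
    return index


by_label = build_node_index(G, 'label')
fish_nodes = by_label['fish']  # dictionary lookup instead of a full scan

# Only inspect edges incident to the fish nodes, then filter by edge attribute.
red_fish_edges = [(u, v)
                  for u, v, d in G.edges(fish_nodes, data=True)
                  if d.get('color') == 'red']
print(fish_nodes, red_fish_edges)
```

Passing the looked-up nodes to `G.edges(...)` restricts the scan to edges touching those nodes, which is where the index pays off on a large graph.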

The examples appear to provide a good way to look up either nodes or edges. But what about looking up a combination of nodes and edges? In your example, imagine the query "Return the subgraph of fish nodes that also have an edge with the attribute color=red". Is there also a one-liner to query both and search through the subgraphs? E.g., does edges_iter return both nodes and edges? - Jonathan Hendler, Mar 27 '13 at 04:35

score 11 · Accepted Answer · edited May 23 '17 at 12:18

Building on @Aric's answer, you can find red fish like this:

red_fish = set(n for u,v,d in G.edges(data=True)
               if d['color']=='red'
               for n in (u, v)
               if G.nodes[n]['label']=='fish')

print(red_fish)
# {2}
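
If an actual subgraph is wanted rather than a set of nodes, one possible sketch (assuming NetworkX 2.x or later, and rebuilding the small example graph so the snippet runs on its own) is to collect the red edges and pass them to `edge_subgraph`, which keeps those edges together with their endpoints. The names `red_edges` and `red_sub` are only illustrative:

```python
import networkx as nx

G = nx.Graph()
G.add_node(1, label='one')
G.add_node(2, label='fish')
G.add_node(3, label='two')
G.add_node(4, label='fish')
G.add_edge(1, 2, color='red')
G.add_edge(2, 3, color='blue')

# Edges whose 'color' attribute is red (.get avoids KeyError on edges without it).
red_edges = [(u, v) for u, v, d in G.edges(data=True) if d.get('color') == 'red']

# Read-only view containing just those edges and their endpoints.
red_sub = G.edge_subgraph(red_edges)

# Fish nodes that touch at least one red edge.
red_fish = {n for n, d in red_sub.nodes(data=True) if d.get('label') == 'fish'}
print(red_fish)
```

The result is a view onto `G`; calling `red_sub.copy()` would give an independent graph if it needs to be modified.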

answered Mar 27 '13 at 10:17

unutbu


daedalus · Answer 3 · 2013-03-27T09:44:16.827

In order to select edges based on attributes of edges AND nodes, you may want to do something like this, using your graph, G2:

def select(G2, query):
    '''Call the query for each edge, return list of matching (u, v) pairs'''
    result = []
    for u,v,d in G2.edges(data=True):
        if query(u,v,d):
            result.append((u,v))
    return result

# Example query functions
# Each assumes that it receives two nodes (u,v) and 
# the data (d) for an edge 

def dog_feeling(u, v, d):
    # statement 3 AND either endpoint is a Dog
    return (d['statementid'] == "3"
            and (G2.nodes[u]['type'] == "Dog"
                 or G2.nodes[v]['type'] == "Dog"))

def any_feeling(u,v,d):
    # statement 3 AND either endpoint is a Feeling
    return (d['statementid'] == "3"
            and (G2.nodes[u]['type'] == "Feeling"
                 or G2.nodes[v]['type'] == "Feeling"))

def cat_feeling(u,v,d):
    # either endpoint is a Cat, regardless of statement
    return (G2.nodes[u]['type'] == "Cat"
            or G2.nodes[v]['type'] == "Cat")

# Using the queries
print(select(G2, query=dog_feeling))
print(select(G2, query=any_feeling))
print(select(G2, query=cat_feeling))

This abstracts away the iteration process into the select() function and you can write your queries as individual, testable functions.
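
As a rough sketch of how the same idea could return a subgraph instead of a list, assuming NetworkX 2.x or later: the helper names `select_subgraph`, `statement`, `touches_type` and `both` are hypothetical, not NetworkX API. Because G2 is a MultiDiGraph, edges are collected with their keys so that parallel edges stay distinct:

```python
import networkx as nx

G2 = nx.MultiDiGraph()
for name, kind in [("UserA", "Cat"), ("UserB", "Dog"), ("UserC", "Mouse"),
                   ("Likes", "Feeling"), ("Hates", "Feeling")]:
    G2.add_node(name, type=kind)
for u, v, sid in [("UserA", "Hates", "1"), ("Hates", "UserB", "1"),
                  ("UserC", "Hates", "2"), ("Hates", "UserA", "2"),
                  ("UserB", "Hates", "3"), ("Hates", "UserA", "3"),
                  ("UserC", "Likes", "3"), ("Likes", "UserB", "3")]:
    G2.add_edge(u, v, statementid=sid)


def select_subgraph(G, query):
    """Hypothetical helper: view of G restricted to edges where query is true."""
    keep = [(u, v, k) for u, v, k, d in G.edges(keys=True, data=True)
            if query(G, u, v, d)]
    return G.edge_subgraph(keep)


def statement(statementid):
    """Query factory: edge belongs to the given statement."""
    return lambda G, u, v, d: d.get("statementid") == statementid


def touches_type(node_type):
    """Query factory: either endpoint has the given node type."""
    return lambda G, u, v, d: (G.nodes[u].get("type") == node_type
                               or G.nodes[v].get("type") == node_type)


def both(*queries):
    """Combine queries with logical AND."""
    return lambda G, u, v, d: all(q(G, u, v, d) for q in queries)


# Edges of statement "1" that touch a Cat node.
sub = select_subgraph(G2, both(statement("1"), touches_type("Cat")))
print(sorted(sub.edges(keys=True)))

# The whole path for statement "1", then a node-level check on that path.
stmt1 = select_subgraph(G2, statement("1"))
has_cat = any(d.get("type") == "Cat" for _, d in stmt1.nodes(data=True))
print(sorted(stmt1.nodes()), has_cat)
```

Passing the graph into each query, rather than closing over a global G2, keeps the predicates reusable across graphs and easy to test in isolation.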
