I am brand new to Gremlin and am using gremlin-python to traverse my graph. The graph is made up of many clusters or sub-graphs which are intra-connected, and not inter-connected with any other cluster in the graph.

A simple example of this is a graph with 5 nodes and 3 edges:

  • Customer_1 is connected to CreditCard_A with 1_HasCreditCard_A edge
  • Customer_2 is connected to CreditCard_B with 2_HasCreditCard_B edge
  • Customer_3 is connected to CreditCard_A with 3_HasCreditCard_A edge

I want a query that will return a sub-graph object of all nodes and edges connected (in or out) to the queried node. I can then store this sub-graph as a variable and then run different traversals on it to calculate different things.

This query would need to be recursive as these clusters could be made up of nodes which are many (inward or outward) hops away from each other. There are also many different types of nodes and edges, and they all must be returned.

For example:

  • If I specified Customer_1 in the query, the resulting sub-graph would contain Customer_1, Customer_3, CreditCard_A, 1_HasCreditCard_A, and 3_HasCreditCard_A.
  • If I specified Customer_2, the returned sub-graph would consist of Customer_2, CreditCard_B, and 2_HasCreditCard_B.
  • If I queried Customer_3, the exact same subgraph object as returned from the Customer_1 query would be returned.
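
To make the expected result concrete, here is a minimal plain-Python sketch (not Gremlin) of the connected-component extraction I have in mind, run against the example graph above. The names `EDGES` and `connected_subgraph` are purely illustrative and are not part of any Gremlin or gremlin-python API; the sketch assumes edges can be followed in both directions.

```python
from collections import deque

# Example graph from the question: (edge name, from vertex, to vertex).
EDGES = [
    ("1_HasCreditCard_A", "Customer_1", "CreditCard_A"),
    ("2_HasCreditCard_B", "Customer_2", "CreditCard_B"),
    ("3_HasCreditCard_A", "Customer_3", "CreditCard_A"),
]


def connected_subgraph(start, edges):
    """Return (vertices, edge_names) reachable from start, ignoring edge direction."""
    # Build an undirected adjacency map: vertex -> list of (edge name, neighbour).
    adjacency = {}
    for name, src, dst in edges:
        adjacency.setdefault(src, []).append((name, dst))
        adjacency.setdefault(dst, []).append((name, src))

    seen_vertices = {start}
    seen_edges = set()
    queue = deque([start])
    # Breadth-first search: visit each vertex once and record every edge touched.
    while queue:
        vertex = queue.popleft()
        for name, neighbour in adjacency.get(vertex, []):
            seen_edges.add(name)
            if neighbour not in seen_vertices:
                seen_vertices.add(neighbour)
                queue.append(neighbour)
    return seen_vertices, seen_edges


if __name__ == "__main__":
    vertices, edge_names = connected_subgraph("Customer_1", EDGES)
    print(sorted(vertices))
    print(sorted(edge_names))
```

Starting from Customer_1 or Customer_3 gives the same pair of sets, while Customer_2 yields only its own cluster. What I am after is the Gremlin equivalent of this, returned as something I can keep querying.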

I have used both Neo4j with Cypher and Dgraph with GraphQL and found this task quite easy in those two languages, but I am struggling a bit more to understand Gremlin.

EDIT:

Judging from this question, the selected answer should achieve what I want if I stop specifying the edge type, i.e. change .both('created') to just .both().

However, the loop syntax .loop{true}{true} is of course invalid in Python. Is this loop step available in gremlin-python? I cannot find anything about it.

EDIT 2:

I have tried this and it seems to be working as expected, I think.

g.V(node_id).repeat(bothE().otherV().simplePath()).emit()

Is this a valid solution to what I am looking for? Is it also possible to include the queried node in this result?

KOB

1 Answer

Regarding the second edit, this looks like a valid solution that returns all the vertices connected to the starting vertex. Some small fixes:

  • you can replace bothE().otherV() with both()
  • to include the starting vertex as well, move the emit step before the repeat
  • I would add a dedup step to remove duplicate vertices (there can be more than one path to a vertex)

g.V(node_id).emit().repeat(both().simplePath()).dedup()

Example: https://gremlify.com/jngpuy3dwg9

noam621
  • Great, this looks like it returns the correct result, but when I run the following two queries to 1. find the total number of `Customer` nodes in the entire graph and 2. find the total number of `Customer` nodes in the returned subgraph, the results are 3 (correct) and 21 (obviously not what I want) respectively. 1. `print(g.V().hasLabel("Customer").count().next()) # returns 3` 2. `cluster = g.V(node_id).emit().repeat(both().simplePath()).dedup()` `print(cluster.V().hasLabel("Customer").count().next()) # returns 21` – KOB Aug 20 '20 at 09:04
  • To explain in more detail, what I want is for the query in your answer to return an entire sub-graph object that I can then run queries and traversals on, just as if it were the original, entire graph being queried. Is this possible? In other words, is it possible to create a graph object from the result of a traversal? – KOB Aug 20 '20 at 09:14
  • @KOB If you are working with TinkerPop, you can save the subgraph by using the `subgraph` step as described here: http://tinkerpop.apache.org/docs/current/reference/#subgraph-step. You can also take the ids of the vertices from the first query and use them in the second query: add id() after the `dedup`, then use the ids in `V(ids).hasLabel("Customer").count().next()` – noam621 Aug 20 '20 at 11:01