Why are edges required in a Relay/GraphQL Connection?

Question

In a Relay/GraphQL schema configuration, one-to-many relationships (with pagination) are specified as in the tutorial example:

type ShipConnection {
  edges: [ShipEdge]
  pageInfo: PageInfo!
}
type ShipEdge {
  cursor: String!
  node: Ship
}

However, the one-to-one connection made by ShipEdge seems redundant. Why can't we move the cursor to ShipConnection and store an array of Ship IDs as edges?

type ShipConnection {
  edges: [Ship]
  pageInfo: PageInfo!
  cursor: String!
}

What were the design decisions to require one extra object for every edge in a one-to-many relationship?

Petr Bela · Answer 1 · 2021-03-21T08:53:14.827

score 11

(Updated with more explanations)

There are 3 ways to represent an array of data in GraphQL:

1. List: Use when you have a finite list of associated objects that you're fine fetching all at once. In GraphQL SDL, this is represented as [Ship].
2. Nodes: Use when you need to paginate over a list, usually because there can be thousands of items. Note that this is not part of the Relay specification and as such is not supported by the Relay client (instead, you'd wrap each item in an edge as described in #3), but some other clients such as Apollo are more flexible and support this construct (though you need to provide more boilerplate). In GraphQL SDL, this would be represented as type ShipConnection { nodes: [Ship], pageInfo: PageInfo! }.
3. Edges: Use when, in addition to pagination, you also need to provide extra information for each edge in the connection (read below for more details). In GraphQL SDL, you'd write it as type ShipConnection { edges: [ShipEdge], pageInfo: PageInfo! }.

Note that your GraphQL server might support all three options for a specific association, and the client then selects which field they want. Here's how they'd all look together:

type Query {
  ships: [Ship]                     # option #1
  shipsConnection: ShipConnection   # a single connection object, not a list of connections
}

type ShipConnection {
  nodes: [Ship]                     # option #2
  edges: [ShipEdge]                 # option #3
  pageInfo: PageInfo!
}

type PageInfo {
  endCursor: String                 # page-based pagination
  hasNextPage: Boolean!
}

type ShipEdge {
  cursor: String!                   # edge-based pagination
  node: Ship
  # ... edge attributes
}

type Ship {
  # ... ship attributes
}
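
As a rough illustration of how a client picks among the three options, the query below selects each shape from the schema above. The id field on Ship is an assumption made for the example (the schema only says "ship attributes"), and a real client would normally request just one of the three shapes.

```graphql
query ShipsThreeWays {
  # option #1: plain list, everything at once
  ships {
    id
  }
  shipsConnection {
    # option #2: paginated nodes without per-edge data
    nodes {
      id
    }
    # option #3: edges, each carrying its own cursor (and any edge attributes)
    edges {
      cursor
      node {
        id
      }
    }
    pageInfo {
      endCursor
      hasNextPage
    }
  }
}
```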

Lists (#1) should only ever be used when you know that the number of items won't grow (for example, if you have a Post, you may want to return tags as a List, but you shouldn't do that with comments). To decide between #2 and #3, there are two reasons for using edges over just plain nodes:

It's a place for edge-specific attributes. For example, if you have a User that belongs to many Groups, in a relational database you'd have a UserGroup table with user_id and group_id. This table can have additional attributes like role, joined_at, etc. The GroupUserEdge would then be the place where you could access these attributes.
It's a place for the cursor. Relay, in addition to page-based pagination (using pageInfo), supports edge-based pagination. Why does Relay need a cursor for each edge? Because Relay intelligently merges data requirements from your entire app, it may already have a connection with the same parameters you're requesting but not enough records in it. To fetch the missing data, it can ask for the data in the connection after some edge's cursor.
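
To make the first reason concrete, here is a minimal sketch of the User/Group example in SDL. It assumes the edge hangs off a hypothetical users connection on Group; the field names role and joinedAt mirror the join-table columns mentioned above, while their types, the id and name fields, and the first/after arguments are assumptions for illustration.

```graphql
type Group {
  id: ID!
  # hypothetical connection field; first/after follow the Relay connection arguments
  users(first: Int, after: String): GroupUserConnection
}

type GroupUserConnection {
  edges: [GroupUserEdge]
  pageInfo: PageInfo!
}

type GroupUserEdge {
  cursor: String!
  node: User
  role: String       # from the UserGroup join table (type assumed)
  joinedAt: String   # from the UserGroup join table (type assumed)
}

type User {
  id: ID!
  name: String
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}
```

The role and joinedAt values describe the membership, not the user, which is why they sit on the edge and not on User.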

I understand it may be confusing, considering that databases have cursors too, and there is only one cursor per query. A Relay connection is not really a query; rather, it's a set of parameters that identify a query. The cursor of a connection's edge is a set of parameters that identify a position within that connection. This is a higher level of abstraction than a plain query cursor (remember that edges need to be able to identify a position even in a connection that might not be backed by a DB query, or that is hidden behind a third-party system). Because of this required flexibility, one cursor per connection would not be enough.
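
A sketch of what "asking for data after some edge's cursor" can look like on the wire. It assumes the shipsConnection field accepts the first and after arguments from the Relay connection specification and that Ship has an id field; neither is shown in the schema above, and the page sizes are arbitrary.

```graphql
# First request: only three ships are needed.
query FirstShips {
  shipsConnection(first: 3) {
    edges {
      cursor
      node {
        id
      }
    }
    pageInfo {
      endCursor
      hasNextPage
    }
  }
}

# Later request: continue from the cursor of any edge already in the store.
# $after would be the cursor value of, for example, the third edge returned above.
query MoreShips($after: String!) {
  shipsConnection(first: 7, after: $after) {
    edges {
      cursor
      node {
        id
      }
    }
    pageInfo {
      endCursor
      hasNextPage
    }
  }
}
```

Because every edge carries its own cursor, the second query can resume from whichever edge the client last holds, not only from the end of a previously fetched page.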

edited Mar 21 '21 at 08:53

answered Dec 10 '15 at 03:44

Petr Bela

Yes, in the "ship" scenario you might want `createdAt` and `color` on the ship itself; I was just giving those as abstract examples of field names. Note that in some domains you could have multiple edges pointing to the same node, and you might want to know when each edge (in the graph sense) was added and so would use `createdAt`. I was using `color` as a generic property name, but you could think of other things that might describe the nature of the edge, such as `weight` (how important the edge is) or `creator` (who established the link) etc. I'll edit my answer to avoid this confusion. – wincent Jan 25 '16 at 22:58
This is a helpful answer but I still can't imagine when Relay would need to fetch data using a cursor from the middle of a connection. In the situation where you have a "connection with the same parameters you're requesting but not enough records in it", a cursor for the last edge would suffice. – SamBarnes May 13 '17 at 09:13
An example off the top of my head: You fetch a list of comments but then the last comment is deleted. So to fetch the next batch of comments, you need to start from the currently-last cursor. I'm sure there are many more use cases. The point is, Relay tries to be as generic as possible and robust enough to manage whatever happens to the data. – Petr Bela May 14 '17 at 10:56
@PetrBela When you do keyset pagination you are not affected by a deleted record. I don't see why you would need the previous comments cursor in order to fetch the next page. – Massimo Fazzolari Mar 19 '21 at 06:37
@MassimoFazzolari Yeah I guess my previous example wasn't the best. However, the point of having a cursor (for both offset and keyset pagination) is still the same. The question is why does Relay mandate that each node has a cursor when the whole connection already has a cursor? Perhaps to account for using the same connection in two different components, one paginating by 3 and the other by 10 items? Might be an edge case but Relay would still be able to handle it. (This is probably used in Facebook comments which initially show like top 3 but then you expand it, it adds 8 or so etc.) – Petr Bela Mar 19 '21 at 13:50
@PetrBela As far as I understand it, you get a slight optimisation when you have a component asking for 3 items but you already loaded the first 10. But your Facebook example wouldn't benefit from that because you don't want to load 10 items in cache if you only need to show 3 in most cases. I still don't see any real use case where edges are useful. – Massimo Fazzolari Mar 20 '21 at 09:08
@MassimoFazzolari If you see 3 comments on the news feed and then go to the detail page which shows 10, then Relay can just fetch the 7 new ones. I probably can't explain any more than that since I didn't author it but this is how I understand it. FB has a lot of specific use cases and optimizations that are probably not worth the complexity in most projects, and this seems to be one of those. (You may note that Apollo doesn't deal with this at all as they didn't deem it common enough.) – Petr Bela Mar 21 '21 at 08:18
@MassimoFazzolari Edges are still useful, though, since you can attach edge-specific data, which I've hopefully explained in my answer. Edge cursors, on the other hand, have more of a theoretical value which might not be strictly needed in most projects but since the Relay client works that way, it still requires them. – Petr Bela Mar 21 '21 at 08:22
@PetrBela Could you point to any real-world API that attaches edge-specific data? Shopify and GitHub don't use that. Also, attaching data to edges means that you need to have different Connection Types for each model, which in my opinion makes your code less reusable. Edges and cursors for each node look to me like a classic example of over-engineering. – Massimo Fazzolari Mar 24 '21 at 15:43
@MassimoFazzolari I don't have a list of who uses edge data in their APIs. I was just trying to explain what they can be used for and why Relay requires this structure. And yes, in practice, in 95% of cases you won't need edges, but the authors of Relay decided to cover the theoretical 5% by writing a spec that covers them (and, most likely, that 5% is used in the FB codebase). I'd end the discussion here since I'm not an author of the spec and don't have any more info beyond what I've just speculated. – Petr Bela Mar 25 '21 at 16:58
On the last note, I'd add that I did use edge data in one of my APIs. However, I found that in practice it's easier to convert "relationship tables" to standalone entities, as they're nicer to work with. In other words, instead of orgs -> org_users -> users tables, where the `Org` type has a users connection with the org_user being the edge, it's better to have orgs -> members -> users tables, where the `Org` type has a members connection, and each `Member` has an associated `User`. – Petr Bela Mar 25 '21 at 17:03
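
A minimal sketch of the second shape described in this comment, where the relationship row becomes a standalone Member entity reachable through a members connection. The role field, the id fields, and the first/after arguments are hypothetical; the comment does not say which attributes the relationship table held.

```graphql
type Org {
  id: ID!
  # hypothetical arguments, following the Relay connection convention
  members(first: Int, after: String): MemberConnection
}

type MemberConnection {
  edges: [MemberEdge]
  pageInfo: PageInfo!
}

# The edge now carries only the cursor; relationship data lives on Member.
type MemberEdge {
  cursor: String!
  node: Member
}

type Member {
  id: ID!
  role: String   # hypothetical attribute that would otherwise sit on an org_user edge
  user: User
}

type User {
  id: ID!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}
```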

score 9 · Accepted Answer · edited May 23 '17 at 11:47

The edges field provides you with a place to put per-edge data. For example, you might want to put a creator or priority field on there, describing who added the edge and how important the relationship is, respectively.
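
For instance, the ShipEdge type from the question could be extended along these lines. This is only a sketch: the field types are assumptions, and Ship is reduced to a single assumed id field to keep the fragment self-contained.

```graphql
type ShipEdge {
  cursor: String!
  node: Ship
  creator: String   # who added the edge (type assumed)
  priority: Int     # how important the relationship is (type assumed)
}

type Ship {
  id: ID!
}
```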

If you don't require this kind of flexibility (or the other features that you get with connections, such as pagination), you could use a simple GraphQLList type. See this answer for more on the difference between connections and lists.

score 5 · Answer 3 · edited Apr 01 '21 at 13:43

We've written a blog article about the differences between a simple GraphQL schema vs a Relay-specific schema:

https://www.prisma.io/blog/connections-edges-nodes-in-relay-758d358aa4c7

edited Apr 01 '21 at 13:43

Community

answered Jun 02 '16 at 12:46

schickling
