# Postgresql – Recursively find the path to all leaves descending from a given parent in a tree

graphhierarchypostgresqlrecursivetree

I'm trying to write a query to identify the path to each furthest descendant of a particular parent node in a table of tree structures like:

    0     1
|     |
2     3
|     |
4     5
/ \    |
*6*  8  *7*
|
*9*


There are many parents, all children have one parent, parents have 0-5 children (but the graph is very "long" – most parents only have one child). There are no cycles.

I'm trying to efficiently identify the path to the furthest descendants of a specific node (and not to any intermediate nodes). E.g. in the above:

• get_leaf_paths(1) would return 1 row: {1, 3, 5, 7}
• get_leaf_paths(2) would return 2 rows: {2, 4, 6} and {2, 4, 8, 9}

Sample table:

CREATE TABLE graph (
id bigint PRIMARY KEY,
parent_id bigint,
FOREIGN KEY (parent_id) REFERENCES graph(id)
);
INSERT INTO graph (id, parent_id)
VALUES (0, NULL),
(1, NULL),
(2, 0),
(3, 1),
(4, 2),
(5, 3),
(6, 4),
(7, 5),
(8, 4),
(9, 8);


I'm hoping for a result that looks something like:

SELECT get_leaf_paths.* FROM get_leaf_paths(0);
path
-----
{0, 2, 4, 6}
{0, 2, 4, 8, 9}
(2 rows)


In my initial attempt at a function with a recursive query, I've had trouble selecting only the furthest leaves, especially since some branches are shorter than others (e.g. 6 and 9 above are at different depths). The paths can be very deep (thousands or millions of elements), so I would also like to avoid the excessive memory usage of generating paths for every intermediate node.

Any ideas would be greatly appreciated. Thanks!

WITH RECURSIVE
cte AS ( SELECT id,
parent_id,
id::TEXT path,
NOT EXISTS ( SELECT NULL
FROM graph gr
WHERE graph.id = gr.parent_id ) is_leaf
FROM graph
WHERE id = 2 /* initital node id */
UNION ALL
SELECT graph.id,
graph.parent_id,
cte.path || ',' || graph.id,
NOT EXISTS ( SELECT NULL
FROM graph gr
WHERE graph.id = gr.parent_id )
FROM cte JOIN graph ON cte.id = graph.parent_id)
SELECT path
FROM cte
WHERE is_leaf


https://dbfiddle.uk/?rdbms=postgres_12&fiddle=2e40ff454f302033bf5e8cba8b0b0d85

Multiple initial nodes may be applied too (WHERE id IN (0, 1)).