Detecting whether a tree-like data structure contains a cycle

Raymond Chen

Suppose you are given a tree-like data structure, in the sense that you are given a root node, and each node has children. You want to validate that the data structure indeed represents a tree, with no cycles.

You can actually solve this problem by combining two things you already know how to do.

First, you can walk the tree and generate elements. If the necessary node-traversal methods are available, You can use one of the iterative tree-walking algorithms. Otherwise, you can use a traditional depth-first search with a stack. This converts the tree-like structure into a linear list.

Next, you can use Floyd’s two-pointer algorithm for finding a cycle in a linked list. This will tell you whether your tree loops back on itself.

Bingo, problem solved by combining two things you already know.

Bonus chatter: If you have to manage an explicit stack, then the memory usage can reach twice the number of nodes in the worst case. Alternatively, you can build a hash table of nodes that have been visited so far, and stop if you ever revisit a node before reaching the end of the tree. This caps the memory usage to the number of nodes.

Bonus bonus chatter: If you know the total number N of nodes, say, say because it’s recorded in the file format, or you can count them while you load the file, then you don’t need to use Floyd’s two-pointer cycle-finding algorithm. You can just walk through the tree until you either (1) reach the end, or (2) have walked through N nodes. Because once you reach node N + 1, you know that you must have revisited a node due to the pigeonhole principle, and therefore found a cycle.

2 comments

Discussion is closed. Login to edit/delete existing comments.

  • Kārlis Gaņģis 0

    The more proper approach for finding cycles (and one that can properly tell where is the cycle, not just that there is one) is Tarjan’s strongly connected components algorithm

  • Marek Knápek 0

    Or you can leverage the fact that most (all?) data structures are aligned in memory. Tree node will be always allocated at address ending with one or more (binary) zeros. Meaning child pointers will have zeros at the least significant end. Meaning you can use one of the tree walking algorithms marking the pointers you followed by putting one into them. If you ever find child pointer with one in it, you have cycle/graph. If you walk all the nodes without encounterig marked pointer, you have tree. Don’t forget to unmark all the pointers. This obviously doesn’t work in multithreaded environment. Doesn’t work in cases where child pointers are not pointers but integers indexing into some kind of vector. And probably in other cases.

Feedback usabilla icon