#data structures #compilers

dfs and bfs in the wild

I’ve been building a static analysis tool that detects concurrency bugs in Go programs. One of the rules it enforces is if a goroutine loops over a channel using for range, that channel needs to be closed at some point, otherwise the goroutine will block forever and leak.

For example:


go func() {

for range ch { }

}()

To check whether close(ch) actually happens, the tool looks at the code’s control flow graph (CFG). This post explains them in slightly more detail. A CFG is just a map of how the program executes: it breaks a function into basic blocks (straight-line sequences of instructions) and draws arrows between them wherever the code can branch.

My first attempt was a simple BFS over this graph. Starting from the goroutine launch, I checked whether there was any path through the graph that eventually reached a close(ch). If so, I assumed the channel was properly closed.

This approach worked until I tested it against something like this:


func f(cond bool) {

	ch := make(chan int)
	
	go func() {
	
		for range ch { }
		
	}()

	if cond {
	
		close(ch)
	
	}

}

BFS found a path to close(ch) and concluded the channel was safe. It ignored the branch where cond is false and the function returns without ever closing the channel. That goroutine leaks, and the tool said nothing.

The problem is that BFS stops as soon as it finds what it’s looking for. It found one safe path and called it done. But I didn’t need to know whether a close was reachable on some path. I needed to know whether it was guaranteed on every path.

That’s a different question, and it calls for DFS. Using DFS, I traced every path from the goroutine launch to the end of the function. If any path reached the end without passing through close(ch) first, the channel was not reliably closed.


var dfs func(node int) bool

dfs = func(n int) bool {

	if n == closeBlock {
	
	return false // this path is safe, stop here
	
	}

if len(successors[n]) == 0 {

	return true // reached the end without closing, this is the leak

}

for _, next := range successors[n] {

	if dfs(next) {
	
		return true
	
	}

}

	return false
	
	}

DFS is well suited here because it commits to exploring a single path all the way to its endpoint before backtracking. That commitment is exactly what makes it possible to say “no path escapes” rather than merely “some path is safe.”

For the conditional example, DFS traces the if cond branch and finds close(ch) — fine, that path is safe. Then it backs up and traces the else branch, reaches the function exit, and notices it never closed the channel. That’s the leak. The tool flags it.

Often times, data structures and algorithms feel abit too academic or theretical, but it’s always fun when you need to reach for them in practice. Most of the time the graph is implicit in the structure of the problem, not handed to you as an adjacency list. Recognizing that the CFG was a graph, and that “guaranteed execution” was a path coverage question, was the step that made the right algorithm obvious.

#datastructures