### Better Know a Theorem: The Greedy Algorithm

27Jun07

Consider the following network design problem. Schmoogle‘s 100 offices want to be able to stream huge amounts of data to each other (e.g., to broadcast talks to all their offices). In order to do this they need to connect all of their offices with new Phlogiston-net cabling. A single cable can be used to connect any pair of offices, and they can figure out the cost of buying any cable beforehand (the cost depends on which pair we’re connecting, due to their distance, geographic factors, etc).

What’s the cheapest way to buy cables so that data can be sent from any office to any other other office? Note, even if it takes multiple steps to send data from office A to office B, e.g., using offices C, D, E along the way (like a multi-leg flight), Schmoogle is happy. We’ll say two offices are connected if data can be sent between them.

The first step is to notice that the network should have no cycles of cables, e.g., we would not want cables connecting pairs (A, B), (B, C), (C, D), (D, E), (E, A) of offices, since we can remove any one of these cables and everything would still be connected. This means that our desired cheapest network is a tree.

Kruskal in 1956 proposed the following solution:

while not everything is connected,

purchase the cheapest cable amongst those between two offices not yet connected.

This algorithm is commonly called greedy since it always does the cheapest thing at the time, without worrying about later consequences. Amazingly, it correctly computes the cheapest possible overall network! In general, greedy algorithms do not always work optimally (e.g., pick the largest set of offices pairwise separated by at least 1000 km). For other things called matroids greedy algorithms are correct; strangely, there are a lot of matroids in real life and so the greedy algorithm has many applications beyond network design.