### Better Know a Theorem: LP Duality

25Aug07

This better-know-a-theorem entry requires a bit of terminology and a story… but it is kind of neat because my actual research is closer to this stuff than any of my previous b-k-a-t posts. (I mean in an abstract sense, I have not actually gone to the fruit market as part of my classes.)

A linear program is one of the simplest kinds of optimization problems. In such a problem, there are a number of variables a, b, c, d, … that you have to assign values (a solution is an assignment of a value to each variable). You want to find the “best” “valid” solution — and the specific meaning of “best” and “valid” is where the word linear comes into play.

A linear expression is a sum of many terms of the form <constant number>*<variable>. For example, 5a + 3b + 2c. In a linear program, validity is determined by constraints of the form <linear expression> ≤ <constant number>; a solution is valid if it meets all constraints. Similarly, the goodness of a solution is measured by another linear expression; it is called the objective function and depending on the problem, we want either to maximize it or minimize it.

An example is usually helpful to see that this mathematical mumbo-jumbo is actually useful. Suppose that you run a fruit salad company, the 100% All Natural Good Time Fruit Salad Conglomerate. Each day at the market, the prices of fruit vary and you want to make your salad as cheaply as possible, but it needs to satisfy certain nutritional requirements. These can be written as a linear objective function and linear constraints. For example,

• Suppose that apples cost \$4/kg, bananas cost \$3/kg, cherries cost \$6/kg and dragonfruit costs \$10/kg today. We will use a to represent the number of kilograms of apples that we will buy, and b for bananas, etc. The objective is to minimize the total cost 4a + 3b + 6c + 10d.
• We need to make 100 kilograms of fruit salad. This leads to the constraint a + b + c + d = 100.
• Suppose that each kilogram of fruit salad needs to have at least 8 mg of vitamin X, and that the vitamin X content of apples is 6 mg/kg, of bananas is 7 mg/kg, etc. We get the constraint 6a + 7b + 9c + 20d ≥ 800. (The 800 represents 8 mg/kg times the 100 kg we are producing.)

Going through the list of necessary vitamins, we end up with a linear program along the following lines.

minimize 4a + 3b + 6c + 10d
subject to the constraints
(C1) a + b + c + d = 100
(C2) 6a + 7b + 9c + 20d ≥ 800
(C3) 8a + b + 2c + 4d ≥ 300
(C4) 3a + 5b + 8c ≥ 400
(C5) a, b, c, d ≥ 0

There are many valid solutions to this problem. E.g., you can check that a=0, b=0, c=50, d=50 (making the salad half-cherry and half-dragonfruit) is valid. However, we need more than a valid solution, we want the cheapest valid solution. Whereas the solution (0, 0, 50, 50) we mentioned has cost \$800, another valid solution (25, 25, 25, 25) has better cost \$575. We can’t just check all solutions… how can we be sure that we have the cheapest one once we find it?

This is where LP Duality comes into play. The idea is to obtain lower bounds on the optimum cost by mixing the constraints. For example, we know that every valid solution satisfies (C2) “6a + 7b + 9c + 20d ≥ 800″ and (C3) “8a + b + 2c + 4d ≥ 300″. Since the sum of two valid inequalities yields another valid inequality, every valid solution also satisfies (C2 + C3) “14a + 8b + 11c + 24d ≥ 1100″. I’ll call this a mix of inequalities. We can take multiples too, and in the case of equality constraints like C1, subtract; e.g., every valid solution satisfies the mix (C2*0.5 + C3*0.5-3*C1) “4a + b + 2.5c + 9d ≥ 250″. However, comparing this last inequality with the objective function we see that

The price 4a + 3b + 6c + 10d of any valid solution is ≥ the expression 4a + b + 2.5c + 9d which is ≥ 250.

Hence, the cheapest valid fruit combination is at least \$250. By mixing the constraints in different ways we can obtain better lower bounds. But, just like finding cheap solutions, when do we know that we can stop?

The Theorem of Strong LP Duality. The best lower bound that you can obtain from a mix is the same as the cost of the cheapest valid solution.

Hence, once you can find the best solution and the best mix of inequalities, you have a short proof that your solution is correct; namely, if OPT represents the cost of the best solution, then there is a mix that proves every valid solution has cost at least OPT — and this proves that you can’t improve upon your solution.

The problem of actually finding the best solution and the best mix is something that will have to wait for another post. For this particular example, using Maple I found that the best solution is (a = 1150/47, b = 3100/47, c = 0, d = 450/47) which has cost \$18400/47 ~ \$391.49, and the best mix is (23/47*C2 + 10/47*C3 – 30/47*C1) which works out to

4a + 3b + 197/47c + 10d  ≥ 18400/47.

So, as you can see, the lower bound provided by the best mix matches the cost of the optimal solution.