Archive

Archive for the ‘b.k.a.t.’ Category

Fold-Fashioned

January 12, 2012 1 comment

Continuing on last week’s post, there were a couple of hands-on activities led by Joseph O’Rourke at the geometry minicourse that I attended. In one of them, you start by printing out the following sheet of paper (available at howtofoldit.org):

The red and green lines indicate mountain and valley folds and at this point your job is to follow all of the folds, like in origami. With a little bit of playing around it will look like the following. The left shape is simply creased, and the dreidel-shaped object at right is the fully-folded version.

As an aside, the hands-on activities were a fantastic part of the mini-course. One trivial reason is that it was a long break from lectures. But a deeper thing that happened is that people were struggling, being confused, eventually gaining some insight and getting it to work, and then playing around! A good lecture is very seductive but there’s a limitation to the lecture format in that, without exercising your own mental muscles, it is difficult for the material to “stick” or hook in to your existing corpus of knowledge. In my Game Theory courses/math circles there’s the opportunity to actually play games which was fun. But for me, this was the first time doing origami in a mathematical setting.

Anyway, back to the task at hand. If you notice, there was a black line outlining the original A. If you have done your job correctly, all of that line will be neatly folded so that with one scissor-cut, you will chop the outline of the A, as well as the hole in the centre!

Not too bad, eh?

Unrelated joke. When they were naming the northernmost part of North America, the founders were very much inspired by the USA. It would be so great to have a three-letter acronym like USA! So they put all of the letters of the alphabet in to a hat, and the mayor pulled them out, one by one. “C, eh? N, eh?, D, eh?” And so, Canada was born.

Inspired by the alphabetic shenanigans, I tried to do a free-hand maple leaf afterwards. Here is how it turned out (left):

Is it a recognizable attempt? I would like to think so. But there is a more important question here. What if we also wanted to cut out the red bars on either side of the maple leaf? What if we wanted to do the US flag with all of its stars and stripes? How about an arbitrary line drawing? This elegant question was solved, then discovered to be buggy, and then fixed by two pairs of researchers (history here):

Theorem (Bern-Hayes ’09, Demaine-O’Rourke ’07). For any planar graph (made of straight line segments) embedded in a rectangle, there is a way to fold the rectangle, so that a single straight-line cut will exactly cut out that graph.

Pretty amazing! The proof is quite heavy, as indicated by the bug. At the same time, the methods are quite elegant, and there are two different proof methods.

Let me just briefly mention the other activity we participated in. Take an ordinary rectangular piece of paper. You can fold the left-centre point on to the right-centre point, and get a rectangle. Similarly, folding the top-centre point on to the bottom-centre point gives a different rectangle. But you can fold other shapes, in fact 3D shapes, out of this humble piece of paper! Start by folding any boundary point X on to the boundary point Y that is directly opposite. Then, you should think of the remaining edges as a “zipper” and tape them together both ways from the X-Y point. With a little patience, the paper will turn into a 3D polyhedron with triangular faces.

Categories: b.k.a.t., geometry

The Joy of Randomness

November 2, 2011 3 comments

Ravi Kannan gave a distinguished lecture today at Georgia Tech covering a wide swath of algorithms, randomness, and high-dimensional geometry. There was one particularly excellent slide that gave a very intuitive explanation for a result I have seen, but not fully understood, before.

The result is in the field of streaming algorithms. Specifically, there is a “streaming” data set which consists of a large number m of elements, where each element is some b-bit string — actually it is more convenient to think of each element as integer between 1 and n := 2b. You only get to see one element from the stream at a time, do a computation and change your working memory, and then the next one arrives. We want to capture some statistical properties of this data set. Storing the entire data set could be done easily in either log n bits, or n log m bits. But what if this is much more memory than our computer has available? (E.g. if these are a continuous stream of data points from a sensor, or requests on a web server, or Google’s cache of the internet.) We would like to say something useful about the data set using only polylog(m, n) memory.

First, an easier puzzle. Let’s say that we don’t precisely know in advance the total number of items (including repeats) m that will arrive, and want to find m out. (For the puzzle, we don’t care about the items’ contents.) Evidently we can just count this in a streaming manner using log bits of memory.  Can we do it with less? More precisely, can we come up with a good estimate of m using only O(log log m) bits of memory? I give the answer at the end.

Part of a Stream

Back to Ravi’s talk. The particular statistic we’d like to count is the second moment. For each string i, let fi be the frequency of that string, or in other words the total number of times it appears in the data set. Then the second moment is defined as \sum_{i} f^2_i. Put differently, the second moment is the probability that two random elements from the stream are the same, up to a factor of n2. How can we estimate this value?

Ravi’s explanation of how you could imagine doing this was very nice. Think of f as a vector in high-dimensional space. What we want to measure is (the square of) the Euclidean length of f, but we don’t have the space to explicitly store all of the bits of f in memory. The streaming model amounts to discovering the vector f in unit steps. The key insight in his explanation is to use the projection of f on to a random d-dimensional subspace where d is logarithmic; analogous to the Johnson-Lindenstrauss lemma, the random length of the projection actually gives us a pretty good estimate of the original length. Moreover, the projection of the sum of the steps is the sum of the projections of the steps. So it suffices to keep a running (vector) total of the sum of the projections of the steps.

A possible problem in implementing this is that projecting on to a d-dimensional subspace amounts to multiplying each incoming item (n-dimensional unit basis “step” vector) by a n-by-d matrix, and we cannot store anything of length n. To get around this, we let our n-by-d matrix actually be pseudorandom — that is to say, there should be some small random “seed” so that a computer program can determine the n-by-d matrix from just the seed. In this way, we don’t need to store the whole matrix in memory, rather just the seed, so that we can just recompute the ith row each time an element i shows up. In this way we can keep a running count of the d-dimensional vector sum and complete the idea.

This result is due to Alon, Matias, and Szegedy, which I have heard called one of the first papers in streaming computation (1996). The paper is nice to read and gives a clean explanation where you project on to random ±1 vectors in a derandomizable way. But this intuition helps me follow the proof idea for second moments (section 2.2). I read it previously (without this intuition), back when I was working on my Master’s thesis, because I had looked at an earlier paper of Morris for estimating the first moment… and this brings me to the puzzle’s solution.

***

You may have already deduced that you would need randomness to solve the puzzle, because with only O(log log m) bits of memory, our counter could only have polylog(m) states, and so only count polylog(m) items reliably.

So, let’s say that we have a “log counter” instead. The log counter stores a value k, such that our running estimate of the number of items seen so far is 2k. Then, we maintain this counter using the probabilistic method: if the counter currently reads k, when the next item shows up, only increment the counter with probability 2-k. So, the counter reads 1 after the first item arrives; the expected time until it reads 2 is three items; in general the expected time until it reads j is 2j-1 items. You would need some more formal calculations to see exactly how good of an estimate it is, but this is the main idea.

The paper of Alon et al has a lot of other nice stuff. We saw small-space approximations for the first and second moments; they also formalize a previous idea for the 0th moment (counting distinct items) and show that some higher moments do not admit any small-space approximation. For recent work on this, not a bad place to start is to look at the work of Kane & Nelson et al.!

Better Know Your Metroids

October 17, 2011 3 comments
  • Metroid. Introduced in 1986  by Gunpei Yokoi and Satoru Okada. Name is a portmanteau of metro and android (android is a combination of andr-, man, and -oid, likeness). Metroids are fictional jellyfish-like creatures with quadripartite nuclei. They are chased by the bounty hunter Samus. Killing them can be done with the help of the ice beam.

    A metroid in space. Source: Super Metroid instruction manual, Nintendo, page 35.

  • Metroid. Introduced in 1986 by A. Dress and T. F. Havel. Name is a portmanteau of matroid and metric (matroid is a combination of matrix and -oid, likeness). Metroids are a subclass of all polyhedra defined by constraints with {0, +1, -1} coefficients. They are totally dual integral. Optimizing linear functions over them can be done with the help of a greedy algorithm.

A metroid in the plane. Source: Submodular Functions and Optimization, Satoru Fujishige, 2nd ed, page 107.

Categories: b.k.a.t., nintendo

Better Know A Theorem: Online Matching

October 4, 2011 2 comments

I saw a nice result sketched a few weeks ago, on the “online matching” problem. Below I try to re-explain the result, using some idiomatic (and as a bonus, inoffensive) terminology which I find makes it easier to remember what’s going on.

Online Matching. There is a set of items, call them Lunches, which you want to get eaten. There is a set of people who will arrive in a sequence, call them Diners. Each Diner is willing to eat certain Lunches but not others. Once a Diner shows up, you can give them any remaining Lunch they like, but that Lunch cannot be re-sold again. The prototypical problem is that a Diner might show up but you already sold all of the Lunches that they like. Your task: find a (randomized) strategy to maximize the (expected) number of Lunches sold, relative to the maximum Lunch-Diner matching size.

This is an online problem: you don’t have all the information about Diner preferences before the algorithm begins, and you need to start committing to decisions before you know everyone’s preferences in full.

Consider the following strategy:

Algorithm: pick a random ordering/permutation of the lunches; then when a diner arrives, give her the first available lunch in the ordering that she likes.

How does this perform? In expectation at least M*(1-1/e) lunches are sold, where M is the maximum (offline) matching size in the Lunch-Diner graph. But proving so is tricky. (This ratio 1-1/e turns out to be the best possible.) Birnbaum and Mathieu recently gave a simpler proof that this ratio is achieved. Amin Saberi, whose talk sketched this, rightly pointed out that it feels like a “proof from The Book!”

With a small lemma (e.g. Lemma 2 in Birnbaum-Mathieu), we may assume that the number of diners and lunches are equal, and that there is a perfect matching between diners and lunches. Let the number of diners and lunches each be n, and so M=n.

Let xt be the probability that in a random lunch permutation, upon running Algorithm, the lunch in position t is eaten.

Experiment 1. Take a random lunch permutation. Score 1 if the lunch at position t is not eaten, 0 if it is eaten. The expected value of this experiment is 1-xt.

Experiment 2. Take a random lunch permutation, and a random diner. Score 1 if that diner eats a lunch in one of the first t positions, 0 otherwise. The expected value of this experiment is (x1 + … + xt)/n.

The key is to show that the second experiment has expected value greater than or equal to the first one, then we are done via an easy calculation. We do this with a joint experiment, similar to “coupling” arguments.

Joint experiment. Let t be fixed. Take a random lunch permutation π1, and look at the lunch L in position t. Let D be matched to L in the perfect matching. Obtain π2 from π1 by removing L and re-inserting L in a random position.

Key claim: π2 and D are independent, uniformly distributed random variables. This implies that we can run both experiments at the same time using this joint distribution on π1, π2, L, D. Moreover,

Deterministic Lemma. For any π1, and for an uneaten lunch L at position t in π1, obtain π2 from π1 by removing L and re-inserting L in any position. Let D be matched to L in the perfect matching. Then in π2, D eats one of the first t lunches.

This implies that whenever experiment 1 scores a point, so does experiment 2. So taking expectations,

(x1 + … + xt)/n ≥ 1-xt.

This equation is the crux. Let xt = x1+…+xt, then we see xt ≥ (n/n+1)(1+xt-1). This recursion is easy to unravel and it gives a lower bound of n(1-(n/n+1)n) ≥ n(1-1/e) on xn, the expected size of the matching!

Virtual Valuations

September 7, 2011 Leave a comment

Last weekend I attended a conference in Budapest where I saw a nice trick in an unexpected place. (A longer report about the whole trip will appear on the CS Theory Blog.) Wikipedia calls it de Moivre’s martingale and the reason that I liked it is that it somehow gives me some intuition, that I formerly lacked, about virtual valuations in Myerson’s seminal result about obtaining profit-maximizing auctions from welfare-maximizing auctions.

To give some background to de Moivre’s martingale, consider the following problem: take a fair random walk in which you walk left or right by 1 unit in each step with probability 1/2; calculate the expected squared distance from the origin after n steps. One method is to explicitly calculate the value using some binomial coefficients, and do painful simplications, giving the answer n. But there is a simpler method to get this simple answer: notice that, conditioned on your position p(t) at time t, the expected increase in square distance at time t+1 is

E[p^2(t+1)-p^2(t)|p^2(t)] = 1/2(p(t)+1)^2+1/2(p(t)-1)^2-p^2(t) = 1.

So unconditioning and using linearity of expectation, we get E[p(n)^2] = 1 + 1 + ... + 1 + 0 = n.

de Moivre’s martingale

Suppose you are participating in a biased random walk where on each step you move left with probability L≠1/2 and right with probability R=1-L. You can think of this as gambling. Say you start at position x, representing that you have $x. Now let’s play until either you run out of money (hit position 0) or you break the bank (hit position y>x). What is the probability of each of these two cases (win/lose), as a function of L, x, y?

Again, a direct calculation seems possible but is nasty. But instead, we can define a virtual position. We will define a function f : Nand when the real position at time t is p(t), the virtual position at time t is f(p(t)). In fact, we will define the virtual positions so that the martingale formula holds:

E[f(p(t+1))|p(t)] = f(p(t)).

This would mean that f(p(t)) is a martingale in t, that is to say, its expected value is the same in each time step. If we can accomplish this, then by comparing the final virtual position to the initial one, we deduce a formula for what we want (end is the stopping time, a random variable):

f(x) = f(p(0)) = E[f(p(end))] = Pr[win]*f(y) + Pr[lose]*f(0)

so

Pr[win] = (f(x)-f(0))/(f(y)-f(0)).

If you think about the martingale formula for a few minutes, you find that f satisfies a second-order linear recurrence relation, L*f(m-1)+R*f(m+1) = f(m). Standard methods give all the solutions of this recurrence, almost any will do, but the simplest is to take f(m) = (L/R)^m. Then the probability of winning is ((L/R)^x - 1)/((L/R)^y - 1)!

Myerson

What Myerson accomplished is to show basically that the celebrated “VCG mechanism” for combinatorial auctions, which is truthful and maximizes social welfare, can be converted into a truthful mechanism to maximize the auctioneer’s income, under mild technical assumptions, in the model that bidder’s distributions on valuations are known in advance. Myerson’s method is to replace each valuation with a “virtual valuation” that looks mysterious at first, but is similar to the above example, in the sense that you define the function based on the properties that you want it to have.  My lecture notes from last term explain this in a more precise way. I am glad now to have this valuable intuition!

Better Know a Theorem: Parametric Search

July 1, 2011 Leave a comment

Last week I was at an excellent summer school organized by TU Berlin and held at a nearby cottage/hotel. One topic that came up in a couple of talks was new to me: parametric search. First I’ll describe the setting and the basic result, due to Megiddo from 1979.

In an optimization problem, we are given some discrete structure (e.g. a graph) and some numerical data attached to the structure  (e.g., lengths in shortest path problems, capacities in max flow problems, costs in min spanning tree problems). Normally we assume that these data are just rationals expressed in binary. But the parametric version of the problem is more general: we introduce a parameter λ and each numerical data value is a linear function of λ, such as 5λ or (7 – 11λ). Observe that substituting any particular value of λ gives an instance of the original version of the problem; let the optimal value of this problem as a function of λ be OPT(λ). But, can we efficiently solve the problem for all real values of λ at once, which amounts to being able to draw the graph of the function OPT(λ) in terms of λ? Or can we answer other questions about this function?

We’ll assume that our optimization problems minimize sums of costs subject to feasibility constraints, which implies OPT(λ) is continuous, piecewise linear, and concave. Maximization is similar.

The min-ratio version of the problem gives one practical motivation to care about the parametric version. Consider the shortest s-t path problem in a graph. In the typical “shortest path problem,” each edge e would only have a single datum, its length l(e), and we would seek a path from s to t with minimum length (sum of lengths of the edges it contains). But now, suppose each edge also has a profit p(e) and that we seek a path such that the ratio of its total length to its total profit is minimized. This type of problem is easy to motivate as seeking the best benefit-to-cost ratio. And, if we have enough information about the parametric version, we could easily solve the min-ratio version: give edge e parametric length p(e)-λl(e), let OPT(λ) be the shortest path for these lengths as a function of λ,  then it is not hard to see that the minimum l-to-p ratio equals min {λ OPT(λ)≤0}.

So, how can we deal with parametric versions of flow, path, and tree problems? One observation is that many items of interest like min {λ OPT(λ)≤0}, or maxλ OPT(λ) can be expressed as the optima of linear programs. This gives a high-degree polynomial-time solution method to compute these items, which is unfortunately too slow to be practical in real-world instances. Also, it is worth keeping in mind that when Megiddo wrote his paper, poly-time LP algorithms were not known.

To explain Megiddo’s basic theorem, more definitions. An algorithm is combinatorial if it only applies the mathematical operations +, -, *, /, >, = to the data (and subsequent intermediate results relying on the data), but it does not rely on their bit-representation, and the running time does not depend on the length of the bit-representation1. It is strongly combinatorial if moreover * and / are not used, except to multiply data by constants. For example, the simplest and most well-known flow/path/tree algorithms are strongly combinatorial. Then Megiddo’s first theorem says:

Theorem (Megiddo 1979). Suppose there is a strongly combinatorial algorithm A with time complexity T for solving some minimization problem. Then its min-ratio version can be solved in time O(T2) by a combinatorial algorithm.

The idea is as follows. For ease of exposition assume p and l are nonnegative, so OPT(λ) is nonincreasing in λ, and assume the nontrivial case where OPT(0)>0. One crux is that under the operations {+, -, multiplication by constants}, linear functions in λ again yield linear functions in λ. So we could imagine running algorithm A where in place of constant data, we put linear functions of λ; any new intermediate results would again be linear functions of λ. This “almost” gives a parametric algorithm which runs symbolically for all values of λ at once. The problem is that when we compare two parametric values (e.g., compare 4+λ > 12-2λ) we actually need to know what is the value of λ. Megiddo’s solution is, at each such event, let λ* be the critical value determining the comparison (e.g., λ*=8/3 here); determine OPT(λ*) by running algorithm A on the rational data obtained at λ=λ*; if OPT(λ*)>0 then from now on assume λ>λ*, otherwise from now on assume λ≤λ*. In the future, comparisons either will be forced by these assumptions, else we again run algorithm A on a specific rational value. With a little work we prove that when the outer algorithm terminates, its optimal value is linear within the interval of assumptions, and includes the smallest λ with OPT(λ)=0.

For the running time, we are running an outer algorithm A symbolically on λ, and at each comparison of this algorithm, we run the inner algorithm A numerically on a specific value of λ. So the running time is at most the number of comparisons of A, times the running time of A, giving the theorem. ■

Megiddo goes on to show that in some situations, the running time can be improved further. One example is finding a min-ratio cycle: it can be obtained in time O(n6) (with n the number of vertices) by applying Megiddo’s theorem to Floyd’s algorithm; but Floyd’s algorithm has the structure that in each of n phases, we only need the n2 values computed in the last phase, so by working more carefully and by using  the inner A and binary search on n2 potential critical values in each phase, we get an overall runtime of O(n(n2log(n2) + log(n2)n3)) = O(n4log n).

Parametric Flows

My interest in these parametric problems arises from the lectures that Tom McCormick gave at the aforementioned summer school. He talked about the parametric max-flow problem2. For parametric flow, a result of Carstensen (1983) shows that the graph of OPT(λ) can have exponentially many breakpoints (as a piecewise linear function). Thus the following result of Gallo-Grigoriadis-Tarjan, which Tom showed, is interesting:

Theorem (G-G-T 1989). For a special case of parametric max-flow, where the dependence on λ is positive in δout(s), negative in δin(t), and zero elsewhere, then

  1. the graph of OPT(λ) consists of at most n linear segments;
  2. and, we can adapt several known max-flow algorithms (let their time complexity be T) to compute the entire graph of OPT(λ) in time O(T).

This is much better than the generic O(T2) bound from Megiddo’s approach: going from the original problem to the parametric one only cost a constant factor of time! This G-G-T result is fairly complex and combines several interesting algorithmic ideas. Valid choices of algorithm include those of Karzanov, Goldberg-Tarjan, or Hochbaum with respective time complexities T=O(n3), O(nm log (n2/m)), O(nm log n), where m=|E|.

I asked a question at the course: what if we don’t have the special G-G-T structure, but we want to compute maxλ OPT(λ)? (This is natural: it’s the maximum flow value if we can both pick our choice of λ and our choice of flow.) It follows from a very general method of Tardos (1986) for linear programming that there is a high-degree polynomial-time combinatorial algorithm for this problem. But it turns out (as profs at the summer school suggested) that we get something more reasonable by using a variant of Megiddo’s method. Instead of searching for the λ with OPT(λ)=0, we (roughly speaking) search for the λ where the derivative of OPT(λ) is zero. (Recall OPT(λ) is concave and piecewise linear.) That is to say, when a comparison has λ* as its critical value, run an inner max-flow algorithm on the values λ=λ*, λ*+ε, λ*-ε to find out the slope of OPT(λ) to the left and right of λ*, e.g. if these slopes are both positive we should thereafter assume λ>λ*. So in O(T2) time, for any combinatorial max-flow algorithm, we get the desired maxλ OPT(λ).

As in Megiddo’s paper, the next natural question is whether this generic approach can be accelerated: for example, can we get a time complexity more like O(n5) instead of O(n6)?


Update: Indeed! I started reading Megiddo’s follow-up paper (1983) and it gives an idea to accomplish this and several other ideas. The key insight is the same as for the algorithm I sketched above for accelerating “parametric Floyd’s algorithm.” Remarkably, this approach (of reducing the number of calls to the inner algorithm) works whenever we have a good parallel (PRAM) algorithm for the problem. Specifically, we simulate a PRAM algorithm on a serial machine; but if C comparisons are done in parallel on a single time step, binary search allows us to reduce the number of calls to the inner algorithm to log C. So, we have:

Theorem (Megiddo 1983). Suppose there is a strongly combinatorial parallel algorithm for solving some minimization problem, which runs in T time steps on P processors, and a serial (i.e., non-parallel) algorithm running in time S. Then its min-ratio version can be solved in time O(T*log P*S) by a combinatorial algorithm.

Megiddo shows, for example, in parametric flow we can find λ such that OPT(λ)=x in Õ(min{n5,mn3}) time, using parallel max-flow algorithms with T = Õ(n2) and P = O(n). The same approach works for my question too. With one more idea, this approach also accelerates the parametric Floyd’s algorithm (min-ratio cycle) to Õ(n3) time.

Future reading: Fleischer and Iwata’s submodular function minimization algorithm (2003) is said to use many techniques related to parametric max-flow, in a more general setting.

1: Actually, in the definition of combinatorial we also require that the algorithm does not blow up the length of intermediate computations by more than a polynomial factor in bit-length. And, instead of “combinatorial with polynomial run-time” we usually say “strongly polynomial run-time.” See Schrijver’s 3-book set (2003) as a reference.

2: A nice exercise is to show that, given two cost-functions, find a flow F maximizing c1(F)/c2(F) always has an optimum which is a cycle or a path. But surprisingly, given two capacity-functions, finding a cut C minimizing c1(C)/c2(C) is not as simple. The latter is an example of a parametric flow problem that is actually useful in practice.

Deconstructing the Pentagon

December 11, 2010 1 comment

No, it’s not about the latest Wikileaks scandal. Thursday at EPFL, the last Bernoulli guest lecture of the discrete/computational geometry theme semester was given by Miklos Laczkovich, on geometrical tilings. (I posted about another of the guest lectures; the remaining 2 talks, by Gil Kalai and Micha Sharir, were excellent too.)

Miklos’ talk was a survey with lots of pictures and just brief tastes of some proofs, which made it very entertaining. I liked that algebraic field isomorphisms came out of nowhere, as I’ll elaborate below.

The general theme of the talk was about decomposing shapes into smaller shapes. It started with the following classical puzzle. Note that this L-shaped figure can be decomposed into 3 identical pieces. Find a way to decompose it into 4 identical pieces. (Click here for the solution)

He moved on next to a hard result that sounds obvious, then another result which sounds obviously false, given the first one. (In the whole post, I only conider dissections into a finite number of pieces.)

Theorem (Max Dehn, 1903). A rectangle with side lengths P and Q can be dissected into squares (possibly of different sizes) if and only if P/Q is a rational number.

It’s obvious that when P/Q is rational, a tiling exists, like the chocolate bar shown at left (for P/Q = 3/2). But if P/Q is irrational, the proof that no tiling exists is quite tricky.

Given Dehn’s theorem, many people would immediately agree that the following dual theorem “should be” true:

A square can be dissected into rectangles (possibly of different sizes) with side lengths in ratio P/Q, if and only if P/Q is a rational number.

But this is false! There are irrational ratios P/Q yielding a dissection. For an example, we draw the diagram shown on the right-hand side (in blue), and hope that we can pick the side lengths to all be similar rectangles. Does this work? If the square has side length 1, we can deduce some side lengths of the rectangles (1/2 and 1/3). Then introduce x to stand for the ratio of the “long” side of the rectangle to the short one.

We can label all the lengths as shown in the figure below:

Then, we see that we must have x/2 + 1/3x = 1, which is a quadratic equation whose solution is x = 1±√(1/3), which is indeed irrational. The characterization of all ratios is the following:

Theorem (Laczkovich-Szekeres 1990):  A square can be dissected into rectangles (possibly of different sizes) with side lengths in ratio P/Q, if and only if P/Q is an algebraic number, and all of its algebraic conjugates* have positive real part.

*: This means all the complex roots of the minimal rational polynomial.

If I understood the idea in the talk correctly, the proof uses a nice idea about morphisms between finite field extensions Q(x) of the rational numbers, roughly saying that we can replace x with any of its conjugates, and still have a rectangle tiling. The same methodology gives several other optimal results about dissections. He also mentioned the following nice open problem:

Open problem. Can the regular pentagon be dissected into 36°-36°-108° triangles (possibly of different size)?

The algebraic requirements that are always necessary and “often” sufficient to get a drawing, are true here. So if no drawing exists, proving this fact seems to require new ideas.

Categories: b.k.a.t., math

Better Know a Theorem: Necklace Splitting

October 18, 2010 6 comments

Pursuant to my previous post on alternative career paths for math students, I should mention that there is a vibrant range of careers on stealing and splitting necklaces, and proving related theorems. For example, a pair of drunken post-docs break in to the Swatch store and steal a necklace with k different types of gems on it.

Although drunk, the two sticky-fingered postdocs are fair and wish to split the necklace evenly; since they don’t remember the monetary values of all the different kinds of gems, their goal is that for each of the k types of gems, half (rounding up or down) should go to each robber.

In order to split the gems, the necklace must be cut in several places; each robber takes some of the pieces home. How many cuts are needed? It depends on how the gems are arranged, but the following theorem shows what is the worst case.

Theorem (Goldberg & West 1985). At most k cuts are needed.

(Above and later, we model the necklace as having a single clasp that opens and does not need to be cut, so it is like a line rather than a closed circle.) Moreover, if there is a larger group of post-docs,

Theorem (Alon 1987). To split the necklace evenly amongst n thieves, at most (n-1)k cuts are needed.

I became interested in this problem as it turns out to be very similar to a problem that Rico Zenklusen mentioned in a talk about an unrelated algorithmic problem; here is the interesting open problem that showed up.

An Open Question about Asymmetric Robbers. Suppose that we again have two robbers, one of whom did more work than the other, and so the robbers desire to split the gems in some ratio P:Q other than 1:1. How many cuts are needed?

(In this setting, let us assume gems are infinitely divisible, so for example you can cut a diamond into two halves or ratio 30%-70%. Then we seek all gem types divided exactly in ratio P:Q, without any rounding necessary. This makes stating the facts below more natural.)

  • The case k=2 is solved: only 2 cuts are needed, the same as the 1:1 case. The paper of Grandoni & Zenklusen linked above gives one proof, and Alon-West 1986 gives another proof.
  • For larger k, if P and Q are integral, then by Alon’s theorem we can use (P+Q-1)k cuts into P+Q equal-valued sets, and then afterwards give P to robber 1 and Q to robber 2.
  • A somewhat fancier use of Alon’s theorem shows that something like 8 * ln(P+Q)k cuts is sufficient when P and Q are integral.
  • An example of Dömötör Pálvölgyi shows that for ratio 2:1, the optimal number of cuts is approximately 4k/3.

There is a lot of space between the known upper and lower bounds: it might be true that 2k cuts is enough for any ratio, or it might be that O(ln(P+Q))k is optimal and irrational ratios are impossible for all k>2.

The methods used to prove the theorems of Goldberg-West and Alon are fascinating; I needed to read a lot of topology to understand the ideas. A nice book of Jiří Matoušek has been a great digestive aid for me. In essence, one reformulates the necklace problems as being about a symmetry-preserving map from one symmetric space to another symmetric space.

In trying to apply these techniques to arbitrary P:Q, a major problem is a lack of symmetry: dividing equally amongst n robbers means that the set of possible “splitting configurations” automatically inherits an n-fold symmetry, but for a ratio like π:1, this is not true. Later this week Imre Bárány will give a talk here at EPFL about some related problems.

Categories: b.k.a.t., math

Pfaffian Trickery

April 26, 2010 1 comment

This week at EPFL I am presenting a recent paper by Andreas Björklund, which I found to be pretty elegant. It uses algebraic combinatorics to get a faster algorithm for a combinatorial problem: given a bunch of k-element subsets of {1, 2, …, kn},  determine if one can find a “perfect matching” of n subsets which are disjoint (and thus cover every point exactly once).

Björklund uses a generalization of the Pfaffian of a matrix, and the Pfaffian itself is awesome enough that you should know about it. The heart of what is so cool is the following. Suppose we have a matrix of integers which is skew-symmetric: the entry at row i and column j is the negative of the one at row j and column i. Then its determinant is a perfect square. For example,

\displaystyle \det\begin{pmatrix}0 & 3 & -4& 5 \\ -3& 0& 2& -1 \\ 4 &-2& 0 &-6 \\ -5& 1& 6 &0\end{pmatrix}

equals 144, which is indeed a square number.

A similar thing holds for a skew-symmetric matrix of variables:

\displaystyle \det\begin{pmatrix}0& a& b& c \\ -a& 0& d& e \\ -b& -d &0 &f \\ -c& -e& -f& 0\end{pmatrix} = (af - be + cd)^2

The determinant is the square of a polynomial in the variable which is called the Pfaffian (here the Pfaffian is afbe + cd).

Even better, the terms of the Pfaffian are always given in a nice combinatorial way that enumerates perfect matchings. Note that in the skew-symmetric matrix above, the variable a appears in row-column positions (1, 2) and (2, 1). Let’s say that we associate the unordered pair {1, 2} with the variable a; similarly, each unordered pair {i, j} of elements from {1, 2, …, 2n} is associated with some variable, which we denote by x{i, j}.

Theorem. The Pfaffian equals

\displaystyle \sum_M \mathrm{sgn}(M) \prod_{\{i, j\} \in M} x_{\{i, j\}}

where M ranges over all perfect matchings of {1, 2, …, 2n} and sgn(M) is +1 or -1 for each M.

There is a nice combinatorial proof of this theorem. More precisely, in a proof we would define the right rule to compute sgn(M) and then show that the square of the above expression equals the determinant. The book Algebraic Combinatorics by Godsil, available on google books, seems to have it with a minor bug.

What Björklund did in his paper is find a cousin of this theorem, changed in two ways: first, we work modulo 2 (more precisely in a field of characteristic 2, so the signs disappear) and second, we insert diagonal entries into the matrix, which breaks skew-symmetry. The determinant is no longer a perfect square but it can be expressed using a formula much like the Theorem above; Björklund’s proof is short and self-contained because the characteristic 2 really simplifies things in an essential way.

This all is useful in order to compute a generating function over the field — the generating function enumerates some variant of perfect matchings that are allowed to include the loops corresponding to the diagonal variables — which is used with inclusion-exclusion to determine whether any perfect k-set matching exists. The paper is sort of a tour de force, also using randomization to get the desired result!

Hasenpfeffer

How do you pronounce Pfaffian? If you have watched enough Bugs Bunny & Elmer Fudd you should know: like Hasenpfeffer, the P is silent.

Categories: algorithms, b.k.a.t., food

Better Know a Theorem: Arrow’s Voting Theorem

February 2, 2010 4 comments

The following is a book review I wrote thanks to a blog-based call for volunteers (technically I got paid with a free book). One of the main themes in the book is a theorem known colloquially as “Arrow’s Impossibility Theorem” for voting, which says that every voting system fails one of three simple axioms: non-dictatorship, agreement with unanimous pairwise comparisons, and “irrelevance of independent alternatives.” However, after reading the book, my view became that the last axiom is much too strong to expect any reasonable system to have. Also, this is the first math book I have ever read to involve Michele Kwan.

The review appeared in the SIGACT NEWS Book Review Column.

Review of Decisions and Elections: Explaining the Unexpected by Donald G. Saari

Reviewed by David Pritchard

In Decisions and Elections: Explaining the Unexpected, Donald Saari gives a highly accessible introductory tour of decision theory: the study of mechanisms that amalgamate several separate preferences into a single preference (for example elections, in which voters’ preferences over candidates are combined). The book has an excellent style that combines a deep understanding of mathematics, good pedagogy, and lighthearted prose; Saari provides compelling arguments, high-level insights, and interesting applications.

The book combines the nice flavors of paradoxes and puzzles, important real-world applications, and a good sense of humor. There are a number of examples drawn from engineering and group dynamics which make interesting food for thought. One example is about a non-voting chair of a meeting: even if she has no vote and thus seems powerless, she can tremendously determine the outcome of an election (page 91). Another example (page 54) is that in an engineering company, when solving a huge problem like the design of an aircraft, divide-and-conquer/decentralization is not always a good idea, as it can introduce new problems due to having to combine the answers to the various parts.

The book is aimed at undergraduates and professionals according to its introduction, and it seems it would serve this purpose well. I think many other people (e.g. researchers from related fields) would appreciate the book as an interesting, relatively quick read. The pedagogical style mostly eschews formality except where it is needed. Saari introduces new terms only after providing enough background to warrant them; I wish all mathematics authors had this skill! An example of the level of discourse is that Saari takes the time to explain a proof by contradiction (“If the Moon were Purple,” page 47). Thus, for example, I think that the book would make nice secondary reading material for a game theory class aimed at people from other disciplines. I would urgently recommend any future teachers/researchers in decision theory to read this book.

Summary of Contents

The central formal result in the book is Arrow’s theorem (1951): any group ranking mechanism with at least 3 candidates is either a dictatorship, fails to respect unanimous pairwise rankings, or allows some pairwise rankings to interact with others, i.e. fails to have “Independence of Irrelevant Alternatives.” Saari gives a full proof, but defers it to the appendix due to its technical nature. Literature references also occur only in the appendix, giving the book high readability but also giving the interested reader pointers to more details. Proofs are usually sketched, and sometimes omitted. The flip-side of this approach is that sometimes the reader is not precisely sure what Saari means; although usually, ambiguities either can be clarified with a little effort of the reader, or are made clear later in the book.

Saari spends a considerable amount of time looking at the ramifications of Arrow’s theorem, and discussing variants. Over the course of the book, one accumulates the view that Arrow’s theorem should be thought of as saying that Independence of Irrelevant Alternatives (IIA) is too strong to expect from a reasonable voting system. Here are two nice results in this vein: first, even with a weaker version of IIA where we consider triples instead of pairs, Arrow’s theorem still holds (Section 5.4); second, if we replace IIA by a more detailed variant “Intensity of Binary Independence,” then Arrow’s theorem fails to hold (Section 6.5). The book acts in part as an introduction to some of Saari’s research from the last two decades, e.g. both of the aforementioned results come from publications of Saari and coauthors.

Sen’s theorem (1970) is discussed at length. In a decision mechanism, a given voter is “decisive” over two given alternatives if the mechanism’s ranking of those two always agrees with that voter. Then Sen’s theorem says that for any ranking mechanism with at least two decisive voters (over any two pairs of alternatives), that mechanism will sometimes disagree with a pairwise ranking agreed on unanimously by all voters. Saari phrases this differently: if unanimous pairwise rankings are to be respected, then the mechanism sometimes produces cyclic preferences (i.e., not a ranking). Saari gives a number of examples which illustrate applications — sharing an apartment, book censorship, and a stock-buying club — which at same time illustrate the most important ideas in the full proof of Sen’s theorem.

An explicit mantra in the book has to do with what goes wrong in the standard impossibility results of Arrow, Sen, and others. Saari notes that in many cases, a potential explanation is that we are disregarding valuable information: e.g. in Arrow’s theorem, we essentially throw away the voters’ full preference lists and focus only on pairwise rankings. We read on page 97 that “suspect outcomes can occur should the decision procedure ignore portions of the available information.” Saari notes more concretely that even irrational voters with cyclic preferences can submit pairwise rankings, but if we want to think rationally about a good voting procedure, we should certainly demand that the voters be rational!

Saari pinpoints a certain sub-structure, a “Condorcet tuple,” as the source of many problems. E.g. in a Condorcet triple, we have three players, one of whom prefers outcome A to outcome B to outcome C, another who prefers B to C to A, and a third who prefers C to A to B; then any global ordering upsets a majority of the voters in at least one pairwise comparison. Small modifications to this sort of example are fodder for the book, and Saari describes a certain result saying that indeed all impossibility results of a certain type come from modifications of Condorcet tuples, although I did not find his description of this result clear enough to properly parse.

If one were to read Saari’s book with the goal of choosing a voting system to put in to practice, the reader would be very persuaded to use the Borda Count: each voter ranks all n candidates and gives n points to their top-ranked one, n-1 to their second-ranked one, and so forth; then we order the candidates by total points. This avoids some problems of the usual plurality vote (where each voter contributes just one point to their favorite candidate) and has several other nice properties, for example the Borda Count is the only ranking mechanism which is never backwards in all pairwise comparisons, i.e. it ranks at least one pair of candidates in accordance with the majority of voters (Section 5.4.3). On page 193 Saari mentions a few other natural conditions that together get “close to creating an axiomatic representation of the Borda Count” although he later (in Section 8.3.2) tempers this by pointing out that axiomatic representations alone do not warrant a given protocol is best. For the election-designer, Saari’s book has no explicit advice (unfortunately) on what to do if more than one candidate should be elected, although there is certainly lots of implicit advice in the book. I should also note that, according to Wikipedia, the Borda Count is generally not used in practice, but replaced by other mechanisms with better properties; it would have been nice if Saari attempted to describe what sort of sophisticated voting methods are used in practice and what distinguishes them from the Borda Count.

Style and My Opinion

The book is sprinkled with a lot of specific small abstract examples, and also relevant real-world examples steeped in modern culture. The initial abstract examples in the first couple of chapters illustrate that even for a fixed set of preferences, different choices of reasonable-looking voting procedures can yield drastically different results; and often, the results of a vote seem clearly not to agree with the intent of the voters. Some of the real-world examples include Ross Perot’s Reform Party, the election of Jesse “The Body” Ventura, and scoring in professional figure skating (where IIA fails to hold). It is quite helpful to see these connections between theory and real life. The ubiquitous Prisoner’s Dilemma is given an atypical presentation — usually notions from strategic games are used but since this is the only strategic game in the book it would not make sense. I liked the Prisoner’s Dilemma example of two lanes merging on the highway, which I had not seen before. (Here two drivers may either be polite or a jerk, where each driver gets home faster as a jerk regardless of the other drivers, but two polite drivers get home faster than two jerks.)

I have a few general criticisms of the book. The lack of formality is a promising approach but sometimes important terms are used before they are introduced or informally defined.  E.g. I was several times confused whether a “decision procedure” or “choice procedure” is supposed to give a ranking of candidates, or just a (possibly cyclic) list of pairwise rankings. Smoothing problems like this would save readers time. Likewise, there are a number of grammatical and numerical bugs which slow the reader down. At a higher level, I found the numerous small examples in the first couple chapters to lack structure, in the sense that I didn’t quite see how each one differed from the other, and so they appeared repetitive. Likewise, the discussion and corollaries surrounding the discussion of Arrow’s theorem in Chapter 2 seemed repetitive. Section 7.2 mentioned some nice theorems involving “strong” rankings; but since this term was never even precisely defined informally, that section only read like a mathematical parable, taking the details on faith.

The book sometimes has too many “scare quotes” in a small section. Let me take one funny quote from page 185 as an example:

All this “coordinate component” and “fractional voter” tech talk makes Theorem 10 seem impractical and restricted only to those weird “number-crunchers” whose sense of a “hot Saturday night” is checking new web features while designing a C++ program. Fortunately, this is not the case. What saves us from this nerdy fate is [a property of the Borda Count.]

To conclude I will mention the top reasons that I liked the book. It shed light on fundamental results that I never fully understood before (Arrow’s theorem, Sen’s theorem, Black’s single-peaked condition). I learned of new interesting fundamental theorems (e.g. the extensions of Arrow’s theorem to quasi-transitive or restricted preferences in Chapter 6, where a dictator is respectively replaced with an oligarchy or partial dictator). Its mantras gave new high-level perspectives which would be useful for guiding research or in practice. Finally, I enjoyed the folksy writing style; if you appreciate real-life (or plausible-fake) stories in mathematical discussion, then you probably would enjoy this book.

Categories: b.k.a.t., game theory
Follow

Get every new post delivered to your Inbox.