### Saarbrücken

16Nov10

Last weekend I went to a two-day workshop on combinatorics at the Max Planck Institut für Informatik. The place is sort of a European mecca for computer science: it hosts a gigantic collection of post-docs and students, but until recently had only one professor.

Historical Figure

In the city, nearly every restaurant has a logo of this man’s silhouette (see right). Does this indicate the founder of the town? Some historic figure of great repute? Yes, a local told me it is the picture of the founder of the Karlsberg brewery.

There were four invited talks, one of which was by Daniela Kühn. She gave a survey of a remarkable collection of her results with Deryk Osthus and other co-authors. In brief, there are a whole bunch of open questions about directed graphs from the 1970s, and her results prove that they are true up to any desired tolerance ε, provided the graphs are large enough. The results rely on the Szemerédi regularity lemma, and the method seems only to work on dense graphs.

Breakfast

As for food, there were tasty things to be had at the Stiefel brewery where the initial get-together was held, but on the last morning I ventured into town to find something even better. The cinnamon pastry shown (which mysteriously had some bites missing by the time I remembered to take a picture of it) came from the Schubert cafe. The Saarlander in my group at EPFL confirms the tastiness of their goods.

Finally, here is a question raised in the talk of Carola Winzen. Alice selects an n-bit binary string x uniformly at random. Then Bob gets to make queries of the form “what is the dot product of x and q?” for any binary string q (so the answer is in {0, 1, …, n}), and he wants to determine x using as few queries as possible. How many queries does he need (in expectation w.r.t. x)? One can easily show a lower bound of ~ n/log n, while there is a deterministic upper bound of n. One can show that even if Bob’s queries are totally random, they uniquely pin down x after ~ n/log n queries. However, the proof doesn’t give Bob an algorithm running in time poly(n). Is there an efficient algorithm for Bob using O(n/log n) queries?

**Update:** Yes! The earliest likely candidate is a paper in the Publications of the Mathematical Institute of the Hungarian Academy of Sciences, but I have not tried to dig it up. I did read the slightly later paper “Determination of a subset from certain combinatorial properties”, which gives a deterministic poly-time solution using recursion.

#### 12 Responses to “Saarbrücken”

1. Very nice question. Let me rephrase it. There is a base set with N elements, and we want to determine a subset X of it. We can ask questions of the form “how big is the intersection of Q and X?” Do we need N such questions, or is N/log N enough?
Or the inverse way: if we can ask k questions, then what is the biggest set that we can query [using poly(k) computational power -dave]? k or k log k?

• I edited your reply a bit (including lumberjack operations, i.e. deleting unneeded logs). You can show that O(N/log N) questions are enough IF we allow Bob to take exp(N) time afterwards – the computation is sort of long but straightforward. More specifically, random questions w.h.p. will pin down the solution uniquely, and Bob could find it by searching all 2^N possibilities. But what if Bob doesn’t have so much time on his hands?
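As a toy illustration of that exponential-time strategy (my own sketch, not from the original discussion; the function names and parameters are made up): ask a few random 0/1 queries, then search all 2^N candidates for the ones consistent with the answers.

```python
import itertools
import random

def answers_to(x, queries):
    """Integer inner products <q, x> for each query q."""
    return [sum(qi * xi for qi, xi in zip(q, x)) for q in queries]

def brute_force_candidates(n, queries, answers):
    """All strings in {0,1}^n consistent with the observed answers."""
    return [x for x in itertools.product((0, 1), repeat=n)
            if answers_to(x, queries) == answers]

random.seed(0)
n = 10
x = tuple(random.randint(0, 1) for _ in range(n))
# A handful of random 0/1 queries; w.h.p. roughly c*n/log n of them
# already pin x down uniquely, but nothing below relies on uniqueness.
queries = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(6)]
candidates = brute_force_candidates(n, queries, answers_to(x, queries))
print(len(candidates), x in candidates)
```

The point is only that the decoding step, not the querying, is what costs exponential time here.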

2. http://www.tau.ac.il/~nogaa/PDFS/av1.pdf

3. So forget (delete if you can) all my previous comments. For the result using random sets, is the worst case bounded by n/log n, or only the expectation? I would be interested in proving the first version if the result only proves the second, but less interested in coming up with a poly algorithm. Btw, I think that in the randomized communication complexity model, deciding whether the strings of A and B are the same is also known to be solvable in log n steps only using random sets, and I have never heard of a constructive proof achieving the same bound.

4. I tried some stuff by hand/Maple and posit that for n=4 you really need 4 queries; for n=5 I believe there is an adaptive strategy that works in 4 queries but no nonadaptive one; and for n=6 there seems to be at least one nonadaptive 5-query solution:

111111
111000
110100
110010
100001
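(For what it’s worth, this claim is easy to check mechanically. The following quick script is my own, not part of the original comment: the five queries work iff all 64 subsets of {1,…,6} get distinct answer vectors.)

```python
from itertools import product

# The five queries proposed above, as 0/1 vectors of length 6.
queries = [
    (1, 1, 1, 1, 1, 1),
    (1, 1, 1, 0, 0, 0),
    (1, 1, 0, 1, 0, 0),
    (1, 1, 0, 0, 1, 0),
    (1, 0, 0, 0, 0, 1),
]

# Map every x in {0,1}^6 to its answer vector; the scheme works iff
# all 64 answer vectors are distinct.
answers = {}
for x in product((0, 1), repeat=6):
    a = tuple(sum(q[i] * x[i] for i in range(6)) for q in queries)
    answers.setdefault(a, []).append(x)

collisions = [xs for xs in answers.values() if len(xs) > 1]
print("distinct answer vectors:", len(answers), "collisions:", len(collisions))
```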

5. So in Carola’s proof, is n/log n an expected number of questions or worst case?

• I think the strongest claim you can make easily is that the probability of Bob getting the right answer is 1/exp(Θ(n)) until Θ(n/log n) questions have been asked. (So, expected.) I like the following simple proof idea: initially there are n bits of entropy unknown, and since each reply is in {0, 1, …, n}, Bob gains at most lg(n+1) bits of entropy each time, so he can’t hope to be done in fewer than n/lg(n+1) guesses.
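Spelled out as a counting bound (my paraphrase of the same argument): with m queries, each answered by a value in {0, 1, …, n}, Bob sees at most (n+1)^m distinct transcripts, and these must separate all 2^n strings:

```latex
(n+1)^m \;\ge\; 2^n
\quad\Longrightarrow\quad
m \;\ge\; \frac{n}{\log_2 (n+1)} .
```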

6. Yes, this is a standard argument, known as the information-theoretic lower bound.

But then there is still the question of whether the right answer can be determined deterministically, in the worst case, with O(n/log n) queries.

• Yes, that’s true. I don’t know if this exists even when Bob can use exponential computing time in addition to O(n/log n) queries — a “standard argument” I am repeatedly too lazy to write down here is too weak for this. Really, the best that I know (by repeating the thing above) is that ceil(5n/6) queries are enough.
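A sketch of what repeating the thing above means (my own code, not from the thread): apply the 5-query scheme for n=6 from comment 4 to each full block of 6 bits, and ask one singleton query per leftover bit, for ceil(5n/6) queries in total.

```python
from itertools import product

# The 5-query detecting scheme for a block of 6 bits (from comment 4 above).
BLOCK = [
    (1, 1, 1, 1, 1, 1),
    (1, 1, 1, 0, 0, 0),
    (1, 1, 0, 1, 0, 0),
    (1, 1, 0, 0, 1, 0),
    (1, 0, 0, 0, 0, 1),
]

def block_queries(n):
    """5 queries per full block of 6 bits, one singleton query per leftover bit."""
    queries = []
    full = n - n % 6
    for start in range(0, full, 6):
        for q in BLOCK:
            padded = [0] * n
            padded[start:start + 6] = q
            queries.append(tuple(padded))
    for i in range(full, n):
        padded = [0] * n
        padded[i] = 1
        queries.append(tuple(padded))
    return queries

def decode(n, answers):
    """Recover x block by block; each full block is pinned down by its 5 answers."""
    full = n - n % 6
    bits = []
    for b in range(full // 6):
        a = tuple(answers[5 * b + j] for j in range(5))
        # Detecting property: exactly one 6-bit block matches these answers.
        block = next(x for x in product((0, 1), repeat=6)
                     if tuple(sum(qi * xi for qi, xi in zip(q, x))
                              for q in BLOCK) == a)
        bits.extend(block)
    bits.extend(answers[5 * (full // 6):])  # singleton answers are the bits
    return tuple(bits)

x = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1)  # n = 13
queries = block_queries(13)
answers = [sum(qi * xi for qi, xi in zip(q, x)) for q in queries]
print(len(queries), decode(13, answers) == x)
```

For n = 13 this asks 11 queries, matching ceil(5·13/6) = 11.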

7. I asked Katona yesterday; he has an amazing memory and could recall this paper, which seems to solve the problem:

MR0168488 (29 #5750)
Lindström, Bernt
On a combinatory detection problem. I. (Russian summary)
Magyar Tud. Akad. Mat. Kutató Int. Közl. 9 (1964), 195–207.

The present paper is an outgrowth of a paper by H. S. Shapiro and S. Söderberg on a coin-weighing problem [Amer. Math. Monthly 70 (1963), 1066–1070]. A family of subsets $T_1, T_2, \cdots, T_m$ of a given set $S$ of $|S| = n$ elements is a detecting family for $S$ if each subset $M$ of $S$ is uniquely determined by the $m$ numbers $|M \cap T_i|\ (i = 1, \cdots, m)$. The basic problem is to evaluate $f(n) = \min m$ over the class of all detecting families for $S$. It is easy to verify that $f(4) = 3$ and $f(5) = 4$, but the evaluation of $f(n)$ for general $n$ appears hopelessly difficult. P. Erdős and A. Rényi, as well as B. Gordon, L. Moser, and the author, have shown that $$\liminf_{n \to \infty} \frac{f(n) \log n}{n} \geq \log 4. \tag{1}$$
The main result of this paper is that $$\limsup_{n \to \infty} \frac{f(n) \log n}{n} \leq \log 4, \tag{2}$$
whence $$\lim_{n \to \infty} \frac{f(n) \log n}{n} = \log 4. \tag{3}$$
This confirms a conjecture of Erdős and Rényi on the existence of the limit. A critical theorem in the derivation of (2) is the inequality $$f(k 2^{k-1}) \leq 2^k - 1 \quad (k = 2, 3, \cdots), \tag{4}$$
and this is derived by the construction of a certain “detecting matrix”. The paper also contains another derivation of (1) based on information theory. The paper concludes by developing an analogous theory for a related detection problem of Erdős and Rényi.
Reviewed by H. J. Ryser

• That is excellent! I googled a bit to try to see if it’s available, and while I didn’t find it exactly, there are a few other resources that might have the same result:
http://www.jstor.org/stable/2975353

I would like to eventually check if the ‘weighing scheme’ is indeed polynomial time (and how it works). I also see now that for n=4 the following scheme works:
1110
1101
1011
whereas I had only tried schemes using 1111 as the first query, thinking some greedy notion of maximizing information in the first query would be best.
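(A quick check of mine, not from the thread, confirming this scheme: all 16 subsets of a 4-element set get distinct answer vectors under these three queries, matching f(4) = 3 from the review above.)

```python
from itertools import product

queries = [(1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1)]

# Collect the answer vector for every x in {0,1}^4; the scheme works
# iff all 16 answer vectors are distinct.
answers = {tuple(sum(qi * xi for qi, xi in zip(q, x)) for q in queries)
           for x in product((0, 1), repeat=4)}
print(len(answers))  # 16
```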

8. Well, to be honest, I lost my interest a bit at this point; digging up unavailable papers is not that much fun, especially knowing that the main problem (of my interest) is already solved.