The longest-running project I’ve ever been involved with just turned 10! Computer Science Circles (https://cscircles.cemc.uwaterloo.ca) is a website that teaches introductory programming via Python. It has automatically-graded programming exercises, and in addition, students and teachers can register together, using the site as a place to ask/answer questions and track progress.

The ruminations and brainstorming for the website began in 2009. The name is similar to Math Circles, though Troy also justified it thusly:

I imagined the circles perhaps overlapping in some Venn-like way: doing this module in language X leads to a similar language in language Y. Or maybe they are even touching at various points: go around the Python Circle, and it connects at other points.

Whatever curvature we obtained, the project worked pretty well. At the time of writing, 1,993,555 problems have been solved on the site, with the most popular problem solved by 58,298 registered accounts. (And 1601 accounts have solved the hardest problem.)

We finally decided to ditch our original webserver this year. To that end, I thought some comparisons would be apt:

2010-2011 | 2019 | |
---|---|---|

What happened: | Site launched! (First users Sept 2011) | New server! |

The WordPress: | 2.9.2 “Carmen McRae” (Installed May 20, 2010) | 5.3 “Rahsaan Roland Kirk” |

The hot newness: | Python 3 | WordPress upgrades with a button click |

The crufty oldness: | Python 2 | Role Scoper, a plugin last updated 3 years ago that I had to delete |

OS | CentOS 5.x (too obsolete for recent PHP) | Ubuntu 18.04 |

Localizations | English, French, … | …, German, Dutch, Chinese, Lithuanian |

Troy, Sandy and I still answer questions from students on the site. It definitely has been helpful to split the load three ways, but the number of questions was never aggressively overwhelming. There are multiple places online where answers are posted and I wonder to what extent that contributed 🙂

Probably the hardest single maintenance issue over time was when we wanted to add HTTPS to the old server. The awesome letsencrypt.org project gave us a way to do it—sort of—hacking around the incompatibly old PHP version to somehow run the acme certbot. Anyway. The overall construction and maintenance of everything added up to about 2k commits in total on github.

In addition to a paper and a couple of posters on the site itself, the most interesting outcomes to me have been:

- visiting a jail in Penetanguishene where they cloned the open-source version to teach this without internet access
- a 2015 paper on frequency distributions of programming errors, using the CS Circles submissions corpus
- a 2016 paper with educators at a Serbian university on extensions they added to their fork of the site

What do I hope to accomplish in the next 10 years of Computer Science Circles?

- I’ve been using git subrepositories for ten years, and I still have no idea how they work and how to properly maintain them… that seems like worth learning since I’ve broken things more than once as a result.
- New content: maybe idiomatic Python data structures such as dictionaries?
- More localizations. We’ve had multiple false starts on different languages and it would be nice to get another big one out the door (e.g. Spanish).

Last but not least I owe a debt to all the free software on which the site was built, including Linux, Python, WordPress, CodeMirror, Polylang, and FlexiGrid; Mooshak, which we built our safeexec from; and Philip Guo and Peter Wentworth who built the original visualizer and its Python 3 port.

Filed under: computer science, education | Leave a Comment

### A Tale of Two Boxes

Back in New Jersey, we subscribed to the Great Road Farm CSA service for a year. Since we were basically in the middle of a bunch of farmland, it worked out great and I got some keeper recipes out of it like this Kale and White Bean soup. Moving forward a couple of years and on the one hand we currently live in the concrete jungle of Santa Monica, but on the other hand it’s California which is well-known for its produce.

I eventually got my vegetable game together and signed up for two CSAs here:

- Out Of The Box Collective, which came to Google Earth Day a couple of years ago
- Good Life Organics, which I found on Yelp

Though we cancelled these during our trip to Canada this year, when we came back I thought to take a more careful look at these. I remember some high and low points from the past:

- Good Life Organics gave me an awesome fresh herb bundle just before thanksgiving last year: sage, bay, rosemary and thyme. That ended up used in the turkey, the stuffing, the soup and in other goodies, which was great since buying them individually is usually an expensive waste.
- On the other hand, Good Life Organics also gave me a huge daikon radish back in February. Like, a one-pounder. Wasn’t very useful; one person can only eat so many bitter daikon pickles.

I’d signed up for the smallest box each service has to offer. With Out Of The Box this is $49, plus a $5 delivery fee, and with Good Life, it is $27, and we pick it up at a nearby address. In order to decide which one to stick with, I took some photos to document our most recent pair of boxes.

### Out Of The Box

For my delivery this month, I got:

- a few things that I grilled: 1 cantaloupe, 3 shishito peppers, 2 zucchini
- ~2 lbs tomatoes, some of which I grilled, and the rest of which became persian omelette
- the world’s saddest head of butter lettuce and mini-herb bundles
- a package of watercress and a box of grapes, mislabeled as seedless
- some staples: avocado, cucumber, garlic, lime, onion

Overall, they’ve had better boxes. When I tried to estimate the total cost compared to buying individually, it was a pretty high price.

### Good Life Organics

This box contained:

- 2 each of pluots, plums and nectarines
- a canary melon and a box of seedless grapes
- roma tomatoes and onion that became part of a pasta sauce
- a whole box of shishito peppers which were awesome sauteed
- an avocado which Hanna ate whole over the course of several meals
- a bunch of rainbow carrots

Drumroll please… we ended up cancelling our Out Of The Box subscription and decided to go with just Good Life Organics. I’m still not convinced it’s much cheaper than buying à la carte but at least it gets us eating more fruits + veggies and gives us some variety. I’ll have to try harder to find a good application for the next daikon they send me.

Filed under: food | Leave a Comment

### Family Tree Visualizer

I remember once, long ago in high school, there was some project where I had to look up and draw my family tree. Being the not-so-organized guy that I am, I lost that specific project many years ago. But since then, I also had a whole bunch of changes to the tree: I visited Iran and met all my spouse’s cousins, and we welcomed a new baby. The arrival of the little one gave me some motivation to take the tree I scribbled down about my in-laws and combine it, along with my parents and siblings’ info.

Who doesn’t love recursion? I thought it would therefore be a fun exercise to write a program to draw the tree. On top of which, I’m a lot less likely to lose the data again if I save it somewhere online. So in January I started making some super basic prototypes in Python. After some iterations, including one that I printed out and was about 8 feet wide, I uploaded the current web JavaScript version to GitHub this month.

Here’s what it looks like — you can access the interactive web interface at https://daveagp.github.io/family-tree:

This, of course, is the Simpsons. I figured it is maybe not a great idea to publish all the information about my family to a public place and that’s why I created a mock tree for the GitHub repository. On the other hand, in the course of looking up information on past relatives, I ran across geni.com which has exactly that: family tree information that various folks have posted publicly online (and in my case, I did actually find information on relatives some generations back). Long story short, the visualizer with own real family tree is online but behind a password, if you’re interested to see just email me to ask how to see it.

In January, I started with a very basic prototype which generated something like this:Its initial purpose was to help me figure out the input biographical record format and get an idea on what would be the simplest working layout algorithm. Fast forward a while to the past couple days, when I took the “finished” JavaScript version and tried to figure out what the heck my own code is doing. Recursion is complicated! I’m now pretty happy with the state of the code (https://github.com/daveagp/family-tree), having separated the styling, visibility, and geometric layout logic. The basic tree layout algorithm is this:

- The initial recursive call is to lay out the tree for some “root” user.
- There are “nodes” both for every user, and for every union (marriage/child-bearing couple).
- To lay out the tree for a node,
- first lay out trees for every node one step further from the root.
- then combine these trees, fitting them side by side and adding the node in question.

The current version creates very long horizontal edges leading to ancestors of the root (which you can see also in the prototype), though I think these are difficult to avoid if you also want to respect the left-to-right layout of children of each family as going from first-to-last born. It would be interesting to see what arises from other layout algorithms.

Filed under: algorithms, graphs, programming | 1 Comment

### Cinnamon Cocoa Coffee Cake

Coffee cake is one of my favourite foods to bake: it’s simple and delicious, and beats paying big bucks per slice at cafes. I had some cocoa lying around in my pantry and decided I should figure out a way to get it into a coffee cake.

However, recipes for this sort of thing seem to be sort of rare. So, I hodge-podged together these three recipes: 1 with cocoa filling, 1 with cocoa-walnut filling, and 1 with cocoa-walnut topping; this resulted in a cake with walnuts, cocoa and cinnamon both in the middle and on top. (Note: all those recipes are for a full bundt-sized cake. The one I made was for a simple ~9″-by-5″ loaf pan.)

Topping/filling:

- 2/3c brown sugar
- 3 tbsp dutch-process cocoa
- 2 tbsp cinnamon
- 1 cup walnuts, chopped
- heavy pinch of salt

Cakey goodness:

- 3/2c flour
- 1 tsp baking powder
- 1 tsp baking soda
- 1 tbsp cinnamon
- 1/4 tsp salt
- 3/8c butter, at room temperature
- 3/4c sugar
- 2 eggs
- splash of vanilla
- 3/4c whole-milk yogurt

Recipe:

- Preheat the oven to 350 degrees Fahrenheit.
- Mix the topping/filling ingredients all together.
- Mix the cake’s dry ingredients (flour, baking powder + soda, cinnamon, salt).
- Cream the butter and the sugar in a stand mixer. Once uniform, add the eggs one a time, then finally the vanilla.
- Add the dry mix and the yogurt in alternate batches to the stand mixer.
- Grease a loaf pan.
- Pour half the batter evenly in the pan, then half the topping/filling.
- Pour the rest of the batter on top, then the other half of the topping/filling.
- Bake for about 50 minutes.
- Cool about 15 minutes in the pan, then carefully remove and cool more. Put in on its side or on a wire rack so the bottom doesn’t get full of moisture.

## The Results

It came out pretty good overall, with some caveats about the topping. The nuts on top over-toasted a bit, becoming very roasty-tasting, though not burnt. Furthermore, the spices on top didn’t really have a way to fully adhere to the cake, and partially fell off at the end. Next time I would either add some more traditional stuff to the topping (butter, flour) to make it stick better, or smash the topping into the top of the batter. I’d also like to try using sour cream or buttermilk instead of yogurt, for extra tang (yogurt was the laziest option, as I had some already).

Filed under: food, recipes | Leave a Comment

### The Lazy Gourmet, Part 2

This year hasn’t been the first that I discovered the joy of lazy recipes — I posted about cold brew coffee and no-knead bread a while back. However, I’ve made more additions to my slacker recipe collection since then.

- Breakfast is a glorious thing and pancakes are delicious. But I have to admit that even with a relatively simple recipe, there’s still the extra work of getting buttermilk, and the time it takes to babysit the pancakes, pouring them out in batches and watching them for the right time to flip. Because of this, I felt relieved when I started making dutch babies instead, which are basically one giant pancake you cook in the oven and slice up to share. Plus, they are a lot more fluffy than any pancake I’ve ever made, and taste great even with just a simple topping of lemon juice and berries.
- I own a pie bird, I’ll admit it. I think I’ve used it exactly once. And the apple pie that contained it was pretty delicious, but it was an awful lot of work. There’s the fact that you need two crusts for top and bottom (or if you’re masochist, a lattice crust on top). There’s the need for a starch in the filling so that it properly gels. There’s pre-baking the bottom crust. The egg wash. And, that bird. So from these over-engineered origins I recently switched to making galettes instead. This still has the crust, but only on the bottom, with the sides folded up on the edges to prevent leakage, or as someone in my family observed, it’s kind of an apple pizza. And, I was happy to find a recipe by Jacques Pépin that doesn’t need any fancy stuff. (Alton Brown’s apple pie recipe, by way of contrast, uses a half-dozen things I never have bought other than to make a pie.)
- Back in the day, I liked to make fajitas. These are pretty delicious, but require a ton of time spent in front of the stovetop, checking on the meat, peppers, onions, mushrooms and sauteeing them all in batches, only to have them get cold by the time you eat anyway. I switched this out for a shredded chicken recipe, which braises the meat in some simple ingredients and gives you a tasty sauce at the same time.

The recipes all use a food processor/stick blender, so I guess I have to recommend them to achieve optimal laziness. Here are the basic recipes.

- Dutch Baby
- Preheat the oven to 375 degrees F; bring 1/2 c milk and 2 eggs out of the fridge to come up to room temperature.
- Put 1 tbsp butter in a 8-10″ cast iron skillet, and leave 10 min in the oven. Melt 2 more tbsp of butter.
- Whiz 1/2 c flour, 3 tbsp sugar, 1/2 tsp salt in a food processor.
- Add eggs, milk, the 2 tbsp butter, 1/2 tsp vanilla to food processor and whiz about a minute, until homogeneous and frothy.
- Pour into pan, and bake 20 min without opening the oven.
- Serve with: berries and lemon wedges, or maple syrup.

- Apple Galette
- In a food processor, combine 1.5 flour, 1.5 tsp sugar, 1/4 tsp salt.
- Cut 1 stick (1/2 c) of cold butter into small pieces and put into processor. Prepare 1/3 c of vodka or rum, chilled with ice. (Or, ice water.)
- Process the dry ingredients + butter, drizzling the cold liquid into it, for 15 seconds or less. The dough should still be a little crumbly, but stick to itself when you squish a bit by hand.
- Form into a rough disk and refrigerate 1-24 hours. When ready to continue, take this out of the fridge to warm up a bit, to be rollable.
- Peel 4 apples. Slice: I use a gizmo like this to get slices that cook uniformly. Take half the slices and chop into small pieces (1/2 cm).
- Roll the dough. Avoid doing this in a hot area (i.e. right next to the oven), since you want the butter to remain solid. Aim for something like a 14″ diameter circle or a 12×14″ rectangle.
- Layer on the tasty things, leaving a 1″ border to fold later: add the chopped apples; 1 tbsp honey; the sliced apples; 2 tbsp sugar mixed with 1/2 tsp cinnamon; and 1 tbsp butter cut into small pieces. Fold up the sides.
- Bake 1 hour, or until browned. Let cool before slicing and serving.

- Shredded Chicken for Tacos
- In a dutch oven or heavy pot, briefly brown 1.5 lb of boneless chicken (thighs and/or breasts) in oil.
- Add a 14oz can of crushed tomatoes, 2 garlic cloves, 1 tsp each of oregano, cumin, and powdered onion, 4 chipotle peppers in adobo sauce, and salt+pepper to taste.
- Simmer, covered, about 45 minutes. If it looks dry, add some beer.
- Remove the chicken and shred with 2 forks.
- Puree the cooking liquid, e.g. with a food processor or stick blender and add back to the chicken.
- If you need to reheat, use a frying pan + oil, working in small batches.

Filed under: food, recipes | Leave a Comment

### Minimum-cost Flows

About a month ago, the Facebook Hacker Cup included a nice programming problem involving a certain kind of matching. The *min-cost bipartite matching problem* can be stated like this:

- You have a number P of people, and a number Q ≤ P of jobs. Each person is willing to do any of the jobs, but each person will charge a different price for each job. You are only allowed to hire each person to do at most one job, and you need every job to get done. What is the cheapest way to hire employees for all the jobs?

Just as a quick concrete example, let’s say we have P=3 people (Alice, Bob, Carl) and Q=2 jobs (Vacuuming, Window-washing).

- Alice charges $50 to do vacuuming and $40 to do window-washing.
- Bob charges $30 to do vacuuming and $20 to do window-washing.
- Carl charges $60 to do vacuuming and $30 to do window-washing.

In this case, the minimum cost to get all the jobs done is $60: pay Bob $30 for vacuuming, and pay Carl $30 for window-washing.

I had known about one way to solve this sort of problem, but I later learned another approach called *The Hungarian Algorithm*. It’s a very efficient approach, taking O(Q²P) time. In this post I’ll explain how it works, and in the next post, I will explain how it generalizes to an algorithm to the *min-cost max-flow* problem, specifically the *capacity-scaling successive shortest paths algorithm.*

## Basics

There are some preliminaries that I am going to build on here, which are generally covered in any introduction to flows:

- bipartite matching can be thought of as a flow problem, by attaching a source node
**s**to all people, directing edges from people to jobs, and attaching all jobs to a sink node**t** - given some flow in a graph, the
*residual graph*represents the unused capacity of the edges, together with added*reverse edges*that represent the cost that can be saved by decreasing the flow on that edge- in the min-
*cost*flow setting, a reverse edge’s cost is the negative of the original edge’s cost (i.e., cost savings)

- in the min-
- an
*augmenting path*is a path in the residual graph (using edges with nonzero capacity) from the source**s**to the sink**t**

The simpler *max-flow* problem, where we don’t care about costs, can be solved like this:

- As long as the residual graph has an augmenting path from
**s**to**t**, increase the flow on any augmenting path.

There are several ways to prove this algorithm’s correctness. Here is one way, which will be relevant later on:

Let F be some non-maximum flow, and X be any maximum flow. View the difference between X and F as a flow in F’s residual graph. By the

flow decomposition theorem, this difference can be written as a nonempty list of augmentings–tpaths plus a possibly empty list of cycles. Therefore, there is an augmentings–tpath, and we can keep improving F.

Let N be the number of nodes, and M be the number of edges. For unit capacities, this simple max-flow approach takes at most N iterations of a path-finding algorithm, which itself takes M steps, and so the total time complexity is O(MN).

Moreover, this simple algorithm works quite nicely when we add capacities to the edges (but still ignoring costs):

- Taking the
*shortest*path each time yields a maximum flow in O(MN) iterations; - taking the
*fattest*path (highest residual capacity) yields a maximum flow in O(M log K) iterations, where K is the maximum capacity.

## Greed and Costs

A *greedy* algorithm is usually one that just makes the cheapest decision at every point in time, without worrying about troubles that can come down the road from not planning ahead. Here’s the simplest greedy algorithm you could imagine for our matching problem now that costs are involved:

- Repeatedly do the following: out of all the people you haven’t hired and jobs you haven’t filled, pay the cheapest price you can to get one more job done.

However, this algorithm is not correct. In our example above, we’d first pay Bob $20 to do window-washing, since that’s the cheapest of all prices. After that, the cheapest price that doesn’t involve Bob or window-washing is to pay Alice $50 to vacuum. This total cost of $70 is not optimal.

Nonetheless, even once you add costs into the picture, there still remains a very elegant greedy algorithm that successfully solves min-cost flow:

- As long as the residual graph has an augmenting path from
**s**to**t**, increase the flow on any**minimum-cost augmenting path**.

So in our example, the second iteration would use the augmenting path [pay $30 for vacuuming by Bob, refund $20 from Bob for not window-washing, pay $30 for window-washing by Carl] which has cost $30-$20+$30 = $40, less than the direct path [pay $50 for vacuuming by Alice].

I was surprised the first time that I read this algorithm is correct! But, it has a very nice inductive proof. In the proof, the *value* of a flow is the amount of flow it sends from the source to the sink (so max-flow is really the problem of finding a flow with maximum value).

For simplicity, assume all capacities are unit. Let F be a minimum-cost value-

kflow. Let X be any minimum-cost value-(k+1)flow. View the difference between X and F as a flow in F’s residual graph. By the flow decomposition theorem, this difference can be written as a single augmentings–tpath P plus a possibly empty list of cycles.None of the cycles can have negative cost, because they could have been used to augment F into a cheaper value-

kflow. Therefore, augmenting F by P gives a value-(k+1)flow whose cost is at most X’s cost, and which is therefore a minimum-cost value-(k+1)flow.

Let’s assume unit capacities for now (which is true in the application to min-cost bipartite matching). This algorithm needs O(N) iterations. How can we find the minimum-cost augmenting paths? The residual graph has negative-cost reverse edges, but crucially, the above proof shows that it never has a negative-cost cycle. So the Floyd-Warshall algorithm in O(N³) time, or the Bellman-Ford algorithm in O(NM) time, can be used in each iteration. This gives a total time complexity like O(N²M).

We can do better!

## Using Your Full Potential

The Hungarian Algorithm, when viewed in the context of flows, lets us improve the above algorithm simply by switching to a better algorithm for shortest paths. Namely, Dijkstra’s algorithm. The only problem is those pesky negative edges; Dijkstra’s algorithm won’t work with those!

There is, however, a way to transform some graphs to avoid negative edges. The transformation is to use a *potential function*, which I’ll denote by **y**, since it’s like a dual solution for the associated linear program.

- A potential
**y**is a vector that assigns a real value to every node. - A
*valid*potential is one that satisfies, for every edge uv, y(v) ≤ cost(uv) + y(u). - The
*reduced cost*of edge uv is defined as reduced_cost(uv) := cost(uv) + y(u) – y(v);- so a potential is valid if all reduced costs are nonnegative.

Potentials are a way of transforming a graph in a superficial way:

- Let the
*reduced distance*from u to v be defined as the shortest path distance from u to v, along the reduced costs. - Then you can prove that the reduced distance from u to v doesn’t depend on the path:
- the reduced distance equals the normal distance from u to v, plus y(u), minus y(v).

Conversely, if we compute the reduced distances, we can easily recover the real distances. So as long as the algorithm can maintain a valid potential, the reduced costs are nonnegative, and it can use Dijkstra’s algorithm to find the shortest **s**–**t** path.

How can we maintain a valid potential? It turns out that not only are potentials useful for finding shortest paths, but also shortest paths are useful for computing potentials.

- In a graph with no negative cycles, define
**y**(v) for every node v to be the shortest-path distance from**s**to**v**. Then this**y**is a valid potential.

This is useful algorithmically because it lets us leapfrom from a valid potential, to shortest paths, to the augmenting path needed to augment the min-cost flow, to the next potential. Expressed this way, the Hungarian algorithm looks like this:

- Let
**y**be the all-zero vector. - Using
**y**as a valid potential, find a shortest**s**–**t**path**P**, stopping if none exists. - Redefine
**y**to be the shortest path distances from**s**. - Push a unit of flow along
**P**and update the residual graph. - Go back to step 2.

To prove that this actually works, we need to show that when we update the residual graph, **y** remains a valid potential.

This is because the edges

uvon any shortest path fromssatisfy satisfy d(s, u) + cost(uv) = d(s, v), which is a form ofcomplementary slackness.So adding the reverse edgevuwith cost(vu) = -cost(uv) will not violate the validity constraint y(u) ≤ cost(vu) + y(v) for the potential, as it holds with equality.

That’s it!

## Remarks

Some explanations of the Hungarian algorithm are slightly different in step 3, rephrasing it as the equivalent “Increase **y**(v) by the reduced distance from s to v, for each node v.”

Historically, the algorithm is based off of ideas of Kőnig and Egerváry, with Kuhn in 1955 being the first to consider a polynomial-time implementation. This algorithm is the oldest one that I know of where you get a substantial benefit from thinking both in the primal (the flow) and dual (the potential) at the same time. Similar ideas were used in 1965 by Edmonds to derive an algorithm for min-cost non-bipartite matching; more on that to come in the following post.

The algorithm above will work eventually for graphs with arbitrary edge capacities; but if the maximum capacity is K, the number of iterations might be something like K or larger. In the next post we’ll get that back down to polynomial-time by giving an algorithm that takes O(M log K) iterations.

Filed under: algorithms, graphs, programming | Leave a Comment

### ¾ million problems!

I wanted to post a short update on CS Circles, my project with the University of Waterloo CEMC to teach introductory programming online.

I’ve been checking the total number of problems solved on the site, and it has just surpassed 750000! I’m pretty sure I’ve never done 750000 of anything in my life except maybe eating grains of rice. More than 20000 users have solved the first few problems, while for the hardest problems, only slightly more than 500 students have solved them.

One of the most interesting things about CS Circles is how the “ask for help” feature has worked in practice. This feature is available on every programming problem, and allows students who are stuck to send in the current version of their code alongside a few sentences explaining where they are stuck. They can ask for help from their own teacher if they are registered on the site, otherwise they can ask the “CS Circles Assistant” which is me and my colleagues Sandy and Troy. The interesting thing is that more than half of the questions I get are already solved before I log in! I take the weekend shift so the questions I see have been sitting there about 24 hours. I remember using the Piazza system at university where we strived to answer every question within hours! I guess we should just be more lazy and most people will figure it out sooner or later, after they’ve taken the time to verbalize exactly what is the problem.

The other update is about the source code. CS Circles became open-source back in 2013, thanks to a SIGCSE special projects grant. Over the last couple of years, a team of educators in Serbia extended the project in two ways: first, they created a Serbian translation; second, they extended the underlying infrastructure to add new functionality. This includes look & feel improvements, an involved improvement to run multiple courses at once (in their case, multiple difficulty levels), and the ability to run time-limited and feedbackless quizzes. It was great just to know that the project was useful to them, and we also were able to publish a report on the changes and how the students liked it.

Are you interested in learning either Serbian or Python? Then you can check it out at http://imi.pmf.kg.ac.rs/imipython/

Filed under: CS Circles, education, language, open-source | Leave a Comment