Websheets

03Jul15

Next week I am headed to the ITiCSE conference (Innovation and Technology in Computer Science Education), in Vilnius, Lithuania, to present Websheets, an exercise system for programming problems in C++ and Java.

Websheets combines a couple of ideas.

  • It is an online system where students get immediate feedback on their programming solutions. This has become very common in recent years, though not many support C++, and not many are open-source (the biggest exception being the awesome CloudCoder system).
  • An exercise isn’t just a big blank page, rather it’s a template with specific gaps to be filled in. The motivation for this is a system of hard-copy handouts used at Princeton… the idea being that, if you are just teaching one topic that day, and you have limited class time, it would be better to give the boilerplate and off-topic bits of the program to the students ready-made, so they can make better use of their limited in-class time.

Try it out! Go here and read all about it:

http://cscircles.cemc.uwaterloo.ca/websheets.html

If you create any cool exercises to share, or find ways to break it horribly, I’d love to know.

In the last couple of weeks I’ve done a few fun things to it:

  • I combined the Java version from Princeton and the C++ version from USC.
  • Taking inspiration from CloudCoder, you can now create and edit exercises online.
  • Teachers can now keep track of their students, like we did last term at USC. (We hosted some very low-value homeworks on the system; basic stuff to complement the readings and help students stay focused.)

Moving past the technological part, I’d like to eventually do a study to measure the effects of different types of scaffolded exercises in the classroom. Is it better to provide more scaffolding or less? Though one may be faster, do you learn more from doing more work? Or is the code reading of the scaffolding itself an important skill?


Visiting Iran

23Jun15
Tehran's awesome mountain backdrop

Tehran’s awesome mountain backdrop

I had an amazing vacation this year! For 3 weeks I travelled to Iran where I got to visit several historic cities, meet my wife’s extended family, and participate in my sister-in-law’s amazing wedding. This was followed by one more fantastic week in Mexico for my brother’s destination wedding, which deserves its own post later.

One thing several friends have asked is: how difficult is to get to Iran in the first place? The answer for us was, pretty easy, it was just a lot of paperwork. (Caveat: this is only our answer from the perspective of a non-Iranian married to an Iranian, not just a visiting friend.) Here’s what it was like for us:

  1. We registered our marriage with a “daftar” in Toronto, which took about an hour during the week we got married last summer.
  2. We registered our marriage with Iran’s offices in Washington, DC. This is a long form that requires copies of everything from step 1, our Canadian marriage documents, as well as a police record check, passport copies, signatures from her dad, passport photos… and I sent a blood test for AIDS and drugs too though I no longer see that requirement listed.
  3. Sara’s parents sent a visa invitation letter to Tehran.
  4. Once that was approved, I got a visa application number and sent in my visa application paperwork to Washington along with my passport. I got the passport back in a few days.
Fakin' bacon

Fakin’ bacon

We booked our flight somewhere in the middle of the process since there was a good sale, but luckily all of the steps went without a problem. We sent in the Step 2 documents in February and Sara’s parents did Step 3 in March (be careful that the government stops everything for 2 weeks in late March for the Persian new year).

Our trip actually started by visiting Dubai in the United Arab Emirates, just for the sake of seeing something new. There, we made a Canadian pilgrimage to several Tim Hortons locations. They had freshly squeezed orange juice, faux bacon breakfast sandwiches, and delicious hot brown water, needless to say. We also went in the UAE coast of the Persian Gulf, had a nice IHOP breakfast opposite the world’s tallest building, visited the spice/gold bazaars, did some shopping at the gigantic Dubai Mall, and went on a short boat tour. We walked a LOT which was pretty challenging because it was super hot and humid.

Jamshidiyeh Park

Jamshidiyeh Park

After that short visit we took our next flight, bringing us to Tehran. It is the capital city of Iran, where Sara grew up. Tehran is only about 250 years old (pretty young by Persian standards) and it is a massive city. Its population is over 8 million, approximately equal to New York City, and it also has about the same size. The first couple of days there were spent exploring her neighborhood and seeing some of her family. Her dad offered to buy me some ice cream from the nearby Bastani Madar, so I took him up on his offer and got bastani ob havich, which is a scoop of vanilla ice cream floating in a cold glass of carrot juice. Sounds weird, I know, but anyone who’s had it can attest to its deliciousness. We managed to check out some sights at Jamshidiyeh Park and Niyavaran Palace before moving on to our next stop.

IMG_0505We took an overnight train with Sara’s mom to Isfahan. This city, a former capital of Persia, is known for its art and its architecture — even the train station had lovely arches rising up to the ceiling. On the right you can see a picture of the big “bazaar bozorg” with several notable features. First, more beautiful ceilings with geometric cutouts leaving lighted spots on the floor. Second, a car headed in my direction, which was typical traffic, pretty amazing given how narrow and twisted the pathways were.

---

Si-o-se pol

Esfahan had a lot of different sights to see. There is a huge rectangular public square called naqsh-e-jahan; at its center are gardens and pools, around the sides are vendors (part of the bazaar), and it is flanked by 4 main features, one on each side: a big mosque, another even bigger mosque, a palace, and the bazaar’s main entrance. The city’s wide boulevards bring you from this area through several parks, palaces, and si-o-se pol, a beautiful old bridge. Out of town, we visited an ancient zoroastrian fire temple (ateshkadeh).

One of the most memorable parts of our visit to Esfahan is actually the hotel we stayed at, Hotel Abbasi, which had not only a kickin’ breakfast buffet, but also a central tea garden where we refreshed ourselves in the evenings with tea, sweets, soup (ash-e-reshteh) and ice cream.

Hotel Abbasi

Hotel Abbasi

Part of Persepolis

Part of Persepolis

Our next destination was Shiraz, yet another former capital city. We started with one of Iran’s most famous historical sites, Persepolis, located an hour’s drive outside of the city. This is where the Persian empire was centered during the Achaemenid days of Cyrus the Great, Darius, and Xerxes. It was burnt down by the army of Alexander the Great in 330 BC — reputedly in revenge for Xerxes’ burning of the Acropolis in Athens 150 years earlier.

Persepolis was a large series of palaces, the size of a village, with foundations and pillars made out of stone. From a historical perspective it is pretty amazing that it’s survived 2500 years because that period includes several conquests of Persia by caliphates and mongols that tended to destroy all cultural artifacts they could get their hands on. The reason that it survived is that it was buried by dirt and sand not long after it was burned to the ground, and only in the mid-1800s did archaeologists re-discover and start to excavate the site.

Mikhai bekha, nemikhai, nakha

Mikhai bekha, nemikhai nakha

Visiting Persepolis brought me face to face with a lot of ancient Persian history: the cuneiform script in which Old Persian was written, Zoroastrian history, and the Farvahar symbol. But a lot of the most interesting parts of the story are still visible. On the Apadana Staircase you can see an inscription showing the annual tribute festival where distinctive people and gifts are depicted from each of Persia’s 28 provinces at the time (spanning the globe from India to Ethiopia). This was definitely an important place to hire a guide since he was able to explain this history, make us aware of the symbolism, and bring life to the ancient ruins.

Here are a couple more cultural highlights from Persepolis thanks to our guide:

  • Iran Air logo imitation

    Iran Air logo imitation

    The main entry stairs to Persepolis were constructed with an especially short height because it was important that world leaders, walking up these stairs together in their flowing robes, could talk without fear of tripping.

  • The logo of Iran Air is based on the two-headed cat statues of Persepolis (see right).
  • Those cats had interchangeable ears depending on the occasion (like Mr. Potato Head).
  • If a Persian grabs you by two fingers, they are just bringing you to a party: every delegation pictured on the Apadana staircase was led by a Persian guide doing this. Make sure to bring presents.
  • Unlike the great pyramids, Persepolis was not built by slavery. Archaeologists have found records of payments to builders, and health insurance.

Shiraz is also home to relatively newer attractions. We went to an amazing kebab place that evening and had salad shirazi (diced salad of tomatoes, cucumber, onion and lime juice). A surprising source of culinary inspiration was the tomb of Hafez, the most famous Persian poet. It was one of several places that we had faloodeh, the famous rosewater sorbet with vermicelli noodles served with lime juice.

IMG_0631

The tomb of Hafez

The teahouse also had masghati, a.k.a Turkish delight. I remember eating this once in Turkey but it was just kind of weird. The one at Hafeziye was pliable, fragrant, and delicious! In explaining it later to my friends I have described it a big spherical gummy bear with nuts inside of it, and this seemed to help more people try it :)

We visited a few other things in Shiraz including a lively public park (a nice break from the tourist attractions), an old citadel, a beautiful palace called Naranjestan (literally, land of the oranges) and the Nasir al-Molk mosque due to its famous stained glass windows.

Just as we were leaving Shiraz, some random people on the street offered us some free beverages. The cab driver explained that this is common on holidays so we grabbed a few and it turned out to be an extremely delicious lime cordial, simply made from water, sugar, and juice from local shirazi limes. Merci, Shiraz!

Once back in Tehran, several members of Sara’s family had dinner gatherings which were awesome. Her aunts and cousins were great to meet and hang out with, and extremely gracious in letting me practice my Farsi. They cooked so much food for us! I may have arrived in Iran with a little bit of a Persian tongue, but I left with a fully Persian belly.

Sofreyeh Aghd

Sofreyeh Aghd

A few days later, Vida and Navid’s wedding took place in the outskirts of Tehran. It was completely awesome. Sara and I commissioned a sofreyeh aghd for our own Persian-Canadian wedding last year, so I had some general idea of what to expect. But, the location of their wedding was not only beautiful and elegant, but huge! It was an outdoors setting with gardens, streams, ponds, musicians, tea, but more importantly, a ton of food and lots of dancing. The area with traditional instruments included a daf (like a large tambourine) and tombak (another drum) where, as I hope the wedding footage will attest, we got some of Navid and Vida’s aunts and uncles on the dance floor. I want to thank both of them heartily for including us in their wedding party on their special day.

Darband

Darband

We had one night in Tehran to relax by ourselves. Thanks to the suggestion of Sara’s cousin Azita, we went to visit Darband, and it made for one of the best nights in the whole trip. Darband is a path up in the mountains to the north of Tehran, nestled in a long crevasse populated by kebab stands, dried fruit merchants and cafes. The path continues up to hiking though we didn’t make it that far, getting distracted by kebab barg and shishlik for dinner, followed by shisha, tea and Persian dates on a traditional platform (takht). The next time I come back, it will definitely be an area ripe for further investigations.

Following this, we made the trip up to Mazandaran province where Sara’s parents are originally from, partied with even more of her family, and got to put our toes in the Caspian Sea thanks to her niece Negar.

If you are planning to visit Iran, two other practical things are useful.

  1. A practical thing that I strongly recommend is getting a data SIM card for your phone. We got one in an Isfahani mall; it was very cheap though it required some paperwork that Sara’s mom filled out. We went with Irancell (MTN), which seemed like the most popular national carrier, and bought a data plan for it later on (we bought prepay-refills from corner stores and then bought “plans” using text messages).
  2. I brought a Lonely Planet travel guide and it was invaluable, especially because we were visiting so many cities (and because several of those visits were before I had the data SIM). Though not all of the recommendations were totally accurate (the Shahrazade restaurant in Isfahan was pretty mediocre) I have to say that there was a ton of great food (Shater Abbas in Shiraz!) and zero food poisoning, plus we wouldn’t have known about most of the tourist sites otherwise. If you want to borrow it let me know!
  3. There’s a surprisingly large number of tourists in Shiraz and Isfahan; most of the ones we saw were from France, Germany, and northern Europe. Everything was safe and the locals everywhere were friendly, though you will of course be able to navigate things more easily if you travel with someone who is fluent in Persian.

If you do go, safar khoob daashte boshi — have a good trip!

Naranjestan, with people in traditional Shirazi cloth

Naranjestan, with people in traditional Shirazi dresses


Today I pushed some code that I used last semester for C++ visualization to github. The bad news is that it’s smoke and mirrors, but the good news is that it did the trick: it helped to convey important ideas about flow control and programming semantics to my students.

This is based off of the Python visualizer at http://pythontutor.com by Philip Guo. His visualizer takes an arbitrary Python program as input, and then creates an animated HTML illustration showing which lines of code are executed in what order, and also showing the values of the variables as they change. Eric Grimson had an antique visualizer running at MIT when I was a TA for their course 6.001, but the great and novel aspects of Philip’s one are that it was freely available online, could be embedded in other web pages, had a much nicer interface and output, and used a language that has fewer scary parentheses (and is for this reason more widely taught).

A couple of years ago, I scavenged the frontend of that visualizer and wrote a Java backend for it, creating the Java visualizer. This was very helpful during my stint at Princeton, where the introductory programming course I taught used Java.

Flash-forward to the most recent academic year, teaching C++ at the University of Southern California. I was very used to having a visualizer to show new concepts like for loops, function calls, object-oriented programming and recursion… I’d used it both in Python for CS Circles and for the Java course. The bad news is that the implementations in Python and Java are heavily dependent on modern features, such as a debugging API and reflection, which C++ doesn’t readily offer.

So here’s the “good enough” solution: write programs that are actually Java, but ‘skinned’ to look like C++.

Click on this link to see an example.

The actual code being run by the visualizer is the following:

public class C { public static void main(String[] args) { //><int main() {
 int x = 2;
 int y = 2;
 System.out.println(x+y);//>< cout << x+y << endl;
}} //><}

So it’s Java code, but the parts before //>< are being selectively hidden, so it looks like C++. More examples are available in the course notes.

The biggest disadvantage of this approach is that students can’t enter their own C++ code for visualization, which is a huge tool for learning (and debugging!) when available. But, it’s definitely better than nothing, and it saved me dozens of hours compared to creating PowerPoint animations for every single new control flow topic.

There is some hope that a real C++ visualizer could come about. At least two people have told me they have students working on it, both using text piping to GDB as a means to accomplish it. I think there’s some hope that LLDB’s debugging API could be used; or alternatively by using the same technology as valgrind. Until then, you can read about the fake C++ visualizer in all its glory here.


Happy Nooroz!

21Mar15

IMG_20150321_101634

For the first time since living in the US, we put together a haft seen ( هفت‌ سین : seven S’s) table! It has:

  • center: garlic (sir / سیر)
  • left: lentil sprouts (sabzeh / سبزه)
  • counterclockwise: apple (sib / سیب),
  • wild olive / oleaster (senjed / سنجد),
  • vinegar (serkeh / سرکه),
  • wheat germ pudding (samanoo / سمنو),
  • and sumac (سماق)

There are some bonus items too:

  • coins / sekkeh / سکه
  • wild rue incense / esfand / اسفند
  • a book by Hafez / حافظ
  • noon nohodchi (from Toronto; thanks Navid and Vida!) / نون نخودچی
  • on sofreh termeh fabric / سفره ترمه

And not pictured, the board scraper that I used to recover the samanoo when it exploded in its bag on the bike ride home.

I hope you have a prosperous new year!


About 2 weeks ago, I went to SIGCSE, which is the major computer science education conference. I talked about Computer Science Circles, its translations to other languages and translations of Python error messages to English; see the poster here. Like my last trip to SIGCSE, it was pretty great. I got to re-encounter previous colleagues, meet new ones, commiserate about C++, look at cool tools, learn more about pair programming, and I got to attend the famous “Nifty Assignments” session.

The food scene was also a good surprise! The event was held in Kansas City, Missouri. It turns out that not only is there a really great cafe (“The Filling Station”) tucked inside of an old gas station:

IMG_20150307_163459

But there was also an amazing BBQ joint inside of another gas station (this one in Kansas City, Kansas):

IMG_20150307_174949

Just to round things out and prove that there is food outside of gas stations, here are some shots from the farmer’s market in the River Market district:

IMG_20150308_115243 IMG_20150308_121215


What Is ASCII?

15Feb15

Disclaimer: I am not a historian. I am just a Computer Science educator who likes reading Wikipedia and printing funny characters. This explanation is a long optional appendix for a programming assignment in my course.

ASCII is the American Standard Code for Information Interchange. Developed starting in 1960, it was a way of standardizing means of communication between computers.

To explain, it’s useful to note that “8 bits” means two things from the perspective of a C++ programmer.

  • These days all computers have standardized on organizing memory in bytes. That is to say, each memory address contains 8 bits.
  • In C++, a char variable, representing a single character for communication, is also 8 bits wide.

Neither of these has been true forever. (Actually, some of the earliest computers did not use binary! But we’ll only be talking about bit-based computers here.)

  • There were machines where each memory address contained some other number of bits like 22, 12, 18, 25, etc; see this table on Wikipedia.
  • People used 5-bit Baudot codes for early communication. At 32 possible codes, that was just big enough for the English alphabet, but not lowercase letters or digits. This gradually expanded over the years to 6-bit, 7-bit and finally 8-bit codes.

However, 7 bits was sort of a sweet spot. At 128 possible codes, there was enough room for both lower-case and upper-case letters, digits, punctuation, with space left over for “control codes.” The development of ASCII in the 1960s was led by the American Standards Association. It proposed a standard meaning for all 128 values in a 7-bit code:

  • The first 32 codes (0-31) were specific “control characters.” They didn’t have a direct graphical representation but instead had a specific semantic meaning. Notable examples are #0 the null character, #8 backspace, #9 tab, #10 line feed, and #13 carriage return.
  • Character 32 was a space.
  • Characters 33-126 meant the following, in order:
    #33-63: !"#$%&'()*+,-./0123456789:;<=>
    #64-95: @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
    #96-126: `abcdefghijklmnopqrstuvwxyz{|}~
  • Character 127 meant “delete”. Here is a quote from Wikipedia that gives a nice explanation:

    This code was originally used to mark deleted characters on punched tape, since any character could be changed to all ones by punching holes everywhere. If a character was punched erroneously, punching out all seven bits caused this position to be ignored (or deleted).

Anyway, the idea of proposing ASCII as a way of standardizing was pretty good. It meant that if you had two machines communicating with each other, they had a chance to understand one another.

ASCII is ubiquitous now, but that was not always the case. Quoting this CNN article,

[T]here was an 18-year gap between the completion of ASCII in 1963 and its common acceptance. … Until 1981, when IBM finally used ASCII in its first PC, the only ASCII computer was the Univac 1050, released in 1964 (although Teletype immediately made all of its new typewriter-like machines work in ASCII).

The most well-known system not compatible with ASCII was EBCDIC. EBCDIC was developed in 1963. It is deeply weird in the sense that the letters of the alphabet don’t all contiguous codes. But there’s reason for this; it has to do with the Hollerith card code. Hollerith worked for the US Census and founded a startup in 1896 that was the precursor to IBM. The common punchcard standard that eventually dominated had 12 rows but not all 2048 combinations were viable, since it would destroy the card’s physical strength. So typical characters in a Hollerith code would punch up to 1 out of three “zone” rows plus up to 1 of out of nine “numeric” rows. Encode this as decimal and you get 36 choices from “00” to “39”. Then encode that as binary and all of a sudden you have a gap, e.g. formerly-adjacent codes 19 and 20 are all of a sudden 011001 and 100000. (This oversimplifies a bit, but a summary of the main point is that the BCD in EBCDIC means binary-coded decimal.)

029-64-250Also, EBCDIC did not contain all the punctuation characters that ASCII did. Some keyboards set up for EBCDIC did not have keys for characters like ^, { or }. See the 1972 international standard ISO 646, which is sort of a missing link between EBCDIC and ASCII. The presence of hardware and operating systems unable to support all characters is the reason that “C digraphs” and “C trigraphs” exist, e.g. that int x = 3 ??' 5 sets x equal to 4. (For applications of C di/trigraphs, see the IOCCC.)

So in a nutshell EBCDIC existed due to the fact that hardware and software’s evolution was gradual and intertwined.

There’s another acronym that you’ll see a lot in researching this subject. Specifically, you’ll sometimes see ANSI as a synonym for ASCII, or to mean a variety of related concepts. What actually happened is that the American Standards Institute, which was the organization to develop ASCII, renamed itself the American National Standards Institute (ANSI) in 1969. From this ASCII gained another name retroactively, the “ANSI X3.4″ standard. (And ANSI released more standards later on.)

Towards 8 bits

Eventually, ASCII was adopted more and more, which is why you don’t have an EBCDIC table in the appendix of your textbook. To this day, the word ASCII still technically refers to the 128-character 7-bit system mentioned above.

Over time, computer architectures did eventually standardize on word sizes and addressable locations that used bytes (8 bits). The PDP-8, sold commercially between 1963 and 1979, was the last popular machine whose word size was not a power of 2 (it used 12-bit words).

Hence, there was some wiggle room. A file on your hard drive, or a character being transmitted from computer to computer, or a char variable, actually had 8 bits of space, but ASCII only used 7. If you were to use all 8 bits, you could encode 256 possibilities, which was an extra 128 characters! This was particularly useful for non-English speakers, since most languages use accents or different letters not present in English. Spanish users would be interested in having ñ get one of those values in the unused space from #128-255, while French and Turkish speakers could use ç, Germans would like to add ß, Russians could use И, etc. But, there wasn’t enough space to satisfy everyone.

Enter the code page. This system, which became particularly common with DOS 3.3 in 1987, meant that every user could make their own choice about what system of letters would appear in those extra 128 slots. The most common English code page, CP 437, used those slots for the following characters:

#128-159: ÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜ¢£¥₧ƒ
#160-191: áíóúñѪº¿⌐¬½¼¡«»░▒▓│┤╡╢╖╕╣║╗╝╜╛┐
#192-223: └┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀
#224-254: αßΓπΣσµτΦΘΩδ∞φε∩≡±≥≤⌠⌡÷≈°∙·√ⁿ²■

(The last character #255 is “non-breaking space,” another control character.)

Those lines and boxes were extremely useful in creating an entirely new domain, ASCII art. (Or maybe it is called ANSI art? Technically it should be called CP-437 Art but that name didn’t seem to take off.)

Of special mention is that on many machines of the DOS era, you could directly manipulate the screen’s contents. A program could set your monitor to text mode, for example 80×25 where there are 25 rows and 80 columns, each with room for a single letter. (Your VM looks like this when  it boots up.) This is extremely similar to a bitmap: a 80-by-25 array of char values. In fact, the system assigned visual symbols or glyphs for all possible values from 0 to 255, even the non-printable control codes. So text-based games/art/applications of the day also had access to these symbols (in CP 437):

32

There were many other code pages. Almost all of them stuck to ASCII for the first 128 values (#0-127) and then added the 128 additional characters of local interest. E.g. here is what CP 866 contains at these positions:

#128-159: А‌Б‌В‌Г‌Д‌Е‌Ж‌З‌И‌Й‌К‌Л‌М‌Н‌О‌П‌Р‌С‌Т‌У‌Ф‌Х‌Ц‌Ч‌Ш‌Щ‌Ъ‌Ы‌Ь‌Э‌Ю‌Я
#160-191: а‌б‌в‌г‌д‌е‌ж‌з‌и‌й‌к‌л‌м‌н‌о‌п‌░‌▒‌▓‌│‌┤‌╡‌╢‌╖‌╕‌╣‌║‌╗‌╝‌╜‌╛‌┐
#192-223: └‌┴‌┬‌├‌─‌┼‌╞‌╟‌╚‌╔‌╩‌╦‌╠‌═‌╬‌╧‌╨‌╤‌╥‌╙‌╘‌╒‌╓‌╫‌╪‌┘‌┌‌█‌▄‌▌‌▐‌▀
#224-255: р‌с‌т‌у‌ф‌х‌ц‌ч‌ш‌щ‌ъ‌ы‌ь‌э‌ю‌я‌Ё‌ё‌Є‌є‌Ї‌ї‌Ў‌ў‌°‌∙‌·‌√‌№‌¤‌■‌

The picture is further complicated by right-left languages and joined-letter scripts like Arabic, and languages like traditional Chinese with way more than 128 writing symbols.

In English communication, the Windows-1252 codepage became dominant. Compared to CP 437, it lost the line art, but gained smart quotes, fancy dashes, and other useful punctuation. (This made since because in Windows, they had real windows made out of pixels, rather than line art text mode graphics.)

For our Caesar Decipher assignment, the files will be given to you in the Windows-1252 format. (Technically, the internationally standardized Latin-1 subset.) But it shouldn’t matter, since your program will only do anything to those char values between ‘A’ and ‘z’, which are in the “official” ASCII range between 0 and 127.

Unicode

Most modern communications today, especially international ones, use a more recent system called Unicode.

One problem with the codepage system above is that you don’t really know what a file is saying just by looking at its bits. You can’t be sure if (char)161 means б‌, í, or something else. And there’s absolutely no way to represent a file that contains more than 256 distinct characters. (Such as this very article.)

This was eventually solved by Unicode, which began as an attempt to bring all of the code pages into a single system. Unicode is based on two principles:

  • One unified and fixed numbering of all possible symbols. For example, б is 1073 (“CYRILLIC SMALL LETTER BE”) and í is 237 (“LATIN SMALL LETTER I WITH ACUTE”). Each possible symbol is called a unicode code point.
  • A variety of different encodings, i.e. systems to encode a given sequence of code points as a series of bytes. (The most common encoding these days is UTF-8, a variable-width encoding.)

This system still had its growing pains and we’ll just mention the most noteworthy miscalculation. Joe Becker said in 1988,

Unicode aims in the first instance at the characters published in modern text (e.g. in the union of all newspapers and magazines printed in the world in 1988), whose number is undoubtedly far below 2^14 = 16,384. Beyond those modern-use characters, all others may be defined to be obsolete or rare; these are better candidates for private-use registration than for congesting the public list of generally useful Unicodes.

The Java language was even designed based on this assumption: a Java char is a 16-bit variable. However, this didn’t last long, and as of writing, the latest version of Unicode, version 7.0, contains 113,021 different characters (code points). In Java, this entailed the use of the UTF-16 encoding. In C++, you can read about the platform-dependent wchar type.

For historical context, you might read Kiss your ASCII Goodbye, written in 1992.

One thing is clear: at some point, maybe related to surpassing the 16,384-character limit, the Unicode consortium changed their definition of the “modern-use characters” that could fit in the standard. For example, character 9731 is ☃ (SNOWMAN). Try copying

http://☃.net

into your browser address bar.


Over the last couple of weeks, I’ve been building a graphics/animation library to use in the class that I’m teaching at USC. It’s now in a stable state and has the features that I think I’ll need this semester.

A lot of the motivation comes from Princeton where I taught previously: the class there, taught in Java, uses the awesome book, assignments, and libraries of Kevin Wayne and Bob Sedgewick. StdDraw.java, their library, Just Works™. You don’t need to create any objects, or initialize any canvas, or create a hook. If you want a circle, just call StdDraw.circle(x, y, r). This makes it reasonable to write code for a beginner class that students will actually read, and reasonable for students to write their own clients. So I’ve tried to emulate that.

Of course, with Java, you have the benefit of native platform support, and their library is a wrapper over the java.awt and javax.swing libraries. In my case, after a lot of false starts, the only C++ library that

  • I could get working on my VM
  • I could get making graphics and animation

was “Qt.” More on that is below. But I’m pretty happy with how it turned out.

Here’s a little picture showing it in action:

If you’re interested, here is the:

Caveats

There were a few things that made life unnecessarily difficult, at least for some shlub with a Java/Python background like me.

  1. I have no idea if someone not using the course VM will be able to get it to run, since they’ll have to install Qt. And adding audio & .mid support is another order of magnitude difficult for the student.
  2. Every function call causes a pretty long chain of events, and as a result, there’s a ridiculous amount of boilerplate in draw.cpp, so every function has to be declared 4 times in slightly different ways:
    • The user calls a library function.
    • The library function, a static function, calls a public member function of a QObject.
    • That member function “emits” a protected “signal”.
    • It’s transmitted (through a cross-thread queue) to the “slot” of another object, the GUI’s “QWidget”.
    • That slot actually does the work.
  3. You need threading in order for the student’s calls to be non-blocking, which is important for animation. But clang++ doesn’t support threads properly. So we have to use g++, which is not our course compiler, and is not as portable to OSX.
  4. We really would like to run some stuff (wait for the window to close) after the student’s main() function, but there’s no exposed API for this that worked. So we use the preprocessor to rename their main() method to _main(). It’s awful because you need to transform both a no-argument main() and a main(int, char**) to the same thing. This really makes me feel dirty…
    // transform main() or main(int, char**) to _main(int, char**)
    #define main(...) vamain(__VA_ARGS__)
    #define vamain(...) vamainhelp(,##__VA_ARGS__, int, char**)
    #define vamainhelp(blank, first, second, ...) _main(first, second)

But, I would have to say that I learned a lot in the process. The library even works with gdb for debugging and stepping. It’s weird that Qt seems to try to do a lot of stuff to C++ that other languages support natively: hooks (signals/slots), reflection and multi-threading in particular.

We’ll see how it goes during the semester. Hopefully many of the undergrads working in my course are wizards enough at Qt, from their experience using it in the data structures course, that more people can configure it on whatever machine they like. My fingers are crossed!




Categories

Follow

Get every new post delivered to your Inbox.