Treacherous Variable Names

Now for the chronicle of a bug clearly caused by poor variable names. (And arguably also the lack of semantic meaning given to string types.) This  buggy function searches for a device with a given uuid:

/*
 * Search the specified glob for devices; return error code. On
 * success dev->fd is set to a valid file descriptor and the file
 * is locked.
 */
static int __find_dev(struct vdev *dev, struct uuid *uuid,
                      const char *const path)
{
	glob_t paths;
	int ret;
	size_t i;

	ret = glob(path, 0, NULL, &paths);
	/* Error handling omitted... */

	for (i = 0; i < paths.gl_pathc; i++) {
		const char *const fname = paths.gl_pathv[i];

		dev->fd = mopen(fname, O_RDWR);
		if (dev->fd < 0)
			continue;

		if (is_requested_device(dev->fd)) {
			ret = __try_dev(dev, uuid, path);
			if (!ret)
				goto found;
		}

		close(dev->fd);
	}

	ret = -ENOENT;

found:
	globfree(&paths);

	return ret;
}

mopen() is given fname, the path produced by glob(), but __try_dev() is given path, which is actually a glob. This caused the device to import correctly but report as its path the glob used to find it instead of its actual path. path is now named search_glob, and fname is now named path. Our existing tests did check that the device path was reported correctly, but they didn’t catch this because they all explicitly specified the exact path to the device to use, so the search glob was the correct path. If the search glob had been some kind of glob type instead of a string then the patch that caused this bug wouldn’t have compiled.

Fun With Concurrency

First a bug, then an exercise in assuming the common case.

We were parallelizing a single-threaded operation to increase throughput and came upon a race condition that at first manifested as crashing only on Ubuntu. One thread read jobs from the disk and passed them to an existing worker pool for processing. Once all processing was complete, the reading thread exited. When a worker finished processing it marked its job as complete, then released its locks. This could result in releasing a freed lock – because the thread pool was also needed for other purposes it wasn’t joined first. My best guess as to why it only showed up on Ubuntu (and Debian, it turned out) is that it has different defaults for what we assume to be stack protection, though that wasn’t clear from listing them with gcc -Q --help=target. Though changing the order of lock releases solved the immediate problem there were also issues with deadlock due to dependencies involving the thread pool. We ended up having an additional processing thread that was joined before exiting, which avoided the problem.

While this parallelization does get its speed from doing previously single-threaded work in a thread pool, it has a twist. A final processing step must be done in disk order, but waiting for the processing slowed down reading. By reading speculatively as though processing had succeeded, we achieved a ~2x performance increase. Yay!

More Freenet Stats

Freenet is decentralized, so while it’s (intended to be) a small-world network and thus short routes exist between any two nodes, it can be difficult to have a routing algorithm which can find those routes using only local information. As far as I understand it, Jon Kleinberg’s work (brief, paper) forms the basis of Freenet’s networking model. We don’t have local connections, as they’re expensive to form and in empirical testing actually detrimental to performance, but apparently the model still holds. A core finding from this work is that if connections are distributed so that more-distant connections are less likely, an implicit structure forms which allows forwarding a message to the closest peer at each hop to form a shortpath. As Freenet uses 1-dimensional locations, this distribution is based on a probability which is proportional to the inverse of the distance. The ideal distribution is logarithmic, but from what I’ve gathered, Freenet’s distribution isn’t too close to it. Making the actual match the ideal is difficult – the network size is a factor in the distribution, but it cannot (practically) be known. Techniques by Oskar Sandberg (paper) are intended to produce this distribution without such knowledge, but Freenet’s implementation seems to not behave as intended. I hope to help discover why, and how to fix it.

Edit May 2, 2012: I replaced a rather dubious ideal distribution plot made with a quick Python script with a much better-looking one made with a real simulator. Here are both distributions on the same plot:

Edit August 1, 2012: Corrected Y axis label to refer to the percent of links, not nodes.

Both plots on the same graph

Adventures in Python

I’ve been spending most of my waking hours with Python over break, and I really like the language. Unlike the standard C++ library schoolwork is limited to, in Python I can generally find a library to make my task a great deal simpler. I find assumptions that I make about syntax while figuring things out tend to hold and work as I would expect, and it’s incredibly convenient to pop open an interactive shell to try out an idea before dropping it into a larger program. I actually like the whitespace-sensitivity of Python due to the rudimentary level of organization, style, and readability it provides. It seems like there’s much less boilerplate code and syntax compared with something like C++. That said, it can be odd to have it be an open question what type something is or what attributes it has and have that lead to problems. It can be frustrating to change code and not know if the types are right until that part runs. These are problems which would not be present in a statically typed language, but such a language will probably not be so flexible.

I’ve managed to get one project into a state in which I’m willing to show it the light of day: RelayBot. Not finding a working IRC bridge bot, I worked off an existing (but for me non-functional) implementation which heavily informed its design. I built my version by removing parts until it connected properly, then writing more functionality and removing still more until it did what I had in mind. I hope to use it to bridge a channel on FLIP and Irc2P.

The project (again in Python) which is not yet ready is a network probing and analysis application. It collects network topology information (optionally in a threaded fashion) and commits the results to a sqlite database for later analysis. It’s hoped that this will allow evanbd to replace a collection of Bash scripts which take an incredibly long time to run and are prone to breaking. The basic functionality is there, but it has many rough edges still. I’m partial to the peer distribution graph:

Histogram of Number of Nodes vs Claimed Number of Peers

GNUPlot really does give lovely images. What I find interesting about this is how there are clear peaks – many nodes claim 12 or 36 peers, which seems very likely to be a function of the peer connection caps and bandwidth limits. There were some outliers, with one node claiming 92 peers! What’s encouraging is that this overall pattern seemed quite stable even as many more probes were collected.

This project has made clear to me how much I need to learn SQL properly. I initially wrote a collection of three queries to generate this: one query retrieved keys which were used to iterate over the other two. Generating this graph took about two hours. I figured out how to rewrite it to use the proper SQL commands for getting the result, and the exact same graph generated in approximately 30 seconds! What’s more, there’s a command I’d like to write that I don’t know how: “Take the sum of the count of the distinct traceNums for each probeID.” It sounds so SQL-y I’m not sure quite how I haven’t been able to do so.

It’s been a fun break, and a shame it couldn’t last longer. Learning in this kind of an organic way with immediate results and self-demonstrating practicality is fantastic.

Freenet and Linux Confusion

I’m working on putting together Node-to-Node Darknet chat for Freenet. I was originally working on it as a mainline Freenet feature, but then I realized that it was too large of a feature to be present outside of a plugin. The plugin interface seems, from what I’ve seen, disturbingly close to the internals of Freenet: much of the code I was writing when internal to Freenet is very easily translated to the plugin. I hope I’m either misjudging things or doing it wrong. For example, I need to have a listener for my chat messages, and to register it, I call the same method from the plugin as I would internally. To send a message, though I can no longer add code to DarknetPeerNode, I can still call the basic message-sending method. Perhaps these things aren’t likely to change, but I somehow expected a plugin interface based more on actions than the internal structure. I hope I can get it working anyway; this has been taking longer than I’d like. I’m trying to learn JQuery as well for a shiny interface.

I upgraded my graphics card driver (AMD Catalyst) in the hope that they’d fixed the bug introduced in 11.5 where the mouse cursor gets stuck and moves choppily in the lower right and upper left corners. They hadn’t, and in the process of installing the driver somehow my sound broke. It started working again this morning. It may have been that I re-added myself to pulse and pulse-access groups and needed to reboot for the changes to take effect, but if not I can’t think of why. EDIT: I think it’s working again now. The HDMI on my Radeon HD 5770 may be conflicting with the normal sound card now for some reason: it’s not listed in sound preferences and the sound is working; when it was the sound wasn’t. It’s worth noting that the “Use audio devices” privilege checkbox is apparently not supposed to be checked. This article is helpful as well. EDIT 2: This article ended up doing the trick to get surround sound output and front panel input working. What worked for me was appending options snd-hda-intel model=auto to /etc/modprobe.d/alsa-base.conf and running sudo alsa force-reload to apply the changes. This is running Ubuntu Natty 11.04.

I’m looking into self-hosted or open source alternatives to Dropbox. After looking at lipsync and sparkleshare, I tried dvcs-autosync as it seemed more lightweight and used XMPP (instead of IRC like sparkleshare) for communication. It didn’t seem to work – changes were committed one file at a time, which was annoying as it popped up a notification bubble for each one. Although one side would appear to log in to the XMPP server and send its status updates, the other side always ignored them. When using Jabber.org instead of a local server, it seemed like only one side could log on at a time. sparkleshare, as it uses git, is incapable of syncing git repos. I’m now trying lipsync, but it’s not working quite right yet. On the upside, it’s a small enough thing that it’s fairly easy to pick up.

Assembly

My section of Engineering 100 is Microprocessors and Toys, and having started with combinational logic, we progressed to finite state machines, then instruction set architectures, and finally to Assembly. We implemented simple instructions in Verilog, and we are now finally programming on that processor in Assembly. It’s oddly gratifying, although I have many gripes about Assembly. We are using E100: a heavily simplified instruction set without programmer-accessible registers, but even so these difficulties seem basic enough to be widely applicable. So far, the bugs we’ve run into have not been problems with overall logic but with return and jump addresses. An incorrect return address can often restart the program: memory by default holds a value of zero, which is also the address of the start of the program. This makes for an especially baffling form of infinite loop. Using the incorrect destination label for an unconditional branch leads to chunks of code mysteriously remaining unexecuted. I hope we figure out how to do this better, as currently it is often difficult to tell even what the values of variables are because on the emulator they are just one of many hexadecimal values out of several pages of them. Without the emulator it would be even more difficult. I’d rather stare at a page of numbers than have to add wait-for-key loops and hex digit outputs for each breakpoint and value I was interested in.

Heading Off to College with The Monolith

It’s official: I’m registered for classes at the University of Michigan and I move into the dorms August 31st. Although I’m excited to learn more, go onto another stage of life, and experience what college has to offer, this summer has been excellent, and if I could repeat it or just sections of it I gladly would. I’m rather scared to move on; I think I will be able to handle the independence, though it will surely take time to get used to it. Most of all, I feel going off to college will mean saying goodbye – the phone just isn’t the same as being in the same room.

My cat died – our guess is either a seizure or a stroke. It was very sudden, and marks my second pet to die unexpectedly and swiftly. I’m glad that at least she didn’t suffer long. In a few hours my knowledge went from “your cat is sick and at the vet” to “your cat is dead.” I was shocked. By this point I’m okay, though.

As a graduation present, I built a $1,500 desktop and upgraded to a widescreen LCD.

Phenom II x 4 @ 3.4GHz

8GB DDR3 RAM

1.5 TB HDD

ATI Radeon HD 5770 1GB GDDR5

It runs incredibly smoothly. The bottleneck is of course the hard drive. I’d have gone for an SSD if they were reasonably priced, but that day has yet to come.

I’m enjoying my internship at IDV. I’ve gotten to look at both the sysadmin and developer side of things. There’s quite a learning curve to coming up to speed with a new codebase, but I got it. Coding requires long bouts of focus, while sysadminning can be more intense, but frequently stopped by waiting for the computer to complete some semiautomated process. I’ve also done more work on Cavez of Phear.

Keep on keepin’ on

My social life is shifting away from these precious tubes of ours in many ways. Being within walking distance of places I would actually want to go really helps. I find it far more fulfilling to talk face-to-face with people, and the Internet, Reddit in particular, seems to just fill that void with funny captioned pictures and the occasional interesting article. I’ve been keeping a journal of sorts and that’s proven enjoyable. The freedom that the summer and my parents give me is fantastic.

I have an internship at IDV and it’s interesting. I am happy to report that when actually coding, such as in C#, much less so in configuration files, programming techniques are generally applicable. It’s nice after floundering around to have little sections of activity where I actually feel like I know what’s going on.

As a graduation present I’m putting together a monster gaming rig. I was very happy to be notified that this would happen. Once I get the parts and snapped together ordered I’ll put together a writeup on the build.  I’m looking forward to it.

Even More Progress

I’ve made big changes to Cavez of Phear!

Because most terminal windows default to 80×24, it now fits in that space. There is no longer a lives system because it was only frustrating and didn’t add anything to the gameplay. A game should not be difficult through starting over again if you make too many mistakes. With the saves and without the lives, it allows for focus on the puzzles. Upon death, the savefile will load if it’s of the same level. I recoded the key bindings to be cleaner, and it now supports multiple keys for a single command. Sortie made very nice tutorial levels.

The editor is much improved and can now place bombs in addition to bomb packs, apply physics to a level, and make changes in rectangles and whole-level fills in addition to single points.  It also is more flexible in opening and saving files. I also fixed the persistent and seemingly vague problem that it sometimes required multiple keypresses to get back to the main menu. It ended up being caused by recursive calls to main_loop(). There’s now yet another while(true) in the main loop that gets break;‘d out of for it to go back to the beginning and load a level or save as needed. I greatly improved the screen dissolve function – it only writes to each position once, and is guaranteed to cover every location in 1840 writes. I fill an array with all possible positions, perform a Fisher-Yates shuffle on it, then iterate over the array, writing spaces. Writing the shuffle algorithm was interesting. The previous one did overwrite – it was pure rand() – and took 10000 writes, which led to horrid performance over SSH, and it wasn’t even guaranteed to clear everything. In general I’ve tried to improve the UI – the reaper on the main menu only comes in from the left on the first run, after which the menu comes up without pause. I’ve cleaned up a lot of code, but there remains is_ready(), which Sortie tells me has something to do with asynchronous file operations, and do_the_monster_dance(), which I’ve been avoiding because it looks messy.

In playtests I’ve had trouble with people getting confused because they don’t read the directions. I’m resisting making big boxes proclaiming what the controls are because I’m assuming players are intelligent, and so far they’ve managed to figure it out. I got embarrassed when I didn’t realize which commits had gone through and made an inaccurate, duplicate commit. I’m worried that I might get caught up in a never-ending parade of new features and not stop long enough to add more levels. I suppose with all the improvements to the editor it’ll be much easier.

Next on the list of feature additions is support for level packs, cleaning up the level code by actually reading the directory instead of relying on a #define, figuring out why the game spontaneously quits on some keypresses and freezes on Ctrl-S, and perhaps reimplementing the menus in light of the knowledge that ncurses has menu functionality.

If you’d like to try it out, either grab PuTTY and ask me for a shell account, or download the source, compile it yourself, (requires ncurses) and have fun!

EDIT: I like to think that my desk’s keyboard platform is falling off because I coded so hard all day and it couldn’t take the greatness. The friggin’ monitor is hot!

Progress

Nine school days until seniors leave! Both my APs are now what amounts to study hours now that testing is over. It’s nice. Comp Physics created some quality quotes yesterday:

“Light pours forth from your rear and keeps you from falling through your chair.” (In reference to photons, the electromagnetic force boson behind most Newtonian forces.)

“We call it the ‘Ultrapurple.'” (Ultraviolet)

I’m working more on Cavez of Phear. The save file format is now one file, the code has been further cleaned up, (Mostly eliminating copy-paste) and I’m working on adding features to the editor. I had some problems with not closing file pointers initially, which caused very strange behavior until I realized what I had forgotten. If anyone wants look at the github repo, it’s here.