Adventures in Python

I’ve been spending most of my waking hours with Python over break, and I really like the language. Schoolwork limits me to the C++ standard library, whereas in Python I can generally find a library that makes my task a great deal simpler. The assumptions I make about syntax while figuring things out tend to hold, and it’s incredibly convenient to pop open an interactive shell to try out an idea before dropping it into a larger program. I actually like Python’s whitespace sensitivity because of the baseline level of organization, style, and readability it enforces. There also seems to be much less boilerplate code and syntax than in something like C++.

That said, it can be odd for the type of a value, or the attributes it has, to be an open question, and that uncertainty leads to problems. It can be frustrating to change code and not know whether the types are right until that part runs. A statically typed language would catch these problems before the program runs, but such a language would probably not be as flexible.
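To make the typing complaint concrete, here is a minimal sketch. The function name and the threshold of 50 are invented for illustration. The mistake sits in a branch that rarely runs, so nothing complains until a value finally reaches it.

```python
def describe_peer_count(count):
    """Return a short description of a peer count (hypothetical helper)."""
    if count < 50:
        return "typical: " + str(count)
    # Bug: adding an int to a str raises TypeError, but only when this
    # branch actually executes.
    return "outlier: " + count


print(describe_peer_count(12))  # the common path works fine

try:
    describe_peer_count(92)  # the rare path fails only now
except TypeError as exc:
    print("only noticed at run time:", exc)
```

A compiler for a statically typed language would reject the second `return` before the program ever ran.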

I’ve managed to get one project into a state in which I’m willing to show it the light of day: RelayBot. I couldn’t find a working IRC bridge bot, so I started from an existing implementation (which didn’t work for me) that heavily informed RelayBot’s design. I built my version by removing parts until it connected properly, then writing more functionality and removing still more until it did what I had in mind. I hope to use it to bridge a channel on FLIP and Irc2P.

The project that is not yet ready (again in Python) is a network probing and analysis application. It collects network topology information (optionally in a threaded fashion) and commits the results to a SQLite database for later analysis. The hope is that this will allow evanbd to replace a collection of Bash scripts which take an incredibly long time to run and are prone to breaking. The basic functionality is there, but it still has many rough edges. I’m partial to the peer distribution graph:

[Figure: histogram of number of nodes vs. claimed number of peers]

gnuplot really does give lovely images. What I find interesting about this is how there are clear peaks: many nodes claim 12 or 36 peers, which seems very likely to be a function of the peer connection caps and bandwidth limits. There were some outliers, with one node claiming 92 peers! What’s encouraging is that this overall pattern seemed quite stable even as many more probes were collected.
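The pipeline behind a graph like this can be sketched roughly as follows: worker threads run probes, the results are written to SQLite, and a single `GROUP BY` query produces the histogram rows. This is only a sketch under assumptions. `fake_probe`, the `peer_count` table, and its columns are hypothetical names, and the peer counts are made up rather than real probe data.

```python
import queue
import sqlite3
import threading


def fake_probe(probe_id):
    """Hypothetical stand-in for a real probe; the peer counts are made up."""
    return probe_id, 12 if probe_id % 2 == 0 else 36


def collect(probe_ids, num_threads=4):
    """Run the probes on worker threads and return the results as rows."""
    pending = queue.Queue()
    for probe_id in probe_ids:
        pending.put(probe_id)
    results = queue.Queue()

    def worker():
        while True:
            try:
                probe_id = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            results.put(fake_probe(probe_id))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    rows = []
    while not results.empty():
        rows.append(results.get())
    return rows


def peer_histogram(rows):
    """Store rows in SQLite and count nodes per claimed peer count."""
    # The connection is used only on the calling thread; sqlite3
    # connections reject use from other threads by default.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE peer_count (probeID INTEGER, peers INTEGER)")
    with conn:
        conn.executemany("INSERT INTO peer_count VALUES (?, ?)", rows)
    histogram = conn.execute(
        "SELECT peers, COUNT(*) FROM peer_count GROUP BY peers ORDER BY peers"
    ).fetchall()
    conn.close()
    return histogram


print(peer_histogram(collect(range(6))))  # [(12, 3), (36, 3)]
```

Funnelling results through a queue keeps all database access on one thread, which sidesteps sharing a connection between threads.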

This project has made clear to me how much I need to learn SQL properly. I initially wrote a collection of three queries to generate this graph: one query retrieved keys, which were then used to iterate over the other two. Generating the graph that way took about two hours. Once I figured out how to rewrite it so that SQL itself computed the result, the exact same graph generated in approximately 30 seconds! What’s more, there’s a query I’d like to write but don’t yet know how to: “Take the sum of the count of the distinct traceNums for each probeID.” It sounds so SQL-y that I’m not sure why I haven’t managed it.
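One way that last query might look is a nested aggregate: count the distinct `traceNum` values per `probeID` in an inner query, then sum those counts in an outer one. The sketch below assumes a hypothetical `traces` table with `probeID` and `traceNum` columns. The real schema may differ, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the two column names come from the text above.
conn.execute("CREATE TABLE traces (probeID INTEGER, traceNum INTEGER)")
sample_rows = [
    (1, 0), (1, 0), (1, 1),  # probe 1: two distinct traceNums
    (2, 0), (2, 1), (2, 2),  # probe 2: three distinct traceNums
]
conn.executemany("INSERT INTO traces VALUES (?, ?)", sample_rows)

query = """
    SELECT SUM(distinct_traces)
    FROM (
        SELECT COUNT(DISTINCT traceNum) AS distinct_traces
        FROM traces
        GROUP BY probeID
    )
"""
total = conn.execute(query).fetchone()[0]
print(total)  # 5
```

The inner `GROUP BY` yields one count per probe, and the outer `SUM` adds them up, so the whole calculation stays inside the database instead of looping over keys in Python.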

It’s been a fun break, and it’s a shame it couldn’t last longer. Learning in this kind of organic way, with immediate results and self-demonstrating practicality, is fantastic.