Windows Reinstall, IDEA vs Eclipse, Ubuntu Natty

Soon after installing SP1, it looked like my Windows 7 installation was hanging on boot. A few impatient hard reboots later I had a consistent BSOD: 0x0000007B (0xFFFFF880009A9928, 0xFFFFFFFFC0000034, 0x0000000000000000, 0x0000000000000000). Startup repair didn’t help, and neither did any utility I could think of, whether available from startup repair or not. Safe mode wouldn’t boot and would hang on classpnp.sys. I tried fixmbr, and then realized that restoring GRUB when running with an encrypted root partition is more complicated. I should have just backed up the bootloader to an image, but I didn’t. Luckily, someone more knowledgeable than I had the same problem (though not for as foolish a reason) and I found their guide. sfc /scannow didn’t fix it. Some forums suggested deleting classpnp.sys, which didn’t help; the machine just started rebooting without displaying a message. Copying classpnp.sys from another machine also running Home Premium 64-bit just restored the BSOD.

I ended up reinstalling Windows 7. I moved my files to my Linux backup partition and back while booted into Linux. In retrospect, given how much fragmentation that caused, I should have installed over the old installation and just copied from the Windows.old directory on the new one. I assume that the Linux NTFS drivers are to blame for the high levels of fragmentation, and that Windows would have handled it far better.
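The bootloader backup I wished for above could have been as simple as imaging the first sector of the disk with dd. Here is a minimal sketch; the function name and the scratch-file demo are made up for illustration, and on a real system the source would be the whole-disk device (something like /dev/sda, which is an assumption about the machine), read as root. GRUB 2 also keeps part of itself in the sectors after the MBR, so this alone is not a complete bootloader backup.

```shell
# Copy the first 512-byte sector (boot code plus partition table) of a disk
# or disk image to a file. Hypothetical helper, not something I actually ran.
backup_boot_sector() {
    src=$1   # e.g. /dev/sda on a real machine (assumed device name)
    dst=$2   # where to keep the image
    dd if="$src" of="$dst" bs=512 count=1 2>/dev/null
}

# Demo on a scratch file so nothing real is touched.
scratch=$(mktemp -d)
dd if=/dev/zero of="$scratch/fake-disk.img" bs=512 count=4 2>/dev/null
backup_boot_sector "$scratch/fake-disk.img" "$scratch/mbr.img"
wc -c < "$scratch/mbr.img"
```

When restoring, writing back only the first 446 bytes of the image puts the boot code back while leaving the current partition table alone.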

While Eclipse works, it is rather sluggish and the UI is clunky. Thanks to ##java on Freenode, I’m now using IDEA. It’s somewhat different, and it doesn’t have the Ant error-checking that Eclipse has, but I prefer it. The interface is less cluttered and it feels easier to devote more of my screen to the actual source code. I think the reason is as simple as the buttons to hide windows being larger, toggling, and not moving. The Ant problem I ran into took a while to figure out. IDEA was nice enough to offer to pull code down from GitHub, but although the resulting code compiled, the .jars didn’t run correctly. Reduced to diffing the directories, I discovered that IDEA hadn’t caught the Ant problems that Eclipse did, and upon fixing those it worked.

I’ve upgraded to Ubuntu Natty, and I don’t like the new interface. I don’t think it works very well. For instance, it’s very easy to remove an item from the dock, but dragging an item to change the order takes too long. This is probably due to the necessity of differentiating between scrolling the contents and dragging a single item. It’s not intuitive. Similarly, adding an icon to the bar is as easy as dragging, but it only stays in the bar if the thing that was dragged there remains. I’m glad GDM presents the Ubuntu Classic option. My desktop’s upgrade to Natty was very much unlike the three other machines I’ve upgraded uneventfully. Not only did GRUB fail to install, which resulted in symbol not found: 'grub_env_export' (following the above guide again fixed it), but something about my motherboard led to a very long hang on boot after NET: Registered protocol family 1. At least there are bugs filed for both the former and the latter, and I’m glad I wasn’t also hit by this one. Perhaps this will lead to me reading the release notes before an upgrade. However, I don’t see any mention of these issues in the release notes, which makes me worry that reading them won’t be enough.

Problem Types

Among my least favorite types of problems are those that I’m unable to learn from. My main system drive was spontaneously remounted read-only, and upon Alt-SysRq-REISUB’ing, the OS didn’t come up and I got “error: partition not found” and a grub rescue> prompt that couldn’t do anything, not even “help.” I pushed in all the SATA cables and it came up, but upon reboot the drives were out of their correct boot order. Bizarre. The part that scares me is that I can’t do anything to stop it from happening again, because I don’t know why it happened. The same applies to the mysterious times this machine goes completely unresponsive while idle, or suddenly doesn’t have video on boot and then spontaneously regains it. The former has happened a few times, the latter only once.

From programming in assembly, I finally realize how segmentation faults are really nice compared to the alternative. Data and instruction separation is a luxury. Miss a bounds check and suddenly you’re executing things not intended to be instructions and you get really weird opcodes and the whole thing dies. It can get really frustrating.

I realized the only reasons my server has gone down at dad’s are external forces: either the power has gone out, whether from an outage or a tripped circuit breaker, or cables have been unplugged by unwitting family members. I wonder how much better colos are. ChunkHost was really nice, and I’d likely have continued with it once my “free beta” ended (I have my suspicions it’s marketing-speak for “free trial”) if I had enough disposable income to justify a monthly fee.

EDIT: I had forgotten the time it went down as I was upgrading from Debian Lenny to Squeeze. I had set up a virtual machine for fallback, but I didn’t end up using it: in restoring the VM from backup I unknowingly uncovered a configuration problem with one of the hosted sites that showed up a few days later on the main server. Whoops.

Assembly

My section of Engineering 100 is Microprocessors and Toys, and having started with combinational logic, we progressed to finite state machines, then instruction set architectures, and finally to Assembly. We implemented simple instructions in Verilog, and we are now finally programming on that processor in Assembly. It’s oddly gratifying, although I have many gripes about Assembly. We are using E100: a heavily simplified instruction set without programmer-accessible registers, but even so these difficulties seem basic enough to be widely applicable. So far, the bugs we’ve run into have not been problems with overall logic but with return and jump addresses. An incorrect return address can often restart the program: memory by default holds a value of zero, which is also the address of the start of the program. This makes for an especially baffling form of infinite loop. Using the incorrect destination label for an unconditional branch leads to chunks of code mysteriously remaining unexecuted. I hope we figure out how to do this better, as currently it is often difficult to tell even what the values of variables are because on the emulator they are just one of many hexadecimal values out of several pages of them. Without the emulator it would be even more difficult. I’d rather stare at a page of numbers than have to add wait-for-key loops and hex digit outputs for each breakpoint and value I was interested in.

Gitolite, Lighttpd, and GitPHP

I had some trouble setting up GitPHP under Lighttpd when using Gitolite. GitPHP is a PHP clone of gitweb, which allows web-browser-based repository viewing, and Gitolite is a very nice Git permissions manager. I needed the webserver to have read access to the repos, so I set the group to www-data, but as Gitolite was managing them, each commit would reset the permissions. Adding www-data to the git group, as was suggested and as would make sense, didn’t work for me, which I think is a lighttpd issue. Setting the setgid bit (the “group sticky bit”) on the repo directories solved the problem: new files inherit the directory’s group, so the group owner stops changing. I didn’t want the Gitolite configuration repos displayed on GitPHP too, so I pointed GitPHP at a different directory than the one the repos were actually in, and filled that with symbolic links to the repos it should display.
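For reference, here is roughly what that setup looks like as shell commands. It is a sketch under assumptions: the paths and repo names are invented, it runs against a scratch directory rather than the real Gitolite home, and the chgrp to www-data is left as a comment because it needs root and an existing www-data group.

```shell
# Scratch stand-ins for the real locations (hypothetical paths).
base=$(mktemp -d)
repos="$base/repositories"      # where Gitolite keeps the repos
public="$base/gitphp-projects"  # the directory GitPHP is pointed at
mkdir -p "$repos/project.git" "$repos/gitolite-admin.git" "$public"

# On the real server the group would be set first, as root:
#   chgrp -R www-data "$repos"
# With the setgid bit on a directory, files created inside it inherit the
# directory's group. Existing directories each need the bit set explicitly.
find "$repos" -type d -exec chmod g+s {} +

# Link only the repos that should be visible; gitolite-admin is left out.
ln -s "$repos/project.git" "$public/project.git"

ls -l "$public"
```

GitPHP then only ever sees the link directory, so hiding or showing a repo is a matter of removing or adding one symlink.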

Still alive!

I’m still very happy that I’ve dropped physics. My class load is much more manageable. In Psych we have to do behavioral conditioning to improve something about ourselves, so I’ve chosen to try to reduce the amount of time I spend distracted on Reddit when I’m trying to do schoolwork. For the week before break I kept track of the time I spent trying to do schoolwork, and how much of that time I spent browsing Reddit. This coming week I will record my time usage again, but this time whenever I focus on schoolwork for an hour or more without getting distracted on Reddit, I will give myself a 50% chance of the option to play Minecraft for 15 minutes guilt-free. Let’s hope it works.

I’ve begun to contribute to open source projects that I use. It does require knowledge of source control and communal development procedures, but it’s a great feeling to give back, even though my efforts are relatively small. Also, chmod’s capital X grants execute/search permission only if the target is a directory or already has an execute bit set for someone.
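A quick way to see the capital X behaviour is to try it on a scratch tree. The file names below are invented for the demo; the only real thing here is chmod itself.

```shell
tree=$(mktemp -d)
mkdir "$tree/docs"
printf 'notes\n' > "$tree/docs/readme.txt"
printf '#!/bin/sh\necho hi\n' > "$tree/docs/run.sh"

# Start from owner-only permissions.
chmod 700 "$tree/docs" "$tree/docs/run.sh"
chmod 600 "$tree/docs/readme.txt"

# Read for group and others everywhere; execute/search only on directories
# and on files that already have an execute bit for someone.
chmod -R go+rX "$tree/docs"

ls -ld "$tree/docs"
ls -l "$tree/docs"
```

The directory and run.sh end up with x for everyone, while readme.txt only gains read permission.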

Progress

Faced with a persistent inability to complete homework to a satisfactory degree, my academic advisor and I decided to drop my physics lecture, which has yielded encouraging increases in available time. I’ve also discovered GitPHP, a PHP equivalent to gitweb that works very nicely. I set up a private git repo (accessible through the user git, with everyone’s public keys in authorized_keys) for my Engineering 100 class. We’ll use it to coordinate parallel development of an E100-compliant processor and our project, which at this point looks like it’s going to be creating robots which interact with one another through sound.

Learning Through Disaster

Gather ’round children, and I’ll tell you a tale of what happened to a Linux box when its sole filesystem was remounted read-only due to disk errors. This coincided with the backup server being taken offline with an errant circuit breaker.

I first became aware of something rotten in the state of Webserver when the sites hosted on it became messes of PHP errors in place of content. I could ssh in, but after entering my password I was greeted with:

-bash: /etc/profile: Input/output error
-bash: /home/steve/.profile: Input/output error
-bash-3.2$

That was a scary greeting.

lighttpd and ssh continued running, but PHP died, commands other than Bash builtins refused to run, and Bash profiles failed to load. I don’t know if some of this is due to damage or to the partition being read-only. I’m pretty sure that commands expect /tmp, /var/run, and /var/lock to be writable. I now have those mounted as tmpfs as per the instructions on the Arch wiki here. The warning about lighttpd seems to not apply in my case. cURL seemed to run at first but died when I tried to do anything with it. I had hoped to POST files over, as scp, ftp, and sftp would not run. su still worked. Tunneling worked too, so I was still able to access other machines behind the firewall through the server even though I couldn’t run ssh from the machine itself.

I ended up using cat to copy over text files. For binary files I had to get a great deal more creative. The only way I could interact with the server was over ssh; the server was an hour away, and even if I did have physical access, mount refused to run (unable to write to /etc/mtab?) and I was afraid that the files I could access might be only buffered in memory and that rebooting into a LiveCD/USB would lose them. My options were limited.

I had to use only Bash builtins to pull binary files off the server in text form. I modified a version of this hexdump script to pull files over ssh, using | tee file.log to avoid having to copy-paste. tee copies its stdin to stdout and to a file given as an argument. Here’s the script:

exec 3<"$1"
# IFS= stops read from discarding whitespace bytes; -d '' makes NUL the delimiter
while IFS= read -s -u 3 -d '' -r -n 1 char
do
    printf "%02x" "'$char"
done

I lacked a text editor and couldn't write to anything on the root filesystem. I found a tmpfs mount point (I used /lib/init/rw, but /dev/shm would also work) and stored the file by echoing the script line-by-line. In retrospect, I could have used \n and the -e (interpret backslash escapes) option to do it in one line: echo -e "exec 3<\"\$1\"\nwhile IFS= read -s -u 3 -d '' -r -n 1 char\ndo\nprintf \"%02x\" \"'\$char\"\ndone" > scriptfile. I ran it with bash scriptfile target_file.
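The other half of the trick is turning the hex text back into a binary file on the receiving machine, where the full toolbox is available. Below is a sketch of that side; the function name, host name and file names are placeholders I made up, and the ssh line is shown only as a comment.

```shell
# Turn a stream of hex digit pairs back into raw bytes.
# Hypothetical helper for the receiving machine; whitespace in the input is ignored.
hex_to_bin() {
    tr -d ' \t\r\n' | fold -w 2 | while read -r byte || [ -n "$byte" ]; do
        # printf understands \ooo octal escapes, so convert the hex pair to octal first.
        printf "\\$(printf '%03o' "0x$byte")"
    done
}

# Intended use (server name and paths are placeholders):
#   ssh server 'bash /lib/init/rw/scriptfile /path/to/file' | tee file.log
#   hex_to_bin < file.log > file.bin

# Small self-check: "48690a" is "Hi" followed by a newline.
printf '48690a' | hex_to_bin
```

Going through octal keeps the decoder to plain sh features, so it does not depend on the receiving shell's printf supporting \x escapes.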

All of this effort, though fun, ended up being unneeded as I had forgotten about my set-and-forget backups. Hooray rdiff-backup!

I ran mysqldump nightly and let rdiff-backup handle any differences. I restored it on the new machine with source mysql_dump.sql at a mysql prompt, but as the dump contained users and privileges, things got messy: the root and debian-sys-maint accounts were partially overwritten. I used mysqladmin to sort out the root password confusion and phpmyadmin to replace the debian-sys-maint password with the one found (in plaintext?!) in /etc/mysql/debian.cnf.
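One way to avoid the account mess on restore is to dump only the application databases instead of everything, so the mysql system database (users and privileges) never enters the dump. Here is a sketch of what such a nightly job could look like; the function name, database names, paths and backup host are invented, and credentials are assumed to come from a ~/.my.cnf file.

```shell
# Hypothetical nightly job, meant to be run from cron.
# $1: directory for the dump, $2: rdiff-backup destination, rest: databases to dump.
nightly_backup() {
    dump_dir=$1
    dest=$2
    shift 2
    # --databases dumps only the named databases; leaving out the "mysql"
    # database keeps accounts and privileges out of the dump.
    mysqldump --databases "$@" > "$dump_dir/mysql_dump.sql" || return 1
    # Classic two-argument form: rdiff-backup <source> <destination>.
    rdiff-backup "$dump_dir" "$dest"
}

# Example call (all names are placeholders):
#   nightly_backup /var/backups/mysql backuphost::/srv/backups/mysql blog wiki
```

Restoring such a dump recreates the listed databases and leaves the new machine's root and debian-sys-maint accounts untouched.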

It was a fun puzzle even though it was ill-timed.

The Wolverine Soft 48-hour game competition revealed to me just how difficult physics engines are to make. I spent two days coding and recoding collision resolution only to get different sets of bizarre, game-breaking glitches. At least collision detection was easy because everything was a circle. It was fun and I'd like to do it again. Maybe I should become familiar with a physics library such as Bullet and ask for it to be approved for use in the competition. The guideline is that unless it's an approved library, all code and assets (with exceptions for music and sound effects) must be created primarily on-site within the 48 hours. Next time I'll have to plan to do homework in advance. Ignoring homework for a weekend is inadvisable.

I am currently taking 17 credits, and the time management is very difficult, though it has not yet proven to be entirely impossible. I'm considering taking classes at LCC this summer to lighten the load during the next school year. I applied to Camp CAEN to be a counselor, but they emailed back saying camp was ending due to the director retiring. The odd part is that their website, as of this writing, has no mention of it that I can find. I'll have to see if I can get an internship over the summer.

Patience

I’ve been getting lessons in the importance of calm patience from various sources. For instance, when I realized that Windows Backup would require too much effort to function as it should, I expanded my Linux backup partition to fill the entire backup drive, with the intent of adding Windows directories to the backup. To do this, as Truecrypt volumes have no official expansion capability, I moved the backups to my home folder, wiped the partitions, created a new LUKS volume, and moved the backups back. The partition operations were insanely easy with Red Hat’s Palimpsest. I put the backups in the root of the drive, and Deja Dup kept failing a CRC check and dying. My impulse was to freak out, delete the backups, and start fresh, but I researched it instead and ended up moving them from the root of the drive to a directory. Deja Dup appears to have fallen back to what it is supposed to do in the event of a corrupt backup, which is to make a new full one. I don’t yet know which backup was corrupt, and I hope it wasn’t the full one that the earlier incremental ones were based on. I’m dismayed to find errors after copying; I probably should have used rsync instead of Nautilus.
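Whatever tool does the copying, comparing source and destination afterwards would have told me right away whether the copy itself was at fault. Here is a small sketch using diff -r, with made-up directory and file names.

```shell
# Succeeds only if both trees hold the same files with the same contents.
# Hypothetical helper; diff -r prints whatever differs.
verify_copy() {
    diff -r "$1" "$2"
}

# Demo on scratch directories.
work=$(mktemp -d)
mkdir "$work/src"
printf 'backup data\n' > "$work/src/backup.vol1"
cp -R "$work/src" "$work/dst"

if verify_copy "$work/src" "$work/dst"; then
    echo "copy verified"
else
    echo "copy differs from source"
fi
```

With rsync, a second pass using --checksum together with --dry-run and --itemize-changes should give a similar report without copying anything.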

EDIT: It looks like the backups weren’t actually damaged. It may have been that the lost+found directory in the root of the drive was causing problems.

ADC

The ADC makes more sense now. It turns out Professor Atkins has been waiting for us to figure out that the weirdness we’ve run into is due to tremendous electromagnetic interference. My math GSI was incredibly kind and willing to spend about an hour helping me fix the statistics. I don’t know why the corruption I ran into was occurring, but we did establish that what GSL calls total sum of squares is actually variance. I’ve added a real TSS function, as well as an output of the absolute value of the residual. Here are the best graphs we got previously, rendered with the latest graphing routine:

ADC3 with latest graphing.

ADC5 with latest graphing.

ADC6 with latest graphing.
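On the statistics fix mentioned above: the two quantities differ only by a factor of n − 1. The total sum of squares is the sum of squared deviations from the mean, and the sample variance is that sum divided by n − 1. A tiny awk sketch with made-up readings (not our ADC data, and not the project's code) shows both side by side.

```shell
# Print the mean, total sum of squares and sample variance of the numbers
# given one per line on stdin. Illustrative helper only.
tss_and_variance() {
    awk '
        { x[NR] = $1; sum += $1 }
        END {
            mean = sum / NR
            for (i = 1; i <= NR; i++) {
                d = x[i] - mean
                tss += d * d
            }
            printf "mean=%.4f tss=%.4f variance=%.4f\n", mean, tss, tss / (NR - 1)
        }'
}

printf '1\n2\n3\n4\n' | tss_and_variance
# prints: mean=2.5000 tss=5.0000 variance=1.6667
```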

I didn’t want to disrupt the servo guy’s work much, so I moved the Gumstix over to the power supply and set it back up. I didn’t use the breadboard to ground the unused ADCs, and put them all in the same alligator clip instead. I thought I would calibrate two more channels so that we’d have a usable input for the gyro reference voltage. I was very surprised by the results:

Behold ADC2!

Behold ADC7!

This makes so much more sense for many reasons. As I pointed out yesterday, there was a consistent, significant distortion under 1 V. This is nowhere to be found in the new line. It’s actually a line, and there is only a minuscule difference in counts for the same voltages between graphs. That difference is acceptable given the imperfections in the voltages we fed it, as in this respect our power supply is… abstract. This line also goes up to 1024 at 2.5 V, which is what it should do, as that’s the maximum count at the maximum voltage. What I find amazing is how huge the effect of electromagnetic interference is! We got completely different information when using the breadboard, and even its imperfections were consistent! Professor Atkins revealed that she had let us spend hours on this fruitless calibration of electromagnetic interference so that we would thoroughly learn the importance of electromagnetically clean wiring. Lesson learned!
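With the channel behaving linearly, converting a reading to a voltage is a one-liner. This is a sketch assuming the ideal mapping described above (0 counts at 0 V up to 1024 at 2.5 V); the function name is invented, and a real calibration would use the fitted slope and intercept instead.

```shell
# Convert an ADC count to volts, assuming a perfectly linear channel with
# a full scale of 1024 counts at 2.5 V (the ideal figures from the text above).
counts_to_volts() {
    awk -v counts="$1" 'BEGIN { printf "%.3f\n", counts * 2.5 / 1024 }'
}

counts_to_volts 512   # prints 1.250
```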