Monthly Archive for September, 2007

Building the Home Supercomputer

I was configuring a new computer to be used for testing concurrent software, and was using my standard self-guidelines: second-fastest processor available (a nod to economy), as much DRAM as I can jam on a motherboard, and the latest dual-graphics card technology. Woohoo! But then I found this site on building a very economical cluster system and I realized my guidelines were old-fashioned. I’m now in the mood to build my own micro-Beowulf, so I can experiment with parallel clusters as well as multicore concurrency.

Check it out: The system described produces 26 Gflops at a cost (August 2007) of $1256! It consists of 4 microATX motherboards, each with a dual-core CPU and 2GB RAM, 4 power supplies, 1 hard disk, and an 8-port gigabit switch. The “structure” is scrap plexiglass and threaded rods – definitely minimal! – and the whole thing is 11″ x 12″ x 17″! Kudos to Professor Joel Adams and his student Tim Brom for designing, building, configuring, and benchmarking this small Beowulf.

Here’s another such system – LittleFe.  And here is a homebrew 10-node system from 2000, with the same idea w.r.t. minimal packaging.

(My main conclusion about my self-guidelines: I don’t need even the second-fastest processor anymore. Nearly any current processor is fast enough for development purposes, compiler and system bloat notwithstanding. This system uses cheap multicore processors, a reasonable amount of memory for each node, and doesn’t need anything more than the built-in motherboard graphics. I would still like a system with a hot new graphics card, however, so I can experiment with GPGPU.)

Update Sept 18 2007: Lots of people are doing work in this area—which will make it easy to get started!  Here are some more links:

ParallelKnoppix – A LiveCD that lets you boot up an MPI cluster in 5 minutes!

And on the ParallelKnoppix site, some users have sent in pictures of their clusters—lots of different (and primitive, yet working) building techniques here!

And this page from Dec 2005 describes how some guy built a “mobile wireless linux cluster” (2 nodes) in order to have access to “big computer resources” while exploring a cave, climbing a mountain, spending a weekend trip in the mountains, or who knows what else.

Persistent Data Structures – now (possibly) practical

The typical data structures most programmers know and use require imperative programming: they fundamentally depend on replacing the values of fields with assignment statements, especially pointer fields.  A particular data structure represents the state of something at that particular moment in time, and that moment only.  If you want to know what the state was in the past, you need to have made a copy of the entire data structure back then, and kept it around until you needed it.  (Alternatively, you could keep a log of changes made to the data structure that you could play in reverse until you reach the previous state – and then play back forwards to get back to where you are now.  Both of these techniques are typically used to implement undo/redo, for example.)
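The change-log approach can be sketched in a few lines of C++. This is a minimal sketch with hypothetical names (`Change`, `LoggedState`, etc. are my own), just to show the record/replay idea: every assignment is logged with its old and new values, and a cursor marks how much of the log is currently applied.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of undo/redo via a change log: instead of copying
// the whole state, record each change so it can be played in reverse
// (undo) or forwards again (redo).
struct Change { std::size_t field; int old_value; int new_value; };

struct LoggedState {
    std::vector<int> fields;    // the "data structure": a few mutable fields
    std::vector<Change> log;    // every change made so far
    std::size_t cursor = 0;     // how many log entries are currently applied

    explicit LoggedState(std::size_t n) : fields(n, 0) {}

    void set(std::size_t i, int v) {
        log.resize(cursor);     // a fresh change discards any redo history
        log.push_back({i, fields[i], v});
        fields[i] = v;
        ++cursor;
    }
    void undo() {               // play the most recent change in reverse
        if (cursor == 0) return;
        const Change& c = log[--cursor];
        fields[c.field] = c.old_value;
    }
    void redo() {               // play it forwards again
        if (cursor == log.size()) return;
        const Change& c = log[cursor++];
        fields[c.field] = c.new_value;
    }
};
```

Note the cost profile: each change is cheap to record, but restoring a state that is n changes away takes n replay steps.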

Or you could use a persistent data structure. A persistent data structure allows you to access previous versions at any time without having to do any copying.  All you need to do at the time of each change is save a pointer to the data structure.  If you have a persistent data structure, your undo/redo implementation is simply a stack of version pointers: you push one after every change to the data structure.
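Here is a rough sketch of that idea in C++, using the simplest persistent structure there is: a singly-linked list whose nodes are never modified after creation, so every version shares the unchanged tail. All names are my own, and `std::shared_ptr` stands in for garbage collection (safe here because the structure is acyclic):

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// A persistent cons list: "updating" builds a new head node and shares
// the old tail; existing versions are never touched.
struct Node {
    int value;
    std::shared_ptr<const Node> next;
};
using List = std::shared_ptr<const Node>;

List cons(int v, List tail) {
    return std::make_shared<const Node>(Node{v, std::move(tail)});
}

// Undo/redo is then just a stack of version pointers: push the new root
// after every change; undo moves the cursor back to an older version.
struct History {
    std::vector<List> versions{List{}};   // version 0 is the empty list
    std::size_t cursor = 0;

    List current() const { return versions[cursor]; }
    void commit(List v) {
        versions.resize(cursor + 1);      // discard any redo tail
        versions.push_back(std::move(v));
        ++cursor;
    }
    void undo() { if (cursor > 0) --cursor; }
    void redo() { if (cursor + 1 < versions.size()) ++cursor; }
};
```

Unlike the change-log approach, jumping to any saved version is a single pointer read, no matter how far back it is.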

This can be quite useful—but it is typically very hard to implement a persistent data structure in an imperative language, especially if you have to worry about memory management1.   If you’re using a functional programming language—especially a language with lazy semantics like Haskell—then all your data structures are automatically persistent, and your only problem is efficiency (and of course, in a functional language, the language system takes care of memory management).  But for practical purposes, as a professional hardcore C++ programmer, I was locked out of the world of persistent data structures.

Now, however, with C# and C++/CLI in use (and garbage collection coming to C++ any time now …2) I can at last contemplate the use of persistent data structures in my designs.  And that’s great, because it has given me an excuse to take one of my favorite computer science books off the shelf and give it another read.

The book is Purely Functional Data Structures, by Chris Okasaki.  I find it to be a very well written and easy to understand introduction to the design and analysis of persistent data structures—or, equivalently, to the design and analysis of any data structure you’d want to use in a functional language.

The book has two key themes: first, the use and implementation of several persistent data structures, such as different kinds of heaps, queues, and random-access lists; and second, how to create your own efficient persistent data structures.
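To give a flavor of the first theme: one of the structures covered is the classic two-list functional queue, with a front list in order and a rear list reversed. Here is an eager C++ sketch of it (names are mine; `std::shared_ptr` again stands in for GC). Note this only illustrates the interface—Okasaki shows that making the amortized bounds hold up under persistent use takes more care (laziness and memoization):

```cpp
#include <memory>
#include <utility>

// Persistent cons cells shared by all queue versions.
struct Cell { int value; std::shared_ptr<const Cell> next; };
using List = std::shared_ptr<const Cell>;

List cons(int v, List t) { return std::make_shared<const Cell>(Cell{v, std::move(t)}); }

List rev(List xs) {                        // reverse by consing onto a new list
    List out;
    for (; xs; xs = xs->next) out = cons(xs->value, out);
    return out;
}

// Two-list queue: front holds elements in order, rear holds them reversed.
// Every operation returns a new version; old versions remain valid.
struct Queue { List front, rear; };

Queue push(const Queue& q, int v) {        // enqueue at the rear
    return {q.front, cons(v, q.rear)};
}
int head(const Queue& q) {                 // front of a non-empty queue
    return q.front ? q.front->value : rev(q.rear)->value;
}
Queue pop(const Queue& q) {                // dequeue from the front
    if (q.front) return {q.front->next, q.rear};
    List f = rev(q.rear);                  // front empty: move the rear over
    return {f->next, List{}};
}
```

Because `push` and `pop` share all unchanged cells, you can hold on to any earlier version of the queue and it still answers correctly—exactly the property the undo/redo stack above relies on.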

Continue reading ‘Persistent Data Structures – now (possibly) practical’