info on bitslice clients

Darrell Kindred (dkindred@cmu.edu)
Sun, 15 Jun 1997 18:30:49 -0400


Hi folks,

First, thanks for the kind words regarding the new clients.
The performance we're seeing now wouldn't have been possible
without Eli Biham's and Rocke's work.

I've put together some general information on the bitslice
clients here:

http://www.cs.cmu.edu/People/dkindred/des/bitslice.html

This includes some explanation of why x86 processors are
poorly suited to running bitslice clients. (Yes, I've tried
it.)

There's also a little background on the UltraSPARC client here:

http://www.cs.cmu.edu/People/dkindred/des/twoheads.html

Comments and feedback are welcome. Let me know if you'd like
to see more info on any particular aspect of the bitslice
clients in general, or of the UltraSPARC hack in particular.

Now, I'll answer a question that's come up a couple of times:

Andy Brown writes:
> [...] I noticed the odd behavior of the clients which
> use this technique:
> [sample output showing 1A and 1B reporting different speeds]
>
> The two halves seem to work in parallel for a while, then side B seems to
> take larger blocks while side A remains at 2^30 blocks at a time. After
> this, side A seems to take over some of the work side B gets credit for,
> since the overall keyrate is roughly the same. Several machines show this
> behavior.
>
> Is this caused by a context switch triggering a register save and
> corrupting the A side? If the two sides get differently sized blocks,
> will B give A more to do once A has searched its assigned block?

Nope, no corruption is involved. The two-headed client
works roughly as follows: Each of the two clients requests a
keyblock from the server. If the server returns blocks of
equal size, one block is processed in the low-order ("reliable")
32 bits of each word, while the other is processed
simultaneously in the high-order 32 bits.
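To make the packing idea concrete, here is a minimal Python sketch.
It is not the client's code. `pack`, `unpack` and `toy_gate` are
hypothetical names, and `toy_gate` is an arbitrary AND/XOR/NOT
expression standing in for one step of a real bitsliced DES circuit.
It assumes 64-bit words, emulated by masking Python's unbounded ints.
Because bitwise operations never carry between bit positions, one
64-bit operation on a packed word gives the same answer as two
separate 32-bit operations, one per block.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def pack(lo, hi):
    """Put block A's 32 lanes in the low half, block B's in the high half."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def unpack(word):
    """Split a 64-bit word back into (low-half lanes, high-half lanes)."""
    return word & MASK32, (word >> 32) & MASK32


def toy_gate(a, b, c, mask):
    # Hypothetical stand-in for a bitsliced S-box step. Only bitwise
    # operators are used, so every bit position is computed on its own.
    # The mask trims Python's sign-extended ~c back to the word width.
    return ((a & b) ^ ~c) & mask


# Block A's inputs (low half) and block B's inputs (high half).
a_lo, b_lo, c_lo = 0x12345678, 0x9ABCDEF0, 0x0F0F0F0F
a_hi, b_hi, c_hi = 0xDEADBEEF, 0xCAFEBABE, 0x33333333

# One 64-bit evaluation covers both blocks at once...
both = toy_gate(pack(a_lo, a_hi), pack(b_lo, b_hi), pack(c_lo, c_hi), MASK64)

# ...and matches two independent 32-bit evaluations.
separate = (toy_gate(a_lo, b_lo, c_lo, MASK32),
            toy_gate(a_hi, b_hi, c_hi, MASK32))
```

Here `unpack(both)` equals `separate`. This is what lets a single
64-bit instruction stream serve as two 32-bit "virtual processors".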

If the server returns two blocks of different sizes, the
approach above would be inefficient: the smaller block would
be finished first, leaving half of each word idle while the
larger one was still being searched. Instead, the blocks are
processed one at a time, using all 64 bits of the word.
(Essentially, the two "virtual processors" work together on
both blocks rather than each handling its own block.)
This keeps all 64 bits busy regardless of the block sizes.
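The choice between the two modes can be sketched as below. This is a
hypothetical model, not the actual client logic. `plan_blocks`,
`naive_paired_passes` and the mode labels are illustrative names, and
"passes" simply counts 64-bit bitsliced evaluations, assuming one key
per bit position and ignoring setup costs.

```python
from dataclasses import dataclass

LANES_PER_HALF = 32  # bit positions in one 32-bit half of a word
LANES_PER_WORD = 64  # bit positions in a full 64-bit word


def ceil_div(n, d):
    return -(-n // d)


@dataclass
class Plan:
    mode: str    # "paired" or "sequential" (hypothetical labels)
    passes: int  # number of 64-bit bitsliced evaluations


def plan_blocks(size_a, size_b):
    """Choose how to search two keyblocks of the given sizes (in keys)."""
    if size_a == size_b:
        # Equal blocks: A in the low 32 bits, B in the high 32 bits.
        # Both halves finish on the same pass.
        return Plan("paired", ceil_div(size_a, LANES_PER_HALF))
    # Unequal blocks: search them one after the other, each using all
    # 64 bits, so no half of the word ever sits idle.
    return Plan("sequential",
                ceil_div(size_a, LANES_PER_WORD) + ceil_div(size_b, LANES_PER_WORD))


def naive_paired_passes(size_a, size_b):
    # Cost of pairing unequal blocks anyway: the larger block sets the
    # pass count, and one half idles after the smaller block is done.
    return ceil_div(max(size_a, size_b), LANES_PER_HALF)
```

For two 2^30-key blocks this plans 2^25 paired passes. For blocks of
2^30 and 2^32 keys it plans 2^24 + 2^26 sequential passes, versus 2^27
if the unequal blocks were forced into the paired layout.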

- Darrell