Re: bitslice client and PPC (maybe 68k too?)

andrew meggs
Sun, 15 Jun 1997 21:26:59 -0400

At 4:20 PM -0700 6/15/97, Aaron Whiteman wrote:
>I know there have been several requests for the bitslice client for the
>x86, and the answer has always been no, intel chips suck (well ok maybe im
>being simplistic). Now, its my turn to chime in for my platform... Can
>PPC have bitslice too? :) Or, with Andrews work, would it even be worth

My assembly version of the original algorithm seems to do a few percentage
points better than the compiler-generated version of the bitslice
algorithm. So right now the current PowerMac client is the best choice for
a PowerMac.

Before people ask, I'll just take on the question: "But what about an
assembly version of bitslice?" After all, the reason the x86 and PPC
clients can do as well as they do without bitslice is because they have
hand-optimized assembler in them, so why not optimize the bitslice client
"the same way"? There are a number of reasons why that hasn't happened yet.

First of all, optimizing something in assembly isn't a step-by-step process
that makes code run N percent faster. It's more like a black art, and
there's no such thing as "the same way". RISC compilers seem to generate
much better code for bitslice than for Rocke's original algorithm, so from
a cursory examination there appear to be fewer gains to be had by
eliminating the compiler. But that's just from looking at a disassembly; I
haven't even seen the source code.
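
For anyone who hasn't seen the technique, here is a rough sketch in C of
what "bitslice" means. Each bit of a 32-bit word belongs to a separate,
independent instance of the problem, so a single logical instruction
advances 32 instances at once. The gate network below is a made-up toy
(a 2:1 multiplexer), not one of the real DES s-boxes and not Darrell's
code, and all the function names are hypothetical. Code of this shape is
one long run of register-to-register logical operations, which is
plausibly why compilers handle it well.

```c
#include <stdint.h>

/* Toy gate network, NOT a real DES s-box: a 2:1 multiplexer.
   Bit i of each argument belongs to instance i, so all 32
   instances are evaluated by the same few logical operations. */
uint32_t mux_sliced(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) ^ (~a & c);
}

/* Reference version: the same function, one instance at a time. */
unsigned mux_scalar(unsigned a, unsigned b, unsigned c)
{
    return a ? b : c;
}

/* Returns 1 if every one of the 32 lanes of the sliced result
   matches the scalar reference, 0 otherwise. */
int lanes_agree(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t out = mux_sliced(a, b, c);
    for (int i = 0; i < 32; i++) {
        unsigned ai = (a >> i) & 1u;
        unsigned bi = (b >> i) & 1u;
        unsigned ci = (c >> i) & 1u;
        if (((out >> i) & 1u) != mux_scalar(ai, bi, ci))
            return 0;
    }
    return 1;
}
```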

Second, assembly coding is tedious, and assembly debugging is even more
tedious. The bitslice algorithm is a bigger chunk of code than what we have
now, which means an assembly version would require more of both.
Unless I can develop a strategy for getting some serious performance gains
over the compiled version, doing that work wouldn't be worthwhile for me
and a marginal performance increase wouldn't be worth the download for you.

Third, Darrell's bitslice implementation is still evolving. The dk003
client for UltraSPARCs has a tighter s-box implementation than the dk002
clients that were released for 32 bit systems, and those are tighter than
the original 64-bit dk001 clients. Once an assembly version was created,
any improvements Darrell made to his source code wouldn't be carried over
to that assembly version. Suppose I spent a couple of days doing an assembly
translation of the dk003 code and got a 30% performance increase, and then
he came out with some new ideas and developed dk004 clients that ran 27%
faster without the benefit of assembly. Much of my work would have been
wasted, and I might as well start over with an assembly version of the
dk004 code.

Once the bitslice source is finished and stable, assuming we aren't almost
finished with the keyspace, I'll take a look at it and see if I have any
ideas about how it could be optimized well enough to make the task
worthwhile. Any word on that, Darrell?

> Also, it was my understanding that the old 68k chips dont have the
>register problems, maybe the bitslice can apply to it as well (even if you
>have to drop the 020 to do it). Is it possible?

Dropping the 020 wouldn't change anything. 68k has more registers, but it
shares some of x86's other problems, like a limited set of 2-operand logical
instructions. It would *definitely* require assembly, so on top of all the
reservations expressed above you would have to find someone to do the work.
I can read and write 68k assembly, but I'm definitely not an artist at it.
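
To illustrate the 2-operand point, here is a small sketch in C. It only
models the instruction shapes; the function names are hypothetical and
the mnemonics in the comments are illustrative rather than exact syntax
for any particular assembler. With 3-operand instructions the result goes
to a fresh register and both inputs survive. With 2-operand instructions
the destination is also a source, so any input that is needed again has
to be copied first, which costs an extra instruction.

```c
#include <stdint.h>

/* 3-operand style: each result lands in its own register,
   so a, b and c are all still intact afterwards. */
uint32_t three_operand(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t t = a ^ b;   /* roughly: xor t, a, b */
    uint32_t u = a & c;   /* roughly: and u, a, c */
    return t | u;         /* roughly: or  r, t, u   (3 instructions) */
}

/* 2-operand style: every operation overwrites its destination,
   so a must be copied before the xor because it is used again. */
uint32_t two_operand(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t t = a;       /* extra move to preserve a */
    t ^= b;               /* t = a ^ b */
    a &= c;               /* a is dead after this, so clobbering is fine */
    t |= a;               /* 4 instructions for the same result */
    return t;
}
```

In a long chain of gates where most values are used more than once, those
extra moves add up.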

Andrew Meggs, head 3D superfreak Antennahead Industries, Inc.