Re: Computer Flake Outs (was: Re: On Overclocking - READ THIS!)

Justin Dolske (dolske@cis.ohio-state.edu)
Wed, 7 May 1997 00:58:58 -0400 (EDT)


On Tue, 6 May 1997, Rodney R. Korte wrote:

> It claimed something like one one-bit error every 4-6 months for
> the average Pentium-class computer with 32MB
[...]
> The memory footprint of DESCHAL5 on my machine is about 1/2 MB,
> so I could expect an error in memory in which the DESCHAL code
> resides about once in 3 years.

Hmm, don't think so. If you assume a 1 bit error in a 32meg core every 4
months, you should expect a 1 bit error in any particular .5meg region
every 21 years (it's 1/64th the size). With 6000 clients, this would
correspond to a single bit error somewhere every day or two.
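
For anyone who wants to redo the scaling with their own numbers, here's a
rough sketch. It assumes errors are spread uniformly over memory and are
independent across machines, which may well not hold in practice. The
function names are just made up for illustration.

```python
# Back-of-the-envelope scaling of a memory error rate.
# Assumption: errors are uniform over memory and independent per machine.
DAYS_PER_MONTH = 365.25 / 12


def region_error_interval_months(total_mb, months_per_error, region_mb):
    """Mean months between 1-bit errors inside one region of memory."""
    # A region that is 1/64th the size sees errors 1/64th as often.
    return months_per_error * (total_mb / region_mb)


def fleet_error_interval_days(region_interval_months, clients):
    """Mean days between errors in that region across all clients."""
    return region_interval_months * DAYS_PER_MONTH / clients


if __name__ == "__main__":
    # 32 MB machine, 0.5 MB footprint, 6000 clients (figures from this thread)
    for months in (4, 6):
        per_client = region_error_interval_months(32, months, 0.5)
        fleet = fleet_error_interval_days(per_client, 6000)
        print(f"{months} months per 32MB error: "
              f"{per_client / 12:.1f} years per client, "
              f"{fleet:.1f} days across the fleet")
```

Running it prints the per-client and whole-fleet intervals for both ends of
the quoted 4-6 month range.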

I've no idea how useful this number really is. You've also got to factor
in the probability of it being detected (by parity or the program
crashing), and the probability that it won't affect the actual
calculation (i.e., an error in non-essential code, an unused bit on the
stack, etc). Then there's the issue of cache. If most of the code is
sitting in cache, you'd have to figure error rates for it (which are
probably different, due to SRAM vs DRAM), and the probability that the
error will only hit stale or expired cache entries.

> Fortunately, I have parity memory

Are you sure it's real parity? For a while, memory with pseudo-parity was
being sold, where the "parity" bit was generated on the SIMM to correspond
to what the DRAMs currently held when accessed, so there was really no
error detection. :-)
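
To make the difference concrete, here's a toy model (not how any particular
SIMM is actually wired, and the class names are hypothetical). Real parity
stores the parity bit at write time and compares against it on read.
Pseudo-parity recomputes the bit from whatever the cell holds on read, so
the check can never fail.

```python
# Toy model of real vs. pseudo parity on a single byte.
# RealParity / PseudoParity are illustrative names, not real hardware APIs.

def parity_bit(byte):
    """Even-parity bit: 1 if the byte has an odd number of set bits."""
    return bin(byte & 0xFF).count("1") & 1


class RealParity:
    """Parity bit is stored at write time and checked at read time."""

    def __init__(self):
        self.data = 0
        self.stored_parity = 0

    def write(self, byte):
        self.data = byte & 0xFF
        self.stored_parity = parity_bit(self.data)

    def flip_bit(self, n):
        """Simulate a 1-bit memory error in the data bits."""
        self.data ^= 1 << n

    def read(self):
        """Return (data, parity_ok)."""
        return self.data, parity_bit(self.data) == self.stored_parity


class PseudoParity(RealParity):
    """Parity bit is generated from the current contents on every read."""

    def read(self):
        generated = parity_bit(self.data)  # always agrees with the data
        return self.data, generated == parity_bit(self.data)


if __name__ == "__main__":
    for cell in (RealParity(), PseudoParity()):
        cell.write(0x5A)
        cell.flip_bit(3)
        data, ok = cell.read()
        print(type(cell).__name__, hex(data), "parity ok" if ok else "PARITY ERROR")
```

The flipped bit trips the check in the first case and sails through in the
second, which is the whole problem.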

Justin Dolske <URL:http://www.cis.ohio-state.edu/~dolske/>
(dolske@cis.ohio-state.edu)
Graduate Fellow / Research Associate at The Ohio State University, CIS Dept.
-=-=-=-=-=-=-=-=-=-=-=-=-=- Random Sig-o-Matic (tm) -=-=-=-=-=-=-=-=-=-=-=-=-
1 + 1 = 3, for large enough values of 1.