reliability (no poker content)

tylerdurden · #1 08-02-2005, 09:37 PM

Assume you are in charge of management for 2000 computers in a datacenter. Over one year, you expect 400 hardware failures. Each failure will take an average of 3 hours to resolve.

Per year, how many failure events would you expect to overlap?

spaminator101 · #2 08-02-2005, 09:59 PM

What, might I ask, is the point of this post?

uuDevil · #3 08-02-2005, 10:53 PM

[ QUOTE ]
What, might I ask, is the point of this post?

[/ QUOTE ]

Considering your screenname and post history, there is no small irony here.

uuDevil · #4 08-03-2005, 02:21 AM

I'm unreliable, but my answer is 25.

Method:

The expected #failures in a 3-hr period is 400/(24*365/3)=.137

The probability of 2 or more failures in a 3-hr period is

1-P(X=0)-P(X=1) = 1 - exp(-.137)*(.137)^0/0! - exp(-.137)*(.137)^1/1! = .00857

The expected number of times 2 or more failures will occur in the same 3-hr period over a year is

.00857*(24*365/3) = 25.0
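
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the method above. It assumes failures follow a Poisson distribution within fixed, back-to-back 3-hour windows; the function and constant names are illustrative only.

```python
import math

HOURS_PER_YEAR = 24 * 365
FAILURES_PER_YEAR = 400
REPAIR_HOURS = 3


def expected_crowded_windows(failures=FAILURES_PER_YEAR,
                             window_hours=REPAIR_HOURS,
                             hours=HOURS_PER_YEAR):
    """Return (lambda, P(X >= 2), expected windows per year with >= 2 failures)."""
    windows = hours / window_hours          # number of fixed 3-hour windows in a year
    lam = failures / windows                # expected failures per window
    # P(X >= 2) = 1 - P(X = 0) - P(X = 1) for a Poisson(lam) count
    p_two_or_more = 1.0 - math.exp(-lam) * (1.0 + lam)
    return lam, p_two_or_more, p_two_or_more * windows


lam, p, n = expected_crowded_windows()
print(f"lambda = {lam:.3f}, P(X >= 2) = {p:.5f}, windows per year = {n:.1f}")
```

Note that this counts windows containing two or more failures, not pairs of failures, so a window with three failures still counts once.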

emp1346 · #5 08-04-2005, 05:00 AM

i think uuDevil is on the right track... I simply used the Poisson formula, and got a bit different number, with the probability being ~.0081, resulting in about 23.9, so 24... basically the same though...

and as for you spaminator, i simply agree with uuDevil...

tylerdurden · #6 08-04-2005, 10:32 PM

Thanks guys. I have heard of the Poisson Distribution, but wasn't sure how to apply it here. I did a little reading and I think I have the hang of it now.

emp1346 · #7 08-06-2005, 08:13 PM

[ QUOTE ]
i think uuDevil is on the right track... I simply used the Poisson formula, and got a bit different number, with the probability being ~.0081, resulting in about 23.9, so 24... basically the same though...

[/ QUOTE ]

look, will one of you two tell me why you're disregarding the poisson? it was designed for situations of this sort...

and btw, for a triple occurrence, only approximately 1 will occur in a year...
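
Both figures can be checked with a short Python sketch of the Poisson probability of exactly k failures in one fixed 3-hour window, assuming the same rate of 400 failures per 24*365 hours. The helper names (poisson_pmf, expected_windows_with_exactly) are illustrative, not from any library.

```python
import math

WINDOWS_PER_YEAR = 24 * 365 / 3             # fixed 3-hour windows in a year
RATE_PER_WINDOW = 400 / WINDOWS_PER_YEAR    # expected failures per window


def poisson_pmf(k, lam=RATE_PER_WINDOW):
    """Probability of exactly k events for a Poisson(lam) count."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_windows_with_exactly(k):
    """Expected number of windows per year containing exactly k failures."""
    return poisson_pmf(k) * WINDOWS_PER_YEAR


for k in (2, 3):
    print(k, round(poisson_pmf(k), 5), round(expected_windows_with_exactly(k), 1))
```

The k = 2 line corresponds to the double-failure figure and the k = 3 line to the triple-failure one.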

tylerdurden · #8 08-06-2005, 08:39 PM

[ QUOTE ]
look, will one of you two tell me why you're disregarding the poisson? it was designed for situations of this sort...

and btw, for a triple occurrence, only approximately 1 will occur in a year...

[/ QUOTE ]

The Poisson calculation above works with fixed, discrete intervals rather than a continuous timeline (sorry if my terminology is awkward), i.e. it gives you the expected number of failures in three-hour windows such as 00:00 - 03:00, 03:00 - 06:00, etc. A failure at 2:30 and another at 3:30 would overlap, but they fall in different three-hour windows, so they wouldn't get counted.

I think a better way to phrase it might be that the poisson is concerned with events in a given period, whereas I'm concerned with the proximity of events.
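
To make the 2:30 / 3:30 example concrete, here is a small Python sketch. It assumes every repair takes exactly 3 hours and that times are given as hours since midnight; window_index and repairs_overlap are made-up helper names.

```python
REPAIR_HOURS = 3.0


def window_index(start_hour, window_hours=REPAIR_HOURS):
    """Index of the fixed window (0 = 00:00-03:00, 1 = 03:00-06:00, ...)."""
    return int(start_hour // window_hours)


def repairs_overlap(start_a, start_b, repair_hours=REPAIR_HOURS):
    """Two fixed-length repairs overlap if their starts are less than one repair apart."""
    return abs(start_a - start_b) < repair_hours


first, second = 2.5, 3.5    # failures at 02:30 and 03:30
print(window_index(first), window_index(second))   # different windows
print(repairs_overlap(first, second))              # yet the repairs overlap
```

The two failures land in windows 0 and 1, so a per-window count misses them even though the repairs run concurrently for two hours.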

irchans · #9 08-05-2005, 07:28 AM

uuDevil,

I think your method underestimates the number of overlaps because it implies that failures start exactly on a three hour boundary.

Here is a second method for estimating the number of overlaps. Suppose there are exactly 400 failures. We will say that the ith failure and the jth failure overlap if their start times differ by 6 hours or less. The probability that the ith failure overlaps the jth failure is approximately

6/(24*365) = 0.000684932.

There are 400*399/2 = 79800 possible pairs of i's and j's, so the expected number of overlaps (with "overlap" defined as above) is

6/(24*365) * 400*399/2 = 54.6575.
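
The pair-counting arithmetic can be reproduced with a few lines of Python. This is only a sketch of the calculation as written, taking the year as 24*365 = 8760 hours (which is what the quoted 0.000684932 corresponds to) and using the per-pair probability of 6/8760 exactly as given; the names are illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365
FAILURES = 400


def expected_pair_overlaps(per_pair_probability, failures=FAILURES):
    """Expected number of overlapping pairs = (number of pairs) * P(a given pair overlaps)."""
    pairs = math.comb(failures, 2)      # 400 * 399 / 2 unordered pairs
    return pairs * per_pair_probability


p_pair = 6 / HOURS_PER_YEAR             # per-pair probability used in the post
print(math.comb(FAILURES, 2), p_pair, expected_pair_overlaps(p_pair))
```

Because expectation is linear, the pairs do not need to be independent for this product to give the expected count.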

tylerdurden · #10 08-05-2005, 03:03 PM

[ QUOTE ]
I think your method underestimates the number of overlaps because it implies that failures start exactly on a three hour boundary.

Here is a second method for estimating the number of overlaps. Suppose there are exactly 400 failures. We will say that the ith failure and the jth failure overlap if their start times differ by 6 hours or less.

[/ QUOTE ]

I think you're onto something (I see the point about starting exactly on a three hour boundary), but I think your remedy is off-base. If the start times are more than three hours apart they don't overlap. For our purposes we can assume the variance on the repair length is zero, and that all failures always take exactly three hours to fix.
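
One way to test the 3-hour criterion directly is a quick simulation. The Python sketch below assumes exactly 400 failures per year, start times independent and uniform over 24*365 hours, and every repair lasting exactly 3 hours; it counts pairs whose start times are less than 3 hours apart. The function names, trial count, and seed are arbitrary choices for illustration.

```python
import random

HOURS_PER_YEAR = 24 * 365
FAILURES = 400
REPAIR_HOURS = 3.0


def count_overlapping_pairs(starts, repair_hours=REPAIR_HOURS):
    """Count pairs of start times that are less than repair_hours apart."""
    starts = sorted(starts)
    pairs = 0
    left = 0
    for right, t in enumerate(starts):
        # Advance left until starts[left] is within repair_hours of t.
        while t - starts[left] >= repair_hours:
            left += 1
        pairs += right - left   # every start in [left, right) overlaps with t
    return pairs


def simulate_mean_overlaps(trials=300, seed=1):
    """Average number of overlapping pairs per simulated year."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        starts = [rng.uniform(0, HOURS_PER_YEAR) for _ in range(FAILURES)]
        total += count_overlapping_pairs(starts)
    return total / trials


print(simulate_mean_overlaps())
```

Under these assumptions, two uniform start times fall within d hours of each other with probability of roughly 2d/T, so for d = 3 the per-pair probability is about 6/(24*365) and the simulated mean should land near 6/(24*365) * 79800, i.e. in the mid-50s.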