reliability (no poker content)

tylerdurden · #1 08-02-2005, 09:37 PM

Assume you are in charge of managing 2000 computers in a datacenter. Over one year, you expect 400 hardware failures. Each failure takes an average of 3 hours to resolve.

Per year, how many failure events would you expect to overlap?

spaminator101 · #2 08-02-2005, 09:59 PM

What, might I ask, is the point of this post?

uuDevil · #3 08-02-2005, 10:53 PM

[ QUOTE ]
What, might I ask, is the point of this post?

[/ QUOTE ]

Considering your screenname and post history, there is no small irony here.

uuDevil · #4 08-03-2005, 02:21 AM

I'm unreliable, but my answer is 25.

Method:

The expected number of failures in a 3-hr period is 400/(24*365/3) = .137

The probability of 2 or more failures in a 3-hr period is

1-P(X=0)-P(X=1) = 1-exp(-.137)*(.137)^0/0! - exp(-.137)*(.137)^1/1! = .00857

The expected number of times 2 or more failures will occur in the same 3-hr period over a year is

.00857*(24*365/3) = 25.0
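uuDevil's three steps can be sketched in Python as below. It assumes a 365-day year split into 2920 fixed, non-overlapping 3-hour bins, as the post does, and treats the number of failures per bin as Poisson. `binned_overlap_estimate` is a hypothetical name.

```python
import math

FAILURES_PER_YEAR = 400
REPAIR_HOURS = 3
HOURS_PER_YEAR = 24 * 365


def binned_overlap_estimate(failures=FAILURES_PER_YEAR, repair=REPAIR_HOURS,
                            hours=HOURS_PER_YEAR):
    """Expected number of fixed 3-hour bins per year holding 2+ failures."""
    bins = hours / repair                # 2920 three-hour bins per year
    lam = failures / bins                # mean failures per bin, about 0.137
    p0 = math.exp(-lam)                  # P(X = 0)
    p1 = math.exp(-lam) * lam            # P(X = 1)
    p_two_or_more = 1 - p0 - p1          # P(X >= 2), about 0.00857
    return lam, p_two_or_more, p_two_or_more * bins


lam, p, expected = binned_overlap_estimate()
print(f"lambda={lam:.4f}  P(X>=2)={p:.5f}  bins with 2+ failures={expected:.1f}")
```

Keeping λ unrounded (400/2920) still gives about 25 bins per year.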

emp1346 · #5 08-04-2005, 05:00 AM

I think uuDevil is on the right track... I simply used the Poisson formula and got a slightly different number, with the probability being ~.0081, resulting in about 23.9, so 24... basically the same, though.
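The post doesn't say which Poisson term gave ~.0081. One guess (an assumption, not stated in the post) is that it is P(X = 2), the probability of exactly two failures in a bin, rather than P(X ≥ 2). This sketch compares the two with the same λ; `poisson_pmf` is a hypothetical helper built on the standard library.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


bins = 24 * 365 / 3          # 2920 three-hour bins per year
lam = 400 / bins             # mean failures per bin, about 0.137

p_exactly_two = poisson_pmf(2, lam)
p_two_or_more = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)

print(f"P(X=2)  = {p_exactly_two:.5f} -> {p_exactly_two * bins:.1f} bins per year")
print(f"P(X>=2) = {p_two_or_more:.5f} -> {p_two_or_more * bins:.1f} bins per year")
```

Exactly-two bins miss the rare bins with three or more failures, which is why that figure comes out slightly lower.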

And as for you, spaminator, I simply agree with uuDevil...

tylerdurden · #6 08-04-2005, 10:32 PM

Thanks guys. I have heard of the Poisson Distribution, but wasn't sure how to apply it here. I did a little reading and I think I have the hang of it now.

irchans · #7 08-05-2005, 07:28 AM

uuDevil,

I think your method underestimates the number of overlaps because it implies that failures start exactly on a three-hour boundary.

Here is a second method for estimating the number of overlaps. Suppose there are exactly 400 failures. We will say that the ith failure and the jth failure overlap if their start times differ by 6 hours or less. The probability that the ith failure overlaps the jth failure is approximately

6/(24*365) = 0.000684932.

There are 400*399/2 = 79800 possible pairs of i's and j's, so the expected number of overlaps (with "overlap" defined as above) is

6/(24*365) * 400*399/2 = 54.6575.
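The pairwise estimate rests on linearity of expectation: the expected number of overlapping pairs is the number of pairs times the probability that any one pair overlaps, whether or not the pairs are independent. A minimal sketch using the post's numbers (the constant names are illustrative, and `math.comb` needs Python 3.8+):

```python
import math

HOURS_PER_YEAR = 24 * 365
WINDOW_HOURS = 6          # the 6 in the numerator of the post's formula
N_FAILURES = 400

p_pair = WINDOW_HOURS / HOURS_PER_YEAR      # approx. P(a given pair overlaps)
n_pairs = math.comb(N_FAILURES, 2)          # 400*399/2 = 79800 pairs
expected_overlaps = p_pair * n_pairs        # linearity of expectation over pairs

print(f"p={p_pair:.9f}  pairs={n_pairs}  expected={expected_overlaps:.4f}")
```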

tylerdurden · #8 08-05-2005, 03:03 PM

[ QUOTE ]
I think your method underestimates the number of overlaps because it implies that failures start exactly on a three-hour boundary.

Here is a second method for estimating the number of overlaps. Suppose there are exactly 400 failures. We will say that the ith failure and the jth failure overlap if their start times differ by 6 hours or less.

[/ QUOTE ]

I think you're onto something (I see the point about starting exactly on a three-hour boundary), but I think your remedy is off-base. If the start times are more than three hours apart, the repairs don't overlap. For our purposes we can assume the variance of the repair time is zero, i.e., every failure takes exactly three hours to fix.
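Under the zero-variance assumption, a repair can be modeled as a half-open interval [s, s + 3) hours (the half-open choice is an assumption, so two repairs that merely touch at the 3-hour mark don't count as overlapping). The general interval test then reduces to comparing start times. Both function names are hypothetical:

```python
def intervals_overlap(a_start, a_end, b_start, b_end):
    """General test for half-open intervals [a_start, a_end) and [b_start, b_end)."""
    return max(a_start, b_start) < min(a_end, b_end)


def repairs_overlap(start_a, start_b, repair_hours=3.0):
    """Fixed-length case: overlap iff start times differ by less than repair_hours."""
    return abs(start_a - start_b) < repair_hours


# For fixed 3-hour repairs the two tests agree.
for a, b in [(0.0, 2.5), (0.0, 3.0), (10.0, 7.5), (5.0, 5.0)]:
    assert intervals_overlap(a, a + 3.0, b, b + 3.0) == repairs_overlap(a, b)
```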

irchans · #9 08-05-2005, 04:36 PM

pvn,
You are correct! There was a typo in my previous post. Below is the corrected version substituting 3 hours for 6. The expected number of overlaps did not change when I made the correction.

We really should do a simulation.
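Taking up that suggestion, here is a minimal Monte Carlo sketch. Its assumptions are not from the thread: exactly 400 failures with start times independent and uniform over a 365-day year, every repair lasting exactly 3 hours, and repairs that run past the end of the year ignored. `simulate_year` is a hypothetical helper.

```python
import bisect
import random
from collections import Counter


def simulate_year(n_failures=400, repair=3.0, hours=24 * 365, trials=1000, seed=1):
    """Average yearly overlap counts under the assumptions stated above.

    Returns:
      pairs       - pairs of failures whose repairs overlap in time
      starts_busy - failures that start while another repair is in progress
      busy_bins   - fixed bins of width `repair` holding 2+ failure starts
    """
    rng = random.Random(seed)
    pairs = starts_busy = busy_bins = 0
    for _ in range(trials):
        starts = sorted(rng.uniform(0, hours) for _ in range(n_failures))
        for i, s in enumerate(starts):
            # Earlier repairs still running at time s started after s - repair.
            first_active = bisect.bisect_right(starts, s - repair, 0, i)
            active = i - first_active
            pairs += active
            starts_busy += active > 0
        # uuDevil's model: count bins [3k, 3k + 3) holding two or more starts.
        per_bin = Counter(int(s // repair) for s in starts)
        busy_bins += sum(1 for c in per_bin.values() if c >= 2)
    return pairs / trials, starts_busy / trials, busy_bins / trials


pairs, starts_busy, busy_bins = simulate_year()
print(f"overlapping pairs: {pairs:.1f}")
print(f"failures starting during another repair: {starts_busy:.1f}")
print(f"3-hour bins with 2+ failures: {busy_bins:.1f}")
```

Under these assumptions the pair count should come out around 54.6, the failures-starting-during-a-repair count around 51, and the fixed-bin count around 25, consistent with the point that fixed bins undercount overlaps.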

---- corrected post -----

uuDevil,

I think your method underestimates the number of overlaps because it implies that failures start exactly on a three-hour boundary.

Here is a second method for estimating the number of overlaps. Suppose there are exactly 400 failures. We will say that the ith failure and the jth failure overlap if their start times differ by 3 hours or less. Since the jth failure can start up to 3 hours before or after the ith, this is a 6-hour window, so the probability that the ith failure overlaps the jth failure is approximately

6/(24*365) = 0.000684932.

There are 400*399/2 = 79800 possible pairs of i's and j's, so the expected number of overlaps (with "overlap" defined as above) is

6/(24*365) * 400*399/2 = 54.6575.
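The approximation 6/(24*365) ignores edge effects near the start and end of the year. If start times are uniform on [0, T] with T = 8760 hours (the uniform model is an assumption), the exact probability that two starts lie within d hours of each other is 1 - (1 - d/T)^2 = 2d/T - (d/T)^2, so the post's 2d/T overstates it only by the tiny term (d/T)^2. A quick numerical check (constant names are illustrative):

```python
HOURS_PER_YEAR = 24 * 365
D = 3.0                                     # max start-time gap for two 3-hour repairs to overlap
N_PAIRS = 400 * 399 // 2                    # 79800 pairs of failures

approx = 2 * D / HOURS_PER_YEAR             # the post's 6/(24*365)
exact = 1 - (1 - D / HOURS_PER_YEAR) ** 2   # P(|U1 - U2| <= D), U1, U2 uniform on [0, T]

print(f"pair probability: approx={approx:.9f}  exact={exact:.9f}")
print(f"expected overlaps: approx={approx * N_PAIRS:.2f}  exact={exact * N_PAIRS:.2f}")
```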

tylerdurden · #10 08-05-2005, 07:51 PM

I came up with a similar number in a different manner.

As uuDevil pointed out, the expected number of failures in a three-hour period is 400/(24*365/3) = .137

We expect to have 400 failure events (averaging three hours each) in a year. During each one of those, the expected number of other machines that fail is 0.137.

If we take 400*0.137, we get 54.8. However, that means we'd actually have 454.8 failures, not 400 (we're counting duplicates twice).

We just need to solve this for x: (x*0.137)+x=400

That gives us x = 351.8, i.e., 351.8 single failure events.

351.8*0.137 = 48.2

48.2 overlapping events.

351.8+48.2=400.
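The arithmetic in this post can be reproduced as below, with λ kept unrounded (400/2920 ≈ 0.13699) instead of 0.137; the variable names are illustrative. The results still round to the post's 54.8, 351.8 and 48.2.

```python
lam = 400 / (24 * 365 / 3)      # expected failures in a 3-hour window, about 0.137
total = 400

naive = total * lam             # 400 * 0.137, the 54.8 the post says double-counts
single = total / (1 + lam)      # solves x + lam * x = total
overlapping = single * lam      # overlapping events under the post's model

print(f"naive={naive:.1f}  single={single:.1f}  overlapping={overlapping:.1f}")
```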