Two Plus Two Older Archives - Pure probability question

Two Plus Two Older Archives (http://archives2.twoplustwo.com/index.php)

- Probability (http://archives2.twoplustwo.com/forumdisplay.php?f=23)

- - Pure probability question (http://archives2.twoplustwo.com/showthread.php?t=303538)

Re: Pure probability question

[ QUOTE ]
it is not immediately (or, to me, eventually) apparent why P(requiring draw N) = 1/(n-1)!

[/ QUOTE ]

check out pzhon's responses to my questions.... he gives two different proofs.

Why is the answer 2000 + a/b?

[ QUOTE ]
[ QUOTE ]

Followup: Suppose you add the numbers up until you get 1000. Roughly how many do you expect it to take?

[/ QUOTE ]
It's easy to see that the average is a bit more than 2000. Here is a hint: The average is about 2000+a/b, where a and b are small integers. Of course, the problem is not just to determine a/b, but also to justify it.

[/ QUOTE ]

OK,
This problem seems difficult. After a fair amount of work (solving ODE's) I got a general expression, f[n], for the expected number of random numbers needed to exceed an integer n. When I compute the value of this expression for various values of n, it is very very near to 2*n+ a/b for two rather small integers a and b. (See pzhon's post.) I have no idea why the answer is so close to this fraction. The values for f[n], a, and b are given below in white with some more details.

<font color="white"> I got that the expected number of random numbers needed to exceed n is exactly

f[n] = Sum[ (-n + i)^i/i! Exp[n - i], {i, 0, n}].

For
n=1 f[n] = 2.7182818284590452354 = e
n=2 f[n] = 4.6707742704716049919 = e^2 - e
n=3 f[n] = 6.6665656395558899041 = e^3 - 2 e^2 + e/2
n=4 f[n] = 8.6666044900326954372
n=5 f[n] = 10.666662068622411858
n=6 f[n] = 12.666667141378121401
n=7 f[n] = 14.666666781522143450
n=8 f[n] = 16.666666670426887824
n=9 f[n] = 18.666666665270321349
n=10 f[n] = 20.666666666476318801
n=11 f[n] = 22.666666666669991177
n=12 f[n] = 24.666666666669900329
n=13 f[n] = 26.666666666666922273
n=14 f[n] = 28.666666666666641305
n=15 f[n] = 30.666666666666660344
n=16 f[n] = 32.666666666666666457
n=17 f[n] = 34.666666666666666744
n=18 f[n] = 36.666666666666666677
n=19 f[n] = 38.666666666666666666
n=20 f[n] = 40.666666666666666666

So I figure to get a sum of 1000, you need an average of 2000+2/3 random numbers. As I said, I have no idea why f[n] is approximately equal to 2*n + 2/3.

pzhon, you said "It's easy to see ...". Do you care to enlighten us?

</font>

Re: Why is the answer 2000 + a/b?

Followup answer in white:
<font color="white">
To justify that the answer is very roughly 2000, you can use the Central Limit Theorem.

To get the exact answer, you can solve some lag differential equations for f(x) = Expected number of draws before the total is at least x, namely f'(x)=f(x)-f(x-1). Numerically, you can see that f(n) is very close to 2n+2/3. However, as irchans points out, this is not obvious from the form of the exact answer.

Here is a way to prove that f(x) is asymptotic to 2x+2/3:

[img]/images/graemlins/diamond.gif[/img] For simplicity, consider g(x) = f(x)-2x. This simplifies the integral equation:

f(x) = 1 + integral f(z)dz from z=x-1 to z=x, f(x)=0 on -1<x<0.

g(x)
= -2x + f(x)
= -2x + 1 + integral f(z)dz from z=x-1 to z=x
= -2x + 1 + integral (g(z) + 2z)dz from z=x-1 to z=x
= -2x + 1 + integral 2zdz + integral g(z)dz from z=x-1 to z=x
= -2x + 1 + x^2 - (x-1)^2 + integral g(z)dz from z=x-1 to z=x
= integral g(z)dz from z=x-1 to z=x.

Note that g(x) = -2x on -1<x<0.

[img]/images/graemlins/diamond.gif[/img] If g is asymptotic to some constant, what must that constant be? It would help to find some sort of weighted average of g on an interval that is preserved by the progression of the integral equation. Then we can compute that weight for -2x on -1<x<0, and compare this with the weight of a constant function.

On the interval [0,1], geometric considerations suggest weighting g(z) by z, since the value of g(z) will be used to determine g(1) through g(1+z). To make the total weight 1, we'll use the weight 2z on [0,1]. We're in luck: this weighted average is the same for all unit intervals.

[img]/images/graemlins/diamond.gif[/img]Claim: The integral of 2z g(x+z)dz from z=0 to z=1 does not depend on x.

Proof: By Leibniz,

d/dx integral 2z g(x+z)dz from z=0 to z=1

= integral 2z g'(x+z)dz from z=0 to z=1

= - integral 2 g(x+z)dz from z=0 to z=1
+ 2z g(x+z) evaluated from z=0 to z=1 (integration by parts)

= - integral 2 g(x+z)dz from z=0 to z=1
+ 2(1) g(x+1) - 2(0) g(x+0)

= -integral 2 g(z)dz from z=x to z=x+1 + 2 g(x+1)

= -2 g(x+1) + 2 g(x+1)

= 0

Since the derivative is 0, this function is constant.

[img]/images/graemlins/diamond.gif[/img] If g is asymptotic to some constant, it must be asymptotic to the weighted average on [-1,0],

integral 2z g(z-1)dz from z=0 to z=1

= integral 2z (-2)(z-1)dz from z=0 to z=1

= 2/3

Therefore, the only possible linear asymptote of f(x) is 2x+2/3.

[img]/images/graemlins/diamond.gif[/img] We still need to establish that g(x) is asymptotic to 2/3, or equivalently, than h(x)=g(x)-2/3=f(x)-2x-2/3 is asymptotic to 0.

Claim: |h(x)| <= 1/2 Max |h(z)| where x-1<=z<=x.

Proof:

From the integral equation for g,

h(x) = Integral h(z)dz from z=x-1 to z=x.

From the conserved weighted average for g,

0 = Integral (z-(x-1)) h(z)dz from z=x-1 to z=x.

Subtracting,

h(x) = Integral (x-z) h(z)dz from z=x-1 to z=x.

|h(x)| <= Integral |x-z| |h(z)|dz from z=x-1 to z=x

<= Integral |x-z| Max |h(z)| dz from z=x-1 to z=x

<= (Max |h(z)| on [x-1,x]) Integral |x-z|dz from z=x-1 to z=x

= (Max |h(z)| on [x-1,x]) 1/2

That's what we were trying to prove.

[img]/images/graemlins/diamond.gif[/img] Claim: |f(x)-2x-2/3| < 1/2^x for x>0.

Proof: This is satisfied for x on [0,1].

Suppose there is a nonempty set of exceptions. Let X be the infimum. By the continuity of f, it must be that |h(X)|=1/2^X. By the previous lemma, |h(z)|>= 2/2^X somewhere on the interval [X-1,X], which contradicts the assumption that X was the infimum of the set of exceptions. The set of exceptions must be empty.

So, f(x) is asymptotic to 2x+2/3, and we have an exponentially decreasing bound on the error.

</font>