Two Plus Two Older Archives - Pure probability question

Two Plus Two Older Archives

Two Plus Two Older Archives (http://archives2.twoplustwo.com/index.php)

- Probability (http://archives2.twoplustwo.com/forumdisplay.php?f=23)

- - Pure probability question (http://archives2.twoplustwo.com/showthread.php?t=303538)

bobman0330

07-29-2005 07:32 PM

Pure probability question

OK, here's one I got from a friend of a friend.

You have a uniform random distribution from (0,1). You repeatedly take numbers from this distribution and add them up. WHat is the expected number of times you will draw before you have a number > 1?

pzhon

07-29-2005 07:47 PM

Re: Pure probability question

Answer in white:
<font color="white">e=1+1+1/2+1/6+1/24+...+1/n!+...
The probability that the sum of the first n is at most 1 is 1/n!. </font>

Followup: Suppose you add the numbers up until you get 1000. Roughly how many do you expect it to take?

pzhon

07-30-2005 05:51 PM

Re: Pure probability question

[ QUOTE ]

Followup: Suppose you add the numbers up until you get 1000. Roughly how many do you expect it to take?

[/ QUOTE ]
It's easy to see that the average is a bit more than 2000. Here is a hint: The average is about 2000+a/b, where a and b are small integers. Of course, the problem is not just to determine a/b, but also to justify it.

gaming_mouse

08-01-2005 01:14 AM

Re: Pure probability question

[ QUOTE ]
Answer in white:
<font color="white">e=1+1+1/2+1/6+1/24+...+1/n!+...
The probability that the sum of the first n is at most 1 is 1/n!. </font>

[/ QUOTE ]

pzhon,

i can verify this by doing the multiple integral directly for the first few terms, but i don't see the trick behind this one line proof. can you explain?

pzhon

08-01-2005 01:57 AM

Re: Pure probability question

[ QUOTE ]
[ QUOTE ]
The probability that the sum of the first n is at most 1 is 1/n!.

[/ QUOTE ]

i can verify this by doing the multiple integral directly for the first few terms, but i don't see the trick behind this one line proof. can you explain?

[/ QUOTE ]
I presume you mean the statement above.

The part of the n-dimensional hypercube below the hyperplane x1+x2+...+xn=1 is a pyramid over the n-1 dimensional case. This has an n-dimensional volume of 1/n times the (n-1)-volume of the base, so the volume is 1/n! by induction.

There is also a volume-preserving linear transformation that takes the part with sum less than or equal to 1 to the part of the unit n-cube with x1<x2<x3<...<xn:

x1' = x1
x2' = x1+x2
x3' = x1+x2+x3
...
xn' = z1+x2+x3+...+xn.

This part of the unit n-cube has volume 1/n! by symmetry, as there are n! possible orderings of the coordinates.

pzhon

08-01-2005 02:04 AM

Re: Pure probability question

Alternate solution in white:
<font color="white">Let f(x) be the expected number of terms before the sum is at least x. f(x) = 1 + Integral with respect to z of f(z) from z=x-1 to z=x. f'(x)=f(x)-f(x-1). f(x)=0 for -1<x<0, so for 0<x<1, f'(x)=f(x). The limit of f(x) as x decreases to 0 is 1, so by solving this differential equation with initial condition f(0)=1, f(x)=e^x on 0<x<=1. f(1)=e. </font>

gaming_mouse

08-01-2005 01:57 PM

Re: Pure probability question

[ QUOTE ]

I presume you mean the statement above.

The part of the n-dimensional hypercube below the hyperplane x1+x2+...+xn=1 is a pyramid over the n-1 dimensional case. This has an n-dimensional volume of 1/n times the (n-1)-volume of the base, so the volume is 1/n! by induction.

There is also a volume-preserving linear transformation that takes the part with sum less than or equal to 1 to the part of the unit n-cube with x1<x2<x3<...<xn:

x1' = x1
x2' = x1+x2
x3' = x1+x2+x3
...
xn' = z1+x2+x3+...+xn.

This part of the unit n-cube has volume 1/n! by symmetry, as there are n! possible orderings of the coordinates.

[/ QUOTE ]

Thanks pzhon. Both of those solutions are nice, especially the second.

But why did you write out the expansion for e, as though the statement below it followed directly from that expansion (in your original answer in white)?

pzhon

08-01-2005 04:35 PM

Re: Pure probability question

[ QUOTE ]

But why did you write out the expansion for e, as though the statement below it followed directly from that expansion (in your original answer in white)?

[/ QUOTE ]
# draws necessary = Z1 + Z2 + Z3 + ...

where

Zi = 0 if the ith draw was not necessary
Zi = 1 if the ith draw was necessary (which happens with probability 1/(i-1)! )

The expected number of draws necessary is the sum of the probabilities of requiring the first draw (1/0!), the second draw (1/1!), the third draw (1/2!), etc.

gaming_mouse

08-01-2005 04:49 PM

Re: Pure probability question

ahhh... i got it. i didn't realize that was the expectation. i thought the statement below it was supposted to follow from it, rather than the other way around.

thanks,
gm

bobman0330

08-01-2005 05:37 PM

Re: Pure probability question

it is not immediately (or, to me, eventually) apparent why P(requiring draw N) = 1/(n-1)!

gaming_mouse

08-01-2005 06:08 PM

Re: Pure probability question

[ QUOTE ]
it is not immediately (or, to me, eventually) apparent why P(requiring draw N) = 1/(n-1)!

[/ QUOTE ]

check out pzhon's responses to my questions.... he gives two different proofs.

irchans

08-03-2005 07:03 PM

Why is the answer 2000 + a/b?

[ QUOTE ]
[ QUOTE ]

Followup: Suppose you add the numbers up until you get 1000. Roughly how many do you expect it to take?

[/ QUOTE ]
It's easy to see that the average is a bit more than 2000. Here is a hint: The average is about 2000+a/b, where a and b are small integers. Of course, the problem is not just to determine a/b, but also to justify it.

[/ QUOTE ]

OK,
This problem seems difficult. After a fair amount of work (solving ODE's) I got a general expression, f[n], for the expected number of random numbers needed to exceed an integer n. When I compute the value of this expression for various values of n, it is very very near to 2*n+ a/b for two rather small integers a and b. (See pzhon's post.) I have no idea why the answer is so close to this fraction. The values for f[n], a, and b are given below in white with some more details.

<font color="white"> I got that the expected number of random numbers needed to exceed n is exactly

f[n] = Sum[ (-n + i)^i/i! Exp[n - i], {i, 0, n}].

For
n=1 f[n] = 2.7182818284590452354 = e
n=2 f[n] = 4.6707742704716049919 = e^2 - e
n=3 f[n] = 6.6665656395558899041 = e^3 - 2 e^2 + e/2
n=4 f[n] = 8.6666044900326954372
n=5 f[n] = 10.666662068622411858
n=6 f[n] = 12.666667141378121401
n=7 f[n] = 14.666666781522143450
n=8 f[n] = 16.666666670426887824
n=9 f[n] = 18.666666665270321349
n=10 f[n] = 20.666666666476318801
n=11 f[n] = 22.666666666669991177
n=12 f[n] = 24.666666666669900329
n=13 f[n] = 26.666666666666922273
n=14 f[n] = 28.666666666666641305
n=15 f[n] = 30.666666666666660344
n=16 f[n] = 32.666666666666666457
n=17 f[n] = 34.666666666666666744
n=18 f[n] = 36.666666666666666677
n=19 f[n] = 38.666666666666666666
n=20 f[n] = 40.666666666666666666

So I figure to get a sum of 1000, you need an average of 2000+2/3 random numbers. As I said, I have no idea why f[n] is approximately equal to 2*n + 2/3.

pzhon, you said "It's easy to see ...". Do you care to enlighten us?

</font>

pzhon

08-04-2005 05:23 AM

Re: Why is the answer 2000 + a/b?

Followup answer in white:
<font color="white">
To justify that the answer is very roughly 2000, you can use the Central Limit Theorem.

To get the exact answer, you can solve some lag differential equations for f(x) = Expected number of draws before the total is at least x, namely f'(x)=f(x)-f(x-1). Numerically, you can see that f(n) is very close to 2n+2/3. However, as irchans points out, this is not obvious from the form of the exact answer.

Here is a way to prove that f(x) is asymptotic to 2x+2/3:

[img]/images/graemlins/diamond.gif[/img] For simplicity, consider g(x) = f(x)-2x. This simplifies the integral equation:

f(x) = 1 + integral f(z)dz from z=x-1 to z=x, f(x)=0 on -1<x<0.

g(x)
= -2x + f(x)
= -2x + 1 + integral f(z)dz from z=x-1 to z=x
= -2x + 1 + integral (g(z) + 2z)dz from z=x-1 to z=x
= -2x + 1 + integral 2zdz + integral g(z)dz from z=x-1 to z=x
= -2x + 1 + x^2 - (x-1)^2 + integral g(z)dz from z=x-1 to z=x
= integral g(z)dz from z=x-1 to z=x.

Note that g(x) = -2x on -1<x<0.

[img]/images/graemlins/diamond.gif[/img] If g is asymptotic to some constant, what must that constant be? It would help to find some sort of weighted average of g on an interval that is preserved by the progression of the integral equation. Then we can compute that weight for -2x on -1<x<0, and compare this with the weight of a constant function.

On the interval [0,1], geometric considerations suggest weighting g(z) by z, since the value of g(z) will be used to determine g(1) through g(1+z). To make the total weight 1, we'll use the weight 2z on [0,1]. We're in luck: this weighted average is the same for all unit intervals.

[img]/images/graemlins/diamond.gif[/img]Claim: The integral of 2z g(x+z)dz from z=0 to z=1 does not depend on x.

Proof: By Leibniz,

d/dx integral 2z g(x+z)dz from z=0 to z=1

= integral 2z g'(x+z)dz from z=0 to z=1

= - integral 2 g(x+z)dz from z=0 to z=1
+ 2z g(x+z) evaluated from z=0 to z=1 (integration by parts)

= - integral 2 g(x+z)dz from z=0 to z=1
+ 2(1) g(x+1) - 2(0) g(x+0)

= -integral 2 g(z)dz from z=x to z=x+1 + 2 g(x+1)

= -2 g(x+1) + 2 g(x+1)

= 0

Since the derivative is 0, this function is constant.

[img]/images/graemlins/diamond.gif[/img] If g is asymptotic to some constant, it must be asymptotic to the weighted average on [-1,0],

integral 2z g(z-1)dz from z=0 to z=1

= integral 2z (-2)(z-1)dz from z=0 to z=1

= 2/3

Therefore, the only possible linear asymptote of f(x) is 2x+2/3.

[img]/images/graemlins/diamond.gif[/img] We still need to establish that g(x) is asymptotic to 2/3, or equivalently, than h(x)=g(x)-2/3=f(x)-2x-2/3 is asymptotic to 0.

Claim: |h(x)| <= 1/2 Max |h(z)| where x-1<=z<=x.

Proof:

From the integral equation for g,

h(x) = Integral h(z)dz from z=x-1 to z=x.

From the conserved weighted average for g,

0 = Integral (z-(x-1)) h(z)dz from z=x-1 to z=x.

Subtracting,

h(x) = Integral (x-z) h(z)dz from z=x-1 to z=x.

|h(x)| <= Integral |x-z| |h(z)|dz from z=x-1 to z=x

<= Integral |x-z| Max |h(z)| dz from z=x-1 to z=x

<= (Max |h(z)| on [x-1,x]) Integral |x-z|dz from z=x-1 to z=x

= (Max |h(z)| on [x-1,x]) 1/2

That's what we were trying to prove.

[img]/images/graemlins/diamond.gif[/img] Claim: |f(x)-2x-2/3| < 1/2^x for x>0.

Proof: This is satisfied for x on [0,1].

Suppose there is a nonempty set of exceptions. Let X be the infimum. By the continuity of f, it must be that |h(X)|=1/2^X. By the previous lemma, |h(z)|>= 2/2^X somewhere on the interval [X-1,X], which contradicts the assumption that X was the infimum of the set of exceptions. The set of exceptions must be empty.

So, f(x) is asymptotic to 2x+2/3, and we have an exponentially decreasing bound on the error.

</font>

All times are GMT -4. The time now is 09:18 AM.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.