PDA

View Full Version : Advanced Random Walk Question (bruce, pzhon)

gaming_mouse
08-07-2005, 01:40 AM
We know that if the random variable X represents the distance from the origin after a random walk of N steps, then, for sufficiently large N, X is normal with mean 0 and SD sqrt(N).

But consider the different but related random variable Y, which represents the maximum distance from the origin over the entire walk. That is, the farthest away from 0 that we ever get at any point during the N steps.

What is the distribution of Y?

pzhon
08-07-2005, 06:41 AM
[ QUOTE ]
But consider the different but related random variable Y, which represents the maximum distance from the origin over the entire walk. That is, the farthest away from 0 that we ever get at any point during the N steps.

[/ QUOTE ]
There are two things you might mean by that. One possibility is that you are allowing the distance to be negative, so you want the distribution of the maximum. (This is suggested by your assertion that the distance is normally distributed.) Another possibility is that you are asking what the probability is that the random walk stays between -x and x. Both can be solved by using a reflection method, but before I write up something complicated, which one is your question?

jason1990
08-07-2005, 10:45 AM
It sounds like you're asking about Donsker's Invariance Principle. See these notes:
http://www.stat.berkeley.edu/users/peres/bmall.pdf

There is an application that looks like what you might mean here. The relevant section starts on p.43 and the example is on p.44. (These page numbers are in the numbering scheme of the document, but they are actually the 46th and 47th pages of the PDF.)

gaming_mouse
08-07-2005, 11:30 AM
[ QUOTE ]

Another possibility is that you are asking what the probability is that the random walk stays between -x and x.

[/ QUOTE ]

Hey pzhon,

Sorry that wasn't more clear.

I was actually more interested in the one-sided question: What is the chance that the random walk stays between 0 and x?

gaming_mouse
08-07-2005, 11:51 AM
[ QUOTE ]

There is an application that looks like what you might mean here. The relevant section starts on p.43 and the example is on p.44. (These page numbers are in the numbering scheme of the document, but they are actually the 46th and 47th pages of the PDF.)

[/ QUOTE ]

Thanks for the reference jason.

Tell me if I understand this correctly... Let's work out a numerical example. Let N = 1000 and we wish to find the chance that the maximum point during the walk is greater than 50. First we solve for "a" in the 2nd formula in section 16.1:

a = 1.58

The answer to the question, then, is 2*(the area to the right of 1.58 on the standard normal density function), or about 11%. That seems a little low to me, so I think I may have done something wrong....

jason1990
08-07-2005, 12:01 PM
That looks right to me. So it's about 89% to stay below 50 in the first 1000 steps. Note that staying below 50 puts no restrictions on the lower end. So it could drift down to -800, for example, and still qualify as staying below 50.

gaming_mouse
08-07-2005, 12:45 PM
[ QUOTE ]
That looks right to me. So it's about 89% to stay below 50 in the first 1000 steps. Note that staying below 50 puts no restrictions on the lower end. So it could drift down to -800, for example, and still qualify as staying below 50.

[/ QUOTE ]

Cool. Thanks again.

pzhon
08-08-2005, 02:38 AM
[ QUOTE ]

I was actually more interested in the one-sided question: What is the chance that the random walk stays between 0 and x?

[/ QUOTE ]
That's a third question ((-x,x), (-oo,x), [0,x]). I still don't know which one you want. All three can be answered by using reflection arguments, or the involution method of combinatorics.

There are analogues for continuous Brownian motion, but things work out so nicely in the discrete case that it is worth looking at that.

Suppose you want to count the paths of n steps starting at the origin that never reach k&gt;0.

The number of unrestricted paths from 0 to a is n choose (n+-a)/2. Here, we will use the convention that if (n-a)/2 is not an integer, this binomial coefficient is 0.

We want to throw out the paths that go from 0 to k, then drop to a. If k&gt;a, every one of these corresponds to a path from 0 to k+(k-a) by reflecting the part after you first hit k, and vice versa. So, the number of such paths is n choose (n+-(2k-a))/2.

The number of paths from 0 to a that never hit k is nC(n-a)/2 - nC(k+(n-a)/2. You can sum these terms over all values of a from -n to k-1 to get the total number of paths. Note that there are positive nCm terms, and negative ones. There is a lot of cancellation, leaving only positive terms, since the first few "negative" terms are actually 0 instead.

k-1
Sum nC(n-a)/2 - nC(k+(n-a)/2) =
a=-n

k-1
Sum nC(n-a)/2 -
a=-n

-k-1
Sum nC(n-b)/2 =
b=-n-2k

k-1
Sum nC(n-a)/2
a=-k

What is the probability that the maximum is exactly k? It is the number of paths that never reach k+1 minus the paths that never reach k.

k
Sum nC(n-a)/2 -
a=-k-1

k-1
Sum nC(n-a)/2
a=-k

= nC(n+k+1)/2 + nC(n-k)/2

Only one of these two terms is nonzero. Which one depends on the parities of n and k. For example, here are the counts of the paths of length 6 with each possible maximum:

k=0 6C3 = 20
k=1 6C2 = 15
k=2 6C2 = 15
k=3 6C1 = 6
k=4 6C1 = 6
k=5 6C0 = 1
k=6 6C0 = 1

I think there ought to be a bijective proof of this simple expression, but I've only worked out a bijection in some special cases.

Later, I might post a solution to the other two problems.

AaronBrown
08-08-2005, 08:17 PM
The answer is about right, but there's a simpler and more accurate way to get it.

The reflection principle for a discrete random walk tells us to add the probability that the random walk is exactly 50 at the end of 1,000 steps to twice the probability that it is greater than 50. The continuous version is simpler, because the probability of being exactly 50 is zero.

Let s = SQRT(1,000) = 31.62.

The probability of being greater than 50 after 1,000 steps is approximately: N(-50.5/s) = 0.0551. The probability of being equal to 50 after 1,000 steps is approximately N(-49.5/2) - N(-50.5/2) = 0.0036. 0.0036 + 2*0.0551 = 0.1139.

pzhon
08-09-2005, 07:51 AM
The involution method is a technique in combinatorics with many applications. An involution is just a fancy name for a permutation of order two. Every item is either fixed by the involution, of swapped with another item. The idea is that we would like to count a set S. Instead, we count a larger set T (containing S) where some of the elements count as positive and some negative, and find an involution I so that

/images/graemlins/diamond.gif Every element of S is fixed by the involution.
/images/graemlins/diamond.gif Every element of T\S has its sign reversed by I.

Then the signed count of the elements of T equals the count of the elements of S.

The reflection argument was an example of the involution method. We counted the paths from 0 to a that never reached k by counting the elements in a larger set, the paths from 0 to a and the paths from 0 to 2k-a, where the paths from 0 to 2k-a were given a negative sign. The involution was to reflect the part of the path after the path first hit k. This had no effect on the paths that never reached k, and it changed every path from 0 to 2k-a into a path from 0 to a that reached k and vice versa.

We'd like to apply the same idea to the paths from 0 to a that didn't reach k or -k. One possibility is to reflect the remainder of the paths when they first hit k or -k. However, this doesn't work as before, since a path from 0 to a might hit both -k and k, so it would correspond to a path from 0 to 2k-a and to a path from 0 to -2k-a. It would be removed twice if we take nC(n-a)/2 - nC(n-2k+a)/2 - nC(n+2k+a)/2.

Let's look for a simple signed set T so that the involution of reflecting the first time the paths reach +-k leaves only the paths that stay between -k and k fixed. We'll give a positive sign to paths of length n from 0 to elements of a set P, and negative signs to paths of length n from 0 to elements of a set N. We need to determine P and N.

P should contain a, since we need to give a positive sign to the paths to a that stay between -k and k.

N should contain 2k-a and -2k-a, since these are images of paths to a under the reflections at k and -k.

P should contain 4k+a and -4k+a, since there are the images of paths to -2k-a and 2k-a under the reflections.

N should contain 6k-a and -6k-a.
P should contain 8k+a and -8k+a.
...
P should contain 4mk+a for each integer m.
N should contain (4m+2)k-a for each integer m.

The involution switches paths to P and paths to N except that it leaves fixed the paths to a staying between -k and k. So, the number of paths from 0 to a staying between -k and k is

Sum n choose (n-b)/2 -
b in P

Sum n choose (n-b)/2
b in N

=

Sum n C (n-4mk-a)/2 -
m

Sum n C (n-(4m+2)k+a)/2
m

To find all constrained paths, you can sum over all possible values of a between -k+1 and k-1:

Sum (n C (n-a)/2) f(a)
a

where

f(a) = +1 if a mod 4k is in (-k+1,k-1)
f(a) = 0 if a mod 4k is -k or k
f(a) = -1 if a mod 4k is in (k+1,3k-1)

For example, the number of paths of length 20 that stay strictly between -3 and 3 is

-20C0 (a=20)
-20C1 (a=18)
-20C2 (a=16)
+20C3 (a=14)
+20C4 (a=12)
+20C5 (a=10)
-20C6 (a=8)
-20C7 (a=6)
-20C8 (a=4)
+20C9 (a=2)
+20C10 (a=0)
+20C11 (a=-2)
-20C12 (a=-4)
-20C13 (a=-6)
-20C14 (a=-8)
+20C15 (a=-10)
+20C16 (a=-12)
+20C17 (a=-14)
-20C18 (a=-16)
-20C19 (a=-18)
-20C20 (a=-20)
---------------
78,732

An alternative method is to take the 20th power of the transfer matrix

01000
10100
01010
00101
00010

9842 0 19683 0 9841
0 29525 0 29524 0
19683 0 39366 0 19683
0 29524 0 29525 0
9841 0 19683 0 9842

The sum of the middle column or row is 78,732, confirming the count by the involution method.

The same argument works for Brownian motion, though the result is that the probability density of the constrained paths is an infinite sum of signed densities. The same argument also works for paths constrained asymmetrically, e.g., paths constrained to stay between 0 and k, inclusive.

jason1990
08-11-2005, 04:16 AM
I just wanted to point something out. When the walk and the functional are fairly simple (for example, the walk steps either +1 or -1 units and the functional you are considering is the maximum of the walk), then there are a lot of things that can be computed exactly or even computed approximately in a more accurate way than that given by Donsker's Invariance Principle. These analyses involve interesting and important mathematics.

I linked you to the Invariance Principle, not to try to cite some fancy mathematics which is overkill, but because the Invariance Principle is the natural extension of the Central Limit Theorem for random walks. It works for a very general class of walks and applies to any continuous functional of the path. In my opinion, the Invariance Principle alone could be motivation enough for someone to learn (more) about Brownian motion.