Inspired by
this post in OOT.
We ask random people for their birthday, and we want to know how long it will be before we get a representative for every day of the year. Obviously the minimum required is 365 people, but what is the distribution of the waiting time till we get a person for every bday?
I was thinking about it like this... number every day in the year 1,2,... 365. So the problem can be reduced to generating random numbers between 1 and 365. After asking N people their birthday (or generating N random numbers), the chance that a particular birthday (say Jan 1) has not yet been found is:
(364/365)^N
Letting E1 = event that birthday 1 is not yet found
E2 = same for bday 2
etc
The chance that we have not completed the full bday spectrum after N should be:
P(E1 or E2 or E3... or E365)
We should then be able to use inclusion-exlcusion. However, there must be an error in my thinking b/c I get
365*(364/365)^N - (365 choose 2)*(363/365)^N
for the first 2 terms, which gives you a negative number for N=400. What's wrong with this approach? What a better way to do it?
gm