In an attempt to brush up on my PHP and learn a little jQuery, I decided to write a simple web app. It’s called PleaseEncourage.me. By clicking on a button, the user gets a random encouraging message. I found my friends at Tech liked using it around Finals week. The basic operation is that there is a list of strings in a PHP file that get ajaxed into the main page. I used jQuery to do some simple things like handle the ajax, fade text, drop down a box at the top for displaying more information about the site, and changing the cursor over certain links to make them appear clickable.

I also learned some things about probability. The idea was that the user would get a random string each time. However, I’ve found that when most people say they want a random quote generator, what they actually want is a randomly generated list with very infrequent repeats. This poses an interesting question: how large of a list must you have to get infrequent repeats when picking randomly from it? This is somewhat related to the Birthday problem. It turns out that picking k things from a list of length n with all the things being distinct (“no collisions”), the probability is:

(1)(1−1/n)(1−2/n)…(1−(k−1)/n).

For n large relative to k this is about 1−(1/n+2/n+…(k−1)/n)=1−k(k−1)/2n. So, if n is about as big as k squared, there will be a significant probability of a collision. If n is way bigger than k squared, the probability will be small. The expected number of collisions is k(k−1)/2n, which is larger than p. To guarantee that p is less than some probability of repeats, ϵ, make a list of n things such that n>k(k−1)/2ϵ. To find the minimum possible n for given p and k, you can solve for n in the equation for the probability of a collision.

I used this to set an arbitrary probability of a repeat string showing up, and then figuring out how many total strings I needed. Obviously, the more strings, the less chance of a repeat. That’s pretty intuitive. But quantifying it helped me understand how quickly that chance goes down and estimate the number of quotes I would need.

Code is on my Github.