Tags: simple first ctime elements with imp one discus car
Reservoir sampling solves the following kind of problem: randomly choose $k$ items from a stream of $n$ elements, where $n$ could be very large or unknown in advance, such that every element in the stream is equally likely to be selected, with probability $k/n$.
The algorithm works as follows.
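Let’s first take a look at a simple example with $k = 1$. When the $i$-th item $x_i$ comes, we either keep $x_i$ with probability $1/i$ or keep the old selected item with probability $1 - 1/i$. We repeat this process till the end of the stream, i.e., until all elements in the stream have been visited. The probability that $x_i$ is chosen in the end is
$$\frac{1}{i} \times \left(1 - \frac{1}{i+1}\right) \times \cdots \times \left(1 - \frac{1}{n}\right) = \frac{1}{i} \times \frac{i}{i+1} \times \cdots \times \frac{n-1}{n} = \frac{1}{n}.$$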
This proves that the algorithm gives every element the same probability of being chosen. A Java implementation of this algorithm could look like this:
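    import java.util.Random;

    // Returns a position in 1..n, each with probability 1/n (returns 0 if n < 1).
    int random(int n) {
        Random rnd = new Random();
        int ret = 0;
        for (int i = 1; i <= n; i++)
            // nextInt(i) is uniform in [0, i), so ret is replaced with probability 1/i.
            if (rnd.nextInt(i) == 0)
                ret = i;
        return ret;
    }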
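The method above returns a position and takes the length $n$ as an argument. As a minimal sketch of the same idea applied to a stream whose length is not known in advance, the version below reads items from a `java.util.Iterator` in a single pass. The class name `SingleItemSampler` and the method name `pick` are illustrative choices and do not come from the original post.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public class SingleItemSampler {
    // Returns one element chosen uniformly at random from the iterator,
    // or null if the iterator is empty. The stream is read exactly once.
    public static <T> T pick(Iterator<T> stream, Random rnd) {
        T chosen = null;
        int seen = 0; // number of items read so far
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            // Keep the new item with probability 1/seen,
            // otherwise keep the previously chosen one.
            if (rnd.nextInt(seen) == 0) {
                chosen = item;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        // Prints one of the four strings, each with probability 1/4.
        System.out.println(pick(items.iterator(), new Random()));
    }
}
```

Passing the `Random` instance in as a parameter is a design choice made here so that a fixed seed can be used when the behaviour needs to be reproducible.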
The case $k > 1$ is a little trickier. One straightforward way is to simply run the previous algorithm $k$ times. However, this requires multiple passes over the stream. Here we discuss another approach that selects $k$ elements at random in a single pass.
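For the $i$-th item $x_i$, there are two cases to handle:
1. If $i \le k$, $x_i$ is put directly into the set of selected elements.
2. If $i > k$, $x_i$ is kept with probability $k/i$, in which case it replaces one of the $k$ currently selected elements chosen uniformly at random; otherwise it is discarded.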
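A simple implementation requires $O(k)$ memory to store the selected elements, say in an array $s$. For every $x_i$ with $i > k$, we first draw a random integer $j$ uniformly from $[0, i)$ and keep $x_i$ when $j < k$, i.e., we set $s[j] = x_i$. Otherwise $x_i$ is discarded. Since $j < k$ holds with probability $k/i$, this guarantees the probability required in the second scenario.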
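The proof is similar to the previous one. For $i > k$, the probability that $x_i$ is chosen in the end is
$$\frac{k}{i} \times \left(1 - \frac{k}{i+1} \times \frac{1}{k}\right) \times \cdots \times \left(1 - \frac{k}{n} \times \frac{1}{k}\right) = \frac{k}{i} \times \frac{i}{i+1} \times \cdots \times \frac{n-1}{n} = \frac{k}{n},$$
where $\frac{k}{j} \times \frac{1}{k} = \frac{1}{j}$ is the probability that $x_i$ is replaced by a later item $x_j$ ($j > i$): $x_j$ is kept with probability $k/j$ and then lands on the slot holding $x_i$ with probability $1/k$. For $i \le k$ the first factor is $1$ and the product starts at $j = k + 1$, which again gives $\frac{k}{k+1} \times \cdots \times \frac{n-1}{n} = \frac{k}{n}$.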
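As a quick sanity check of the probability argument above, here is a worked instance for the third item $x_3$. The values $k = 2$ and $n = 4$ are chosen only for illustration.

```latex
\begin{aligned}
P(x_3 \text{ is in the final sample})
  &= \underbrace{\frac{2}{3}}_{\text{kept at } i = 3}
     \times
     \underbrace{\left(1 - \frac{2}{4} \times \frac{1}{2}\right)}_{\text{not replaced by } x_4} \\
  &= \frac{2}{3} \times \frac{3}{4} = \frac{1}{2} = \frac{k}{n}.
\end{aligned}
```

The same value comes out for one of the first two items, e.g. $x_1$: it starts in the sample with probability $1$ and survives $x_3$ and $x_4$ with probability $\left(1 - \frac{1}{3}\right) \times \left(1 - \frac{1}{4}\right) = \frac{1}{2}$.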
Below is a sample implementation in Java:
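    import java.util.Random;

    // Returns k items chosen uniformly at random from a (assumes 0 <= k <= a.length).
    int[] random(int[] a, int k) {
        int[] s = new int[k];
        Random rnd = new Random();
        // The first k items fill the reservoir directly.
        for (int i = 0; i < k; i++)
            s[i] = a[i];
        // i is the 1-based position of the current item, so the item itself is a[i - 1].
        for (int i = k + 1; i <= a.length; i++) {
            int j = rnd.nextInt(i);         // uniform in [0, i)
            if (j < k) s[j] = a[i - 1];     // happens with probability k/i
        }
        return s;
    }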
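The array version above needs the whole input in memory and therefore knows its length. The following is a minimal sketch of the same procedure over a `java.util.Iterator`, so that the stream length does not have to be known in advance. The class name `ReservoirSampler`, the method name `sample`, and the choice to return fewer than `k` items for a short stream are assumptions made for this example, not part of the original post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public class ReservoirSampler {
    // Returns up to k elements sampled uniformly (without replacement)
    // from the iterator, reading it exactly once. If the stream holds
    // fewer than k items, all of them are returned.
    public static <T> List<T> sample(Iterator<T> stream, int k, Random rnd) {
        List<T> reservoir = new ArrayList<>();
        if (k <= 0) {
            return reservoir;
        }
        int seen = 0; // 1-based position of the current item
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (seen <= k) {
                // The first k items fill the reservoir directly.
                reservoir.add(item);
            } else {
                // j is uniform in [0, seen); j < k holds with probability
                // k/seen, and j is then also a uniform slot index.
                int j = rnd.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // Prints 3 distinct values from data; the result varies per run.
        System.out.println(sample(data.iterator(), 3, new Random()));
    }
}
```

Only the `k` reservoir slots and a counter are kept in memory, which matches the $O(k)$ space requirement described earlier.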
Original article: http://www.cnblogs.com/wtsb/p/7803396.html