Repost: Reservoir Sampling in Spark

Date: 2018-12-23 22:11:22

1. The reservoir sampling algorithm (Reservoir Sampling)

https://www.jianshu.com/p/7a9ea6ece2af

2. Reservoir sampling in Spark

https://blog.csdn.net/snaillup/article/details/69524931?utm_source=blogxgwz3

 

Code:

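  // Excerpt from Spark's SamplingUtils (package org.apache.spark.util.random).
  // XORShiftRandom lives in the same package and is private[spark].
  import scala.reflect.ClassTag
  import scala.util.Random

  /**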
   * Reservoir sampling implementation that also returns the input size.
   *
   * @param input input iterator
   * @param k reservoir size
   * @param seed random seed
   * @return (samples, input size)
   */
  def reservoirSampleAndCount[T: ClassTag](
      input: Iterator[T],
      k: Int,
      seed: Long = Random.nextLong())
    : (Array[T], Long) = {
    val reservoir = new Array[T](k)
    // Put the first k elements in the reservoir.
    var i = 0
    while (i < k && input.hasNext) {
      val item = input.next()
      reservoir(i) = item
      i += 1
    }

    // If we have consumed all the elements, return them. Otherwise do the replacement.
    if (i < k) {
      // If input size < k, trim the array to return only an array of input size.
      val trimReservoir = new Array[T](i)
      System.arraycopy(reservoir, 0, trimReservoir, 0, i)
      (trimReservoir, i)
    } else {
      // If input size >= k, continue the sampling process (a no-op when it equals k).
      var l = i.toLong
      val rand = new XORShiftRandom(seed)
      while (input.hasNext) {
        val item = input.next()
        l += 1
        // There are k elements in the reservoir, and the l-th element has been
        // consumed. It should be chosen with probability k/l. The expression
        // below is a random long chosen uniformly from [0,l)
        val replacementIndex = (rand.nextDouble() * l).toLong
        if (replacementIndex < k) {
          reservoir(replacementIndex.toInt) = item
        }
      }
      (reservoir, l)
    }
  }
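
The listing above depends on Spark's XORShiftRandom, so it does not compile outside the Spark code base. As a rough, standalone sketch of the same idea, the version below swaps in scala.util.Random and folds the fill phase and the replacement phase into a single loop. The object name ReservoirSketch is hypothetical and not part of Spark, and the generator swap means results for a given seed will differ from Spark's.

```scala
import scala.reflect.ClassTag
import scala.util.Random

// Hypothetical standalone sketch; not Spark's implementation.
object ReservoirSketch {
  def reservoirSampleAndCount[T: ClassTag](
      input: Iterator[T],
      k: Int,
      seed: Long = Random.nextLong()): (Array[T], Long) = {
    // Assumption: scala.util.Random stands in for Spark's XORShiftRandom.
    val rand = new Random(seed)
    val reservoir = new Array[T](k)
    var count = 0L
    while (input.hasNext) {
      val item = input.next()
      count += 1
      if (count <= k) {
        // The first k elements fill the reservoir directly.
        reservoir((count - 1).toInt) = item
      } else {
        // j is uniform over [0, count), so j < k happens with probability
        // k / count, and in that case slot j is overwritten.
        val j = (rand.nextDouble() * count).toLong
        if (j < k) reservoir(j.toInt) = item
      }
    }
    // Trim the array when the input held fewer than k elements.
    val samples = if (count < k) reservoir.take(count.toInt) else reservoir
    (samples, count)
  }
}
```

Calling ReservoirSketch.reservoirSampleAndCount((1 to 1000).iterator, 10, seed = 42L) should return an array of 10 distinct values from the input together with the count 1000. With an input shorter than k, such as Iterator(1, 2, 3), it returns all three elements and the count 3.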

Original article: https://www.cnblogs.com/moonlightml/p/10165585.html
