The Spark MLlib Pipeline Approach

Date: 2015-07-09 17:52:14

A Spark MLlib pipeline chains several machine learning algorithms into a single workflow and runs them in order.

Each algorithm in a Pipeline is called a "PipelineStage". There are two kinds of PipelineStage, Estimator and Transformer:

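A Transformer converts data into another form (for example, by changing its format) so that a later Estimator can use it. Every Transformer exposes the same conversion method, transform.
An Estimator learns a Model from data (Model itself inherits from Transformer). Every Estimator exposes the same training method, fit.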
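
The two contracts above can be sketched in plain Scala without any Spark dependency. This is a simplified model, not Spark's actual class hierarchy: `Dataset` is a stand-in for a DataFrame, and `Absolute`, `MinMaxEstimator` and `MinMaxModel` are hypothetical names made up for the illustration.

```scala
object StageContractSketch {
  // Stand-in for a Spark DataFrame: a single numeric column (assumption).
  type Dataset = Vector[Double]

  // A Transformer converts one dataset into another.
  trait Transformer { def transform(data: Dataset): Dataset }

  // A Model is a Transformer that was produced by fitting.
  trait Model extends Transformer

  // An Estimator learns a Model from data.
  trait Estimator[M <: Model] { def fit(data: Dataset): M }

  // Hypothetical Transformer: needs no training, just rewrites values.
  object Absolute extends Transformer {
    def transform(data: Dataset): Dataset = data.map(x => math.abs(x))
  }

  // Hypothetical Model: rescales values using bounds learned during fit.
  final case class MinMaxModel(min: Double, max: Double) extends Model {
    def transform(data: Dataset): Dataset = {
      val range = max - min
      data.map(x => if (range == 0.0) 0.0 else (x - min) / range)
    }
  }

  // Hypothetical Estimator: learns the min and max of the training data.
  object MinMaxEstimator extends Estimator[MinMaxModel] {
    def fit(data: Dataset): MinMaxModel = {
      require(data.nonEmpty, "cannot fit on an empty dataset")
      MinMaxModel(data.min, data.max)
    }
  }

  def main(args: Array[String]): Unit = {
    // The Transformer prepares the data, the Estimator learns from it,
    // and the resulting Model is again usable as a Transformer.
    val training = Absolute.transform(Vector(-10.0, 0.0, 5.0))
    val model = MinMaxEstimator.fit(training)
    println(model.transform(Vector(5.0, 20.0)))
  }
}
```

The point of the sketch is that the result of fit can be applied to new data through the same transform method as any other Transformer.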

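A "composite" algorithm can then be packaged as a pipeline. The advantage is that individual algorithms become easy to swap. For example, an application often just needs "classification" or "regression" in general, without knowing in advance which specific classification or regression algorithm performs best. With the pipeline mechanism, different algorithms can be substituted with little effort and their results compared to find the best one.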
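
As a rough illustration of swapping algorithms, the sketch below fits the same feature stages with two different classifiers. It assumes Spark 2.x or later with spark-mllib on the classpath (the post itself dates from the Spark 1.4 era, where the entry point differed). The toy data, the column names and the choice of LogisticRegression and NaiveBayes are assumptions made for the example.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}
import org.apache.spark.ml.classification.{LogisticRegression, NaiveBayes}
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.{DataFrame, SparkSession}

object SwapClassifierSketch {
  // Made-up toy data: (id, text, label).
  def trainingData(spark: SparkSession): DataFrame =
    spark.createDataFrame(Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    )).toDF("id", "text", "label")

  // Fits one pipeline per candidate classifier; only the last stage changes.
  def fitEach(training: DataFrame): Seq[PipelineModel] = {
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")

    // Both classifiers read the default "features" and "label" columns.
    val candidates: Seq[PipelineStage] =
      Seq(new LogisticRegression().setMaxIter(10), new NaiveBayes())

    candidates.map { classifier =>
      new Pipeline()
        .setStages(Array[PipelineStage](tokenizer, hashingTF, classifier))
        .fit(training)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pipeline-swap-sketch")
      .master("local[*]")
      .getOrCreate()
    val training = trainingData(spark)
    // Show each fitted pipeline's predictions on the training rows.
    fitEach(training).foreach { model =>
      model.transform(training).select("text", "label", "prediction").show()
    }
    spark.stop()
  }
}
```

Picking the best candidate would still need an evaluation step on held-out data, which this sketch leaves out.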

Details: https://spark.apache.org/docs/latest/ml-guide.html

/**
 * :: Experimental ::
 * A simple pipeline, which acts as an estimator. A Pipeline consists of a sequence of stages, each
 * of which is either an [[Estimator]] or a [[Transformer]]. When [[Pipeline#fit]] is called, the
 * stages are executed in order. If a stage is an [[Estimator]], its [[Estimator#fit]] method will
 * be called on the input dataset to fit a model. Then the model, which is a transformer, will be
 * used to transform the dataset as the input to the next stage. If a stage is a [[Transformer]],
 * its [[Transformer#transform]] method will be called to produce the dataset for the next stage.
 * The fitted model from a [[Pipeline]] is an [[PipelineModel]], which consists of fitted models and
 * transformers, corresponding to the pipeline stages. If there are no stages, the pipeline acts as
 * an identity transformer.
 */
@Experimental
class Pipeline(override val uid: String) extends Estimator[PipelineModel] {


Original article: http://www.cnblogs.com/zwCHAN/p/4633753.html
