码迷,mamicode.com
首页 > 其他好文 > 详细

spark dataFrame withColumn

时间:2018-06-25 20:32:01      阅读:6643      评论:0      收藏:0      [点我收藏+]

标签:data   port   加一列   not   string   example   mic   sch   show   

说明:withColumn用于在原有DF新增一列

1. 初始化sqlContext

val sqlContext = new org.apache.spark.sql.SQLContext(sc) 

2.导入sqlContext隐式转换

import sqlContext.implicits._ 

3.  创建DataFrames

 

val df = sqlContext.read.json("file:///usr/local/spark-2.3.0/examples/src/main/resources/people.json")

4. 显示内容

df.show()  

| age|   name|

 

+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|

5. 为原有df新加一列

df.withColumn("id2", monotonically_increasing_id()+1) 

6. 显示添加列后的内容

 res6.show() 

+----+-------+---+
| age|   name|id2|
+----+-------+---+
|null|Michael|  1|
|  30|   Andy|  2|
|  19| Justin|  3|
+----+-------+---+

 

完成的过程如下:

 

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)

 

warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@2513155a
scala> import sqlContext.implicits._
import sqlContext.implicits._
scala> val df = sqlContext.read.json("file:///usr/local/spark-2.3.0/examples/src/main/resources/people.json")
2018-06-25 18:55:30 WARN  ObjectStore:6666 - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
2018-06-25 18:55:30 WARN  ObjectStore:568 - Failed to get database default, returning NoSuchObjectException
2018-06-25 18:55:32 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
scala> df.show()
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

 

scala> df.withColumn("id2", monotonically_increasing_id()+1)
res6: org.apache.spark.sql.DataFrame = [age: bigint, name: string ... 1 more field]
scala> res6.show()
+----+-------+---+
| age|   name|id2|
+----+-------+---+
|null|Michael|  1|
|  30|   Andy|  2|
|  19| Justin|  3|
+----+-------+---+

 

spark dataFrame withColumn

标签:data   port   加一列   not   string   example   mic   sch   show   

原文地址:https://www.cnblogs.com/abcdwxc/p/9225855.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!