Tags: select val tab frame .data replace ram span str
Show all databases
scala> val df = spark.sql("show databases")
df: org.apache.spark.sql.DataFrame = [databaseName: string]

scala> df.show
+------------+
|databaseName|
+------------+
|     bigdata|
|     default|
|          lx|
+------------+
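In spark-shell the `spark` session already exists. In a compiled application you have to build it yourself. The sketch below assumes the spark-sql module (Spark 2.x or later) is on the classpath and runs with a local master. Besides the `show databases` statement, it uses the Catalog API (`spark.catalog.listDatabases()`). The object name `ListDatabases` is hypothetical. Without Hive support only the in-memory catalog's `default` database appears. The `bigdata` and `lx` databases above come from a Hive metastore and would need `.enableHiveSupport()` plus the spark-hive module.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone equivalent of spark.sql("show databases") in spark-shell.
object ListDatabases {
  // Returns the database names known to the session catalog, sorted.
  def databaseNames(spark: SparkSession): Seq[String] =
    spark.catalog.listDatabases().collect().map(_.name).toSeq.sorted

  def main(args: Array[String]): Unit = {
    // spark-shell creates `spark` for you; an application must build it.
    // Add .enableHiveSupport() (requires spark-hive) to see Hive metastore databases.
    val spark = SparkSession.builder()
      .appName("list-databases")
      .master("local[*]")
      .getOrCreate()
    try {
      spark.sql("show databases").show()     // SQL route, as in the shell session
      databaseNames(spark).foreach(println)  // Catalog API route
    } finally {
      spark.stop()
    }
  }
}
```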
Select a database and list its tables
// "use" returns an empty result, so show() prints an empty grid
scala> spark.sql("use lx").show
++
||
++
++

scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
|      lx|   cource|      false|
|      lx|  student|      false|
|      lx|      tmp|      false|
|      lx|      www|      false|
+--------+---------+-----------+
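The same two steps can also be done through the Catalog API. `spark.catalog.setCurrentDatabase` has the same effect as `use`, and `spark.catalog.listTables()` lists the tables of the current database together with any temporary views. This is a sketch only. The helper `tablesIn` and the object `ListTables` are hypothetical names, and the database passed in must already exist, or `setCurrentDatabase` throws an `AnalysisException`.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: Catalog API counterpart of "use <db>" followed by "show tables".
object ListTables {
  // Switches the current database, then returns (tableName, isTemporary) pairs sorted by name.
  def tablesIn(spark: SparkSession, db: String): Seq[(String, Boolean)] = {
    spark.catalog.setCurrentDatabase(db)   // same effect as spark.sql(s"use $db")
    spark.catalog.listTables()             // tables in the current database plus temporary views
      .collect()
      .map(t => (t.name, t.isTemporary))
      .toSeq
      .sortBy(_._1)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("list-tables").master("local[*]").getOrCreate()
    try {
      // e.g. pass "lx" when the session is connected to the Hive metastore used above
      val db = args.headOption.getOrElse("default")
      tablesIn(spark, db).foreach { case (name, tmp) => println(s"$name\tisTemporary=$tmp") }
    } finally spark.stop()
  }
}
```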
Query table data
scala> spark.sql("select * from sg").show(100,false) // show up to 100 rows and do not truncate column values
+---+---+-----+
|sno|cno|grade|
+---+---+-----+
|1  |5  |50   |
|1  |3  |70   |
|2  |1  |40   |
|3  |6  |50   |
|4  |5  |80   |
|4  |5  |70   |
|6  |5  |60   |
|7  |2  |40   |
|8  |4  |50   |
+---+---+-----+
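The difference between `show()` and `show(100, false)` is easiest to see with a long string. By default `show()` prints 20 rows, cuts strings longer than 20 characters down to 17 characters plus `...`, and right-aligns cells. With `truncate = false` the full value is printed and cells are left-aligned, which is why the `sg` output above is left-aligned. The sketch below is illustrative: the object `ShowOptions`, the helper `captureShow`, and the sample data are hypothetical. It captures printed output with `Console.withOut` so the two modes can be compared.

```scala
import java.io.ByteArrayOutputStream
import org.apache.spark.sql.SparkSession

// Hypothetical demo of Dataset.show(numRows, truncate).
object ShowOptions {
  // Runs the given show(...) call and returns what it printed to Console.out.
  def captureShow(show: => Unit): String = {
    val buf = new ByteArrayOutputStream()
    Console.withOut(buf)(show)
    buf.toString("UTF-8")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("show-options").master("local[*]").getOrCreate()
    import spark.implicits._
    try {
      val df = Seq((1, "a" * 30)).toDF("id", "note")
      df.show()            // 20 rows max, long strings cut to 17 chars + "...", right-aligned
      df.show(100, false)  // up to 100 rows, full values, left-aligned
    } finally spark.stop()
  }
}
```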
RDD -> DataFrame -> select API
RDD -> DataFrame -> create a temporary view -> query it with SQL
// Build an RDD
scala> val rdd1 = sc.parallelize(Array((1,"tom1",12),(2,"tom2",13),(3,"tom3",14)))
rdd1: org.apache.spark.rdd.RDD[(Int, String, Int)] = ParallelCollectionRDD[29] at parallelize at <console>:24

// Convert the RDD into a DataFrame
scala> val df = rdd1.toDF("id","name","age")
df: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]

// The DataFrame select API does the same job as a SQL SELECT
scala> df.select("id","age").show()
+---+---+
| id|age|
+---+---+
|  1| 12|
|  2| 13|
|  3| 14|
+---+---+

// Tab completion lists the ways to register a view
scala> df.create
createGlobalTempView   createOrReplaceTempView   createTempView

// Create (or replace) a temporary view
scala> df.createOrReplaceTempView
def createOrReplaceTempView(viewName: String): Unit

scala> df.createOrReplaceTempView("stuTable")

// Query the data through the temporary view
scala> spark.sql("select * from stuTable").show(100,false)
+---+----+---+
|id |name|age|
+---+----+---+
|1  |tom1|12 |
|2  |tom2|13 |
|3  |tom3|14 |
+---+----+---+
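Outside spark-shell, two things the shell does silently have to be written out: `sc` is `spark.sparkContext`, and `toDF` on an RDD of tuples only compiles after `import spark.implicits._`, which the shell imports automatically. The sketch below repeats the RDD -> DataFrame -> temporary view -> SQL flow as a standalone program. It assumes spark-sql on the classpath. The object `RddToSql`, the helper `buildStudents`, and the extra `where age > 12` filter are hypothetical additions for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical standalone version of the RDD -> DataFrame -> temp view -> SQL flow.
object RddToSql {
  def buildStudents(spark: SparkSession): DataFrame = {
    import spark.implicits._                    // enables toDF; spark-shell imports this for you
    val rdd1 = spark.sparkContext.parallelize(  // `sc` in spark-shell is spark.sparkContext
      Seq((1, "tom1", 12), (2, "tom2", 13), (3, "tom3", 14)))
    rdd1.toDF("id", "name", "age")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-to-sql").master("local[*]").getOrCreate()
    try {
      val df = buildStudents(spark)
      df.select("id", "age").show()             // DataFrame API route
      df.createOrReplaceTempView("stuTable")    // session-scoped view, replaced if it already exists
      spark.sql("select * from stuTable where age > 12").show(100, false)  // SQL route
    } finally spark.stop()
  }
}
```

Of the three methods offered by tab completion, `createTempView` fails if the view name is already taken, `createOrReplaceTempView` overwrites it, and a view registered with `createGlobalTempView` is shared by all sessions of the same application and must be queried with the `global_temp.` prefix (for example `global_temp.stuTable`).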
Original article: https://www.cnblogs.com/lybpy/p/9800599.html