spark-sql> use saledata;
// number of orders and total sales amount per year, across all orders
spark-sql> select c.theyear, count(distinct a.ordernumber), sum(b.amount)
           from tblStock a
           join tblStockDetail b on a.ordernumber = b.ordernumber
           join tbldate c on a.dateid = c.dateid
           group by c.theyear
           order by c.theyear;
Result:
[hadoop@hadoop3 spark110]$ sbin/start-thriftserver.sh --help
Usage: ./sbin/start-thriftserver [options] [thrift server options]
Thrift server options:
    --hiveconf <property=value>  Use value for given property
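Here [options] are the usual spark-submit options, while [thrift server options] are Hive properties passed with --hiveconf. A plausible launch on this cluster might look like the following sketch (the master URL and executor memory mirror the spark-sql example further down; starting it on hadoop2 is an assumption based on the bind host configured in hive-site.xml below):

[hadoop@hadoop2 spark110]$ sbin/start-thriftserver.sh --master spark://hadoop1:7077 --executor-memory 3g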
[hadoop@hadoop3 spark110]$ bin/spark-sql --help
Usage: ./bin/spark-sql [options] [cli option]
CLI options:
 -d,--define <key=value>          Variable subsitution to apply to hive commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -h <hostname>                    connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)
Here [options] are the parameters used to launch the CLI's SparkSQL application; if --master is not set, it runs in local mode on the machine where the CLI is started and can only be monitored at http://<hostname>:4040. [cli option] are the CLI's own options; with them the CLI can run SQL files directly, enter an interactive shell to run SQL statements, and so on, much as Shark was used previously. Note that the CLI does not use a JDBC connection, so it cannot connect to the ThriftServer; it can, however, connect to Hive's metastore by configuring conf/hive-site.xml (as shown in the CLI configuration below) and then query Hive data.
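For example, -e and -f let the CLI run a single statement or a whole script without entering the interactive shell (a sketch; the count query reuses the tblStock table from the earlier example, and the script path is hypothetical):

bin/spark-sql --master spark://hadoop1:7077 -e "select count(*) from tblStock"
bin/spark-sql --master spark://hadoop1:7077 -f /home/hadoop/queries.sql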
// hive-site.xml configuration for the ThriftServer
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop3:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
    <name>hive.server2.thrift.min.worker.threads</name>
    <value>5</value>
    <description>Minimum number of Thrift worker threads</description>
  </property>
  <property>
    <name>hive.server2.thrift.max.worker.threads</name>
    <value>500</value>
    <description>Maximum number of Thrift worker threads</description>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hadoop2</value>
    <description>Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
  </property>
</configuration>
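With this configuration the ThriftServer listens for JDBC connections on hadoop2:10000, so a quick way to test it is the beeline client shipped with Spark (a sketch; the user name and password to supply depend on your Hive authentication settings):

[hadoop@hadoop2 spark110]$ bin/beeline
beeline> !connect jdbc:hive2://hadoop2:10000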
// hive-site.xml configuration for the CLI
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop3:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
</configuration>
Then start spark-sql:
bin/spark-sql --master spark://hadoop1:7077 --executor-memory 3g
At this point the cluster monitoring page shows that the SparkSQL application has been started:
SET spark.sql.shuffle.partitions=20;
Run the same query again: with the parameter changed, the number of Tasks (partitions) drops from 200 to 20.
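spark.sql.shuffle.partitions controls how many partitions Spark SQL uses when shuffling data for joins and aggregations; its default is 200, which is why 200 Tasks were observed before the change. In a spark-sql session this looks roughly like the following sketch, reusing the per-year sales query from above:

spark-sql> SET spark.sql.shuffle.partitions=20;
spark-sql> select c.theyear, count(distinct a.ordernumber), sum(b.amount)
           from tblStock a
           join tblStockDetail b on a.ordernumber = b.ordernumber
           join tbldate c on a.dateid = c.dateid
           group by c.theyear
           order by c.theyear;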
Spark SQL 1.1 Getting Started, Part 7: ThriftServer and CLI
Original article: http://blog.csdn.net/book_mmicky/article/details/39152727