
Building a Custom EBNF Grammar Parser with Scala's Token-Based Parsers

Posted: 2017-03-07 23:08:55


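I. Preface

Recently, on a project migrating from Oracle to the Spark platform, I ran into a requirement to translate a number of platform formulas into SparkSQL (on Hive). Spark itself is developed in Scala, its native language. After a rough analysis of the requirements, I decided to describe the formulas in EBNF, as taught in compiler theory, and parse them token by token. My original plan was the traditional routine of regex-based lexing followed by EBNF grammar parsing, until I discovered StandardTokenParsers, Scala's token-based parser class.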

II. The Platform Formula and Its SparkSQL Translation

A platform formula looks like this:

    if(XX1_m001[D003]="邢おb7肮α?薇"|| XX1_m001[H003]<"2")&& XX1_m001[D005]!="wed" then XX1_m001[H022,COUNT]

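The field value "邢おb7肮α?薇" is chosen deliberately, to test whether characters from a variety of character sets are all matched correctly.

The corresponding SparkSQL should look like the following. Because we are using Hive on Spark, it looks much like an Oracle SQL statement: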

 

 

    SELECT COUNT(H022) FROM XX1_m001 WHERE (XX1_m001.D003='邢おb7肮α?薇' OR XX1_m001.H003<'2') AND XX1_m001.D005!='wed'

Overall this is fairly simple, since I only intend it as a demo.

 

III. EBNF Grammar and Lexical Design for the Platform Formula

 

    expr-condition ::= tableName "[" valueName "]" comparator Condition
    expr-front     ::= expr-condition (("&&" | "||") expr-front)*
    expr-back      ::= tableName "[" valueName "," operator "]"
    expr           ::= "if" expr-front "then" expr-back

     

 

The lexical definitions are as follows:

    operator             => [SUM, COUNT]
    tableName, valueName => ident       # ident is an identifier token
    comparator           => ["=", ">=", "<=", ">", "<", "!="]
    Condition            => stringLit   # stringLit is a string literal

     

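IV. Parsing the EBNF Grammar with Scala's Token-Based Parser

A Scala token-based parser extends the StandardTokenParsers class, which provides convenient parsing functions as well as the lexical sets. The lexical.delimiters set holds the delimiters the grammar translator will encounter while running, and the lexical.reserved set holds the keywords.

For example, looking at the platform formula, "=", ">=", "<=", ">", "<", "!=", "&&", "||", "[", "]", ",", "(", ")" are all delimiters. We could also treat "=", ">=", "<=", ">", "<", "!=", "&&", "||" as keywords, but my habit is to treat only words made of English letters as keywords. The keyword set here is therefore "if", "then", "SUM", "COUNT".
In code, it looks like this: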

 

    lexical.delimiters += ("=", ">=", "<=", ">", "<", "!=", "&&", "||", "[", "]", ",", "(", ")")
    lexical.reserved += ("if", "then", "SUM", "COUNT")
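To see what the two sets do, it can help to run only the lexer and look at the tokens it produces. The following is a minimal sketch, not code from the original project: the object `TokenDump` and its helpers `tokens` and `describe` are hypothetical names, and it assumes the scala-parser-combinators library is on the classpath. It uses `++=` with a `List` rather than the varargs `+=`, which newer Scala collections deprecate.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers
import scala.util.parsing.input.Reader

// Hypothetical helper (not part of the original post): runs only the lexer.
object TokenDump extends StandardTokenParsers {
  lexical.delimiters ++= List("=", ">=", "<=", ">", "<", "!=", "&&", "||", "[", "]", ",", "(", ")")
  lexical.reserved ++= List("if", "then", "SUM", "COUNT")

  // Walk the token stream and collect every token until the input is exhausted.
  def tokens(input: String): List[lexical.Token] = {
    @scala.annotation.tailrec
    def loop(in: Reader[lexical.Token], acc: List[lexical.Token]): List[lexical.Token] =
      if (in.atEnd) acc.reverse else loop(in.rest, in.first :: acc)
    loop(new lexical.Scanner(input), Nil)
  }

  // Render a token together with the category the lexer assigned to it.
  def describe(t: lexical.Token): String = t match {
    case lexical.Keyword(c)    => "Keyword(" + c + ")"      // delimiters and reserved words
    case lexical.Identifier(c) => "Identifier(" + c + ")"   // table and column names
    case lexical.StringLit(c)  => "StringLit(" + c + ")"    // quoted values, quotes stripped
    case other                 => other.toString            // numeric literals, error tokens
  }

  def main(args: Array[String]): Unit = {
    val formula =
      """if(XX1_m001[D003]="邢おb7肮α?薇"|| XX1_m001[H003]<"2")&& XX1_m001[D005]!="wed" then XX1_m001[H022,COUNT]"""
    tokens(formula).map(describe).foreach(println)
  }
}
```

Delimiters and reserved words both come out as `Keyword` tokens, which is why either set could hold the comparison operators; any other run of letters, digits and underscores becomes an `Identifier`.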

That is all it takes, pretty easy. Now let's look at how to use the token-based parser to parse the EBNF grammar we designed earlier. Here is the code first:

 

    import scala.util.parsing.combinator.syntactical.StandardTokenParsers

    class ExprParser extends StandardTokenParsers {
        lexical.delimiters += ("=", ">=", "<=", ">", "<", "!=", "&&", "||", "[", "]", ",", "(", ")")
        lexical.reserved += ("if", "then", "SUM", "COUNT")

        // expr ::= "if" expr-front "then" expr-back
        def expr: Parser[String] = "if" ~ expr_front ~ "then" ~ expr_back ^^ {
            case "if" ~ exp1 ~ "then" ~ exp2 => exp2 + " WHERE " + exp1
        }

        // A condition with an optional "(" before it and an optional ")" after it;
        // whichever parentheses are present are copied through to the SQL unchanged.
        def expr_priority: Parser[String] = opt("(") ~ expr_condition ~ opt(")") ^^ {
            case Some("(") ~ conditions ~ Some(")") => "(" + conditions + ")"
            case Some("(") ~ conditions ~ None => "(" + conditions
            case None ~ conditions ~ Some(")") => conditions + ")"
            case None ~ conditions ~ None => conditions
        }

        // expr-condition: table[column] op "value"  =>  table.column op 'value'
        def expr_condition: Parser[String] = ident ~ "[" ~ ident ~ "]" ~ ("=" | ">=" | "<=" | ">" | "<" | "!=") ~ stringLit ^^ {
            case ident1 ~ "[" ~ ident2 ~ "]" ~ op ~ value => ident1 + "." + ident2 + op + "'" + value + "'"
        }

        // Logical connectives "&&" and "||" mapped to SQL AND / OR
        def comparator: Parser[String] = ("&&" | "||") ^^ {
            case "&&" => " AND "
            case "||" => " OR "
        }

        // expr-front: one condition followed by any number of (connective, condition) pairs
        def expr_front: Parser[String] = expr_priority ~ rep(comparator ~ expr_priority) ^^ {
            case exp1 ~ exp2 => exp1 + exp2.map(x => x._1 + x._2).mkString
        }

        // expr-back: table[column,SUM|COUNT]  =>  SELECT SUM|COUNT(column) FROM table
        def expr_back: Parser[String] = ident ~ "[" ~ ident ~ "," ~ ("SUM" | "COUNT") ~ "]" ^^ {
            case ident1 ~ "[" ~ ident2 ~ "," ~ op ~ "]" => "SELECT " + op + "(" + ident2 + ") FROM " + ident1
        }

        // Run parser p over the whole input; phrase fails if any tokens are left over
        def parserAll[T](p: Parser[T], input: String) = {
            phrase(p)(new lexical.Scanner(input))
        }
    }
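To try the parser end to end, a small driver can feed it the sample formula from Section II and inspect the result. The sketch below is a condensed variant of the class above, repeated only so that it compiles on its own: `FormulaParser`, `translate`, `logical` and `FormulaDemo` are hypothetical names, the `~>` / `<~` combinators are my restructuring rather than the author's, and it assumes the scala-parser-combinators library is available.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Condensed variant of the parser above (hypothetical names, same grammar).
class FormulaParser extends StandardTokenParsers {
  lexical.delimiters ++= List("=", ">=", "<=", ">", "<", "!=", "&&", "||", "[", "]", ",", "(", ")")
  lexical.reserved ++= List("if", "then", "SUM", "COUNT")

  // "if" conditions "then" aggregate  =>  aggregate WHERE conditions
  def expr: Parser[String] = ("if" ~> exprFront) ~ ("then" ~> exprBack) ^^ {
    case where ~ select => select + " WHERE " + where
  }

  // Optional parentheses around a condition are copied through unchanged.
  def exprPriority: Parser[String] = opt("(") ~ exprCondition ~ opt(")") ^^ {
    case open ~ cond ~ close => open.getOrElse("") + cond + close.getOrElse("")
  }

  // table[column] op "value"  =>  table.column op 'value'
  def exprCondition: Parser[String] =
    ident ~ ("[" ~> ident <~ "]") ~ ("=" | ">=" | "<=" | ">" | "<" | "!=") ~ stringLit ^^ {
      case table ~ column ~ op ~ value => table + "." + column + op + "'" + value + "'"
    }

  // "&&" and "||" become SQL AND / OR.
  def logical: Parser[String] = ("&&" ^^^ " AND ") | ("||" ^^^ " OR ")

  def exprFront: Parser[String] = exprPriority ~ rep(logical ~ exprPriority) ^^ {
    case first ~ rest => first + rest.map { case op ~ cond => op + cond }.mkString
  }

  // table[column,SUM|COUNT]  =>  SELECT SUM|COUNT(column) FROM table
  def exprBack: Parser[String] = ident ~ ("[" ~> ident <~ ",") ~ (("SUM" | "COUNT") <~ "]") ^^ {
    case table ~ column ~ agg => "SELECT " + agg + "(" + column + ") FROM " + table
  }

  // phrase(...) succeeds only if the whole token stream is consumed.
  def translate(input: String): ParseResult[String] = phrase(expr)(new lexical.Scanner(input))
}

object FormulaDemo {
  def main(args: Array[String]): Unit = {
    val parser = new FormulaParser
    val formula =
      """if(XX1_m001[D003]="邢おb7肮α?薇"|| XX1_m001[H003]<"2")&& XX1_m001[D005]!="wed" then XX1_m001[H022,COUNT]"""
    parser.translate(formula) match {
      case parser.Success(sql, _)    => println(sql)
      case failure: parser.NoSuccess => println("parse error: " + failure.msg)
    }
  }
}
```

Matching on `Success` and `NoSuccess` keeps a malformed formula from surfacing as an exception. Note that, as in the class above, the parentheses are only copied through and never checked for balance, so an unbalanced formula would still translate.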

     

 

Content updated: 2017-01-03 20:37:58
Link: http://zhkmxx930.leanote.com/post/%E4%BD%BF%E7%94%A8Scala%E5%9F%BA%E4%BA%8E%E8%AF%8D%E6%B3%95%E5%8D%95%E5%85%83%E7%9A%84%E8%A7%A3%E6%9E%90%E5%99%A8%E5%AE%9A%E5%88%B6EBNF%E8%8C%83%E5%BC%8F%E6%96%87%E6%B3%95%E8%A7%A3%E6%9E%90

 

 

 

 






Original article: http://www.cnblogs.com/zhkmxx930/p/0b4d2332a77fd23b3516348f4ca2223f.html
