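Fuzzy Lookup and Fuzzy Grouping are both used for data cleansing.
Fuzzy Lookup matches input rows against similar values in a reference database table or a prebuilt index. It is useful for standardizing customer data and for finding values that may have been entered incorrectly, for example in address or city name columns.
In addition to the matched value, Fuzzy Lookup outputs two extra columns, similarity and confidence. Similarity is a floating-point value between 0 and 1 that describes how closely the two values in a matched pair resemble each other; for example, the similarity between Jerry Chan and Jerry Chen might be 0.89. Confidence describes how certain the match is: the higher the value, the fewer alternative candidate matches there are.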
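To make the idea of a similarity score concrete, here is a minimal Python sketch of a fuzzy lookup against a small reference list. It uses difflib.SequenceMatcher from the standard library as a stand-in scorer; this is not the algorithm SSIS uses, so the scores will differ from those of the real component. The names similarity, fuzzy_lookup and min_similarity, and the sample data, are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score between 0.0 and 1.0 (stand-in scorer, not the SSIS algorithm)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(value, reference_values, min_similarity=0.8):
    """Return (best_match, score); best_match is None when the score is below the threshold."""
    best = max(reference_values, key=lambda ref: similarity(value, ref))
    score = similarity(value, best)
    return (best if score >= min_similarity else None), score


# Hypothetical reference data standing in for the reference table.
reference = ["Jerry Chen", "Maria Lopez", "Tom Baker"]
match, score = fuzzy_lookup("Jerry Chan", reference)
print(match, round(score, 2))
```

The threshold plays a role similar to a minimum similarity setting: rows whose best candidate scores below it are treated as unmatched rather than being forced onto the closest reference value.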
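Fuzzy Lookup offers four ways to configure the reference table:
1) Generate New Index: builds a temporary index on the reference columns of the reference table, uses it for matching, and deletes it when the task completes;
2) Generate New Index + Store New Index: saves the newly built index in the database so that it can be reused;
3) Generate New Index + Store New Index + Maintain Stored Index: in addition, checking Maintain Stored Index creates triggers on the reference table that capture changes and keep the stored index in sync;
4) Use Existing Index: reuses an index that was stored earlier instead of building a new one.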
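The trade-off between these options (build once, reuse, keep in sync) can be sketched with a toy trigram index in Python. This only illustrates the concept and is not the index structure SSIS actually builds; TrigramIndex, trigrams and the city names are hypothetical.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of lowercase 3-character substrings of the padded text."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramIndex:
    """Toy match index: maps each trigram to the reference rows that contain it."""

    def __init__(self, reference_rows=()):
        self.postings = defaultdict(set)
        for row in reference_rows:
            self.add(row)

    def add(self, row):
        # Called for each new reference row; loosely analogous to a maintenance trigger.
        for gram in trigrams(row):
            self.postings[gram].add(row)

    def candidates(self, value):
        """Return reference rows that share at least one trigram with value."""
        found = set()
        for gram in trigrams(value):
            found |= self.postings.get(gram, set())
        return found


index = TrigramIndex(["Seattle", "Portland"])  # generate the index once
index.add("Spokane")                           # keep it in sync with a new reference row
print(sorted(index.candidates("Seatle")))      # reuse the index for a lookup
```

Building the index is the expensive step, which is why storing and reusing it pays off when the same reference table is matched repeatedly, while maintaining it adds a small cost to every change made to the reference rows.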
Fuzzy Lookup Transformation: Capable of joining to external data based on data similarity,
the Fuzzy Lookup Transformation is a core data cleansing tool in SSIS. This transformation
is perfect if you have dirty data input that you want to associate to data in a table in your
database based on similar values. Later in the chapter, you’ll take a look at the details of the
Fuzzy Lookup Transformation and what happens behind the scenes.
Fuzzy Grouping Transformation: The main purpose is de-duplication of similar data. The
Fuzzy Grouping Transformation is ideal if you have data from a single source and you know
you have duplicates that you need to find.
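As a rough illustration of de-duplication by similarity, the following Python sketch greedily groups values from a single source and treats the first value seen in each group as its canonical representative. The scorer is again difflib.SequenceMatcher, used as a stand-in rather than the algorithm behind Fuzzy Grouping; fuzzy_group, the threshold of 0.8 and the sample names are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score between 0.0 and 1.0 (stand-in scorer, not the SSIS algorithm)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_group(values, threshold=0.8):
    """Greedily assign each value to the first group whose canonical value is similar enough."""
    groups = {}  # canonical value -> list of member values
    for value in values:
        for canonical in groups:
            if similarity(value, canonical) >= threshold:
                groups[canonical].append(value)
                break
        else:
            # No existing group was close enough, so this value starts a new group.
            groups[value] = [value]
    return groups


# Hypothetical rows from a single source that contains near-duplicates.
names = ["Jerry Chen", "Jerry Chan", "jerry chen", "Maria Lopez"]
for canonical, members in fuzzy_group(names).items():
    print(canonical, "<-", members)
```

A greedy pass like this depends on input order and compares each value against every existing group, so it is only suitable for small examples.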
Data Flow ->> Fuzzy Lookup & Fuzzy Grouping
Original article: http://www.cnblogs.com/jenrrychen/p/4573726.html