Background: PDI (Kettle) was used to load a large volume of log-analysis data into a MySQL database, and the initial insert rate was only about 300 rows/s. Setting the following JDBC connection parameters noticeably improved write throughput:
useServerPrepStmts=false
rewriteBatchedStatements=true
useCompression=true
The explanation of why this works is quoted from: http://forums.pentaho.com/showthread.php?142217-Table-Output-Performance-MySQL#9
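These options can also be appended directly to the JDBC URL rather than set as separate connection options. A minimal sketch (host, port, and database name below are placeholders, not from the original post):

```java
public class JdbcUrlExample {
    // Build a MySQL JDBC URL with the three options from the post.
    // Host/port/database are illustrative placeholders.
    static String buildUrl(String host, int port, String db) {
        return "jdbc:mysql://" + host + ":" + port + "/" + db
                + "?useServerPrepStmts=false"
                + "&rewriteBatchedStatements=true"
                + "&useCompression=true";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("localhost", 3306, "logdb"));
    }
}
```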
To remedy this, in PDI I create a separate, specialized Database Connection I use for batch inserts. Set these two MySQL-specific options on your Database Connection:
useServerPrepStmts false
rewriteBatchedStatements true
Used together, these "fake" batch inserts on the client. Specifically, the insert statements:
INSERT INTO t (c1,c2) VALUES ('One',1);
INSERT INTO t (c1,c2) VALUES ('Two',2);
INSERT INTO t (c1,c2) VALUES ('Three',3);
will be rewritten into:
INSERT INTO t (c1,c2) VALUES ('One',1),('Two',2),('Three',3);
So that the batched rows will be inserted with one statement (and one network round-trip). With this simple change, Table Output is very fast and close to performance of the bulk loader steps.
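The client-side rewriting described above can be sketched roughly as follows. This is a simplified illustration only, not the actual Connector/J implementation, which additionally handles escaping, ON DUPLICATE KEY clauses, and the max_allowed_packet limit:

```java
import java.util.List;

public class BatchRewriteSketch {
    // Simplified sketch of what rewriteBatchedStatements does:
    // merge the VALUES tuples of identical single-row INSERTs
    // into one multi-row INSERT statement on the client.
    static String rewrite(String table, String columns, List<String> valueTuples) {
        return "INSERT INTO " + table + " (" + columns + ") VALUES "
                + String.join(",", valueTuples) + ";";
    }

    public static void main(String[] args) {
        List<String> rows = List.of("('One',1)", "('Two',2)", "('Three',3)");
        System.out.println(rewrite("t", "c1,c2", rows));
        // Prints: INSERT INTO t (c1,c2) VALUES ('One',1),('Two',2),('Three',3);
    }
}
```

The point of the merge is that all batched rows travel in a single statement and a single network round-trip, which is where the speedup comes from.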
Source: the "强子的快乐生活" blog, http://fuqiang82.blog.51cto.com/1398227/1628093