Tags: except, delete table, duplicate data, rop, sync, ble, report, statement, efficiency
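Background: There are two databases (a source database and a target database), and every day data from the source is synchronized into the target. For various reasons, and for fear of losing data, each run re-syncs the data from roughly the last 8 days (the table had a primary key, so duplicates were not a concern; there are over a hundred thousand new rows a day, and the table already holds 60 million rows). At some point, however, a colleague accidentally dropped the primary key, and nobody knows exactly when.
As a result, the figures in the BI reports became absurdly large. After quite a bit of trial and error the problem was solved. The approaches are summarized below: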
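1) Flashback: Oracle has flashback technology, and the recyclebin can be used to look up the dropped primary key, but the duplicate rows must be deleted before the primary key can be restored.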
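Why the duplicates have to go first can be tried out on a toy table. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Oracle, a unique index as a stand-in for the primary key, and a hypothetical table1 with columns Id, seq and val. The helper name try_create_unique_key is made up for this illustration.

```python
import sqlite3

def try_create_unique_key(rows):
    """Return True if a unique index on (Id, seq) can be built over the given rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the real table; it has no key, like the table after the drop.
    conn.execute("CREATE TABLE table1 (Id INTEGER, seq INTEGER, val TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    try:
        # Stand-in for restoring the primary key on (Id, seq).
        conn.execute("CREATE UNIQUE INDEX pk_table1 ON table1 (Id, seq)")
        return True
    except sqlite3.Error:
        # The database refuses to build a unique key over duplicated values.
        return False
    finally:
        conn.close()

with_duplicates = [(1, 1, "a"), (1, 1, "a"), (2, 1, "b")]
without_duplicates = [(1, 1, "a"), (2, 1, "b")]

print(try_create_unique_key(with_duplicates))
print(try_create_unique_key(without_duplicates))
```

In this sketch the key can only be created for the second data set, which is the situation the cleanup methods below are meant to reach.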
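2) Use rowid to find the duplicate rows and delete all of them except the one with the smallest rowid in each group. The statement:
delete from table1 a where (a.Id,a.seq) in (select Id,seq from table1 group by Id,seq having count(*) > 1) and rowid not in (select min(rowid) from table1 group by Id,seq having count(*) > 1);
This DML statement is a nightmare because of the "not in". If your data volume is large, use it with caution.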
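To see what the statement in method 2 keeps and what it removes, it can be run on a handful of rows. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for Oracle (SQLite also exposes a rowid and accepts the same row-value IN syntax) and a hypothetical table1 with columns Id, seq and val. It says nothing about performance on 60 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample table: (1, 1) appears three times, (3, 1) twice, (2, 1) once.
conn.execute("CREATE TABLE table1 (Id INTEGER, seq INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 1, "a"), (1, 1, "a"), (2, 1, "b"), (3, 1, "c"), (3, 1, "c")],
)

# Same shape as the statement in method 2: restrict to duplicated keys,
# then delete every row except the one with the smallest rowid per key.
conn.execute(
    """
    DELETE FROM table1
    WHERE (Id, seq) IN (SELECT Id, seq FROM table1
                        GROUP BY Id, seq HAVING COUNT(*) > 1)
      AND rowid NOT IN (SELECT MIN(rowid) FROM table1
                        GROUP BY Id, seq HAVING COUNT(*) > 1)
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT Id, seq, val FROM table1 ORDER BY Id, seq"
).fetchall()
print(remaining)
```

After the delete, each (Id, seq) pair should be left with exactly one row, including the pair that was never duplicated.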
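3) This is the method that was actually used in practice. Its efficiency is acceptable: the delete finished in roughly 5 minutes. The steps are as follows: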
1. Query the duplicate rows in the table
select * from table1 a where (a.Id,a.seq) in (select Id,seq from table1 group by Id,seq having count(*) > 1) (a.Id,a.seq are the primary key columns that now contain duplicates)
2. Create a staging table
create table lsb as select * from table1 a where (a.Id,a.seq) in (select Id,seq from table1 group by Id,seq having count(*) > 1); commit; (this way lsb has the same table structure as table1)
3. Delete the duplicate rows from table1
delete from table1 a where (a.Id,a.seq) in (select Id,seq from table1 group by Id,seq having count(*) > 1);
commit;
4. Query the rows in lsb that have the smallest rowid for each key
select * from lsb a where a.rowid in (select min(rowid) from lsb group by Id,seq having count(*) > 1)
5. Insert the rows found in step 4 back into table1
insert into table1 select * from lsb a where a.rowid in (select min(rowid) from lsb group by Id,seq having count(*) > 1);
commit;
6. drop table lsb;
4) The complete script
create table lsb as select * from table1 a where (a.Id,a.seq) in (select Id,seq from table1 group by Id,seq having count(*) > 1); -- a global temporary table (created with ON COMMIT PRESERVE ROWS, so the commits below do not empty it) could also be used and may be faster, since it generates less redo
commit;
delete from table1 a where (a.Id,a.seq) in (select Id,seq from table1 group by Id,seq having count(*) > 1);
commit;
insert into table1 select * from lsb a where a.rowid in (select min(rowid) from lsb group by Id,seq having count(*) > 1);
commit;
drop table lsb;
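The whole staging-table flow can be rehearsed end to end on a small data set before it is run against the real table. What follows is a sketch, not the production procedure: it assumes Python's built-in sqlite3 module as a stand-in for Oracle, a hypothetical table1 with columns Id, seq and val, and a made-up helper name dedupe_via_staging. The SQL mirrors the script above, minus the table aliases.

```python
import sqlite3

# Subquery listing every (Id, seq) key that occurs more than once in table1.
DUP_KEYS = "SELECT Id, seq FROM table1 GROUP BY Id, seq HAVING COUNT(*) > 1"

def dedupe_via_staging(conn):
    """Copy duplicated rows to lsb, delete them, then put one row per key back."""
    # Copy all rows with a duplicated key into the staging table lsb.
    conn.execute(
        f"CREATE TABLE lsb AS SELECT * FROM table1 WHERE (Id, seq) IN ({DUP_KEYS})"
    )
    conn.commit()
    # Remove every row with a duplicated key from table1.
    conn.execute(f"DELETE FROM table1 WHERE (Id, seq) IN ({DUP_KEYS})")
    conn.commit()
    # Re-insert exactly one row per key: the one with the smallest rowid in lsb.
    conn.execute(
        """
        INSERT INTO table1
        SELECT * FROM lsb
        WHERE rowid IN (SELECT MIN(rowid) FROM lsb
                        GROUP BY Id, seq HAVING COUNT(*) > 1)
        """
    )
    conn.commit()
    conn.execute("DROP TABLE lsb")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Id INTEGER, seq INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 1, "a"), (1, 1, "a"), (2, 1, "b"), (3, 1, "c"), (3, 1, "c")],
)

dedupe_via_staging(conn)

rows = conn.execute("SELECT Id, seq, val FROM table1 ORDER BY Id, seq").fetchall()
# Number of keys that are still duplicated after the cleanup.
leftover_dups = conn.execute(f"SELECT COUNT(*) FROM ({DUP_KEYS})").fetchone()[0]
print(rows, leftover_dups)
```

Checking that the duplicate-key query returns nothing afterwards is a cheap way to confirm the table is ready for the primary key to be put back.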
Deleting duplicate data in Oracle
Original article: https://www.cnblogs.com/zengchenri/p/9323105.html