经验首页 前端设计 程序设计 Java相关 移动开发 数据库/运维 软件/图像 大数据/云计算 其他经验
当前位置:技术经验 » 程序设计 » SQL语言 » 查看文章
not in/ not exists 有哪些可以替代操作
来源:cnblogs  作者:li0924  时间:2019/9/25 9:18:55  对本文有异议

需求
  例如一张A表(id, userid, info);其中id为自增长主键。存在userid重复记录;并一直有新数据增加,现需要根据userid去重并实时插入表B。

用SQL语法来解释:

  1. -- 获取表B中记录最大id
  2. select max(id) from B
  3. -- 根据上面获取的id;提取最新的A表记录;根据userid去重;并插入中间表TMP_B;
  4. INSERT INTO TMP_B
  5. SELECT id, userid, info
  6. FROM (select id,
  7. userid,
  8. info,
  9. ROW_NUMBER() OVER(PARTITION BY userid ORDER BY id) RN
  10. FROM A
  11. WHERE id > ?)
  12. WHERE RN = 1;
  13. -- 把上面获取的记录;不存在则插入表B
  14. insert into B
  15. select id, userid, info
  16. from TMP_B
  17. where userid not in (select userid from B);
  18. -- 或者用not exists
  19. insert into B
  20. select id, userid, info
  21. from TMP_B tb
  22. where not exists (select 1 from B b where b.userid = tb.userid);

其中需要改写的语句是

  1. select id, userid, info
  2. from TMP_B
  3. where userid not in (select userid from B);
    -- 插入实验数据

-- 在表B插入百万条记录
insert into B select level, dbms_random.string('x',8), 'lottu'||level from dual connect by level <= 1000000;
-- 在表TMP_B插入十万条记录
insert into TMP_B select level, dbms_random.string('x',8), 'lottu'||level from dual connect by level <= 100000;

通过执行计划;会发现not in/ not exists 效率相差不大。

  1. insert into B
  2. select id, userid, info from TMP_B tb
  3. where not exists (select 1 from B b where b.userid = tb.userid);
  4. 100000 rows created.
  5. Elapsed: 00:00:00.52
  6. Execution Plan
  7. ----------------------------------------------------------
  8. Plan hash value: 3462170537
  9. ------------------------------------------------------------------------------------------
  10. | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|Time |
  11. ------------------------------------------------------------------------------------------
  12. | 0 | INSERT STATEMENT | | 115K| 4389K| | 2994 (1)|00:00:36 |
  13. | 1 | LOAD TABLE CONVENTIONAL | B | | | | | |
  14. |* 2 | HASH JOIN ANTI | | 115K| 4389K| 4960K| 2994 (1)|00:00:36 |
  15. | 3 | TABLE ACCESS FULL | TMP_B | 115K| 3601K| | 137 (1)|00:00:02 |
  16. | 4 | TABLE ACCESS FULL | B | 1375K| 9403K| | 1372 (1)|00:00:17 |
  17. -----------------------------------------------------------------------------------------
  18. Predicate Information (identified by operation id):
  19. ---------------------------------------------------
  20. 2 - access("B"."USERID"="TB"."USERID")
  21. Note
  22. -----
  23. - dynamic sampling used for this statement (level=2)
  24. Statistics
  25. ----------------------------------------------------------
  26. 0 recursive calls
  27. 3221 db block gets
  28. 6320 consistent gets
  29. 35 physical reads
  30. 3649980 redo size
  31. 845 bytes sent via SQL*Net to client
  32. 866 bytes received via SQL*Net from client
  33. 3 SQL*Net roundtrips to/from client
  34. 1 sorts (memory)
  35. 0 sorts (disk)
  36. 100000 rows processed

替换1: 通过merge into改写语句

  1. merge into B b
  2. using TMP_B tb
  3. on (tb.userid = b.userid)
  4. WHEN NOT MATCHED THEN
  5. INSERT (b.id, b.userid, b.info) values (tb.id, tb.userid, tb.info);

通过执行计划;该SQL会占用内存用于排序。效率会有大大的提升。 特别是在B数据量很大的情况。优势更明显。

  1. Elapsed: 00:00:00.18
  2. Execution Plan
  3. ----------------------------------------------------------
  4. Plan hash value: 2722554344
  5. --------------------------------------------------------------------------------------
  6.  
  7. | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time|
  8. --------------------------------------------------------------------------------------
  9. | 0 | MERGE STATEMENT | | 115K| 7203K| | 5298 (1)| 00:01:04 |
  10. | 1 | MERGE | B | | | | ||
  11. | 2 | VIEW | | | | | ||
  12. |* 3 | HASH JOIN OUTER | | 115K| 8553K| 4960K| 5298 (1)| 00:01:04 |
  13. | 4 | TABLE ACCESS FULL| TMP_B | 115K| 3601K| | 137 (1)| 00:00:02 |
  14. | 5 | TABLE ACCESS FULL| B | 1336K| 56M| | 1373 (1)| 00:00:17 |
  15. --------------------------------------------------------------------------------------
  16. Predicate Information (identified by operation id):
  17. ---------------------------------------------------
  18. 3 - access("TB"."USERID"="B"."USERID"(+))
  19. Note
  20. -----
  21. - dynamic sampling used for this statement (level=2)
  22. Statistics
  23. ----------------------------------------------------------
  24. 0 recursive calls
  25. 0 db block gets
  26. 5496 consistent gets
  27. 0 physical reads
  28. 0 redo size
  29. 844 bytes sent via SQL*Net to client
  30. 896 bytes received via SQL*Net from client
  31. 3 SQL*Net roundtrips to/from client
  32. 1 sorts (memory)
  33. 0 sorts (disk)
  34. 0 rows processed

替换2: 通过kettle工具,设置转换;通过shell命令;用cron或者其他调度系统调用。该操作相对用户是透明的;至于效率方面有待验证。

最后该需求若在PostgreSQL中;有更简洁的的写法.简单粗暴。

  1. INSERT INTO B
  2. SELECT *
  3. FROM A on conflict (userid) do nothing;

 

原文链接:http://www.cnblogs.com/lottu/p/11578835.html

 友情链接:直通硅谷  点职佳  北美留学生论坛

本站QQ群:前端 618073944 | Java 606181507 | Python 626812652 | C/C++ 612253063 | 微信 634508462 | 苹果 692586424 | C#/.net 182808419 | PHP 305140648 | 运维 608723728

W3xue 的所有内容仅供测试,对任何法律问题及风险不承担任何责任。通过使用本站内容随之而来的风险与本站无关。
关于我们  |  意见建议  |  捐助我们  |  报错有奖  |  广告合作、友情链接(目前9元/月)请联系QQ:27243702 沸活量
皖ICP备17017327号-2 皖公网安备34020702000426号