一日一技：Elasticsearch批量插入时，存在就不插入

时间：2020-12-07 12:46:15 阅读：11 评论：0 收藏：0 [点我收藏+]

一日一技：Elasticsearch批量插入时，存在就不插入

技术图片

摄影：产品经理
买单：kingname
当我们使用 Elasticsearch-py 批量插入数据到 ES 的时候，我们常常使用它的 helpers模块里面的bulk函数。其使用方法如下：


from elasticsearch import helpers, Elasticsearch

es = Elasticsearch(xxx)

def generator():
    datas = [1, 2, 3]
    for data in datas:
        yield {
            ‘_id‘: "xxx",
            ‘_source‘: {
                ‘age‘: data
            }
        }

helpers.bulk(es,
index=‘xxx‘,
generator(),
doc_type=‘doc‘,)

但这种方式有一个问题，它默认相当于upsert操作。如果_id 对应的文档已经在 ES 里面了，那么数据会被更新。如果_id 对应的文档不在 ES 中，那么就插入。

如果我想实现，不存在就插入，存在就跳过怎么办？此时就需要在文档里面添加_op_type指定操作类型为create:


from elasticsearch import helpers, Elasticsearch

es = Elasticsearch(xxx)

def generator():
    datas = [1, 2, 3]
    for data in datas:
        yield {
            ‘_op_type‘: ‘create‘,
            ‘_id‘: "xxx",
            ‘_source‘: {
                ‘age‘: data
            }
        }

helpers.bulk(es,
generator(),
index=‘xxx‘,
doc_type=‘doc‘)

此时，如果_id 对应的文档不在 ES 中，那么就会正常插入，如果ES里面已经有_id对应的数据了，那么就会报错。由于bulk一次性默认插入500条数据，假设其中有2条数据已经存在了，那么剩下的498条会被正常插入。然后程序报错退出，告诉你有两条写入失败，因为已经存在。

如果你不想让程序报错终止，那么可以增加2个参数：


helpers.bulk(es,
    generator(),
    index=‘xxx‘,
    doc_type=‘doc‘,
    raise_on_exception=False,               raise_on_error=False)

其中raise_on_exception=False表示在插入数据失败时，不需要抛出异常。raise_on_error=False表示不抛出BulkIndexError。

一日一技：Elasticsearch批量插入时，存在就不插入

标签：ups 表示 false 程序存在它的 help 操作 error

原文地址：https://blog.51cto.com/15023263/2558975

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行