Part 17: Elasticsearch Tutorial for Beginners Using Python

Date: 2020-06-26 01:17:50

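My series of Elasticsearch articles, updated gradually. Follow along:
0A. About Elasticsearch and example applications
00. Solr vs. Elasticsearch
01. What can Elasticsearch do?
02. Introduction to Elastic Stack features
03. How to install and set up the Elasticsearch API
04. How to build an index with the Elasticsearch head plugin: CRUD operations
05. Multiple Elasticsearch instances and using the head plugin
06. How does Elasticsearch work when it indexes a document?
07. Mappings in Elasticsearch: a concise tutorial
08. Analysis and analyzers in Elasticsearch
09. Building custom analyzers in Elasticsearch
10. Kibana primer: Kibana as an Elasticsearch development tool
11. Elasticsearch query methods
12. Elasticsearch full-text queries
13. Elasticsearch queries: term-level queries
14. Getting started with Elasticsearch in Python
15. A simple way to use Elasticsearch with Django
17. Elasticsearch tutorial for beginners using Python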

For getting started with Elasticsearch, I also strongly recommend the [ElasticSearch setup guide](https://kalasearch.cn/blog/elasticsearch-tutorial/#ElasticSearch%E6%90%AD%E5%BB%BA%E6%8C%87%E5%8D%97), a very thorough beginner's handbook.

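Elasticsearch is a real-time distributed search and analytics engine. It lets you explore your data at a speed and scale that were not practical before. It is used for full-text search, structured search, analytics, and any combination of the three. Elasticsearch is an open-source search engine built on Apache Lucene, a full-text search engine library.
Installing and running Elasticsearch:
The only requirement for installing Elasticsearch is a recent version of Java. To install it, download the archive from elastic.co/downloads/elasticsearch, extract it, and run bin\elasticsearch.bat on Windows (bin/elasticsearch on Linux or macOS).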

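An index is like a database in a traditional database system. It is the place where related documents are stored. To retrieve any document we need three pieces of information:
1. Index: the database
2. Type: the type of the document
3. ID: the ID of the document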
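To make the three-part address concrete, here is a minimal sketch that assembles the REST path of a document from its index, type, and ID. The helper name `document_url` and the default host and port are assumptions made for illustration. The `/index/type/id` layout is the type-based scheme this tutorial uses; newer Elasticsearch versions address documents as `/index/_doc/id` instead.

```python
def document_url(index, doc_type, doc_id, host="localhost", port=9200):
    """Build the REST address of one document (hypothetical helper)."""
    return "http://{}:{}/{}/{}/{}".format(host, port, index, doc_type, doc_id)


# Address of the first employee document used later in this tutorial
print(document_url("megacorp", "employee", 1))
# http://localhost:9200/megacorp/employee/1
```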
Let's get started...

# Import Elasticsearch package
from elasticsearch import Elasticsearch
# Connect to the elastic cluster
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
print(es)
# <Elasticsearch([{'host': 'localhost', 'port': 9200}])>
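Before going further it can help to confirm that the client actually reaches the cluster. The sketch below wraps the client's `ping()` method, which returns `True` when the node answers and `False` otherwise. The function names `cluster_is_up` and `connect` are made up for this example, and the URL-style constructor argument is an assumption that fits recent versions of the Python client.

```python
def cluster_is_up(es):
    """Return True if the cluster answers a ping, False otherwise."""
    return bool(es.ping())


def connect(url="http://localhost:9200"):
    """Create a client for an assumed local node (hypothetical helper)."""
    # Imported here so cluster_is_up can be used with any client-like object.
    from elasticsearch import Elasticsearch
    return Elasticsearch(url)
```

With a node running locally, `cluster_is_up(connect())` should report whether it is reachable.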

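Elasticsearch is document-oriented, meaning it stores entire objects, or documents. It not only stores them but also indexes the content of each document to make it searchable. In Elasticsearch you index, search, sort, and filter documents.
Elasticsearch uses JSON as the serialization format for documents.
Now let's start by indexing employee documents.
The act of storing data in Elasticsearch is called indexing.
An Elasticsearch cluster can contain multiple indices, which in turn contain multiple types. These types hold multiple documents, and each document has multiple fields.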
```
e1 = {
    "first_name": "nitin",
    "last_name": "panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ['sports', 'music'],
}
print(e1)
# {'first_name': 'nitin', 'last_name': 'panwar', 'age': 27, 'about': 'Love to play cricket', 'interests': ['sports', 'music']}
```
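Because Elasticsearch uses JSON as its serialization format, a Python dict like this one is turned into a JSON string before it goes over the wire. The client library does that for you. The sketch below only uses the standard `json` module to show what such a payload looks like and that it round-trips unchanged; the variable names are illustrative.

```python
import json

e1 = {
    "first_name": "nitin",
    "last_name": "panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ["sports", "music"],
}

# Serialize the document to JSON text (keys sorted for a stable printout)
payload = json.dumps(e1, sort_keys=True)
print(payload)
# {"about": "Love to play cricket", "age": 27, "first_name": "nitin", "interests": ["sports", "music"], "last_name": "panwar"}

# Parsing the JSON text gives back an equal dict
restored = json.loads(payload)
print(restored == e1)
# True
```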

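Inserting a document:
# Now let's store this document in Elasticsearch
res = es.index(index='megacorp', doc_type='employee', id=1, body=e1)
Simple! There was no need to perform any administrative tasks first, such as creating an index or specifying the type of data each field contains. We could index the document directly. Elasticsearch ships with defaults for everything, so all the necessary administration was handled in the background using those defaults.
Retrieving a document:
This is easy in Elasticsearch. We simply execute an HTTP GET request and specify the address of the document: the index, type, and ID. Using those three pieces of information, we can return the original JSON document.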

```
# Let's insert some more documents
e2 = {
    "first_name": "Jane",
    "last_name": "Smith",
    "age": 32,
    "about": "I like to collect rock albums",
    "interests": ["music"]
}
e3 = {
    "first_name": "Douglas",
    "last_name": "Fir",
    "age": 35,
    "about": "I like to build cabinets",
    "interests": ["forestry"]
}
res = es.index(index='megacorp', doc_type='employee', id=2, body=e2)
print(res['created'])
# False
res = es.index(index='megacorp', doc_type='employee', id=3, body=e3)
print(res['created'])
# True
```
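The two printed values differ because `created` reports whether the call made a new document or overwrote an existing one: `False` for ID 2 suggests a document with that ID was already in the index, while ID 3 was new. Newer Elasticsearch versions report this through a `result` field (`'created'` or `'updated'`) instead. Here is a small sketch that reads either form; `was_created` is a hypothetical helper, and the sample responses are trimmed, hand-written stand-ins rather than captured output.

```python
def was_created(index_response):
    """Return True if an index call created a new document."""
    if "result" in index_response:
        # Newer response format: 'created' or 'updated'
        return index_response["result"] == "created"
    # Older response format: a boolean 'created' flag
    return bool(index_response.get("created", False))


old_style = {"_id": "3", "_version": 1, "created": True}
new_style = {"_id": "2", "_version": 2, "result": "updated"}

print(was_created(old_style))  # True
print(was_created(new_style))  # False
```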
Retrieving the document with ID 3:
```
res = es.get(index='megacorp', doc_type='employee', id=3)
print(res)
# {'_type': 'employee', '_source': {'interests': ['forestry'], 'age': 35, 'about': 'I like to build cabinets', 'last_name': 'Fir', 'first_name': 'Douglas'}, '_index': 'megacorp', '_version': 1, 'found': True, '_id': '3'}
```
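Deleting a document uses the same three-part address. The search output in the next section lists IDs 4, 2 and 1 but not 3, which suggests document 3 was removed around this point, although the call itself is not shown. The following is a minimal sketch under that assumption: `delete_employee` is a hypothetical wrapper, and `doc_type` matches the older client used in this tutorial (recent clients drop that argument).

```python
def delete_employee(es, doc_id):
    """Delete one employee document and return the raw response."""
    # Same index/type/ID address that es.get() uses
    return es.delete(index="megacorp", doc_type="employee", id=doc_id)
```

A call such as `res = delete_employee(es, 3)` would then remove the document that was just retrieved.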
Search Lite:
A GET is fairly simple: you get back the document you asked for. Let's try something a little more advanced, like a simple search!
```
# match_all returns every document in the index
res = es.search(index='megacorp', body={'query': {'match_all': {}}})
print(res['hits']['hits'])
# [{'_score': 1.0, '_type': 'employee', '_id': '4', '_source': {'interests': ['sports', 'music'], 'age': 27, 'about': 'Love to play football', 'last_name': 'pafdfd', 'first_name': 'asd'}, '_index': 'megacorp'},
#  {'_score': 1.0, '_type': 'employee', '_id': '2', '_source': {'interests': ['music'], 'age': 32, 'about': 'I like to collect rock albums', 'last_name': 'Smith', 'first_name': 'Jane'}, '_index': 'megacorp'},
#  {'_score': 1.0, '_type': 'employee', '_id': '1', '_source': {'interests': ['sports', 'music'], 'age': 27, 'about': 'Love to play cricket', 'last_name': 'panwar', 'first_name': 'nitin'}, '_index': 'megacorp'}]
```
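The raw hit list is hard to read at a glance. Each hit carries its `_id`, its `_score`, and the stored document under `_source`, so a short loop can pull out just the parts of interest. In this sketch `summarize_hits` is a hypothetical helper and `sample` is a trimmed copy of the output above, not a live response.

```python
def summarize_hits(response):
    """Return (id, first_name, score) for every hit in a search response."""
    return [
        (hit["_id"], hit["_source"]["first_name"], hit["_score"])
        for hit in response["hits"]["hits"]
    ]


# Trimmed stand-in for the search response printed above
sample = {"hits": {"hits": [
    {"_id": "4", "_score": 1.0, "_source": {"first_name": "asd"}},
    {"_id": "2", "_score": 1.0, "_source": {"first_name": "Jane"}},
    {"_id": "1", "_score": 1.0, "_source": {"first_name": "nitin"}},
]}}

for row in summarize_hits(sample):
    print(row)
# ('4', 'asd', 1.0)
# ('2', 'Jane', 1.0)
# ('1', 'nitin', 1.0)
```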
Now let's search for employees whose first name is nitin.
The match operator:
```
res = es.search(index='megacorp', body={'query': {'match': {'first_name': 'nitin'}}})
print(res['hits']['hits'])
# [{'_score': 0.2876821, '_type': 'employee', '_id': '1', '_source': {'interests': ['sports', 'music'], 'age': 27, 'about': 'Love to play cricket', 'last_name': 'panwar', 'first_name': 'nitin'}, '_index': 'megacorp'}]
```
The bool operator:
bool takes a dictionary containing at least one of must, should, and must_not, each of which takes a list of match clauses or other search operators.
```
res = es.search(index='megacorp', body={
    'query': {
        'bool': {
            'must': [{
                'match': {
                    'first_name': 'nitin'
                }
            }]
        }
    }
})
print(res['hits']['hits'])
# [{'_score': 0.2876821, '_type': 'employee', '_id': '1', '_source': {'interests': ['sports', 'music'], 'age': 27, 'about': 'Love to play cricket', 'last_name': 'panwar', 'first_name': 'nitin'}, '_index': 'megacorp'}]
```
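Writing nested bool dictionaries by hand gets error-prone as clauses accumulate. Here is a minimal sketch of a builder that assembles the body from lists of clauses and only emits the sections that are actually used. `bool_query` is a hypothetical helper, not part of the Elasticsearch client, and the `must_not` clause in the usage lines is added purely for illustration.

```python
def bool_query(must=None, should=None, must_not=None):
    """Build a search body with a bool query from lists of clauses."""
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    if must_not:
        clauses["must_not"] = list(must_not)
    if not clauses:
        raise ValueError("a bool query needs at least one clause")
    return {"query": {"bool": clauses}}


body = bool_query(
    must=[{"match": {"first_name": "nitin"}}],
    must_not=[{"match": {"last_name": "smith"}}],
)
print(body)
# {'query': {'bool': {'must': [{'match': {'first_name': 'nitin'}}], 'must_not': [{'match': {'last_name': 'smith'}}]}}}
```

The resulting dict can be passed as `body=` to `es.search` in the same way as the hand-written query.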
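The filter operator:
Let's make the search a little more complicated. We still want to find all employees whose first name is nitin, but only those above a certain age. The query changes slightly to accommodate a filter, which lets us run structured searches efficiently. The first query keeps employees older than 25 and returns nitin, who is 27. The second raises the bound to 27 and returns nothing, because gt is a strict greater-than.
```
# Employees named nitin who are older than 25
res = es.search(index='megacorp', body={
    'query': {
        'bool': {
            'must': {
                'match': {
                    'first_name': 'nitin'
                }
            },
            'filter': {
                'range': {
                    'age': {
                        'gt': 25
                    }
                }
            }
        }
    }
})
print(res['hits']['hits'])
# [{'_score': 0.2876821, '_type': 'employee', '_id': '1', '_source': {'interests': ['sports', 'music'], 'age': 27, 'about': 'Love to play cricket', 'last_name': 'panwar', 'first_name': 'nitin'}, '_index': 'megacorp'}]

# Same query with the bound raised to 27
res = es.search(index='megacorp', body={
    'query': {
        'bool': {
            'must': {
                'match': {
                    'first_name': 'nitin'
                }
            },
            'filter': {
                'range': {
                    'age': {
                        'gt': 27
                    }
                }
            }
        }
    }
})
print(res['hits']['hits'])
# []
```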
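To see why the bound matters, the same logic can be imitated in plain Python. This is only a sketch of the semantics (a name match combined with a strict `age > bound` test), not of how Elasticsearch executes the query. The `employees` list reuses documents from this tutorial, and `match_and_filter` is a hypothetical name.

```python
employees = [
    {"first_name": "nitin", "age": 27},
    {"first_name": "Jane", "age": 32},
    {"first_name": "Douglas", "age": 35},
]


def match_and_filter(docs, first_name, older_than):
    """Keep docs whose first name matches and whose age is strictly greater."""
    return [
        doc for doc in docs
        if doc["first_name"].lower() == first_name.lower()
        and doc["age"] > older_than
    ]


print(match_and_filter(employees, "nitin", 25))
# [{'first_name': 'nitin', 'age': 27}]
print(match_and_filter(employees, "nitin", 27))
# []
```

In the real query, using `gte` instead of `gt` in the range clause would include an employee who is exactly 27.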
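Full-text search
The searches so far have been simple. Let's try a more advanced full-text search. Before starting the next type of search, let me insert one more document.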
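The insert-and-search listing for this step is not shown here, so what follows is a reconstruction under assumptions rather than the author's code. It assumes the extra document is the one with ID 4 seen in the earlier search output, and that the search is a `match` query for "play cricket" on the `about` field. The names `e4`, `full_text_body`, and `full_text_search` are illustrative.

```python
# Assumed fourth employee, copied from the ID-4 hit in the earlier search output
e4 = {
    "first_name": "asd",
    "last_name": "pafdfd",
    "age": 27,
    "about": "Love to play football",
    "interests": ["sports", "music"],
}

# Assumed full-text query: match either word in the 'about' field
full_text_body = {"query": {"match": {"about": "play cricket"}}}


def full_text_search(es):
    """Index the extra document, run the match query, return (about, score) pairs."""
    es.index(index="megacorp", doc_type="employee", id=4, body=e4)
    # A freshly indexed document becomes searchable after the next index
    # refresh (about one second by default).
    res = es.search(index="megacorp", body=full_text_body)
    return [(hit["_source"]["about"], hit["_score"]) for hit in res["hits"]["hits"]]
```

Under these assumptions both "Love to play cricket" and "Love to play football" contain the word "play", so both would match, with the cricket document scoring higher because it matches both words.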
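In the example above, two records are returned, but with different scores.
Phrase search
Finding individual words in a field is all well and good, but sometimes you want to match the exact sequence of words in a phrase.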
```
res = es.search(index='megacorp', doc_type='employee', body={
    'query': {
        'match_phrase': {
            "about": "play cricket"
        }
    }
})
for hit in res['hits']['hits']:
    print(hit['_source']['about'])
    print(hit['_score'])
    print('**********************')
# Love to play cricket
# 0.5753642
# **********************
```
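The difference between matching individual words and matching a phrase can be imitated in a few lines of plain Python. This is a deliberately simplified sketch: it lowercases and splits on whitespace instead of using a real analyzer, and the function names are made up for the example.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on spaces."""
    return text.lower().split()


def matches_any_term(field_value, query):
    """Like a match query: true if any query word appears in the field."""
    field_terms = set(tokenize(field_value))
    return any(term in field_terms for term in tokenize(query))


def matches_phrase(field_value, query):
    """Like a match_phrase query: the words must appear adjacent and in order."""
    words, phrase = tokenize(field_value), tokenize(query)
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


for about in ("Love to play cricket", "Love to play football"):
    print(about, matches_any_term(about, "play cricket"), matches_phrase(about, "play cricket"))
# Love to play cricket True True
# Love to play football True False
```

Only the cricket sentence contains the two words next to each other, which is why the phrase search returns a single hit.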
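Aggregations
Elasticsearch has a feature called aggregations, which lets you run sophisticated analytics over your data. It is similar to GROUP BY in SQL, but much more powerful.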
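```
res = es.search(index='megacorp', doc_type='employee', body={
    "aggs": {
        "all_interests": {
            # Assumes default dynamic mapping (Elasticsearch 5+), where string
            # fields get a .keyword sub-field that supports aggregations
            "terms": {"field": "interests.keyword"}
        }
    }
})
```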
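A terms aggregation groups documents by the values of a field and counts how many documents fall into each group. The sketch below imitates that in plain Python with `collections.Counter`, using the interests of the three employees that appear in the earlier search output. `terms_buckets` is a hypothetical helper; in a real response the buckets would typically be found under `res['aggregations']['all_interests']['buckets']`.

```python
from collections import Counter

employees = [
    {"first_name": "nitin", "interests": ["sports", "music"]},
    {"first_name": "Jane", "interests": ["music"]},
    {"first_name": "asd", "interests": ["sports", "music"]},
]


def terms_buckets(docs, field):
    """Count, for each distinct value of a field, how many docs contain it."""
    counts = Counter()
    for doc in docs:
        # A document counts once per distinct value, even if the value repeats
        counts.update(set(doc.get(field, [])))
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


print(terms_buckets(employees, "interests"))
# [{'key': 'music', 'doc_count': 3}, {'key': 'sports', 'doc_count': 2}]
```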


Original article: https://www.cnblogs.com/Elasticsearchalgolia/p/13193448.html
