
elasticsearch data importing

Date: 2015-11-18 16:28:16

Elasticsearch stores each piece of data as a document, which is exactly what I need.

To import data, I use the bulk API.

First, transform the raw data file data.json into a new file, new_data.json.

Then run this command to import the data into Elasticsearch:

curl -s -XPOST localhost:9200/_bulk --data-binary @new_data.json
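
The same request can also be built from Python with only the standard library. This is a minimal sketch, assuming Elasticsearch listens on http://localhost:9200 as in the curl command; build_bulk_request is a hypothetical helper name. It sets the Content-Type header to application/x-ndjson, which Elasticsearch 6.0 and later require for bulk requests, and makes sure the body ends with a newline, which the bulk API expects.

```python
import urllib.request


def build_bulk_request(body, url="http://localhost:9200/_bulk"):
    """Build (but do not send) a POST request for the Elasticsearch bulk API.

    `body` is the newline-delimited text of a file like new_data.json.
    build_bulk_request is a hypothetical helper, not part of any library.
    """
    data = body.encode("utf-8")
    # The bulk API expects the final line to be terminated by a newline.
    if not data.endswith(b"\n"):
        data += b"\n"
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )
```

To actually send it, pass the text of new_data.json and open the request with urllib.request.urlopen(request); that step needs a running Elasticsearch node.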

 

 

For example, suppose I have this raw JSON data file, data.json, with one JSON object per line:

{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
{"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}

To import this data into Elasticsearch, I have to modify the file so that every document names its target index and type.

The result is a new file, new_data.json:

{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}


In new_data.json, each data line is preceded by an action line that tells Elasticsearch which index and type the document belongs to.

If the documents in the data file do not all belong to the same _index or _type, just change the corresponding {"index":{...}} line.

 

Here is an example of a valid bulk file for Elasticsearch (one JSON object per line), full_data.json:

 

{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"value1","key2":"value2","key3":"value3"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"abcde","key2":"efg","key3":"klm"}
{"index":{"_index":"myindex2","_type":"mytype2"}}
{"newkey":"newvalue"}


Notice that the file above targets two indexes, myindex1 and myindex2, and that the data schema in myindex2 is different from the one in myindex1.

That's why each document in the new data file needs its own {"index":{...}} line: the action line is what tells Elasticsearch where that particular document goes.
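
To check which indexes a bulk file targets and which fields each index receives, a small helper can walk the action/document pairs. This is a hypothetical sketch (fields_per_index is not part of any library) that only handles "index" actions, each followed by exactly one document line. On full_data.json it reports myindex1 with key1, key2, key3 and myindex2 with newkey.

```python
import json
from collections import defaultdict


def fields_per_index(bulk_text):
    """Map each target index in a bulk file to the sorted fields its documents use.

    Assumes every "index" action line is followed by exactly one document line.
    """
    fields = defaultdict(set)
    lines = [line for line in bulk_text.splitlines() if line.strip()]
    # Walk the file two lines at a time: action line, then document line.
    for action_line, doc_line in zip(lines[0::2], lines[1::2]):
        action = json.loads(action_line)["index"]
        fields[action["_index"]].update(json.loads(doc_line).keys())
    return {index: sorted(keys) for index, keys in fields.items()}
```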

 

-----

Now I am writing a Python script to process some raw JSON data files.

Let's assume every line of a JSON data file has the same schema. The first step is then to extract that schema from the file:

example_raw_data.json

 

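import json
import sys


def get_schema(path):
    """Return the field names of the first JSON object in the file at `path`.

    Here "schema" is taken to mean the sorted top-level field names. Since
    every line is assumed to share one schema, the first line is enough.
    """
    with open(path, encoding="utf-8") as f:
        first_line = f.readline()
    return sorted(json.loads(first_line).keys())


if __name__ == "__main__":
    # Usage (hypothetical script name): python get_schema.py example_raw_data.json
    print(get_schema(sys.argv[1]))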
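
Reading the schema from one line relies on the assumption that every line has the same fields. A minimal sketch for checking that assumption before converting a file; common_schema is a hypothetical helper, and "schema" again means the set of top-level field names.

```python
import json


def common_schema(lines):
    """Return the sorted field names shared by every non-blank JSON line.

    Raises ValueError if a line's fields differ from the first line's,
    i.e. if the "same schema on every line" assumption does not hold.
    Returns None when there are no non-blank lines.
    """
    schema = None
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        keys = sorted(json.loads(line).keys())
        if schema is None:
            schema = keys
        elif keys != schema:
            raise ValueError(f"line {number} has fields {keys}, expected {schema}")
    return schema
```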

 

Original post: http://www.cnblogs.com/spaceship9/p/4974607.html
