elasticsearch Bulk insert(put) 대량 데이터 입력하기

공식 가이드 API : http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html


1) 데이터 파일 생성

$ vi put_data
{ "index":{ "_index" : "logstash-2015.03.06", "_type" : "access_log", "_id" : "1" } }
{ "host":"web01","http_host": "192.168.1.123"}
{ "index":{ "_index" : "logstash-2015.03.06", "_type" : "access_log", "_id" : "2" } }
{ "host":"web01","http_host": "192.168.1.124"}


2) Curl Post 로 전송

$ curl -s -XPOST localhost:9200/_bulk --data-binary @put_data
{"took":317,"errors":false,"items":[{"index":{"_index":"logstash-2015.03.06","_type":"access_log","_id":"1","_version":1,"status":201}},{"index":{"_index":"logstash-2015.03.06","_type":"access_log","_id":"2","_version":1,"status":201}}]}


3) 데이터 확인

$ curl -XGET localhost:9200/logstash-2015.03.06/access_log/1
{"_index":"logstash-2015.03.06","_type":"access_log","_id":"1","_version":1,"found":true,"_source":{ "host":"web01","http_host": "192.168.1.123"}}

$ curl -XGET localhost:9200/logstash-2015.03.06/access_log/2
{"_index":"logstash-2015.03.06","_type":"access_log","_id":"2","_version":1,"found":true,"_source":{ "host":"web01","http_host": "192.168.1.124"}}


4) index와 type이 동일하다면 url쓰고 제외해도됨

$ vi put_data 
{ "index":{ "_id" : "1" } }
{ "host":"web01","http_host": "192.168.1.123"}
{ "index":{ "_id" : "2" } }
{ "host":"web01","http_host": "192.168.1.124"}

$ curl -s -XPOST localhost:9200/logstash-2015.03.06/access_log/_bulk --data-binary @put_data
{"took":321,"errors":false,"items":[{"index":{"_index":"logstash-2015.03.06","_type":"access_log","_id":"1","_version":1,"status":201}},{"index":{"_index":"logstash-2015.03.06","_type":"access_log","_id":"2","_version":1,"status":201}}]}


※ UDP 방식도 있는 듯한데 2.0에서 삭제될 예정이라고함

Bulk UDP has been deprecated and will be removed in Elasticsearch 2.0. You should use the standard bulk API instead.


Curl 참조해서 다른 스크립트 언어에서도 POST 로 방식으로 짜면될듯함

node.js의 경우 라이브러리 활용하면 될 듯 : https://github.com/phillro/node-elasticsearch-client

//Bulk index example
var commands = []
commands.push({ "index" : { "_index" :'my_index_name', "_type" : "my_type_name"} })
commands.push({field1:'value',field2:'value'})


    elasticSearchClient.bulk(commands, {})
            .on('data', function(data) {})
            .on('done', function(done){})
            .on('error', function(error){})
            .exec();


Posted by 시니^^