elasticsearch

Elasticsearch is an open-source search engine built on Apache Lucene.

Background

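Shay Banon first released Elasticsearch in 2010. Since then it has been developed as open source under the Apache 2.0 license, driven by its community. Version 1.0.0 was finally released on February 12, 2014.

Today it is widely used as part of the ELK stack, which combines Elasticsearch, Logstash, and Kibana.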

Features

Reference: https://www.elastic.co/products/elasticsearch

High Availability

Elasticsearch supports replication. Copies of the data (replicas) are distributed across nodes, so the service keeps running even if a particular node fails.

Distributed

It runs on a single machine, but it is also designed so that nodes can easily be added as needed (scale out).

Document-Oriented

Data is stored as documents expressed in JSON, a storage model that can represent complex real-world entities well.

RESTful API

Elasticsearch exposes a RESTful API over HTTP, and every action can be requested through this API.
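In this RESTful style, each action is an HTTP method plus a path of the form /{index}/{type}/{id}. As a minimal sketch, assuming a node on localhost:9200 and only Python's standard urllib, the curl calls from the Usage section can be expressed as request objects. The helper es_request is hypothetical and not part of any Elasticsearch client.

```python
import json
import urllib.request

# Assumption: a local node listening on the default HTTP port 9200.
BASE_URL = "http://localhost:9200"


def es_request(method, path, body=None):
    """Build (but do not send) a request for the Elasticsearch REST API.

    Hypothetical helper for illustration only.
    """
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        # Elasticsearch 6.0+ rejects bodies without a JSON Content-Type,
        # so sending it is a safe default.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# The same actions as the curl examples, expressed as HTTP method + path (+ body).
create_index = es_request("PUT", "/customer?pretty")
index_doc = es_request("PUT", "/customer/external/1?pretty", {"name": "John Doe"})
get_doc = es_request("GET", "/customer/external/1?pretty")
```

Against a running node, urllib.request.urlopen(get_doc) would send the request and return the JSON response.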

Real-Time

Documents are indexed and become searchable in near real time, and analytics and statistics features are also provided.
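Strictly speaking, search in Elasticsearch is near real-time. A newly indexed document becomes searchable after the next refresh, which runs every 1 second by default. (Fetching a document by ID through the Get API is real-time.) The class below is a hypothetical toy model of that buffer-then-refresh behavior, not Elasticsearch's implementation.

```python
class ToyNrtIndex:
    """Toy model of near-real-time search (illustration only).

    Newly indexed documents wait in a buffer and become visible to search
    only after refresh(), which Elasticsearch runs periodically.
    """

    def __init__(self):
        self._buffer = {}      # indexed but not yet searchable
        self._searchable = {}  # visible to search after a refresh

    def index(self, doc_id, doc):
        self._buffer[doc_id] = doc

    def refresh(self):
        # Make everything indexed so far visible to search.
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self, field, value):
        # Only refreshed documents are considered.
        return sorted(
            doc_id for doc_id, doc in self._searchable.items()
            if doc.get(field) == value
        )


idx = ToyNrtIndex()
idx.index("1", {"name": "John Doe"})
before = idx.search("name", "John Doe")  # not refreshed yet
idx.refresh()
after = idx.search("name", "John Doe")   # now visible
```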

Installation

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html

Requirements

Java 7 or later

Oracle JDK version 1.8.0_25 recommended

Download

curl -L -O https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.4.tar.gz

Extract the archive

tar -xvf elasticsearch-1.7.4.tar.gz

Change to the bin directory

cd elasticsearch-1.7.4/bin

Run

./elasticsearch

Output

./elasticsearch
[2014-03-13 13:42:17,218][INFO ][node ] [New Goblin] version[1.7.4], pid[2085], build[5c03844/2014-02-25T15:52:53Z]
[2014-03-13 13:42:17,219][INFO ][node ] [New Goblin] initializing ...
[2014-03-13 13:42:17,223][INFO ][plugins ] [New Goblin] loaded [], sites []
[2014-03-13 13:42:19,831][INFO ][node ] [New Goblin] initialized
[2014-03-13 13:42:19,832][INFO ][node ] [New Goblin] starting ...
[2014-03-13 13:42:19,958][INFO ][transport ] [New Goblin] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.8.112:9300]}
[2014-03-13 13:42:23,030][INFO ][cluster.service] [New Goblin] new_master [New Goblin][rWMtGj3dQouz2r6ZFL9v4g][mwubuntu1][inet[/192.168.8.112:9300]], reason: zen-disco-join (elected_as_master)
[2014-03-13 13:42:23,100][INFO ][discovery ] [New Goblin] elasticsearch/rWMtGj3dQouz2r6ZFL9v4g
[2014-03-13 13:42:23,125][INFO ][http ] [New Goblin] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.8.112:9200]}
[2014-03-13 13:42:23,629][INFO ][gateway ] [New Goblin] recovered [1] indices into cluster_state
[2014-03-13 13:42:23,630][INFO ][node ] [New Goblin] started

Structure

Cluster

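A cluster is a collection of one or more nodes that together hold all of the data. It is identified by a unique name, "elasticsearch" by default.

node

A node is a single running instance of Elasticsearch, that is, one server process. It is not the same thing as the host machine.

Multiple nodes can run on a single host.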

index

An index is a collection of documents that have similar characteristics.

type

Within an index, you can additionally define one or more types.

Types usually differ in the set of fields that make up their documents.

For example, for a blog service you could define separate types for users, blog posts, and comments.

document

A document is the basic unit of data that gets indexed.

Documents are expressed in JSON.

shard & replicas

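Storing a large amount of data in a single index, say 1 TB of documents, can exceed the disk space of one node or make it too slow to serve requests.

Elasticsearch therefore splits an index into multiple pieces called shards.

You can easily set the number of shards when you create an index. Each shard is itself a fully functional, independent index. The number of primary shards cannot be changed after the index is created, while the number of replicas can be changed at any time.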
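Each document is routed to one primary shard with the formula shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. That is why the primary shard count is fixed at creation: changing it would send existing IDs to different shards. The sketch below uses zlib.crc32 as a stand-in hash, which is an assumption; Elasticsearch uses its own hash function.

```python
import zlib


def route_to_shard(routing, number_of_primary_shards):
    """Pick the primary shard for a document.

    Mirrors shard = hash(_routing) % number_of_primary_shards.
    zlib.crc32 is only a stand-in for Elasticsearch's real hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


ids = [str(i) for i in range(1, 11)]
with_5 = {doc_id: route_to_shard(doc_id, 5) for doc_id in ids}
with_6 = {doc_id: route_to_shard(doc_id, 6) for doc_id in ids}

# Documents that would land on a different shard if the shard count changed.
moved = [doc_id for doc_id in ids if with_5[doc_id] != with_6[doc_id]]
```

Routing is deterministic, so a GET for the same ID always goes to the same shard, but only as long as the shard count stays the same.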

Sharding matters for two reasons:

It lets you split and scale your content volume horizontally.
It allows operations to be distributed and parallelized across shards, which improves performance and throughput.
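To see how sharding parallelizes a search: each shard computes its own top hits, and the node that received the request merges them into one global result. Elasticsearch's query phase works along these lines before it fetches the documents. The Python sketch below uses hypothetical, pre-scored hits per shard and shows only this scatter-gather idea, not Lucene scoring.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: (doc_id, score) pairs that already matched a query on each shard.
SHARDS = [
    [("a", 0.9), ("b", 0.2)],
    [("c", 0.7), ("d", 0.4)],
    [("e", 0.8)],
]


def search_shard(hits, size):
    """Return this shard's local top-`size` hits, highest score first."""
    return heapq.nlargest(size, hits, key=lambda hit: hit[1])


def search_all(shards, size):
    """Query all shards in parallel, then merge local results into a global top-`size`."""
    with ThreadPoolExecutor() as pool:
        local_results = list(pool.map(lambda hits: search_shard(hits, size), shards))
    merged = (hit for hits in local_results for hit in hits)
    return heapq.nlargest(size, merged, key=lambda hit: hit[1])


top3 = search_all(SHARDS, 3)
```

Each shard only has to return `size` hits, so the merge step stays small no matter how many documents each shard holds.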

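Failures can happen at any time in a network or cloud environment, so Elasticsearch provides replicas (copies of shards) to tolerate them. A replica shard is never allocated on the same node as its primary.

Source: https://www.elastic.co/guide/en/elasticsearch/guide/current/_add_failover.html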

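The figure at the source above shows a cluster with two nodes, where node1 holds the primary shards and node2 holds the replica shards.

When another node is added, shards are reallocated. Each time a node joins, the data is rebalanced across the cluster.

If you raise the number of replicas (the number_of_replicas index setting) to 2, Elasticsearch automatically creates the additional replica shards.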
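Two rules drive where shard copies land. A replica is never allocated on the same node as its primary, and unassigned replicas turn index health yellow (green means every copy is assigned, and red means some primary is unassigned). The allocator below is a naive, hypothetical sketch of just those rules. Elasticsearch's real allocator weighs many more factors.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Naive shard allocation sketch (not Elasticsearch's real allocator).

    Keeps one real rule: no two copies of the same shard share a node.
    Returns ({node: [shard labels]}, number of unassigned replicas).
    """
    layout = {node: [] for node in nodes}
    unassigned = 0
    for shard in range(num_primaries):
        primary_node = nodes[shard % len(nodes)]
        layout[primary_node].append(f"P{shard}")
        used = {primary_node}
        for _ in range(num_replicas):
            candidates = [n for n in nodes if n not in used]
            if not candidates:
                unassigned += 1  # no node without a copy of this shard
                continue
            # Prefer the least-loaded eligible node.
            target = min(candidates, key=lambda n: len(layout[n]))
            layout[target].append(f"R{shard}")
            used.add(target)
    return layout, unassigned


def health(unassigned_replicas):
    # Primaries are always assigned in this sketch, so "red" cannot occur.
    return "green" if unassigned_replicas == 0 else "yellow"


single_layout, single_unassigned = allocate(5, 1, ["node1"])
print(health(single_unassigned))  # yellow: the 5 replicas have nowhere to go
```

With a second node, allocate(3, 1, ["node1", "node2"]) places each replica on the node that does not hold its primary, and health becomes green.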

Mapped onto relational database concepts (figure at the source below):

Source: http://www.slideshare.net/kjmorc/ss-49009522

Storage structure

Source: http://www.slideshare.net/kjmorc/ss-49009522

Usage

Create an index

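curl -XPUT 'localhost:9200/customer?pretty'
{
  "acknowledged" : true
}

curl 'localhost:9200/_cat/indices?v'
health index    pri rep docs.count docs.deleted store.size pri.store.size
yellow customer   5   1          0            0       495b           495b

The index was created with the 1.x defaults of 5 primary shards and 1 replica. Its health is yellow because, with only one node running, the replica shards cannot be allocated anywhere.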
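With ?v, the _cat APIs print a header row followed by whitespace-separated columns, which is easy to read by eye and also easy to parse. A minimal Python sketch (parse_cat is a hypothetical helper) that turns the output above into dictionaries; it assumes no cell is empty, which holds for this output.

```python
def parse_cat(text):
    """Parse `_cat` output requested with ?v: a header row plus whitespace-separated columns."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


sample = """\
health index    pri rep docs.count docs.deleted store.size pri.store.size
yellow customer   5   1          0            0       495b           495b
"""
indices = parse_cat(sample)
```

All values stay strings, so convert fields such as pri and rep with int() where numbers are needed.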

Index a document

curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
{
  "name": "John Doe"
}'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "created" : true
}

Query a document

curl -XGET 'localhost:9200/customer/external/1?pretty'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : { "name": "John Doe" }
}
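Client code usually needs only found and _source from this response. A minimal sketch with Python's json module (source_or_none is a hypothetical helper). A missing ID comes back with "found": false and HTTP status 404.

```python
import json

# The response body shown above for GET /customer/external/1.
response_text = """
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : { "name": "John Doe" }
}
"""


def source_or_none(text):
    """Return the stored document (_source) if it was found, else None."""
    body = json.loads(text)
    return body["_source"] if body.get("found") else None


doc = source_or_none(response_text)
missing = source_or_none('{"_index": "customer", "_type": "external", "_id": "2", "found": false}')
```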

API

Document API

Used for CRUD operations on individual documents.

This section describes the following CRUD APIs:
Single document APIs
    Index API
    Get API
    Delete API
    Update API
Multi-document APIs
    Multi Get API
    Bulk API
    Bulk UDP API
    Delete By Query API
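The Bulk API batches many operations into one request. Its body is newline-delimited JSON: an action line such as {"index": {...}} followed by the document source on the next line, with a newline after the last line as well. A minimal Python sketch that builds such a body (bulk_body and the second customer are hypothetical examples):

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a Bulk API body: one action line and one source line per document,
    each terminated by a newline (including the last one)."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


body = bulk_body(
    "customer",
    "external",
    [("1", {"name": "John Doe"}), ("2", {"name": "Jane Doe"})],
)
```

The body is sent with POST to /_bulk. When it is read from a file, use curl's --data-binary @file rather than -d @file, because -d strips the newlines the format depends on.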

Search API

API for finding documents, running aggregations, and similar tasks.

Search APIs
Search
URI Search
Request Body Search
Search Template
Search Shards API
Aggregations
Facets
Suggesters
Multi Search API
Count API
Search Exists API
Validate API
Explain API
Percolator
More Like This API
Field stats API
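A search can be expressed in two ways: URI Search, which puts a Lucene query string in the q parameter, and Request Body Search, which sends a JSON query DSL body. A small Python sketch that builds both for the customer index (the query values are illustrative):

```python
import json
from urllib.parse import urlencode

# URI Search: the query is a Lucene query string in the `q` parameter.
uri_search_path = "/customer/_search?" + urlencode({"q": "name:John"})

# Request Body Search: the same intent as a query DSL body.
# A `match` query analyzes the text before matching; `size` caps the number of hits.
request_body = {
    "query": {"match": {"name": "John"}},
    "size": 10,
}
body_json = json.dumps(request_body)
```

URI Search is handy for quick checks with curl, while the request body form can express nested queries and aggregations.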

Creative Commons Attribution-NonCommercial-ShareAlike

여긴지구