Know Your Elasticsearch!

Q) What is Elasticsearch?

Ans) Elasticsearch is a distributed, free search and analytics engine for all types of data, including textual, numerical, geospatial (geolocation), structured, and unstructured.

Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic).

Besides a wide range of tools for data ingestion, enrichment, storage, analysis, and visualization, Elasticsearch exposes REST APIs. Because these APIs cover CRUD operations, it is easy to integrate with languages and platforms such as Java, Python, or Spring Boot.
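As a rough illustration of the REST CRUD operations, here is a minimal Python sketch that only builds the HTTP requests and does not send them. The host http://localhost:9200, the index name "blog", the document id "1", and the helper es_request are assumptions made for this example; the paths follow the typeless _doc and _update endpoints of recent Elasticsearch versions.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local single-node cluster


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(f"{BASE_URL}/{path}", data=data, headers=headers, method=method)


# Hypothetical index "blog" and document id "1".
crud = {
    "create": es_request("PUT", "blog/_doc/1", {"title": "Know Your Elasticsearch"}),
    "read": es_request("GET", "blog/_doc/1"),
    "update": es_request("POST", "blog/_update/1", {"doc": {"title": "Updated title"}}),
    "delete": es_request("DELETE", "blog/_doc/1"),
}

for name, req in crud.items():
    print(name, req.get_method(), req.full_url)

# To send one of these against a running node:
#   from urllib.request import urlopen
#   with urlopen(crud["read"]) as resp:
#       print(json.load(resp))
```

Keeping request construction separate from sending makes it easy to inspect what each CRUD call looks like before pointing it at a real cluster.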

Q) What are two case studies, or areas, where we can use Elasticsearch?

Ans) a) ELK Stack

Every application generates logs. We can monitor an application with Logstash, which collects these logs and stores them in Elasticsearch. Once the data is inserted, we can view it in Kibana, where we can write queries to analyze the data and see it in tabular or graphical form.

b) Searching text in Java Application

Suppose I have a blog. I can insert its content into Elasticsearch from Java (using the REST CRUD API).

On top of that, I can use Spring Data Elasticsearch (JPA-style repositories) to search for text in Elasticsearch and return the related results as the response of a GET API.
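The text search described above can be sketched in Python as a request body for the _search endpoint. The field name "content", the helper names, and the sample response are assumptions for illustration; the sample is hand-written in the shape Elasticsearch returns and is not real output.

```python
import json


def match_query(field, text, size=10):
    """Build the body for POST /<index>/_search using a full-text match query."""
    return {"size": size, "query": {"match": {field: text}}}


def sources(response):
    """Pull the stored documents (_source) out of a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


body = match_query("content", "inverted index")
print(json.dumps(body, indent=2))

# Hand-written response with the usual hits.hits[]._source layout (illustrative only).
sample_response = {
    "hits": {
        "hits": [
            {"_index": "blog", "_id": "1", "_source": {"title": "Know Your Elasticsearch"}},
        ]
    }
}
print(sources(sample_response))
```

A GET API in the application would typically run such a query and return the extracted documents to the caller.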

Q) Is Elasticsearch a NoSQL database?

Ans) Yes. It stores data as JSON documents rather than as rows in relational tables.

Q) Is Elasticsearch built on the Lucene engine?

Ans) Yes

Q) What are the key terms in Elasticsearch?

Ans) Field, document, index, and cluster.

Q) How do the above Elasticsearch terms map to an RDBMS?

Ans)

Elasticsearch    RDBMS
Cluster          Database
Index            Table
Document         Row
Field            Column

● Cluster: A cluster is a collection of one or more nodes that together hold the entire data. It provides federated indexing and search capabilities across all nodes and is identified by a unique name (by default, ‘elasticsearch’).

● Node: A node is a single server that is part of a cluster, stores data, and participates in the cluster’s indexing and search capabilities.

● Index: An index is a collection of documents with similar characteristics and is identified by a name. This name is used to refer to the index while performing indexing, search, update, and delete operations against the documents in it.

● Type: A type is a logical category or partition of an index whose semantics are completely up to you. It is defined for documents that have a set of common fields. Older versions allowed more than one type per index; mapping types were deprecated in Elasticsearch 6.x and removed in later versions, where each index holds a single kind of document.

● Document: A document is the basic unit of information that can be indexed. It is expressed in JSON, a widely used data interchange format.

Documents also contain reserved fields that constitute the document metadata such as:

  1. _index – the index where the document resides
  2. _type – the type that the document represents
  3. _id – the unique identifier for the document

An example of a document:

{
   "_id": "3",
   "_type": "your index type",
   "_index": "your index name",
   "_source": {
      "age": 32,
      "name": "arun",
      "year": 1989
   }
}

● Shards: Elasticsearch provides the ability to subdivide an index into multiple pieces called shards. Each shard is in itself a fully functional and independent “index” that can be hosted on any node within the cluster.

● Replicas: Elasticsearch allows you to make one or more copies of your index’s shards, which are called replica shards or replicas.
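To make shards and replicas more concrete, here is a small Python sketch. The settings keys number_of_shards and number_of_replicas are the real index settings; the helper names are hypothetical, and zlib.crc32 is only a stand-in for the Murmur3 hash that Elasticsearch uses internally, so the shard numbers it produces will not match a real cluster. The routing rule shown is the simplified "hash of the routing value modulo the number of primary shards" idea.

```python
import zlib


def index_settings(primary_shards, replicas):
    """Body for PUT /<index> that sets the shard layout when the index is created."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def pick_shard(routing_value, primary_shards):
    """Map a routing value (the document _id by default) to a primary shard number.

    Assumption: crc32 stands in for Elasticsearch's internal Murmur3 hash.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % primary_shards


def total_shard_copies(primary_shards, replicas):
    """Each primary shard has `replicas` extra copies."""
    return primary_shards * (1 + replicas)


print(index_settings(3, 1))
print(total_shard_copies(3, 1))  # 3 primaries + 3 replicas = 6
for doc_id in ["1", "2", "3"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the shard is derived from the number of primary shards, that number is fixed when the index is created, while the replica count can be changed later.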

Q) Why is Elasticsearch faster at searching than a file search or an RDBMS search?

Ans) It all depends on how these systems store data, rather than on how they retrieve it.

Let me explain. Suppose I have 1000 blogs, and three of them contain the word ShRaam.

An RDBMS (without a full-text index) or a file search goes through every blog/page, scans its entire content, and then returns the three that contain the matching term.

Elasticsearch, on the other hand, uses an inverted index: it stores each word as a key that points to the pages containing it.

ShRaam –> Page x, y and z

So when you search for the keyword ShRaam, it simply returns those three pages where the word is present, instead of scanning page content at query time.
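The idea can be sketched in a few lines of Python. This is a toy inverted index, not how Lucene implements it: the page ids and texts are made up for the example, and tokenization is a simple lowercase word split.

```python
import re
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(page_id)
    return index


def search(index, term):
    """Look the term up directly; no page content is scanned at query time."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical pages; only x, y and z mention the keyword.
pages = {
    "w": "nothing relevant on this page",
    "x": "ShRaam wrote this blog",
    "y": "a short note about ShRaam",
    "z": "ShRaam again",
}

index = build_inverted_index(pages)
print(search(index, "ShRaam"))  # ['x', 'y', 'z']
```

The cost of reading every page is paid once, when the index is built; each search afterwards is a single dictionary lookup.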

Q) Name three companies using Elasticsearch?

Ans) Netflix

Walmart

eBay

Author: Arun Singh

Learning is a Habit.