Python Elasticsearch Getting Started Guide

Since its release in 2010, Elasticsearch has become one of the most popular search engines. It is open-source, scalable, and provides enterprise-grade search to power your most demanding applications for free. It is widely used in log analytics, full-text search, business analytics, and many other applications. Thanks to its REST-based API, it is easy to work with from almost any language. This Python Elasticsearch guide will show you how to get started quickly.

Installing Elasticsearch

This guide assumes you have Elasticsearch set up on your machine or have access to an Elasticsearch cluster. If you don't, it's quite easy to get started: download Elasticsearch from the official Elastic website and follow the installation instructions there.

Once you have Elasticsearch installed, you will need the URL of your cluster to follow along with this tutorial. I installed Elasticsearch locally for development purposes, so my cluster is accessible through the default URL: http://localhost:9200/

Python Elasticsearch Tutorial

Python Elasticsearch Client

To interact with Elasticsearch, we will use the official Python client, elasticsearch-py. Install it as follows:

python -m pip install elasticsearch

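This package is a low-level client that gives you more flexibility and control than a higher-level API. Install the client major version that matches your cluster; the examples in this guide pass a doc_type argument, which belongs to the 6.x generation of Elasticsearch and is no longer accepted by 8.x clients.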

Elasticsearch Sample Data

With our cluster set up, it's now time to get some data. For this article, we will use the Python requests module to fetch data about the popular TV show The Big Bang Theory. The freely available JSON dataset contains episode-level information such as names, seasons, and descriptions, all of which is easy to search with Elasticsearch.

The JSON dataset is served by the TVmaze API. Issue a GET request with the requests module, as follows, to obtain the JSON data.

import requests
r = requests.get('https://api.tvmaze.com/singlesearch/shows?q=big-bang-theory&embed=episodes')
r.raise_for_status()  # stop early if the API call failed

If you are not familiar with the requests module: r.content now holds the raw response body as bytes. Use Python's json module to parse it into a dictionary. Lastly, let's take a look at the first episode to see what information we have available.

#parse the response body into a Python dictionary (r.json() does the same)
import json
jd = json.loads(r.content)
#inspect the fields of the first episode
print(jd["_embedded"]["episodes"][0])


Next, we will create the dataset to be indexed with elasticsearch. In elasticsearch, you index a JSON formatted document. In our case, the document will contain the following fields:

  • id
  • season
  • episode
  • name
  • summary

Run the following code to convert the dataset into a list of dictionaries, each representing a document to be indexed. Notice that the summary field contains HTML markup; we will use Python's re module to strip the tags and keep only the text.

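import re
ldocs = []
#keep the first 200 episodes and only the fields we want to index
for jo in jd["_embedded"]["episodes"][0:200]:
    d = {}
    d['id'] = jo['id']
    d['season'] = jo['season']
    d['episode'] = jo['number']
    d['name'] = jo['name']
    #strip HTML tags; treat a missing (None) summary as an empty string
    d["summary"] = re.sub('<[^<]+?>', '', jo['summary'] or '')
    ldocs.append(d)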
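The regular expression above removes tags, but any HTML entities in a summary (for example &amp;) would stay in the text. Below is a minimal sketch of a cleaning helper that also decodes entities with the standard library's html.unescape and treats a missing summary as empty; the name clean_summary and the sample string are hypothetical, not part of the TVmaze data.

```python
import html
import re

# Same tag-stripping pattern as the loop above
TAG_RE = re.compile(r'<[^<]+?>')

def clean_summary(raw):
    """Strip HTML tags and decode entities; a missing summary becomes ''."""
    if raw is None:
        return ''
    text = TAG_RE.sub('', raw)
    # html.unescape turns entities such as &amp; back into plain characters
    return html.unescape(text).strip()

# Hypothetical input, only to show the effect
print(clean_summary('<p>Sheldon &amp; Leonard meet Penny.</p>'))
print(repr(clean_summary(None)))
```

Inside the loop you could then write d["summary"] = clean_summary(jo['summary']).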

The list ldocs now contains 200 documents, in the form of dictionaries, that we will index and search with Elasticsearch. Let's continue with this tutorial and move on to actually using Elasticsearch.

Connect to Elasticsearch

First of all, connect to your elasticsearch cluster. As mentioned previously, mine is on my local machine and is accessible using the URL: http://localhost:9200/

Use the following code to connect the elasticsearch client to your cluster.

from elasticsearch import Elasticsearch
#pass the cluster URL, including the scheme
es = Elasticsearch("http://localhost:9200")

Add Data to Index

If you are familiar with databases, you know that each database holds tables and each table holds records. In the same way, to understand Elasticsearch you need to understand indexes, types and documents.

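An index can be thought of as a database in a relational system. Similarly, a type can be thought of as a table holding multiple documents, in which each document is like a row in a relational table. Keep in mind that mapping types belong to older releases: indexes created in Elasticsearch 6.x allow only one type, 7.x deprecates types, and 8.0 removes them, so on current versions you omit doc_type and the index itself plays the role of the table.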
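In Elasticsearch 6.x, this hierarchy shows up directly in the REST paths: a document lives at /{index}/{type}/{id}. The hypothetical helper below only formats that path to make the database/table/row analogy concrete; it assumes the 6.x layout and does not contact a cluster.

```python
def document_path(index, doc_type, doc_id):
    """Build the 6.x-style REST path /{index}/{type}/{id} for one document."""
    # index ~ database, type ~ table, id ~ primary key of a row
    return f"/{index}/{doc_type}/{doc_id}"

print(document_path('tvshows', 'bigbang', 2915))  # /tvshows/bigbang/2915
```

From 7.0 onward, the type segment is replaced by the fixed endpoint name _doc, for example /tvshows/_doc/2915.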

With that in mind, let's start adding documents to our Elasticsearch index using Python. We will add our previously created list to the tvshows index with type bigbang.

#iterate through the documents, indexing each one under its TVmaze id
#the client serializes the dictionary to JSON itself
for doc in ldocs:
    es.index(index='tvshows', doc_type='bigbang', id=doc["id"], body=doc)

Once the code above has run, your data is in your Elasticsearch cluster. To view the indexes in your cluster, open the following URL: http://localhost:9200/_cat/indices?v
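The loop above sends one request per document, which is fine for 200 episodes. For larger datasets, the client's bulk helper can send many documents per request. Here is a minimal sketch, assuming the 6.x-style type used throughout this guide, of a generator producing actions in the dictionary format that elasticsearch.helpers.bulk accepts; the function name bulk_actions and the example document are hypothetical.

```python
def bulk_actions(docs, index='tvshows', doc_type='bigbang'):
    """Yield one bulk 'index' action per document, keyed by its TVmaze id."""
    for doc in docs:
        yield {
            '_index': index,
            '_type': doc_type,  # drop this key on clusters without mapping types
            '_id': doc['id'],
            '_source': doc,     # the document body itself
        }

# Hypothetical document, only to show the shape of one action
example = [{'id': 1, 'season': 1, 'episode': 1, 'name': 'Example', 'summary': 'Example text'}]
for action in bulk_actions(example):
    print(action)
```

With a connected client, you would then call from elasticsearch import helpers followed by helpers.bulk(es, bulk_actions(ldocs)).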


Get Types in Index

To view the types in your index, request the index's mappings. For example, a GET on the following URL returns the mappings, and therefore the types, of the tvshows index: http://localhost:9200/tvshows/_mapping

Mappings Returned:

{
  "tvshows": {
    "mappings": {
      "bigbang": {
        "properties": {
          "episode": {"type": "long"},
          "id": {"type": "long"},
          "name": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
          },
          "season": {"type": "long"},
          "summary": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
          }
        }
      }
    }
  }
}

You can see the types (in our case only bigbang) along with the fields and data types that the bigbang type contains.
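If you want to read these field types programmatically, a small sketch like the one below flattens the mapping into a field-to-type dictionary, including multi-fields such as name.keyword. It uses the mapping JSON shown above and assumes the 6.x layout with a type level; on 7.x and later the response has no type level, so you would read ['mappings']['properties'] directly. The helper name field_types is hypothetical.

```python
import json

# The mapping returned by GET /tvshows/_mapping above (6.x layout)
MAPPING = ('{"tvshows":{"mappings":{"bigbang":{"properties":{'
           '"episode":{"type":"long"},"id":{"type":"long"},'
           '"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},'
           '"season":{"type":"long"},'
           '"summary":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}}}}')

def field_types(mapping_json, index, doc_type):
    """Flatten a 6.x mapping into {field_path: datatype}, including multi-fields."""
    props = json.loads(mapping_json)[index]['mappings'][doc_type]['properties']
    result = {}
    for name, spec in props.items():
        result[name] = spec['type']
        # multi-fields such as name.keyword are listed under "fields"
        for sub, sub_spec in spec.get('fields', {}).items():
            result[f'{name}.{sub}'] = sub_spec['type']
    return result

for field, dtype in sorted(field_types(MAPPING, 'tvshows', 'bigbang').items()):
    print(field, dtype)
```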

Get By ID

Elasticsearch is a powerful search engine. Before we get to its searching capabilities, let's see how to get a specific document from our cluster when we know its id.

In the following code, we will be using the python client to get a document by id.

#python elasticsearch get by id
es.get(index='tvshows', doc_type='bigbang', id=2915)


With that single line, we retrieved the document representing The Fuzzy Boots Corollary episode. The returned dictionary holds the indexed fields under its _source key.

Searching an Elasticsearch Cluster

We will barely scratch the surface of Elasticsearch's search capabilities; there is much more to learn about this powerful tool.

Let's start with a quick match query. A match query analyzes the text we provide (lowercasing it and splitting it into terms) and returns the documents whose summary field contains those terms.
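The difference between a match query and a term query comes down to analysis: at index time the text of the summary field is lowercased and split into terms, a match query runs the same analysis on your input, and a term query looks up your input exactly as given. The sketch below imitates this with a deliberately simplified tokenizer (an approximation, not Elasticsearch's real standard analyzer) and a made-up summary sentence; all names here are hypothetical.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return [tok for tok in re.split(r'\W+', text.lower()) if tok]

# Hypothetical summary text, used only for illustration
indexed_terms = set(analyze("Sheldon renews his Rivalry with a colleague."))

def match_query(query):
    # match: analyze the input too, then succeed if any resulting term is indexed
    return any(term in indexed_terms for term in analyze(query))

def term_query(term):
    # term: no analysis, the exact string must be an indexed term
    return term in indexed_terms

print(match_query('RIVALRY'))  # True: the input is lowercased before lookup
print(term_query('Rivalry'))   # False: indexed terms are lowercase
print(match_query('rival'))    # False: neither query is fuzzy
```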

#match query on the summary field
es.search(index="tvshows", doc_type='bigbang', body={"query": {"match": {'summary': 'rivalry'}}})

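The call returns a response dictionary in which the matching documents are listed under ['hits']['hits'], each with its _id, relevance _score and _source.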
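To work with that response in Python, you typically walk the hits list. Below is a minimal sketch; the helper summarize_hits and the trimmed response are hypothetical, built only to mirror the documented shape (hits.hits with _id, _score and _source) rather than real output from this dataset.

```python
def summarize_hits(response):
    """Return (id, score, name) for each hit in an Elasticsearch search response."""
    return [
        (hit['_id'], hit['_score'], hit['_source'].get('name'))
        for hit in response['hits']['hits']
    ]

# Hypothetical, trimmed response with the same shape es.search() returns
fake_response = {
    'hits': {
        'hits': [
            {'_index': 'tvshows', '_type': 'bigbang', '_id': '123',
             '_score': 1.5, '_source': {'id': 123, 'name': 'Example Episode'}},
        ],
    }
}
print(summarize_hits(fake_response))  # [('123', 1.5, 'Example Episode')]
```

Against the live cluster, you would pass the value returned by es.search(...) instead of fake_response.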


Fuzzy Search

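Many times, you don't want to search for an exact term, because words can be misspelled or appear in different forms. In that case, you might want a fuzzy search. Fuzzy matching can mean many different things behind the scenes, so we will keep it simple: Elasticsearch's fuzzy query matches indexed terms within a given edit distance of the search term, where each single-character insertion, deletion or substitution counts as one edit.

Let's perform a fuzzy search on our Elasticsearch cluster, looking for rival instead of rivalry. By default the fuzzy query uses fuzziness AUTO, which allows only one edit for a term of three to five characters such as rival; because rivalry is two insertions away from rival, we set the fuzziness to 2 explicitly.

#fuzzy query on summary, allowing up to 2 edits
es.search(index="tvshows", doc_type='bigbang', body={"query": {"fuzzy": {"summary": {"value": "rival", "fuzziness": 2}}}})

After you run this code, the results include a document in which rival was found and another in which rivalry was matched, even though we only searched for the word rival in the summary of our documents.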
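To see where these edit counts come from, here is a small sketch of the classic Levenshtein distance together with the AUTO rule (0 edits for terms of 1 to 2 characters, 1 edit for 3 to 5, 2 edits for longer terms). It is an illustration of the idea, not Elasticsearch's implementation; Elasticsearch can additionally count a transposition of two adjacent characters as a single edit. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]

def auto_max_edits(term):
    """Edits allowed by fuzziness AUTO, based on the length of the search term."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

print(edit_distance('rival', 'rivalry'))  # 2
print(auto_max_edits('rival'))            # 1
```

Since 2 exceeds the single edit AUTO grants to rival, reaching rivalry from rival needs fuzziness set to 2 explicitly.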

Delete an Index

If you want to delete an index using the Python Elasticsearch package, you can do so as follows. In the following code snippet, we delete our tvshows index from our Elasticsearch cluster.

#delete the tvshows index; ignore "bad request" (400) and "not found" (404) errors
es.indices.delete(index='tvshows', ignore=[400, 404])

Conclusion

That completes our getting-started guide to Elasticsearch. There is still much more to learn about Elasticsearch, but you are now able to index and search documents in various ways using the Python Elasticsearch package. We hope you enjoyed this article; stay tuned for more.