Set up autocomplete searchbox with elasticsearch

February 12, 2014

Let’s build a fancy autocompleting search widget! You’ve probably seen one in lots of places: most people recognize it from Google search and from texting on smartphones. Type a few characters and a dropdown appears with predicted results. I wanted to build a similar search widget for goodybag.com so you can search for restaurants. Here’s a guide on how to do it with elasticsearch.

The stack I’m using is elasticsearch, Postgres and Express on the back end, with Twitter’s typeahead plugin on the front end.

Prerequisites:

Familiarity with
- expressjs
- PostgreSQL (and hopefully node-postgres)
Plus tons of patience for setting up elasticsearch and a few plugins

Elasticsearch Basics

So let’s first set up elasticsearch. Grab the latest release from http://www.elasticsearch.org/overview/elkdownloads/ and extract the archive.

Inside the new elasticsearch folder, start the server by running

bin/elasticsearch

The beauty of elasticsearch is that its API is exposed over HTTP, so it’s easy to work through the examples in the official documentation. There are tons of great cURL examples you can copy and paste.

The default port is 9200, so you’ll be able to run all of your commands against http://localhost:9200/

Learn to walk before running

..and crawl before walking.

So to search with elasticsearch, we gotta start by indexing data. Let’s first take a look at the anatomy of a typical document URL.

http://localhost:9200/cater/restaurants/5

Here, cater is the name of an index. Indices are roughly analogous to relational databases. Next, restaurants is a type. Types are like tables. The identifier at the end specifies a document, which is like a row.

So we could imagine other routes set up in a RESTful manner like the following:

/cater/restaurants
/cater/restaurants/5
/cater/users/
/cater/users/10
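To make the index/type/id layout concrete, here is a small sketch that builds those paths. The docPath helper is a made-up name for illustration and is not part of elasticsearch.

```javascript
// Build the path for a whole type, or for a single document when an id is given.
// `docPath` is a hypothetical helper, not an elasticsearch API.
function docPath(index, type, id) {
  var parts = [index, type];
  if (id !== undefined && id !== null) {
    parts.push(id);
  }
  // Encode each segment so odd characters in names or ids stay URL-safe.
  return '/' + parts.map(function(part) {
    return encodeURIComponent(part);
  }).join('/');
}

console.log(docPath('cater', 'restaurants'));    // /cater/restaurants
console.log(docPath('cater', 'restaurants', 5)); // /cater/restaurants/5
```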

Now with that in mind, I suggest reading through the documentation for the document, search and indices APIs. There are A LOT of options and configurations per API. The examples in the rest of this post will be simple.

Here’s an example ripped straight from the docs for indexing a new document. In this case, it’s a new tweet.

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'

Notice the index is twitter, and the type is tweet. This is an HTTP PUT request that creates a new document with id 1 and returns

{
    "_index" : "twitter",
    "_type" : "tweet",
    "_id" : "1",
    "_version" : 1,
    "created" : true
}
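The same request can be made from node. Below is a minimal sketch using only node’s built-in http module, assuming elasticsearch is listening on localhost:9200 as above. The function names indexRequestOptions and indexDocument are made up for this example.

```javascript
var http = require('http');

// Build the request options for indexing one document.
// Host and port are the defaults used in this post (an assumption about your setup).
function indexRequestOptions(index, type, id, body) {
  return {
    host: 'localhost',
    port: 9200,
    method: 'PUT',
    path: '/' + [index, type, id].map(function(part) {
      return encodeURIComponent(part);
    }).join('/'),
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    }
  };
}

// PUT the document and pass the parsed JSON response to the callback.
function indexDocument(index, type, id, doc, callback) {
  var body = JSON.stringify(doc);
  var req = http.request(indexRequestOptions(index, type, id, body), function(res) {
    var chunks = [];
    res.on('data', function(chunk) { chunks.push(chunk); });
    res.on('end', function() {
      var parsed;
      try {
        parsed = JSON.parse(Buffer.concat(chunks).toString());
      } catch (e) {
        return callback(e);
      }
      callback(null, parsed);
    });
  });
  req.on('error', callback);
  req.end(body);
}

// Usage (needs a running elasticsearch server):
// indexDocument('twitter', 'tweet', 1, { user: 'kimchy' }, function(err, res) {
//   console.log(err || res);
// });
```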

Set up

All right, so how can we integrate this into our own application?

A rather simple implementation would be to

- Create a script to query all existing tweets and index them
- Index new tweets as they are created
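For the first step, one option is elasticsearch’s bulk API (POST /_bulk), which takes newline-delimited JSON: an action line followed by the document itself. Here is a rough sketch of building that body. The toBulkBody helper and the shape of the rows (each with an id) are assumptions for illustration.

```javascript
// Turn an array of rows into a newline-delimited body for the bulk API.
// Each row is assumed to carry an `id` that becomes the document _id.
function toBulkBody(index, type, rows) {
  var lines = [];
  rows.forEach(function(row) {
    // Action line: tells elasticsearch where the next document goes.
    lines.push(JSON.stringify({ index: { _index: index, _type: type, _id: row.id } }));
    // Source line: the document itself.
    lines.push(JSON.stringify(row));
  });
  // The bulk API expects the body to end with a newline.
  return lines.length ? lines.join('\n') + '\n' : '';
}

var body = toBulkBody('twitter', 'tweet', [
  { id: 1, user: 'kimchy', message: 'trying out Elasticsearch' },
  { id: 2, user: 'kimchy', message: 'another tweet' }
]);
console.log(body);
```

You would then send body as the payload of a POST to http://localhost:9200/_bulk.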

An alternative is to use “rivers” in elasticsearch. Rivers are plugins that connect to your data store and automatically index streaming data. So a river will 1) index existing data, 2) index new updates and 3) support failover.

For postgres, you will need to install

- JDBC River plugin - https://github.com/jprante/elasticsearch-river-jdbc
- Postgres JDBC Driver - http://jdbc.postgresql.org/download.html

Make sure to get the plugin version that is compatible with your version of elasticsearch. The same goes for the driver and your version of Postgres.

Here’s a node script to set up a river for the cater index and the restaurant type. Each restaurant has several fields, but let’s just index id and name. Note that we’re using the request module for firing HTTP requests. Make sure elasticsearch is running (bin/elasticsearch), then run this.

var request = require('request');
var config = require('../../config');

var options = {
  uri: 'http://localhost:9200/_river/cater/_meta'
, method: 'PUT'
, timeout: 7000
, json: {
    type: 'jdbc'
  , jdbc: {
      url: 'jdbc:postgresql://localhost:5432/cater' // Postgres port (5432 by default), not elasticsearch's 9200
    , sql: 'select id as _id, name from restaurants'
    , index: 'cater'
    , type: 'restaurant'
    , schedule: '0 0/5 * * * ?' // every 5 mins
    }
  }
};

request(options, function(err, res, body) {
  if (err) {
    return console.log(err);
  }
  console.log('Created /_river/cater');
  console.log('Response:', body);
});

The river uses the SQL query to index data into /cater. The magic sauce is the schedule cron string, which triggers updates every 5 minutes!

A little gotcha I ran into: if you don’t set _id, elasticsearch will index a new document instead of updating the existing one. Make sure you select _id to avoid duplicates!
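To see why this matters, here is a toy model of the behavior. It is not elasticsearch code, just an in-memory stand-in where documents are keyed by _id and a missing _id gets a freshly generated one. ToyIndex and the sample restaurant name are made up for the illustration.

```javascript
// Toy in-memory stand-in for an index: documents keyed by _id.
function ToyIndex() {
  this.docs = {};
  this.nextAutoId = 0;
}

ToyIndex.prototype.index = function(row) {
  // With no _id, invent a fresh one, mimicking an auto-generated id.
  var id = row._id !== undefined ? String(row._id) : 'auto-' + (this.nextAutoId++);
  this.docs[id] = { name: row.name };
  return id;
};

ToyIndex.prototype.count = function() {
  return Object.keys(this.docs).length;
};

var withId = new ToyIndex();
var withoutId = new ToyIndex();

// Simulate two scheduled runs over the same restaurant row.
for (var run = 0; run < 2; run++) {
  withId.index({ _id: 5, name: 'Some Restaurant' });
  withoutId.index({ name: 'Some Restaurant' });
}

console.log(withId.count());    // 1: the second run overwrote the first
console.log(withoutId.count()); // 2: each run created a duplicate
```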

TODO: writing up the front end side of this