BE CAREFUL WHILE QUERYING INNER OBJECTS IN ELASTICSEARCH

Content posted here with the permission of the author Anuj Verma, who is currently employed at Josh Software. Original post available here.

In elasticsearch we can store closely related entities within a single document. For example, we can store a blog post and all of its comments together, by passing an array of comments.

{
  "title": "Invest Money",
  "body": "Please start investing money as soon...",
  "tags": ["money", "invest"],
  "published_on": "18 Oct 2017",
  "comments": [
    {
      "name": "William",
      "age": 34,
      "rating": 8,
      "comment": "Nice article..",
      "commented_on": "30 Nov 2017"
    },
    {
      "name": "John",
      "age": 38,
      "rating": 9,
      "comment": "I started investing after reading this.",
      "commented_on": "25 Nov 2017"
    },
    {
      "name": "Smith",
      "age": 33,
      "rating": 7,
      "comment": "Very good post",
      "commented_on": "20 Nov 2017"
    }
  ]
}

So we have an elasticsearch document describing a post and an inner object comments containing all the comments on a post. But inner objects in elasticsearch do not work as we expect. How ? We will see it soon.

Problem

Now suppose we want to find all blog posts on which user {name: john, age: 34} has commented. So lets again look at our sample document above and find the users who had commented.

name age
William 34
John 38
Smith 33

From the list we can clearly see that there is no user John of 34 years age. For simplicity consider we have only 1 document in elasticsearch index. Lets verify the same by querying the index:

curl -XGET 'localhost:9200/blog/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "comments.name": "John" }},
        { "match": { "comments.age":  34 }}
      ]
    }
  }
}

Our sample document is returned in response. Surprised ?. Now that is why I said:

inner objects in elasticsearch do not work as expected

The problem here is that the library used by elasticsearch(lucene) has no concept of inner objects, so as a result inner objects are flattened into a simple list of field name and values. Our document is internally stored as:

{
  "title":                    [ invest, money ],
  "body":                     [ as, investing, money, please, soon, start ],
  "tags":                     [ invest, money ],
  "published_on":             [ 18 Oct 2017 ]
  "comments.name":            [ smith, john, william ],
  "comments.comment":         [ after, article, good, i, investing, nice, post, reading, started, this, very ],
  "comments.age":             [ 33, 34, 38 ],
  "comments.rating":          [ 7, 8, 9 ],
  "comments.commented_on":    [ 20 Nov 2017, 25 Nov 2017, 30 Nov 2017 ]
}

As you can clearly see above that the relationship between comments.name and comments.age has been lost. So that is why our document matches a query for john and 34.

Solution

To solve this problem we just need to make a small change in mapping of elasticsearch. If you have a look at the mapping of index you will find that the type of comments field is object. We need to update it to type nested.

We can simply update the mapping of our index by running the below query:

curl -XPUT 'localhost:9200/blog' -d'
{
  "mappings": {
    "blog": {
      "properties": {
        "title": { "type": "string" },
        "body": { "type": "string" },
        "tags": { "type": "text" },
        "published_on": { "type": "text" },
        "comments": {
          "type": "nested",
          "properties": {
            "name":    { "type": "string"  },
            "comment": { "type": "string"  },
            "age":     { "type": "short"   },
            "rating":   { "type": "short"   },
            "commented_on":    { "type": "text"    }
          }
        }
      }
    }
  }
}

After changing the mapping to type nested, there is a slight change in the way we can query the index. We need to use nested query. Given below is the nested query example:

curl -XGET 'localhost:9200/blog/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "comments",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "comments.name": "john"
                    }
                  },
                  {
                    "match": {
                      "comments.age": 34
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

The above query will return no document in response as there is no match of user {name: john, age: 34}.

Surprised again ? Just a small change solved a problem in no time. It may be a smaller change from our side, but a lot has changed in the way elasticsearch stores our document. Internally, nested objects index each object in the array as a separate hidden document, meaning that each nested object can be queried independently of the others.

Given below is the internal representation of sample document after changing mapping:

{
  {
    "comments.name":    [ john ],
    "comments.comment": [ after i investing started reading this ],
    "comments.age":     [ 38 ],
    "comments.rating":  [ 9 ],
    "comments.date":    [ 25 Nov 2017 ]
  },
  {
    "comments.name":    [ william ],
    "comments.comment": [ article, nice ],
    "comments.age":     [ 34 ],
    "comments.rating":   [ 8 ],
    "comments.date":    [ 30 Nov 2017 ]
  },
  {
    "comments.name":    [ smith ],
    "comments.comment": [ good, post, very],
    "comments.age":     [ 33 ],
    "comments.rating":   [ 7 ],
    "comments.date":    [ 20 Nov 2017 ]
  },
  {
    "title":            [ invest, money ],
    "body":             [ as, investing, money, please, soon, start ],
    "tags":             [ invest, money ],
    "published_on":     [ 18 Oct 2017 ]
  }
}

As you can see each inner object is stored as a separate hidden document internally. This maintains the relationship between their fields.

Conclusion:

So if you are using inner objects in index and querying them too, verify that the type of inner object is nested. Else the query may return invalid result documents.

Thanks for reading. Please like and share so that it can reach out to other valuable readers too.

Advertisements

About Sachin Shintre

Director, Josh Software
This entry was posted in General. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s