How I enhanced Rasa’s accuracy through Duckling Extractor

Building a Chatbot??

Don’t just start, UNDERSTAND the domain and types of input expected from a user and build your pipeline. This can save you from writing lines of code to validate the extracted entities.

An entity is a term, or a series of terms, in a structured or unstructured sentence that is the unique and most informative value of a category. A category could be, for instance, a person, a date or an organization, and it holds generic data for a group of similar things.
So, for the bot to react accurately to the user’s message, it is vital to extract the meaningful entities.

In Rasa, when a user enters a message, the default extractor forwards the matched text as it is. As a result, for entities like date, price or distance, additional code had to be written to extract the exact numeric value and its unit, which the user can enter in any abbreviated form.

The entity extracted from ner-crf extractor:

{
  "text": "20 lac",
  "intent": "enter_data",
  "entities": [
    {
      "start": 0,
      "end": 7,
      "value": "20 lac",
      "entity": "price",
      "extractor": "CRFEntityExtractor",
      "confidence": 0.854,
      "processors": []
    }
  ]
}

As a result, the price entity had to be validated manually to check whether the unit was valid. In addition, the unit had to be parsed and the value converted, for example from 20 lac to 2000000.

import re

def validate_price(self, value, dispatcher, tracker, domain):
    # Split input such as "20 lac" into a number and a unit; match() returns None when it does not fit.
    pattern = re.compile(r"\s*([0-9]+)\s*([a-zA-Z$₹]+)")
    match = pattern.match(value or "")
    out = [part.lower() for part in match.groups()] if match else []
    possible_units = ["k", "lac", "million", "inr", "m", "bucks", "rupees", "dollars", "dollar", "$", "rs", "lakhs", "lakh", "crore", "₹", "cr", "usd"]
    has_unit = any(part in possible_units for part in out)
    numbers = [part for part in out if part.isdigit()]
    # Shorthand such as "$", "k" or "l"; the parts are already lower-cased.
    has_shorthand = any(part.startswith("$") or part.endswith(("k", "l")) for part in out)
    if has_unit or has_shorthand:
        return {"price": value}
    elif tracker.latest_message['intent']['name'] in ('deny', 'deny_accept'):
        dispatcher.utter_message("Looks like you are not interested in entering a price, but this field is mandatory to estimate the car.")
        return {"price": None}
    else:
        if numbers:
            # Reuse the number the user typed in the hint.
            msg = "I didn't understand, please rephrase and add units like {0} INR, {0}k, {0} dollars".format(numbers[0])
        else:
            msg = "I didn't understand, please rephrase and add units like 200 INR, 200k, 2000 dollars"
        dispatcher.utter_message(msg)
        return {"price": None}
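
The conversion step mentioned above (turning 20 lac into 2000000) is not part of the validator shown. A minimal sketch of what it might look like follows. The `UNIT_MULTIPLIERS` table and the `to_number` helper are hypothetical names, and the table covers only a few of the units listed in the validator.

```python
import re

# Hypothetical multiplier table; extend it to match the units your users type.
UNIT_MULTIPLIERS = {
    "k": 1_000,
    "lac": 100_000,
    "lakh": 100_000,
    "lakhs": 100_000,
    "m": 1_000_000,
    "million": 1_000_000,
    "cr": 10_000_000,
    "crore": 10_000_000,
}

def to_number(text):
    """Convert input such as '20 lac' or '5k' to an integer, or return None."""
    match = re.match(r"\s*([0-9]+)\s*([a-zA-Z]+)\s*$", text)
    if match is None:
        return None
    amount, unit = match.groups()
    multiplier = UNIT_MULTIPLIERS.get(unit.lower())
    if multiplier is None:
        return None
    return int(amount) * multiplier

print(to_number("20 lac"))  # 2000000
```

Every new abbreviation means another entry in the table, which is exactly the maintenance burden that motivated looking for a ready-made extractor.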

This is where Duckling comes in. It is an extractor with predefined abbreviations for units and words; it calculates the proper value and returns it. Instead of passing on the user’s raw utterance, Duckling processes it and provides a more accurate value. With Duckling in the pipeline, the validator shrinks to a simple lookup:

def validate_price(self, value, dispatcher, tracker, domain):
    if value:
        # Look for the amount-of-money entity among all extracted entities.
        for entity in tracker.latest_message['entities']:
            if entity['entity'] == 'amount-of-money':
                val = entity['additional_info']['value']
                return {"price": {"value": val, "text": entity['text']}}
    # No value given, or no amount-of-money entity found.
    return {"price": None}

Running Duckling Service:

There are two ways to start using the Duckling extractor:
1) Using the Haskell Stack
2) Running docker image

It’s comparatively easy to run the Docker image for Duckling. To pull and run the image from Docker Hub, use the commands below:

a) docker pull rasa/duckling
b) docker run -p 8000:8000 rasa/duckling

By default, Duckling is bound to port 8000. You can map it to a different host port if 8000 is already used by another service, for example docker run -p 9000:8000 rasa/duckling.
To use the Haskell Stack instead, follow the instructions at https://github.com/facebook/duckling.
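
Before wiring Duckling into Rasa, it can help to check that the server answers. The sketch below assumes the server exposes a /parse endpoint that accepts form-encoded `locale` and `text` fields, as described in the Duckling README. The helper names `build_parse_request` and `parse_with_duckling`, and the `en_US` locale, are my own choices for illustration.

```python
import json
import urllib.parse
import urllib.request

def build_parse_request(text, base_url="http://localhost:8000", locale="en_US"):
    """Build a POST request for Duckling's /parse endpoint without sending it."""
    body = urllib.parse.urlencode({"locale": locale, "text": text}).encode("utf-8")
    return urllib.request.Request(base_url + "/parse", data=body, method="POST")

def parse_with_duckling(text, **kwargs):
    """Send the text to a running Duckling server and return the decoded JSON."""
    request = build_parse_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))

# Usage (needs the Duckling container to be running):
#     parse_with_duckling("6700 km")
```

If the container is mapped to another host port, pass it through `base_url`, for example `parse_with_duckling("6700 km", base_url="http://localhost:9000")`.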

Adding Duckling Configuration in NLU:

In your config.yml, add a new entry, as shown below, with the URL and port of the Duckling server and the dimensions you want to extract.

  - name: "DucklingHTTPExtractor"
    url: "http://localhost:8000"   # 8000 is the default port
    dimensions: ["time", "distance", "amount-of-money"]

The next step is to add the entities to your domain.yml file:

entities:
  - time
  - distance

Fetching the extracted Data:

Once the configuration is added, Duckling tries to extract values every time a user sends a message, and the results can be read from the entities in the tracker’s latest message. For example, if a slot expects a distance entity and the user enters 6700 km, Duckling extracts the values shown below:

entities: [{
    'start': 0,
    'end': 7,
    'text': '6700 km',
    'value': 6700,
    'confidence': 1.0,
    'additional_info': {
        'value': 6700,
        'type': 'value',
        'unit': 'kilometre'
    },
    'entity': 'distance',
    'extractor': 'DucklingHTTPExtractor'
}]

In your code, the exact value the extractor has extracted can be fetched as follows [Reference: Image2].

for entity in tracker.latest_message['entities']:
    if entity['entity'] == 'distance':
        value = entity['additional_info']['value']
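
The same lookup can be wrapped in a small helper so that each slot validator does not repeat the loop. This is only a sketch: `first_entity_value` is a hypothetical name, and the sample payload is the distance example from above rather than a live tracker.

```python
def first_entity_value(entities, name):
    """Return additional_info['value'] of the first entity called `name`, else None."""
    for entity in entities:
        if entity.get("entity") == name:
            return entity.get("additional_info", {}).get("value")
    return None

# Sample payload copied from the distance example above.
sample_entities = [{
    "start": 0, "end": 7, "text": "6700 km", "value": 6700, "confidence": 1.0,
    "additional_info": {"value": 6700, "type": "value", "unit": "kilometre"},
    "entity": "distance", "extractor": "DucklingHTTPExtractor",
}]

print(first_entity_value(sample_entities, "distance"))  # 6700
print(first_entity_value(sample_entities, "time"))      # None
```

Inside a validator you would pass `tracker.latest_message['entities']` instead of the sample list.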

Challenges with Duckling:

1) For Duckling to extract entities from a sentence, the words must be spelled correctly. For example, if the user types tomrrow instead of tomorrow, the extracted data will not be correct.
2) If you are using the amount-of-money entity, Duckling might not recognize every abbreviation used with numbers; for instance, it can parse 10L but not cr. Also, in the 10L case you get the properly converted value, but the entity is reported as number instead of amount-of-money. The amount-of-money entity is returned only when a proper currency is given.
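
One way to live with the second limitation is to accept a number entity when no amount-of-money entity is present. The sketch below assumes that "number" has been added to the configured dimensions. `extract_price` is a hypothetical helper, and the sample entities are hand-written for illustration, not captured Duckling output.

```python
def extract_price(entities):
    """Prefer an amount-of-money entity; otherwise fall back to a plain number."""
    for wanted in ("amount-of-money", "number"):
        for entity in entities:
            if entity.get("entity") == wanted:
                return {
                    "value": entity.get("value"),
                    "text": entity.get("text"),
                    "entity": wanted,
                }
    return None

# Hand-written sample: an abbreviation parsed as a number, with no currency attached.
only_number = [{"entity": "number", "value": 1000000, "text": "10L"}]
print(extract_price(only_number))
```

Because the fallback carries no currency, the bot may still need to ask the user which currency was meant.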

Duckling has helped me to create a Great Chatbot. Hope it helps you too 😊.
