Neural Machine Translation With Attention Mechanism: Step-by-step Guide

Neural networks now

Neural networks have recently made significant leaps in image and natural language processing (NLP). They’ve not only learned to recognise, localise and segment images; they’re now able to translate natural language effectively and answer complex questions. One of the precursors to this progress was the introduction of Seq2Seq and neural attention models, which enable neural networks to be more selective about the data they’re working with at any given time.

The core focus of the neural attention mechanism is to learn where to find important information. In neural machine translation, for example, the cycle runs as follows:

The words from the input sentence are fed into the encoder, which condenses the sentence’s meaning into the so-called ‘thought vector’.
Based on this vector, the decoder produces words one by one to create the output sentence.
Throughout this process, the attention mechanism helps the decoder focus on different fragments of the input sentence.
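
The attention step in the cycle above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the tutorial's implementation: the function names, the toy vectors and the use of a dot product as the relevance score are all assumptions made for brevity.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """One attention step: weight every encoder state by its relevance.

    Relevance is a plain dot product here (an assumption for brevity).
    """
    scores = [sum(d * h for d, h in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    size = len(encoder_states[0])
    # The context vector is the weighted sum of the encoder states.
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(size)]
    return weights, context

# Hypothetical encoder outputs, one vector per input word.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical decoder state while producing one output word.
decoder_state = [0.0, 2.0]

weights, context = attend(decoder_state, encoder_states)
```

With these toy values, the second and third input words receive equal weight and the first receives less, so the context vector handed to the decoder is dominated by the fragments that matter for the current output word.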

Neural machine translation’s current success can be attributed to:

Sepp Hochreiter and Jürgen Schmidhuber’s 1997 creation of the LSTM (long short-term memory) neural cell. This presented the opportunity to work with relatively long sequences, using a machine learning paradigm.
The realisation of sequence-to-sequence (Sutskever et al., 2014, Cho et al., 2014), based on LSTM. The concept is to “eat” one sequence and “return” another.
The creation of the ‘attention mechanism’, first introduced by Bahdanau et al., 2015.
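
For reference, the additive attention of Bahdanau et al. is usually written as below. The notation is an assumption taken from common presentations of that paper, not from this post: s_{i-1} is the previous decoder state, h_j is the encoder output for input word j, T_x is the input length, and W_a, U_a and v_a are learned parameters.

```latex
e_{ij} = v_a^{\top} \tanh\left( W_a s_{i-1} + U_a h_j \right)

\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}

c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j
```

The weights \alpha_{ij} say how much input word j matters when producing output word i, and the context vector c_i is what the decoder receives in addition to its own state.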

But why is this so technologically important? In this blog, we describe the most promising real-life use cases for neural machine translation and link to an extended tutorial on neural machine translation with an attention mechanism.

Seq2Seq algorithm’s real-world applications

The Seq2Seq algorithm can perform several core tasks, all of them grounded in ‘translation’ but each with distinct differences. Let's take a closer look at some of them.

Neural machine translation

Machine translation took a huge step forward in 2017, with the introduction of a bidirectional residual Seq2Seq (sequence-to-sequence) neural network, complete with an attention mechanism. The mechanism’s role is to determine the importance of each word in the input sentence, then to extract additional context around each word. It’s thanks to this development that modern tools are now able to produce high-quality translations of lengthy, complex sentences.
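
To give a feel for what "bidirectional" means here, the sketch below reads a sentence left to right and right to left, then pairs the two states for each word so that every word carries context from both sides. It is a toy under stated assumptions: words are single numbers, the recurrent cell is a scalar tanh unit with made-up weights, and residual connections are left out entirely.

```python
import math

def run_rnn(inputs, w_in=0.5, w_rec=0.8):
    """Toy scalar recurrent cell: h_t = tanh(w_in * x_t + w_rec * h_{t-1}).

    The weights are arbitrary illustrative values, not trained parameters.
    """
    state, states = 0.0, []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states

def bidirectional_encode(inputs):
    """Pair a left-to-right state with a right-to-left state for each word."""
    forward = run_rnn(inputs)
    # Run over the reversed sentence, then flip back to the original order.
    backward = run_rnn(inputs[::-1])[::-1]
    return list(zip(forward, backward))

# Hypothetical one-number "embeddings" for a three-word sentence.
word_values = [1.0, -1.0, 2.0]
annotations = bidirectional_encode(word_values)
```

The first word's forward state has seen only that word, while its backward state has already seen the rest of the sentence. An attention mechanism would then score these paired annotations rather than one-directional states.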

Text summarisation

Annotating text and articles is a laborious process, especially if the data is vast and heterogeneous. Attention models can be used to pinpoint the most important textual elements and compose a meaningful headline, allowing the reader to skim the text and still capture the basic meaning. What’s more, text summarisation can do this almost instantly. It can also be used to generate titles for web pages and to perform high-level information research, or information segmentation, for rapid reading.

Chatbots with question-answering capabilities

In the constant quest for efficiency, businesses are trying to automate as many routine processes as possible. As yet, however, the perfect tool for human-machine interaction hasn’t been created. Natural language processing (NLP) isn’t yet flawless but, with the addition of the attention mechanism, its accuracy is greatly improved.

An attention mechanism can detect the most significant (key) words from all kinds of questions – even those that are lengthy and complex – to produce the right answer. And the mechanism can be implemented as an add-on, to work in conjunction with the neural network on the common knowledge base. With chatbots, the mechanism transcends machine translation and takes on a higher level of abstraction – allowing it to translate one verbal sequence into another.

Natural language image captioning (Img2Seq)

The idea here is the same as it is for image recognition. The difference, however, is that when captioning an image the attention heat map changes with each word of the sentence being generated.
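
The word-dependent heat map can be illustrated with a small sketch. Everything in it is hypothetical: the image is reduced to a 2x2 grid of made-up region features, each caption word is represented by a made-up query vector, and relevance is a plain dot product.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def heat_map(word_query, region_features, grid_width):
    """Attention weights over image regions, laid out as grid rows."""
    scores = [sum(q * f for q, f in zip(word_query, features))
              for features in region_features]
    weights = softmax(scores)
    return [weights[row:row + grid_width]
            for row in range(0, len(weights), grid_width)]

# Hypothetical features for a 2x2 grid of image regions, in row-major order.
regions = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
# Hypothetical query vectors for two words of the caption.
queries = {"dog": [3.0, 0.0], "grass": [0.0, 3.0]}

maps = {word: heat_map(query, regions, 2) for word, query in queries.items()}
```

In this toy setup the map for "dog" peaks in the top-left region and the map for "grass" peaks in the top-right one, which mirrors how the focus shifts across the image as each word of the caption is produced.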

TensorFlow neural machine translation Seq2Seq with attention mechanism: A step-by-step guide

There are many online tutorials covering neural machine translation, including the official TensorFlow and PyTorch tutorials. However, neither addresses how the attention mechanism itself is implemented (they use only an attention wrapper), even though it is a pivotal component of modern neural translation.

Here’s the link to our tutorial on neural machine translation, based on a modern Seq2Seq algorithm with an attention mechanism, built from scratch. Compared with what’s already out there, it should offer an in-depth overview of all aspects of Seq2Seq, including the attention algorithm.

We used the TensorFlow framework to offer a usable, low-level working example of the concept, based on the Dynamic Seq2Seq in TensorFlow tutorial. And we aim to make it as good as the original PyTorch version.

We hope this will help you get the most out of your machine translation projects and, ultimately, pay dividends for your outcomes. Feel free to leave any questions and feedback in the comments box below.

By Michael Konstantinov
Deep Learning Specialist at Eleks
