How to Make a Language Translator – Intro to Deep Learning #11


Hello World! It’s Siraj, and today let’s make our own language translator using TensorFlow. There are about 6,800 different languages spoken across the world, and in an increasingly globalised world nearly every culture interacts with every other culture in some way. That means there are an incalculable number of translation requirements every second of every day. Translating is no easy task: a language isn’t just a collection of words and rules of grammar and syntax, it’s also a vast interconnecting system of cultural references and connotations. This reflects a centuries-old problem of two cultures wanting to communicate but being blocked by a language barrier. Our translation systems are fast improving, though, so whether it be an idea, a story, or a quest, each new advancement means one less message will be lost in translation.

During the Second World War, the British government was hard at work trying to decrypt the Morse-coded radio communications that Nazi Germany sent securely using a cipher machine known as Enigma. They hired a man named Alan Turing to help in the effort, and when the American government learned of this work, it was inspired to attempt machine translation itself after the war, specifically because it needed a way to keep up with Russian scientific publications. The first public demo of a machine translation system, in 1954, translated 250 words between Russian and English. It was dictionary based, so it would attempt to match the source language to the target language word for word; the results were poor, since it didn’t capture syntactic structure. The second generation of systems used an interlingua: they converted the source language into a special intermediary language with specific rules encoded into it, then generated the target language from that. This proved more efficient, but the approach was soon overshadowed by the rise of statistical translation in the early 90s, driven primarily by engineers at IBM. A popular approach was to break the source text down into segments, then compare them against an aligned bilingual corpus, using statistical evidence and probabilities to choose the most likely translation. Nowadays the most used statistical translation system in the world is Google Translate, and with good reason: Google uses deep learning to translate from a given language to another with state-of-the-art results. So how do they do this? Let’s recreate their results in TensorFlow to find out.

The dataset we’ll be using to train our language translation model is a corpus of transcribed TED talks. It’s got both the English version and the French version, and our goal will be to create a model that can translate from one to the other after training. We’ll be using TensorFlow’s built-in data_utils module to help us pre-process our dataset. We’ll start by defining our vocab size, which is the number of words we want to train on from our dataset; we’ll set it to 40k for each language, a small portion of the data. Then we’ll use data_utils to read the data from the data directory, giving it our desired vocab size, and it will return the formatted and tokenised words in both languages. We’ll then initialise TensorFlow placeholders for our encoder and decoder inputs. Both will be integer tensors that represent discrete values; they will be embedded into a dense representation later. We’ll feed our vocabulary words to the encoder, and the encoded representation that’s learnt to the decoder.
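To make that pre-processing step concrete, here’s a minimal, self-contained sketch of the same idea: build a 40k-word vocabulary, map words to integer token IDs, and declare integer placeholders for the encoder and decoder inputs. The data_utils module used in the video wraps similar logic (and the original code feeds one placeholder per time step, organised into buckets), but its exact entry points vary across TensorFlow versions, so the helper names here are illustrative stand-ins and the placeholders assume the TF 1.x API:

    import collections
    import tensorflow as tf

    VOCAB_SIZE = 40000  # top-N words kept per language, as in the video

    def build_vocab(sentences, vocab_size=VOCAB_SIZE):
        # Map the vocab_size most frequent words to integer IDs; 0 is reserved for unknowns.
        counts = collections.Counter(w for s in sentences for w in s.split())
        words = [w for w, _ in counts.most_common(vocab_size - 1)]
        return {w: i + 1 for i, w in enumerate(words)}

    def tokenise(sentence, vocab):
        # Turn a sentence into a list of integer token IDs (0 for unknown words).
        return [vocab.get(w, 0) for w in sentence.split()]

    # Integer tensors of token IDs, shaped [batch, time]; the model embeds
    # them into dense vectors later.
    encoder_inputs = tf.placeholder(tf.int32, shape=[None, None], name='encoder_inputs')
    decoder_inputs = tf.placeholder(tf.int32, shape=[None, None], name='decoder_inputs')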
Now we can build our model. Google published a paper more recently discussing a system they integrated into their translation service called Neural Machine Translation. It’s an encoder-decoder model inspired by similar work from other papers on topics like text summarisation. Whereas before, Google Translate would translate from language A to English to language B, with this new NMT architecture it can translate directly from one language to the other. It doesn’t memorise phrase-to-phrase translations; instead it encodes the semantics of the sentence. This encoding is generalised, so it can even translate between a language pair, like Japanese and Korean, that it hasn’t explicitly seen before.

So I guess we can use an LSTM recurrent network to encode a sentence in language A. The RNN spits out a hidden state ‘s’, which represents the vectorised contents of the sentence. We can then feed ‘s’ to the decoder, which will generate the translated sentence in language B, word by word. Sounds easy enough, right? WRONG! There is a drawback to this architecture: it has limited memory. That hidden state ‘s’ of the LSTM is where we’re trying to cram the whole sentence we want to translate, and ‘s’ is usually only a few hundred floating-point numbers long. The more we try to force our sentence into this fixed-dimensionality vector, the lossier our neural net is forced to be. We could increase the hidden size of the LSTM, since after all it’s supposed to remember long-term dependencies, but as we increase the hidden size ‘h’, the training time blows up (the number of LSTM weights grows quadratically with ‘h’).

So to solve this we’re going to bring attention into the mix. If I was translating a long sentence, I’d probably glance back at the source sentence a couple of times to make sure I was capturing all the details; I’d iteratively pay attention to the relevant parts of the source sentence. We can let neural nets do the same by letting them store and refer back to previous outputs of the LSTM. This increases the storage of our model without changing the functionality of the LSTM. So the idea is, once we have the LSTM outputs from the encoder stored, we can query each output, asking how relevant it is to the computation happening in the decoder. Each encoder output gets a relevancy score, which we can convert to a probability by applying a softmax activation. Then we extract a context vector, which is a weighted summation of the encoder outputs, weighted by how relevant each one is.

Memory ain’t enough, pay attention! (Repeated in Hindi, German, and Spanish.)

We build our model using TensorFlow’s built-in embedding attention sequence-to-sequence function, giving it our encoder and decoder inputs as well as a few hyperparameters we define, like the number of layers.
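To make the attention mechanism concrete, here’s a minimal sketch of the score, softmax, and weighted-sum pipeline just described, using a plain dot product as the relevancy score. This is an illustration of the mechanism under that assumption, not the exact computation inside TensorFlow’s built-in seq2seq function, which learns its scoring weights:

    import tensorflow as tf

    def attention_context(encoder_outputs, decoder_state):
        # encoder_outputs: [batch, time, hidden], one stored vector per source position.
        # decoder_state:   [batch, hidden], the decoder's current hidden state.
        # Relevancy score for each position: dot product with the decoder state.
        scores = tf.reduce_sum(encoder_outputs * tf.expand_dims(decoder_state, 1), axis=2)
        # Softmax turns the scores into a probability distribution over positions.
        weights = tf.nn.softmax(scores)  # [batch, time]
        # Context vector: weighted sum of the encoder outputs, [batch, hidden].
        return tf.reduce_sum(encoder_outputs * tf.expand_dims(weights, 2), axis=1)

The decoder computes a fresh context vector like this at every output step, so it can look back at different parts of the source sentence as it generates each translated word.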
It builds a model just like the one we discussed. TensorFlow has several built-in models like this that we can drop into our code easily. Normally this alone would be fine, and we could run it and the results would be decent, but Google added another improvement to their model, one that requires MORE CODE, 100 GPUs, and a WEEK OF TRAINING. Seriously, that’s what it took. We won’t implement it all programmatically, but let’s dive into it conceptually.

If the encoder outputs don’t carry sufficient context, the decoder won’t be able to give a good answer. We need to include info about future words, so that each encoder output is determined by the words on both its left and its right. We humans would definitely use this kind of full context to determine the meaning of a word we see in a sentence. The way they did this is to use a bi-directional encoder, so it’s two RNNs: one goes forward over the sentence, and the other goes backwards. For each word, it concatenates the two vector outputs, which produces a vector with context from both sides (a short code sketch of such a layer appears after the transcript). They also added a lot of layers to their model: the encoder has one bi-directional RNN layer and seven uni-directional RNN layers, and the decoder has eight uni-directional RNN layers. The more layers, the longer the training time, and that’s why only a single layer is bi-directional: if all the layers were bi-directional, each layer would have to finish over the whole sentence before the layers that depend on it could start computing, but by using uni-directional layers, computation can proceed more in parallel.

We’ll initialise our TensorFlow session, then our model inside of it. Let’s see some results after training. First I’ll give it this phrase. Looks good. And now another phrase. DOPE! While it’s not perfect, and we still have a way to go, we’re definitely getting closer to having a universal translation model.

Breaking it down: encoder-decoder architectures allow for state-of-the-art performance in machine translation; by storing the previous outputs of the LSTM cells, we can judge the relevancy of each one to decide which to use, via an attention mechanism; and by using a bi-directional RNN, the context of both past and future words is used to create an accurate encoder output vector.

The coding challenge winner from last week is Ryan Lee. This was very impressive: he created a recipe summariser by scraping 125,000 recipes from the web, and documented it all beautifully with installation steps so you can reproduce the results yourself. WIZARD OF THE WEEK! And the runner-up is Sarah Collins; her code converts scientific papers to text and prioritises them by topic. This week’s coding challenge is to create a simple translation system using an encoder-decoder model. All the details are in the readme; post your GitHub link in the comments and I’ll announce the winner next week. Please subscribe for more programming videos, check out this related video, and for now I’ve got to get a better GPU. So, thanks for watching!
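As referenced in the transcript, here’s a minimal sketch of a single bi-directional encoder layer: a forward and a backward LSTM run over the embedded source sentence, and their per-step outputs are concatenated so that every position carries context from both directions. The hidden size and the pre-embedded input are assumptions for illustration, and the calls use the TF 1.x-era API (in some 1.x versions these cells live under tf.contrib.rnn instead):

    import tensorflow as tf

    hidden_size = 512  # assumed for illustration
    # Stand-in for the already-embedded source sentence, [batch, time, embedding].
    embedded = tf.placeholder(tf.float32, shape=[None, None, hidden_size])

    fw_cell = tf.nn.rnn_cell.LSTMCell(hidden_size)  # reads the sentence left to right
    bw_cell = tf.nn.rnn_cell.LSTMCell(hidden_size)  # reads the sentence right to left
    (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
        fw_cell, bw_cell, embedded, dtype=tf.float32)
    # Concatenate per-step outputs: each position now sees both directions.
    encoder_layer1 = tf.concat([out_fw, out_bw], axis=2)  # [batch, time, 2*hidden]

The remaining, uni-directional encoder layers would then stack on top of encoder_layer1, which is what lets them start computing without waiting for a full bi-directional pass at every level.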

89 Replies to “How to Make a Language Translator – Intro to Deep Learning #11”

  • Awesome! Looking forward to your incorporation of context, a very important factor. For example:
    In English, "You aren't a student, are you?" (if you are) "Yes."
    In Chinese, "You aren't a student, are you?" (if you are) "No."

    And not just for whole articles: if you are translating a movie, I think people would be interested in knowing how to label everything and teach the machine to learn it.

    looking forward to that

  • How do you add attention on top of a non-seq2seq LSTM? I want to do text classification and I think attention might help (mathematically)?

  • from tensorflow.models.rnn.translate import data_utils
    I am getting an error that this package does not exist. How do I solve this?

  • Hi Siraj,
    I am a computer science student who recently started working on a machine learning project. I need to make a bot that learns the game "chain reaction", which I have coded in pygame. I'm stuck on how I should implement the bot. Some help would be really appreciated!!
    Thanks in advance..

  • Hi Siraj, you create LIT content! I love it. Keep up the good work. I'm learning so much to hopefully impress my interviewers at Google. πŸ™‚

  • Is this meant to work with a specific version of TensorFlow? I'm on 1.0.1 and it throws errors ("has no attribute 'rnn_cell'") which are apparently related to some undocumented changes between TF versions. I've also been running into other errors but have been able to figure them out — is this part of the challenge? πŸ˜‰

  • Love your vids! And the rap parts are awesome! Thank you for showing us how easy ML is. For me as a statistician, it's a pleasure to see what you create each week!

  • Great work, love your channel! It's all starting to make sense but still wouldn't be able to write a model for a new problem yet. Also love the rapping, really cool πŸ˜‰

  • Hey, can you make an episode on Latent Sentiment Analysis for scoring essays to a numeric value or grade, say 90% or 20%, in Python? There's little content on YouTube that fully describes Latent Sentiment Analysis; most of it just talks about TF-IDF, so I am looking for more, really.

  • Hi Siraj,

    I am new to machine learning. I have seen a bunch of your videos, which are very good and interesting. I have one question: where do I start as a beginner? Should I go directly into deep learning, or clear up some of my basics first? I have experience in Python, so that will not be a problem.

  • I really appreciate that you're a professional who's willing to share his expertise, even though I'm not interested in this subject. Unfortunately, when senior chemical engineers retire, they leave the plant they worked at and their 30+ years of knowledge goes with them. They fix problems that appear in the plant without properly logging the events, keeping the fixes only in their minds. So when they retire, all that professional knowledge is lost.
    What you are doing here is really special, thank you.

  • Hi Siraj, could you please do a video on bounding box detection? I really like your videos, thanks for all the effort you put into making them

  • Siraj, you should get a professional mic with a filter! It will put your videos on a whole other level 😉

  • Great video Siraj! Here's my submission, https://github.com/erilyth/DeepLearning-Challenges/tree/master/Language_Translation . Training these models takes a very long time though, are there any online services that provide free GPU access for students?

  • Siraj, can you please show the 10,000 hours project? I'd like to see how I can shorten the learning process in any subject. Thanks, love u 🙂 🙂

  • When I run:
    from tensorflow.models.rnn.translate import data_utils
    I get an ImportError saying "No module named tensorflow.models"
    Any help, please?
    Thanks in advance.

  • Google translate is horrible when translating to and from a minority language compared to a majority language. This is the sad truth!

  • Good to know that Google Translate can be trusted. Tomorrow I'm going to translate sentences to Italian, because there's a girl in my 6th hour who doesn't speak English and I want to talk to her. Hope it goes well 🤞🏻

  • it's "prend-moi dehors", 'cos attrape is like catching a fish, but prend is take, but catching someone in a sexual way as well, so it's more appropriate to the context. πŸ˜‰

  • Great video as always
    Tnx for sharing!!

    Would it be possible to share the trained weights from your videos? It would be much better to see the results that way for those of us who don't have a GPU :)

  • Hello Siraj, it's World :-p Great videos, thanks… The way you teach and share knowledge is different from anyone else's. Your excitement makes the vids interesting :-p Though some of the topics need more details.

  • I am getting the error below when running this. Siraj, could you please help? "ValueError: Variable proj_w already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?"

  • Can you make a video on gene expression microarray data (ACGT…) using DL? Thanks in advance 🙂


  • Siraj, I'm a software engineering student trying to use a language translator application for my final project. I'm new to Python; I tried to run your application to get the idea, but it can't be run. Can you please teach me how to run this application? Please contact me: [email protected]

  • Greetings from Kuwait! I'm interested in the translation field in general and this video helped a lot! Thank you very much for sharing this well-executed video. Subscribed!

  • By the way, Marian Rejewski, a Polish mathematician, was the first to break the Enigma machine, in 1932, before World War II and before Alan Turing.

  • Just started with OpenNMT, one simple question: Once trained and deployed, does the neural network grow upon each usage? i.e. if I train OpenNMT then deploy it on a server and make it public, will the machine keep learning unsupervised when people use it?
    Does the knowledge base of the machine grow?

  • Have you done any Hindi-to-English translator? If yes, then share the code, or does this same code work for it? Thank you… reply as soon as possible, brother…

  • Bro, is it possible to find someone's native country based on photos, by giving a collection of photos, e.g. this one is Indian and this one is British?

  • Please guide me: how do I make a portable offline language translator device with the help of an Arduino or a Pi?

  • SMT and neural machine translation are great, but they are not useful for minority languages which do not have parallel corpora (Quechua, Wolof, Chamorro…), and that's a pity.

  • Hey Siraj could you please make a video on NMT (Neural machine translation), which is one of the advanced machine translation methods.

  • Hi, I am using this tutorial to create a webpage where anyone can upload a document (pdf, docx) to translate the file. Can anyone help me with how to read the PDF document in Python and extract the text to translate it?

  • I am not at this level of education… but I was planning on making a deep learning AI system you can carry with you, with a mic for listening and a small-form monitor, that translates words spoken in any language in real time. Again, I know nothing about this stuff, but it seems like it's well on its way… regardless, if the system is made, maybe it can be incorporated into a design like the idea I've got… maybe the lack of knowledge comes from the lack of having that type of hardware… I love learning things.

  • Hi, does anyone know how to create a transliteration machine learning model that can solve homograph disambiguation, using Python?

