LaMDA stands for Language Model for Dialogue Applications: a language model created to help software participate in fluid, natural conversation.
LaMDA is based on the same Transformer architecture as other language models, such as BERT and GPT-3.
One of LaMDA's main features is its ability to understand nuanced questions and conversations covering several different topics. This sets it apart from other systems, which are often confused by the open-ended nature of human conversation, where a reply may have little to do with the question that preceded it.
How does Google LaMDA work?
LaMDA is built on top of Transformer, Google's open-source neural network architecture, which is used to train natural language understanding algorithms.
The model is trained to find patterns in sentences, correlations between different words used in those sentences, and even predict the word that is likely to come next.
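The idea of "predicting the word likely to come next" can be illustrated with a deliberately simple sketch. The bigram counter below is only a stand-in for what LaMDA actually does (a Transformer learns these statistics with far richer context), but it shows the shape of the task: given the words so far, guess the most likely next word.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for training text (illustrative only;
# LaMDA uses a Transformer, not bigram counts).
corpus = [
    "how are you today",
    "how are you doing",
    "how is the weather today",
]

# Count which word follows which: the simplest possible
# form of "predict the word likely to come next".
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`."""
    counter = follows[word]
    return counter.most_common(1)[0][0] if counter else None

print(predict_next("are"))  # "you" follows "are" in every example above
```

A real language model replaces the raw counts with learned parameters and conditions on the whole preceding sequence, but the training objective is the same kind of next-word prediction.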
To do this, it analyzes data sets consisting of dialogues, not just individual words. In this sense, although a conversational AI system is similar to chatbot software, there are some key differences. For example, chatbots are trained on limited, specific data sets and can only hold a narrow conversation based on the exact data and questions they have been trained on.
However, since LaMDA is trained with several different datasets, it can hold open-ended conversations. During training, it picks up the nuances of the open-ended dialogue and adapts to them. That is why it can answer questions on many different topics, depending on the flow of the conversation.
In this way, it enables conversations even closer to human interaction than chatbots usually offer.
LaMDA training uses a total of 1.56 trillion words and 137 billion parameters, in two different phases. In the first phase, the team built a dataset of 1.56 trillion words from multiple public documents.
This dataset is converted into a sequence of tokens (it is tokenized), yielding a total of 2.81 trillion tokens, which are used for the model's initial pre-training. General, scalable parallelization is also used at this stage to train the model to predict each next token from the ones before it.
In the second stage, LaMDA is trained to predict the next part of a dialogue, generating relevant responses based on the back-and-forth conversation.
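The dialogue-training stage described above works on data shaped as (context, response) pairs: everything said so far is the context, and the next turn is the target the model must produce. A minimal sketch of building such pairs from one conversation (the dialogue text here is invented for illustration):

```python
# Turn a multi-turn dialogue into (context, response) training pairs,
# the data shape used when training a model to predict the next turn.
dialogue = [
    "Hi, can you recommend a book?",
    "Sure! Do you prefer fiction or non-fiction?",
    "Fiction, please.",
    "Then try a modern fantasy novel.",
]

def to_training_pairs(turns):
    """Pair each turn with all the turns that preceded it."""
    pairs = []
    for i in range(1, len(turns)):
        context = " ".join(turns[:i])  # conversation so far
        pairs.append((context, turns[i]))  # next turn is the target
    return pairs

for context, response in to_training_pairs(dialogue):
    print(len(context), "->", response)
```

Because every prefix of the conversation becomes a training example, the model learns to stay relevant to the full flow of the exchange rather than just the last message.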