Artificial intelligence is one of the most important advances of this century. It goes hand in hand with new technologies and the internet, and it can be applied in a wide variety of fields, including science, industry, the home, defense, security, and language. That is why it has been applied to the holdings of the National Library of Spain to develop MarIA, a system intended to improve how the Spanish language is used.
The MarIA system was developed to optimize the use of the Spanish language
The Barcelona Supercomputing Center (BSC) has launched a project together with the Spanish National Library: an artificial intelligence system for understanding and writing Spanish. It is called MarIA, and it is available free of charge to developers, companies, and other organizations.
The aim is to improve how Spanish is handled by other artificial intelligence systems. The data files were provided by the National Library. To process them, the team used a supercomputer called MareNostrum.
Applications and uses of the MarIA system
The system can be used in many ways: in language predictors and proofreaders, automated summaries, chatbots, and translation engines. Other uses may include intelligent search and automatic subtitling.
The open system can also be used to train other systems to interpret and spell Spanish correctly. Open access lets you consult the information stored in MarIA. The project is based on texts and files from the National Library, but ordinary users do not have free access to those source files. The system was created so that companies and professionals can improve the answers their own tools give. The aim is to achieve correct use of the language.
MarIA system functions
The MarIA project consists of a system of neural networks, which are trained to understand the language. For this reason, correct vocabulary and expression, together with their meaning, are important. The system draws on 59 terabytes of data taken from the digital archive of the Spanish National Library. The team also performed a cleaning process, in which they eliminated:
- Page numbers.
- Unfinished sentences.
- Repeated sentences.
- Sentences in another language.
- Incorrectly encoded text.
After cleaning, the system comprises 202 million carefully checked documents, which together occupy 570 gigabytes.
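To make the cleaning step more concrete, here is a minimal Python sketch of how such filters could be chained. It is not the pipeline the MarIA team actually used: the function names, the regular expressions, the tiny list of common Spanish words used as a language check, and the 20% threshold are all assumptions chosen for illustration. A real pipeline would rely on a proper language identifier and much more robust rules.

```python
import re
import unicodedata

# All names and thresholds below are hypothetical, for illustration only.
PAGE_NUMBER = re.compile(r"^(?:page\s+|página\s+|p\.\s*)?\d+$", re.IGNORECASE)
# Typical debris left when UTF-8 text is decoded as Latin-1: "Ã" plus one byte.
MOJIBAKE = re.compile("\u00c3[\u0080-\u00bf]")
SENTENCE_END = (".", "!", "?", "…", "»", '"')
WORD = re.compile(r"[a-záéíóúüñ]+")
# Very small list of frequent Spanish words, standing in for a language detector.
SPANISH_HINTS = frozenset(
    "el la los las de del que y en un una es por con para se no".split()
)


def looks_spanish(line, threshold=0.2):
    """Crude check: the share of words that are frequent Spanish words."""
    words = WORD.findall(line.lower())
    if not words:
        return False
    hits = sum(1 for word in words if word in SPANISH_HINTS)
    return hits / len(words) >= threshold


def clean_lines(lines):
    """Keep only complete, well-encoded, non-repeated Spanish sentences."""
    seen = set()
    kept = []
    for raw in lines:
        line = unicodedata.normalize("NFC", raw).strip()
        if not line or PAGE_NUMBER.match(line):
            continue  # empty line or page number
        if "\ufffd" in line or MOJIBAKE.search(line):
            continue  # encoding damage
        if not line.endswith(SENTENCE_END):
            continue  # unfinished sentence
        if not looks_spanish(line):
            continue  # sentence in another language
        key = " ".join(line.casefold().split())
        if key in seen:
            continue  # repeated sentence
        seen.add(key)
        kept.append(line)
    return kept


SAMPLE = [
    "12",
    "La biblioteca conserva millones de documentos.",
    "El sistema fue entrenado con",
    "La biblioteca conserva millones de documentos.",
    "The library keeps millions of documents.",
    "El corpus est\u00c3\u00a1 en espa\u00c3\u00b1ol.",  # mojibake for "está", "español"
    "El modelo se entrenó en un superordenador.",
]

for sentence in clean_lines(SAMPLE):
    print(sentence)
```

Of the seven sample lines, only the two clean Spanish sentences survive. Each filter corresponds to one item in the list above, which gives a sense of why a raw archive of 59 terabytes can shrink so much once it has been cleaned.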
As of 2021, MarIA is the largest and cleanest Spanish-language system in existence. It uses a technology called “transformers”, which has already been tested in English with good results. Thanks to transformers, the artificial intelligence “guesses” the context of each term. In the not too distant future, there are plans to carry out the same project for other languages and varieties, such as Catalan, Galician, Portuguese, Basque, and Latin American Spanish.
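The idea of “guessing” context can be illustrated with the core operation of a transformer, known as self-attention. The Python sketch below is a heavily simplified toy and not MarIA's code: the three-dimensional word vectors are invented by hand, and the learned projection matrices and multiple attention heads of a real transformer are left out. It only shows how each word's representation becomes a weighted mix of the other words in the sentence.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    Returns the context-aware vectors and the attention weights per token.
    """
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        scores = [dot(query, key) / math.sqrt(dim) for key in vectors]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(w * vec[j] for w, vec in zip(row, vectors)) for j in range(dim)]
        )
    return outputs, weights


# Hypothetical, hand-made vectors for the phrase "el banco del río".
TOKENS = ["el", "banco", "del", "río"]
VECTORS = [
    [0.1, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.1, 0.0, 0.0],
    [0.0, 0.0, 2.5],
]

contextual, attention = self_attention(VECTORS)
for token, row in zip(TOKENS, attention):
    print(token, [round(weight, 2) for weight in row])
```

With these made-up vectors, “banco” gives more weight to “río” than to the articles around it, which is the kind of signal a trained model can use to decide that “banco” here means a riverbank and not a bank.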