Swedish AI Models Preserve 500 Years of History

Athina Kontos reports on Sweden's pioneering use of AI to analyze text

The National Library of Sweden is harnessing AI technology developed by NVIDIA to preserve almost half a millennium of literature in digital form.

The library, renowned for archiving ancient and modern Swedish literature, is now converting millions of documents into accessible digital assets. The project will benefit researchers in the humanities, linguistics, history and media studies, and will also play a principal role in preserving and showcasing medieval manuscripts.

Swedish law requires that a copy of everything officially published in Swedish be submitted to the National Library of Sweden (Kungliga Biblioteket) for public record. This includes state documentation, journals, books, plays, internet content, menus, all TV/film/radio media, and even video games. This enormous body of data, 26 petabytes in total, has given the library's NVIDIA DGX systems everything needed for a comprehensive Swedish-language training program for AI models.

Researchers are currently developing more than 24 open-source transformer models to enable research at the library building in Humlegården, Stockholm, and at other academic institutions around the country.

In 2019, the Kungliga Biblioteket (KB) established a department called KBLab. Researchers began by experimenting on just 5 GB of Swedish-language text, drawing inspiration from early language processing models created by Google. Soon after, the lab began testing AI training methods on an international data set of Dutch, German, and Norwegian text. That work continues, with the aim of building larger models for international language research and content translation.

As results grew more positive, researchers at KBLab began to focus more on their own body of Swedish-language data and on upgrading their systems.

The current models, trained on the DGX systems, help researchers create specialized data sets to understand the specific context and nature of every piece of Swedish-language content. From postcards to blog posts, videos, and social media, this technology will also enable language analysts to review how written and spoken Swedish has evolved over time, what societal influences have shaped it, and how it differs from other European languages.

In addition to the transformer models, KBLab is working on an AI sound-transcription tool to create a written record of existing digital media.

In partnership with the University of Gothenburg, KBLab has also announced an upcoming project to support the Swedish Academy's work on modernizing the data-driven techniques used to create Swedish-language dictionaries.
