OpenAI’s newest text generation model, GPT-4, was released yesterday with predictable fanfare on social media. But developers can’t yet build any products or services on it (sorry Tweet generators!) because the API is still waitlisted.
That means only a lucky few have had the chance to take OpenAI’s latest large language model (LLM) for a spin so far. One of those is Icelandic AI startup Miðeind ehf, which was one of only six projects selected for GPT-4 beta testing.
This team of 12 people working on Icelandic language preservation came to be one of the anointed early testers of Silicon Valley’s hottest product after a May 2022 trip to the Bay Area. Miðeind’s CEO joined an Icelandic government delegation to explore how tech could be used to help safeguard the country’s language, which dates back to the time of the Vikings.
The trip included a meeting with OpenAI’s Sam Altman about low-resource languages like Icelandic. These languages pose a challenge for globalising LLMs, as there is far less collected data to train models on.
The Miðeind team gave Sifted their view on how GPT-4 improves on its predecessor, why AI is being used to preserve the Icelandic language, and a very interesting new term GPT-4 has come up with for cats.
Miðeind’s team were tasked with seeing if they could improve GPT-4’s foreign language performance by feeding Icelandic data into the model during reinforcement learning (the phase after the initial training).
Miðeind machine learning team member Pétur Orri Ragnarsson says that the results are a definite improvement on GPT-3.5, but the model is still not perfect when it comes to working in Icelandic: “The text that it generates in Icelandic tends to be understandable (don’t get me wrong, it’s great) but there are still some grammatical errors.”
Ragnarsson says he can see huge improvements on GPT-3.5 when it comes to more general reasoning as well.
“The most mind-blowing thing is that you can ask it to do something and explain why it gave you this result,” he says. “GPT-3.5 could do it, but GPT-4 is better — it feels like the explanations are more plausible or more thought-through.
“A common thing people will tell you to try out is to ask it [the model] to do something and explain every step along the way — it does that super well.”
“Explainability” is one of the big challenges that people developing generative AI have been trying to solve, as the way LLMs function means that the output is generated in a “black box”. Even the people who built GPT-4 don’t know exactly how it arrives at its answers, which has made it hard to get these models to show their workings.
If generative AI is going to be put to use widely across industries like medicine and the legal sector, people working in those fields will need to be able to trust the outputs from the models.
Higher order thinking
Another feature of GPT-4 that Ragnarsson has been impressed by is its ability to generate responses that seem more perceptive than those of its predecessors. He gives the example of using it to do sentiment analysis on a piece of text, scoring it from neutral to positive on a scale of one to five.
“I inputted a text, which I think is a pretty neutral text — about a customer asking customer service for something,” says Ragnarsson, who was then surprised that GPT-4 told him the text was “slightly positive”.
“I asked, ‘Please explain.’ The answer that came back was very surprising, it said, ‘While the text itself is neutral, the action that the person is considering doing would improve their life, so on the whole this text is slightly positive’.”
He believes this demonstrates that GPT-4 has learnt to see beyond the “surface meaning” of the text.
Miðeind’s COO Linda Heimisdottir says that these capabilities are particularly impressive, given that the model wasn’t — as far as she knows — specifically trained for sentiment analysis.
“It’s basically mind-blowing to see a model like this do things that researchers have been working on for years and years and it’s not specifically trained on this,” she says. “It’s just really exciting to see what will come out of it and what people come up with. It feels like the sky’s the limit.”
One example of how GPT-4 struggles with Icelandic comes from the language’s use of compound words — which combine different concepts into one word.
Heimisdottir says she asked GPT-4 to tell her a story about a cat and that it produced an Icelandic text with the term “kattafræðilega”, a compound word that the model had invented which roughly translates as “catologically”.
“The first part just means ‘cat’ but the second part, ‘fræðilega’, means something like ‘related to theory,’” she explains. “The model described the cat as being ‘kattafræðilega duglegur’. Duglegur is a legit Icelandic word which can mean something like diligent or hard-working.
“When I asked the model to explain what it meant it said: To be ‘kattafræðilega duglegur’ means that the cat is particularly diligent at what it does as a cat. In other words, it is skilled at scratching, investigating, chasing after insects, finding food and at being active and interested in its surroundings. It is simply good at being a cat.”
Miðeind believes that for LLMs to achieve really high performance in lesser-used languages, the models will need to include good multilingual data sets in their initial training. “We’re hoping that we can get into the pre-training as the next step,” Heimisdottir says.
Research like this will be critical in ensuring that the next generation of AI doesn’t just further concentrate the advances from innovation in the English-speaking world, as Silicon Valley’s big tech companies are already dominating the field of LLMs. The fact that OpenAI chose Miðeind as an early partner for GPT-4 does at least show the company has a global vision for generative AI, even if it’s a commercially motivated one.
Tim Smith is senior reporter at Sifted. He tweets from @timmpsmith