Large language models (LLMs) work so well because they compress human knowledge. They are trained on massive datasets and convert the words they scan into tokens.
Then, by assigning weights to the connections between these tokens, they build vast neural networks that identify the most likely relationships among them. Using this system of organizing information, they generate responses to prompts, building them word by word into sentences, paragraphs and even long documents simply by predicting the next most appropriate word.
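To make that prediction step concrete, here is a minimal sketch of how a next word can be chosen. It uses a made-up toy vocabulary and invented scores rather than a real model, but the mechanism is the same: convert scores into probabilities, then sample the most likely continuation.

```python
import numpy as np

# Toy illustration of next-word prediction. A real LLM assigns a score (logit)
# to every token in its vocabulary and then picks the next token from the
# probability distribution those scores define.
vocab = ["the", "cat", "sat", "on", "mat", "."]
logits = np.array([0.2, 1.5, 3.1, 0.7, 2.4, 0.1])  # made-up scores, not from a real model

def softmax(scores):
    # Turn raw scores into probabilities that sum to 1.
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

probs = softmax(logits)
next_token = np.random.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))))
print("sampled next token:", next_token)
```

In practice the vocabulary runs to tens of thousands of tokens and the scores come from billions of learned weights, but the generation loop is exactly this: score, normalize, pick, repeat.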
We used to think there had to be a limit to how much LLMs could improve; surely there was a point beyond which making a neural network larger would yield marginal benefits at best. What we discovered instead was a power-law relationship between the number of parameters in a neural network and its performance: the larger the model, the better it performs across a wide range of tasks, often to the point of surpassing smaller, specialized models even in domains it was not specifically trained for.
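For readers who want that relationship in symbols, the scaling-law literature (for example, Kaplan et al., 2020) models test loss as a power law in the parameter count, roughly of the form below, where N_c and the exponent are constants fitted to the data (the fitted exponent was reported at roughly 0.076).

```latex
% Power-law scaling of loss with model size (Kaplan et al., 2020 form):
% L is the test (cross-entropy) loss, N the number of non-embedding parameters,
% N_c and \alpha_N are empirically fitted constants.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}
```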
This is what is referred to as the scaling law, thanks to which artificial intelligence (AI) systems have been able to generate extraordinary outputs that, in many instances, far exceed the capacity of human researchers. But no matter how good AI is, it can never be perfect. It is, by definition, a probabilistic, non-deterministic system.
As a result, its responses are not conclusive; they are merely the most statistically likely answers. Moreover, no matter how much effort we put into reducing AI ‘hallucinations,’ we will never be able to eliminate them entirely. And I don’t think we should even try.
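A small sketch illustrates why. The candidate answers and scores below are invented for illustration, but they show the trade-off: lowering the sampling ‘temperature’ makes outputs more repeatable, yet plausible-but-wrong options always retain some probability.

```python
import numpy as np

# Same prompt, same model, same probabilities: sampling can still give a
# different answer on every run. Lowering the temperature sharpens the
# distribution and makes outputs more repeatable, but the wrong options
# keep a nonzero probability, so errors can be reduced, never eliminated.
# All numbers below are made up for illustration.
candidates = ["correct answer", "plausible but wrong", "off-topic"]
logits = np.array([2.0, 1.2, -1.0])

def sample(temperature):
    scaled = logits / temperature          # temperature < 1 sharpens the distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return np.random.choice(candidates, p=probs)

for t in (1.0, 0.5):
    print(f"temperature={t}:", [sample(t) for _ in range(5)])
```

Pushing the temperature toward zero makes the output effectively deterministic, but that only hides the underlying probabilities; it does not turn the most likely answer into a guaranteed truth.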