Mint. But there are some basic differences “in our approach to building these models”, he insisted. First, unlike most other startups and companies that are building ‘local’ or ‘Indic’ LLMs for India by fine-tuning global LLMs, “we have built a foundational, and not a fine-tuned model,” he said.
General-purpose foundational models such as Google's BERT and Gemini, OpenAI's generative pre-trained transformer (GPT) variants, and Meta's Llama series have been pre-trained on humongous amounts of data from the internet, books, media articles, and other sources. But most of this training data is in English. Most companies in India are building their Indic LLMs atop these foundational models (hence they're called 'wrappers') by fine-tuning these general-purpose LLMs on a smaller, task-specific dataset, such as regional languages like Hindi, Marathi, Gujarati, Tamil, Telugu and Malayalam, and their dialects, which allows the models to learn the nuances of those languages and improves their performance.
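To make the 'wrapper' approach concrete, here is a minimal sketch of what fine-tuning a pretrained model on a regional-language corpus typically looks like, assuming a Hugging Face-style workflow; the model name and data file are illustrative placeholders, not any specific company's setup.

```python
# Sketch of the 'wrapper' approach: take a general-purpose pretrained model and
# continue training it on a smaller, task-specific Indic-language corpus.
# BASE_MODEL and CORPUS are placeholders for illustration only.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

BASE_MODEL = "meta-llama/Llama-2-7b-hf"   # placeholder: any pretrained causal LM
CORPUS = "hindi_corpus.jsonl"             # placeholder: regional-language text

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Tokenize the regional-language text the base model saw little of in pre-training.
dataset = load_dataset("json", data_files=CORPUS, split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="indic-wrapper", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the pretrained weights are nudged toward the new language data
```

The point of the sketch is that the base model's weights are reused and only adjusted with the new data, which is why such models are described as wrappers around an existing foundational LLM rather than foundational models in their own right.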
Sutra, instead, uses two different transformer architectures. Developed at Google, the transformer is the neural network design that lets a model predict the next word in a sequence of text after being trained on large, complex datasets. Because transformers process all the words in a sequence together while modelling how each word relates to every other (a mechanism known as attention), they are very effective for tasks like translating languages.
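For readers who want a feel for what "attention" and next-word prediction mean in practice, the following toy example shows the core computation; the tiny vocabulary and dimensions are invented for illustration, and the untrained weights mean the prediction is essentially random.

```python
# Toy illustration of a transformer's core step: every word in the sequence is
# compared with every other word (attention), and the mixed representation of
# the last position is used to score candidate next words.
import torch
import torch.nn.functional as F

vocab = ["<pad>", "India", "speaks", "many", "languages"]   # made-up mini vocabulary
d_model = 8
torch.manual_seed(0)

embed = torch.nn.Embedding(len(vocab), d_model)             # word -> vector
W_q, W_k, W_v = (torch.nn.Linear(d_model, d_model) for _ in range(3))
to_vocab = torch.nn.Linear(d_model, len(vocab))             # vector -> word scores

tokens = torch.tensor([[1, 2, 3]])                          # "India speaks many"
x = embed(tokens)

# Self-attention: all positions are processed together, and each word scores
# its relationship with every other word in the sequence.
q, k, v = W_q(x), W_k(x), W_v(x)
scores = q @ k.transpose(-2, -1) / d_model ** 0.5           # pairwise word-word affinities
weights = F.softmax(scores, dim=-1)                         # how much each word attends to the others
context = weights @ v                                       # mix information across the sequence

# Predict the next word from the representation of the last position.
logits = to_vocab(context[:, -1])
print(vocab[int(logits.argmax())])                          # untrained, so the guess is arbitrary
```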
The multilingual LLM Sutra, according to Mistry, combines an LLM architecture with a neural machine translation (NMT) one. The reason: while LLMs may struggle to translate specific language pairs because of a lack of specialized training data, NMT systems are typically better equipped to handle idiomatic expressions and colloquial language. Second, while “GPT-4 is great in
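A highly simplified sketch of that general idea, pairing a generator with a translation component so each handles what it is best at, might look like the following; the routing rule and both backends are hypothetical stand-ins and not a description of Sutra's actual architecture.

```python
# Hypothetical sketch of combining an LLM with an NMT component.
# StubLLM, StubNMT and the routing logic are placeholders for illustration only.
class StubLLM:
    """Placeholder for a general-purpose generator (strongest in English)."""
    def generate(self, prompt: str) -> str:
        return f"[draft answer to: {prompt}]"

class StubNMT:
    """Placeholder for a translation model trained on specific language pairs."""
    def translate(self, text: str, source: str, target: str) -> str:
        return f"[{text} rendered from {source} into {target}]"

class HybridModel:
    def __init__(self, llm, nmt):
        self.llm, self.nmt = llm, nmt

    def answer(self, prompt: str, target_lang: str = "en") -> str:
        # Generate in the LLM's strongest language first...
        draft = self.llm.generate(prompt)
        # ...then hand idiom and colloquial phrasing to the NMT component,
        # which is usually better supplied with parallel data for a given pair.
        if target_lang != "en":
            return self.nmt.translate(draft, source="en", target=target_lang)
        return draft

print(HybridModel(StubLLM(), StubNMT()).answer("What crops suit the monsoon?", "hi"))
```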