
Small Language Models

SLMs have gained traction because of their efficiency, security and ability to accurately target specific domains. In this article we examine SLMs in relation to LLMs, the better-known AI technology.

by ANUJ GUJAR

Large Language Models have been in the news a lot, with usage exploding in the last two years. This AI technology has many applications and is being used to answer questions, generate stories, translate text, summarize documents, and recognize speech in human-like ways. The best-known LLM is OpenAI's ChatGPT, but there are many other popular ones, such as Google's Gemini, Meta's Llama, Anthropic's Claude and Mistral AI's Mistral Large.

While LLMs have proven to be extremely useful and valuable, there are pitfalls worth understanding. One technology that addresses these particular concerns is the Small Language Model, or SLM.

Language Models

Let's first understand language models. A language model is a type of machine learning model that is trained on large amounts of text written by people. By analyzing how people normally communicate, it builds statistics that predict how likely words are to appear together. The key point is that engineers have not tried to program language rules or grammar into the model. Instead, the software draws on how people actually write (captured through statistics, algebra and probability) to generate human-like text in response to a question or as part of the normal flow of conversation.
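The statistical idea can be sketched with a toy bigram model, which simply counts which word follows which. This is a deliberate simplification — real LLMs use neural networks — but the predict-the-next-word principle is the same. The corpus below is made up for illustration:

```python
from collections import Counter, defaultdict

# Toy stand-in for "large amounts of text written by people".
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows another (bigram statistics).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word and its probability."""
    counts = following[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

print(predict_next("sat"))  # → ('on', 1.0): "sat" is always followed by "on"
```

No grammar rule was programmed anywhere; the prediction falls out of the counts alone, which is the point the paragraph above makes.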

LLMs are trained with lots of data. They also typically have billions or trillions of internal tweakable knobs (i.e. parameters) that influence how input is processed and what output is generated. As an example, GPT-3 has about 175 billion parameters and GPT-4 is estimated to have about 1.8 trillion. The next version of GPT will probably have many times that. SLMs typically have fewer than 100 million parameters, and more likely fewer than 30 million. There isn't an official cutoff, but that's the general magnitude, and my guess is that it is a slightly moving target.
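To see where billions of parameters come from, here is a back-of-the-envelope count for a simplified transformer-style stack. The layer sizes are invented for illustration, and biases, normalization weights and other details are ignored:

```python
# Rough parameter count for a simplified transformer-like stack,
# to show where billions of "tweakable knobs" come from.
vocab, d_model, layers = 50_000, 4_096, 48   # made-up sizes

embedding = vocab * d_model                  # token embedding table
attention = 4 * d_model * d_model            # Q, K, V and output projections
feed_forward = 2 * d_model * (4 * d_model)   # up- and down-projection
per_layer = attention + feed_forward

total = embedding + layers * per_layer
print(f"{total:,}")  # → 9,868,476,416 — nearly 10 billion already
```

Even this stripped-down accounting lands near 10 billion parameters, so the headline figures for GPT-scale models are not surprising.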

Isn’t more better?

Sure, in some ways. However, you can imagine that with so much data you need a lot of computational horsepower to train, update and run the model. That leads to very costly infrastructure. Companies with fewer resources are forced to rely on APIs and services provided by one of the large players, paying per use. Enter the SLM. These targeted models, built on smaller datasets, are easier to tune, deploy, train and operate. Beyond infrastructure, there are four areas of concern where this matters:

Bias and Hallucinations:

AI bias refers to results from AI systems that are skewed to reflect human biases in society. For example, if the training data for speech recognition over-represents native English speakers, the system may make errors when recognizing the speech of non-native English speakers because of their accents or phrasing. In addition to training data bias, there can be algorithmic or cognitive bias as well.

AI hallucinations are incorrect results that nonetheless sound plausible. They may be due to flawed training data or a failure to correctly capture real-world knowledge. Worse, they could be the result of bad actors purposely introducing misleading training data.

Since LLMs are generally trained on data scraped from publicly available internet sources, they are susceptible to these flaws and require measures to mitigate them.

The training data used in SLMs generally comes from more controlled sources, making them inherently more resistant to these issues.

Generalized answers

LLMs can service a wide array of topics because of the broad training data they are built on; however, they have a harder time accurately handling the use cases of a particular domain. SLMs can be tailored to a narrow, specific use case and fine-tuned for an organization's particular domain.
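The idea of domain tailoring can be sketched with the same toy bigram counting used earlier: fine-tuning amounts to continuing training on domain text, after which the model's predictions shift toward domain vocabulary. Both corpora below are made up, and the financial "domain" is purely illustrative:

```python
from collections import Counter, defaultdict

def train(counts, text):
    """Accumulate bigram statistics from a corpus into `counts`."""
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

model = defaultdict(Counter)

# Broad, general-purpose training text: "bank" is a riverbank.
train(model, "the bank of the river was muddy and the bank was steep")

# Fine-tune on narrow financial-domain text: after this extra pass,
# the same word predicts a domain-specific continuation instead.
train(model, "the bank approved the loan and the bank approved the "
             "mortgage and the bank approved the overdraft")

print(model["bank"].most_common(1))  # → [('approved', 3)]
```

The general model knew "bank" mostly from river sentences; a modest amount of targeted data was enough to dominate its predictions, which is what makes small, focused models attractive for a single domain.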

Security and privacy

The large amount of data used for LLMs can include sensitive or private information. Since, for resource reasons, LLMs are generally not housed internally, there is potential for data leakage. Data leakage comes in different forms. One familiar case is sensitive data being inadvertently or maliciously shared externally. A second, less obvious form is private data being shared by the AI system as part of a response to someone who should not have access to it. Additionally, cloud-based services rely on public APIs to transfer data, which means that data moving to and from the cloud service (often referred to as data in motion) can be vulnerable to attack. Masking, obfuscation and cryptography can mitigate this, but housing SLMs in-house offers a significant additional layer of oversight, security and privacy.
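As a minimal sketch of the masking idea, sensitive substrings can be replaced before text ever leaves the organization. The two regex patterns below are illustrative only; real PII detection relies on dedicated tooling, not a couple of regular expressions:

```python
import re

# Illustrative patterns only — a real deployment would use proper
# PII-detection tooling with far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask(text):
    """Replace sensitive substrings before data leaves the organization."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask("Contact jane.doe@example.com, SSN 123-45-6789."))
# → Contact [EMAIL], SSN [SSN].
```

Masking like this protects data in motion to a cloud API, but it is lossy by design, which is one more reason an in-house SLM that never needs the round trip is attractive.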

Performance and adaptability

Smaller size means lower latency and more responsive interactions. Small also means the model is potentially quicker and cheaper to update, evolve and adapt. This last point is crucial for dynamic domains that require regular fine-tuning.

Decisions, Decisions

With those benefits in mind, SLMs are not without their own challenges. From an infrastructure point of view, if an organization needs to support multiple domains, it may be too complicated to support and maintain multiple SLMs. Also, once an organization decides to take an SLM in house, it will need the AI/ML expertise to select, evolve and upgrade it. From a capability standpoint, general language processing abilities will be somewhat reduced. The decision to go big or small depends on your experience, goals and needs, but both SLMs and LLMs have their place in the applications that language models excel at. I believe that as adoption grows and expertise in language models proliferates, more organizations will adopt SLMs.