Understanding Foundation Models - Part 3 (Model Size)
The Power and Limits of Larger AI Models: What You Need to Know
One of the major reasons AI has made so much progress recently is the increase in model size. The bigger the model, the more it can learn, and the more it can learn, the better the model is (mostly). The size of a model is defined by the number of parameters it has.
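As a minimal sketch of what "number of parameters" means in practice, the toy PyTorch snippet below counts the trainable parameters of a small, purely illustrative network (the layer sizes are arbitrary assumptions, not a real foundation model):

```python
import torch.nn as nn

# A tiny illustrative network -- the layer sizes are arbitrary, not a real foundation model.
toy_model = nn.Sequential(
    nn.Embedding(num_embeddings=50_000, embedding_dim=512),  # token embedding table
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

# "Model size" is simply the total count of trainable parameters.
num_params = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
print(f"Parameters: {num_params:,}")  # roughly 27.7 million for this toy network
```

Foundation models apply the same idea at a vastly larger scale, with parameter counts in the billions rather than the millions.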
I have discussed what parameters actually mean in the blog below:
A large model usually has more parameters. More parameters mean the model can learn more complex relationships and potentially store more information. Model size is a critical factor influencing several aspects of a foundation model's capabilities:
Performance and Capability: Generally, larger models tend to perform better on a wide range of tasks, especially in natural language processing. The more parameters there are, the greater the model's capacity to learn. This allows the model to:
Learn more intricate patterns and nuances in data.
Store a vast amount of knowledge derived from its training data.
Generalize better to unseen data.
Exhibit emergent capabilities, such as advanced reasoning or few-shot learning, that are not present in smaller models.
Computational Resources: Training and running larger models demand significantly more computational resources.
Training Time: Training a large foundation model can take weeks or even months.
Memory: Larger models require more memory to store their parameters and intermediate computations (a rough back-of-the-envelope estimate follows this list).
Data Requirements: Larger models often require correspondingly larger and more diverse datasets for effective training. To fully utilize their increased capacity, they need vast amounts of data to learn from.
Democratization: The immense resources needed (as discussed above) for large models can make them inaccessible to smaller organizations, researchers, and individuals, potentially centralizing AI development among a few well-resourced entities.
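To make the memory point above concrete, here is a rough back-of-the-envelope sketch. The numbers are illustrative assumptions: they cover only the weights stored in 16-bit precision (2 bytes per parameter) and ignore activations, optimizer state, and other training-time overheads, which add substantially more.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, e.g. 2 bytes per parameter for fp16/bf16."""
    return num_params * bytes_per_param / 1024**3

# Illustrative parameter counts, not tied to any specific released model.
for n in (7e9, 70e9, 500e9):
    print(f"{n / 1e9:>5.0f}B parameters -> ~{weight_memory_gb(n):,.0f} GB of weights in fp16")
```

Even before training-time overheads, a model in the tens of billions of parameters no longer fits on a single consumer GPU, which is one reason larger models demand so much more hardware.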
Quality Factors in the Model
A large model does not necessarily mean a good model. The quality of a model is also determined by the amount and quality of the data it is trained on. A large model trained on a small amount of data will underperform compared to a small model trained on a large amount of data. As an oversimplified example, imagine a large model trained on a dataset containing just one sentence: “I’m a large model.” Such a model will perform poorly for most use cases. Hence, the dataset a model is trained on plays a significant role in its performance.
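To illustrate how the amount of training data is expected to scale with model size, the sketch below applies the roughly 20-tokens-per-parameter rule of thumb from published compute-optimal ("Chinchilla") scaling results; the exact ratio is an assumption, and real training runs deviate from it.

```python
def compute_optimal_tokens(num_params: float, tokens_per_param: float = 20.0) -> float:
    """Rule-of-thumb estimate of the training tokens needed to use a model's capacity well.

    The ~20 tokens-per-parameter ratio is an assumption borrowed from
    compute-optimal scaling heuristics; it is not a hard requirement.
    """
    return num_params * tokens_per_param

for n in (1e9, 10e9, 100e9):
    print(f"{n / 1e9:>4.0f}B parameters -> ~{compute_optimal_tokens(n) / 1e9:,.0f} billion training tokens")
```

Under this heuristic, a hundred-billion-parameter model would want on the order of two trillion tokens of training data, which helps explain why openly available text is becoming a bottleneck.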
Another factor affecting output quality is the diversity of the training data. For example, different sections of society may hold different norms and opinions on the same topic, and the same holds true for political views. If the model is not trained on a diverse dataset, its outputs can be biased in content, tone, and sentiment. A diverse dataset also helps the model handle multiple tasks rather than specializing in just one: a model trained only on formal text, for instance, would struggle to generate, say, poems.
The reliance on varied datasets both improves models and depletes the pool of data left to train them on. Openly available data for training these large models is slowly becoming scarce, which is prompting newer companies to take on the specialized task of creating human-curated datasets.
The Trade-off
While larger models offer enhanced performance, there is a significant trade-off between model size and practical considerations. The cost of incrementally improving a model's output quality is very high: these models already consume so much data that further improvement requires far more resources than are currently available. The growing scarcity of new training data is just one limiting factor; the resources required, in terms of both money and electricity, also significantly limit how far model improvement can scale.