In AI, a parameter is a property learned from the data used to train the model. It is an adjustable element that determines the behaviour and functionality of the AI model.
Parameters play a crucial role in the operation of AI models:
- They influence the way the model interprets data and generates responses.
- They allow the model to learn from the training data and generalise this knowledge to process new inputs.
- They are adjusted during training to optimise the model's performance on specific tasks.
In the case of language models, the parameters are often associated with the weights of the connections between the neurons in the model's neural network. The more parameters the model has, the more details and nuances it can learn from the data, enabling it to produce more complex and natural responses. Parameters are essential because they form the basis of the model's ability to 'understand' and generate language that sounds natural to human users.
Concretely, parameters are the numerical values that define how the model transforms inputs (data) into outputs (predictions).
Example: In a neural network, each connection between neurons has a weight, and each neuron has a bias. These weights and biases are the parameters.
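To make this concrete, here is a minimal sketch in Python with NumPy (the sizes and values are illustrative, not taken from any real model): the weight matrix and the bias vector are the layer's parameters, and they fully determine how inputs become outputs.

```python
import numpy as np

# A single dense layer with 3 inputs and 2 outputs (illustrative sizes).
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 2))  # one weight per input-output connection
bias = np.zeros(2)                 # one bias per output neuron

def forward(x):
    # Transform inputs into outputs using the current parameters.
    return x @ weights + bias

x = np.array([1.0, 0.5, -0.2])
print(forward(x))                  # the layer's prediction
print(weights.size + bias.size)    # parameter count: 3*2 + 2 = 8
```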
Role
- The parameters store the model's knowledge, learned from the training data.
- They are adjusted via optimisation algorithms (e.g. gradient descent) to minimise the error between predictions and actual results.
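As an illustration of that second point, here is a small sketch of gradient descent adjusting two parameters, w and b, to fit a straight line. The data and the learning rate are made up for the example.

```python
import numpy as np

# Fit y = w*x + b with gradient descent; w and b are the trainable parameters.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                      # ground truth the model should recover

w, b = 0.0, 0.0                        # parameters start at arbitrary values
learning_rate = 0.1                    # a hyperparameter, chosen by hand

for step in range(200):
    pred = w * x + b
    error = pred - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Adjust the parameters to reduce the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))        # ≈ 2.0 and 1.0 after training
```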
Types of parameters
Trainable parameters: those the model adjusts during training (e.g. the weights of a neural network).
Hyperparameters: external settings defined before training (e.g. learning rate, number of layers). They are not learned by the model.
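A short sketch of the distinction (all names and sizes here are illustrative): hyperparameters are fixed choices made before training, while the weights and biases are what training actually adjusts.

```python
import numpy as np

hidden_size = 16        # hyperparameter: chosen before training
learning_rate = 0.01    # hyperparameter: not learned by the model

rng = np.random.default_rng(0)
# Trainable parameters: initialised randomly, then adjusted during training.
w1 = rng.normal(size=(4, hidden_size))
b1 = np.zeros(hidden_size)
w2 = rng.normal(size=(hidden_size, 1))
b2 = np.zeros(1)

n_params = sum(p.size for p in (w1, b1, w2, b2))
print(n_params)  # 4*16 + 16 + 16*1 + 1 = 97 trainable parameters
```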
Why are there so many parameters?
- Model capacity:
- The more parameters a model has, the more complex the patterns it can theoretically capture in the data (e.g. GPT-3 with 175B, i.e. 175 billion parameters, vs. BERT with 340M, i.e. 340 million).
- However, too many parameters can lead to overfitting or high computational costs.
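To see where such counts come from, here is a rough back-of-envelope sketch using the common approximation of about 12 × d_model² weights per transformer layer (embeddings and biases ignored), plugged in with GPT-3's published shape:

```python
# Each transformer layer has ≈ 12 * d_model^2 weights
# (≈ 4*d^2 for attention + 8*d^2 for the feed-forward block).
d_model = 12288   # GPT-3's hidden size
n_layers = 96     # GPT-3's number of transformer layers

approx_params = 12 * d_model**2 * n_layers
print(f"{approx_params / 1e9:.0f}B parameters")  # ≈ 174B, close to the 175B total
```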
- Cost and resources:
- Models with billions or even trillions of parameters (e.g. GPT-4) require supercomputers and massive amounts of data.
- Example: training GPT-3 is estimated to have cost several million dollars in compute.
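One way to see why the bill runs into the millions is the widely used 6 × N × D rule of thumb for training FLOPs (6 × parameters × training tokens). The GPU throughput and hourly price below are assumptions for illustration, not measured figures.

```python
n_params = 175e9          # GPT-3 parameter count
n_tokens = 300e9          # tokens GPT-3 was reportedly trained on

flops = 6 * n_params * n_tokens            # ≈ 3.15e23 FLOPs
gpu_flops_per_s = 30e12                    # assumed sustained throughput per GPU
gpu_hours = flops / gpu_flops_per_s / 3600
cost = gpu_hours * 2.0                     # assumed $2 per GPU-hour

print(f"{gpu_hours:,.0f} GPU-hours, ~${cost / 1e6:.1f}M")  # several million dollars
```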
Examples
- GPT-4 (reportedly ~1.8T parameters): each parameter influences text generation depending on the context.
- Stable Diffusion (890M parameters): parameters related to image generation via diffusion layers.
- BERT (340M parameters): parameters used to understand the relationships between words.
Key points
- Parameters ≠ performance: a model with fewer parameters but better training (e.g. Chinchilla) can outperform a larger model, as the sketch below illustrates.
- Balance: finding the right compromise between model size, available data and compute resources is crucial in AI.
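The Chinchilla result mentioned above comes with a handy rule of thumb (Hoffmann et al., 2022): for a fixed compute budget, train on roughly 20 tokens per parameter. The sketch below applies it to show why a smaller, better-trained model can win.

```python
# Chinchilla rule of thumb: compute-optimal training uses ~20 tokens per parameter.
def chinchilla_optimal_tokens(n_params):
    return 20 * n_params

for n in (70e9, 175e9):
    print(f"{n / 1e9:.0f}B params -> ~{chinchilla_optimal_tokens(n) / 1e12:.1f}T tokens")

# GPT-3 (175B) was trained on ~300B tokens, far below the ~3.5T this heuristic
# suggests, which is why Chinchilla (70B parameters, ~1.4T tokens) could
# outperform it despite being much smaller.
```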