An LLM (Large Language Model) is a type of artificial intelligence model trained on massive volumes of text data to understand, generate and predict text autonomously.
Based on the transformer architecture, these models exploit self-attention mechanisms to capture the complex relationships between words in a text. This approach enables them to perform a multitude of tasks: machine translation, question answering, content creation, code generation and even information synthesis.
Comparison of the main LLMs
| Model name | Company | Type | Size (parameters) |
|---|---|---|---|
| o3 | OpenAI | Proprietary | 5,000 billion |
| o3-mini | OpenAI | Proprietary | 20 billion |
| Gemini 2.0 Pro | Google | Proprietary | 1,000 billion |
| Gemini 2.0 Flash | Google | Proprietary | 30 billion |
| DeepSeek R1 | DeepSeek AI | Open Source | 685 billion |
| Llama 3.3 | Meta | Open Source | 70 billion |
| Pixtral Large | Mistral AI | Open Source | 124 billion |
| Claude 3.5 Sonnet | Anthropic | Proprietary | 175 billion |
How LLMs work
Massive data training
LLMs absorb billions of texts from books, articles, websites and conversations in order to learn linguistic patterns.
Example: GPT-3 was trained on around 45 TB of text data.
Transformer architecture
They are based on transformer neural networks, which stack layers of self-attention.
Illustration: In the sentence "He walks his dog", the model learns to link "He" to "dog" depending on the context.
What's more, their design enables efficient parallelization for processing very long sequences of text.
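The self-attention step described above can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product attention only; the matrices, dimensions and random values are made up for the example and are not taken from any real model:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: each row becomes a probability distribution.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project tokens into query/key/value spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise similarity between all positions
    weights = softmax(scores, axis=-1)        # how much each token attends to every other
    return weights @ V                        # context-aware representation of each token

rng = np.random.default_rng(0)
seq_len, d = 4, 8                             # toy sequence: 4 tokens, 8-dim embeddings
X = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per token
```

Because every position attends to every other position in a single matrix product, all tokens are processed at once, which is what makes the parallelization mentioned above possible.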
Self-supervised learning
LLMs use self-supervised learning techniques such as predicting hidden (masked) words or predicting the next word in a text. This process enables them to "learn" without the need for manual data labelling.
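The next-word objective can be illustrated with a toy function that slices raw text into (context, target) training pairs. The "labels" come from the text itself, which is exactly what makes the learning self-supervised; the function name and context size here are illustrative:

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, target) training pairs.
    The targets come from the text itself: no manual annotation is needed."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        pairs.append((words[i - context_size:i], words[i]))
    return pairs

corpus = "the model learns to predict the next word from context"
for context, target in next_word_pairs(corpus):
    print(context, "->", target)
# first pair: ['the', 'model', 'learns'] -> to
```

A real LLM does the same thing at the scale of billions of documents, with tokens instead of whole words and a neural network instead of a lookup.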
🔎 Key features
- Versatility: thanks to fine-tuning or prompt engineering, the same LLM can be adapted to tasks as varied as answering questions, writing articles or generating code.
- Creative generation: able to produce original texts (poems, scripts, etc.) or synthesise complex information.
- Dynamic context: some LLMs, such as ChatGPT, retain the history of an exchange, making it possible to manage long conversations efficiently.
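The "dynamic context" point above can be sketched as simple history management: each turn is appended to a running message list, and the oldest turns are dropped once a window limit is reached. The class, message format and window size below are hypothetical, not the API of any particular provider:

```python
MAX_CONTEXT_MESSAGES = 20  # hypothetical window limit

class Conversation:
    """Minimal sketch of chat-history management: every turn is appended,
    and the oldest turns are dropped once the window is full."""
    def __init__(self, system_prompt):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        # keep the system prompt plus only the most recent turns
        if len(self.messages) > MAX_CONTEXT_MESSAGES:
            self.messages = [self.messages[0]] + self.messages[-(MAX_CONTEXT_MESSAGES - 1):]

chat = Conversation("You are a helpful assistant.")
chat.add("user", "What is an LLM?")
chat.add("assistant", "A large language model trained on massive text corpora.")
print(len(chat.messages))  # 3
```

In practice, providers bound the context by token count rather than message count, but the principle is the same: the model only "remembers" what is resent with each request.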
👉 Practical applications
- Virtual assistants: automated customer support, calendar management, chatbots...
- Education: personalised tutorials, homework correction...
- Research: analysis of scientific publications and data synthesis...
- Content creation: writing articles, generating advertising scripts...
🚧 Challenges and limitations
- Bias and toxicity: LLMs can reproduce stereotypes or disseminate erroneous information contained in their training data.
- Hallucinations: they can generate incorrect or invented facts (for example, incorrect historical dates).
- Energy cost: training and running these models is very energy-intensive (for example, it is estimated that training GPT-3 consumed around 1,300 MWh).
- Privacy: the risk of data leakage (e-mail addresses, medical information, etc.) is real.
- Ethical and regulatory issues: the need to ensure transparency, traceability of decisions (model cards) and compliance with the GDPR or the European AI Act.
📈 Future developments
- Smaller, more efficient models: development of optimised architectures (e.g. TinyBERT) to reduce the carbon footprint.
- Aligned AI: use of techniques such as reinforcement learning from human feedback (RLHF) to limit harmful responses.
- Ethical personalisation: adapting LLMs to specific needs without reinforcing existing biases.
📊 Key figures and statistics on LLMs
🌍 Worldwide
- Market growth: the market for LLMs and generative AI is growing exponentially. Some studies (e.g. a Goldman Sachs report) suggest that generative AI could increase global GDP by almost 7% over the next ten years.
- Corporate adoption: from 2021 to 2024, the number of companies adopting LLM-based solutions rose significantly, with increases of up to 200% in certain regions.
In France
- Linguistic representation: French remains under-represented in LLM training sets, with less than 5% of the text data drawn from French-language content, which may limit performance in French.
- Industry adoption: according to several surveys, around 25% of large French companies have already tested LLM-based solutions, and nearly 40% plan to invest in these technologies by 2025.
- Investment and research: France, and Europe more generally, are strengthening their position by developing open-source models (such as Mistral 7B) and supporting AI research, in order to reduce dependence on predominantly English-language technologies.