Multimodal AI

Multimodal artificial intelligence (multimodal AI) refers to AI systems that can process, interpret and integrate several types of data, or modalities, such as text, images, audio, video or sensor data, in order to produce more complete and nuanced responses or decisions.

Unlike traditional (unimodal) AI models, which specialise in a single type of data (for example text only, or images only), multimodal AI combines heterogeneous sources to build a richer contextual understanding, an approach often compared to the way humans draw on several senses at once.
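To make "combining heterogeneous sources" more concrete, here is a minimal sketch of one simple way to do it: normalise a feature vector per modality and concatenate the results into a single representation. Everything in it is an assumption made for illustration: the function names (`l2_normalize`, `fuse_by_concatenation`), the modality keys and the toy vectors are hypothetical, and real systems typically obtain such vectors from trained encoders and may use quite different fusion strategies.

```python
from typing import Dict, List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so no modality dominates by magnitude."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_by_concatenation(features: Dict[str, List[float]]) -> List[float]:
    """Concatenate normalised per-modality vectors into one joint vector.

    Modalities are visited in sorted key order so the layout of the fused
    vector is deterministic.
    """
    fused: List[float] = []
    for modality in sorted(features):
        fused.extend(l2_normalize(features[modality]))
    return fused


# Toy feature vectors standing in for the outputs of per-modality encoders
# (hypothetical values, not produced by any real model).
example = {
    "text": [3.0, 4.0],
    "image": [0.0, 2.0],
    "audio": [1.0, 0.0, 0.0],
}

fused = fuse_by_concatenation(example)
print(len(fused))  # length is the sum of the per-modality dimensions
```

A downstream model would then take the fused vector as its input, which is what lets it relate information across modalities instead of treating each one in isolation.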