Generative Artificial Intelligence (GenAI) Glossary
A collection of terms and resources about Generative AI

Generative AI (GenAI) is one of the hottest fields in Artificial Intelligence right now.
Since the introduction of ChatGPT, the pace of innovation in this field has been absolutely astonishing, with new breakthroughs, tools, and applications emerging almost daily. The rapid pace of research in this field has made it challenging for data professionals to keep up, leading to both excitement and apprehension about the future of artificial intelligence.
But, despite what the hype says, GenAI draws from a mixture of traditional AI methods, advancements in machine learning, and progress in Natural Language Processing (NLP). While the concepts may initially seem complex, there’s no mysterious “black magic” at play — just innovative applications of machine learning (ML) and software engineering principles.
As the world gets saturated with AI hype, understanding the foundational concepts of GenAI and how they fit into the broader AI ecosystem is essential for data professionals. How do these pieces connect with the traditional ML knowledge we may already have?
In this post, I aim to give you a concise glossary of key terms and concepts in the Generative AI world. This glossary will serve as a guide to help you navigate the jargon of the field, giving you a clearer understanding of the foundational ideas driving this exciting area of AI — I’ll keep updating this post as time goes by, with new terms and concepts.
Generative AI
Generative Artificial Intelligence is a subset of artificial intelligence focused on creating new content, such as text, images, video, and sound, using AI models.
The word generative originates from the word generate (create), as these models are able to create new material by sampling from probability distributions learned during training.
These new AI models can generate original content based on patterns and information they have already learned. This is significantly different from traditional machine learning models, which focus on tasks like classification or regression and predict specific outcomes based on features.
Generative AI models can also “create” their own features and data, introducing new dimensions in the AI world, particularly when we want to mimic how humans learn and interact with the world. Some of the most famous generative AI applications in the world are text generators such as ChatGPT and Claude, and image generators such as DALL·E and Midjourney.
Neural Networks
A set of models aimed at simulating how the brain works.
Neural Networks trace back to the birth of the Perceptron, one of the first such models, introduced in 1958 with the goal of emulating the structure of synapses and axons of the human brain.
Since then, Neural Network models have evolved into deeply complex architectures that take linear algebra and multivariate calculus to the next level. They have shown extraordinary results in balancing overfitting and underfitting, solving some of the world’s most difficult machine learning problems.
Some of the most famous architectures are Convolutional Neural Networks (for vision), Recurrent Neural Networks (for time series or text), and Feed-Forward networks (for classical datasets). It’s also common for ML applications to use a mixture of different architectures.
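To make this concrete, here is a minimal sketch of a small feed-forward network in PyTorch. The layer sizes and the 10-feature, 3-class setup are purely illustrative assumptions, not a recipe tied to any particular dataset.

```python
import torch
import torch.nn as nn

# A tiny feed-forward (fully connected) network for a hypothetical
# tabular problem with 10 input features and 3 output classes.
model = nn.Sequential(
    nn.Linear(10, 64),   # input layer: 10 features -> 64 hidden units
    nn.ReLU(),
    nn.Linear(64, 32),   # hidden layer
    nn.ReLU(),
    nn.Linear(32, 3),    # output layer: one logit per class
)

x = torch.randn(8, 10)   # a batch of 8 random examples
logits = model(x)        # forward pass
print(logits.shape)      # torch.Size([8, 3])
```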


Multi-Modal
A characteristic of Generative AI algorithms that are able to represent similar concepts in multiple formats (picture, text, audio, video).
This ability to integrate different types of input into the same “reasoning” AI model is one of the most important advances the field has seen in the past decade, transforming the way we work with AI models: we can now use different inputs to represent the same concept. This process mirrors how humans abstract concepts.
Multi-modal AI therefore refers to models designed to process and understand various types of data simultaneously — while also being capable of producing outputs in different formats.
Discriminative vs. Generative Models
We’ve seen that generative models aim to learn the distribution of the data, i.e. the joint distribution p(X, Y). These models have the goal of creating new data or understanding the underlying data structure to mimic it. They also tend to be more general, generalizing to tasks other than the ones they were trained for.
On the other hand, discriminative models aim to solve specific prediction tasks (classification or regression) by inferring the conditional distribution p(Y|X). Examples of these types of models are Decision Trees, Linear Regression, or Random Forests.
Discriminative models are typically more accurate for specific tasks because they focus solely on mapping inputs to outputs. However, they lack the flexibility to generalize beyond their trained scope. In contrast, generative models have the advantage of versatility, as they can handle a wider range of tasks, including some they weren’t explicitly trained for.
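A small scikit-learn sketch of this contrast, under the assumption that Gaussian Naive Bayes stands in for the generative side (it models class-conditional distributions) and logistic regression for the discriminative side (it models the decision boundary directly):

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Synthetic classification data, just for illustration.
X, y = make_classification(n_samples=500, n_features=5, random_state=42)

# Generative: models how each class generates its features (roughly p(X, Y)).
generative = GaussianNB().fit(X, y)

# Discriminative: models p(Y|X), the mapping from features to label.
discriminative = LogisticRegression(max_iter=1000).fit(X, y)

print(generative.score(X, y), discriminative.score(X, y))
```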
Word Embeddings
Word embeddings are representations of written text in vector format that are not directly tied to the characters of the word or phrase.
One of the first successful instances of word embeddings was the Word2Vec paper, which blew away academia by showing that the mathematical vectors learned for words capture meaningful relationships between them. These vectors can be obtained by training a machine learning model that uses the context around the words.
These vectors can be compared with each other, creating interesting connections between similar words. The creation of successful word embeddings solved one of the oldest issues with natural language processing models, as we moved away from assuming that computers could represent human language using the characters of the word.
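As a rough illustration, here is a sketch of training word embeddings with gensim’s Word2Vec on a toy corpus; real embeddings need vastly more text, and the corpus and hyperparameters below are just assumptions for the example.

```python
from gensim.models import Word2Vec

# A tiny toy corpus: each sentence is a list of tokens.
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "ball"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, epochs=100)

print(model.wv["king"][:5])                  # first 5 dimensions of the "king" vector
print(model.wv.similarity("king", "queen"))  # cosine similarity between two words
```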

Latent Space
A latent space is a lower-dimensional space (for example, 2 dimensions) that is able to represent data from a higher-dimensional space (in this example, more than 2 dimensions).
For example, Principal Component Analysis (PCA) and t-SNE (t-Distributed Stochastic Neighbor Embedding) can produce latent spaces of original data by reducing the number of dimensions.
A latent space is any lower-dimensional representation of original data. For example, for images of handwritten digits, a latent space can represent thickness, shape, and other characteristics. This latent space may be able to group together digits that are similar to each other, without relying on the full image.
Successfully creating latent spaces has been important to deal with the high dimensionality of training data, but also for questions of interpretability and transparency of AI models.
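As a minimal sketch, PCA from scikit-learn can compress the 64-pixel handwritten digits dataset into a 2-dimensional latent space; the choice of dataset and of 2 components is only for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()                 # 8x8 digit images, flattened to 64 features
pca = PCA(n_components=2)
latent = pca.fit_transform(digits.data)

print(digits.data.shape)   # (1797, 64): the original high-dimensional space
print(latent.shape)        # (1797, 2): the 2-D latent representation
```

Plotting the two latent dimensions typically shows similar digits landing near each other, even though the model never saw the labels.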

Tokenization
The process of splitting sentences into smaller chunks of text, called tokens.
Tokenization was very important in classical natural language processing, where a lot of preprocessing steps were applied to the tokens: they could be replaced via stemming or lemmatization, and stop words were removed.
Today, tokenization is mostly used in the context of mapping words (or subwords) to integers, particularly as this is the input format expected by transformer-based ML models.
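A hand-rolled sketch of the idea, mapping word tokens to integer IDs; production systems use subword tokenizers (e.g. byte-pair encoding) instead of this naive whitespace split.

```python
sentence = "generative models create new content"

tokens = sentence.split()   # naive word-level tokenization
vocab = {token: idx for idx, token in enumerate(sorted(set(tokens)))}
token_ids = [vocab[token] for token in tokens]

print(tokens)      # ['generative', 'models', 'create', 'new', 'content']
print(token_ids)   # the integer sequence a transformer-based model would consume
```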
Prompt Engineering
A technique used to steer the output and role of a foundation model. Humans typically interact with these types of models via prompting (natural language that is fed to the model), and prompt engineering is the growing area of improving the quality of the answers via better inputs.
Whereas discriminative machine learning models output probabilities or numerical values, interaction with generative AI occurs primarily through text, which is subsequently converted into vector format. Prompt engineering allows users to significantly influence generative AI results, particularly with large language models (LLMs). Some aspects of prompts that can be improved via engineering are listed below (a small sketch follows the list):
Specifying tone or style.
Providing examples.
Using constraints.
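As a quick sketch of the difference, compare a bare prompt with an engineered one; the `ask_llm` call at the end is hypothetical and stands in for whatever LLM API you use.

```python
basic_prompt = "Summarize this contract."

engineered_prompt = (
    "You are a legal assistant. Summarize the contract below in a formal tone.\n"  # tone / style
    "Example output: '1. Parties: ... 2. Obligations: ... 3. Termination: ...'\n"  # an example
    "Constraints: at most 5 bullet points; do not give legal advice.\n\n"          # constraints
    "Contract: <contract text here>"
)

# response = ask_llm(engineered_prompt)   # hypothetical call to an LLM provider
```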
Generative Adversarial Networks (GANs)
GANs are a class of machine learning models designed to generate new and realistic data samples by pitting a model called the generator against a model called the discriminator.
The generator aims to learn and replicate the data distribution of a specific dataset. In contrast, the discriminator’s objective is to determine whether presented data points are from the original dataset (real) or generated by the model (fake). In the process, the generator iteratively gets better at fooling the discriminator, becoming able to produce data that is very similar to the original data.
These generative models are used to create realistic images, videos, or sounds, provided they have access to samples from the real distributions they want to model.
They were an important breakthrough in the AI world, especially because they paved the way for current state-of-the-art models.
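A condensed sketch of the adversarial training loop in PyTorch, assuming a toy 2-dimensional “real” distribution; real GANs use convolutional networks, careful hyperparameters, and far more training.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
discriminator = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

real_data = torch.randn(64, 2) * 0.5 + 2.0   # stand-in for the "real" distribution

for step in range(1000):
    # Train the discriminator: real samples should score 1, generated samples 0.
    fake_data = generator(torch.randn(64, 8)).detach()
    d_loss = loss_fn(discriminator(real_data), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake_data), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Train the generator: try to make the discriminator output 1 for fakes.
    g_loss = loss_fn(discriminator(generator(torch.randn(64, 8))), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```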

Variational Autoencoders (VAEs)
Variational autoencoders (VAEs) were early generative models that introduced the use of probability distributions for mapping data to and from a latent space.
An autoencoder comprises two main components: an encoder, which compresses the input data into a lower-dimensional latent representation, and a decoder, which reconstructs data from this latent space.
The learned latent space can then be sampled to generate new data similar to the training data.
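A minimal PyTorch sketch of the VAE’s moving parts: an encoder that outputs the mean and log-variance of a latent distribution, a sampling step (the reparameterization trick), and a decoder that reconstructs the input. The 784-dimensional input (a flattened 28x28 image) and the 2-dimensional latent space are illustrative assumptions.

```python
import torch
import torch.nn as nn

input_dim, latent_dim = 784, 2   # e.g. flattened 28x28 images, 2-D latent space

encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(), nn.Linear(128, 2 * latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, input_dim))

x = torch.rand(16, input_dim)                  # a batch of fake "images"
mu, log_var = encoder(x).chunk(2, dim=-1)      # parameters of the latent distribution
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # sample z ~ N(mu, sigma^2)
reconstruction = decoder(z)                    # reconstruct the input from z

# New data can be generated by sampling z directly from the prior N(0, I):
new_sample = decoder(torch.randn(1, latent_dim))
```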
Diffusion Models
Diffusion models are a class of generative models designed to generate complex data distributions, such as images, audio, or text, similarly to GANs and VAEs. They differ from those two in that they work by gradually transforming simple noise distributions into structured outputs that resemble the target data. This is typically achieved through a series of incremental steps that progressively refine the noise into a meaningful sample.
Diffusion models have proven able to generate better data than variational autoencoders or GANs, and they are the current state of the art in most tasks related to image and video generation.
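A small sketch of the forward (noising) half of the process: a closed-form way of producing a noisy version of an image at any timestep, following a standard DDPM-style schedule. The reverse, generative half is a neural network trained to undo this noise one step at a time, which is omitted here.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

x0 = torch.rand(1, 3, 64, 64)                     # stand-in for a clean image
t = 500                                           # an intermediate timestep
noise = torch.randn_like(x0)

# Closed-form expression for the noised sample x_t given the clean image x_0.
x_t = torch.sqrt(alphas_cumprod[t]) * x0 + torch.sqrt(1 - alphas_cumprod[t]) * noise
```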

Attention Mechanism
The attention mechanism, introduced in the paper “Attention Is All You Need”, transformed neural network architectures by achieving a remarkable balance between performance and training speed.
This innovation significantly outperformed previous approaches, such as recurrent and feed-forward neural networks, on various evaluation metrics, demonstrating an amazing potential for processing sequential data.
Furthermore, attention mechanisms unified inference (the model’s internal processing) and generation (outputting results), marking a turning point in the NLP field. Using transformers (we’ll see them next) solved some of the largest issues that researchers were facing in both areas.
In simple terms, the attention mechanism is a mathematical approach to identifying which words in a sentence are most important for a given context. Unlike traditional methods that used fixed-size token windows and treated all context words as equally relevant, attention mechanisms mimic human language by assigning varying weights to different words within a sentence.
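In code, the core computation is scaled dot-product attention: softmax(QK^T / sqrt(d)) V. Here is a minimal sketch with random queries, keys, and values standing in for token representations.

```python
import torch
import torch.nn.functional as F

seq_len, d = 5, 16               # 5 tokens, 16-dimensional representations
Q = torch.randn(seq_len, d)      # queries
K = torch.randn(seq_len, d)      # keys
V = torch.randn(seq_len, d)      # values

scores = Q @ K.T / d ** 0.5           # how strongly each token attends to every other token
weights = F.softmax(scores, dim=-1)   # each row sums to 1: the attention weights
output = weights @ V                  # context-aware representation of each token

print(weights.shape, output.shape)    # (5, 5) and (5, 16)
```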
Transformer Models (GPT, BERT, etc.)
The Transformer model is a neural network architecture that relies entirely on the attention mechanism to draw relationships between different parts of an input sequence, such as words in a sentence. Unlike previous sequence models that processed data sequentially (like RNNs), Transformers process all parts of the input in parallel, allowing them to capture long-range dependencies.
At the heart of a Transformer are encoder and decoder layers, each containing multiple attention heads that allow the model to focus on different aspects of the input simultaneously. This enables the model to understand the context of words based on their relationships with other words in the sequence.
Transformers are an absolutely critical piece of GPT (Generative Pretrained Transformer), the model behind ChatGPT.
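To get a feel for a pretrained Transformer without training anything, here is a hedged sketch using the Hugging Face transformers library; GPT-2 is chosen only because it is small and publicly available, and running this requires the library and model weights to be downloaded.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Generative AI is", max_new_tokens=20)
print(result[0]["generated_text"])
```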

Large Language Models (LLMs)
It’s difficult to define the threshold at which a language model counts as “large”. But it’s becoming widely accepted that two important characteristics define them:
The model has been trained on (at least) most of the public internet available today.
The model contains on the order of billions of weights (parameters).
The training of these large language models is only within the reach of big tech companies (and some startup challengers), as of early 2025. This will probably change in the next couple of years, with improvements in training techniques and hardware power.
Some of the most famous LLMs are the GPT (OpenAI), Llama (Meta), and Claude (Anthropic) model families.
Reinforcement Learning from Human Feedback (RLHF)
This reinforcement learning technique uses human feedback to give the machine learning model hints about which outputs should be rewarded.
Reinforcement learning models typically refine their decision-making by interacting with the environment, which provides rewards based on their actions. Humans can enhance this process by offering feedback, providing the model with valuable insights into what works and what doesn’t.
RLHF has been used extensively in the fine-tuning of LLMs, especially as a way to safeguard the model from behaving harmfully.
Fine-Tuning
The process of tweaking a previously trained machine learning model to solve a specific problem. In the age of deep learning in particular, transfer learning has been shown to improve the performance of neural network models on certain tasks by leveraging previously trained models.
One example is taking GPT-3.5 (an OpenAI LLM) and adjusting its weights so that it becomes better at working with legal documents. This process involves further training of the original GPT model, so that its performance on retrieving and generating text about legal documents improves dramatically.
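A minimal sketch of the underlying idea in PyTorch: freeze a pretrained backbone and train only a small task-specific head. The two-layer backbone below is a stand-in; with LLMs the same principle is applied to (some of) the transformer layers.

```python
import torch.nn as nn

# Stand-in for a pretrained model; in practice this would be loaded from a checkpoint.
pretrained_backbone = nn.Sequential(nn.Linear(768, 256), nn.ReLU())
new_head = nn.Linear(256, 2)   # e.g. a "legal" vs "non-legal" document classifier

for param in pretrained_backbone.parameters():
    param.requires_grad = False        # keep the pretrained weights frozen

model = nn.Sequential(pretrained_backbone, new_head)
trainable = [p for p in model.parameters() if p.requires_grad]
# ...then train only `trainable` parameters on the domain-specific (e.g. legal) dataset.
```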
Text-to-Image Generation
The process of using multi-modal AI to go from a text prompt to an image. Typically, this involves generating a text embedding, which serves as a numerical representation of the prompt’s semantic meaning. An image generation model then interprets the embedding and draws an image by sampling from a previously learned probability distribution, often trained on large datasets of paired text and images.
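As a hedged sketch, the Hugging Face diffusers library wraps this whole pipeline; the model name below is one public example, and the code assumes the library, the model weights, and a GPU are available.

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")   # generation is far faster on a GPU

image = pipe("a watercolor painting of Lisbon at sunset").images[0]
image.save("lisbon.png")
```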
Data Augmentation
Deep learning models require a lot of data to learn effectively. To maximize their performance, data scientists often employ data augmentation techniques, artificially expanding the training data and improving model robustness.
In computer vision, for instance, a common augmentation strategy involves generating modified copies of existing images through rotations, flips, crops, or color adjustments. This exposes the model to a wider variety of perspectives and conditions, making it more robust in real-world scenarios.
Other examples include the creation of synthetic audio tracks for audio models or fake generated text to improve natural language processing pipelines.
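For the computer-vision case, a minimal torchvision sketch might look like this; the specific transforms and their parameters are illustrative choices, not a recommended recipe.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

# augmented = augment(pil_image)   # applied to a PIL image each time it is loaded,
#                                  # so every epoch sees a slightly different version
```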
But be careful: using too much augmented data in the model training phase can have dire consequences, such as poor performance in the real world.
Bias and Fairness
Since models are trained on real-world data, their predictions and internal weights can reflect and even amplify existing biases in the data.
Bias and fairness in machine learning models is an active area of research, with a lot of ethical discussion around it. How much should data scientists interfere with the training and application of AI models? And to what extent? These are questions that researchers are trying to answer everyday.
Consider a model trained on 1950s data to predict university attendance. Given the societal norms of that era, gender would likely be a dominant predictor, reflecting the then-prevalent expectation that women remain at home. This illustrates a crucial point: simply because a pattern exists in the data doesn’t justify its use in machine learning, especially when such models risk amplifying existing ethical biases.
AI Hallucination
This phenomenon occurs when generative AI models produce distorted ‘facts’ or information that has no basis in reality.
Hallucinations happen because these models sample from probability distributions. While the generated content may align mathematically with the patterns in the data, it can sometimes result in nonsensical or false information, as the model lacks an inherent understanding of factual accuracy.
In one example, I asked a model to invent facts about Portugal. Because I explicitly requested this, it might not be strictly defined as a hallucination. However, the model is still capable of confidently generating false information when it isn’t asked to.
Copyright and Ownership Issues
Issues that stem from the usage and deployment of multi-modal AIs. For example, most of these tools may have learned from copyrighted content, and although they don’t quote verbatim the text (or images) they have learned from, there are many copyright issues associated with the usage of these models.
This is an ongoing and critical discussion. It isn’t just a legal debate, as it touches on the very nature of creativity, ownership, and the future of information. Some challenges for the usage of these models are related to their opacity, the role of inspiration in creative endeavours, and the ethical considerations of profiting from models that have been trained on copyrighted material without compensating the original authors.
AI Safety and Alignment
The concerns surrounding AI models are not limited to copyright and ownership. How can we guarantee that a super-intelligent system will act in the best interest of mankind?
The debate surrounding AI safety and alignment is absolutely essential for the future. As AI systems become increasingly sophisticated and start to play an integral role in our future, several risks emerge. Some of those risks seem drawn from science fiction, while others are disturbingly tangible, and the line between the two is becoming blurrier every day. Some of these risks are:
Job displacement caused by Artificial Intelligence.
An AI takeover, with humanity reduced to insignificance.
Weaponization of AI.
Model Interpretability
To improve the performance of machine learning models, data scientists gave up most model interpretability (at least, for humans). While the first machine learning models were quite easy to understand in terms of their outcomes (linear regression or decision trees), newer models are much more difficult to comprehend and navigate.
For example, deep learning models normally have billions of weights and tunable parameters. It’s very hard for a human to grasp the decision process between features and prediction, which makes these models very hard to interpret and translate into human language and reality. Although we “know” how large language models work, there are still a lot of unanswered questions about how exactly they make their decisions and predictions.
So, model interpretability is a very active field of research, particularly as AI is embedded into critical sectors such as healthcare, banking, or pharmaceuticals.
In the classical machine learning world, some of the most important techniques developed for interpretability are LIME and SHAP.
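A minimal SHAP sketch on a tree-based model, assuming the scikit-learn breast cancer dataset purely for illustration: each prediction is decomposed into per-feature contributions that a human can inspect.

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data[:50])   # per-feature contributions for 50 samples
shap.summary_plot(shap_values, data.data[:50], feature_names=data.feature_names)
```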
The AI world is changing so fast that it’s becoming increasingly hard to keep up with all the concepts out there. I hope this blog post contributes to helping you understand some of the jargon and terminology of AI, at least on a high level.
Is there another topic or concept you would like to see covered in this glossary? Let me know in the comments!