Generative AI is accelerating change in our businesses and lives - what exactly is it?

In this series I have talked about the urgency to make yourself aware of what is happening with AI in our world so you can understand it and take action (even if that means a conscious decision to "do nothing" – which I hope is not happening). I also explained that as humans we use models to understand our world and help us process inputs as we experience things and create outputs (such as understanding). Models are central to the way that generative AI, a very popular AI technology, works.

In this post we are going to explore generative AI and its mechanisms. This post will help you understand what is going on inside this new branch of AI so you have a basic understanding of what it is, how it works, and its requirements and limitations.

As a business or tech leader in an accelerating world that is rapidly adopting AI, this foundational knowledge is critical to what you do, and understanding it will help you make better decisions about AI investments and use cases. Even if you are non-technical you will be able to understand this post.

What’s in a name? The meaning of the term generative AI

The term "generative" comes from the word "generate", meaning to create or produce, while “AI” in this instance refers to a class of artificial intelligence systems designed to create new content, rather than just analyze or classify existing data. Let’s go one level deeper:

"Generative": In machine learning (ML), a type of processing in AI systems, this term has long been used to describe models that can generate data similar to what they were trained on. For example, a generative model trained on images of dogs can produce new, realistic-looking dog images.

"AI": Artificial Intelligence, which refers to the broad field focused on machines that can perform tasks requiring human-like intelligence.

So, generative AI refers to AI that can create new digital things, such as text, images, music, video, code, and more.

The name "ChatGPT", used in the very popular family of generative AI products from OpenAI, reflects this. GPT is an acronym for Generative Pre-trained Transformer, emphasizing its ability to generate text using a transformer architecture. There is a very important word in the GPT acronym: Transformer. Let’s explore what that word refers to.

What Is a Transformer?

Think of a transformer as a very smart assistant that can read and understand context in a conversation or document—no matter how long it is. Here’s a simple analogy: reading like a human editor.

Imagine you're editing a report. You don’t just read one word at a time—you look at:

  • The whole sentence

  • The paragraph

  • Even the tone and topic of the entire document


This helps you understand what each word means in context.

A transformer model does the same thing—but with math. It looks at all the words at once and figures out:

  • Which words are most important

  • How they relate to each other

  • What should come next


How a transformer works (simplified for illustration)

  • Input: You give the AI a sentence or “prompt” – a question or comment to which you want a response.

  • Attention mechanism: It scans the entire input and decides which words matter most for understanding each part.

  • Prediction: It uses that understanding to generate the next word, sentence, or even image—depending on the model – that will form the output response.
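The attention step above can be illustrated with a toy calculation. This is a simplified sketch using made-up two-dimensional word vectors, not a real transformer; it only shows the core idea of scoring every word against a query and turning the scores into weights:

```python
import numpy as np

def softmax(x):
    """Normalize raw scores into weights that sum to 1."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_weights(query, keys):
    """Score the query word against every word, then normalize.

    A higher weight means that word matters more for understanding
    the query word in this context.
    """
    scores = keys @ query / np.sqrt(len(query))  # scaled dot products
    return softmax(scores)

# Toy 2-dimensional "embeddings" for three words in a sentence.
words = ["bank", "river", "money"]
vectors = np.array([[1.0, 1.0],   # "bank" (ambiguous)
                    [1.0, 0.0],   # "river"
                    [0.0, 1.0]])  # "money"

# How much should "bank" attend to each word (including itself)?
weights = attention_weights(vectors[0], vectors)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

Real transformers do this for every word at once, in hundreds of dimensions and across many "attention heads", but the mechanism is the same weighted-importance calculation.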


Why transformers matter for AI models

Transformers are the engine behind generative AI. They power tools that:

  • Write reports and emails

  • Summarize documents

  • Generate images or presentations

  • Analyze customer feedback

  • Even assist in drug discovery or medical imaging


Generative AI has amazing capabilities related to text processing: Large Language Models (LLMs)

As I mentioned in my last post, as humans we learn about and employ models of language, like grammar, to help us write content that is accurate, easy to read, and well punctuated. Generative AI uses models as well when it generates text, and a very important model type, large language models (LLMs), is becoming part of our everyday business language.

Be careful when you see the term LLM as it may be used in various ways. Sometimes it is used as a type of use case, other times as the name of a model, and still other times as a technical architecture description. Let me explain.

LLMs are one of several technology architectures that generative AI can be built on, specifically designed to generate and understand text. We can also think of the term LLM as describing both a model type and a set of use cases.

Here is why the distinction is important: "LLM" as a use case refers to what the model is used for (e.g., writing an email, summarizing a document, answering a question), while "LLM" as a model type refers to the underlying architecture used: transformer-based language models, as distinct from other generative architectures such as Generative Adversarial Networks (GANs) or diffusion models (explained below). So, the term LLM does not just refer to a use case—LLMs are a core model type within the generative AI ecosystem.

As a model type, LLMs like GPT (from OpenAI), Claude (from Anthropic), and LLaMA (from Meta) are built using transformer architectures. They are each uniquely trained on massive text datasets to generate human-like language. They are generative because they can produce new content—sentences, paragraphs, code, etc.—based on learned patterns.

Humans learn to process text and create content differently than computers using AI

Humans typically learn language through a combination of:

  • Explicit instruction (e.g., grammar rules in school)

  • Implicit learning (e.g., hearing and using language in context)

  • Cognitive frameworks that help us reason about meaning, intention, and structure


We often consciously apply grammar rules to write clearly and correctly. LLMs, on the other hand:

  • Don’t use explicit grammar rules

  • Learn by processing vast amounts of text and identifying statistical patterns

  • Use those patterns to predict the most likely next word or phrase in a given context


So instead of “knowing” grammar, LLMs model language behavior—they learn what looks grammatical and meaningful based on what they’ve seen, not by understanding the rules in a human sense. We can say that LLMs implicitly learn grammar through exposure to massive amounts of text. During training, the model sees billions of examples of how words, phrases, and sentences are structured. This allows it to:

  • Learn syntax (sentence structure)

  • Understand morphology (word forms)

  • Capture semantics (meaning)

  • Model pragmatics (contextual use)


Rather than applying fixed grammar rules, LLMs predict the next word (or token) based on patterns learned from data. In this sense grammar, for an AI, is emergent—it's a byproduct of statistical learning, not a separate module.
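This "predict the next word from observed patterns" idea can be sketched in a few lines. The toy model below just counts which word follows which in a tiny hand-made corpus; real LLMs use vastly more data and neural networks rather than raw counts, but the absence of explicit grammar rules is the same:

```python
from collections import Counter, defaultdict

# A tiny "training corpus". Real LLMs see billions of examples.
corpus = ("the cat sat on the mat . "
          "the dog sat on the rug . "
          "the cat ate the fish .").split()

# Count which word follows which -- the statistical patterns.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the most frequently observed next word."""
    return following[word].most_common(1)[0][0]

print(predict_next("sat"))  # prints "on" -- learned from data, no grammar rules
```

Notice that the model never stored a rule like "a verb is followed by a preposition"; the pattern is emergent from the counts, which is the point made above about grammar.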

But LLMs incorporate several additional mechanisms and concepts beyond grammar:

  1. Transformer Architecture - the backbone of modern LLMs - uses attention mechanisms to weigh the importance of different words in context. This enables understanding of long-range dependencies in text.

  2. Tokenization Models - breaks text into manageable pieces (tokens). This helps the model handle different languages, punctuation, and even code.

  3. Embedding Models - converts tokens into high-dimensional vectors. This captures semantic relationships (e.g., "king" is to "queen" as "man" is to "woman").

  4. Positional Encoding - adds information about word order. This is crucial for understanding sentence structure and meaning.

  5. Fine-Tuning and Instruction Models - tailors the base model to specific tasks (e.g., summarization, translation). Uses curated datasets and human feedback to improve performance.

  6. Reinforcement Learning from Human Feedback (RLHF) - helps align model outputs with human preferences. Improves helpfulness, safety, and factual accuracy.
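Items 2 and 3 above (tokenization and embeddings) can be illustrated with a toy sketch. The vectors here are hand-made to make the classic "king is to queen as man is to woman" analogy work; real models learn such vectors, in hundreds of dimensions, from data:

```python
import numpy as np

# 1. Tokenization: break text into pieces the model can handle.
#    Real tokenizers use subwords; whitespace splitting is a toy stand-in.
tokens = "The king spoke".lower().split()

# 2. Embeddings: map each token to a vector. These toy 2-D vectors are
#    hand-made for illustration; real models learn them during training.
embedding = {
    "king":  np.array([1.0, 1.0]),   # [royalty, male]
    "queen": np.array([1.0, 0.0]),   # [royalty, female]
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, 0.0]),
}

# The classic analogy as vector arithmetic: king - man + woman ≈ queen
result = embedding["king"] - embedding["man"] + embedding["woman"]
print(np.allclose(result, embedding["queen"]))  # prints True
```

The fact that meaning-preserving arithmetic works on these vectors is exactly what "capturing semantic relationships" means in item 3.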

In summary, think of it this way: humans write by using grammar models to apply rules they’ve learned. LLMs write by mimicking the patterns of language they’ve statistically absorbed. Both use models to achieve their goals, and their different approaches can produce excellent writing—but they arrive there through very different paths.

Model “training”: what does that mean?

A concept mentioned in my posts that hasn't yet been explained is "model training". LLMs require vast amounts of text to train effectively. That text needs to be in a digital format, and its primary source is the Internet—which makes training data one of the most debated aspects of generative AI development. The controversy around using Internet-based text to train large language models (LLMs) involves a complex mix of legal, ethical, technical, and societal factors.

Image source: Microsoft Designer, custom prompt

Let’s look at the factors in the controversy:

Copyright and intellectual property

  • The issue: Much of the Internet’s content is copyrighted (e.g., books, articles, blogs).

  • Concern: AI models may be trained on this content without permission or compensation to creators.

  • Debate: Is training on copyrighted data “fair use” or infringement?

  • Evaluation approach: Legal review, licensing audits

  • Possible resolution strategy: Licensing frameworks, creator compensation, opt-out tools

Consent and Data Ownership

  • The issue: Authors, publishers, and users often don’t know their content is being used.

  • Concern: Lack of transparency and consent undermines trust.

  • Emerging solutions: “Do Not Train” tags, opt-out registries, and licensing platforms.

  • Evaluation approach: Stakeholder engagement, transparency audits

  • Possible resolution strategy: Consent-based data collection, public registries

Bias and Representativeness

  • The issue: Internet data reflects societal biases and may overrepresent certain voices or cultures.

  • Concern: Models may reinforce stereotypes or marginalize underrepresented groups.

  • Mitigation: Bias audits, curated datasets, and inclusive data sourcing.

  • Evaluation approach: Bias testing, demographic analysis

  • Possible resolution strategy: Diverse data sourcing, fairness tuning

Misinformation and Quality Control

  • The issue: The Internet contains both high-quality and misleading content.

  • Concern: Models may learn and reproduce false or harmful information.

  • Approach: Filtering, fact-checking, and post-training alignment.

  • Evaluation approach: Content quality scoring, source validation

  • Possible resolution strategy: Curated datasets, post-training alignment

Transparency and Accountability

  • The issue: Users and regulators often don’t know what data was used.

  • Concern: Lack of explainability and traceability.

  • Trend: Model documentation (e.g., data cards, model cards) and regulatory disclosures.

  • Evaluation approach: Model/data documentation, third-party audits

  • Possible resolution strategy: Open model cards, regulatory compliance

What is model context?

When speaking of LLMs, the term "context" comes up frequently, and it is absolutely central to how large language models like GPT, Claude, and others function. In LLMs, context refers to the textual information the model has access to at any given time when generating a response. This includes:

  • The prompt you give it

  • The conversation history

  • Any instructions or formatting cues

  • Sometimes even images or documents (in multimodal models)

Context is important for the following reasons…

  1. Understanding meaning - words and phrases can have different meanings depending on context. For example, “Bank” could mean a financial institution or the side of a river. Context helps the model choose the right interpretation.

  2. Maintaining coherence - In conversations or long documents, context allows the model to stay on topic, refer back to earlier points, and avoid repeating itself.

  3. Personalization - context allows the model to adapt tone and style, remember user preferences (within a session or across sessions, if allowed).

  4. Instruction following - LLMs use context to understand what task they’re being asked to do (e.g., summarize, translate, write code) and how to format the output

All of this “context” is important (pun intended) as LLMs process input as a sequence of tokens (words or subwords). The model uses attention mechanisms to weigh the importance of each token relative to others. The context window defines how many tokens the model can “see” at once (e.g., GPT-4 can handle up to 128,000 tokens in some versions). Context window sizing has inherent limitations (bigger is generally better):

  1. Context window size – the model can only remember a finite number of tokens at once

  2. No true memory - unless designed to retain memory across sessions, the model forgets past interactions.

  3. Overload - too much or poorly structured context can confuse the model.

Context sizing is important and is constantly growing in new LLMs, and as a result these limitations are being addressed.
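The first limitation, a finite context window, can be sketched very simply. This toy example treats each word as one token (real tokenizers split text differently) and shows how the oldest tokens fall out of view when a conversation outgrows the window:

```python
def fit_to_context(tokens, window_size):
    """Keep only the most recent tokens that fit in the window.

    This mirrors what happens when a conversation outgrows the
    model's context window: the oldest tokens fall out of view.
    """
    return tokens[-window_size:]

conversation = ["hello", "my", "name", "is", "Ada",
                "what", "is", "my", "name"]
visible = fit_to_context(conversation, window_size=4)
print(visible)  # prints ['what', 'is', 'my', 'name'] -- "Ada" is forgotten
```

This is why a long chat can suddenly "forget" something said early on: the model never sees tokens that no longer fit in the window.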

Generative AI is a broad category of capabilities, not just text processing

The first wave of commercial generative AI solutions—like ChatGPT, Jasper, and Copy.ai—primarily focused on text generation, including:

  • Writing articles, blogs, and marketing copy

  • Summarizing documents

  • Translating languages

  • Answering questions and generating code

These applications leveraged the strength of language models trained on massive text corpora.

Today, these AI models have evolved to handle multiple types of data, not just text. These are called multimodal models, and they can process and generate:

Images – historically using Generative Adversarial Networks (GANs), a type of machine learning model introduced by Ian Goodfellow and his team in 2014 and especially known for its ability to generate realistic synthetic data, such as images, videos, and audio; most current commercial image generators rely on diffusion models (described below)

  • Image generation (commercial products include: DALL·E, Midjourney, Stable Diffusion)

  • Image editing and in-painting

  • Image captioning and understanding

Video – using diffusion models (diffusion models learn to generate data by reversing a gradual noising process)

  • Generating short video clips from text prompts

  • Editing or enhancing video content

  • Synthesizing realistic avatars or animations
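The “gradual noising process” that diffusion models learn to reverse can be sketched in a few lines. This is a toy illustration of the forward (noising) direction only, on a tiny 1-D “image”; a real diffusion model is a neural network trained to run this process backwards:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x, noise_level):
    """One step of the forward (noising) process: blend the data
    with Gaussian noise. Training teaches a model to undo this step."""
    return (np.sqrt(1 - noise_level) * x
            + np.sqrt(noise_level) * rng.normal(size=x.shape))

# A toy 1-D "image" of 8 pixels, gradually destroyed by noise.
image = np.linspace(0.0, 1.0, 8)
noisy = image
for _ in range(10):
    noisy = add_noise(noisy, noise_level=0.3)

# After many steps the signal is nearly pure noise. Generation runs
# the learned reverse process: start from noise and remove it step by step.
print(noisy.round(2))
```

The key design idea is that undoing one small step of noise is much easier to learn than generating a whole image at once.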

Audio – using audio transformers for sound

  • Text-to-speech (e.g., ElevenLabs, Amazon Polly)

  • Music generation (e.g., Suno, AIVA)

  • Voice cloning and dubbing

Data and Code – using LLMs for code, and Variational Autoencoders (VAEs), a type of generative AI model that’s especially useful for learning compressed representations of data and generating new samples that resemble the original data.

  • Code generation (e.g., GitHub Copilot)

  • Data analysis and visualization

  • Synthetic data generation for training or testing

Under the hood (another example of an idiom mentioned in my second post on AI) these capabilities are powered by specialized architectures or extensions of transformer models, trained on different types of data:

  • Vision transformers (ViTs) for images

  • Audio transformers for sound

  • Multimodal transformers (like GPT-4V or Gemini) that combine text, image, and more

Looking forward

Generative AI is poised for significant expansion in both capability and impact in the months ahead. Based on current trajectories and emerging technologies, here is what we are already seeing, and where we can expect further advances as the industry marches forward.

Multimodal Mastery

We are already seeing that AI can seamlessly integrate text, image, video, audio, and even 3D data.

Example: A single prompt can generate a narrated video with visuals, music, and subtitles.

Autonomous AI Agents

AI systems are already evolving from passive assistants to active agents that can plan, reason, and execute tasks across platforms. This is a topic further explained by the concept of Agentic AI (which I’ll cover in a future post).

Example: An AI that books travel, negotiates contracts, or manages lab workflows.

Real-Time Personalization

AI will continue to adapt instantly to user preferences, tone, and context—across devices and applications.

Example: Personalized medical summaries or investor briefings tailored to the reader’s expertise.

Domain-Specific Intelligence

Models will be fine-tuned for specialized fields like biotech, law, finance, and education.

Example: AI that understands molecular pathways or clinical trial protocols.

On-Device Generative AI

Smaller, efficient models will run directly on phones, wearables, and edge devices.

Benefits: Faster responses, better privacy, offline functionality.

Synthetic Data and Simulation

AI will generate realistic synthetic data for training, testing, and modeling—especially valuable in regulated industries.

Example: Simulated patient data for drug trials or digital twins for organs.

Human-AI Collaboration Tools

AI’s ability as a co-creator, helping professionals brainstorm, design, and iterate, will expand dramatically.

Example: Product designers cutting development times by brainstorming and iterating with AI.

Ethical and Transparent AI

Advances in explainability, fairness, and governance will make AI safer and more trustworthy.

Example: AI that can justify its decisions in clinical or legal settings.

Generative AI in Education and Training

AI’s ability to create adaptive learning environments, simulations, and personalized tutoring will expand.

Example: Medical students practicing diagnosis with AI-generated patient scenarios.

Integration with Robotics and IoT

Generative AI will power intelligent machines that can interact with the physical world.

Example: AI-driven lab robots or smart manufacturing systems.

As this series continues we'll look at the vast array of commercial solutions that are appearing and build an understanding of how they are affecting the way we learn, work, and live our daily lives.
