Generative AI is accelerating change in our businesses and lives - what exactly is it?
In this series I have talked about the urgency of making yourself aware of what is happening with AI in our world so you can understand it and take action (even if that means a conscious decision to "do nothing" – which I hope is not happening). I also explained that as humans we use models to understand our world and help us process inputs as we experience things and create outputs (such as understanding). Models are central to the way that generative AI, a very popular AI technology, works.
In this post we are going to explore generative AI and its mechanisms, so you come away with a basic understanding of what it is, how it works, and its requirements and limitations.
As a business or tech leader in an accelerating world that is fast-adopting AI, this foundational knowledge is critical to what you do, and understanding it will help you make better decisions about AI investments and use cases. Even if you are non-technical you will be able to understand this post.
What’s in a name? The meaning of the term generative AI
The term "generative" comes from the word "generate", meaning to create or produce, while “AI” in this instance refers to a class of artificial intelligence systems designed to create new content, rather than just analyze or classify existing data. Let’s go one level deeper:
"Generative": In machine learning (ML), a type of processing in AI systems, this term has long been used to describe models that can generate data similar to what they were trained on. For example, a generative model trained on images of dogs can produce new, realistic looking dog images.
"AI": Artificial Intelligence, which refers to the broad field focused on machines that can perform tasks requiring human-like intelligence.
So, generative AI refers to AI that can create new digital things, such as text, images, music, video, code, and more.
The name "ChatGPT", used by the very popular family of generative AI products from OpenAI, reflects this. GPT is an acronym for Generative Pre-trained Transformer, emphasizing its ability to generate text using a transformer architecture. There is a very important word in the GPT acronym: Transformer. Let’s explore what that word refers to.
What Is a Transformer?
Think of a transformer as a very smart assistant that can read and understand context in a conversation or document, even a long one. Here’s a simple analogy: reading like a human editor.
Imagine you're editing a report. You don’t just read one word at a time—you look at:
The whole sentence
The paragraph
Even the tone and topic of the entire document
This helps you understand what each word means in context.
A transformer model does the same thing—but with math. It looks at all the words at once and figures out:
Which words are most important
How they relate to each other
What should come next
How a transformer works (simplified for illustration)
Input: You give the AI a sentence or “prompt” – a question or comment to which you want a response.
Attention mechanism: It scans the entire input and decides which words matter most for understanding each part.
Prediction: It uses that understanding to generate the next word, sentence, or even image—depending on the model – that will form the output response.
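The attention step above can be sketched in a few lines of Python. This is a toy illustration only: the words, the two-dimensional "embedding" vectors, and the relevance scores are all invented for demonstration; real transformers use many attention heads over vectors with thousands of dimensions.

```python
# Toy illustration of the "attention mechanism" step: weigh each
# word's relevance to a query, then blend their vectors accordingly.
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention over a tiny vocabulary."""
    scores = keys @ query / np.sqrt(len(query))      # relevance of each word
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax: weights sum to 1
    return weights, weights @ values                 # blended representation

# Three "words", each represented as a made-up 2-dimensional vector.
words = ["bank", "river", "money"]
keys = np.array([[1.0, 0.0], [0.9, 0.1], [0.1, 0.9]])
values = keys.copy()

# The query is the word whose meaning we want to resolve in context.
query = np.array([1.0, 0.0])
weights, blended = attention(query, keys, values)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

Running this shows the model assigning the highest weight to the most relevant word, which is exactly the "decides which words matter most" step described above.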
Why transformers matter for AI models
Transformers are the engine behind generative AI. They power tools that:
Write reports and emails
Summarize documents
Generate images or presentations
Analyze customer feedback
Even assist in drug discovery or medical imaging
Generative AI has amazing capabilities related to text processing: Large Language Models (LLMs)
As I mentioned in my last post, as humans we learn about and employ models of language, like grammar, to help us write content that is accurate, easy to read, and well punctuated. Generative AI uses models as well when it generates text, and a very important model type, large language models (LLMs), is becoming part of our everyday business language.
Be careful when you see the term LLM, as it may be used in various ways. Sometimes it names a use case, other times a model, and still other times a technical architecture. Let me explain.
LLMs are one of several technology architectures that generative AI can be built on, specifically designed to generate and understand text. We can also think of LLMs as both a model type and use case.
Here is why the distinction is important: LLM as a use case refers to what the model is used for (e.g., writing an email, summarizing a report, answering questions), while LLM as a model type refers to the underlying architecture used (transformer-based language models, as opposed to Generative Adversarial Networks (GANs), diffusion models, or others – explained below). So, the term LLM does not just refer to a use case; LLMs are a core model type within the generative AI ecosystem.
As a model type, LLMs like GPT (from OpenAI), Claude (from Anthropic), and LLaMA (from Meta) are built using transformer architectures. They are each uniquely trained on massive text datasets to generate human-like language. They are generative because they can produce new content—sentences, paragraphs, code, etc.—based on learned patterns.
Humans learn to process text and create content differently than computers using AI
Humans typically learn language through a combination of:
Explicit instruction (e.g., grammar rules in school)
Implicit learning (e.g., hearing and using language in context)
Cognitive frameworks that help us reason about meaning, intention, and structure
We often consciously apply grammar rules to write clearly and correctly. LLMs, on the other hand:
Don’t use explicit grammar rules
Learn by processing vast amounts of text and identifying statistical patterns
Use those patterns to predict the most likely next word or phrase in a given context
So instead of “knowing” grammar, LLMs model language behavior—they learn what looks grammatical and meaningful based on what they’ve seen, not by understanding the rules in a human sense. We can say that LLMs implicitly learn grammar through exposure to massive amounts of text. During training, the model sees billions of examples of how words, phrases, and sentences are structured. This allows it to:
Learn syntax (sentence structure)
Understand morphology (word forms)
Capture semantics (meaning)
Model pragmatics (contextual use)
Rather than applying fixed grammar rules, LLMs predict the next word (or token) based on patterns learned from data. In this sense grammar, for an AI, is emergent—it's a byproduct of statistical learning, not a separate module.
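The idea of "predicting the next word based on patterns learned from data" can be sketched with a deliberately tiny model. The corpus and the count-table approach below are illustrative stand-ins: real LLMs learn these statistical patterns with billions of parameters, not a simple lookup table.

```python
# Minimal sketch of next-word prediction from statistical patterns.
from collections import Counter, defaultdict

def train(text):
    """Count which word follows which in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the statistically most likely next word."""
    return counts[word].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ate the fish"
model = train(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

Notice that the model never learns a grammar rule; it simply absorbs which words tend to follow which, which is the "emergent" grammar described above.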
But LLMs incorporate several additional mechanisms and concepts beyond grammar:
Transformer Architecture - the backbone of modern LLMs - uses attention mechanisms to weigh the importance of different words in context. This enables understanding of long-range dependencies in text.
Tokenization Models - breaks text into manageable pieces (tokens). This helps the model handle different languages, punctuation, and even code.
Embedding Models - converts tokens into high-dimensional vectors. This captures semantic relationships (e.g., "king" is to "queen" as "man" is to "woman").
Positional Encoding - adds information about word order. This is crucial for understanding sentence structure and meaning.
Fine-Tuning and Instruction Models - tailors the base model to specific tasks (e.g., summarization, translation). Uses curated datasets and human feedback to improve performance.
Reinforcement Learning from Human Feedback (RLHF) - helps align model outputs with human preferences. Improves helpfulness, safety, and factual accuracy.
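Three of the mechanisms above (tokenization, embeddings, and positional encoding) can be sketched together as one toy pipeline. The three-word vocabulary, the 4-dimensional vectors, and the simplified sinusoidal formula are all assumptions made for illustration; production models use subword tokenizers and far larger vectors.

```python
# Toy pipeline: tokenize -> embed -> add positional encoding.
import numpy as np

vocab = {"the": 0, "bank": 1, "river": 2}                   # tokenization table
embeddings = np.random.default_rng(0).normal(size=(3, 4))   # 4-dim word vectors

def encode(sentence):
    tokens = [vocab[w] for w in sentence.split()]   # 1. break text into tokens
    vectors = embeddings[tokens]                    # 2. look up each token's vector
    positions = np.arange(len(tokens))[:, None]     # 3. add word-order information
    dims = np.arange(4)[None, :]
    pos_enc = np.sin(positions / 10000 ** (dims / 4))
    return vectors + pos_enc

out = encode("the river bank")
print(out.shape)  # one 4-dimensional vector per token
```

Without step 3, "the river bank" and "the bank river" would look identical to the model, which is why positional encoding is listed above as crucial for sentence structure.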
In summary, think of it this way: humans write by using grammar models to apply rules they’ve learned. LLMs write by mimicking the patterns of language they’ve statistically absorbed. Both use models to achieve their goals, and their different approaches can produce excellent writing, but they arrive there through very different paths.
Model “training”: what does that mean?
A concept mentioned in my posts that hasn't yet been explained is "model training". LLMs require vast amounts of text to be trained effectively. A controversial aspect of model training is that the text must be in digital form, and the primary source of such text is the Internet. This is one of the most debated aspects of generative AI development, involving a complex mix of legal, ethical, technical, and societal factors.
Image source: Microsoft Designer (custom prompt)
Let’s look at the factors in the controversy:
Copyright and intellectual property
The issue: Much of the Internet’s content is copyrighted (e.g., books, articles, blogs).
Concern: AI models may be trained on this content without permission or compensation to creators.
Debate: Is training on copyrighted data “fair use” or infringement?
Evaluation approach: Legal review, licensing audits
Possible resolution strategy: Licensing frameworks, creator compensation, opt-out tools
Consent and Data Ownership
The issue: Authors, publishers, and users often don’t know their content is being used.
Concern: Lack of transparency and consent undermines trust.
Emerging solutions: “Do Not Train” tags, opt-out registries, and licensing platforms.
Evaluation approach: Stakeholder engagement, transparency audits
Possible resolution strategy: Consent-based data collection, public registries
Bias and Representativeness
The issue: Internet data reflects societal biases and may overrepresent certain voices or cultures.
Concern: Models may reinforce stereotypes or marginalize underrepresented groups.
Mitigation: Bias audits, curated datasets, and inclusive data sourcing.
Evaluation approach: Bias testing, demographic analysis
Possible resolution strategy: Diverse data sourcing, fairness tuning
Misinformation and Quality Control
The issue: The Internet contains both high-quality and misleading content.
Concern: Models may learn and reproduce false or harmful information.
Approach: Filtering, fact-checking, and post-training alignment.
Evaluation approach: Content quality scoring, source validation
Possible resolution strategy: Curated datasets, post-training alignment
Transparency and Accountability
The issue: Users and regulators often don’t know what data was used.
Concern: Lack of explainability and traceability.
Trend: Model documentation (e.g., data cards, model cards) and regulatory disclosures.
Evaluation approach: Model/data documentation, third-party audits
Possible resolution strategy: Open model cards, regulatory compliance
What is model context?
When speaking of LLMs, the term "context" comes up frequently; it is central to how large language models like GPT and Claude function. In LLMs, context refers to the textual information the model has access to at any given time when generating a response. This includes:
The prompt you give it
The conversation history
Any instructions or formatting cues
Sometimes even images or documents (in multimodal models)
Context is important for the following reasons…
Understanding meaning - words and phrases can have different meanings depending on context. For example, "bank" could mean a financial institution or the side of a river. Context helps the model choose the right interpretation.
Maintaining coherence - In conversations or long documents, context allows the model to stay on topic, refer back to earlier points, and avoid repeating itself.
Personalization - context allows the model to adapt tone and style, remember user preferences (within a session or across sessions, if allowed).
Instruction following - LLMs use context to understand what task they’re being asked to do (e.g., summarize, translate, write code) and how to format the output
All of this “context” is important (pun intended) as LLMs process input as a sequence of tokens (words or subwords). The model uses attention mechanisms to weigh the importance of each token relative to others. The context window defines how many tokens the model can “see” at once (e.g., some versions of GPT-4 can handle up to 128,000 tokens). Context window sizing has inherent limitations (bigger is generally better):
Context window size – the model can only attend to a finite number of tokens at once
No persistent memory – unless designed to retain memory across sessions, the model forgets past interactions
Overload – too much or poorly structured context can confuse the model
Context window sizes are constantly growing in new LLMs, and as a result these limitations are gradually being addressed.
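The context window limitation described above can be sketched as simple truncation: when a conversation exceeds the window, the oldest tokens fall out of view. The whitespace "tokenizer" and the five-token window below are illustrative assumptions; real models use subword tokenizers and windows of thousands of tokens.

```python
# Sketch of a context window: only the most recent N tokens are visible.

def fit_to_context(history, window_size):
    """Keep only the most recent tokens that fit in the window."""
    tokens = " ".join(history).split()
    return tokens[-window_size:]

history = ["Hello, how are you?", "I am fine, thanks for asking!"]
print(fit_to_context(history, window_size=5))
```

Everything trimmed off the front is effectively forgotten, which is why long conversations can lose track of early details.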
Generative AI is a broad category of capabilities, not just text processing
The first wave of commercial generative AI solutions—like ChatGPT, Jasper, and Copy.ai—primarily focused on text generation, including:
Writing articles, blogs, and marketing copy
Summarizing documents
Translating languages
Answering questions and generating code
These applications leveraged the strength of language models trained on massive text corpora.
Today, these AI models have evolved to handle multiple types of data, not just text. These are called multimodal models, and they can process and generate:
Images – using Generative Adversarial Networks (GANs), a type of machine learning model introduced by Ian Goodfellow and his team in 2014 and known for generating realistic synthetic data such as images, video, and audio, as well as the diffusion models behind most of today's commercial image generators
Image generation (commercial products include: DALL·E, Midjourney, Stable Diffusion)
Image editing and in-painting
Image captioning and understanding
Video – using diffusion models (diffusion models learn to generate data by reversing a gradual noising process)
Generating short video clips from text prompts
Editing or enhancing video content
Synthesizing realistic avatars or animations
Audio – using audio transformers for sound
Text-to-speech (e.g., ElevenLabs, Amazon Polly)
Music generation (e.g., Suno, AIVA)
Voice cloning and dubbing
Data and Code – using transformer-based code models along with Variational Autoencoders (VAEs), a type of generative AI model that is especially useful for learning compressed representations of data and generating new samples that resemble the original data
Code generation (e.g., GitHub Copilot)
Data analysis and visualization
Synthetic data generation for training or testing
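The "reversing a gradual noising process" idea behind diffusion models, mentioned above for video, can be illustrated with its forward half: gradually corrupting clean data with noise. The 1-D signal, step count, and noise scale below are invented for illustration; real diffusion models apply this to images or video frames and learn the reverse, denoising direction.

```python
# Toy illustration of the forward "noising" process that diffusion
# models learn to reverse. A 1-D signal stands in for an image.
import numpy as np

rng = np.random.default_rng(42)
signal = np.sin(np.linspace(0, np.pi, 8))  # stand-in for clean data

def add_noise(x, steps, noise_scale=0.3):
    """Gradually corrupt the data, one small noise step at a time."""
    for _ in range(steps):
        x = x + rng.normal(scale=noise_scale, size=x.shape)
    return x

noisy = add_noise(signal, steps=10)
# After many steps the signal is close to pure noise; a trained model
# learns to walk backwards from noise to data, one denoising step at a time.
print(float(np.abs(noisy - signal).mean()))
```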
Under the hood (another example of an idiom mentioned in my second post on AI) these capabilities are powered by specialized architectures or extensions of transformer models, trained on different types of data:
Vision transformers (ViTs) for images
Audio transformers for sound
Multimodal transformers (like GPT-4V or Gemini) that combine text, image, and more
Looking forward
Generative AI is poised for significant expansion in both capability and impact in the months ahead. Based on current trajectories and emerging technologies, here is where we are already seeing progress and where we can expect further advances as the industry marches forward.
Multimodal Mastery
We are already seeing that AI can seamlessly integrate text, image, video, audio, and even 3D data.
Example: A single prompt can generate a narrated video with visuals, music, and subtitles.
Autonomous AI Agents
AI systems are already evolving from passive assistants to active agents that can plan, reason, and execute tasks across platforms. This is a topic further explained by the concept of Agentic AI (which I’ll cover in a future post).
Example: An AI that books travel, negotiates contracts, or manages lab workflows.
Real-Time Personalization
AI will continue to adapt instantly to user preferences, tone, and context—across devices and applications.
Example: Personalized medical summaries or investor briefings tailored to the reader’s expertise.
Domain-Specific Intelligence
Models will be fine-tuned for specialized fields like biotech, law, finance, and education.
Example: AI that understands molecular pathways or clinical trial protocols.
On-Device Generative AI
Smaller, efficient models will run directly on phones, wearables, and edge devices.
Benefits: Faster responses, better privacy, offline functionality.
Synthetic Data and Simulation
AI will generate realistic synthetic data for training, testing, and modeling—especially valuable in regulated industries.
Example: Simulated patient data for drug trials or digital twins for organs.
Human-AI Collaboration Tools
AI’s ability as a co-creator, helping professionals brainstorm, design, and iterate, will expand dramatically.
Example: Product designers using AI to iterate on concepts and cut development times.
Ethical and Transparent AI
Advances in explainability, fairness, and governance will make AI safer and more trustworthy.
Example: AI that can justify its decisions in clinical or legal settings.
Generative AI in Education and Training
AI’s ability to create adaptive learning environments, simulations, and personalized tutoring will expand.
Example: Medical students practicing diagnosis with AI-generated patient scenarios.
Integration with Robotics and IoT
Generative AI will power intelligent machines that can interact with the physical world.
Example: AI-driven lab robots or smart manufacturing systems.
As this series continues we'll look at the vast array of commercial solutions that are appearing and build an understanding of how they are affecting the way we learn, work, and live our daily lives.