The goal is not to build AI that is smarter than us. The goal is to build AI we are smart enough to use well.
— Adapted from AI Fluency Principles

01 The Model That Learned Finance
In 2023, Bloomberg built its own large language model from scratch: BloombergGPT. It was a 50-billion-parameter system trained on more than 700 billion tokens of data. Roughly half came from Bloomberg's proprietary archive of financial news, filings, analyst reports, and market commentary accumulated over two decades.
The outcome was predictable but instructive. BloombergGPT outperformed general-purpose models on highly specific financial tasks such as sentiment analysis and headline classification. It understood financial language because it was shaped by financial language.
Then something interesting happened.
The Plot Twist
When researchers compared BloombergGPT to GPT-4 — a general-purpose model trained on broader and more diverse data — GPT-4 outperformed it on many financial reasoning tasks.
The lesson is not about which product is better. The lesson is about how AI actually works.
Model performance depends on three variables: the architecture of the model, the data used to train it, and the method used to train it. Understanding these variables changes the quality of the questions you ask when AI proposals land on your desk.
02 What a Model Actually Is
An AI model is not just software. It is the output of a training process.
Think of it like an institutional playbook shaped by experience. An actuarial team, an investment desk, and a compliance function would each write very different playbooks — because they were trained on different data and optimized for different outcomes.
AI models are no different. A model contains millions or billions of learned numerical patterns that transform input into output.
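The idea can be made concrete in a few lines of Python: a "model" is stored numbers plus a rule for applying them to input. The weights, names, and values below are invented purely for illustration.

```python
# A toy "model": two learned numbers that map an input to an output.
# Real models store billions of such numbers, but the principle is the same.

def make_model(weight: float, bias: float):
    """Return a 'model': a function closed over its learned parameters."""
    def predict(x: float) -> float:
        return weight * x + bias  # transform input into output
    return predict

# Two models with identical structure but different "training"
# behave differently on the same input.
risk_model = make_model(weight=2.0, bias=1.0)
price_model = make_model(weight=-0.5, bias=10.0)

print(risk_model(3.0))   # 7.0
print(price_model(3.0))  # 8.5
```

The only difference between the two models is the numbers they carry, which is the point: training determines the numbers, and the numbers determine the behavior.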
The differences between models come down to three things:
- Data: what information shaped the model
  - Quality, volume, and relevance of training data
  - Domain-specific vs. general-purpose sources
- Architecture: how complex the model's structure is
  - The number and arrangement of layers
  - The computational design choices
- Objective: what the model was optimized to do
Objective: What Was It Built to Do?
Every model is optimized to achieve a specific objective. A model trained to classify sentiment reads the world differently than one trained to generate text or detect fraud.
When your team talks about "foundation models," they mean large general-purpose systems trained on broad datasets. When they propose "fine-tuning," they mean adapting one of those systems to your specific data and context.
The executive question is never "Which model is best?" It is: "Which model fits our data, risk profile, and strategic objective?"
03 Parameters: Capacity Is Not Accuracy
Parameters are the internal numerical values that get adjusted during training. Large models may contain hundreds of billions — or even trillions — of them.
Each parameter captures a small pattern learned from data. Together, they encode the model's "experience."
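A quick sketch of where parameter counts come from, assuming a plain fully connected network (the layer sizes are illustrative):

```python
# Counting parameters in a small fully connected network.
# A layer mapping n_in inputs to n_out outputs holds
# n_in * n_out weights plus n_out biases.

def count_parameters(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

# A tiny network: 100 inputs -> 50 hidden units -> 10 outputs
print(count_parameters([100, 50, 10]))  # 5560
```

Scaling every layer tenfold multiplies the count roughly a hundredfold, which is why parameter count is a reliable proxy for computational cost even though it says nothing about fit for your domain.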
The Bloomberg Lesson
A smaller model trained on high-quality domain data can outperform a much larger general-purpose model on specialized tasks. Bloomberg's experiment demonstrated that. GPT-4's broader success demonstrated the counterpoint.
Scale matters. But relevance matters more.
For leaders, parameter count signals computational cost and capability range, not guaranteed precision. The next time a vendor pitches you on a model with a trillion parameters, the right follow-up question is: "What data was it trained on, and how does that match our domain?"
04 How Models Learn
The training method determines what a model can do — and where it will fail. There are three dominant approaches.
Supervised Learning
Uses labeled examples. The model learns from historical decisions paired with known outcomes. This is the backbone of credit scoring, fraud detection, underwriting classification, and churn prediction.
If the labels are biased or flawed, the model will faithfully learn those flaws.
Unsupervised Learning
Looks for patterns without labels. It discovers clusters, anomalies, or hidden structures in data. This supports segmentation, anomaly detection, and portfolio risk discovery.
Its power is pattern discovery. Its weakness is interpretability — it finds patterns but can't always explain them.
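A minimal sketch of unsupervised pattern discovery: a one-dimensional k-means, with invented transaction amounts, that finds two segments without ever being told they exist.

```python
# Unsupervised learning in miniature: cluster unlabeled values into
# two groups with a few iterations of k-means. No labels are given;
# the structure is discovered from the data alone.

def kmeans_1d(values, iters=10):
    lo, hi = min(values), max(values)  # initial cluster centers
    for _ in range(iters):
        # Assign each value to its nearer center, then recompute centers.
        a = [v for v in values if abs(v - lo) <= abs(v - hi)]
        b = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(a) / len(a), sum(b) / len(b)
    return lo, hi

# Transaction amounts with two natural segments
amounts = [12, 15, 14, 210, 205, 198]
print(kmeans_1d(amounts))  # roughly (13.7, 204.3)
```

Note what the output lacks: the algorithm returns two centers but no account of what the segments mean. That interpretive step remains a human job.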
Reinforcement Learning
Trains through feedback. The model takes actions, receives rewards or penalties, and optimizes over time. This underpins algorithmic trading, dynamic pricing, and large language model alignment.
Its risk lies in optimizing for the wrong objective if guardrails are weak.
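A toy sketch of the feedback loop, using a simple epsilon-greedy bandit with invented actions and payoffs: the agent converges on whatever the reward function pays for, which is exactly why a misspecified reward is the core risk.

```python
import random

# Reinforcement learning in miniature: try actions, receive noisy
# rewards, and shift toward whatever the reward signal favors.

def train_bandit(rewards, steps=2000, eps=0.1, seed=0):
    """rewards: dict mapping action -> average payoff."""
    rng = random.Random(seed)
    values = {a: 0.0 for a in rewards}   # learned value estimates
    counts = {a: 0 for a in rewards}
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.choice(list(rewards))     # explore
        else:
            a = max(values, key=values.get)   # exploit current belief
        r = rewards[a] + rng.gauss(0, 0.1)    # noisy feedback
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # running average
    return max(values, key=values.get)

# The agent learns whichever action the reward function pays for,
# whether or not that is what we actually wanted it to learn.
print(train_bandit({"hold": 0.2, "trade": 0.5}))  # trade
```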
If you know which training method underlies an AI tool, you can anticipate its strengths — and its blind spots.
05 Hallucination: Probability Is Not Truth
Language models generate responses by predicting the most probable next word in a sequence. They are optimized for coherence, not factual accuracy.
When signals in training data are weak or conflicting, the model does not say, "I don't know." It generates the most statistically plausible continuation.
That Is Hallucination
In financial services, this creates real risk. A model may:
- Reference regulations that do not exist
- Fabricate case law
- Construct financial assumptions without grounding
- Present all of it with perfect confidence
This Is Not a Bug
Hallucination is not a bug that disappears with better prompts. It is structural. The model does not know when it is wrong.
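The mechanism can be sketched as a lookup over continuation probabilities. Everything here is invented for illustration, including the regulation; the point is that the selection rule never consults the truth.

```python
# Next-token prediction in miniature: the "model" is a table of
# continuation probabilities (invented numbers). It always emits
# the most probable continuation; "I don't know" is never produced
# unless the training data happened to make it probable.

continuations = {
    "The capital of France is": {
        "Paris": 0.92, "Lyon": 0.05, "unknown": 0.03},
    # Weak, conflicting signal: the model still picks something.
    "Regulation 2024-17 requires": {
        "quarterly audits": 0.34,
        "annual filings": 0.33,
        "board sign-off": 0.33},
}

def complete(prompt):
    probs = continuations[prompt]
    return max(probs, key=probs.get)  # most probable, not most true

print(complete("The capital of France is"))    # Paris
print(complete("Regulation 2024-17 requires")) # quarterly audits
```

Both answers come out equally fluent. Nothing in the output signals that the first rests on an overwhelming signal and the second on a near coin flip.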
The failure pattern follows a predictable sequence:

1. The model generates plausible-sounding output, optimized for coherence, not truth.
2. The output passes the "sounds right" test: confident language, logical structure, proper formatting.
3. Unverified output enters the decision workflow, and fabricated data reaches board decks, compliance filings, or client reports.
Production systems using generative AI must include human verification, audit trails, and governance workflows.
06 The Black Box Problem
Advanced AI models are often opaque. With billions of parameters trained across vast datasets, it is difficult to trace specific outputs back to specific inputs.
This creates tension in regulated industries.
The Regulatory Reality
If a regulator asks why a loan was denied, "the model decided" is not defensible. Explainability and auditability are governance requirements.
There are emerging techniques, but the trade-off remains:
More complex models are typically harder to explain. Accuracy without transparency can still create regulatory exposure.
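One reason simpler models are easier to defend: a linear score decomposes exactly into per-feature contributions, so "why was this applicant denied?" has a concrete, auditable answer. The feature names and weights below are invented for illustration.

```python
# An auditable model: each feature's contribution to the score is
# weight * value, and the contributions sum exactly to the output.

weights = {"income": 0.4, "debt_ratio": -0.6, "years_employed": 0.2}
bias = -0.1

def score_with_explanation(applicant):
    contributions = {f: weights[f] * applicant[f] for f in weights}
    total = sum(contributions.values()) + bias
    return total, contributions

total, why = score_with_explanation(
    {"income": 0.5, "debt_ratio": 0.9, "years_employed": 0.3}
)
print(round(total, 2))        # -0.38
print(min(why, key=why.get))  # debt_ratio (the factor that hurt most)
```

A billion-parameter network offers no such decomposition, which is the trade-off in a sentence: the models that explain themselves are rarely the ones that score highest, and the ones that score highest rarely explain themselves.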
Leadership Responsibility
Leaders must factor explainability into deployment decisions, especially in lending, underwriting, claims, and compliance workflows. A high-performing model that cannot be explained is a liability, not an asset.
07 Executive Takeaways
Training data drives performance
Data governance is strategic infrastructure. The quality of what goes in determines the quality of what comes out. Invest in data before investing in models.
Model size does not equal model suitability
Fit matters more than scale. A focused model trained on relevant data will often outperform a giant trained on everything. Ask about data, not just parameter count.
Training method predicts failure mode
Understand what type of learning underpins the system. Supervised, unsupervised, and reinforcement learning each have distinct strengths and blind spots.
Hallucination is structural
Human review is mandatory in high-impact workflows. The model does not know when it is wrong. Build verification into every production pipeline.
Explainability is not optional in regulated industries
Accuracy without transparency creates regulatory exposure. Factor interpretability into every model selection decision.
References
- Wu et al., BloombergGPT: A Large Language Model for Finance, 2023
- OpenAI, GPT-4 Technical Report, 2023
- Ouyang et al., Training Language Models to Follow Instructions with Human Feedback, 2022