The goal is not to build AI that is smarter than us. The goal is to build AI we are smart enough to use well.
— Adapted from AI Fluency Principles

01 The Model That Learned Finance
In 2023, Bloomberg built its own large language model from scratch: BloombergGPT. It was a 50-billion-parameter system trained on more than 700 billion tokens of data. Roughly half came from Bloomberg's proprietary archive of financial news, filings, analyst reports, and market commentary accumulated over two decades.
The outcome was predictable but instructive. BloombergGPT outperformed general-purpose models on highly specific financial tasks such as sentiment analysis and headline classification. It understood financial language because it was shaped by financial language.
Then something interesting happened.
The Plot Twist
When researchers compared BloombergGPT to GPT-4 — a general-purpose model trained on broader and more diverse data — GPT-4 outperformed it on many financial reasoning tasks.
The lesson is not about which product is better. The lesson is about how AI actually works.
Model performance depends on three variables: the architecture of the model, the data used to train it, and the method used to train it. Understanding these variables changes the quality of the questions you ask when AI proposals land on your desk.
02 What a Model Actually Is
An AI model is not just software. It is the output of a training process.
Think of it like an institutional playbook shaped by experience. An actuarial team, an investment desk, and a compliance function would each write very different playbooks — because they were trained on different data and optimized for different outcomes.
AI models are no different. A model contains millions or billions of learned numerical patterns that transform input into output.
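The idea can be made concrete in a few lines of Python: a "model" is stored numbers plus a rule for applying them to input. The weights, names, and values below are invented purely for illustration.

```python
# A toy "model": two learned numbers that map an input to an output.
# Real models store billions of such numbers, but the principle is the same.

def make_model(weight: float, bias: float):
    """Return a 'model': a function closed over its learned parameters."""
    def predict(x: float) -> float:
        return weight * x + bias  # transform input into output
    return predict

# Two models with identical structure but different "training"
# behave differently on the same input.
risk_model = make_model(weight=2.0, bias=1.0)
price_model = make_model(weight=-0.5, bias=10.0)

print(risk_model(3.0))   # 7.0
print(price_model(3.0))  # 8.5
```

The only difference between the two models is the numbers they carry, which is the point: training determines the numbers, and the numbers determine the behavior.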
The differences between models come down to three things:
- Data: what information shaped the model
  - Quality, volume, and relevance of training data
  - Domain-specific vs. general-purpose sources
- Architecture: how complex the model's structure is
  - The number and arrangement of layers
  - The computational design choices
- Objective: what the model was optimized to do
Objective: What Was It Built to Do?
Every model is optimized to achieve a specific objective. A model trained to classify sentiment reads the world differently than one trained to generate text or detect fraud.
When your team talks about "foundation models," they mean large general-purpose systems trained on broad datasets. When they propose "fine-tuning," they mean adapting one of those systems to your specific data and context.
The executive question is never "Which model is best?" It is: "Which model fits our data, risk profile, and strategic objective?"
03 Parameters: Capacity Is Not Accuracy
Parameters are the internal numerical values that get adjusted during training. Large models may contain hundreds of billions — or even trillions — of them.
Each parameter captures a small pattern learned from data. Together, they encode the model's "experience."
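A quick sketch of where parameter counts come from, assuming a plain fully connected network (the layer sizes are illustrative):

```python
# Counting parameters in a small fully connected network.
# A layer mapping n_in inputs to n_out outputs holds
# n_in * n_out weights plus n_out biases.

def count_parameters(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

# A tiny network: 100 inputs -> 50 hidden units -> 10 outputs
print(count_parameters([100, 50, 10]))  # 5560
```

Scaling every layer tenfold multiplies the count roughly a hundredfold, which is why parameter count is a reliable proxy for computational cost even though it says nothing about fit for your domain.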
The Bloomberg Lesson
A smaller model trained on high-quality domain data can outperform a much larger general-purpose model on specialized tasks. Bloomberg's experiment demonstrated that. GPT-4's broader success demonstrated the counterpoint.
Scale matters. But relevance matters more.
For leaders, parameter count signals computational cost and capability range, not guaranteed precision. The next time a vendor pitches you on a model with a trillion parameters, the right follow-up question is: "What data was it trained on, and how does that match our domain?"
04 How Models Learn
The training method determines what a model can do — and where it will fail. There are three dominant approaches.
Supervised Learning
Uses labeled examples. The model learns from historical decisions paired with known outcomes. This is the backbone of credit scoring, fraud detection, underwriting classification, and churn prediction.
If the labels are biased or flawed, the model will faithfully learn those flaws.
Unsupervised Learning
Looks for patterns without labels. It discovers clusters, anomalies, or hidden structures in data. This supports segmentation, anomaly detection, and portfolio risk discovery.
Its power is pattern discovery. Its weakness is interpretability — it finds patterns but can't always explain them.
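A minimal sketch of unsupervised pattern discovery: a one-dimensional k-means, with invented transaction amounts, that finds two segments without ever being told they exist.

```python
# Unsupervised learning in miniature: cluster unlabeled values into
# two groups with a few iterations of k-means. No labels are given;
# the structure is discovered from the data alone.

def kmeans_1d(values, iters=10):
    lo, hi = min(values), max(values)  # initial cluster centers
    for _ in range(iters):
        # Assign each value to its nearer center, then recompute centers.
        a = [v for v in values if abs(v - lo) <= abs(v - hi)]
        b = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(a) / len(a), sum(b) / len(b)
    return lo, hi

# Transaction amounts with two natural segments
amounts = [12, 15, 14, 210, 205, 198]
print(kmeans_1d(amounts))  # roughly (13.7, 204.3)
```

Note what the output lacks: the algorithm returns two centers but no account of what the segments mean. That interpretive step remains a human job.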
Reinforcement Learning
Trains through feedback. The model takes actions, receives rewards or penalties, and optimizes over time. This underpins algorithmic trading, dynamic pricing, and large language model alignment.
Its risk lies in optimizing for the wrong objective if guardrails are weak.
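A toy sketch of the feedback loop, using a simple epsilon-greedy bandit with invented actions and payoffs: the agent converges on whatever the reward function pays for, which is exactly why a misspecified reward is the core risk.

```python
import random

# Reinforcement learning in miniature: try actions, receive noisy
# rewards, and shift toward whatever the reward signal favors.

def train_bandit(rewards, steps=2000, eps=0.1, seed=0):
    """rewards: dict mapping action -> average payoff."""
    rng = random.Random(seed)
    values = {a: 0.0 for a in rewards}   # learned value estimates
    counts = {a: 0 for a in rewards}
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.choice(list(rewards))     # explore
        else:
            a = max(values, key=values.get)   # exploit current belief
        r = rewards[a] + rng.gauss(0, 0.1)    # noisy feedback
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # running average
    return max(values, key=values.get)

# The agent learns whichever action the reward function pays for,
# whether or not that is what we actually wanted it to learn.
print(train_bandit({"hold": 0.2, "trade": 0.5}))  # trade
```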
If you know which training method underlies an AI tool, you can anticipate its strengths — and its blind spots.
05 Hallucination: Probability Is Not Truth
Language models generate responses by predicting the most probable next word in a sequence. They are optimized for coherence, not factual accuracy.
When signals in training data are weak or conflicting, the model does not say, "I don't know." It generates the most statistically plausible continuation.
That Is Hallucination
In financial services, this creates real risk. A model may:
- Reference regulations that do not exist
- Fabricate case law
- Construct financial assumptions without grounding
- Present all of it with perfect confidence
This Is Not a Bug
Hallucination is not a bug that disappears with better prompts. It is structural. The model does not know when it is wrong.
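The mechanism can be sketched as a lookup over continuation probabilities. Everything here is invented for illustration, including the regulation; the point is that the selection rule never consults the truth.

```python
# Next-token prediction in miniature: the "model" is a table of
# continuation probabilities (invented numbers). It always emits
# the most probable continuation; "I don't know" is never produced
# unless the training data happened to make it probable.

continuations = {
    "The capital of France is": {
        "Paris": 0.92, "Lyon": 0.05, "unknown": 0.03},
    # Weak, conflicting signal: the model still picks something.
    "Regulation 2024-17 requires": {
        "quarterly audits": 0.34,
        "annual filings": 0.33,
        "board sign-off": 0.33},
}

def complete(prompt):
    probs = continuations[prompt]
    return max(probs, key=probs.get)  # most probable, not most true

print(complete("The capital of France is"))    # Paris
print(complete("Regulation 2024-17 requires")) # quarterly audits
```

Both answers come out equally fluent. Nothing in the output signals that the first rests on an overwhelming signal and the second on a near coin flip.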
The failure pattern follows a predictable sequence:

1. The model generates plausible-sounding output, optimized for coherence, not truth.
2. The output passes the "sounds right" test: confident language, logical structure, proper formatting.
3. Unverified output enters the decision workflow, and fabricated data reaches board decks, compliance filings, or client reports.
Production systems using generative AI must include human verification, audit trails, and governance workflows.
06 The Black Box Problem
Advanced AI models are often opaque. With billions of parameters trained across vast datasets, it is difficult to trace specific outputs back to specific inputs.
This creates tension in regulated industries.
The Regulatory Reality
If a regulator asks why a loan was denied, "the model decided" is not defensible. Explainability and auditability are governance requirements.
There are emerging techniques, but the trade-off remains:
More complex models are typically harder to explain. Accuracy without transparency can still create regulatory exposure.
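One reason simpler models are easier to defend: a linear score decomposes exactly into per-feature contributions, so "why was this applicant denied?" has a concrete, auditable answer. The feature names and weights below are invented for illustration.

```python
# An auditable model: each feature's contribution to the score is
# weight * value, and the contributions sum exactly to the output.

weights = {"income": 0.4, "debt_ratio": -0.6, "years_employed": 0.2}
bias = -0.1

def score_with_explanation(applicant):
    contributions = {f: weights[f] * applicant[f] for f in weights}
    total = sum(contributions.values()) + bias
    return total, contributions

total, why = score_with_explanation(
    {"income": 0.5, "debt_ratio": 0.9, "years_employed": 0.3}
)
print(round(total, 2))        # -0.38
print(min(why, key=why.get))  # debt_ratio (the factor that hurt most)
```

A billion-parameter network offers no such decomposition, which is the trade-off in a sentence: the models that explain themselves are rarely the ones that score highest, and the ones that score highest rarely explain themselves.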
Leadership Responsibility
Leaders must factor explainability into deployment decisions, especially in lending, underwriting, claims, and compliance workflows. A high-performing model that cannot be explained is a liability, not an asset.
07 Executive Takeaways
Training data drives performance
Data governance is strategic infrastructure. The quality of what goes in determines the quality of what comes out. Invest in data before investing in models.
Model size does not equal model suitability
Fit matters more than scale. A focused model trained on relevant data will often outperform a giant trained on everything. Ask about data, not just parameter count.
Training method predicts failure mode
Understand what type of learning underpins the system. Supervised, unsupervised, and reinforcement learning each have distinct strengths and blind spots.
Hallucination is structural
Human review is mandatory in high-impact workflows. The model does not know when it is wrong. Build verification into every production pipeline.
Explainability is not optional in regulated industries
Accuracy without transparency creates regulatory exposure. Factor interpretability into every model selection decision.
References
- Wu et al., BloombergGPT: A Large Language Model for Finance, 2023
- OpenAI, GPT-4 Technical Report, 2023
- Ouyang et al., Training Language Models to Follow Instructions with Human Feedback, 2022