Prediction Markets: Probability of Jerome Powell Using Kalshi Words in an FOMC Speech
Essentially, we are looking to determine the likelihood that Jerome Powell will say certain words.
Keep in mind that markets expect the Fed to hold rates where they are.
Data
We sourced the files directly from the FOMC website. They are published as PDFs, which we downloaded, opened, and parsed. Most of this was done by refactoring an existing repo (noted just below).
Our data covers speeches from 2020 to the present: 43 files in total.
Note that the Q&A is included, but if a word appears only in a question it is not counted; only words spoken by Jerome Powell are used.
Code Overview
This script analyzes Federal Reserve press conference transcripts to measure how often specific words (defined by a Kalshi prediction market) are spoken by Chair Jerome Powell, then aggregates those results and saves them to a CSV file.
Step-by-Step Description
1. Imports and Constants
- Imports standard libraries for regex processing, dates, and file handling.
- Imports custom Kalshi helper functions (`event`, `save_csv`).
- Defines Kalshi tickers for a word-based prediction market related to Fed mentions.
```python
import re
import datetime
from pathlib import Path

from kalshi_functions import save_csv, event

WORD_SERIES = 'KXFEDMENTION'
WORD_EVENT = 'KXFEDMENTION-26JAN'
WORD_MARKET = 'KXFEDMENTION-26JAN-PROJ'
# link -> https://kalshi.com/markets/kxfedmention/fed-mention/kxfedmention-26jan
```
2. Load and Parse Transcript Files
- Reads all `.txt` files from a directory containing FOMC press conference transcripts.
- Extracts the date from each filename and converts it into a `datetime` object.
- Stores each transcript's:
  - filename
  - date
  - full text content
  - placeholder for Jerome Powell–only text
3. Extract Jerome Powell’s Speech
- Splits the transcript text using `<NAME>...</NAME>` tags.
- Identifies sections following `<NAME>CHAIR POWELL</NAME>`.
- Concatenates only Jerome Powell's spoken text into a single string per transcript.
```python
# Open TXT files
p = Path('../fed_test/fed_press_conferences')
files = []
for file_path in p.glob('*.txt'):
    with open(file_path, 'r', encoding='utf-8-sig') as f:
        content = f.read()
    # Filenames look like FOMCpresconfYYYYMMDD.txt; pull the date out.
    date_str = file_path.name.replace('FOMCpresconf', '').replace('.txt', '')
    files.append({'file_name': file_path.name,
                  'date': datetime.datetime(year=int(date_str[:4]), month=int(date_str[4:6]), day=int(date_str[6:])),
                  'content': content,
                  'j_text': []})

# Get Jerome-only text.
for d in files:
    # Split on speaker tags, keeping the tags so we can check who spoke next.
    # (Bug fix: this previously read files[0]['content'], so every entry got the first transcript's text.)
    str_lst = re.split(r'(<NAME>.*?</NAME>\. )', d['content'])
    j_text = []
    for i, s in enumerate(str_lst):
        if s == '<NAME>CHAIR POWELL</NAME>. ':
            j_text.append(str_lst[i + 1].strip())
    d['j_text'] = ' '.join(j_text)
```
4. Retrieve Kalshi Market Words
- Queries the Kalshi API for the specified event.
- Extracts the list of words tied to individual markets.
- Handles compound entries (e.g., `"inflation / inflationary"`) by splitting them.
- Associates each word with its current market price.
```python
# Kalshi
# Get event words
categories = event(event_ticker=WORD_EVENT)

# Collect words. Split compound entries where necessary.
word_lst = []
for c in categories['markets']:
    if '/' in c['custom_strike']['Word']:
        ws = c['custom_strike']['Word'].split(' / ')
        for w in ws:
            word_lst.append({'word': w, 'cost': float(c['last_price_dollars'])})
    else:
        word_lst.append({'word': c['custom_strike']['Word'], 'cost': float(c['last_price_dollars'])})
```
5. Count Word Mentions Per Speech
- For each transcript:
  - Initializes a counter for every tracked word.
  - Counts how many times each word appears in Jerome Powell's text (case-insensitive).
  - Stores per-speech word counts.
```python
# Count how many times each word appears in the individual speeches.
for d in files:
    word_cnt = {wd['word'].lower(): 0 for wd in word_lst}
    for w in word_cnt:
        word_cnt[w] += d['j_text'].lower().count(w)
    d['word_cnt'] = word_cnt
```
Note that we apply `.lower()` to both the speeches and the Kalshi words before counting.
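One consequence of counting with `str.count` (illustrated here with a made-up sentence, not transcript text): it matches substrings, so short words pick up hits from longer words that contain them.

```python
text = "Gasoline prices fell, and natural gas prices fell too."
# str.count on the lowered text matches substrings, so "gas" inside
# "gasoline" is counted along with the standalone "gas".
print(text.lower().count("gas"))  # 2
```

This is the same effect flagged in the Results caveats below for words like "AI" and "Anchor".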
6. Aggregate Statistics Across All Speeches
- For each word:
  - Counts how many speeches included the word at least once (`said`).
  - Counts how many speeches did not (`not_said`).
  - Sums total occurrences across all speeches.
  - Computes the percentage of speeches in which the word was said.
7. Export Results
- Merges Kalshi market data with the aggregated word statistics.
- Saves the final dataset to `2025_01_07_Fed_words.csv`.
```python
# Calc odds and reformat for output.
final_word_cnt = {wd['word'].lower(): {'said': 0, 'not_said': 0, 'count': 0, 'pct_said': 0.0} for wd in word_lst}
for d in files:
    for k, v in d['word_cnt'].items():
        if v:
            final_word_cnt[k]['said'] += 1
            final_word_cnt[k]['count'] += v
        else:
            final_word_cnt[k]['not_said'] += 1
for k, v in final_word_cnt.items():
    v['pct_said'] = v['said'] / len(files)

save_csv(file_path='2025_01_07_Fed_words.csv', data=[wd | final_word_cnt[wd['word'].lower()] for wd in word_lst])
```
Purpose
The script combines natural language analysis of Fed press conferences with Kalshi market data to estimate how frequently Jerome Powell mentions specific words and how consistently those words appear across speeches, enabling comparison with market-implied expectations.
Results
Since the markets are binary contracts, the cost can be read as the market-implied probability that the word or phrase will be said. This means that where there is a delta between Cost and Pct Said, there may be some alpha.
A few caveats: words like "AI" or "Anchor" (most likely appearing as "Anchored") can occur inside other words. Be mindful of that; the substring search over the speeches does not account for word boundaries.
Key standouts are rows where Pct Said is meaningfully greater or less than Cost.
| Word | Cost | Pct Said | Said | Not Said | Count | Jan 26 |
|---|---|---|---|---|---|---|
| Egg | 0.05 | 0.0 | 0 | 43 | 0 | No |
| Trade War | 0.05 | 0.0 | 0 | 43 | 0 | Yes |
| Soft Landing | 0.07 | 0.093 | 4 | 39 | 8 | No |
| Bitcoin | 0.07 | 0.047 | 2 | 41 | 2 | No |
| National Debt | 0.08 | 0.0 | 0 | 43 | 0 | No |
| ADP | 0.09 | 0.023 | 1 | 42 | 1 | No |
| Stagflation | 0.10 | 0.023 | 1 | 42 | 3 | No |
| Anchor | 0.11 | 0.977 | 42 | 1 | 113 | No |
| Gold | 0.12 | 0.023 | 1 | 42 | 1 | No |
| Pardon | 0.13 | 0.116 | 5 | 38 | 6 | No |
| Trump | 0.14 | 0.047 | 2 | 41 | 2 | No |
| Gas | 0.16 | 0.233 | 10 | 33 | 18 | No |
| Gasoline | 0.16 | 0.163 | 7 | 36 | 9 | No |
| Natural Gas | 0.16 | 0.023 | 1 | 42 | 2 | No |
| Yield Curve | 0.16 | 0.14 | 6 | 37 | 11 | No |
| Consumer Confidence | 0.18 | 0.07 | 3 | 40 | 3 | No |
| Dot Plot | 0.20 | 0.116 | 5 | 38 | 9 | No |
| Recession | 0.23 | 0.535 | 23 | 20 | 56 | No |
| QE | 0.24 | 0.14 | 6 | 37 | 8 | No |
| Quantitative Easing | 0.24 | 0.093 | 4 | 39 | 4 | No |
| Tax | 0.31 | 0.163 | 7 | 36 | 13 | No |
| Beige Book | 0.33 | 0.14 | 6 | 37 | 10 | Yes |
| Volatility | 0.35 | 0.14 | 6 | 37 | 8 | No |
| Probability | 0.36 | 0.163 | 7 | 36 | 8 | No |
| QT | 0.40 | 0.256 | 11 | 32 | 15 | No |
| Quantitative Tightening | 0.40 | 0.0 | 0 | 43 | 0 | No |
| Median | 0.42 | 0.698 | 30 | 13 | 160 | No |
| Dissent | 0.42 | 0.163 | 7 | 36 | 19 | No |
| Projection | 0.50 | 0.674 | 29 | 14 | 180 | Yes |
| Pandemic | 0.57 | 0.953 | 41 | 2 | 234 | Yes |
| Shutdown | 0.59 | 0.279 | 12 | 31 | 21 | Yes |
| Shut Down | 0.59 | 0.047 | 2 | 41 | 3 | Yes |
| Tariff Inflation | 0.59 | 0.093 | 4 | 39 | 13 | Yes |
| Softening | 0.61 | 0.442 | 19 | 24 | 51 | Yes |
| Credit | 0.63 | 0.512 | 22 | 21 | 99 | No |
| Goods inflation | 0.69 | 0.279 | 12 | 31 | 33 | No |
| AI | 0.80 | 1.0 | 43 | 0 | 2966 | Yes |
| Artificial Intelligence | 0.80 | 0.023 | 1 | 42 | 1 | Yes |
| Layoff | 0.83 | 0.256 | 11 | 32 | 25 | Yes |
| Balance Sheet | 0.84 | 0.907 | 39 | 4 | 183 | No |
| Uncertainty | 0.84 | 0.791 | 34 | 9 | 112 | Yes |
| Restrictive | 0.85 | 0.698 | 30 | 13 | 199 | Yes |
| Unchanged | 0.89 | 0.419 | 18 | 25 | 21 | Yes |
| Balance of Risk | 0.94 | 0.558 | 24 | 19 | 66 | Yes |
| Expectation | 0.95 | 1.0 | 43 | 0 | 322 | Yes |
| Good Afternoon | 0.97 | 1.0 | 43 | 0 | 43 | Yes |
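The Cost vs. Pct Said screen described above can be sketched as follows. This is a hypothetical illustration, not part of the original script: the rows are copied from the table above, and the 0.25 threshold is an arbitrary choice.

```python
# Hypothetical screen: flag words where the market price and the
# historical frequency diverge by more than a threshold.
rows = [
    {'word': 'Anchor',    'cost': 0.11, 'pct_said': 0.977},
    {'word': 'Recession', 'cost': 0.23, 'pct_said': 0.535},
    {'word': 'Unchanged', 'cost': 0.89, 'pct_said': 0.419},
]

THRESHOLD = 0.25  # arbitrary cutoff for an "interesting" divergence

edges = {r['word']: r['pct_said'] - r['cost'] for r in rows}
for word, edge in edges.items():
    if abs(edge) > THRESHOLD:
        # Positive edge: history says the word is said more often than the market prices in.
        print(f"{word}: edge {edge:+.3f}")
```

A positive edge (e.g., Anchor) suggests the Yes side may be underpriced relative to history; a negative edge (e.g., Unchanged) suggests the opposite.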
Next Steps
Further analysis could be done around the following questions:
- We are assuming rates will stay the same; what is the word distribution for earlier speeches when the Fed was in the same position?
- Use regex to find words with `r'\b{word}\b'`, eliminating matches where the word is part of another word.
- Are words used in recent/previous speeches predictive of the next statement?
- Words used in other types of recent releases.
- Does time of year matter?
TODO
- Refine regex for word boundary matching
- Compare with other release types
Last Updated: 2026-01-27