Prediction Markets: Probability of Jerome Powell using Kalshi words in FOMC speech.

Kalshi: Expected Words Said

Essentially, we are looking to determine the likelihood that Jerome Powell will say certain words.

Kalshi: Expected Rate Change

Keep in mind that people expect the Fed to maintain rates as they are.

Data

We sourced the files directly from the FOMC website. They are downloaded as PDFs, then opened and parsed. Most of this was done by refactoring an existing repo (linked below).

Our data includes speeches from 2020 to the present: 43 files in total.

Link to the repo used

Note that the Q&A is included; however, if a word appears only in a question, it is not counted. Only words spoken by Jerome Powell are used.

Code Overview

This script analyzes Federal Reserve press conference transcripts to measure how often specific words (defined by a Kalshi prediction market) are spoken by Chair Jerome Powell, then aggregates those results and saves them to a CSV file.


Step-by-Step Description

1. Imports and Constants

  • Imports standard libraries for regex processing, dates, and file handling.
  • Imports custom Kalshi helper functions (event, save_csv).
  • Defines Kalshi tickers for a word-based prediction market related to Fed mentions.
import re
import datetime
from pathlib import Path
from kalshi_functions import save_csv, event

WORD_SERIES = 'KXFEDMENTION'
WORD_EVENT = 'KXFEDMENTION-26JAN'
WORD_MARKET = 'KXFEDMENTION-26JAN-PROJ'

# link -> https://kalshi.com/markets/kxfedmention/fed-mention/kxfedmention-26jan

2. Load and Parse Transcript Files

  • Reads all .txt files from a directory containing FOMC press conference transcripts.
  • Extracts the date from each filename and converts it into a datetime object.
  • Stores each transcript’s:
    • filename
    • date
    • full text content
    • placeholder for Jerome Powell–only text

3. Extract Jerome Powell’s Speech

  • Splits the transcript text using <NAME>...</NAME> tags.
  • Identifies the sections that follow <NAME>CHAIR POWELL</NAME>.
  • Concatenates only Jerome Powell’s spoken text into a single string per transcript.
# Open TXT files
p = Path('..\\fed_test\\fed_press_conferences')

files = []
for file_path in p.glob('*.txt'):
    with open(file_path, 'r', encoding='utf-8-sig') as f:
        content = f.read()
        date_str = file_path.name.replace('FOMCpresconf', '').replace('.txt', '')
        files.append({'file_name': file_path.name, 
                      'date': datetime.datetime(year=int(date_str[:4]), month=int(date_str[4:6]), day=int(date_str[6:])),
                      'content': content,
                      'j_text': []})

# Get Jerome Powell-only text from each transcript.
for d in files:
    str_lst = re.split(r'(<NAME>.*?</NAME>. )', d['content'])
    j_text = []
    for i, s in enumerate(str_lst):
        if s == '<NAME>CHAIR POWELL</NAME>. ':
            j_text.append(str_lst[i + 1].strip())
    d['j_text'] = ' '.join(j_text)

4. Retrieve Kalshi Market Words

  • Queries the Kalshi API for the specified event.
  • Extracts the list of words tied to individual markets.
  • Handles compound words (e.g., "inflation / inflationary") by splitting them.
  • Associates each word with its current market price.
# Kalshi
# Get event words
categories = event(event_ticker=WORD_EVENT)

# Collect Words. Split where necessary.
word_lst = []
for c in categories['markets']:
    if '/' in c['custom_strike']['Word']:
        ws = c['custom_strike']['Word'].split(' / ')
        for w in ws:
            word_lst.append({'word': w, 'cost': float(c['last_price_dollars'])})
    else:
        word_lst.append({'word': c['custom_strike']['Word'], 'cost': float(c['last_price_dollars'])})

5. Count Word Mentions Per Speech

  • For each transcript:
    • Initializes a counter for every tracked word.
    • Counts how many times each word appears in Jerome Powell’s text (case-insensitive).
    • Stores per-speech word counts.
# Count how many times each Kalshi word appears in the individual speeches.
for f in files:
    word_cnt = {w['word'].lower(): 0 for w in word_lst}

    for w in word_cnt:
        word_cnt[w] += f['j_text'].lower().count(w)
    f['word_cnt'] = word_cnt

Note that we apply .lower() to both the speeches and the Kalshi words, so matching is case-insensitive.
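As a quick illustration of what plain substring matching does (a sketch, not part of the original script): a short word like "AI" can match inside unrelated words, which inflates the counts.

```python
# Sketch: str.count() matches substrings, not whole words.
text = "We will maintain our stance to contain inflation."

# "ai" appears twice inside "maintain" and once inside "contain",
# so substring counting reports 3 hits even though the word "AI"
# never actually occurs.
print(text.lower().count("ai"))
```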

6. Aggregate Statistics Across All Speeches

  • For each word:
    • Counts how many speeches included the word at least once (said).
    • Counts how many speeches did not (not_said).
    • Sums total occurrences across all speeches.
    • Computes the percentage of speeches where the word was said.

7. Export Results

  • Merges Kalshi market data with the aggregated word statistics.
  • Saves the final dataset to 2025_01_07_Fed_words.csv.
# Calc Odds and reformat for output.
final_word_cnt = {i['word'].lower(): {'said': 0, 'not_said': 0, 'count': 0, 'pct_said': 0.0} for i in word_lst}
for i in files:
    for k, v in i['word_cnt'].items():
        if v:
            final_word_cnt[k]['said'] += 1
            final_word_cnt[k]['count'] += v
        else:
            final_word_cnt[k]['not_said'] += 1


for k, v in final_word_cnt.items():
    v['pct_said'] = v['said'] / len(files)
    # print(f"{k}: ({v['said']}, {v['not_said']}, {v['count']}), ({v['said'] / len(files)})")

save_csv(file_path='2025_01_07_Fed_words.csv', data=[i | final_word_cnt[i['word'].lower()] for i in word_lst])

Purpose

The script combines natural language analysis of Fed press conferences with Kalshi market data to estimate how frequently Jerome Powell mentions specific words and how consistently those words appear across speeches, enabling comparison with market-implied expectations.


Results

Since these are binary markets, the cost can be read as the market-implied probability that the word or phrase will be said. Where there is a delta between Cost and Pct Said, there may be some alpha.
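One way to surface those deltas is a minimal sketch like the following; the field names mirror the CSV columns, and the two rows are pulled from the results table below for illustration.

```python
# Sketch: rank words by the gap between market cost and historical frequency.
# A positive edge means the market prices the word above its historical rate;
# a negative edge means the market prices it below.
rows = [
    {'word': 'Anchor', 'cost': 0.11, 'pct_said': 0.977},
    {'word': 'Layoff', 'cost': 0.83, 'pct_said': 0.256},
]

for r in rows:
    r['edge'] = r['cost'] - r['pct_said']

# Largest absolute mispricing first.
rows.sort(key=lambda r: abs(r['edge']), reverse=True)
for r in rows:
    print(f"{r['word']}: cost={r['cost']:.2f}, pct_said={r['pct_said']:.3f}, edge={r['edge']:+.3f}")
```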

A few caveats: words like AI or Anchor (probably Anchored) can appear inside other words. Be mindful that the substring search used here does not account for this.
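A hedged sketch of the word-boundary fix flagged in the TODO below, using `re.escape` so multi-word phrases are handled too (the helper name `count_word` is my own, not from the script):

```python
import re

def count_word(word: str, text: str) -> int:
    """Count whole-word, case-insensitive matches of `word` in `text`."""
    pattern = r'\b' + re.escape(word.lower()) + r'\b'
    return len(re.findall(pattern, text.lower()))

text = "AI is not contained in maintain; AI stands alone."
print(count_word('ai', text))  # counts only the two standalone occurrences
```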

Key standouts are the words where Pct Said is meaningfully greater or less than Cost.

Word                      Cost   Pct Said   Said   Not Said   Count   Jan 26
Egg                       0.05   0.000      0      43         0       No
Trade War                 0.05   0.000      0      43         0       Yes
Soft Landing              0.07   0.093      4      39         8       No
Bitcoin                   0.07   0.047      2      41         2       No
National Debt             0.08   0.000      0      43         0       No
ADP                       0.09   0.023      1      42         1       No
Stagflation               0.10   0.023      1      42         3       No
Anchor                    0.11   0.977      42     1          113     No
Gold                      0.12   0.023      1      42         1       No
Pardon                    0.13   0.116      5      38         6       No
Trump                     0.14   0.047      2      41         2       No
Gas                       0.16   0.233      10     33         18      No
Gasoline                  0.16   0.163      7      36         9       No
Natural Gas               0.16   0.023      1      42         2       No
Yield Curve               0.16   0.140      6      37         11      No
Consumer Confidence       0.18   0.070      3      40         3       No
Dot Plot                  0.20   0.116      5      38         9       No
Recession                 0.23   0.535      23     20         56      No
QE                        0.24   0.140      6      37         8       No
Quantitative Easing       0.24   0.093      4      39         4       No
Tax                       0.31   0.163      7      36         13      No
Beige Book                0.33   0.140      6      37         10      Yes
Volatility                0.35   0.140      6      37         8       No
Probability               0.36   0.163      7      36         8       No
QT                        0.40   0.256      11     32         15      No
Quantitative Tightening   0.40   0.000      0      43         0       No
Median                    0.42   0.698      30     13         160     No
Dissent                   0.42   0.163      7      36         19      No
Projection                0.50   0.674      29     14         180     Yes
Pandemic                  0.57   0.953      41     2          234     Yes
Shutdown                  0.59   0.279      12     31         21      Yes
Shut Down                 0.59   0.047      2      41         3       Yes
Tariff Inflation          0.59   0.093      4      39         13      Yes
Softening                 0.61   0.442      19     24         51      Yes
Credit                    0.63   0.512      22     21         99      No
Goods inflation           0.69   0.279      12     31         33      No
AI                        0.80   1.000      43     0          2966    Yes
Artificial Intelligence   0.80   0.023      1      42         1       Yes
Layoff                    0.83   0.256      11     32         25      Yes
Balance Sheet             0.84   0.907      39     4          183     No
Uncertainty               0.84   0.791      34     9          112     Yes
Restrictive               0.85   0.698      30     13         199     Yes
Unchanged                 0.89   0.419      18     25         21      Yes
Balance of Risk           0.94   0.558      24     19         66      Yes
Expectation               0.95   1.000      43     0          322     Yes
Good Afternoon            0.97   1.000      43     0          43      Yes

Next Steps

Further analysis can be done around the following question:

  • We are assuming rates will stay the same; what is the word distribution for speeches from earlier periods when the Fed was in the same position?
  • Use regex with r'\b{word}\b' to match whole words, eliminating hits where the word is part of another word.
  • Are words used in recent/previous speeches predictive of the next statement?
  • Words used in other types of recent releases.
  • Does the time of year matter?

TODO

  • Refine regex for word boundary matching
  • Compare with other release types


Last Updated: 2026-01-27
