Roustaei, Javad (2025) Narrative Signals from Bank Filings as Early-Warning Indicators: Unsupervised NLP with Regime-Switching Models. Masters thesis, Concordia University.
Text (application/pdf), 3MB: Roustaei_MSc_F2025.pdf - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Financial distress in banks often surfaces first as shifts in narrative tone and content within
mandated disclosures. This thesis studies whether unsupervised Natural Language Processing (NLP) features extracted from U.S. bank 10-K/10-Q filings can anticipate transitions
into high-risk regimes. We build filing-level signals from (i) dictionary-based sentiment
with negation/intensity handling, (ii) topic-mixture drift measured by Jensen–Shannon
divergence, and (iii) section-focused embedding clusters (MD&A and Risk Factors) with
a cluster-change indicator. These features are integrated in a parsimonious two-state
Gaussian Hidden Markov Model (HMM) to produce a continuous distress probability per
filing.
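
As an illustration of signal (ii), the following is a minimal sketch of topic-mixture drift measured by Jensen–Shannon divergence between consecutive filings. It is not the thesis code: the function names `js_divergence` and `topic_drift`, the base-2 logarithm, and the comparison of each filing with its immediate predecessor are assumptions made for this example.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    p and q are sequences of non-negative weights that each sum to 1,
    e.g. the topic mixtures of two filings. The result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    # Midpoint distribution M = (P + Q) / 2
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i = 0 contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_drift(mixtures):
    """Drift of each filing's topic mixture relative to the previous filing.

    Assumption for this sketch: drift is measured between consecutive
    filings, so the output has one fewer element than the input.
    """
    return [js_divergence(prev, cur) for prev, cur in zip(mixtures, mixtures[1:])]


if __name__ == "__main__":
    # Hypothetical topic mixtures for three consecutive filings.
    filings = [[0.7, 0.2, 0.1], [0.7, 0.2, 0.1], [0.2, 0.3, 0.5]]
    print(topic_drift(filings))
```

With base-2 logarithms the divergence is bounded by 1, which makes the drift feature comparable across filings regardless of the number of topics.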
Evaluation uses market-based forward drawdown labels (e.g., ≤ −20% within 60–120
trading days) and emphasizes precision, recall, F1, AUC-PR, and lead time rather than raw
accuracy. Single-feature HMMs (sentiment only; cluster-change only) provide transparent
baselines. A multifeature HMM improves recall of distress episodes relative to those
baselines but can generate more false alarms. To increase actionability, we introduce a
hybrid regime–market filter that requires both an elevated HMM distress probability and
contemporaneous market stress (elevated trailing drawdown or volatility) with a short
persistence rule. This hybrid step substantially lifts precision and F1—typically by several
tens of percentage points—while retaining non-trivial lead time (often one filing) in case
studies such as Silicon Valley Bank (SVB) versus a non-failure peer, Huntington Bancshares
(HBAN).
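
The labeling and filtering steps described above can be sketched as follows. This is a hedged illustration, not the thesis implementation: the function names, the default thresholds, the definition of forward drawdown as the worst return relative to the filing-date price, and the reading of "persistence" as a run of consecutive qualifying filings are all assumptions made for this example.

```python
def forward_drawdown_label(prices, t, horizon=120, threshold=-0.20):
    """Return 1 if the price falls by at least |threshold| within `horizon`
    trading days after index t, else 0.

    Assumption: drawdown is measured relative to the price at index t
    (the filing date), not relative to a running peak.
    """
    window = prices[t + 1 : t + 1 + horizon]
    if not window:
        return 0  # no forward data available
    worst_return = min(window) / prices[t] - 1.0
    return int(worst_return <= threshold)


def hybrid_alerts(distress_prob, market_stress, prob_threshold=0.5, persistence=2):
    """Flag a filing only when the HMM distress probability is elevated AND
    market stress is present, for `persistence` consecutive filings.

    distress_prob : per-filing distress probabilities from the HMM
    market_stress : per-filing booleans (e.g. elevated trailing drawdown
                    or volatility), computed elsewhere
    """
    alerts = []
    run = 0  # length of the current run of filings meeting both conditions
    for prob, stressed in zip(distress_prob, market_stress):
        run = run + 1 if (prob >= prob_threshold and stressed) else 0
        alerts.append(run >= persistence)
    return alerts


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(forward_drawdown_label([100, 95, 79, 90], t=0, horizon=3))
    print(hybrid_alerts([0.9, 0.8, 0.7, 0.2], [False, True, True, True]))
```

Requiring both conditions trades some recall for precision: a high HMM probability without market confirmation, or a single isolated qualifying filing, does not raise an alert.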
Robustness checks vary distress windows, thresholds, persistence, and regime count,
and show qualitatively stable trade-offs between sensitivity and specificity. The contribution
is a transparent, reproducible pipeline that couples unsupervised narrative signals with
regime switching and a pragmatic market confirmation step, yielding an early-warning
signal suitable for supervisory screening and risk monitoring.
| Divisions: | Concordia University > Faculty of Arts and Science > Mathematics and Statistics |
|---|---|
| Item Type: | Thesis (Masters) |
| Authors: | Roustaei, Javad |
| Institution: | Concordia University |
| Degree Name: | M. Sc. |
| Program: | Mathematics |
| Date: | 29 August 2025 |
| Thesis Supervisor(s): | Brugiapaglia, Simone and Hyndman, Cody |
| ID Code: | 996160 |
| Deposited By: | Javad Roustaei |
| Deposited On: | 04 Nov 2025 17:11 |
| Last Modified: | 04 Nov 2025 17:11 |