Trigram Viterbi POS Tagger

NLP · Python, HMMs, Viterbi, Penn Treebank

Overview

Implemented a part-of-speech tagger based on a trigram Hidden Markov Model trained on Penn Treebank data. Compared greedy, beam-search, and Viterbi decoding under both Laplace and linear-interpolation smoothing.
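
As a rough illustration of the Viterbi decoder named above, here is a minimal sketch of trigram decoding in log space. It is not the project's code: the function name viterbi_trigram, the array layouts log_trans[u, v, w] = log P(w | u, v) and log_emit[t, o] = log P(o | t), and the convention that tag index 0 is a START symbol are all assumptions made for the example, and the STOP transition is left out for brevity.

```python
import numpy as np

def viterbi_trigram(obs, log_trans, log_emit, start=0):
    """Most likely tag sequence under a trigram HMM (illustrative sketch).

    obs       : list of word ids
    log_trans : array (T, T, T), log_trans[u, v, w] = log P(w | u, v)  (assumed layout)
    log_emit  : array (T, V),    log_emit[t, o]     = log P(o | t)     (assumed layout)
    start     : index of the START tag (assumed to be 0)
    """
    n, T = len(obs), log_emit.shape[0]
    if n == 0:
        return []

    # delta[u, v] = best log score of a path whose last two tags are (u, v).
    delta = np.full((T, T), -np.inf)
    delta[start, :] = log_trans[start, start, :] + log_emit[:, obs[0]]

    backptrs = []  # backptrs[k - 1][v, w] = best tag at position k - 2
    for k in range(1, n):
        # scores[u, v, w] = delta[u, v] + log P(w | u, v)
        scores = delta[:, :, None] + log_trans
        backptrs.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + log_emit[:, obs[k]][None, :]

    # Pick the best final tag pair, then follow back-pointers.
    u, v = np.unravel_index(delta.argmax(), delta.shape)
    tags = [0] * n
    tags[n - 1] = int(v)
    if n >= 2:
        tags[n - 2] = int(u)
    for k in range(n - 1, 1, -1):
        tags[k - 2] = int(backptrs[k - 1][tags[k - 1], tags[k]])
    return tags
```

Each step costs O(T^3) for T tags because the state is a tag pair. Greedy and beam search trade that exactness for speed by keeping only one or a few partial paths per position.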

Details

Parameters were estimated by maximum likelihood from WSJ-tagged text. Unknown words were handled with a suffix tree built from the training set. Two smoothing schemes were used: Laplace smoothing (α = 0.01) and linear interpolation of unigram, bigram, and trigram estimates. Transition and emission probabilities were computed with vectorized NumPy operations to keep training and inference fast.
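
The two smoothing schemes can be sketched in vectorized NumPy roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the count-array layouts, and the interpolation weights (0.1, 0.3, 0.6) are hypothetical, and the bigram and unigram counts are derived by marginalizing the trigram counts, which ignores sentence-boundary effects. Only α = 0.01 comes from the description above.

```python
import numpy as np

def laplace_emissions(emit_counts, alpha=0.01):
    """Add-alpha smoothed P(word | tag).

    emit_counts[t, w] = number of times word w was tagged t (assumed layout).
    """
    vocab_size = emit_counts.shape[1]
    totals = emit_counts.sum(axis=1, keepdims=True)
    return (emit_counts + alpha) / (totals + alpha * vocab_size)

def safe_div(num, den):
    """Elementwise num / den, returning 0 where the denominator is 0."""
    den = np.broadcast_to(den, num.shape)
    out = np.zeros(num.shape, dtype=float)
    return np.divide(num, den, out=out, where=den > 0)

def interpolated_transitions(tri_counts, lambdas=(0.1, 0.3, 0.6)):
    """Linearly interpolated P(w | u, v).

    tri_counts[u, v, w] = count of tag trigram (u, v, w) (assumed layout).
    lambdas = (unigram, bigram, trigram) weights; hypothetical values that
    should sum to 1 and would normally be tuned on held-out data.
    """
    l1, l2, l3 = lambdas
    bi_counts = tri_counts.sum(axis=0)    # bi_counts[v, w]
    uni_counts = bi_counts.sum(axis=0)    # uni_counts[w]

    p_uni = uni_counts / uni_counts.sum()
    p_bi = safe_div(bi_counts, bi_counts.sum(axis=1, keepdims=True))
    p_tri = safe_div(tri_counts, tri_counts.sum(axis=2, keepdims=True))

    # Broadcast the lower-order estimates up to shape (T, T, T).
    return l1 * p_uni[None, None, :] + l2 * p_bi[None, :, :] + l3 * p_tri
```

With these definitions every seen word or tag context keeps a proper distribution, while unseen emissions get a small nonzero probability instead of zero. In this sketch, a tag context that never occurs in training contributes nothing from the higher-order terms, so its row sums to less than 1 unless the weights are renormalized.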