Proteogenomics is a research field in which proteomic data are used to improve gene annotation. To achieve this, customized protein databases are constructed so that proteomic data can be matched to sequences missing from the reference annotation. We perform a proteogenomic analysis of N-terminal COFRADIC (combined fractional diagonal chromatography) data to identify novel translation initiation sites. We use a multistage search strategy in which spectra that remain unidentified after a search against the Arabidopsis (Arabidopsis thaliana) proteome are carried forward to the proteogenomic analysis. In this second stage, the unidentified spectra are searched against a customized N-terminal peptide library derived from both a six-frame translation of the Arabidopsis genome and Augustus-predicted gene models.
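The six-frame half of the library construction can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes the standard genetic code (NCBI table 1) and considers only ATG start codons, ignoring near-cognate starts. It also assumes a trypsin-like cleavage after K or R that ignores the proline rule, adds a variant with the initiator methionine removed, and applies hypothetical length bounds (`min_len`, `max_len`). The Augustus-derived entries are not covered. The function names are illustrative.

```python
from typing import Iterator, Set, Tuple

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE = dict(zip(_CODONS, _AMINO_ACIDS))

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; codons with ambiguous bases (e.g. N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (strand, frame offset, protein) for the three frames on each strand."""
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            yield strand, frame, translate(seq[frame:])


def n_terminal_peptides(dna: str, min_len: int = 7, max_len: int = 40) -> Set[str]:
    """Collect candidate N-terminal peptides starting at every Met (ATG) in six frames.

    Assumptions (hypothetical): ATG-only starts, cleavage after the first K or R,
    and both Met-retained and Met-removed variants are kept.
    """
    peptides: Set[str] = set()
    for _strand, _frame, protein in six_frame_translation(dna):
        for start, aa in enumerate(protein):
            if aa != "M":
                continue
            # Open reading frame from this Met up to (not including) the next stop.
            orf = protein[start:].split("*", 1)[0]
            # Index of the first K/R; if none, keep the whole ORF.
            cut = next((i for i, r in enumerate(orf) if r in "KR"), len(orf) - 1)
            peptide = orf[: cut + 1]
            for candidate in (peptide, peptide[1:]):
                if min_len <= len(candidate) <= max_len:
                    peptides.add(candidate)
    return peptides
```

For example, `n_terminal_peptides("ATG" + "GCC" * 6 + "AAA" + "GGC" * 3 + "TAA")` should contain both `MAAAAAAK` and its Met-removed form `AAAAAAK`. A real library would probably also store the strand, frame and genomic coordinates of each entry, so that a matching spectrum can be mapped back to a putative start site.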