Adjusting for background mutation frequency biases improves the identification of cancer driver genes.


A common goal of tumor sequencing projects is finding genes whose mutations are selected for during tumor development. This is accomplished by choosing genes that have more non-synonymous mutations than expected from an estimated background mutation frequency. While this background frequency is unknown, it can be estimated using both the observed synonymous mutation frequency and the non-synonymous to synonymous mutation ratio. The synonymous mutation frequency can be determined across all genes or in a gene-specific manner. This choice introduces an interesting trade-off. A gene-specific frequency adjusts for an underlying mutation bias, but is difficult to estimate given missing synonymous mutation counts. Using a genome-wide synonymous frequency is more robust, but is less suited for adjusting biases. Studying four evaluation criteria for identifying genes with high non-synonymous mutation burden (reflecting preferential selection of expressed genes, genes with mutations in conserved bases, genes with many protein interactions, and genes that show loss of heterozygosity), we find that the gene-specific synonymous frequency is superior in the gene expression and protein interaction tests. In conclusion, the use of the gene-specific synonymous mutation frequency is well suited for assessing a gene’s non-synonymous mutation burden.

IEEE transactions on nanobioscience