Codon Usage Signatures and Codon Pair Usage in Genes Associated with Multiple Sclerosis
Abstract
This study investigates codon usage patterns and biases in 258 Multiple Sclerosis (MS)-associated genes
compared to 137 housekeeping (HK) genes, using computational and statistical approaches. Codon usage
indices such as RSCU, CAI, ENC, GC3%, and the P2 index were analyzed alongside rare amino acid
usage, mutation-driven nucleotide skew, neutrality and parity plots, and codon pair bias. Machine learning
models (SVM, RF, KNN) were trained on different codon feature sets to classify MS vs. HK genes.
Results showed that MS genes exhibit a GC-rich codon bias, strong translational selection (P2 > 0.5), and
mutation pressure predominantly at third codon positions. The usage of seventeen codons differed
significantly between MS and HK genes by the Mann-Whitney U test. SVM achieved the highest classification accuracy (81%)
with the full codon feature set, while feature selection improved performance for the other models. The findings underscore
the influence of both compositional and adaptive forces on MS gene codon usage, with potential
implications for gene therapy and synthetic design.
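
As an illustration of two of the indices named above, the following is a minimal Python sketch of how RSCU and GC3% are commonly computed for a single coding sequence. It is not the study's code: the function names (split_codons, rscu, gc3), the use of the standard genetic code, and the choice to count every complete codon (including the stop codon) in GC3 are assumptions made here for illustration. RSCU is taken as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (assumed here; the study's table is not stated).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split a coding sequence into complete, recognized codons (hypothetical helper)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon, if any
    triplets = (cds[i:i + 3] for i in range(0, usable, 3))
    return [t for t in triplets if t in CODON_TABLE]


def rscu(cds):
    """Relative synonymous codon usage for each codon of a multi-codon amino acid."""
    counts = Counter(split_codons(cds))
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded from RSCU
            families.setdefault(aa, []).append(codon)
    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if len(family) == 1 or total == 0:
            continue  # Met/Trp have no synonyms; unused amino acids are undefined
        for c in family:
            # observed count / (total count / number of synonymous codons)
            values[c] = counts[c] * len(family) / total
    return values


def gc3(cds):
    """Fraction of codons with G or C at the third position (assumption: all codons counted)."""
    thirds = [c[2] for c in split_codons(cds)]
    return sum(b in "GC" for b in thirds) / len(thirds) if thirds else 0.0


if __name__ == "__main__":
    toy = "ATGGCCGCCGCTAAATAA"  # toy sequence, not data from the study
    print(rscu(toy)["GCC"], gc3(toy))
```

An RSCU value above 1 indicates a codon used more often than expected under equal synonymous usage, and a value below 1 indicates the opposite. Per-gene RSCU vectors of this kind are one plausible form for the codon feature sets passed to the classifiers, although the exact feature construction is not described in the abstract.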