Mechanisms of structural variation
Structural variants (SVs) arise through diverse DNA repair pathways, each of which leaves a distinct sequence signature around the breakpoints. However, the criteria used to distinguish these mechanisms are often heuristic and ambiguous. To address this ambiguity, we represent breakpoint-flanking DNA segments using both genomic language model embeddings and suffix-array-based features, and we train machine learning models to classify known SV formation mechanisms and to discover formation patterns among variants that do not fit existing mechanistic categories.
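As an illustration of what a suffix-array-based feature could look like, the sketch below computes the longest substring shared by the two flanks of a breakpoint, a rough proxy for breakpoint homology. The text does not specify which features are used, so the choice of feature, the function names, and the `#` separator are all assumptions made for this example.

```python
# Hypothetical sketch: one possible suffix-array-derived feature for a
# pair of breakpoint flanks. Not taken from the described method.

def suffix_array(text):
    """Naive suffix array (sort suffix start positions); fine for short flanks."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[k] = length of the common prefix of suffixes sa[k-1] and sa[k]."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def longest_shared_substring(left, right, sep="#"):
    """Longest substring present in both flanks.

    Assumes `sep` occurs in neither flank, so no match can span the join.
    """
    text = left + sep + right
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    boundary = len(left)  # index of the separator
    best = ""
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        # Only compare adjacent suffixes that start in different flanks.
        if (a < boundary) != (b < boundary) and lcp[k] > len(best):
            best = text[b:b + lcp[k]]
    return best


if __name__ == "__main__":
    shared = longest_shared_substring("ACGTTTGACCA", "GGTGACCTTA")
    print(shared, len(shared))
```

The length of the returned substring could serve as one numeric input to a classifier alongside the embedding vector. For long flanks, the naive construction here would need to be replaced by a linear-time suffix array algorithm.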