Automatic workflow for the classification of local DNA conformations

Čech P.1, Kukal J.2, Schneider B.3, Černý J.3, Svozil D.1

1Laboratory of Informatics and Chemistry, ICT Prague, Technická 5, 166 28, Prague 6, Czech Republic.

2Faculty of Nuclear Sciences and Physical Engineering, CTU Prague, Trojanova 13, 122 00, Prague 2, Czech Republic.

3Institute of Biotechnology AS CR, v. v. i., Vídeňská 1083, 142 00, Prague 4, Czech Republic.

 

A large number of crystal and NMR structures reveal considerable structural polymorphism of DNA at the local level. DNA is highly variable, with dinucleotide steps exhibiting substantial sequence-dependent flexibility. The existing classification of DNA dinucleotides [1] relies on a considerable amount of manual work, which is time-consuming and error-prone. To overcome this limitation, we developed an automatic workflow for the classification of DNA dinucleotide conformations [2]. Using the workflow, dinucleotides with unassigned conformations can either be classified into one of the 24 already known classes or be flagged as unclassifiable. New classes, if any exist in the set of unclassified dinucleotides, are automatically identified by our nonhierarchical single-pass clustering algorithm. The project illustrates the utility of various machine learning approaches in the classification of local DNA conformations.
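
The two stages described above (assignment to a known class or flagging as unclassifiable, followed by single-pass clustering of the unclassified remainder) can be illustrated with a minimal Python sketch. It is not the published implementation: it assumes that each dinucleotide is represented by a vector of torsion angles in degrees, that known classes are summarized by hypothetical center vectors, and that a simple distance threshold decides membership. The clustering shown is the generic "leader" variant of single-pass clustering, and all names, angle values, and thresholds are illustrative assumptions.

```python
import math

def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def torsion_distance(x, y):
    """Euclidean-style distance between torsion vectors that respects periodicity."""
    return math.sqrt(sum(angular_diff(a, b) ** 2 for a, b in zip(x, y)))

def classify(step, class_centers, threshold):
    """Return the nearest known class label, or None if no class is close enough."""
    best_label, best_dist = None, float("inf")
    for label, center in class_centers.items():
        d = torsion_distance(step, center)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else None

def single_pass_clusters(points, threshold):
    """Leader clustering: one pass over the data, no hierarchy.

    Each point joins the first existing leader within `threshold`;
    otherwise it becomes the leader of a new cluster.
    """
    leaders, members = [], []
    for p in points:
        for i, leader in enumerate(leaders):
            if torsion_distance(p, leader) <= threshold:
                members[i].append(p)
                break
        else:
            leaders.append(p)
            members.append([p])
    return members

# Hypothetical two-angle centers; real dinucleotide steps need more torsions.
class_centers = {"A": (300.0, 180.0), "B": (60.0, 170.0)}
steps = [(305.0, 175.0), (180.0, 60.0), (185.0, 65.0), (10.0, 300.0), (355.0, 295.0)]

labels = [classify(s, class_centers, threshold=30.0) for s in steps]
unclassified = [s for s, lab in zip(steps, labels) if lab is None]
candidate_classes = single_pass_clusters(unclassified, threshold=30.0)
```

In this sketch the first step is assigned to the hypothetical class "A", and the remaining four steps are flagged as unclassifiable and grouped into two candidate clusters. Note that leader clustering depends on the order in which points are visited, which is a known property of single-pass methods in general.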

 

1. Svozil D., et al., DNA conformations and their sequence preferences. Nucleic Acids Research, 2008, 36(11):3690-3706.

2. Čech P., et al., Automatic workflow for the classification of local DNA conformations. BMC Bioinformatics, 2013, 14(1).