Scientists at Siberian Federal University speed up genome analysis ten times

Fast text-search algorithms are used extensively in the modern world

MOSCOW, July 25. /TASS/. Scientists in Krasnoyarsk have created a fast sequence-similarity search algorithm that speeds up genome structure analysis by a factor of ten or more, one of the project’s authors, Dr. Sc. (Physics and Mathematics) Sergei Tsaryov, a professor at the Institute of Space and Information Technologies of the Siberian Federal University, told TASS.

Open databases of gene structures let researchers run online searches for similar genetic structures or their components as part of genome analysis. The problem is that processing such requests may take days. Scientists encounter similar problems when assembling the full genome of a living organism, for instance the genomes of coniferous plants, a key feature of which is a high share of repeats. Such a process may keep major computer clusters busy for weeks. Mathematicians are currently working on faster algorithms for finding identical sequences in large amounts of similar text data.

“Our method is called generalizing Nonius scale-assisted quick search. In a sense it resembles the way a caliper works. The tool has a main scale and a Vernier scale, which gives interpolated measurements with an accuracy of fractions of the main scale’s division. This principle accelerates the process dramatically, in some cases ten times and more. Moreover, our algorithm is capable of identifying similar parts of the DNA that other algorithms may overlook,” Tsaryov said.

He noted that fast text-search algorithms are used extensively in the modern world. With fast modern computers and sophisticated algorithms for finding key information in the mass of accumulated data, it takes users seconds to retrieve what they need from the World Wide Web. Other examples include searching for similar texts in systems that detect and prevent student and academic plagiarism, and searching for mistakes in long texts.

Research into the problem began in 2015, when Tsaryov and Professor Mikhail Sadovsky, a biophysicist in Krasnoyarsk, decided to create a new fast-search algorithm suited to the needs of genomics. The first results, obtained in 2016, demonstrated how the algorithm works on the human genome and the genome of a Drosophila species. They were compared with existing algorithms for searching genome data, including the oldest of them, BLAST. It turned out that the algorithm proposed by the Krasnoyarsk scientists considerably outperforms all its older counterparts in speed.

The researchers plan to couple their algorithm with existing ones used to search genome databases, thus improving their efficiency.

“Also, we plan to test this idea in adjoining fields of science and knowledge, such as the search for similar texts in the systems designed to resist plagiarism or accelerate Internet searches. True, this is a very different field, but progress is clearly possible there, too,” Tsaryov said.