Computer Science gives Darwinism a big push
CRACS-INESC Porto and IBMC make history
Researchers from the Institute for Molecular and Cell Biology (IBMC) and the Center for Research in Advanced Computer Systems (CRACS) of INESC Porto Associated Laboratory (LA) have joined forces to make history in the evolution of the Theory of Natural Selection. For the first time, the genome of an animal has been sequenced by a team of Portuguese researchers. This was not just any genome: the chosen animal was Drosophila americana, a fruit fly whose diverse patterns of longevity could help explain, for example, the phenomenon of population aging. “Completing the sequence of the genome of this fly was really necessary so we could move our study forward,” states Jorge Vieira, the director of the Molecular Evolution Group at IBMC. He goes on to emphasise that the foundations of this genetic “victory” were laid by the progress made in Computer Science.
Why is it that some humans live longer than others?
The IBMC/CRACS-INESC Porto LA team has decoded the genomes of two strains of Drosophila americana, a type of fruit fly that shows great variation in longevity. This characteristic makes the fly an excellent model for genetic studies on population aging. The genetic differences between the two strains could lead to the discovery of the genes responsible for variations in the longevity of flies and, eventually, of humans too.
By the end of the first phase of the project, two genes had been found that could explain 35% of the variation in longevity. Although these exact genes do not exist in humans, all animals share the same basic genetics. Therefore, using genes with common functions, it will be possible to explain the variations in longevity for humans.
The decoding of the genome (sequencing and assembly) was crucial to moving the research forward, so that characteristics of certain genes could be identified more quickly. These characteristics can explain why individuals of the same species show variations in their patterns of longevity, their resistance to the cold and their development time.
"It's the algorithms, stupid"
“We were able to achieve something that was very difficult, the sequencing of the genome,” Jorge Vieira explains. Although the sequencing itself was carried out in Austria, using sequencing machines at the “Institut für Populationsgenetik”, this project is completely Portuguese. The organisation of the sequences, the actual assembly of the genome, is being completed in our laboratories at CRACS-INESC Porto LA in Portugal by Nuno Fonseca.
The first phase of this assembly process took place between May and October 2010. Algorithms form the basis of this process: they overlap the sequences that are similar in order to produce a preliminary version of the genome. The Austrian sequencing machines generate millions of small pieces of DNA (from various flies), which are then organised by the assembly algorithms to obtain the DNA sequence. Without these algorithms it would be impossible to assemble the genome because of its complexity.
Using the metaphor of a puzzle, Nuno Fonseca explains that “the parts can fit together in many different ways and the challenge is finding the best way to assemble them. This requires powerful computers, with high processing capacities and enormous memories. It is the opposite of a normal puzzle: we do not recognise the image of the puzzle even after putting it together”. This is why the final result is always an estimate.
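To give a feel for what overlapping similar sequences means, here is a minimal sketch of a greedy overlap assembler. It is an illustration only and not the software used in the project, which the article does not name. The function names (overlap, greedy_assemble), the min_len parameter and the toy reads are all assumptions made for this example. Real assemblers must also cope with sequencing errors, repeated regions and millions of reads.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the largest overlap.

    Hypothetical toy assembler: exact matches only, no error handling.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: several pieces remain
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Four made-up reads taken from one short made-up sequence.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

The greedy choice of always merging the largest overlap first is only a heuristic. When a genome contains repeats, several merges can look equally good, which is one reason an assembled genome remains an estimate.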
Computer Science forms the basis of Genetic progress
“Without Computer Science, the human genome sequencing project would not have got to where it is today,” claims the CRACS-INESC Porto LA researcher. This opinion is shared by Jorge Vieira from IBMC. He states that the contribution made by Computer Science was “very important” because huge quantities of data are being produced in many different digital formats, such as microscopic images, genomes, DNA and protein sequences, gene expressions, interactions between genes and even scientific literature.
“Fortunately,” explains the director of the Molecular Evolution Group at IBMC, “this explosion of data has been met by an increase in the computer’s capacity to process this data”. However, Nuno Fonseca warns that “genome assembly algorithms are not fully keeping pace with the development of sequencing technologies”.
Nevertheless, Jorge Vieira considers that “we are still at the exploratory stage”. The next few years will focus on strengthening and optimising existing knowledge, “but determining the genome sequence of an animal or plant will be something trivial by 2020”. What is currently missing is an understanding of what all of the data means from a biological perspective and, in order to reach it, new multidisciplinary research teams will need to be formed, states Jorge Vieira.
Scientific progress means multidisciplinary collaboration
After years of successful collaboration between the two institutions, it was IBMC that presented the CRACS-INESC Porto LA researcher with the challenge of decoding the genomes of the two strains of Drosophila americana. Nuno Fonseca and Jorge Vieira both guarantee that they will continue working together and, in the words of Jorge Vieira, they “will be even more ambitious”.
It is this sharing of knowledge and competencies that will shape the future of research and of scientific progress. Large projects like this one need specialised knowledge from different areas, and “this is very rarely found in just one Research Institution or Associated Laboratory. However strong a Scientific Institution is, it will never cover all of these areas,” states Jorge Vieira. This is why the results of the first stage of the research were published on an open-access website, so that they could be used by other scientists studying evolution and comparative genetics.
This opinion is shared by Fernando Silva, the coordinator of CRACS-INESC Porto LA, who sees interdisciplinary research as “increasingly important for scientific innovation and it will become even more so when we realise that the inherent complexity of problem solving requires new forms of analysis, of learning and of thinking.” The competencies that exist at CRACS and INESC Porto LA, both in algorithms and in scalable design solutions, can be applied to multidisciplinary areas such as bioinformatics. “Projects like this one should be encouraged and we should help them multiply,” adds Fernando Silva.