Summer Research Webpage

Daniel Yehdego

                           Project Description

The central dogma of molecular biology states that the genetic information of an organism is transferred from Deoxyribonucleic Acid (DNA) to Ribonucleic Acid (RNA) and then to proteins. For a long time, DNA was considered the primary actor in storing the genetic code, with RNA cast in the secondary role of carrier of this information. But a string of discoveries over the last decade has shown that small RNA molecules regulate many cellular processes. Knowledge about RNA is expanding rapidly: it is now known that RNA catalyzes reactions, directs the site-specific modification of RNA nucleotides, modulates protein expression, and serves in protein localization. Understanding the function of RNA molecules is therefore key to unlocking the pathways of disease and biology.

 

Knowing the precise three-dimensional structure of RNA is one of the foremost goals of molecular biology, because it is this structure that determines the molecule's function. Nuclear Magnetic Resonance (NMR) and X-ray crystallography are among the experimental methods generally used for this purpose, but they are costly, time-consuming, and not always feasible. As a result, determining the sequence of an RNA molecule is much easier than determining its three-dimensional structure. The gap between the number of proteins whose sequence is known (in the thousands) and the number whose complete three-dimensional structure is known (in the hundreds) widens every year. This has led to intense research into computational methods for structure prediction.

 

The building blocks of DNA and RNA are nucleotides. Each RNA nucleotide has three components: a nitrogenous base, a sugar, and a phosphate group. The RNA backbone is made of ribose, a five-carbon sugar whose carbon atoms are numbered 1' through 5'; each ribose is linked to phosphate groups at its 3' and 5' carbons. RNA uses four nitrogenous bases: Adenine (A), Guanine (G), Cytosine (C), and Uracil (U). In DNA, Uracil is replaced by Thymine (T). The phosphate groups in the RNA backbone carry a negative charge, which makes RNA a charged molecule. Because of this, an RNA molecule in a cell is not inherently stable, and to gain stability it folds back on itself. A nucleotide in one part of the RNA can form a base pair with a complementary nucleotide in another part. Furthermore, the nucleotide sequence uniquely determines the folding pattern, so we can attempt to predict the structure from the sequence. Secondary structure prediction is the task of listing all the base pairs formed by a given nucleotide sequence. The secondary structure of RNA is the scaffolding of its tertiary structure. It is well known that RNA folding is hierarchical: "the primary sequence determines the secondary structure and the secondary structure in turn determines the tertiary folding."
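To make "listing all the base pairs" concrete, here is a minimal Python sketch that represents a secondary structure as a list of index pairs and reads it from dot-bracket notation, a common text format for secondary structures. The function names and the example hairpin are hypothetical, and the set of allowed pairs (Watson-Crick A-U and G-C plus the G-U wobble pair) is an assumption of the sketch, since the text above only says "complementary".

```python
from __future__ import annotations

# Allowed pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
# Including G-U is an assumption of this sketch.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def can_pair(a: str, b: str) -> bool:
    """Return True if nucleotides a and b may form a base pair."""
    return (a, b) in ALLOWED_PAIRS


def pairs_from_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the base pairs (i, j), 0-based with i < j, encoded in a dot-bracket string.

    '(' opens a pair, ')' closes the most recent unmatched '(', '.' is unpaired.
    With a single bracket type, only nested (pseudoknot-free) pairs can be written.
    """
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_valid_structure(sequence: str, pairs: list[tuple[int, int]]) -> bool:
    """Check that every listed pair joins two nucleotides that can pair."""
    return all(can_pair(sequence[i], sequence[j]) for i, j in pairs)


if __name__ == "__main__":
    seq = "ACCGUCAAAAGACGGU"  # hypothetical hairpin
    pairs = pairs_from_dot_bracket("((((((....))))))")
    print(pairs)
    print(is_valid_structure(seq, pairs))
```

For the hypothetical hairpin ACCGUCAAAAGACGGU with structure ((((((....)))))), the parser yields the six pairs (0, 15) through (5, 10), and all of them pass the pairing check. A dot-bracket string with a single bracket type can only describe nested pairs, which is one reason pseudoknots need richer representations.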

 

Essentially, all RNA secondary structures are made up of elements that fall into two basic categories: stem-loops and pseudoknots. Both kinds of elements have been implicated in important biological processes such as gene expression and regulation. In both stem-loops and pseudoknots, a stretch of nucleotides (ACCGUC in Fig. 1a and b) must be followed downstream by its inverted complementary sequence (GACGGU). For simplicity, we refer to such patterns as close inversions.

The development of mathematical models and computational prediction algorithms for stem-loop structures based on thermodynamics started in the 1980s. Pseudoknots, because of their extra base pairings, must be represented by more complex models and data structures. Despite the computing power of supercomputers and emerging technologies such as multi-core architectures, predicting the secondary structures of long RNA sequences (on the order of thousands of nucleotides) with thermodynamic methods is still not feasible, especially when the structures include complex elements like pseudoknots. The time and space required for accurate pseudoknot prediction based on energy minimization grow very rapidly with sequence length. This is evident in the time and memory (on a logarithmic scale) required to predict RNA pseudoknots of various lengths with Pknots-RE, one of the most accurate prediction programs. The algorithm underlying Pknots-RE has runtime and memory demands of O(n^6) and O(n^4), respectively, where n is the length of the sequence.
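The close-inversion pattern and the difference between stem-loops and pseudoknots can be illustrated with a short Python sketch. The function names, the stretch length `k`, and the gap limit `max_gap` are hypothetical choices for illustration, not the project's method. Pairing is restricted to Watson-Crick complements, and a pseudoknot is detected with the standard crossing-pair test: pairs (i, j) and (k, l) cross when i < k < j < l.

```python
from __future__ import annotations

from itertools import combinations

# Watson-Crick complements only (no G-U wobble) in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def inverted_complement(segment: str) -> str:
    """Reverse the segment and replace each base with its complement."""
    return "".join(COMPLEMENT[base] for base in reversed(segment))


def find_close_inversions(sequence: str, k: int, max_gap: int) -> list[tuple[int, int]]:
    """Find stretches of length k followed downstream by their inverted complement.

    Returns (i, j) pairs: sequence[j:j + k] is the inverted complement of
    sequence[i:i + k], and at most max_gap nucleotides separate the two stretches.
    This is a naive scan meant only to illustrate the pattern.
    """
    n = len(sequence)
    hits: list[tuple[int, int]] = []
    for i in range(n - k + 1):
        target = inverted_complement(sequence[i:i + k])
        # j starts right after the first stretch ends; the gap is j - (i + k).
        for j in range(i + k, min(n - k, i + k + max_gap) + 1):
            if sequence[j:j + k] == target:
                hits.append((i, j))
    return hits


def has_pseudoknot(pairs: list[tuple[int, int]]) -> bool:
    """Return True if two base pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        if i < k < j < l:
            return True
    return False


if __name__ == "__main__":
    seq = "ACCGUCAAAAGACGGU"  # hypothetical hairpin built from ACCGUC and GACGGU
    print(inverted_complement("ACCGUC"))
    print(find_close_inversions(seq, 6, 10))
    print(has_pseudoknot([(0, 15), (1, 14), (2, 13)]))  # nested stem-loop pairs
    print(has_pseudoknot([(0, 5), (3, 8)]))             # crossing pairs
```

On the hypothetical hairpin ACCGUCAAAAGACGGU, `find_close_inversions(seq, 6, 10)` returns [(0, 10)]: the ACCGUC stretch at position 0 is matched by GACGGU at position 10. The exponents alone also give a feel for the Pknots-RE scaling: doubling n multiplies the n^6 runtime term by 2^6 = 64 and the n^4 memory term by 2^4 = 16.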
                                                                    

 

                    Week One

 

                    Week Two

 

                    Week Three

 

                   Week Four

 

                   Week Five

 

                   Week Six

 

                   Week Seven

 

                   Week Eight

 

                   Week Nine