Motivation: Life science researchers often require an exhaustive list of protein-coding genes similar to a given query gene. To find such genes, homology search tools such as BLAST or PatternHunter return a set of high-scoring pairs (HSPs). These HSPs then need to be correlated with existing sequence annotations or assembled manually into putative gene structures. This process is error-prone and labor-intensive, especially in genomes without reliable gene annotation.

Results: We have developed a homology search solution that automates this process and returns complete gene structures instead of HSPs. We achieve better sensitivity and specificity by adapting a hidden Markov model for gene finding to reflect features of the query gene. Compared to traditional homology search, our approach identifies splice sites much more reliably and can even locate exons that were lost in the query gene. On a testing set of 400 mouse query genes, we report 79% exon sensitivity and 80% exon specificity in the human genome, based on orthologous genes annotated in NCBI HomoloGene. In the same set, we also found 50 (12%) gene structures with better protein alignment scores than the ones identified in HomoloGene.

Availability: The Java implementation is available for download.

Sequence homology search has been a core topic in the bioinformatics literature since the seminal paper introducing BLAST (Altschul et al., 1990). The focus of the field is on designing faster and more sensitive methods to search for sequences similar to a query DNA or protein sequence in one or more huge databases. The similarity measure reflects the likelihood that two sequences are evolutionarily related. Thus, a typical homology search returns a set of high-scoring pairs (HSPs), each consisting of a subsequence of the database and a subsequence of the query that can be aligned to one another with a high similarity score.
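To make the notion of a high-scoring pair concrete, the following minimal sketch finds the single best ungapped HSP between a query and a database sequence by exhaustive search. It is an illustration only: the `HSP` record, the function name `best_ungapped_hsp`, and the +1/-1 match/mismatch scores are assumptions made for this example, and the quadratic scan is not the seeded heuristic that BLAST or PatternHunter use, nor the gene-structure method described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    """One high-scoring pair: aligned segments of the query and the database."""
    query_start: int  # 0-based start in the query
    db_start: int     # 0-based start in the database sequence
    length: int       # ungapped, so both segments share one length
    score: int


def best_ungapped_hsp(query: str, db: str, match: int = 1, mismatch: int = -1) -> HSP:
    """Exhaustively find the highest-scoring ungapped segment pair.

    Each diagonal (a fixed offset db_pos - query_pos) is scanned with a
    maximum-subarray pass. Scores are hypothetical: +1 match, -1 mismatch.
    """
    best = HSP(0, 0, 0, 0)
    for offset in range(-(len(query) - 1), len(db)):
        q = max(0, -offset)   # first query position on this diagonal
        d = q + offset        # matching database position
        run_score, run_start = 0, q
        while q < len(query) and d < len(db):
            run_score += match if query[q] == db[d] else mismatch
            if run_score <= 0:
                # A non-positive prefix never helps; restart after it.
                run_score, run_start = 0, q + 1
            elif run_score > best.score:
                best = HSP(run_start, run_start + offset, q - run_start + 1, run_score)
            q += 1
            d += 1
    return best


if __name__ == "__main__":
    query, db = "TTACGTAA", "GGGACGTCCC"
    hsp = best_ungapped_hsp(query, db)
    # Print the two aligned segments and their score.
    print(query[hsp.query_start:hsp.query_start + hsp.length],
          db[hsp.db_start:hsp.db_start + hsp.length],
          hsp.score)
```

A real search tool reports many such pairs above a score threshold rather than only the best one, and scores them with a substitution matrix. Turning those pairs into a gene structure is the step the approach above automates.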