Bioinformatics_lectures / lecture4.pptx

ALGORITHM

For each cell, compute

Match score: sum of preceding diagonal cell and score of aligning the two letters (+1 if match, -1 if no match)

Horizontal gap score: sum of score to the left and gap score (-1)

Vertical gap score: sum of score above and gap score (-1)

Choose the highest of the three scores and point an arrow to the cell that score came from

When you finish, trace arrows back from lower right to get alignment
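
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `needleman_wunsch` is made up for this example, and filling the first row and column with cumulative gap scores is an assumption (the usual convention for global alignment) that the slide does not state.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment using the slide's scores: +1 match, -1 mismatch, -1 gap."""
    m, n = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    arrow = [[None] * (n + 1) for _ in range(m + 1)]

    # Assumption: edges hold cumulative gap scores (standard for global alignment).
    for i in range(1, m + 1):
        score[i][0] = i * gap
        arrow[i][0] = "up"
    for j in range(1, n + 1):
        score[0][j] = j * gap
        arrow[0][j] = "left"

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            left = score[i][j - 1] + gap   # horizontal gap
            up = score[i - 1][j] + gap     # vertical gap
            best = max(diag, left, up)
            score[i][j] = best
            # Ties are broken in the order diag, left, up.
            arrow[i][j] = "diag" if best == diag else ("left" if best == left else "up")

    # Trace the arrows back from the lower-right cell.
    i, j = m, n
    top, bottom = [], []
    while i > 0 or j > 0:
        step = arrow[i][j]
        if step == "diag":
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif step == "left":
            top.append("-"); bottom.append(b[j - 1]); j -= 1
        else:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
    return score[m][n], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))
```

For `"ACGT"` against `"AGT"`, three matches and one gap give a score of 2.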

SMITH–WATERMAN ALGORITHM

The Smith–Waterman algorithm is a well-known algorithm for performing local sequence alignment; that is, for determining similar regions between two nucleotide or protein sequences.

Instead of looking at the total sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure.

Smith–Waterman is a dynamic programming algorithm

SMITH–WATERMAN

The algorithm was first proposed by Temple F. Smith and Michael S. Waterman in 1981.

Modification of Needleman–Wunsch

Edges of matrix initialized to 0

Cell score never less than 0 (a negative maximum is replaced by 0)

No pointer unless score greater than 0

Traceback starts at highest score (rather than lower right) and ends at 0
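
The modifications listed above might look like this in Python. It is a sketch under assumptions: the function name `smith_waterman` and the +1/-1/-1 scoring defaults are carried over from the earlier slide for illustration, and when several cells share the highest score the first one found is used.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment: returns (best score, aligned piece of a, aligned piece of b)."""
    m, n = len(a), len(b)
    # Edges of the matrix are initialized to 0 and stay 0.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    arrow = [[None] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            left = score[i][j - 1] + gap
            up = score[i - 1][j] + gap
            s = max(0, diag, left, up)  # score is never less than 0
            score[i][j] = s
            if s > 0:                   # no pointer unless score is greater than 0
                arrow[i][j] = "diag" if s == diag else ("left" if s == left else "up")
            if s > best:
                best, best_pos = s, (i, j)

    # Traceback starts at the highest score and ends at a cell holding 0.
    i, j = best_pos
    top, bottom = [], []
    while score[i][j] > 0:
        step = arrow[i][j]
        if step == "diag":
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif step == "left":
            top.append("-"); bottom.append(b[j - 1]); j -= 1
        else:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTTACGTAAA", "GGGACGTCCC"))
```

In this example only the shared region `ACGT` is reported; the dissimilar flanks are ignored, which is the difference from the global version.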

GLOBAL VS. LOCAL

Global – both sequences aligned along entire lengths

Local – best subsequence alignment found

Global alignment of two genomic sequences may not align exons

Local alignment would only pick out maximum scoring exon

COMPLEXITY

O(mn) time and memory

This is impractical for long sequences!

Observation: during fill phase of the algorithm, we only use two rows at a time

Instead of calculating whole matrix, calculate score of maximum scoring alignment, and restrict search along diagonal
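
The two-row observation can be illustrated with a short Python sketch. It computes only the best local score, keeping the previous and current rows instead of the whole matrix, so memory drops from O(mn) to O(n) while time stays O(mn). The function name `local_score_two_rows` and the scoring defaults are assumptions for this example, and the diagonal restriction is not shown. Because no arrows are stored, the alignment itself cannot be recovered this way.

```python
def local_score_two_rows(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score using two rows of the matrix at a time."""
    prev = [0] * (len(b) + 1)   # row i-1
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)   # row i
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, curr[j - 1] + gap, prev[j] + gap)
            best = max(best, curr[j])
        prev = curr   # the current row becomes the previous row
    return best


print(local_score_two_rows("TTTACGTAAA", "GGGACGTCCC"))
```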

OTHER OBSERVATIONS

Most boxes have a score of 0 – wasted computation

Idea: make alignments where positive scores most likely (approximation)
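
One common way to find where positive scores are likely is to look for short exact words (k-mers) shared by both sequences and to align only around those hits. The Python sketch below illustrates that seeding idea in a simplified form; it is not how any particular tool implements it, and the function name `find_seeds` and the word length `k=3` are assumptions chosen for the example.

```python
from collections import defaultdict


def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where both sequences share an exact k-mer."""
    # Index every k-mer of the subject by its start positions.
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)

    # Look up every k-mer of the query in that index.
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


print(find_seeds("TTTACGTAAA", "GGGACGTCCC"))
```

Both hits here lie on the same diagonal (query position equals subject position), which marks the region where a full alignment would be worth computing.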

BLAST

In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.

A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold.

BLAST is one of the most widely used bioinformatics programs, because it addresses a fundamental problem and the heuristic algorithm it uses is much faster than calculating an optimal alignment.

This emphasis on speed is vital to making the algorithm practical on the huge genome databases currently available, although subsequent algorithms can be even faster.

Before fast algorithms such as BLAST and FASTA were developed, doing database searches for protein or nucleic acid sequences was very time consuming because a full alignment procedure (e.g., the Smith–Waterman algorithm) was used.

EXAMPLES

Examples of other questions that researchers use BLAST to answer are:

Which bacterial species have a protein that is related in lineage to a certain protein with known amino-acid sequence?

Where does a certain sequence of DNA originate?

What other genes encode proteins that exhibit structures or motifs such as ones that have just been determined?

BLAST is also often used as part of other algorithms that require approximate sequence matching.

PROGRAM

The BLAST algorithm and the computer program that implements it were developed by Stephen Altschul, Warren Gish, and David Lipman at the U.S. National Center for Biotechnology Information (NCBI), Webb Miller at the Pennsylvania State University, and Gene Myers at the University of Arizona.

It is available on the web at the NCBI website.
