FASTA

Fasta (Bill Pearson) is the other popular local alignment program.
Fasta also uses a hash-table heuristic to speed up the search, but it does not require any preprocessing of the database, which makes it simpler to implement.
Version 3 evaluates matches statistically (E value, based on an extreme value distribution).
Fasta is significantly slower than Blast.
It also exists in the variants tfasta, fastx and tfastx.
On Unix, the Fasta package also ships with the programs lfasta (local alignment with several possible solutions) and align (global alignment).
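
The hash-table heuristic mentioned above can be illustrated with a minimal Python sketch. It is not Pearson's code: the function names index_words and diagonal_hits are hypothetical, and the sketch only shows the general idea of indexing the words of the query in a table and counting, for each diagonal, the words it shares with a library sequence. The word length is the parameter that the server form calls ktup.

```python
from collections import Counter, defaultdict


def index_words(query, ktup):
    """Map every word of length ktup in the query to its start positions."""
    table = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        table[query[i:i + ktup]].append(i)
    return table


def diagonal_hits(query, subject, ktup):
    """Count shared words per diagonal (subject position minus query position)."""
    table = index_words(query, ktup)
    hits = Counter()
    for j in range(len(subject) - ktup + 1):
        # Only the query is indexed, so the library sequence is read as-is.
        for i in table.get(subject[j:j + ktup], ()):
            hits[j - i] += 1
    return hits


if __name__ == "__main__":
    # Illustrative sequences, not taken from any database.
    print(diagonal_hits("ACGTACGT", "TTACGTACGTTT", ktup=6).most_common(1))
```

Because only the query is indexed, the library can be scanned without any preprocessing, which matches the remark above. Diagonals that collect many hits are the candidates for the similar regions that a real search would then score.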

The Fasta server at Pasteur:

FASTA (W. Pearson)

your e-mail

Fasta program

Query sequence File : please enter either :

the name of a file:
or the actual data here:

(sequence format)

Is it a DNA or protein sequence ? DNA protein

Protein Database

Nucleotide Database

Control Options

ktup : sensitivity and speed of the search (protein:2, DNA:6)
OPTCUT : the threshold for optimization. The OPTCUT value is normally calculated based on sequence length
GAPCUT: threshold for joining the initial regions for calculating the initn score
penalty for the first residue in a gap (-12 by default for fasta with proteins, -16 for DNA)
penalty for additional residues in a gap (-2 by default for fasta with proteins, -4 for DNA)
expectation value threshold for displaying scores and alignments
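
Read literally, the two gap penalties listed above give a cost that grows with gap length: one charge for the first residue of the gap, then a smaller charge for each additional residue. A minimal Python sketch of that reading follows. The function name gap_score is hypothetical, and the formula is an interpretation of the option labels, not code taken from the Fasta package.

```python
def gap_score(length, first=-12, additional=-2):
    """Score of a gap of `length` residues.

    Defaults are the protein values quoted in the form
    (use first=-16, additional=-4 for DNA).
    """
    if length <= 0:
        return 0
    # First residue is charged `first`, every further residue `additional`.
    return first + additional * (length - 1)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, gap_score(n), gap_score(n, first=-16, additional=-4))
```

With the protein defaults, a single-residue gap scores -12 and a three-residue gap scores -16, so one long gap is cheaper than several short ones.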

Optimization Options

band-width used for optimization
unlimited Smith-Waterman alignment for DNA
no limited optimization

Report options

No histogram
number of similarity scores to be shown
number of alignments to be shown

Alternate display of matches and mismatches in alignments
sequences ranked by the z-score based on the init1 score
both sequences are shown in their entirety in alignments
output line length for sequence alignments (< 200)
start numbering the aligned sequences at position x1 x2 (2 numbers)
display more information about the library sequence in the alignment
write out the sequence identifier, superfamily number, and similarity scores to this file
Do not do statistical significance calculation

Other Options

filename of an alternative scoring matrix file (BLOSUM50) : please enter either :

the name of a file:
or the actual data here:

(fastx only) penalty for a +1 or -1 frameshift
(tfasta only) only the three forward frames are searched

Some explanations about the options

Main parameters
Fasta program:
fasta/fasta3: scan a protein or DNA sequence library for similar sequences.
tfasta/tfasta3: compare a protein sequence to a DNA sequence library, translating the DNA sequence library on the fly.
fastx/fastx3/fasty3: compare a DNA sequence to a protein sequence database, comparing the translated DNA sequence in three frames, with frameshifts. fasty3 allows frameshifts inside codons.
tfastx3/tfasty3: compare a protein sequence to a translated DNA database, with frameshifts. tfasty3 allows frameshifts inside codons.
Query sequence file: enter either the name of a file or the actual data, but not both. With Netscape 2.x or later, you can select a file by typing its name or, better, by choosing it with the file browser (Browse button). Alternatively, type your data in the text area, or cut and paste it from another application.
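
The translation of a DNA sequence in three forward frames, as described for the tfasta and fastx variants above, can be sketched in Python as follows. This is an illustration only: the function names translate and three_forward_frames are hypothetical, the standard genetic code is assumed, unknown codons are written as X (also an assumption), and frameshifts are not modelled.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate one forward frame (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for codons with unknown bases
        for i in range(frame, len(dna) - 2, 3)
    )


def three_forward_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    print(three_forward_frames("ATGGCCTAA"))
```

Each translated frame can then be compared with a protein sequence like any other protein; searching the reverse strand as well would require reverse-complementing the DNA first.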
Optimization Options
band-width used for optimization: Set the band-width used for optimization. -y 16 is the default for protein when ktup=2 and for all DNA alignments. -y 32 is used for protein and ktup=1. For proteins, optimization slows comparison 2-fold and is highly recommended.
unlimited Smith-Waterman alignment for DNA: force Smith-Waterman alignment for output. Smith-Waterman is the default for protein sequences and FASTX, but not for TFASTA or DNA comparisons with FASTA.
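
To show what restricting the optimization to a band means, here is a minimal Python sketch of a local alignment score computed only in cells close to one diagonal. It rests on several assumptions that do not come from the text: the function name banded_local_score is hypothetical, the band is described by a centre diagonal and a half-width, and the scoring uses simple match, mismatch and per-residue gap values instead of a scoring matrix and the two gap penalties of the form.

```python
def banded_local_score(a, b, center_diag=0, half_width=16,
                       match=5, mismatch=-4, gap=-6):
    """Best local alignment score restricted to a band of diagonals.

    Cell (i, j) is inside the band when
    abs((j - i) - center_diag) <= half_width.
    """
    prev = {}  # scores of the previous row, keyed by column
    best = 0
    for i in range(1, len(a) + 1):
        lo = max(1, i + center_diag - half_width)
        hi = min(len(b), i + center_diag + half_width)
        cur = {}
        for j in range(lo, hi + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev.get(j - 1, 0) + pair   # extend along the diagonal
            up = prev.get(j, 0) + gap          # gap in b
            left = cur.get(j - 1, 0) + gap     # gap in a
            # Cells outside the band count as 0, like a fresh local start.
            cur[j] = max(0, diag, up, left)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    a, b = "ACGTACGT", "TTTTACGTACGT"
    print(banded_local_score(a, b, center_diag=4, half_width=1))
    print(banded_local_score(a, b, center_diag=0, half_width=1))
```

In the usage example the match lies on diagonal 4, so a narrow band centred there finds it while an equally narrow band centred on diagonal 0 misses it. An unlimited Smith-Waterman alignment corresponds to a band wide enough to cover the whole matrix, at the price of more computation.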

Control Options
ktup : sensitivity and speed of the search (protein:2, DNA:6): ktup sets the sensitivity and speed of the search. If ktup=2, similar regions in the two sequences being compared are found by looking at pairs of aligned residues; if ktup=1, single aligned amino acids are examined. ktup can be set to 2 or 1 for protein sequences, or from 1 to 6 for DNA sequences. The default if ktup is not specified is 2 for proteins and 6 for DNA.
expectation value threshold for displaying scores and alignments: Expectation value limit for displaying scores and alignments. (Typically 10.0 for protein sequence comparisons; 5.0 for FASTX, and 2.0 for DNA sequence comparisons.)
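
The expectation value threshold can be illustrated with a small Python sketch that converts a standardized score into an expectation value using an extreme value (Gumbel) distribution with mean 0 and standard deviation 1, then keeps only the hits at or below the threshold. Several points are assumptions: the function names evalue_from_z and reported_hits are hypothetical, the exact parameterization used by the server is not given in the text, and the inclusive comparison is a choice made for this example. The three threshold values are the typical ones quoted above.

```python
import math

EULER_GAMMA = 0.5772156649015329

# Typical display thresholds quoted in the option description.
DEFAULT_THRESHOLDS = {"protein": 10.0, "fastx": 5.0, "dna": 2.0}


def evalue_from_z(z, library_size):
    """Expected number of library sequences scoring at least z by chance.

    z is a score standardized to mean 0 and standard deviation 1;
    P(Z >= z) = 1 - exp(-exp(-pi / sqrt(6) * z - gamma)).
    """
    x = math.exp(-math.pi / math.sqrt(6.0) * z - EULER_GAMMA)
    p = -math.expm1(-x)  # 1 - exp(-x), accurate for very small x
    return library_size * p


def reported_hits(hits, library_size, kind="protein"):
    """Keep (name, z) hits whose E value is at or below the threshold."""
    limit = DEFAULT_THRESHOLDS[kind]
    kept = []
    for name, z in hits:
        e = evalue_from_z(z, library_size)
        if e <= limit:
            kept.append((name, e))
    return sorted(kept, key=lambda item: item[1])


if __name__ == "__main__":
    hits = [("seqA", 10.0), ("seqB", 0.0)]  # illustrative values only
    print(reported_hits(hits, library_size=1000))
```

The E value scales with the number of library sequences, so the same score is less significant in a larger database.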
Report options
Alternate display of matches and mismatches in alignments (MARKX = 0, 1, 2, 3, 4):
MARKX=0 uses ':', '.', ' ' for identities, conservative replacements, and non-conservative replacements, respectively.
MARKX=1 uses ' ', 'x', and 'X'.
MARKX=2 does not show the second sequence, but uses the second alignment line to display matches with a '.' for identity, or with the mismatched residue for mismatches. MARKX=2 is useful for aligning large numbers of similar sequences.
MARKX=3 writes out a file of library sequences in FASTA format. MARKX=3 should always be used with the 'SHOWALL' (-a) option, but this does not completely ensure that all of the sequences output will be aligned.
MARKX=4 displays a graph of the alignment of the library sequence with respect to the query sequence, so that one can identify the regions of the query sequence that are conserved.
start numbering the aligned sequences at position x1 x2 (2 numbers): makes fasta/lfasta/plfasta number the aligned sequences starting from offset1 and offset2, rather than from 1 and 1. This is particularly useful for showing alignments of promoter regions.
Sequence format: the sequence is automatically converted to the format the program needs, provided you enter it either in plain (raw) sequence format or in one of the following known formats: IG, GenBank, NBRF, EMBL, GCG, DNAStrider, Fitch, fasta, Phylip, PIR, MSF, ASN, PAUP, CLUSTALW.