BLAST (Basic Local Alignment Search Tool) is the most widely used method for finding sequences similar to one you have.  It is optimized for speed, so if a more rigorous method is desired use Smith-Waterman (see comprehensive below). Many sites are available on the web, but results differ because of different program versions, different settable parameters, and different databases searched.  Defaults work well with medium-length sequences (nucleotides 100-1000, proteins 50-200).
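As a rough illustration of the more rigorous method mentioned above, here is a minimal sketch of Smith-Waterman local alignment scoring. The function name and the match, mismatch and gap values are assumptions chosen for illustration; real searches normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Uses a linear gap penalty; the scoring values are illustrative assumptions.
    """
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            up = prev[j] + gap        # gap opposite a residue of a
            left = curr[j - 1] + gap  # gap opposite a residue of b
            score = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Every cell of the matrix is filled, so the cost grows with the product of the two sequence lengths. BLAST avoids this by extending only from short word matches, which is why it is faster but can miss weak alignments.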
    Protein sequence is better to search with than DNA because: (a) information content per residue is greater (20 letters vs. 4); (b) nonidentical amino acids can be scored with a mutation frequency matrix such as PAM or BLOSUM; (c) the protein database is smaller, so chance matches are reduced.
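Point (a) can be put in numbers. The sketch below gives only an upper bound, under the assumption that all letters of the alphabet are equally likely (real residue frequencies are uneven, so the true figures are lower). The function name is hypothetical.

```python
import math

def max_bits_per_residue(alphabet_size):
    """Upper bound on information per position, assuming equally likely letters."""
    return math.log2(alphabet_size)

protein_bits = max_bits_per_residue(20)  # a little over 4.3 bits per amino acid
dna_bits = max_bits_per_residue(4)       # 2 bits per nucleotide
```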

program   database searched       sequence in hand        uses

blastn    nucleotide              nucleotide              check if a newly obtained sequence is already published
                                                          identify coding regions by searching for mRNA sequences
                                                          extend sequence fragments by searching an EST db
blastp    protein                 protein                 identify new proteins, find homologs, identify domain structure by related proteins of known structure
blastx    protein                 nucleotide translated   identify potential coding regions in newly sequenced DNA
tblastn   nucleotide translated   protein                 detect unidentified proteins in a nucleotide database
tblastx   nucleotide translated   nucleotide translated   increased sensitivity but slower than direct blastn
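The "nucleotide translated" entries above refer to conceptual translation of DNA in all six reading frames. A minimal sketch of that step follows, assuming the standard genetic code and an unambiguous ACGT alphabet; the function names are hypothetical and this is not how any particular BLAST server implements it.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of an uppercase ACGT string."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate from the first base; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Stop codons are shown as `*`. Each nucleotide sequence yields six protein strings to compare, which is consistent with the table's note that tblastx is slower than direct blastn.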


server         _p  _n  _x  t_n  t_x      fasta  SW  notes
EBI            *   *   *
Sanger             *       *    *
NCBI           *   *   *   *    * email             graphical overview
BioNavigator$  *   *   *   *    *               *   results easily saved
GWFASTA                                  *          imtech
Raghava                                  *          imtech

    If you have a long sequence and are hunting for short domains, then splitting into overlapping fragments is best. 
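A minimal sketch of the fragment-splitting tip above. The function name, fragment size and overlap are assumptions; choosing an overlap at least as long as the domain being sought ensures that the domain appears whole in at least one fragment.

```python
def split_overlapping(seq, size, overlap):
    """Split seq into fragments of length size that share overlap residues.

    Returns a list of (start, fragment) pairs; the last fragment may be shorter.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    fragments = []
    start = 0
    while True:
        fragments.append((start, seq[start:start + size]))
        if start + size >= len(seq):  # this fragment reached the end
            break
        start += step
    return fragments
```

Keeping the start offset with each fragment makes it possible to map a hit back to its position in the original long sequence.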
    Searching with a short sequence often returns nothing because BLAST discards matches that could have arisen by chance.  For PCR primers, increase the statistical expectation threshold (expect in NCBI, maxexp in EBI).
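The expectation value behind this tip is usually estimated with the Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths and S is the raw score. The sketch below shows why raising the threshold helps short queries. The lambda, K and database length used here are placeholder values assumed for illustration only; real values depend on the scoring system, and the function names are hypothetical.

```python
import math

def expect_value(score, query_len, db_len, lam, k):
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * score)

def min_score_for(expect, query_len, db_len, lam, k):
    """Smallest raw score whose E-value does not exceed the given threshold."""
    return math.log(k * query_len * db_len / expect) / lam

# Placeholder parameters, assumed for illustration only.
LAM, K = 1.37, 0.71
DB_LEN = 1_000_000_000

# Score a 20-residue query must reach at two different expect thresholds.
default_cutoff = min_score_for(10, 20, DB_LEN, LAM, K)
relaxed_cutoff = min_score_for(1000, 20, DB_LEN, LAM, K)
```

A short query can only reach a limited score even when it matches perfectly, so lowering the required score (by raising the expect threshold) is what lets its hits through.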
    Repeats and regions of low complexity are usually filtered out to avoid false positives.  A masked region appears as a string of N or X, so it may be useful to repeat the search with filtering turned off.
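To show what masking does, here is a simplified stand-in for a low-complexity filter: it masks any window whose Shannon entropy falls below a threshold. This is not the algorithm used by any BLAST server; the window size, threshold and function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Shannon entropy in bits of the letter composition of window."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mask_low_complexity(seq, window=12, threshold=2.2, mask_char="X"):
    """Replace every window with entropy below threshold by mask_char."""
    masked = list(seq)
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) < threshold:
            masked[i:i + window] = mask_char * window
    return "".join(masked)
```

Use `mask_char="N"` for nucleotide sequences. Because whole windows are masked, a few residues flanking a repeat may be masked as well, which is one reason a rerun without filtering can recover hits near such regions.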
    Distant homology can be sought by adjusting the scoring matrix: the usual BLOSUM62 may be lowered to 45, or the usual PAM30 increased to 250. BLOSUM (derived from BLOCKS protein conserved regions) is considered better than the older PAM (derived from a smaller set of globular proteins).
MSA (Multiple Sequence Alignment): WUSTL (sw/pir#) - BCM pairs - Fuellen index - Clustal, PIMA, MSA, MAP (bcm) - ToPLign (Sankt) - Karolinska - MAP (Mich Tech) - Pimall (BU) - MultAlign (Zurich) - AllAll (Zurich) - MPSA - SAM (UCSC) - viseur (nancy) - Musequal2 (nih) - cbrg AllAll (#;,#;) - Gryan's FALCON-large contigs - proanwin (ebi) - proanwin - map(C) - VSNS Bielefeld list -
MSA Clustal, progressive alignment starts with most similar portions: Clustal(bcm) - Clustal (ebi) - Clustal (emory bimcore) - Clustal (nih) - Clustal (pasteur) - Clustal(Lyonnais) - Clustal(jp) - Clustal(Transfac) - Bionavigator - Clustal tutorial - Clustal tutorial -
MSA Pileup, progressive alignment using UPGMA: GCG - UA - NIH - HGMP - Bionavigator -
MSA Dialign, local alignment: dialign (pasteur) - dialign(Genomatix) - dialign(Bielefeld) -
Documentation: algorithms - conserved blocks algorithm - alscript seq->ps - ShadyBox (emory bimcore) -

Alignments Analysis: AMAS alignment - WebLogo - Hidden Markov Model - SAM HMM - Profile-SS (Waterman-Eggert) - SAM (Haussler UCSC) - Smith Waterman (paracel) - emboss - monash - monash - Fuellen basics - Fuellen gentle guide - Fuellen gentle-bcm - Fuellen gentle-uk - Tekaia MSA - bcm - vsns - vsns (bcm) - vsns (it) - barton - barton mirror - bcd bielefeld - bcd bcm - bcd uk - gibbs -

Multiple Alignments Databases: ALI search - ALI info FSSP folds - ShadyBox - BoxShade -

Temperature Melt (Tm) prediction: MeltDNA -

protein structure alignment servers and programs: CE (download) uses extension of the optimal path - DALI (download) uses distance matrix - KENOBI uses genetic algorithm - PRISM (download) uses sse alignment plus iterative refinement - PROSUP (download) uses hierarchical alignment (to build initial equivalence list) followed by dynamic programming refinement - SAP (download) uses double dynamic programming - SHEBA (download) uses hierarchical alignment with profiles - TOP (download) uses sse alignment - VAST uses vector alignment - STRUCTAL (download) uses double dynamic programming -

Database of Genome Sizes (DOGS)
ANALYSIS of Protein Sequences

Function from sequence:
BCM Beauty Protein blast, prosite, blocks - predictprotein
Motifs: nuclear localiz motif - Eisenberg - Eisenberg - Eisenberg article - Zincbinding
BLOCKS: protein motifs/homol (Hutchinson)
PROSITE: protein motifs (Blocks has more) (expasy, assoc with Swiss-Prot). Docs: Prosite Guide

Prot Alignment (Barton) - Matchbox MSA protein blocks, no gap penalty: Matchbox to 2000 aa (>nameCRseq) - BioSCAN match seq or ac# to db (blosum/pam, MSA) -

cbrg all/all match Swiss-Prot mailback

PDBTool-structure validation

pro-fit - ProFit

PSORT (prediction of protein sorting signals and localization)
There are significant differences in how well defined the results are and in how succinctly the information is displayed. Some servers display every little fragment ever reported plus extraneous unrelated ones, while others show only complete sequences. Availability of boolean operators to narrow or widen searches varies. Some hyperlink onward to medline reference text and motif servers. Some require more steps of linking (and waiting for server response) than others to reach the desired information. Unfortunately, many of the servers cannot at present distinguish between topoisomerase I and topoisomerase II in a search, and separating them is still a manual chore. An evaluation chart might look like:
database choices:
1 line - complex form
Example number of hits on:
EC# 5.99.1.?
topoisomerase I
topoisomerase II
Boolean operators
Provides on First screen: AC#, code (name, species), EC#
## links to see seq
seq formats: fasta, ncbi
Links to: prosite, entrez, medline, prodom, neighbors, gdb/embl
able to refer to address later?
notes: n-narrow ( only complete sequences - less than expected number ) ; w-wide ( every little redundant fragment ) ; x-extraneous unrelated hits .


Sequence Retrieval System (SRS) servers: lionbio - ebi - embl - sanger - .
All have seq lib (pir, swissprot, gene) ; formats: (pir-all, gcg, fasta) ; patterns (prosite, + ) . Provide sw code - pir ac#. Link to complete entries (can keep) - to seq & on to prosite, medline/entrez, genbank/embl, pir/swissprot. Boolean: |=or, &=and, !=but not. Docs .
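The three boolean operators listed above behave like set operations on hit lists. A small sketch with Python sets follows; the entry names and hit lists are made up for illustration and do not come from any server.

```python
# Hypothetical hit lists; the entry names are made up for illustration.
topo_hits = {"entry1", "entry2", "entry3", "entry4"}
type_ii_hits = {"entry3", "entry4", "entry5"}

either = topo_hits | type_ii_hits       # SRS "|" (or): widens the search
both = topo_hits & type_ii_hits         # SRS "&" (and): narrows the search
topo_not_ii = topo_hits - type_ii_hits  # SRS "!" (but not): excludes a term
```

The "but not" form is the one that helps with the topoisomerase I versus II problem: search for the general term and subtract the hits for the unwanted type.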

bio oslo
prot stru, patrn (11), ent-medline, other-limb, transfac
topo&II (51), topo (167) 80sw, 87 pir; link to entrez.

           patrn  (8),           other-limb 
topoII (87x); topo (233x) 141sw, 92pir; link to entrez

embl heidelberg
prot stru, patrn (15), medlars, other-limb, pdbfind, transfac
topo&II (49); topo (163), 78sw, 85pir; link to medlars

csc finland
prot stru, patrn (10),         other-limb, pdbfind, transfac, taxon
topo&II (51), topo (167) 80sw, 87 pir; link to entrez.

ebi - Ebi
prot stru, patrn (12), medlars, other-limb,          transfac, tags
topo&II (52), topo (167) 80sw, 86pir; link to medlars

           patrn (5),                                transfac
topo&II (48), topo (179) 95sw, 84pir; medlars

caos nijmegen
           patrn (15), medlars, other-limb nakai, transfac,  lista  
topo&II (35);  topo (161) 80sw, 81pir;  link to medlars

inserm france
           patrn (15),           other-limb,  gene, transfac, lista
topo&II (32), topo (157) 76sw, 81pir; link to entrez.

sanger uk
prot stru, patrn ?                                            lista
topo&II (50), topo (166) 79sw, 87pir; link to entrez. 

abc hungary
           patrn (3),           other-limb .
topo&II (45), topo (151) 66sw, 85pir; link to entrez.

bioz basel embnet
           patrn (15),           gene lista,         transfac, lista
topo&II (47), topo (159) 72sw (no new), 87pir; link to entrez 

bmc uppsala
           patrn (9),           other-limb, transfac
topo&II (42), topo (138) 66sw (no new), 72pir; link to medlars

           patrn (15),          other-limb
topo&II (32), topo (111) 80sw, 31pir; link to entrez.

other menus:
Entrez (ncbi) stru - Entrez (ncbi) - Provides unique name, 200 limit. Links to sequence report, fasta format, medline, neighbors, genbank. topo&II (25), topo (218)
bimas Genobase/nih
bcm BioControl Panel (Genbank, GDB, GSDB, Genethon, Swiss-Prot, PIR, Blast, etc) - FAQ
bioCUSI (Genbank, Sw, Entrez medline 200)
GDB Gopher Other (Genbank, Swiss-Prot, PIR, PDB, EMBL, Prosite, Blocks, Patents, Rebase etc)
CSC (Genbank, Swiss-Prot, PIR, PDB, EMBL, Entrez)
NIH (Genbank, Swiss-Prot, PIR, Prosite, Transcript Factors)
NIH GenoBase (Genbank, EMBL, Swiss-Prot, EC, Prosite?, Selkov Enzym-Metab Path)
harvard (Genbank, Swiss-Prot, PIR, GCG doc, Medline, etc)
IU (Genbank - PIR or Prosite or Swiss-Prot)
BioSCAN by AC# - codename (Genbank, Swiss-Prot, PIR)
also match seq or name to dbases.

Predicting Protein STRUCTURE from sequence
3DCrunch (200k seq vs PDB) - Robetta (David Baker UWa) - Quadratic Logistic (NIH) - PPS2 - descfold - UCLA-DOE Fold Recognition - ProFit superimpose - PsiPred - CBS - GPCR Viseur - GPCR expasy- GPCRdb - trans membrane (embl) - trans membrane (unil)?-

neuronet methods

BCM Secondary Stru - Rost MaxHomol (embl) - Protein db-Elba - Zvelebil 2ndary pred - SCRI 2ndary pred - cornell protein folding doc - UCLA fold recognition - stanford protein domain motion set - GBarton Sequence - Expasy: pI, MW, seq anal links - Martin's protein loops - side chain rotamers - Swissmodl (glaxo) & homology - CATH prot groups - Thornton UCL - Overington motifs - Miller predict - XABgen antibody homology - Martin's antibody modeling

BETAWRAP prediction of parallel beta-helices (more common in toxins & disease surface proteins): MIT - PNASarticle

Homology modeling requires ~25% sequence identity (Structure 4:1123 1996). Swissmodl (glaxo) & homology - SCWRL - ExpasyDeepview - Modeller - MolIDE - XABgen antibody homology - Rost MaxHomol (embl) - SegMod - SegMod validation - Look - Jackal
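To check a candidate template against the ~25% figure, percent identity can be computed from a pairwise alignment. Conventions for the denominator differ between programs; this sketch assumes identity is counted over columns where both sequences have a residue, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)
```

Because gapped columns are ignored here, a short aligned stretch can score high; it is worth also checking how much of the target sequence the alignment covers.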
Loops: Martin's protein loops - side chain rotamers - ModLoop - VriendWhatif - SCWRL - Tasser

BLAST search of 929 human disease genes against Drosophila found 548 (representing 714 diseases) with high similarity in predicted amino acid sequences [Bier, Genome Res 2001 Jun;11(6):1114-25; PMID: 11381037]

homology of organisms - human mouse homology

Glycosylation prediction - GlycoNet

sesam struct groups

fssp prot fold families

memsat-hydrophobicity -

exptl membrane prediction

Structural Motifs; Threading:
Promotif, threader, etc (bsm) dali - PROSPECT - mailPROSPECTlist - joinPROSPECTlist (subscribe prospect-digest) - CN3D - raptorThreading -

PHYLOGENY: embl/SHOT - treejuxtaposer - OneZoom - ETH - EBI - HennigCladistics - Treeview - beast - TreeGen (cbrg darwin) - UATreeLifeTaxonEvol - UA Tree of Life - TreeofLife - phylo software rev - berkeley - berkeleyTree - gene linkage - ArboDraw
PHYLIP: pasteur doc - bimcore doc - UW source
PAUP: sinauer -

Primer design: Marburg - JHU - PathoGene

Misc Gluco-Amylase ref page protein disease db

Transcript Factors: nih tfd seq db - ncbi tfd db - CheaDB

PROTEIN Sequence Databases

EBI PICR Protein Cross Reference of 20 db

ENTREZ protein Provides species code name, 200 limit. Links to seq, fasta format, medline, neighbors, genbank.

PIR: prot & gene DB (Georgetown): genbank, Japan, Refbase
GDB PIR fields - GDB PIR keyword : Provides pir#, name, species. Can't keep result location. Links to sequence and further to BLAST/FASTA.
nih PIR (search by: term1 and term2) : Provides pir#, name, EC#, species. Links to sequence.
bchs PIR gopher?

SWISS-PROT: EMBL translations, PIR, refs to Prosite & PDB
Docs: expasy UI - emiliano - ebi - SwissInstBioinfo -
expasy recent - expasy text - expasy ac# Provides code name, EC#, species. Link to seq & on to PROSITE, medline, PRODOM, etc.
GDB Enzyme Provides ac# & species code name. Links to expasy sequence and can link further to prosite, Prodom, genbank, etc.
ui Provides ac#, name, EC#. Link to seq.

OWL (nonredundant Swiss-Prot, PIR) Provides enzyme name (nonunique). Links to header and can link further to sequence, PIR, EMBL, Genbank, Prosite, etc.

BioSCAN retrieve by AC# - code name . Provides fasta, staden or raw complete formats; BLOSUM/pam patterns.

ebi swprot+pir+gb+pdb

Genbank: gene seq (NCBI)
Gene Sequence DB (DOE): NCGR GSDB
Entrez nucleotides
EMBL nucl seq
Gene Maps: UniGene - mit - Whitehead mit - Coop Human Linkage Ctr maps - Genethon - by species(Nature) - arabidopsis(tigr) - beangenes(ND) - cotton(tamu) - maize(Mo) - rice(Jp) - soy(Ia) - grasses(usda) - cornvirus(ictv) - phytophthorafungus - SaccharomycesYeast(stanford) - bacteria - dairybacteria(Fr) - FunctionalGene
Human gene maps: HumanProteinAtlas chromosomes & Ab - NCGR Sigma gene map - nci cancer genome - GeneLynx - gene servers - GDB - stanford hum genome ctr - UMich dna sequencing - NCBI - SNP(nih) - gene browser (Jim Kent, UCSC) - Ensembl - AspAlt - to know ourselves (LBL gene) - cancergenemap - geneservice

SNP: SNP(nih) - SNPconsortium of 300k - IntlHumGenomeSeq Consortium 1400k - $Celera - $CuraGen - $Incyte
SNP screening: POLBAYES - SNPipeline

GeneOntology (GO) - controlled vocabulary for eukaryote genes and proteins. EBI gene ontology - OntoDas - OntoDasdoc - GOrilla

GenomeOL - geneNotes

Gensat gene expression of mouse CNS

Restriction Enzymes: RE cuts - REBASE search - NEB Restrict enz - RE cutting
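The kind of lookup these restriction-enzyme tools perform can be sketched as a plain string search. The example assumes a linear sequence, an exact (non-degenerate) recognition site, and reports top-strand positions only; the function names are hypothetical. EcoRI (site GAATTC, cutting after the G) is used as the example enzyme.

```python
def find_sites(dna, site):
    """Return 0-based start positions of every occurrence of site, overlaps included."""
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions

def digest(dna, site, cut_offset):
    """Cut a linear sequence cut_offset bases into each site; return the fragments."""
    cuts = [p + cut_offset for p in find_sites(dna, site)]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]

# EcoRI recognizes GAATTC and cuts between G and A (offset 1).
fragments = digest("AAGAATTCTTGAATTC", "GAATTC", 1)
```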

dbEST (ncbi): cDNA maps, assoc with Genbank dbEST TIGR Expressed Seq Tag db

Oligonucleotide calculations: Oligo info: mp, mass, etc oligo analysis - Buehler oligo mp, mw, etc - alces
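For a quick estimate of oligo melting point without a server, the Wallace rule of thumb is Tm = 2*(A+T) + 4*(G+C) in degrees C. It is only a rough guide for short oligos, and the calculators listed above may use more detailed models; the function names below are hypothetical.

```python
def wallace_tm(oligo):
    """Wallace rule of thumb: Tm = 2*(A+T) + 4*(G+C), in degrees C."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

def gc_percent(oligo):
    """Percentage of G and C bases in the oligo."""
    oligo = oligo.upper()
    if not oligo:
        return 0.0
    return 100.0 * (oligo.count("G") + oligo.count("C")) / len(oligo)
```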

Oligo data: RansomHill 800-262-8212

Codon usage: nakamura - IU codon - harvard codon
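Codon usage tables like those linked above are built by counting in-frame codons over coding sequences. A minimal sketch for a single sequence follows, assuming it starts in frame at the first base; the function names are hypothetical.

```python
from collections import Counter

def codon_usage(cds):
    """Count in-frame codons, reading from the first base; a trailing partial codon is dropped."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return Counter(codons)

def codon_frequencies(cds):
    """Return each codon's share of all codons counted in cds."""
    counts = codon_usage(cds)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}
```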

Molecular Biol/DNA

Studying a disease using mouse models may use a sequence of databases: mouseGeneInfo -> Ensembl -> swissprot -> interpro -> unigene -> goldenPath -> kegg -> mouseGeneInfo -> geneOntology -> pubmed -> transfac -> pubmed

Other Gene Info Japan's genomenet human gene project - Nomenclature(HUGO) - HLA nomenclature - GeneCards (Weizmann) - geneclinics search - geneclinics index
LBL Chr21: human: LBL

Mouse geneDB: Jackson Lab - mouse backcross/mapping - Mouse Gene

Plant & pathogen geneDB: Ag Genome - Plant Gene db - SchistoDB - Strep=Actinomyc - Stanford Saccharomyces Genome DB - Ecoli

Examples of protein family analysis & links: p450 seq & anal - glucoamylase - Lei - Seq Anal Course: de = mit usgs - houston - stanford Biol GenomeNet Japan (Medline, Genbank, Swiss-Prot etc)

Molecular Biol/ Sequence Software: EGCG dozens of Prot-DNA prog - EuGene-RU account required - SAM-RU - Bimcore Software Tools - Sigma gene map - Gene Topographer - GDB Mac access - Windows Protein Anal - houston Gene-server(Software, PIR, OWL, PC-dos, Mac, Unix) - Gilbert SeqApp-DNA, etc - UPenn - Stothard links - Stothard Jscripts - biosoft - chem&mbio - linux - OracleLifeSciDatamining

Bioinformatics courses: BrownNYU - NIH/NHGRI - HausslerUCSC - BourneUCSD - Yale

Bioinformatics: base4 Pharmatrix - incyteProteome(sbearj) - Hyseq genesolutions - NCGR - EBI - Weizmann - GersteinYale - UPune - AberystwythWales - Canberra - Peking - SoAfr - Celera - Synomics - LionBioscience - Genomica - NetGenics

Annotated Gene db: ERGOmicrobe - ERGOplant

