User guide
Overview:
FGF (Fishing Gene Family) is software that finds a gene family
in a genome from a query protein sequence and builds the phylogenetic
tree of the family. It also reports the ka/ks ratio and any stop
codons or frame shifts in the duplicated sequences, from which
users can judge the fate of each gene duplication. Data sets
can be analyzed under different user-selected criteria when
finding the gene family, and several methods are available for evolutionary
analysis (nj and ml for building the phylogenetic tree;
NG86, LWL85, MLWL85, LPB93, MLPB93, GY94 and YN00 for calculating
ka/ks). Below is the manual for the online interface of
FGF.
The online interface:
This is the web form in which you fill in your e-mail address, enter the query protein sequence and select the genome data.

- Email address: enter your e-mail address
carefully and confirm it, because
it is the only way to receive your results.
- Input sequence: the
input is a protein sequence in FASTA format, as in the example below. It
can also be uploaded as a file by clicking the "browse"
button. At most 10 input sequences may be submitted,
and their total length
may be up to 5kb.
>Translation:CG8515-PA
MKVLILLVLLAISCQGQHHHQHQHQNVNNIPRDDKPDHHRHEDHRETSTWIPIIKYNKEQSD
DGSYKTEYETGNSIIHEETGFLKDFDTNPNGVLVQHGQYSYQSPEGTLVNVQYTADENGFR
ATGDHIPTPPAIPEEIQKGLDQIYAGIKLQQERLEQRAKTDPDFARKLEERRVANQNGQYIGLLENQ
- Database: the database is the whole genome sequence.
It includes 26 complete genome sequences and will be updated later.
You can select the one in which to find the gene family. The genomes and versions are listed below:
Vertebrate: Human-(hg18); Chimpanzee-(panTro2); Mouse-(mm8); Rat-(rn4); Rhesus-(rheMac2); Dog-(canFam2); Cow-(bosTau2); Opossum-(monDom4); Chicken-(galGal3); Zebrafish-(zebrafish4); Fugu-(fr1); Tetraodon-(tetraodon1); gasterosteusAculeatus-(Gasterosteus_aculeatus41_1); xenopusTropicalis-(xenTro2)
Others: Fruitfly-(dm2); Mosquito-(anoGam1); Silkworm; bee-(Amel_4.0); beetle-(Tcas2005); D.simulans-(D.simulans); D.yakuba-(yak2); D.erecta-(droEre1); Rice_1-(9311_genome_BGI_2003-08-01); Rice_2-(IRGSP_chromosomes_build04); Arabidopsis; Poplar-(plplar1); Ciona-(Assembly v2.0); C.elegans-(ce2)
Some species pictures are taken from Ensembl and JGI.
- Retrieve FGF result: after you submit a task and it completes, you will receive an e-mail with a job ID for picking up the results. You can retrieve your results by entering the job ID in the "FGF result" box.
- User registration: you can register as an FGF user free of charge by filling in your name, password, e-mail address and other information. Make sure the e-mail address is accurate, because it is the only way for a registered user to receive results. Registered users can log in with their user ID and password.
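
The input limits described under "Input sequence" can be checked before submission. The following is a minimal sketch and not part of FGF itself: the function names and the constants are hypothetical, and "5kb" is assumed here to mean 5000 sequence characters in total, which the guide does not state explicitly.

```python
MAX_SEQUENCES = 10       # limit stated in the guide
MAX_TOTAL_LENGTH = 5000  # assumption: "5kb" read as 5000 characters in total


def parse_fasta(text):
    """Parse FASTA-format text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission(text):
    """Return (number of sequences, total length) or raise ValueError."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    if len(records) > MAX_SEQUENCES:
        raise ValueError("more than %d input sequences" % MAX_SEQUENCES)
    total = sum(len(seq) for _, seq in records)
    if total > MAX_TOTAL_LENGTH:
        raise ValueError("total input length exceeds %d" % MAX_TOTAL_LENGTH)
    return len(records), total
```

Calling check_submission on the text pasted into the form returns the sequence count and total length, or raises ValueError naming the limit that was exceeded.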

Options
You can select "advance" to set the parameters of the program.

- Blast options: expectation value, filter
query sequence and word size are blast parameters; the
expectation value is usually set below 10^-5 to keep the
blast result accurate. The blast blocks are connected using
a dynamic programming algorithm. The gap length is the
distance between two blast blocks. It usually corresponds to an
intron, so the maximum gap length is usually set to the longest
intron length.
- Genewise options: in this part you can set
the score cutoff in genewise, the length cutoff (the
aligned length as a percentage of the protein length) and
the identity (matched amino acids as a percentage of the aligned sequence).
If you submit several proteins at the same time and their duplications overlap,
you can either treat the overlapping duplications as one protein's duplication and filter
out the others, or keep them all. This is controlled by the parameter "remove
redundant homologs": "T" filters redundant homologs
and "F" keeps them.
- Evolutionary options: you can select nj or ml as the
algorithm for building the phylogenetic tree.
With the nj algorithm, the distance (dn, ds, dm) can also
be selected. With the ml algorithm, you can set
the substitution model (JC69, K2P, F81, HKY, F84, TN93,
GTR). There are seven methods for calculating the ka/ks ratio
(NG86, LWL85, MLWL85, LPB93, MLPB93, GY94, YN00);
select one of them.
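
The Genewise cutoffs and the "remove redundant homologs" option described above can be illustrated with a short sketch. It is not FGF's own code: the Homolog record and all function names are hypothetical, strand is ignored, and keeping the highest-scoring homolog among overlapping ones is an assumption, since the guide does not say which one is kept.

```python
from collections import namedtuple

# Hypothetical record for one candidate homolog found in the genome.
Homolog = namedtuple(
    "Homolog", "query chrom start end score protein_len aligned_len matched"
)


def length_percent(h):
    """Aligned length as a percentage of the query protein length."""
    return 100.0 * h.aligned_len / h.protein_len


def identity_percent(h):
    """Matched amino acids as a percentage of the aligned length."""
    return 100.0 * h.matched / h.aligned_len


def passes_cutoffs(h, min_score, min_length_pct, min_identity_pct):
    """Apply the score, length and identity cutoffs to one homolog."""
    return (
        h.score >= min_score
        and length_percent(h) >= min_length_pct
        and identity_percent(h) >= min_identity_pct
    )


def overlaps(a, b):
    """True if two homologs share genomic positions (inclusive coordinates)."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def remove_redundant(homologs):
    """Keep one homolog per overlapping group.

    Assumption: the highest-scoring homolog is the one that is kept.
    """
    kept = []
    for h in sorted(homologs, key=lambda x: x.score, reverse=True):
        if not any(overlaps(h, k) for k in kept):
            kept.append(h)
    return kept
```

With "remove redundant homologs" set to "T", a filter of this kind would run after the cutoffs; with "F", the list that passed the cutoffs would be used unchanged.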
Explanation of results
Other services
Supplementary description of methods:
The dynamic programming algorithm:
Assume that we have a cDNA and want to know its gene structure.
GENEWISE can certainly do this, but it is too slow for a genome-wide analysis, so we first use BLAST to locate the cDNA roughly in the genome. When we BLAST a query sequence against a target sequence, we get several hits, and each hit may represent an exon or a random match. We therefore link the hits together into a longer alignment where possible, which lets us distinguish a random match from a "true" match.
We use dynamic programming to find an optimal gapped alignment consisting of BLAST hits under the following rules:
1. Each hit has a score that reflects its length and identity. A longer hit
with higher identity has a higher score.
2. The hits must not overlap with each other.
3. There is a score penalty for the gap between two hits. A longer gap
has a larger penalty.
4. The alignment with the highest score is the one we want.
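
The four rules above can be sketched as a simple chaining routine. This is an illustration under stated assumptions, not the scoring actually used by FGF: the hit score (aligned query length times identity), the linear gap penalty, the default max_gap and per_base values, and all names are hypothetical. The sketch also assumes that all hits lie on the same strand of one target sequence and must appear in the same order in the query and in the target.

```python
from collections import namedtuple

# Hypothetical BLAST hit: inclusive coordinates, identity as a fraction in 0..1.
Hit = namedtuple("Hit", "q_start q_end t_start t_end identity")


def hit_score(hit):
    """Rule 1 (assumed form): longer hits with higher identity score higher."""
    return (hit.q_end - hit.q_start + 1) * hit.identity


def gap_penalty(gap_len, per_base=0.01):
    """Rule 3 (assumed form): the penalty grows linearly with the gap length."""
    return per_base * gap_len


def chain_hits(hits, max_gap=10000, per_base=0.01):
    """Return (best score, chain of hits) found by dynamic programming."""
    hits = sorted(hits, key=lambda h: (h.t_start, h.q_start))
    n = len(hits)
    if n == 0:
        return 0.0, []
    best = [hit_score(h) for h in hits]  # best chain score ending at each hit
    prev = [None] * n                    # predecessor index for traceback
    for j in range(n):
        for i in range(j):
            a, b = hits[i], hits[j]
            # Rule 2: hits must not overlap, in the query or in the target.
            if a.q_end >= b.q_start or a.t_end >= b.t_start:
                continue
            gap = b.t_start - a.t_end - 1
            if gap > max_gap:  # maximum gap length, e.g. the longest intron
                continue
            candidate = best[i] + hit_score(b) - gap_penalty(gap, per_base)
            if candidate > best[j]:
                best[j], prev[j] = candidate, i
    # Rule 4: the chain with the highest score is the one we want.
    end = max(range(n), key=lambda k: best[k])
    chain = []
    k = end
    while k is not None:
        chain.append(hits[k])
        k = prev[k]
    chain.reverse()
    return best[end], chain
```

Given two collinear hits a short distance apart and a third, distant low-identity hit, a routine of this kind links the first two into one alignment and leaves the third out, which is how random matches are separated from hits that belong to the same gene.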