User guide

Overview:

FGF (Fishing Gene Family) is software that finds a gene family in a genome from a query protein sequence and builds the phylogenetic tree of the family. It also reports the ka/ks ratio and any stop codons or frame shifts in the duplicated sequences, from which users can judge the fate of each gene duplication. Data sets can be analyzed under different user-selected criteria for finding the gene family, and several methods are available for evolutionary analysis (nj and ml for building the phylogenetic tree; NG86, LWL85, MLWL85, LPB93, MLPB93, GY94, and YN00 for calculating ka/ks). Below is the manual for the online interface of FGF.

The online interface:

This is the web form in which you fill in your e-mail address, enter the query protein sequence, and select the genome data.

  • Email address: enter your e-mail address carefully and confirm it, because it is the only way to get your results.
  • Input sequence: the input is a protein sequence in FASTA format. It can also be uploaded from a file by clicking the "browse" button. At most 10 sequences can be submitted, and their total length may be up to 5kb. For example:

    >Translation:CG8515-PA
    MKVLILLVLLAISCQGQHHHQHQHQNVNNIPRDDKPDHHRHEDHRETSTWIPIIKYNKEQSD
    DGSYKTEYETGNSIIHEETGFLKDFDTNPNGVLVQHGQYSYQSPEGTLVNVQYTADENGFR
    ATGDHIPTPPAIPEEIQKGLDQIYAGIKLQQERLEQRAKTDPDFARKLEERRVANQNGQYIGLLENQ
  • Database: the database is the whole genome sequence. It includes 26 complete genome sequences and will be updated later. Select the genome in which to search for the gene family. The genomes and versions are listed below:

    Vertebrate: Human-(hg18); Chimpanzee-(panTro2); Mouse-(mm8); Rat-(rn4); Rhesus-(rheMac2); Dog-(canFam2); Cow-(bosTau2); Opossum-(monDom4); Chicken-(galGal3); Zebrafish-(zebrafish4); Fugu-(fr1); Tetraodon-(tetraodon1); gasterosteusAculeatus-(Gasterosteus_aculeatus41_1); xenopusTropicalis-(xenTro2)
    Others: Fruitfly-(dm2); Mosquito-(anoGam1); Silkworm; bee-(Amel_4.0); beetle-(Tcas2005); D.simulans-(D.simulans); D.yakuba-(yak2); D.erecta-(droEre1); Rice_1-(9311_genome_BGI_2003-08-01); Rice_2-(IRGSP_chromosomes_build04); Arabidopsis; Poplar-(plplar1); Ciona-(Assembly v2.0); C.elegans-(ce2)
    Some species pictures are taken from Ensembl and JGI.
     
  • Retrieve FGF result: after you submit a task and it completes, you will receive an e-mail with a job ID for picking up the results. Retrieve your result by entering the job ID in the "FGF result" box.
     
  • User registration: you can register as an FGF user for free by filling in your name, password, e-mail address, and other information. Make sure the e-mail address is accurate, because it is the only way for a registered user to get results. Registered users can log in with their user ID and password.
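
The limits on the input sequence described above (at most 10 sequences, total length up to 5kb) can be checked locally before submitting. The following is a minimal sketch, not part of FGF itself: the function names `parse_fasta` and `check_submission` are hypothetical, and "5kb" is assumed here to mean 5,000 residues in total.

```python
MAX_SEQUENCES = 10        # limit stated in this guide
MAX_TOTAL_LENGTH = 5000   # assumption: "5kb" is read as 5,000 residues


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission(text):
    """Return a list of problems; an empty list means the input fits the limits."""
    records = parse_fasta(text)
    problems = []
    if not records:
        problems.append("no FASTA records found")
    if len(records) > MAX_SEQUENCES:
        problems.append(
            f"{len(records)} sequences submitted, limit is {MAX_SEQUENCES}"
        )
    total = sum(len(seq) for _, seq in records)
    if total > MAX_TOTAL_LENGTH:
        problems.append(
            f"total length {total} exceeds limit of {MAX_TOTAL_LENGTH}"
        )
    return problems
```

Calling `check_submission` on the text you intend to paste into the form returns an empty list when the input is within both limits. The server may apply additional checks that this sketch does not cover.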

Options

Select "advance" to set the parameters of the program.

  • Blast options: expectation value, filter query sequence, and word size are BLAST parameters; the expectation value is usually set below 10^-5 to keep the BLAST results accurate. The BLAST blocks are connected using a dynamic programming algorithm. The gap length is the distance between two BLAST blocks. A gap usually corresponds to an intron, so the maximum gap length is usually set to the longest intron length.
  • Genewise options: here you can set the GeneWise score cutoff, the length cutoff (the aligned length as a percentage of the protein length), and the identity cutoff (matched amino acids as a percentage of the aligned sequence). If you submit several proteins at the same time and their duplications overlap, you can either treat the overlapping duplications as belonging to one protein and filter out the others, or leave them as they are. This is controlled by the parameter "remove redundant homologs": "T" filters redundant homologs and "F" keeps them.
  • Evolutionary options: you can build the phylogenetic tree with either the nj or the ml algorithm. With nj, the distance (dn, ds, dm) can also be selected. With ml, you can set the substitution model (JC69, K2P, F81, HKY, F84, TN93, GTR). Select one of seven methods to calculate the ka/ks ratio (NG86, LWL85, MLWL85, LPB93, MLPB93, GY94, YN00).
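
To make the length and identity cutoffs in the Genewise options concrete, the sketch below computes both percentages as defined above and applies the three cutoffs to one candidate. It is an illustration only: the `Hit` class, its field names, and the threshold values in the usage note are hypothetical and are not FGF defaults.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    score: float       # GeneWise score of the candidate
    aligned_len: int   # number of aligned residues
    matched: int       # number of identical residues in the alignment
    protein_len: int   # length of the query protein


def coverage(hit):
    """Length criterion: aligned length as a percentage of the protein length."""
    return 100.0 * hit.aligned_len / hit.protein_len


def identity(hit):
    """Identity criterion: matched residues as a percentage of the aligned length."""
    if hit.aligned_len == 0:
        return 0.0
    return 100.0 * hit.matched / hit.aligned_len


def passes(hit, min_score, min_coverage, min_identity):
    """True if the hit meets the score, length, and identity cutoffs."""
    return (
        hit.score >= min_score
        and coverage(hit) >= min_coverage
        and identity(hit) >= min_identity
    )
```

For example, a candidate with 150 aligned residues out of a 200-residue protein and 90 matches has a length percentage of 75 and an identity of 60, so `passes(hit, min_score=35, min_coverage=50, min_identity=30)` accepts it as long as its score is at least 35.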

Explanation of Results

Other services

  • Supplementary description of methods:
    The dynamic programming algorithm: suppose we have a cDNA and want to know the gene structure. We could certainly use GeneWise for this, but it is too slow for a genome-wide analysis, so we use BLAST to locate the cDNA roughly in the genome. When we BLAST a query sequence against a target sequence, we get several hits, and each hit may represent an exon or a random match. We therefore link the hits together into a longer alignment where possible, which lets us distinguish random matches from "true" matches. We use dynamic programming to find an optimal gapped alignment consisting of BLAST hits under the following rules:
    1. Each hit has a score that depends on its length and identity. A longer hit with higher identity has a bigger score.
    2. The hits must not overlap with each other.
    3. There is a score penalty for the gaps between hits. A longer gap has a bigger penalty.
    4. The alignment with the biggest score is the one we want.
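
    The four rules above can be sketched as a small chaining routine. This is an illustration under stated assumptions, not the FGF implementation: the hit score is assumed to be length times identity, the gap penalty is assumed to be linear in the genomic gap, hits are additionally assumed to appear in the same order in the query and the genome, and the names `Block`, `chain_blocks`, `max_gap`, and `per_base` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    q_start: int     # query coordinates, inclusive
    q_end: int
    t_start: int     # genome (target) coordinates, inclusive
    t_end: int
    identity: float  # fraction of identical positions, in [0, 1]


def block_score(b):
    # Rule 1 (assumed form): longer and more identical hits score higher.
    return (b.q_end - b.q_start + 1) * b.identity


def gap_penalty(gap, per_base):
    # Rule 3 (assumed form): the penalty grows linearly with the gap length.
    return per_base * gap


def chain_blocks(blocks, max_gap=10000, per_base=0.01):
    """Return (best_score, chain) for the best chain of non-overlapping blocks."""
    if not blocks:
        return 0.0, []
    order = sorted(blocks, key=lambda b: (b.t_start, b.t_end))
    n = len(order)
    best = [block_score(b) for b in order]  # best chain score ending at each block
    prev = [None] * n                       # back-pointers for the traceback
    for j in range(n):
        for i in range(j):
            a, b = order[i], order[j]
            gap = b.t_start - a.t_end - 1
            # Rule 2: no overlap in the genome; also enforce the maximum gap.
            if gap < 0 or gap > max_gap:
                continue
            # Rule 2: no overlap in the query (assumes collinear hits).
            if a.q_end >= b.q_start:
                continue
            cand = best[i] + block_score(b) - gap_penalty(gap, per_base)
            if cand > best[j]:
                best[j], prev[j] = cand, i
    # Rule 4: the chain with the biggest score wins.
    end = max(range(n), key=best.__getitem__)
    chain, k = [], end
    while k is not None:
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    return best[end], chain
```

    Here `max_gap` plays the role of the maximum gap length from the Blast options: two hits separated by more than the longest expected intron are never linked, so an isolated random match far from the other hits ends up in a short, low-scoring chain of its own.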


     

  • BGI FGF team, all rights reserved. Feb. 20th, 2006. Master: fgf@genomics.org.cn