Revised 14/10/2004

README for Inparanoid Eukaryotic Ortholog Groups data download directory
********************************************************************************

This is a description of the data directory of the Inparanoid website.
For more information on how to use the Inparanoid program as well as the
online tool, please go to "http://inparanoid.cgb.ki.se/ehelp.html"

********************************************************************************

The datasets present in the /data directory can be broken down into two main
categories:

  A - Main Inparanoid section using Ensembl, UniProt and other datasets
  B - UniProt Only section

-----------------------------------------------------------------------------
A - Main Inparanoid section using Ensembl, UniProt and other datasets
-----------------------------------------------------------------------------

Protein sequence files in fasta format:
--------------------------------------------
Name     Description
ensAG    Ensembl-derived peptides for Anopheles gambiae
ensCB    Ensembl-derived peptides for Caenorhabditis briggsae
ensCE    Ensembl-derived peptides for Caenorhabditis elegans
ensDM    Ensembl-derived peptides for Drosophila melanogaster
ensDR    Ensembl-derived peptides for Danio rerio
ensFR    Ensembl-derived peptides for Takifugu rubripes
ensGG    Ensembl-derived peptides for Gallus gallus
ensHS    Ensembl-derived peptides for Homo sapiens
ensMM    Ensembl-derived peptides for Mus musculus
ensPT    Ensembl-derived peptides for Pan troglodytes
ensRN    Ensembl-derived peptides for Rattus norvegicus
oryza    Rice Genome consortium-derived peptides for Oryza sativa
sanPF    Sanger center-derived peptides for Plasmodium falciparum
swtAT    UniProt-derived peptides for Arabidopsis thaliana
swtEC    UniProt-derived peptides for Escherichia coli
swtSC    UniProt-derived peptides for Saccharomyces cerevisiae
swtSP    UniProt-derived peptides for Schizosaccharomyces pombe
enall    Combined fasta file containing all the above species
         (enall.phr/enall.pin/enall.psq can
         be used in combination to create a local blast library)

SQL tables used to construct the main Inparanoid Database
-----------------------------------------------------------------------
Name                      Description
sqltable_ensproteins      Table of all Ensembl proteins used;
                          Each column represents:
                          1 - Ensembl Peptide/translation identifier
                          2 - Ensembl gene identifier
                          3 - Species abbreviation (see fasta list above)
                          4 - Sequence type (“ens” denotes Ensembl)
                          5 - Gene/Protein description
                          6 - Source used by Ensembl and source identifier

sqltable_oryproteins      Table of all Rice genome consortium peptides used;
                          Each column represents:
                          1 - Peptide identifier
                          2 - Gene identifier
                          3 - Species abbreviation (see fasta list above)
                          4 - Sequence type (“ory” denotes Rice genome)
                          5 - Gene/Protein description
                          6 - Source used by Consortium and source identifier

sqltable_sanpfproteins    Table of all Plasmodium falciparum peptides used;
                          Each column represents:
                          1 - Peptide identifier
                          2 - Gene identifier
                          3 - Species abbreviation (see fasta list above)
                          4 - Sequence type (“san” denotes Sanger-derived)
                          5 - Gene/Protein description

sqltable_swtproteins      Table of all Uniprot peptides used in analyses.
                          Each column represents:
                          1 - Uniprot/Swissprot identifier
                          2 - Uniprot acc no.
                          3 - Species abbreviation (see fasta list above)
                          4 - Sequence type (“swt” denotes UniProt)
                          5 - Gene/Protein description
                          6 - Source used by UniProt and source identifier

sqltable_allext.txt       Table of all External identifiers for all SQL protein entries
                          Each column represents:
                          1 - Gene identifier
                          2 - Transcript identifier (Ensembl only)
                          3 - Peptide identifier
                          4 - External database identifier, e.g. HUGO, Flybase.
                          5 - Source identifier used by Ensembl e.g. Uniprot ID
                          6 - Uniprot/Swissprot identifier
                          7 - Uniprot acc no.
                          8 - Ensembl Family identifier (Ensembl only)
                          9 - Ensembl Family description (Ensembl only)
                          10 - Flybase identifier
                          11 - Gene/Protein description
                          12 - Species abbreviation (see fasta list above)

Main section Inparanoid clustering results:
----------------------------------------------------
Name                          Description
orthologs.?????-?????.html    Output files containing all Inparanoid clusters for
                              each species pair in html format. See species fasta
                              file list above for species abbreviations.
                              e.g. orthologs.ensHS-ensDM.html; All Inparanoid
                              clusters between Homo sapiens and Drosophila
                              melanogaster.

sqltable.?????-?????          Output files containing all Inparanoid clusters for
                              each species pair in table format. See species fasta
                              file list above for species abbreviations.
                              e.g. sqltable.ensHS-ensCE; All Inparanoid clusters
                              between Homo sapiens and Caenorhabditis elegans.
                              Each column represents:
                              1 - Cluster number
                              2 - Seed ortholog-pair blast score in bits
                              3 - Species abbreviation
                              4 - Inparanoid score
                              5 - Protein identifier

sqltable.tar.gz               tarball of sql clustering results for all species
                              (including UniProt only)

-----------------------------------------------------------------------------
B - UniProt Only section
-----------------------------------------------------------------------------

Protein sequence files in fasta format:
--------------------------------------------
Name      Description
AT.fas    UniProt-derived peptides for Arabidopsis thaliana
CE.fas    UniProt-derived peptides for Caenorhabditis elegans
DM.fas    UniProt-derived peptides for Drosophila melanogaster
EC.fas    UniProt-derived peptides for Escherichia coli
HS.fas    UniProt-derived peptides for Homo sapiens
MM.fas    UniProt-derived peptides for Mus musculus
SC.fas    UniProt-derived peptides for Saccharomyces cerevisiae
all       Combined fasta file containing all the above species
          (all.phr/all.pin/all.psq can be used in combination to create a
          local blast library)

SQL tables used to construct UniProt only section
-----------------------------------------------------------------------
Name       Description
all.sql    Table of all Uniprot peptides used;
           Each column represents:
           1 - Uniprot/Swissprot identifier
           2 - Uniprot acc no.
           3 - Gene/Protein description

UniProt Only section Inparanoid clustering results:
-----------------------------------------------------------
Name                    Description
orthologs.??-??.html    Output files containing all Inparanoid clusters for each
                        species pair in html format. See species fasta file list
                        above for species abbreviations.
                        e.g. orthologs.HS-DM.html; All Inparanoid clusters
                        between Homo sapiens and Drosophila melanogaster.

sqltable.??-??          Output files containing all Inparanoid clusters for each
                        species pair in table format. See species fasta file list
                        above for species abbreviations.
                        e.g. sqltable.HS-CE; All Inparanoid clusters between
                        Homo sapiens and Caenorhabditis elegans
                        Each column represents:
                        1 - Cluster number
                        2 - Seed ortholog-pair blast score in bits
                        3 - Species abbreviation
                        4 - Inparanoid score
                        5 - Protein identifier

sqltable.tar.gz         tarball of sql clustering results for all species
                        (including Ensembl results)
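
Example: reading a clustering table
-----------------------------------
The five-column layout of the sqltable.??-?? and sqltable.?????-????? files
described above can be read with a short script. The following Python sketch
is not part of the Inparanoid distribution. It assumes one row per protein
with the five columns separated by whitespace (this README does not state the
delimiter), and it keeps the Inparanoid score as text because its numeric
format is not specified here. The names ClusterRow, parse_rows and
group_by_cluster, and the identifiers in the sample rows, are made up for
illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class ClusterRow(NamedTuple):
    cluster: int      # column 1: cluster number
    seed_bits: float  # column 2: seed ortholog-pair blast score in bits
    species: str      # column 3: species abbreviation
    score: str        # column 4: Inparanoid score (kept as text)
    protein: str      # column 5: protein identifier


def parse_rows(lines: Iterable[str]) -> Iterator[ClusterRow]:
    """Yield one ClusterRow per line that has exactly five fields."""
    for line in lines:
        fields = line.split()  # assumption: whitespace-delimited columns
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        cluster, bits, species, score, protein = fields
        yield ClusterRow(int(cluster), float(bits), species, score, protein)


def group_by_cluster(
    rows: Iterable[ClusterRow],
) -> Dict[int, Dict[str, List[Tuple[str, str]]]]:
    """Map cluster number -> species -> list of (protein, score)."""
    clusters: Dict[int, Dict[str, List[Tuple[str, str]]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for row in rows:
        clusters[row.cluster][row.species].append((row.protein, row.score))
    return clusters


if __name__ == "__main__":
    # Made-up rows in the documented column order, not real Inparanoid data.
    sample = [
        "1 250 HS 1.000 PROT_H1",
        "1 250 CE 1.000 PROT_C1",
        "1 250 CE 0.512 PROT_C2",
        "2 90.5 HS 1.000 PROT_H2",
        "2 90.5 CE 1.000 PROT_C3",
    ]
    for number, by_species in sorted(group_by_cluster(parse_rows(sample)).items()):
        print(number, {sp: members for sp, members in by_species.items()})
```

To read a real file, pass an open file handle instead of the sample list,
e.g. parse_rows(open("sqltable.HS-CE")), after checking that the file's
delimiter matches the whitespace assumption.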