SciCrunch Registry is a curated repository of scientific resources, with a focus on biomedical resources, including tools, databases, and core facilities - visit SciCrunch to register your resource.
Data and tools for studying the function of DNA sequences, with an emphasis on those involved in the production of hemoglobin. It includes information about naturally-occurring human hemoglobin mutations and their effects, experimental data related to the regulation of the beta-like globin gene cluster, and software tools for comparing sequences with one another to discover regions that are likely to play significant roles.
Proper citation: Globin Gene Server (RRID:SCR_001480)
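Every entry in this listing ends with a citation line of the same shape, so the resource name and RRID can be pulled out mechanically. The following is a minimal sketch, assuming the "Proper citation: NAME (RRID:ID)" layout seen here; the function name `parse_citation` and the regular expression are illustrative and not part of SciCrunch.

```python
import re

# Assumed layout: "Proper citation: <name> (RRID:<prefix>_<digits>)",
# possibly followed by leftover page text, which the pattern ignores.
CITATION_RE = re.compile(
    r"^Proper citation:\s*(?P<name>.+?)\s*\(RRID:(?P<rrid>[A-Za-z]+_\d+)\)"
)

def parse_citation(line):
    """Return (resource name, RRID) for a citation line, or None if it does not match."""
    match = CITATION_RE.match(line.strip())
    if match is None:
        return None
    return match.group("name"), match.group("rrid")

example = "Proper citation: Globin Gene Server (RRID:SCR_001480)"
print(parse_citation(example))  # ('Globin Gene Server', 'SCR_001480')
```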
http://www.animalgenome.org/cgi-bin/QTLdb/index
Database of trait mapping data, i.e. QTL (phenotype / expression, eQTL), candidate gene and association data (GWAS) and copy number variations (CNV) mapped to livestock animal genomes, to facilitate locating and comparing discoveries within and between species. New data and database tools are continually developed to align various trait mapping data to map-based genome features, such as annotated genes. QTLdb is open to house QTL/association data from other animal species where feasible. Most scientific journals require that any original QTL/association data be deposited into public databases before a paper may be accepted for publication. User curator accounts are provided for direct data deposit. Users can download QTLdb data from each species or individual chromosome.
Proper citation: Animal QTLdb (RRID:SCR_001748)
https://www.hgsc.bcm.edu/content/sea-urchin-genome-project
Provides information about the genome of the California purple sea urchin (Strongylocentrotus purpuratus), which has been sequenced and annotated by the Sea Urchin Genome Sequencing Consortium led by HGSC. Reports sequence and analysis of the genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology.
Proper citation: Sea Urchin Genome Project (RRID:SCR_001735)
http://www.sanbi.ac.za/resources/
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 23, 2022. The South African National Bioinformatics Institute delivers biomedical discovery appropriate to both international and African context. Researchers at SANBI perform the highest level of research and provide excellence in education. Research at SANBI has set well recognized milestones in the field of computational biology. The tools and techniques used have not only been developed but also implemented across heterogeneous domains of advanced research. Local and international efforts have driven our discoveries. Until recently, the core of SANBI's research has focused upon gene expression biology. Methods developed and applied at SANBI revolve around a greater understanding of the underlying causes of diseases. SANBI approaches the problem by comparison of genes, genomes and transcriptomes. It uses computational gene expression biology to create novel biological insights and to provide biomarkers for experimental validation. It also performs analysis of human genome variation, transcriptional diversity on both the expression and splicing level and the unravelling of transcriptional regulatory networks. Resources - Hinv, STACKdb, Malaria resources and Trypanosome databases are available for on-line searching. - SANBI offers WCD, STACKdb, stackPACK and eVOC and the eVOKE viewer as tools that can be downloaded. Sponsors: SANBI receives funding and support from a range of organisations in South Africa and internationally. Organisations currently supporting SANBI include: South Africa * South African Medical Research Council * South African AIDS Vaccine Initiative * National Bioinformatics Network * National Research Foundation * Claude Leon Foundation * International Business Machines Inc. Europe * European Union's 6th Framework Programme * World Health Organization USA * US National Institutes of Health * Fogarty International Centre * Ludwig Institute for Cancer Research
Proper citation: South African National Bioinformatics Institute: Resources (RRID:SCR_001867)
http://www-genome.stanford.edu/
This resource hyperlinks to systematic analysis projects, resources, laboratories, and departments at Stanford University.
Proper citation: Stanford Genomic Resourses (RRID:SCR_001874)
Suite of motif-based sequence analysis tools to discover motifs using MEME, DREME (DNA only) or GLAM2 on groups of related DNA or protein sequences; search sequence databases with motifs using MAST, FIMO, MCAST or GLAM2SCAN; compare a motif to all motifs in a database of motifs; associate motifs with Gene Ontology terms via their putative target genes, and analyze motif enrichment using SpaMo or CentriMo. Source code, binaries and a web server are freely available for noncommercial use.
Proper citation: MEME Suite - Motif-based sequence analysis tools (RRID:SCR_001783)
http://sammeth.net/confluence/display/ASTA/2+-+Download
Tool that extracts and displays alternative splicing (AS) events from a given genomic annotation of exon-intron gene coordinates. By comparing all given transcripts, it detects the variations in their splicing structure and identifies all AS events (like exon skipping, alternate donor, etc) by assigning to each of them an AS code. It provides a visual summary of the AS landscape in the analyzed dataset, the possibility to browse the results on the UCSC website or to download them in GTF or ASTA format. You can use AStalavista for any genome by providing your own annotation set, the identifier of your gene(s) of interest, or analyze the AS landscape of reference annotation datasets like Gencode, RefSeq, Ensembl, FlyBase, etc.
Proper citation: AStalavista (RRID:SCR_001815)
Database of genetic and molecular biological information about the filamentous fungi of the genus Aspergillus including information about genes and proteins of Aspergillus nidulans and Aspergillus fumigatus; descriptions and classifications of their biological roles, molecular functions, and subcellular localizations; gene, protein, and chromosome sequence information; tools for analysis and comparison of sequences; and links to literature information; as well as a multispecies comparative genomics browser tool (Sybil) for exploration of orthology and synteny across multiple sequenced Aspergillus species. Also available are Gene Ontology (GO) and community resources. Based on the Candida Genome Database, the Aspergillus Genome Database is a resource for genomic sequence data and gene and protein information for Aspergilli. Among its many species, the genus contains an excellent model organism (A. nidulans, or its teleomorph Emericella nidulans), an important pathogen of the immunocompromised (A. fumigatus), an agriculturally important toxin producer (A. flavus), and two species used in industrial processes (A. niger and A. oryzae). Search options allow you to: *Search AspGD database using keywords. *Find chromosomal features that match specific properties or annotations. *Find AspGD web pages using keywords located on the page. *Find information on one gene from many databases. *Search for keywords related to a phenotype (e.g., conidiation), an allele (such as veA1), or an experimental condition (e.g., light). Analysis and Tools allow you to: *Find similarities between a sequence of interest and Aspergillus DNA or protein sequences. *Display and analyze an Aspergillus sequence (or other sequence) in many ways. *Navigate the chromosome set. View nucleotide and protein sequence. *Find short DNA/protein sequence matches in Aspergillus. *Design sequencing and PCR primers for Aspergillus or other input sequences. 
*Display the restriction map for an Aspergillus or other input sequence. *Find similarities between a sequence of interest and fungal nucleotide or protein sequences. AspGD welcomes data submissions.
Proper citation: ASPGD (RRID:SCR_002047)
Consortium of 50 research groups across the UK to harness the power of newly-available genotyping technologies to improve our understanding of the aetiological basis of several major causes of global disease. The consortium has gathered genotype data for up to 500,000 sites of genome sequence variation (single nucleotide polymorphisms or SNPs) in samples ascertained for the disease phenotypes. Analysis of the genome-wide association data generated has led to the identification of many SNPs and genes showing evidence of association with disease susceptibility, some of which will be followed up in future studies. In addition, the Consortium has gained important insights into the technical, analytical, methodological and biological aspects of genome-wide association analysis. The core of the study comprised an analysis of 2,000 samples from each of seven diseases (type 1 diabetes, type 2 diabetes, coronary heart disease, hypertension, bipolar disorder, rheumatoid arthritis and Crohn's disease). For each disease, the case samples have been ascertained from sites widely distributed across Great Britain, allowing us to obtain considerable efficiencies by comparing each of these case populations to a common set of 3,000 nationally-ascertained controls also from England, Scotland and Wales. These controls come from two sources: 1,500 are representative samples from the 1958 British Birth Cohort and 1,500 are blood donors recruited by the three national UK Blood Services. One of the questions that the WTCCC study has addressed relates to the relative merits of these alternative strategies for the generation of representative population cohorts. Genotyping for this main Case Control study was conducted by Affymetrix using the (commercial) Affymetrix 500K chip. As part of this study a total of 17,000 samples were typed for 500,000 SNPs. There are two additional components to the study. 
First, the WTCCC award is part-funding a study of host resistance to infectious diseases in African populations. The same approach has been used to type 2,000 cases of tuberculosis (TB) and 2,000 cases of malaria, as well as 2,000 shared controls. As well as addressing diseases of major global significance, and extending WTCCC coverage into the area of infectious disease, the inclusion of samples of African origin has obvious benefits with respect to methodological aspects of genome-wide association analysis. Second, the WTCCC has, for four additional diseases (autoimmune thyroid disease, breast cancer, ankylosing spondylitis, multiple sclerosis), completed an analysis of 15,000 SNPs designed to represent a large proportion of the known non-synonymous coding SNPs across the genome. This analysis has been performed at the WTSI using a custom Infinium chip (Illumina). Data release: The genotypic data of the control samples (1958 British Birth Cohort and UK Blood Service) and from seven diseases analyzed in the main study are now available to qualified researchers. Summary genotype statistics for these collections are available directly from the website. Access to the individual-level genotype data and summary genotype statistics is by application to the Consortium Data Access Committee (CDAC) and approval subject to a Data Access Agreement. WTCCC2: A further round of GWA studies was funded in April 2008. These include 15 WTCCC-collaborative studies and 12 independent studies, totaling approximately 120,000 samples. Many of the studies represent major international collaborative networks that have together assembled large sample collections. 
WTCCC2 will perform genome-wide association studies in 13 disease conditions: Ankylosing spondylitis, Barrett's oesophagus and oesophageal adenocarcinoma, glaucoma, ischaemic stroke, multiple sclerosis, pre-eclampsia, Parkinson's disease, psychosis endophenotypes, psoriasis, schizophrenia, ulcerative colitis and visceral leishmaniasis. WTCCC2 will also investigate the genetics of reading and mathematics abilities in children and the pharmacogenomics of statin response. Over 60,000 samples will be analyzed using either the Affymetrix v6.0 chip or the Illumina 660K chip. The WTCCC2 will also genotype 3,000 controls each from the 1958 British Birth cohort and the UK Blood Service control group, and the 6,000 controls will be genotyped on both the Affymetrix v6.0 and Illumina 1.2M chips. WTCCC3: The Wellcome Trust has provided support for a further round of GWA studies in January 2009. These include 5 WTCCC-collaborative studies to be carried out in WTCCC3 and 5 independent studies, across a range of diseases. Many of the studies represent major international collaborative networks that have together assembled large sample collections. WTCCC3 will perform genome-wide association studies in the following 4 disease conditions: primary biliary cirrhosis, anorexia nervosa, pre-eclampsia in UK subjects, and the interactions between donor and recipient DNA related to early and late renal transplant dysfunction. The WTCCC3 will also carry out a pilot in a study of the genetics of host control of HIV-1 infection. Over 40,000 samples will be analyzed using the Illumina 660K chip. The WTCCC3 will utilize the 6,000 control genotypes generated by the WTCCC2.
Proper citation: Wellcome Trust Case Control Consortium (RRID:SCR_001973)
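The sample counts quoted for the main WTCCC case-control study can be checked with simple arithmetic. The sketch below uses only figures from the description; the comparison against a design with a separate control set per disease is a hypothetical added for illustration, not something the consortium reports.

```python
# Figures taken from the WTCCC description.
diseases = 7
cases_per_disease = 2000
shared_controls = 1500 + 1500  # 1958 British Birth Cohort + UK Blood Services donors

# Main study: every disease is compared against the same control set.
main_study_samples = diseases * cases_per_disease + shared_controls
print(main_study_samples)  # 17000

# Hypothetical alternative: each disease recruits its own 3,000 controls.
separate_controls_design = diseases * (cases_per_disease + shared_controls)
print(separate_controls_design - main_study_samples)  # 18000
```

The first figure matches the 17,000 samples stated in the description; the second suggests the scale of the efficiency gained by sharing controls under the stated assumption.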
The UCLA-DOE Institute for Genomics and Proteomics carries out research in bioenergy, structural biology, genomics and proteomics, consistent with the research mission of the United States Department of Energy. Major interests of the 12 Principal Investigators and 9 Associate Members include systems approaches to organisms, structural biology, bioinformatics, and bioenergetic systems. The Institute sponsors 5 Core Technology Centers, for X-ray and NMR structural determination, bioinformatics and computation, protein expression and purification, and biochemical instrumentation. Services offered by this Institute: - Databases: * DIP (The Database of Interacting Proteins): The DIP database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. * ProLinks Database of Functional Linkages: The Prolinks database is a collection of inference methods used to predict functional linkages between proteins. These methods include the Phylogenetic Profile method, which uses the presence and absence of proteins across multiple genomes to detect functional linkages; the Gene Cluster method, which uses genome proximity to predict functional linkage; Rosetta Stone, which uses a gene fusion event in a second organism to infer functional relatedness; and the Gene Neighbor method, which uses both gene proximity and phylogenetic distribution to infer linkage. - Data-to-Structure Servers: * SAVEs Structure Verification Server * Merohedral Twinning Test Server * SER Surface Entropy Reduction Server * VERIFY3D Structure Verification Server * ERRAT Structure Verification Server - Structure-to-Function Servers: * ProKnow Protein Functionator * Hot Patch Functional Site Locator
Proper citation: University of California at Los Angeles - Department of Energy Institute for Genomics and Proteomics (RRID:SCR_001921)
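The Phylogenetic Profile method mentioned in the Prolinks description rests on a simple idea: proteins that are present and absent in the same genomes may be functionally linked. The following is a minimal sketch of that idea only. The protein names, the presence/absence vectors, and the use of Hamming distance with a fixed cutoff are all assumptions for illustration and are not the scoring Prolinks itself uses.

```python
from itertools import combinations

# Hypothetical presence/absence data: each position is one genome,
# 1 means a homolog of the protein was found there, 0 means it was not.
profiles = {
    "protA": (1, 0, 1, 1, 0, 1),
    "protB": (1, 0, 1, 1, 0, 1),
    "protC": (0, 1, 0, 1, 1, 0),
    "protD": (1, 0, 1, 0, 0, 1),
}

def hamming_distance(p, q):
    """Count the genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))

def candidate_linkages(profiles, max_distance=1):
    """Return protein pairs whose profiles differ in at most max_distance genomes."""
    pairs = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        distance = hamming_distance(prof_a, prof_b)
        if distance <= max_distance:
            pairs.append((name_a, name_b, distance))
    return pairs

print(candidate_linkages(profiles))
# [('protA', 'protB', 0), ('protA', 'protD', 1), ('protB', 'protD', 1)]
```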
https://code.google.com/p/tbrowse/
Software providing an HTML5/JavaScript-based browser for visualizing RNA-seq results in the familiar track layout of common genome browsers. Given the quantitative nature of RNA-seq data, in addition to visualizing sequence coverage, the browser quantitates transcript abundance across regions of interest. HTML5 functionality is used to render all the tracks using the canvas drawing element. This greatly reduces the load on servers and allows for rich interactive graphics without the need for third-party plugins. Furthermore, this framework completely segregates data from visualization, making development much easier. The browser is designed to run on all modern browsers: Firefox, Safari, Chrome, Opera and Internet Explorer (though not recommended).
Proper citation: tbrowse (RRID:SCR_001918)
Multi-organism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments. Mass spectrometer output files are collected for human, mouse, yeast, and several other organisms, and searched using the latest search engines and protein sequences. All results of sequence and spectral library searching are subsequently processed through the Trans Proteomic Pipeline to derive a probability of correct identification for all results in a uniform manner to ensure a high quality database, along with false discovery rates at the whole atlas level. The raw data, search results, and full builds can be downloaded for other uses. All results of sequence searching are processed through PeptideProphet to derive a probability of correct identification for all results in a uniform manner, ensuring a high quality database. All peptides are mapped to Ensembl and can be viewed as custom tracks on the Ensembl genome browser. The long term goal of the project is full annotation of eukaryotic genomes through a thorough validation of expressed proteins. The PeptideAtlas provides a method and a framework to accommodate proteome information coming from high-throughput proteomics technologies. The online database administers experimental data in the public domain. You are encouraged to contribute to the database.
Proper citation: PeptideAtlas (RRID:SCR_006783)
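The PeptideAtlas description mentions per-identification probabilities and false discovery rates. One common way to relate the two is to take the expected FDR of an accepted set as the mean of (1 - probability) over the identifications that pass a threshold. The sketch below illustrates that relationship only; the function name, the example probabilities, and the threshold are hypothetical, and this is not claimed to be the exact procedure PeptideAtlas or the Trans Proteomic Pipeline applies.

```python
def estimated_fdr(probabilities, threshold):
    """Expected FDR among identifications with probability >= threshold.

    Each accepted identification with probability p of being correct
    contributes (1 - p) expected false positives.
    """
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)

# Hypothetical probabilities of correct identification.
probs = [0.99, 0.95, 0.90, 0.60, 0.20]
print(round(estimated_fdr(probs, 0.9), 4))  # 0.0533
```

Lowering the threshold accepts more identifications but raises the estimated FDR, which is the trade-off a uniform probability model makes explicit.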
http://www.ensemblgenomes.org/
Database portal offering integrated access to genome-scale data from non-vertebrate species of scientific interest, developed using the Ensembl genome annotation and visualization platform. Ensembl Genomes consists of five sub-portals (for bacteria, protists, fungi, plants and invertebrate metazoa) designed to complement the availability of vertebrate genomes in Ensembl. Many of the databases supporting the portal have been built in close collaboration with the scientific community - essential for maintaining the accuracy and usefulness of the resource. A common set of user interfaces (which include a graphical genome browser, FTP, BLAST search, a query optimized data warehouse, programmatic access, and a Perl API) is provided for all domains. Data types incorporated include annotation of (protein and non-protein coding) genes, cross references to external resources, and high throughput experimental data (e.g. data from large scale studies of gene expression and polymorphism visualized in their genomic context). Additionally, extensive comparative analysis has been performed, both within defined clades and across the wider taxonomy, and sequence alignments and gene trees resulting from this can be accessed through the site.
Proper citation: Ensembl Genomes (RRID:SCR_006773)
Curated collection of known Drosophila transcriptional cis-regulatory modules (CRMs) and transcription factor binding sites (TFBSs). Includes experimentally verified fly regulatory elements along with their DNA sequence, associated genes, and the expression patterns they direct. Submissions of experimentally verified cis-regulatory elements that are not included in the REDfly database are welcome.
Proper citation: REDfly Regulatory Element Database for Drosophilia (RRID:SCR_006790)
https://github.com/friend1ws/EBCall
A software package for somatic mutation detection (including InDels). EBCall uses not only paired tumor/normal sequence data of a target sample, but also multiple non-paired normal reference samples for evaluating the distribution of sequencing errors, which leads to accurate mutation detection even in cases of low sequencing depth and low allele frequency.
Proper citation: EBCall (RRID:SCR_006791)
The goals of the Antibiotic Resistance Genes Database (ARDB) are to provide a centralized compendium of information on antibiotic resistance, to facilitate the consistent annotation of resistance information in newly sequenced organisms, and also to facilitate the identification and characterization of new genes. ARDB contains six types of database groups: - Resistance Type: This database contains information, such as resistance profile, mechanism, requirement, epidemiology for each type. - Resistance Gene: This database contains information, such as resistance profile, resistance type, requirement, protein and DNA sequence for each gene. This database only includes NON-REDUNDANT, NON-VECTOR, COMPLETE genes. - Antibiotic: This database contains information, such as producer, action mechanism, resistance type, for each gene. - Resistance Gene(NonRD): This database contains the same information as Resistance Gene. It does NOT include NON-REDUNDANT, NON-VECTOR genes, but includes INCOMPLETE genes. - Resistance Gene(ALL): This database contains the same information as Resistance Gene. It includes all REDUNDANT, VECTOR AND INCOMPLETE genes. - Resistance Species: This database contains resistance profile and corresponding resistance genes for each species. Furthermore, ARDB also contains three types of BLAST database: - Resistance Genes Complete: Contains only NON-REDUNDANT, NON-VECTOR, COMPLETE genes sequences. - Resistance Genes Non-redundant: Contains NON-REDUNDANT, NON-VECTOR, COMPLETE, INCOMPLETE genes sequences. - Resistance Genes All: Contains all REDUNDANT, VECTOR, COMPLETE, INCOMPLETE genes sequences. 
Lastly, ARDB provides four types of analytical tools: - Normal BLAST: This function allows a user to input a DNA or protein sequence, and find similar DNA (Nucleotide BLAST) or protein (Protein BLAST) sequences using blastn, blastp, blastx, tblastn, tblastx. - RPS BLAST: A web RPSBLAST (RPS BLAST) interface is provided to align a query sequence against the Position Specific Scoring Matrix (PSSM) for each type. Normally, this will give the same annotation information as using regular BLAST mentioned above. - Multiple Sequences BLAST (Genome Annotation): This function allows a user to annotate multiple (less than 5000) query sequences in FASTA format. - Mutation Resistance Identification: This function allows a user to identify mutations that will cause potential antibiotic resistance, for 12 genes (16S rRNA, 23S rRNA, gyrA, gyrB, parC, parE, rpoB, katG, pncA, embB, folP, dfr). Sponsors: ARDB is funded by Uniformed Services University of the Health Sciences, administered by the Henry Jackson Foundation.
Proper citation: Antibiotic Resistance Genes Database (RRID:SCR_007040)
This genomic tRNA database contains tRNA gene predictions made by the program tRNAscan-SE (Lowe & Eddy, Nucl Acids Res 25: 955-964, 1997) on complete or nearly complete genomes. Unless otherwise noted, all annotation is automated, and has not been inspected for agreement with published literature. Transfer RNAs (tRNAs) represent the single largest, best-understood class of non-protein coding RNA genes found in all living organisms. By far, the major source of new tRNAs is computational identification of genes within newly sequenced genomes. To organize the rapidly growing collection and enable systematic analyses, we created the Genomic tRNA Database (GtRNAdb). The web resource provides overview statistics of tRNA genes within each analyzed genome, including information by isotype and genetic locus, easily downloadable primary sequences, graphical secondary structures and multiple sequence alignments. Direct links for each gene to UCSC eukaryotic and microbial genome browsers provide graphical display of tRNA genes in the context of all other local genetic information. The database can be searched by primary sequence similarity, tRNA characteristics or phylogenetic group. Inevitably with automated sequence analysis, we find exceptions to general identification rules, isoacceptor type predictions (esp. due to variable post-transcriptional anticodon modification), and questionable tRNA identifications (due to pseudogenes, SINES, or other tRNA-derived elements). We attempt to document all cases we come across, and welcome feedback on new or unrecognized discrepancies.
Proper citation: GtRNAdb - Genomic tRNA Database (RRID:SCR_006939)
http://www.nervenet.org/main/dictionary.html
A mouse-related portal of genomic databases and tables of mouse brain data. Most files are intended for you to download and use on your own personal computer. Most files are available in generic text format or as FileMaker Pro databases. The server provides data extracted and compiled from: The 2000-2001 Mouse Chromosome Committee Reports, Release 15 of the MIT microsatellite map (Oct 1997), The recombinant inbred strain database of R.W. Elliott (1997) and R. W. Williams (2001), and the Map Manager and text format chromosome maps (Apr 2001). * LXS genotype (Excel file): Updated, revised positions for 330 markers genotyped using a panel of 77 LXS strains. * MIT SNP DATABASE ONLINE: Search and sort the MIT Single Nucleotide Polymorphism (SNP) database ONLINE. These data are from the MIT-Whitehead SNP release of December 1999. * INTEGRATED MIT-ROCHE SNP DATABASE in EXCEL and TEXT FORMATS (1-3 MB): Original MIT SNPs merged with the new Roche SNPs. The Excel file has been formatted to illustrate SNP haplotypes and genetic contrasts. Both files are intended for statistical analyses of SNPs and can be used to test a method outlined in a paper by Andrew Grupe, Gary Peltz, and colleagues (Science 291: 1915-1918, 2001). The Excel file includes many useful equations and formatting that will help in navigating through this large database and in testing the in silico mapping method. * Use of inbred strains for the study of individual differences in pain related phenotypes in the mouse: Elissa J. Chesler's 2002 dissertation, discussing issues relevant to the integration of genomic and phenomic data from standard inbred strains including genetic interactions with laboratory environmental conditions and the use of various in silico inbred strain haplotype based mapping algorithms for QTL analysis. * SNP QTL MAPPER in EXCEL format (572 KB, updated January 2002 by Elissa Chesler): This Excel workbook implements the Grupe et al. mapping method and outputs correlation plots. 
The main spreadsheet allows you to enter your own strain data and compares them to haplotypes. Be very cautious and skeptical when using this spreadsheet and the technique. Read all of the caveats. This Excel version of the method was developed by Elissa Chesler. This updated version (Jan 2002) handles missing data. * MIT SNP Database (tab-delimited text format): This file is suitable for manipulation in statistics and spreadsheet programs (752 KB, Updated June 27, 2001). Data have been formatted in a way that allows rapid acquisition of the new data from the Roche Bioscience SNP database. * MIT SNP Database (FileMaker 5 Version): This is a reformatted version of the MIT Single Nucleotide Polymorphism (SNP) database in FileMaker 5 format. You will need a copy of this application to open the file (Mac and Windows; 992 KB. Updated July 13, 2001 by RW). * Gene Mapping and Map Manager Data Sets: Genetic maps of mouse chromosomes. Now includes a 10th generation advanced intercross consisting of 500 animals genotyped at 340 markers. Lots of older files on recombinant inbred strains. * The Portable Dictionary of the Mouse Genome, 21,039 loci, 17,912,832 bytes. Includes all 1997-98 Chromosome Committee Reports and MIT Release 15. * FullDict.FMP.sit: The Portable Dictionary of the Mouse Genome. This large FileMaker Pro 3.0/4.0 database has been compressed with StuffIt. The Dictionary of the Mouse Genome contains data from the 1997-98 chromosome committee reports and MIT Whitehead SSLP databases (Release 15). The Dictionary contains information for 21,039 loci. File size = 4846 KB. Updated March 19, 1998. * MIT Microsatellite Database ONLINE: A database of MIT microsatellite loci in the mouse. Use this FileMaker Pro database with OurPrimersDB. MITDB is a subset of the Portable Dictionary of the Mouse Genome. ONLINE. Updated July 12, 2001. * MIT Microsatellite Database: A database of MIT microsatellite loci in the mouse. Use this FileMaker Pro database with OurPrimersDB. 
MITDB is a subset of the Portable Dictionary of the Mouse Genome. File size = 3.0 MB. Updated March 19, 1998. * OurPrimersDB: A small database of primers. Download this database if you are using numerous MIT primers to map genes in mice. This database should be used in combination with the MITDB as one part of a relational database. File size = 149 KB. Updated March 19, 1998. * Empty copy (clone) of the Portable Dictionary in FileMaker Pro 3.0 format. Download this file and import individual chromosome text files from the table into the database. File size = 231 KB. Updated March 19, 1998. * Chromosome Text Files from the Dictionary: The table lists data on gene loci for individual chromosomes.
Proper citation: Mouse Genome Databases (RRID:SCR_007147)
http://www.genoscope.cns.fr/externe/tetraodon/
The initial objective of Genoscope was to compare the genomic sequences of this fish to that of humans to help in the annotation of human genes and to estimate their number. This strategy is based on the common genetic heritage of the vertebrates: from one species of vertebrate to another, even for those as far apart as a fish and a mammal, the same genes are present for the most part. In the case of the compact genome of Tetraodon, this common complement of genes is contained in a genome eight times smaller than that of humans. Although the length of the exons is similar in these two species, the size of the introns and the intergenic sequences is greatly reduced in this fish. Furthermore, these regions, in contrast to the exons, have diverged completely since the separation of the lineages leading to humans and Tetraodon. The Exofish method, developed at Genoscope, exploits this contrast such that the conserved regions which can be identified by comparing genomic sequences of the two species, correspond only to coding regions. Using preliminary sequencing results of the genome of Tetraodon in the year 2000, Genoscope evaluated the number of human genes at about 30,000, whereas much higher estimations were current. The progress of the annotation of the human genome has since supported the Genoscope hypothesis, with values as low as 22,000 genes and a consensus of around 25,000 genes. The sequencing of the Tetraodon genome at a depth of about 8X, carried out as a collaboration between Genoscope and the Whitehead Institute Center for Genome Research (now the Broad Institute), was finished in 2002, with the production of an assembly covering 90% of the euchromatic region of the genome of the fish. This has permitted the application of Exofish at a larger scale in comparisons with the genome of humans, but also with those of the two other vertebrates sequenced at the time (Takifugu, a fish closely related to Tetraodon, and the mouse). 
The conserved regions detected in this way have been integrated into the annotation procedure, along with other resources (cDNA sequences from Tetraodon and ab initio predictions). Of the 28,000 genes annotated, some families were examined in detail: selenoproteins, and Type 1 cytokines and their receptors. The comparison of the proteome of Tetraodon with those of mammals has revealed some interesting differences, such as a major diversification of some hormone systems and of the collagen molecules in the fish. A search for transposable elements in the genomic sequences of Tetraodon has also revealed a high diversity (75 types), which contrasts with their scarcity; the small size of the Tetraodon genome is due to the low abundance of these elements, of which some appear to still be active. Another factor in the compactness of the Tetraodon genome, which has been confirmed by annotation, is the reduction in intron size, which approaches a lower limit of 50-60 bp, and which preferentially affects certain genes. The availability of the sequences from the genomes of humans and mice on one hand, and Takifugu and Tetraodon on the other, provides new opportunities for the study of vertebrate evolution. We have shown that the level of neutral evolution is higher in fish than in mammals. The protein sequences of fish also diverge more quickly than those of mammals. A key mechanism in evolution is gene duplication, which we have studied by taking advantage of the anchoring of the majority of the sequences from the assembly on the chromosomes. The result of this study speaks strongly in favor of a whole genome duplication event, very early in the line of ray-finned fish (Actinopterygians). Even stronger evidence came from synteny studies between the genomes of humans and Tetraodon. 
Using a high-resolution synteny map, we have reconstituted the genome of the vertebrate which predates this duplication - that is, the last common ancestor to all bony vertebrates (most of the vertebrates apart from cartilaginous fish and agnathans such as lampreys). This ancestral karyotype contains 12 chromosomes, and the 21 Tetraodon chromosomes derive from it by the whole genome duplication and a surprisingly small number of interchromosomal rearrangements. In contrast, exchanges between chromosomes have been much more frequent in the lineage that leads to humans. Sponsors: The project was supported by the Consortium National de Recherche en Genomique and the National Human Genome Research Institute.
Proper citation: Tetraodon Genome Browser (RRID:SCR_007079)
Database presenting the DNA sequence and annotation of the entire human chromosome 7, encompassing nearly 158 million nucleotides of DNA and 1917 gene structures. It is the most up-to-date collation of sequence, gene, and other annotations from all databases (e.g. Celera published, NCBI, Ensembl, RIKEN, UCSC) as well as unpublished data. To generate a higher order description, additional structural features such as imprinted genes, fragile sites, and segmental duplications were integrated at the level of the DNA sequence with medical genetic data, including 440 chromosome rearrangement breakpoints associated with disease. The objective of this project is to generate a comprehensive description of human chromosome 7 to facilitate biological discovery, disease gene research and medical genetic applications. There are over 360 disease-associated genes or loci on chromosome 7. A major challenge ahead will be to represent chromosome alterations, variants, and polymorphisms and their related phenotypes (or lack thereof) in an accessible way. In addition to being a primary data source, this site serves as a weighing station for testing community ideas and information to produce highly curated data to be submitted to other databases such as NCBI, Ensembl, and UCSC. Therefore, any useful data submitted will be curated and shown in this database. All Chromosome 7 genomic clones (cosmids, BACs, YACs) listed in GBrowser and in other data tables are freely distributed.
Proper citation: Chromosome 7 Annotation Project (RRID:SCR_007134)