SciCrunch Registry is a curated repository of scientific resources, with a focus on biomedical resources, including tools, databases, and core facilities - visit SciCrunch to register your resource.
http://www.well.ox.ac.uk/~kgaulton/chaos.shtml
A Perl-based system for annotation of variants identified in high-throughput sequencing experiments. Functionality includes annotation of variants with information relating to population genetics, known transcripts, positional records, and sequence motif-based prediction. In addition, annotated variants can be summarized and extracted to facilitate downstream analysis. There is also basic support for gene-based biological annotation, and the system will eventually include tools for variant and genotype analysis and visualization.
Proper citation: CHAoS (RRID:SCR_005174)
http://cbrc.kaust.edu.sa/readscan/
A highly scalable parallel software program to identify non-host sequences (of potential pathogen origin) and estimate their genome relative abundance in high-throughput sequence datasets.
Proper citation: READSCAN (RRID:SCR_005204)
http://www.sanger.ac.uk/resources/software/lookseq/
A web-based application for alignment visualization, browsing and analysis of genome sequence data.
Proper citation: LookSeq (RRID:SCR_005625)
THIS RESOURCE IS NO LONGER IN SERVICE, documented August 29, 2016. An algorithm that finds articles most relevant to a genetic sequence. In the genomic era, researchers often want to know more about a biological sequence by retrieving its related articles. However, no tool is yet available to achieve this goal conveniently. Here, a new literature-mining tool, MedBlast, is developed, which uses natural language processing techniques to retrieve the related articles of a given sequence. An online server of this program is also provided. The genome sequencing projects generate such a large amount of data every day that many molecular biologists often encounter sequences that they know nothing about. Literature is usually the principal resource of such information. It is relatively easy to mine the articles cited by the sequence annotation; however, it is a difficult task to retrieve relevant articles that have no direct citation relationship. The related articles are those that describe the given sequence (gene/protein), its redundant sequences, or its close homologs in various species. They can be divided into two classes: direct references, which are either cited by the sequence annotation or cite the sequence in their text; and indirect references, which contain gene symbols of the given sequence. A few additional issues make the task even more complicated: (1) symbols may have aliases; and (2) one sequence may have several relatives that should be taken into account too, which include redundant sequences (e.g. protein and gene sequences) and close homologs. Here the issues are addressed by the development of the software MedBlast, which can retrieve the related articles of the given sequence automatically.
MedBlast uses BLAST to extend homology relationships; precompiled species-specific thesauruses, a useful semantics technique in natural language processing (NLP), to extend alias relationships; and the EUtilities toolset to search and retrieve the corresponding articles of each sequence from PubMed. MedBlast takes a sequence in FASTA format as input. The program first uses BLAST to search the GenBank nucleic acid and protein non-redundant (nr) databases, to extend to homologous and corresponding nucleic acid and protein sequences. Users can input the BLAST results directly, but it is recommended to input the results for both the protein and nucleic acid nr databases. The hits with low e-values are chosen as the relatives because low-similarity hits often do not contain specific information. Very long sequences, e.g. 100k, which are usually genomic sequences, are discarded too, for they do not contain specific direct references. Users can adjust these parameters to meet their own needs.
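The hit-selection step described above (keep low e-value hits, discard very long sequences) can be illustrated with a minimal Python sketch. This is not MedBlast's code: the `BlastHit` record, the function name, the accessions, and the e-value cutoff are all hypothetical, and the length cutoff simply takes the "100k" figure from the description as an assumed default.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical, simplified record for one BLAST hit."""
    accession: str
    evalue: float
    length: int  # length of the hit sequence


def select_relatives(hits, max_evalue=1e-10, max_length=100_000):
    """Keep hits with a low e-value that are not genome-sized.

    Both thresholds are assumed defaults for illustration; the description
    only says that such parameters are user-adjustable.
    """
    return [h for h in hits if h.evalue <= max_evalue and h.length < max_length]


# Made-up example hits (accessions are placeholders, not real records).
hits = [
    BlastHit("hypothetical_1", 1e-50, 1_200),    # similar and short: kept
    BlastHit("hypothetical_2", 0.5, 900),        # weak similarity: dropped
    BlastHit("hypothetical_3", 1e-80, 250_000),  # genome-sized: dropped
]
relatives = select_relatives(hits)
print([h.accession for h in relatives])  # ['hypothetical_1']
```

Exposing both cutoffs as keyword arguments mirrors the statement that users can adjust these parameters to meet their own needs.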
Proper citation: MedBlast (RRID:SCR_008202)
The JCSG is a multi-institutional consortium that aims to explore the expanding protein universe to find new challenges and opportunities to significantly contribute to new biology, chemistry and medicine through development of HT approaches to structural genomics. The mission of JCSG is to operate a robust HT protein structure determination pipeline as a large-scale production center for PSI-2. A major goal is to ensure that innovative high-throughput approaches are developed that advance not only structural genomics, but also structural biology in general, via investigation of large numbers of high-value structures that populate protein fold and family space and by increasing the efficiency of structure determination at substantially reduced cost. The JCSG centralizes each core activity into single dedicated sites, each handling distinct, but interconnected objectives. This unique approach allows each specialized group to focus on its own area of expertise and provides well-defined interfaces among the groups. In addition, this approach addresses the requirements for the scalability needed to process large numbers of targets at a greatly reduced cost per target. JCSG production groups are:
- Administrative Core
- Bioinformatics Core
- Crystallomics Core
- Structure Determination Core
- NMR Core
JCSG is deeply committed to the development of new technologies that facilitate high throughput structural genomics. The areas of development include hardware, software, new experimental methods, and adaptation of existing technologies to advance genome research. In the hardware arena, their commitment is to the development of technologies that accelerate structure solution by increasing throughput rates at every stage of the production pipeline. Therefore, one major area of hardware development has been the implementation of robotics. In the software arena, they have developed enterprise resource software that tracks successes, failures, and sample histories from target selection to PDB deposition, annotation and target management tools, and helper applications aimed at facilitating and automating multiple steps in the pipeline. Sponsors: The Joint Center for Structural Genomics is funded by the National Institute of General Medical Sciences (NIGMS), as part of the second phase of the Protein Structure Initiative (PSI) of the National Institutes of Health (U54 GM074898).
Proper citation: Joint Center for Structural Genomics (RRID:SCR_008251)
http://www.affymetrix.com/support/developer/powertools/apt_archive.affx
Affymetrix Power Tools (APT) are a set of cross-platform command line programs that implement algorithms for analyzing and working with Affymetrix GeneChip arrays. APT programs are intended for power users who prefer programs that can be utilized in scripting environments and are sophisticated enough to handle the complexity of extra features and functionality. APT provides a platform for developing and deploying new algorithms without waiting for the GUI implementations. This resource is supported by Affymetrix, Inc.
Proper citation: Affymetrix Power Tools (RRID:SCR_008401)
http://weizhong-lab.ucsd.edu/cd-hit-otu/
Data analysis service and software program that performs Operational Taxonomic Unit (OTU) finding. It uses three-step clustering to identify OTUs. The first step is raw read filtering and trimming. The second step is error-free read picking. In the last step, OTU clustering is done at different distance cutoffs (0.01, 0.02, 0.03... 0.12).
Proper citation: CD-HIT-OTU (RRID:SCR_006983)
Nonprofit research organization that uses genome sequences to advance understanding of the biology of humans and pathogens in order to improve human health globally. Provides data that can be translated into diagnostics, treatments or therapies, including over 100 finished genomes, which can be downloaded. Data are publicly available on a limited basis, and provided more extensively upon request.
Proper citation: Wellcome Trust Sanger Institute; Hinxton; United Kingdom (RRID:SCR_011784)
http://sourceforge.net/projects/gasic/
A method to correct read alignment results for the ambiguities imposed by similarities of genomes.
Proper citation: GASiC (RRID:SCR_006765)
http://ecoliwiki.net/colipedia/index.php/T4-like_genome_database
THIS RESOURCE IS NO LONGER IN SERVICE, documented August 22, 2016. A database of information on bacterial phages. It contains multiple phage genomes, which users can BLAST and MegaBLAST, and also hosts a Phage Forum in which users can discuss phage data. Interactive browsing of completed phage genomes is available using the program. The browser allows users to scan the genome for particular features and to download sequence information plus analyses of those features. Views of the genome are generated showing named genes, BLAST similarities to other phages, predicted tRNAs, and other sequence features.
Proper citation: T4-like genome database (RRID:SCR_005367)
http://genome.jgi.doe.gov/programs/plants/index.jsf
The goal of the DOE JGI Plant Genome Program is to shed light on the fundamental biology of photosynthesis and transduction of solar to chemical energy. Other areas of interest include characterizing:
* Ecosystems and the role of terrestrial plants and oceanic phytoplankton in carbon sequestration.
* The role of plants in coping with toxic pollutants in soils by hyper-accumulation and detoxification.
* Feedstocks for biofuels, e.g., biodiesel from soybean; cellulosic ethanol from perennial grasses.
* The ability to respond to environmental change (e.g., loss of diversity from monoculture produces vulnerabilities; nitrogen fixing nodules in legumes reduce fertilizer need).
* The generation of useful secondary metabolites (produced largely for disease resistance) for positive/negative control in agriculture, with attendant influence on global carbon cycle.
The Plant Genome Program accomplishes the above through the following activities:
# Sequence. Produce genome sequences of key plant (and algal) species to accelerate biofuel development and understand response to climate change.
# Function. Develop datasets (and synthetic biology tools) to elucidate functional elements in plant genomes, with special focus on a handful of flagship genomes.
# Variation. Characterize natural genomic variation in plants (and their associated microbiomes), and relate to biofuel sustainability and adaptation to climate change.
# Integration. Provide a centralized hub for the retrieval and deep integrated analysis of plant genome datasets.
Proper citation: Plant Genome Resource at JGI (RRID:SCR_005315)
Database of known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations and are derived from four sources: Genomic Context, High-throughput experiments, (Conserved) Coexpression, and previous knowledge. STRING quantitatively integrates interaction data from these sources for a large number of organisms, and transfers information between these organisms where applicable. The database currently covers 5,214,234 proteins from 1133 organisms. (2013)
Proper citation: STRING (RRID:SCR_005223)
http://bioinfo.iitk.ac.in/MIPModDB/
This is a database of comparative protein structure models of the MIP (Major Intrinsic Protein) family of proteins. Nearly complete sets of MIPs have been identified from the completed genome sequences of organisms available at NCBI. The structural models of MIP proteins were created by a defined protocol. The database aims to provide key information on MIPs, based in particular on sequence as well as structure. This will further help to decipher the function of uncharacterized MIPs. For each MIP entry, this database contains information about the source, gene structure, sequence features, substitutions in the conserved NPA motifs, structural model, the residues forming the selectivity filter and channel radius profile. For a selected set of MIPs, it is possible to derive structure-based sequence alignments and evolutionary relationships. Sequences and structures of selected MIPs can be downloaded from the MIPModDB database.
Proper citation: MIPModDB (RRID:SCR_006058)
http://operons.ibt.unam.mx/OperonPredictor/
The Prokaryotic Operon DataBase (ProOpDB) constitutes one of the most precise and complete repositories of operon predictions currently available. Using our novel and highly accurate operon algorithm, we have predicted the operon structures of more than 1,200 prokaryotic genomes. ProOpDB offers diverse alternatives by which a set of operon predictions can be retrieved, including: i) organism name, ii) metabolic pathways, as defined by the KEGG database, iii) gene orthology, as defined by the COG database, iv) conserved protein motifs, as defined by the Pfam database, v) reference gene, vi) reference operon, among others. In order to limit the operon output to non-redundant organisms, ProOpDB offers an efficient protocol to select the more representative organisms based on a precompiled phylogenetic distance matrix. In addition, the ProOpDB operon predictions are used directly as the input data of our Gene Context Tool (GeConT) to visualize their genomic context and retrieve the sequence of their corresponding 5' regulatory regions, as well as the nucleotide or amino acid sequences of their genes. The prediction algorithm is a multilayer perceptron (MLP) neural network classifier that uses as input the intergenic distances of contiguous genes and the functional relationship scores of the STRING database between the different groups of orthologous proteins, as defined in the COG database. Nevertheless, the operon prediction of our method is not restricted to only those genes with a COG assignment, since we successfully defined new groups of orthologous genes and obtained, by extrapolation, a set of equivalent STRING-like scores based on conserved gene pairs in different genomes. Since the STRING functional relationship scores are determined in an unbiased manner and efficiently integrate a large amount of information coming from different sources and kinds of evidence, the predictions made by our MLP are considerably less influenced by the bias imposed in the training procedure using one specific organism.
Proper citation: ProOpDB (RRID:SCR_006111)
http://prorepeat.bioinformatics.nl/
ProRepeat is an integrated curated repository and analysis platform for in-depth research on the biological characteristics of amino acid tandem repeats. ProRepeat collects repeats from all proteins included in the UniProt knowledgebase, together with 85 completely sequenced eukaryotic proteomes contained within the RefSeq collection. It contains non-redundant perfect tandem repeats, approximate tandem repeats and simple, low-complexity sequences, covering the majority of the amino acid tandem repeat patterns found in proteins. The ProRepeat web interface allows querying the repeat database using repeat characteristics like repeat unit and length, number of repetitions of the repeat unit and position of the repeat in the protein. Users can also search for repeats by the characteristics of repeat-containing proteins, such as entry ID, protein description, sequence length, gene name and taxon. ProRepeat offers powerful analysis tools for finding biologically interesting properties of repeats, such as the strong position bias of leucine repeats in the N-terminus of eukaryotic protein sequences, the differences of repeat abundance among proteomes, the functional classification of repeat-containing proteins and GC content constraints of repeats' corresponding codons.
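To make the repeat characteristics mentioned above concrete (repeat unit, number of repetitions, position in the protein), here is a minimal Python sketch that finds perfect tandem repeats with a regular expression. It is only an illustration, not ProRepeat's detection method; the function name, the parameter defaults, and the example sequence are hypothetical.

```python
import re


def perfect_tandem_repeats(seq, max_unit=3, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    max_unit and min_copies are assumed defaults chosen for this example.
    """
    found = []
    for unit_len in range(1, max_unit + 1):
        # A unit of unit_len residues followed by at least
        # (min_copies - 1) further exact copies of itself.
        pattern = re.compile(r"(.{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "QQ"), since the shorter unit already reports them.
            if unit in (unit + unit)[1:-1]:
                continue
            copies = len(match.group(0)) // unit_len
            found.append((match.start(), unit, copies))
    return found


# Made-up sequence: a run of five leucines, then "AQ" repeated three times.
repeats = perfect_tandem_repeats("MLLLLLAQAQAQK")
print(repeats)  # [(1, 'L', 5), (6, 'AQ', 3)]
```

Positions are 0-based offsets into the sequence. Approximate repeats and low-complexity regions, which the database also covers, would need a different approach.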
Proper citation: ProRepeat (RRID:SCR_006113)
High quality ribosomal RNA databases providing comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya). Supplementary services include an rRNA gene aligner, online tools for probe and primer evaluation and optimized browsing, searching and downloading on the website. The extensively curated SILVA taxonomy and the new non-redundant SILVA datasets provide an ideal reference for high-throughput classification of data from next-generation sequencing approaches. The alignment tool, SINA, is available for download and for use online.
Proper citation: SILVA (RRID:SCR_006423)
ViralZone is a SIB Swiss Institute of Bioinformatics web resource for all viral genera and families, providing general molecular and epidemiological information, along with virion and genome figures. Each virus or family page gives easy access to UniProtKB/Swiss-Prot viral protein entries. The ViralZone project is handled by the virus program of the SwissProt group. Protein popups were developed in collaboration with Prof. Christian von Mering and Andrea Franceschini, Bioinformatics Group, Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland, funded in part by the SIB Swiss Institute of Bioinformatics. All pictures in ViralZone are copyright of the SIB Swiss Institute of Bioinformatics.
Proper citation: ViralZone (RRID:SCR_006563)
http://www.broadinstitute.org/annotation/tetraodon/
This database has been funded by the National Human Genome Research Institute (NHGRI) to produce shotgun sequence of the Tetraodon nigroviridis genome. The strategy involves Whole Genome Shotgun (WGS) sequencing, in which sequence from the entire genome is generated. Whole genome shotgun libraries were prepared from Tetraodon genomic DNA obtained from the laboratory of Jean Weissenbach at Genoscope. Additional sequence data of approximately 2.5X coverage of Tetraodon has also been generated by Genoscope in plasmid and BAC end reads. Broad and Genoscope intend to pool their data and generate whole genome assemblies. Tetraodon nigroviridis is a freshwater pufferfish of the order Tetraodontiformes and lives in the rivers and estuaries of Indonesia, Malaysia and India. This species is 20-30 million years distant from Fugu rubripes, a marine pufferfish from the same family. The gene repertoire of T. nigroviridis is very similar to that of other vertebrates. However, its relatively small genome of 385 Mb is eight times more compact than that of human, mostly because intergenic and intronic sequences are reduced in size compared to other vertebrate genomes. These genome characteristics, along with the large evolutionary distance between bony fish and mammals, make Tetraodon a compact vertebrate reference genome - a powerful tool for comparative genetics and for quick and reliable identification of human genes.
Proper citation: Tetraodon nigroviridis Database (RRID:SCR_007123)
http://compbio.soe.ucsc.edu/yeast_introns.html
Database of information about the spliceosomal introns of the yeast Saccharomyces cerevisiae. Listed are known spliceosomal introns in the yeast genome and the splice sites actually used are documented. Through the use of microarrays designed to monitor splicing, they are beginning to identify and analyze splice site context in terms of the nature and activities of the trans-acting factors that mediate splice site recognition. In version 3.0, expression data that relates to the efficiency of splicing relative to other processes in strains of yeast lacking nonessential splicing factors is included. These data are displayed on each intron page for browsing and can be downloaded for other types of analysis.
Proper citation: Yeast Intron Database (RRID:SCR_007144)
http://net.icgeb.org/benchmark/
The collection was created to provide standard datasets on which the performance of machine learning methods can be compared. The collection contains datasets of sequences and structures, each subdivided into positive/negative training/test sets. Such a subdivision is called a classification task. Typical tasks include the classification of structural domains in the SCOP and CATH databases based on their sequences, as well as various functional and taxonomic classification tasks. Running a performance evaluation test on an entire database can include many different classification tasks. These ensembles of classification tasks are encoded in a simple matrix format - called the cast matrix or membership table - that specifies the role of each sequence (or structure) in the different calculations. Each column of this matrix is a subdivision of the objects (rows) into positive/negative training/test sets. Typically, a database record contains such an ensemble of classification tasks, encoded in a single cast matrix. In addition, there is a collection of distance matrices that contain an all vs. all comparison of the datasets using methods such as BLAST, Smith-Waterman, 3D-comparisons etc. Evaluation of a method on a given database consists of calculating a performance measure such as the area under the receiver operating characteristic (ROC) curve (AUC). Results of evaluation are deposited along with the data; each dataset is evaluated by at least one classification method, such as 1NN (nearest neighbour), SVM (support vector machines), ANN (artificial neural networks), RF (random forests) etc. There are small datasets meant for program developers, as well as downloadable programs for various classification algorithms.
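As an illustration of how one column of a cast matrix could drive an evaluation, the following Python sketch selects the test objects for a single classification task and computes the ROC AUC of a classifier's scores. The role labels (`"+train"`, `"-train"`, `"+test"`, `"-test"`), the scores, and the function name are hypothetical; the collection's actual file encoding is not described here and may differ.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the rank-based formulation of AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# One cast-matrix column: the role of each object (row) in one task.
# These role labels are an assumed encoding used only for this example.
cast_column = ["+train", "-train", "+test", "+test", "-test", "-test"]

# Made-up classifier scores for the same six objects (higher = more positive).
scores = [0.9, 0.1, 0.8, 0.4, 0.6, 0.2]

# Only the test members of the task take part in the evaluation.
pos = [s for role, s in zip(cast_column, scores) if role == "+test"]
neg = [s for role, s in zip(cast_column, scores) if role == "-test"]

auc = roc_auc(pos, neg)
print(auc)  # 0.75
```

Repeating the same selection for every column of the matrix would give one AUC value per classification task.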
Proper citation: Protein Classification Benchmark Collection (RRID:SCR_007561)