Life Sciences

Standards

ABCD (Access to Biological Collections Data)

The Access to Biological Collections Data (ABCD) Schema is an evolving standard for the access to and exchange of data about specimens and observations (a.k.a. primary biodiversity data). It aims to be comprehensive and highly structured, supporting data from a wide variety of databases, and is compatible with several existing data standards. Parallel structures exist so that atomised data, free text, or both can be accommodated.

The ABCD Schema was ratified as a standard by Biodiversity Information Standards (TDWG), formerly the Taxonomic Databases Working Group, in 2005. It was developed as a community-driven effort, with contributions from CODATA, BioCASE and GBIF, among other organizations.

Darwin Core

A body of standards, including a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries.

Sponsored by Biodiversity Information Standards (TDWG), the current standard was last modified in October 2009.
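
As a rough illustration of how the glossary is used in practice, the sketch below writes one occurrence record whose column headers are Darwin Core term names. The term names (occurrenceID, basisOfRecord, scientificName, eventDate, decimalLatitude, decimalLongitude) come from the standard; the helper name and all record values are invented for the example.

```python
import csv
import io

# Darwin Core term names used as column headers.
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]


def write_occurrences(records):
    """Serialise dict records to CSV text with Darwin Core headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical record; every value here is made up for illustration.
example = [{
    "occurrenceID": "urn:example:occ:1",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Puma concolor",
    "eventDate": "2009-10-01",
    "decimalLatitude": "-12.05",
    "decimalLongitude": "-77.04",
}]

csv_text = write_occurrences(example)
print(csv_text)
```

Using the shared term names as headers is what lets independent providers exchange such tables without agreeing on a private column vocabulary first.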

EML (Ecological Metadata Language)

Ecological Metadata Language (EML) is a metadata specification developed particularly for the discipline of ecology. It is based on prior work done by the Ecological Society of America and associated efforts (Michener et al., 1997, Ecological Applications).

Sponsored by ecoinformatics.org, EML Version 2.2.0 was released in 2019.

Genome Metadata

Genome metadata on PATRIC consists of 61 different metadata fields, called attributes, which are organized into the following seven broad categories: Organism Info, Isolate Info, Host Info, Sequence Info, Phenotype Info, Project Info, and Others.

ISA-Tab

The Investigation/Study/Assay (ISA) tab-delimited (TAB) format is a general purpose framework with which to collect and communicate complex metadata (i.e. sample characteristics, technologies used, type of measurements made) from 'omics-based' experiments employing a combination of technologies.

Created by core developers from the University of Oxford, ISA-TAB v1.0 was released in November 2008.
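
Because the format is plain tab-delimited text, an ISA-Tab table can be read with generic tooling. The sketch below parses a small study-style sample table; the column headers follow the ISA-Tab naming pattern (Source Name, Characteristics[...], Protocol REF, Sample Name), while the rows and the helper name are hypothetical.

```python
import csv
import io

# Hypothetical fragment of an ISA-Tab study table (tab-delimited).
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\n"
    "source1\tHomo sapiens\tsample collection\tsample1\n"
    "source2\tMus musculus\tsample collection\tsample2\n"
)


def read_isa_table(text):
    """Return the table rows as a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = read_isa_table(STUDY_TABLE)

# Map each sample to the organism characteristic recorded for its source.
organisms = {row["Sample Name"]: row["Characteristics[organism]"] for row in rows}
print(organisms)
```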

MIBBI (Minimum Information for Biological and Biomedical Investigations)

A common portal to a group of nearly 40 checklists of Minimum Information for various biological disciplines. The MIBBI Foundry is developing a cross-analysis of these guidelines to create an intercompatible, extensible community of standards.

The concept was realized initially through the joint efforts of the Proteomics Standards Initiative, the Genomic Standards Consortium and the MGED RSBI Working Groups. The latest project to register with MIBBI is the MIABie guidelines for reporting biofilm research, as of January 2012.

NeXus

NeXus is an international standard for the storage and exchange of neutron, x-ray, and muon experiment data. The structure of NeXus files is extremely flexible, allowing the storage of both simple data sets, such as a single data array and its axes, and highly complex data and their associated metadata, such as measurements on a multi-component instrument or numerical simulations. NeXus is built on top of the container format HDF5, and adds domain-specific rules for organizing data within HDF5 files in addition to a dictionary of well-defined domain-specific field names.

Observ-OM

Observ-OM is founded on four basic concepts to represent any kind of observation: Targets, Features, Protocols (and their Applications), and Values. It is intended to lower the barrier for future data sharing and facilitate integrated search across panels and species. All models, formats, documentation, and software are freely available as open source (LGPLv3) at http://www.observ-om.org.
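
One way to picture the four concepts is as a handful of linked record types. The sketch below is only an interpretation of the description above: the class names, field names, and the validation rule are assumptions made for illustration and are not taken from the published Observ-OM model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Target:
    """The thing being observed (e.g. an individual or a sample)."""
    name: str


@dataclass(frozen=True)
class Feature:
    """What is observed about a target (e.g. height)."""
    name: str
    unit: str = ""


@dataclass(frozen=True)
class Protocol:
    """A procedure that yields values for a fixed set of features."""
    name: str
    features: Tuple[Feature, ...] = ()


@dataclass
class ObservedValue:
    target: Target
    feature: Feature
    value: str


@dataclass
class ProtocolApplication:
    """One execution of a protocol, collecting the values it produced."""
    protocol: Protocol
    values: List[ObservedValue] = field(default_factory=list)

    def record(self, target, feature, value):
        # Assumed rule: only features declared by the protocol may be recorded.
        if feature not in self.protocol.features:
            raise ValueError(f"{feature.name} is not part of {self.protocol.name}")
        self.values.append(ObservedValue(target, feature, str(value)))


height = Feature("height", "cm")
exam = Protocol("physical exam", (height,))
application = ProtocolApplication(exam)
application.record(Target("individual-1"), height, 172)
print(application.values)
```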

ODAM Structural Metadata

Open Data for Access and Mining (ODAM) Structural Metadata is a format describing how the metadata should be formatted and what should be included to ensure ODAM compliance for a data set. To comply with this format, two metadata files in TSV format are required in addition to the data file(s). These two files describe the metadata of the dataset, which includes descriptions of measures and structural metadata like references between tables. The metadata lets non-expert users explore and visualize your data. By making data interoperable and reusable by both humans and machines, it also encourages data dissemination according to FAIR principles. The structural metadata is specified in section 'Data collection and preparation' on the website.

OME-XML (Open Microscopy Environment XML)

OME-XML is a vendor-neutral file format for biological image data, with an emphasis on metadata supporting light microscopy. It can be used as a data file format in its own right, or as a way of encoding metadata within a TIFF or BigTIFF file (for which purpose there is the OME-TIFF specification).

The standard is maintained by the Open Microscopy Environment Consortium, and was last updated in June 2012.
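
To give a feel for the metadata OME-XML carries, the sketch below reads image dimensions from a small hand-written fragment. The Image and Pixels elements and the Size* attributes follow the OME data model; the namespace URI is assumed from the June 2012 schema release mentioned above, and the fragment is simplified and has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the June 2012 schema release.
OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2012-06"

# Simplified, hand-written fragment; not a complete OME-XML document.
SAMPLE = f"""<OME xmlns="{OME_NS}">
  <Image ID="Image:0" Name="example">
    <Pixels ID="Pixels:0" DimensionOrder="XYZCT" Type="uint16"
            SizeX="512" SizeY="256" SizeZ="1" SizeC="3" SizeT="10"/>
  </Image>
</OME>"""


def pixel_dimensions(xml_text):
    """Return {image ID: {axis: size}} for every Image element."""
    root = ET.fromstring(xml_text)
    ns = {"ome": OME_NS}
    result = {}
    for image in root.findall("ome:Image", ns):
        pixels = image.find("ome:Pixels", ns)
        result[image.get("ID")] = {
            axis: int(pixels.get("Size" + axis)) for axis in "XYZCT"
        }
    return result


dims = pixel_dimensions(SAMPLE)
print(dims)
```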

Open Standard for Particle-Mesh Data (openPMD)

OpenPMD provides naming and attribute conventions that allow the exchange of particle and mesh based data from scientific simulations and experiments. The primary goal is to define a minimal set/kernel of meta information that enables the sharing and exchange of data to achieve

  • portability between various applications and differing algorithms;
  • a unified open-access description for scientific data (publishing and archiving);
  • a unified description for post-processing, visualization and analysis.

OpenPMD suits any kind of hierarchical, self-describing data format, such as, but not limited to ADIOS1 (BP3), ADIOS2 (BP4), HDF5, JSON, and XML.

PDBx/mmCIF (Protein Data Bank Exchange Dictionary and the Macromolecular Crystallographic Information Framework)

The Protein Data Bank (PDB) archive is the single worldwide archival repository of information about the 3D structures of proteins, nucleic acids, and complex assemblies, managed by the Worldwide PDB (wwPDB). The PDB Exchange Dictionary (PDBx) is used by the wwPDB to define data content for deposition, annotation and archiving of PDB entries. PDBx incorporates the community standard metadata representation, the Macromolecular Crystallographic Information Framework (mmCIF), originally developed under the auspices of the International Union of Crystallography (IUCr). PDBx has been extended by the wwPDB to include descriptions of other experimental methods that produce 3D macromolecular structure models, such as Nuclear Magnetic Resonance Spectroscopy, 3D Electron Microscopy and Tomography.
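
In mmCIF files, data items are named as `_category.item`, which is how the dictionary ties each value to a definition. The sketch below is a deliberately minimal reader that handles only simple name/value lines; it ignores `loop_` tables and multi-line values, so it is not a substitute for a real mmCIF parser. The sample values and the function name are invented for illustration.

```python
import shlex

# Hand-written fragment using simple (non-loop) items only.
SAMPLE = """data_EXAMPLE
#
_cell.length_a     50.840
_cell.length_b     42.770
_cell.angle_alpha  90.00
_exptl.method      'X-RAY DIFFRACTION'
"""


def parse_simple_items(text):
    """Group simple `_category.item value` lines by category."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip the data_ header, comments and blank lines
        name, value = shlex.split(line)  # shlex strips the quoting
        category, item = name[1:].split(".", 1)
        items.setdefault(category, {})[item] = value
    return items


parsed = parse_simple_items(SAMPLE)
print(parsed)
```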

Protocol Data Element Definitions

A draft set of data elements required by the National Institutes of Health (U.S.) for the submission of trial information to the ClinicalTrials.gov registry and results database.

Repository-Developed Metadata Schemas

Some repositories have decided that current standards do not fit their metadata needs, and so have created their own requirements.

RO-Crate

RO-Crate is a community effort to establish a lightweight approach to packaging research data with their metadata. It is based on schema.org annotations in JSON-LD, and aims to make best-practice in formal metadata description accessible and practical for use in a wider variety of situations, from an individual researcher working with a folder of data, to large data-intensive computational research environments.
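
For the "folder of data" case, the packaging step amounts to writing one JSON-LD file next to the data. The sketch below builds a minimal metadata document following the layout of RO-Crate 1.1 as we understand it (a metadata descriptor, a root Dataset, and one entity per file). The function name, dataset name, and file names are hypothetical, and properties a complete crate would normally carry, such as a licence and publication date, are left out.

```python
import json


def build_ro_crate_metadata(name, description, file_names):
    """Return a minimal RO-Crate style JSON-LD document as a dict."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root_dataset = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "hasPart": [{"@id": file_name} for file_name in file_names],
    }
    file_entities = [{"@id": file_name, "@type": "File"} for file_name in file_names]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root_dataset] + file_entities,
    }


# Hypothetical dataset description and file list.
crate = build_ro_crate_metadata(
    "Example survey data",
    "Illustrative crate with two data files.",
    ["counts.csv", "sites.csv"],
)
print(json.dumps(crate, indent=2))
```

In practice the resulting JSON would be saved as `ro-crate-metadata.json` in the root of the folder it describes.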

UKEOF

A metadata standard for describing environmental monitoring activities, programmes, networks and facilities published by the UK Environmental Observation Framework (UKEOF).

Extensions

ABCD Zoology

ABCD Zoology is an application profile of ABCD tailored for use in zoological contexts. It was the first official application profile to use the RDF-based version 3.0 of ABCD.

ABCDDNA

An extension of the ABCD standard for DNA data.

Apple Core

Darwin Core documentation and recommendations for herbaria.

Darwin Core Geospatial Extension

A protocol-independent XML schema for a geospatial extension to the Darwin Core.

DwC Germplasm

An extension to the Darwin Core standard, it includes additional terms required to describe plant genetic resources and in particular germplasm seed samples.

EDMED Metadata Profile

The European Directory of Marine Environmental Datasets metadata scheme, which is a profile of ISO 19115.

FGDC/CSDGM Biological Data Profile

A profile of the FGDC/CSDGM metadata standard, intended to support the collection and processing of biological data.

GBIF Metadata Profile

Established by a global network of countries and organizations, GBIF is a web portal promoting and facilitating the mobilization, access, discovery and use of biodiversity data. The portal uses a profile of EML; a How-to Guide and Reference Guide for using the profile are available.

HISPID (Herbarium Information Standards and Protocols for Interchange of Data)

An extension to ABCD 2.06, it is designed to allow the storage and transmission of herbarium plant specimen data.

ISA-TAB Nano

An extension of ISA-TAB specifying the format for representing and sharing information about nanomaterials, small molecules and biological specimens along with their assay characterization data.

isaconfig-diXa

FAIRsharing MIBBI Collection

A list of nearly 40 Minimum Information standards projects registered with the MIBBI initiative.

OME-TIFF (Open Microscopy Environment TIFF)

A specification of how to embed OME-XML metadata within a TIFF or BigTIFF image file.

SNRNASM ISA-Tab

An ISA-Tab-based standard for reporting the results of single nucleotide resolution nucleic acid structure mapping experiments.

VarioML

Tools

Bio-Formats

Bio-Formats reads proprietary microscopy image data and metadata, and converts them to OME-TIFF, a combination of TIFF and OME-XML.

Darwin Core Archive Assistant

A web application that offers data publishers wishing to serve data to the GBIF network an easy interface for describing data elements as basic text files and composing an appropriate XML Darwin Core descriptor file to accompany them.
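
The descriptor file mentioned above maps columns of a text file to Darwin Core terms. The sketch below generates a small descriptor in the style of the Darwin Core Text Guide; it is an independent illustration rather than the Assistant's own output, and the function name, data file name, and chosen terms are assumptions.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"


def build_meta_xml(data_file, terms):
    """Build a descriptor mapping CSV columns to Darwin Core terms.

    Column 0 is treated as the record identifier; the given terms are
    assigned to columns 1, 2, ... in order.
    """
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS})
    core = ET.SubElement(archive, "core", {
        "rowType": DWC_TERMS + "Occurrence",
        "encoding": "UTF-8",
        "fieldsTerminatedBy": ",",
        "linesTerminatedBy": "\\n",  # written literally as backslash-n
        "ignoreHeaderLines": "1",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = data_file
    ET.SubElement(core, "id", {"index": "0"})
    for index, term in enumerate(terms, start=1):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC_TERMS + term})
    return ET.tostring(archive, encoding="unicode")


# Hypothetical data file with an ID column followed by two term columns.
meta_xml = build_meta_xml("occurrence.csv", ["scientificName", "eventDate"])
print(meta_xml)
```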

Darwin Core Archive Validator

A tool to validate XML metadata against the Darwin Core Text Guidelines.

Fiji

Fiji is an image processing package that supports the OME data model for images.

Integrated Publishing Toolkit

A software platform using Darwin Core and EML to facilitate the efficient publishing of biodiversity data on the Internet, using the GBIF network.

ISA Software Suite

The open source ISA metadata tracking tools facilitate ISA-TAB-compliant collection, curation, local management and reuse of datasets in an increasingly diverse set of life science domains.

Metacat

Metacat is a repository for data and metadata that helps scientists find, understand, and effectively use the data sets they manage or that have been created by others.

MOLGENIS

A software generator to rapidly build web databases and a suite of web databases for genotype, phenotype, QTL and analysis pipelines.

Morpho

An application for accessing and manipulating metadata and data (both locally and on the network), with wizards creating metadata files using a subset of Ecological Metadata Language (EML).

ODAM Software Suite

Experimental data table management software to make research data accessible and available for reuse with minimal effort on the part of the data provider. Designed to manage experimental data tables in an easy way for users, ODAM provides a model for structuring both data and metadata that facilitates data handling and analysis. It also encourages data dissemination according to FAIR principles by making the data interoperable and reusable by both humans and machines, allowing the dataset to be explored and then extracted in whole or in part as needed.

OMERO

Repository software for organising, viewing, analysing and sharing biological microscopy images. It supports proprietary file formats but normalises to OME-TIFF/OME-XML.

PATRIC Download Tool

Tool for downloading data from PATRIC.

PDBx/mmCIF Software Resources

Parsing, validation, and visualization tools and libraries supporting PDBx/mmCIF, the data standard used by the Worldwide Protein Data Bank.

ProteoRed Tools

Bioinformatics tools to create and extract metadata compliant with the MIBBI-registered MIAPE minimum requirements.

UKEOF Monitoring Catalogue

The UKEOF Catalogue contains over 2000 metadata records of environmental observations undertaken and funded by public and third sector organisations.

The Catalogue provides a unique management tool to underpin the activities and requirements of the environmental observation community. It provides a strong basis for strategic planning, giving a holistic overview of environmental observations as well as a place to discover who is doing what, where, why and when.

Use Cases

Atlas of Living Australia

An aggregation of information on all the known species in Australia, collected from museums, herbaria, community groups, government departments, individuals and universities. All data is converted to Darwin Core.

BioCASE (Biological Collection Access Service for Europe)

The BioCASE Biological Unit Network provides access to a transnational network of biological collections; its protocol requires providers to use the ABCD schema in their configuration files.

BioModels Database

A repository hosting computational models of biological systems, using the MIBBI-registered MIRIAM and MIASE minimal metadata requirements.

BODC (British Oceanographic Data Centre Published Data Library)

This national facility for looking after and distributing data concerning the marine environment requires that data sets use a well-documented format such as CF-compliant NetCDF and be accompanied by a Dublin Core record as well as discovery metadata in a recognised standard such as DIF or FGDC/CSDGM.

CARMEN

A virtual laboratory for neurophysiology, enabling sharing and collaborative exploitation of data, analysis, code and expertise. Metadata must include the MIBBI-registered MINI recommendations.

The Cell: An Image Library

A resource database of images, videos, and animations of cells, capturing a wide diversity of organisms, cell types, and cellular processes. Its native metadata format for images is OME-XML.

CHD7 Database

Chem-BLAST

A Web-based service for searching for and visualizing chemical structures. It uses data from the Protein Data Bank that has been transformed to RDF.

dbEST (Expressed Sequence Tag Database)

A repository-developed metadata schema for EST data in GenBank.

Environmental Information Data Centre (EIDC)

The Environmental Information Data Centre (EIDC) is a Natural Environment Research Council Data Centre hosted by the Centre for Ecology & Hydrology (CEH). It manages nationally-important datasets concerned with the terrestrial and freshwater sciences.

FlowRepository

A database of flow cytometry experiments where you can query and download data collected and annotated according to the MIBBI-registered MIFlowCyt standard.

GBIF (Global Biodiversity Information Facility)

Established by a global network of countries and organizations, GBIF is a web portal promoting and facilitating the mobilization, access, discovery and use of biodiversity data. The preferred format for publishing data to the GBIF network is the Darwin Core Archive, and its Integrated Publishing Toolkit uses EML as its data standard.

Harvard Medical School LINCS Database

One of two research centers in the US creating libraries of signatures that describe how cells respond to perturbation, it uses the ISA-TAB standard to describe its data.

Integrated Marine Observing System Portal

International Dystrophic EB Patient Registry

International Molecular Exchange Consortium

An international collaboration to provide access to a non-redundant set of protein-protein interaction data from a broad taxonomic range of organisms. IMEx partner databases require data to be MIMIx (a MIBBI-registered standard) compatible.

ISA Commons

A network of systems and projects that use the ISA-Tab file format, and/or are powered by components of the ISA software suite.

JCB Data Viewer

A repository for viewing and analysing multi-dimensional image data associated with articles published in The Journal of Cell Biology. Its native metadata format is OME-XML.

KNB (The Knowledge Network for Biocomplexity)

A network of federated institutions that have agreed to share data and metadata using a common framework, principally revolving around the use of the Ecological Metadata Language as a common language for describing ecological data.

Long Term Ecological Research Network

A network providing the scientific expertise, research platforms, and long-term datasets necessary to document and analyze environmental change, it uses the Ecological Metadata Language in describing its data.

MetaboLights

A database for metabolomics experiments and derived information in ISA-Tab format.

MVID Patient Registry

National Center for Ecological Analysis and Synthesis

An EML developer, this US-based centre of cross-disciplinary research uses existing data to address major fundamental issues in ecology and allied fields.

National Science Digital Library Data Repository

An online portal for education and research on learning in Science, Technology, Engineering, and Mathematics, using a profile of the Dublin Core Metadata Elements for resource and collections metadata.

NEBC ISA Network BioInvestigationIndex

The NERC Environmental Bioinformatics Centre ISA Network's index of ISA-Tab and MIBBI-compliant environmental 'omics data.

OBIS (Ocean Biogeographic Information System)

A data repository for marine species datasets from all of the world's oceans; it uses an extension of Darwin Core 2 as its data standard.

Ocean Networks Canada

Ocean Networks Canada operates the world-leading NEPTUNE and VENUS cabled ocean observatories that collect data on physical, chemical, biological, and geological aspects of the ocean over long time periods, supporting research on complex Earth processes. The CF standard is used within netCDF data products delivered through the Oceans 2.0 interface and via OPeNDAP webservices.

PRIDE (Proteomics Identifications Database)

A centralized, MIBBI standards compliant, public data repository for proteomics data, post-translational modifications and supporting spectral evidence.

Rebioma

A web portal using Darwin Core to describe biodiversity data collected in Madagascar.

UK Polar Data Centre

An organisation coordinating the management of data collected by UK-funded scientists in the polar regions, using an application profile that is harmonious with both ISO 19115 and DIF.

VertNet

Four distributed database networks (MaNIS, HerpNET, ORNIS and FishNet) using a Darwin Core engine to make bioinformatics specimen data interoperable, mappable and publicly available.

WormQTL-HD

WormQTL

wwPDB (Worldwide Protein Data Bank)

The Protein Data Bank (PDB) archive is the single worldwide archival repository of information about the 3D structures of proteins, nucleic acids, and complex assemblies. The Worldwide PDB (wwPDB) organization manages the PDB archive and ensures that the PDB is freely and publicly available to the global community.