The Common European Research Information Format (CERIF) is the standard that the EU recommends to its member states for recording information about research activity. Since version 1.6 it has included specific support for recording metadata for datasets.
The Data Package specification is a generic wrapper format for exchanging data. Although it supports arbitrary metadata, the format defines required, recommended, and optional fields for both the package as a whole and the resources contained within it.
A separate but linked specification provides a way to describe the columns of a data table; descriptions of this form can be included directly in the Data Package metadata.
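The two specifications above can be combined in a single descriptor file. The following is a minimal, illustrative sketch of a `datapackage.json` descriptor containing one CSV resource whose columns are described inline; the package name, file path, and field names are all hypothetical.

```python
import json

# A minimal, illustrative Data Package descriptor: package-level
# metadata plus one CSV resource whose columns are described inline.
descriptor = {
    "name": "example-package",           # hypothetical package name
    "title": "Example Data Package",
    "resources": [
        {
            "name": "observations",
            "path": "observations.csv",  # hypothetical data file
            "schema": {                  # inline column descriptions
                "fields": [
                    {"name": "station_id", "type": "string"},
                    {"name": "measured_at", "type": "date"},
                    {"name": "temperature_c", "type": "number"},
                ]
            },
        }
    ],
}

# Serialize to the datapackage.json file that travels with the data.
print(json.dumps(descriptor, indent=2))
```

Because the column descriptions live inside the package metadata, a consumer can discover the table's structure without opening the CSV file itself.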
A set of mandatory metadata that must be registered with the DataCite Metadata Store when minting a DOI persistent identifier for a dataset. The domain-agnostic properties were chosen for their ability to aid in accurate and consistent identification of data for citation and retrieval purposes.
Sponsored by the DataCite consortium, version 3.0 of the schema was released in 2013.
By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications to consume metadata from multiple catalogs easily. It also enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.
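As a hedged sketch of what such a catalog entry looks like, the following generates a Turtle description of a single dataset with one distribution; the dataset URI, download URL, and title are illustrative placeholders, not real catalog entries.

```python
# Build a minimal, illustrative DCAT description of one dataset as
# Turtle (RDF). The dataset and distribution URLs are hypothetical.
dataset_uri = "http://example.org/dataset/air-quality"     # illustrative
download_url = "http://example.org/files/air-quality.csv"  # illustrative

turtle = f"""@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<{dataset_uri}> a dcat:Dataset ;
    dct:title "Air quality measurements" ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:downloadURL <{download_url}> ;
        dcat:mediaType "text/csv"
    ] .
"""
print(turtle)
```

An aggregator that harvests descriptions in this shape from several catalogs can merge them into one searchable graph, which is what enables the federated search described above.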
A basic, domain-agnostic standard which can be easily understood and implemented, and as such is one of the best known and most widely used metadata standards.
Sponsored by the Dublin Core Metadata Initiative, Dublin Core was published as ISO Standard 15836 in February 2009.
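To illustrate how simple a Dublin Core record is to produce, the following sketch builds one with only the standard library; the wrapping `metadata` root element and all field values are hypothetical, with only the `dc:` element names and namespace coming from the standard.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

# A minimal, illustrative Dublin Core record for a dataset; the
# root element name and the field values are hypothetical.
record = ET.Element("metadata")
for element, value in [
    ("title", "Example survey dataset"),
    ("creator", "A. Researcher"),
    ("date", "2013-01-01"),
    ("type", "Dataset"),
]:
    child = ET.SubElement(record, f"{{{DC}}}{element}")
    child.text = value

print(ET.tostring(record, encoding="unicode"))
```

The flat, repeatable element structure is what makes the standard so easy to implement and to map other schemas onto.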
The Metadata Object Description Schema (MODS) is a bibliographic metadata standard implemented in XML. It reimplements a subset of the elements of MARC (Machine Readable Cataloging) using language-based tags instead of numeric ones, and groups them somewhat differently. It is intended both as a simplified version of MARC 21 and as a richer alternative to Dublin Core for applications such as metadata syndication/harvesting and the documentation of digital information packages.
It was developed in 2002 by the Library of Congress Network Development and MARC Standards Office along with a group of interested experts.
The goal of these standards is to expose the rich content in aggregations of Web resources to applications that support authoring, deposit, exchange, visualization, reuse, and preservation. The standards support the changing nature of scholarship and scholarly communication, and the need for cyberinfrastructure to support that scholarship, with the intent to develop standards that generalize across all web-based information, including the increasingly popular social networks of “Web 2.0”.
This encoding is an essential dependency for the OGC Sensor Observation Service (SOS) Interface Standard. More specifically, this standard defines XML schemas for observations, and for features involved in sampling when making observations. These provide document models for the exchange of information describing observation acts and their results, both within and between different scientific and technical communities.
The PREMIS (Preservation Metadata: Implementation Strategies) Data Dictionary defines a set of metadata that most repositories of digital objects would need to record and use in order to preserve those objects over the long term. It has its roots in the Open Archival Information System Reference Model but has been strongly influenced by the practical experience of such repositories. While the Data Dictionary can be used with other standards to influence the creation of local application profiles, an XML Schema is provided to allow the metadata to be serialized independently.
PREMIS was initially developed by the Preservation Metadata: Implementation Strategies Working Group, convened by OCLC and RLG, and is currently maintained by the PREMIS Maintenance Activity, led by the Library of Congress.
Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. The PROV Family of Documents defines a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments such as the Web.
The standard provides a means to publish multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts using the W3C RDF (Resource Description Framework) standard. The model underpinning the Data Cube vocabulary is compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations.
Some repositories have decided that current standards do not fit their metadata needs, and so have created their own requirements.
An application of Dublin Core designed to improve visibility and availability of online resources, originally adapted from the Australian Government Locator Service metadata standard for use in government agencies.
Used to describe semantic assets, defined as highly reusable metadata (for example: XML schemata, generic data models) and reference data (for example: code lists, taxonomies, dictionaries, vocabularies) that are used for eGovernment system development.
An application profile based on the Dublin Core Metadata Initiative Abstract Model, used to describe multi-disciplinary data underlying peer-reviewed scientific and medical literature.
A reference framework that provides a common terminology across and between statistical organisations; it aligns with DDI and SDMX.
The OpenAIRE Guidelines are a suite of application profiles designed to allow research institutions to make their scholarly outputs visible through the OpenAIRE infrastructure. The profiles are based on established standards and designed to be used in conjunction with the OAI-PMH metadata harvesting protocol.
While the focus of each profile is different, they allow for interlinking and the contextualization of research artefacts.
A profile of the Data Package specification, intended for exchanging tabular data in CSV (comma-separated values) format.
Current research information system implementing the CERIF standard. Originally developed by Avedas but now a product of Thomson Reuters.
Tool which utilizes the DCAT standard. CKAN is a data management system that makes data accessible by providing tools to streamline publishing, sharing, finding, and using data.
CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available. Portals that use CKAN include http://data.gov.uk and http://open-data.europa.eu. The United States http://data.gov uses a version of CKAN wrapped up as the Open Government Platform.
A collection of libraries for working with Data Packages in various programming languages, and scripts for importing them into databases.
The Data Package Validator takes the URL of a Data Package and checks whether it conforms to the Data Package specification.
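The kind of conformance check such a validator performs can be sketched as follows. This is not the validator's own code, only an illustration of testing a few properties assumed here to be required by the specification (a `name` string and a non-empty `resources` array, each resource pointing at a file or carrying inline data).

```python
def check_datapackage(descriptor: dict) -> list[str]:
    """Return a list of problems found in a Data Package descriptor.

    Illustrative only: checks a couple of properties assumed to be
    required (a `name` string and a non-empty `resources` array).
    """
    problems = []
    if not isinstance(descriptor.get("name"), str) or not descriptor.get("name"):
        problems.append("missing or empty required property: name")
    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        problems.append("missing or empty required property: resources")
    else:
        for i, resource in enumerate(resources):
            if "path" not in resource and "data" not in resource:
                problems.append(f"resource {i} has neither a path nor inline data")
    return problems

print(check_datapackage({"name": "ok", "resources": [{"path": "data.csv"}]}))  # []
```

Returning a list of problems rather than raising on the first error lets a user fix a descriptor in one pass, which is how validators of this kind typically report.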
The Data Package Viewer takes the URL of a Data Package and provides a human-friendly view of it.
The Data Packagist is a Web-based tool for writing a Data Package descriptor file (datapackage.json).
RESTful API for registering datasets with the DataCite organization. The interface uses the DataCite Metadata Schema.
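A registration call against this interface can be sketched as below. The endpoint path and `text/plain` payload follow the Metadata Store API convention as commonly documented, but should be checked against the current DataCite documentation; the DOI uses the conventional `10.5072` test prefix, the landing page URL is a placeholder, and the request is built but deliberately not sent.

```python
import urllib.request

# Sketch of pointing a DOI at a dataset landing page via the DataCite
# Metadata Store's REST interface. Endpoint path and payload format
# are assumptions to verify against current documentation; the DOI
# and URL are placeholders, and the request is not actually sent.
doi = "10.5072/example-doi"                   # test-prefix placeholder
landing_page = "http://example.org/dataset/1" # placeholder

body = f"doi={doi}\nurl={landing_page}".encode("utf-8")
request = urllib.request.Request(
    url=f"https://mds.datacite.org/doi/{doi}",
    data=body,
    method="PUT",
    headers={"Content-Type": "text/plain;charset=UTF-8"},
)
# urllib.request.urlopen(request) would send it, given HTTP Basic
# credentials for a DataCite datacentre account.
print(request.get_method(), request.full_url)
```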
The DCMI Tools Community list of tools and software implementing Dublin Core.
DdiEditor is a DDI-Lifecycle editing framework developed by the Danish Data Archive (DDA).
Current research information system developed by Elsevier that implements the CERIF standard.
Geoportal Server is a standards-based, open source product that enables discovery and use of geospatial resources including data and services.
The Linked Data Cubes Explorer allows for the analysis of statistical datasets using the RDF Cube Vocabulary.
This service validates OAI-PMH metadata records against the OpenAIRE Guidelines for publication repositories, data archives and current research information systems.
Geometa is an R package that offers facilities for reading and writing geographic metadata defined by the OGC/ISO 19115, 19119, and 19110 geographic information standards, and encoded using the ISO 19139 (XML) standard. It also includes a facility to check the validity of ISO 19139 XML-encoded metadata. The package can be used in integrated (meta)data management flows to generate business metadata compliant with ISO/OGC standards. Metadata generated with geometa can then be published to standard web metadata catalogues by means of related R packages such as ows4R (R interface to OGC Web Services) or geonapi (R interface to the GeoNetwork API).
This tool uses the Observations and Measurements standard to define a Web service interface which allows clients to query observations, sensor metadata, and representations of observed features.
Current research information system implementing the CERIF standard.
A multidisciplinary data repository for a consortium of universities in the Netherlands, using a metadata structure based on the Dublin Core Metadata Initiative.
The Vatican Library uses FITS as the digital image format for the digitization of its manuscript collection.
A collection of commonly used and example data sets packaged using the Data Package specification.
An online digital repository of multi-disciplinary research datasets produced at the University of Edinburgh, using a modified Dublin Core metadata catalogue.
The University of Southampton's multi-disciplinary Institutional Research Repository, using a profile of Dublin Core and administrative ePrints metadata.
W3C Government Linked Data list of implementations of the RDF Data Cube Vocabulary.
An online portal for education and research on learning in Science, Technology, Engineering, and Mathematics, using a profile of the Dublin Core Metadata Elements for resource and collections metadata.
Develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.
A European Scholarly Communication Infrastructure that aggregates bibliographic metadata from a network of publication repositories, data archives and CRIS following the OpenAIRE Guidelines. Together with additional authoritative information, the objects and their relationships described by the metadata form an information space graph which can be traversed by users and accessed via APIs by other services. The metadata primarily support discovery and monitoring services.
A list of the implementations and usage of the PROV specifications.