General Research Data

Standards

Registry Interchange Format - Collections and Services

The Registry Interchange Format - Collections and Services (RIF-CS) schema was developed as a data interchange format for supporting the electronic exchange of collection and service descriptions. It organises information about collections and services into the format required by the Research Data Australia (RDA) Registry.

CERIF (Common European Research Information Format)

The Common European Research Information Format is the standard that the EU recommends to its member states for recording information about research activity. Since version 1.6 it has included specific support for recording metadata for datasets.

Data Package

The Data Package specification is a generic wrapper format for exchanging data. Although it supports arbitrary metadata, the format defines required, recommended, and optional fields for both the package as a whole and the resources contained within it.

A separate but linked specification provides a way to describe the columns of a data table; descriptions of this form can be included directly in the Data Package metadata.
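As a sketch, a minimal descriptor (with hypothetical names and paths) can be assembled and serialised to the conventional `datapackage.json` file; the `schema` block here is an instance of the linked Table Schema specification for describing table columns:

```python
import json

# A minimal Data Package descriptor (hypothetical names and paths).
descriptor = {
    "name": "example-package",          # package identifier
    "title": "Example Data Package",    # human-readable title
    "resources": [                      # the data files the package contains
        {
            "name": "measurements",
            "path": "data/measurements.csv",
            "schema": {                 # Table Schema: per-column descriptions
                "fields": [
                    {"name": "site", "type": "string"},
                    {"name": "reading", "type": "number"},
                ]
            },
        }
    ],
}

# Serialise to the conventional descriptor file name, datapackage.json.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```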

DataCite Metadata Schema

A set of mandatory metadata that must be registered with the DataCite Metadata Store when minting a DOI persistent identifier for a dataset. The domain-agnostic properties were chosen for their ability to aid in accurate and consistent identification of data for citation and retrieval purposes.

Sponsored by the DataCite consortium, version 3.0 of the schema was released in 2013.
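To illustrate the mandatory set: version 3 of the schema requires Identifier, Creator, Title, Publisher, and PublicationYear. A simple completeness check over a hypothetical record (all values invented; 10.5072 is the DataCite test prefix) might look like this:

```python
# Hypothetical values for the five mandatory DataCite 3.0 properties.
record = {
    "identifier": "10.5072/example-dataset",  # a DOI; 10.5072 is the test prefix
    "creators": ["Example, Researcher"],
    "titles": ["An Example Dataset"],
    "publisher": "Example Data Centre",
    "publicationYear": "2013",
}

# Check that every mandatory property has been supplied before registration.
mandatory = ["identifier", "creators", "titles", "publisher", "publicationYear"]
missing = [p for p in mandatory if not record.get(p)]
assert not missing, f"missing mandatory properties: {missing}"
print("record is complete")
```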

DCAT (Data Catalog Vocabulary)

By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications easily to consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.
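A hedged sketch of what such a description looks like: a hypothetical dataset and one distribution, expressed in Turtle using the W3C DCAT class and property names (all URIs invented):

```python
# Build a minimal DCAT description of a hypothetical dataset in Turtle.
turtle = """\
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<http://example.org/dataset/1>
    a dcat:Dataset ;
    dct:title "Example dataset" ;
    dcat:distribution <http://example.org/dataset/1/csv> .

<http://example.org/dataset/1/csv>
    a dcat:Distribution ;
    dcat:downloadURL <http://example.org/files/example.csv> ;
    dct:format "text/csv" .
"""
print(turtle)
```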

Dublin Core

A basic, domain-agnostic standard which can be easily understood and implemented, and as such is one of the best known and most widely used metadata standards.

Sponsored by the Dublin Core Metadata Initiative, Dublin Core was published as ISO Standard 15836 in February 2009.
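As an illustration of how easily it can be implemented, Dublin Core elements are commonly embedded in web pages as HTML meta tags using the "DC." name-prefix convention (values here are hypothetical):

```python
# Hypothetical Dublin Core description rendered as HTML <meta> elements.
elements = {
    "DC.title": "An Example Resource",
    "DC.creator": "Example, Author",
    "DC.date": "2009-02-01",
    "DC.format": "text/html",
}
meta_tags = "\n".join(
    f'<meta name="{name}" content="{value}">' for name, value in elements.items()
)
print(meta_tags)
```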

MODS (Metadata Object Description Schema)

The Metadata Object Description Schema (MODS) is a bibliographic metadata standard implemented in XML. It reimplements a subset of the elements of MARC (Machine Readable Cataloging) using language-based tags instead of numeric ones, and groups them somewhat differently. It is intended both as a simplified version of MARC 21 and as a richer alternative to Dublin Core for applications such as metadata syndication/harvesting and the documentation of digital information packages.

It was developed in 2002 by the Library of Congress Network Development and MARC Standards Office along with a group of interested experts.

OAI-ORE (Open Archives Initiative Object Reuse and Exchange)

The goal of these standards is to expose the rich content in aggregations of Web resources to applications that support authoring, deposit, exchange, visualization, reuse, and preservation. The standards support the changing nature of scholarship and scholarly communication, and the need for cyberinfrastructure to support that scholarship, with the intent that they generalize across all web-based information, including the increasingly popular social networks of “Web 2.0”.

Observations and Measurements

This encoding is an essential dependency for the OGC Sensor Observation Service (SOS) Interface Standard. More specifically, this standard defines XML schemas for observations, and for features involved in sampling when making observations. These provide document models for the exchange of information describing observation acts and their results, both within and between different scientific and technical communities.

PREMIS

The PREMIS (Preservation Metadata: Implementation Strategies) Data Dictionary defines a set of metadata that most repositories of digital objects would need to record and use in order to preserve those objects over the long term. It has its roots in the Open Archival Information System Reference Model but has been strongly influenced by the practical experience of such repositories. While the Data Dictionary can be used with other standards to influence the creation of local application profiles, an XML Schema is provided to allow the metadata to be serialized independently.

PREMIS was initially developed by the Preservation Metadata: Implementation Strategies Working Group, convened by OCLC and RLG, and is currently maintained by the PREMIS Maintenance Activity, led by the Library of Congress.

PROV

Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. The PROV Family of Documents defines a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments such as the Web.
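One of the serializations defined by the family is PROV-N, a human-readable notation. A minimal record for a hypothetical chart generated from a dataset (all identifiers invented, built as a plain string for illustration) might read:

```python
# A minimal provenance record in PROV-N notation: a chart entity was
# generated by a plotting activity, which used a dataset and was carried
# out by an agent. The "-" marks omitted optional arguments (e.g. time).
prov_n = """\
document
  prefix ex <http://example.org/>
  entity(ex:dataset1)
  entity(ex:chart1)
  activity(ex:plot1)
  agent(ex:alice)
  used(ex:plot1, ex:dataset1, -)
  wasGeneratedBy(ex:chart1, ex:plot1, -)
  wasAssociatedWith(ex:plot1, ex:alice, -)
endDocument
"""
print(prov_n)
```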

RDF Data Cube Vocabulary

The standard provides a means to publish multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts using the W3C RDF (Resource Description Framework) standard. The model underpinning the Data Cube vocabulary is compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations.
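To give a flavour of the model, a single observation in a hypothetical one-dimension, one-measure cube can be expressed in Turtle using the `qb:` vocabulary (the dimension and measure properties here are invented; a full cube would also declare its structure definition):

```python
# One observation in a hypothetical statistical cube, in Turtle.
turtle = """\
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix ex: <http://example.org/> .

ex:obs1
    a qb:Observation ;
    qb:dataSet ex:dataset1 ;   # the cube this observation belongs to
    ex:refYear "2013" ;        # a dimension value (hypothetical property)
    ex:population 12345 .      # a measure value (hypothetical property)
"""
print(turtle)
```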

Repository-Developed Metadata Schemas

Some repositories have decided that current standards do not fit their metadata needs, and so have created their own requirements.

RO-Crate

RO-Crate is a community effort to establish a lightweight approach to packaging research data with their metadata. It is based on schema.org annotations in JSON-LD, and aims to make best-practice in formal metadata description accessible and practical for use in a wider variety of situations, from an individual researcher working with a folder of data, to large data-intensive computational research environments.

schema.org

Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond. Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata and JSON-LD. These vocabularies cover entities, relationships between entities and actions, and can easily be extended through a well-documented extension model. Over 10 million sites use Schema.org to mark up their web pages and email messages. Many applications from Google, Microsoft, Pinterest, Yandex and others already use these vocabularies to power rich, extensible experiences.
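As a sketch of the JSON-LD encoding, a hypothetical dataset description (all values invented) suitable for embedding in a page inside a `<script type="application/ld+json">` element:

```python
import json

# A hypothetical schema.org Dataset description in JSON-LD.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example Dataset",
    "description": "A hypothetical dataset used to illustrate the vocabulary.",
    "creator": {"@type": "Person", "name": "Example Author"},
}
json_ld = json.dumps(dataset, indent=2)
print(json_ld)
```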

Extensions

AGLS Metadata Profile

An application of Dublin Core designed to improve visibility and availability of online resources, originally adapted from the Australian Government Locator Service metadata standard for use in government agencies.

Asset Description Metadata Schema (ADMS)

Used to describe semantic assets, defined as highly reusable metadata (for example: XML schemata, generic data models) and reference data (for example: code lists, taxonomies, dictionaries, vocabularies) that are used for eGovernment system development.

Dryad Metadata Application Profile

An application profile based on the Dublin Core Metadata Initiative Abstract Model, used to describe multi-disciplinary data underlying peer-reviewed scientific and medical literature.

GSIM (Generic Statistical Information Model)

A reference framework that provides a common terminology across and between statistical organisations; aligns with DDI and SDMX.

OpenAIRE Guidelines for institutional and thematic repositories, data archives and CRIS systems

The OpenAIRE Guidelines are a suite of application profiles designed to allow research institutions to make their scholarly outputs visible through the OpenAIRE infrastructure. The profiles are based on established standards and designed to be used in conjunction with the OAI-PMH metadata harvesting protocol to foster FAIR principles.

While the focus of each profile is different, they allow for interlinking and the contextualization of research artefacts.
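As a sketch of the harvesting side, an OAI-PMH request for Dublin Core records is an HTTP GET with `verb` and `metadataPrefix` parameters (the repository base URL here is hypothetical):

```python
from urllib.parse import urlencode

# Build an OAI-PMH ListRecords request for Dublin Core metadata,
# as an aggregator such as OpenAIRE would issue when harvesting.
params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
url = "https://repository.example.org/oai?" + urlencode(params)
print(url)
```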

Tabular Data Package

A profile of the Data Package specification, intended for exchanging tabular data in CSV (comma-separated values) format.

Tools

Converis

Current research information system implementing the CERIF standard. Originally developed by Avedas but now a product of Thomson Reuters.

CKAN

CKAN is a powerful data management system, implementing the DCAT standard, that makes data accessible by providing tools to streamline publishing, sharing, finding and using data.

CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available. Portals that use CKAN include http://data.gov.uk and http://open-data.europa.eu. The United States http://data.gov uses a version of CKAN wrapped up as the Open Government Platform.
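A sketch of how applications query a CKAN portal: dataset search is exposed through the CKAN Action API as `package_search` (the base URL below points at the public CKAN demo instance; no request is actually sent here):

```python
from urllib.parse import urlencode

# Construct a CKAN Action API dataset-search request.
base = "https://demo.ckan.org/api/3/action/package_search"
query = urlencode({"q": "climate", "rows": 5})  # search term and page size
url = f"{base}?{query}"
print(url)
```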

Data Package libraries

A collection of libraries for working with Data Packages in various programming languages, and scripts for importing them into databases.

Data Package Validator

The Data Package Validator takes the URL of a Data Package and checks whether it conforms to the Data Package specification.

Data Package Viewer

The Data Package Viewer takes the URL of a Data Package and provides a human-friendly view of it.

Data Packagist

The Data Packagist is a Web-based tool for writing a Data Package descriptor file (datapackage.json).

DataCite Metadata Store API

RESTful API for registering datasets with the DataCite organization. The interface uses the DataCite Metadata Schema.
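A hedged sketch of the two-step registration flow against the Metadata Store: first the DataCite XML metadata is POSTed, then the DOI-to-URL mapping. The requests are only constructed here, not sent; authentication is omitted, the XML body is a placeholder, and the DOI uses the test prefix:

```python
# Sketch of DataCite MDS dataset registration (requests built, not sent).
metadata_request = {
    "method": "POST",
    "url": "https://mds.datacite.org/metadata",   # metadata endpoint
    "content_type": "application/xml",
    "body": "<resource>...</resource>",           # DataCite Metadata Schema XML
}
doi_request = {
    "method": "POST",
    "url": "https://mds.datacite.org/doi",        # DOI registration endpoint
    "content_type": "text/plain",
    "body": "doi=10.5072/example\nurl=https://example.org/dataset",
}
for req in (metadata_request, doi_request):
    print(req["method"], req["url"])
```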

DCMI Tools and Software

The DCMI Tools Community list of tools and software implementing Dublin Core.

DdiEditor

DdiEditor is a DDI-Lifecycle Editing Framework developed by the DDA - Danish Data Archive.

Pure

Current research information system developed by Elsevier that implements the CERIF standard.

Esri Geoportal Server

Geoportal Server is a standards-based, open source product that enables discovery and use of geospatial resources including data and services.

Linked Data Cubes Explorer

The Linked Data Cubes Explorer allows for the analysis of statistical datasets using the RDF Data Cube Vocabulary.

OpenAIRE Validator

This service validates OAI-PMH metadata records against the OpenAIRE Guidelines for publication repositories, data archives and current research information systems.

geometa

geometa is an R package that offers facilities for reading and writing geographic metadata defined by the OGC/ISO 19115, 19119 and 19110 geographic information metadata standards and encoded using the ISO 19139 (XML) standard. It also includes a facility to check the validity of ISO 19139 XML-encoded metadata. The package can be used in integrated (meta)data management flows to generate business metadata compliant with ISO/OGC standards. Metadata generated with geometa can then be published to standard web metadata catalogues by means of related R packages such as ows4R (an R interface to OGC Web Services) or geonapi (an R interface to the GeoNetwork API).

SOS (Sensor Observation Service)

This tool uses the Observations and Measurements standard to define a Web service interface which allows querying observations, sensor metadata, as well as representations of observed features.
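A sketch of what such a query looks like in the key-value pair (KVP) binding: a GetObservation request is an HTTP GET carrying `service`, `version` and `request` parameters (the endpoint and offering identifier below are hypothetical):

```python
from urllib.parse import urlencode

# Construct a KVP GetObservation request to a hypothetical SOS endpoint.
params = {
    "service": "SOS",
    "version": "2.0.0",
    "request": "GetObservation",
    "offering": "http://example.org/offering/temperature",  # hypothetical
}
url = "https://example.org/sos?" + urlencode(params)
print(url)
```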

Symplectic Elements

Current research information system implementing the CERIF standard.

Use Cases

3TU.Datacentrum

A multidisciplinary data repository for a consortium of universities in the Netherlands, using a metadata structure based on the Dublin Core Metadata Initiative.

BAV (Biblioteca Apostolica Vaticana)

The Vatican Library uses FITS as the digital image format for the digitization of its manuscript collection.

Data Packaged Core Datasets

A collection of commonly used and example data sets packaged using the Data Package specification.

Edinburgh DataShare

An online digital repository of multi-disciplinary research datasets produced at the University of Edinburgh, using a modified Dublin Core metadata catalogue.

ePrints Soton

The University of Southampton's multi-disciplinary Institutional Research Repository, using a profile of Dublin Core and administrative ePrints metadata.

List of RDF Data Cube Vocabulary Implementations

W3C Government Linked Data list of implementations of the RDF Data Cube Vocabulary.

National Science Digital Library Data Repository

An online portal for education and research on learning in Science, Technology, Engineering, and Mathematics, using a profile of the Dublin Core Metadata Elements for resource and collections metadata.

Open Archives Initiative

Develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.

OpenAIRE

A European Scholarly Communication Infrastructure that aggregates bibliographic metadata from a network of publication repositories, data archives and CRIS following the OpenAIRE Guidelines. Together with additional authoritative information, the objects and their relationships described by the metadata form an information space graph which can be traversed by users and accessed via APIs by other services. The metadata primarily support discovery and monitoring services.

PROV Implementation Report

A list of the implementations and usage of the PROV specifications.