MeSH ID: n/a
Description:
Clinical and/or demographic data at the individual participant level, either as a stand-alone dataset or associated with another method of data collection (e.g. genomics). Data are usually kept in tabular databases, e.g. in Excel or in a text file format (.csv).
Best practice for sharing this type of data:
There are many considerations when sharing human clinical or demographic data. Of primary concern are issues around consent and identifiability. Ideally, consent for data sharing is obtained from participants before data collection occurs. If the study design does not involve consenting new participants, then ethics board approval needs to be sought for sharing data without consent. Careful de-identification (removal of personally identifying data such as birthdates) and/or anonymization needs to happen before the data are shared or deposited in a repository. Consult your ethics review board to determine what level of de-identification or anonymization is needed in your specific case. However, here are some basic guidelines (from https://irb.ucsf.edu/definitions#18) for data from the USA, although these are likely applicable in other jurisdictions. Under the HIPAA Privacy Rule, data are considered de-identified if either:
1. an experienced expert determines that the risk that certain information could be used to identify an individual is “very small” and documents and justifies the determination, or
2. the data do not include any of the 18 identifiers (of the individual or his/her relatives, household members, or employers) which could be used alone or in combination with other information to identify the subject. Note that even if these identifiers are removed, the Privacy Rule states that information will be considered identifiable if the covered entity knows that the identity of the person may still be determined.
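The 18 identifiers mentioned above are:
1. Names
2. All geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code), except for the first three digits of a ZIP code in sufficiently populous areas
3. All elements of dates (except year) directly related to an individual, such as birth, admission, discharge, and death dates, and all ages over 89
4. Telephone numbers
5. Fax numbers
6. Email addresses
7. Social Security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers and serial numbers
14. Web URLs
15. IP addresses
16. Biometric identifiers, including finger and voice prints
17. Full-face photographs and any comparable images
18. Any other unique identifying number, characteristic, or code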
This paper discusses special considerations for clinical data that accompany genomic data: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005873
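As a rough illustration of the identifier-removal approach described above (option 2), the following Python sketch drops direct identifiers from a single record, reduces dates to the year, and groups ages over 89. The column names and the `deidentify_row` helper are hypothetical, and the sketch covers only a few of the identifier types. It is not a substitute for review by your ethics board.

```python
# Hypothetical column names, chosen for illustration only.
DROP_COLUMNS = {"name", "email", "phone", "medical_record_number"}
DATE_COLUMNS = {"birth_date", "admission_date"}
AGE_COLUMN = "age_years"


def deidentify_row(row):
    """Return a copy of one record with direct identifiers removed,
    dates reduced to the year, and ages over 89 grouped as '90+'."""
    cleaned = {}
    for column, value in row.items():
        if column in DROP_COLUMNS:
            continue  # direct identifier: leave it out entirely
        if column in DATE_COLUMNS:
            # Assumes ISO 8601 text (YYYY-MM-DD); keep only the year.
            cleaned[column.replace("_date", "_year")] = value[:4]
        elif column == AGE_COLUMN and value and int(value) > 89:
            cleaned[column] = "90+"
        else:
            cleaned[column] = value
    return cleaned


record = {
    "name": "Jane Doe",
    "email": "jane@example.org",
    "admission_date": "2019-03-14",
    "age_years": "92",
    "sex": "F",
}
print(deidentify_row(record))
```

Note that this sketch does not handle every case: for example, a birth year can still reveal that a participant is over 89, and free-text fields may contain names or other identifiers.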
Finally, a thorough data dictionary is needed to accompany the dataset to explain any data coding, categories, or calculations. Column headings should describe the content of each column and contain only numbers, letters, and underscores (no spaces or special characters). Lowercase letters are preferred. Row names should be consistent with those used in the article and in other related datasets.
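The column-heading and data-dictionary recommendations above can be checked automatically before deposition. The Python sketch below is one possible approach. The function names, example headings, and dictionary entries are all hypothetical.

```python
import re

# Letters, digits, and underscores only, as recommended above.
VALID_HEADER = re.compile(r"[A-Za-z0-9_]+")


def normalize_header(header):
    """Lowercase a heading and replace runs of other characters with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", header.strip())
    return cleaned.strip("_").lower()


def invalid_headers(headers):
    """Return the headings that contain spaces or special characters."""
    return [h for h in headers if not VALID_HEADER.fullmatch(h)]


def undocumented_columns(headers, data_dictionary):
    """Return the headings that have no entry in the data dictionary."""
    return [h for h in headers if h not in data_dictionary]


raw_headers = ["Participant ID", "Age (years)", "smoking_status"]
headers = [normalize_header(h) for h in raw_headers]

# Hypothetical data dictionary: column name -> description and coding.
data_dictionary = {
    "participant_id": "Study-specific pseudonymous identifier",
    "smoking_status": "0 = never, 1 = former, 2 = current",
}

print(headers)
print(invalid_headers(raw_headers))
print(undocumented_columns(headers, data_dictionary))
```

Here `age_years` would be reported as undocumented, which is a prompt to add its units and any coding to the data dictionary.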
Most suitable repositories:
Many repositories are suitable for tabular data, such as the International Clinical Trials Registry Platform, ClinicalTrials.gov, the Inter-university Consortium for Political and Social Research, and the Qualitative Data Repository. Your institution, journal, or funder may recommend specific repositories; otherwise, data can be added to any repository able to host generic file types, detailed here.
Best practice for indicating re-use of existing data:
For public datasets, please provide a DOI or other stable identifier for the dataset itself *and* include a citation for the dataset in the reference list. Be sure to indicate exactly which data have been re-used, particularly when multiple versions of the dataset exist. In many cases, this is best achieved by sharing the code used to extract the part of the data that you analyzed. In some cases, it may be best to share the exact dataset(s) you analyzed as well.
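As an illustration of sharing extraction code, the Python sketch below filters a tabular file down to the rows and columns that were analyzed. The `extract_subset` function, the column names, and the inline sample data are hypothetical. In practice the script would read the downloaded public file and record the dataset version or DOI it was run against.

```python
import csv
import io


def extract_subset(source, keep_columns, keep_row):
    """Read CSV text from `source` and return CSV text containing only
    the chosen columns and the rows for which keep_row(row) is true."""
    reader = csv.DictReader(source)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep_columns, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if keep_row(row):
            writer.writerow({c: row[c] for c in keep_columns})
    return out.getvalue()


# Stand-in for the public dataset file (hypothetical contents).
raw = (
    "participant_id,age_years,site,visit\n"
    "P001,34,a,1\n"
    "P002,61,b,1\n"
    "P003,47,a,2\n"
)

# Keep only participants from site "a", and only two columns.
subset = extract_subset(
    io.StringIO(raw),
    keep_columns=["participant_id", "age_years"],
    keep_row=lambda row: row["site"] == "a",
)
print(subset)
```

Sharing a script like this alongside the dataset identifier lets readers regenerate the analyzed subset from the public source.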
For access-controlled data, authors should provide a link to instructions for obtaining access (for example, the information page for ADNI (Alzheimer's Disease Neuroimaging Initiative): http://adni.loni.usc.edu/data-samples/access-data/).
When re-using a private dataset from a previous study, please contact the data owners to discuss how the data can be made public.