Tabular Data - Subject Data Table

MeSH ID: n/a

Description:
Clinical and/or demographic data at the individual participant level, either as a stand-alone dataset or associated with another method of data collection (e.g. genomics). Data are usually kept in tabular databases, e.g. in Excel or in a text file format (.csv).

Best practice for sharing this type of data:
There are many considerations when sharing human clinical or demographic data. Of primary concern are consent and identifiability. Ideally, consent for data sharing is obtained from participants before data collection begins. If the study design does not involve consenting new participants, ethics board approval must be sought to share data without consent. Careful de-identification (removal of personally identifying data such as birthdates) and/or anonymization must happen before the data are shared or deposited in a repository. Consult your ethics review board to determine what level of de-identification or anonymization is needed in your specific case. As a basic guideline (from https://irb.ucsf.edu/definitions#18), health data from the USA are considered de-identified under the HIPAA Privacy Rule if either of the following conditions is met; similar criteria are likely applicable in other jurisdictions:

1. an experienced expert determines that the risk that certain information could be used to identify an individual is “very small” and documents and justifies the determination, or

2. the data do not include any of the 18 identifiers (of the individual or his/her relatives, household members, or employers) which could be used alone or in combination with other information to identify the subject. Note that even if these identifiers are removed, the Privacy Rule states that information will be considered identifiable if the covered entity knows that the identity of the person may still be determined.

The 18 identifiers mentioned above are:

  1. Names;
  2. All geographical subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code, if according to the current publicly available data from the Bureau of the Census: (1) The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and (2) The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000.
  3. All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
  4. Phone numbers;
  5. Fax numbers;
  6. Electronic mail addresses;
  7. Social Security numbers;
  8. Medical record numbers;
  9. Health plan beneficiary numbers;
  10. Account numbers;
  11. Certificate/license numbers;
  12. Vehicle identifiers and serial numbers, including license plate numbers;
  13. Device identifiers and serial numbers;
  14. Web Universal Resource Locators (URLs);
  15. Internet Protocol (IP) address numbers;
  16. Biometric identifiers, including finger and voice prints;
  17. Full face photographic images and any comparable images; and
  18. Any other unique identifying number, characteristic, or code (note this does not mean the unique code assigned by the investigator to code the data).
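
Several of the identifiers above can be handled mechanically in a tabular file. The following is a minimal sketch in Python, not a complete or approved de-identification procedure: the column names (name, phone, email, medical_record_number, birth_date, admission_date, age, zip_code) are hypothetical, dates are assumed to be in YYYY-MM-DD format, and the set of three-digit zip prefixes covering 20,000 or fewer people is an assumption that would need to be built from current Census data. It drops direct identifiers, keeps only the year of dates, groups ages over 89 into a single category (and blanks the birth year that would reveal such an age), and shortens zip codes to three digits.

```python
# Hypothetical column names; adapt them to your own data dictionary.
DIRECT_IDENTIFIERS = {"name", "phone", "email", "medical_record_number"}
DATE_COLUMNS = {"birth_date", "admission_date"}  # assumed format: YYYY-MM-DD


def deidentify_row(row, restricted_zip3=frozenset()):
    """Return a copy of one record with some identifiers removed or coarsened.

    restricted_zip3 is an assumed set of three-digit zip prefixes whose
    combined population is 20,000 or fewer (to be taken from Census data).
    """
    age_text = row.get("age", "")
    over_89 = age_text.isdigit() and int(age_text) > 89
    out = {}
    for column, value in row.items():
        if column in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if column in DATE_COLUMNS:
            year = value[:4]  # keep only the year of the date
            if column == "birth_date" and over_89:
                year = ""  # a birth year would reveal an age over 89
            out[column.replace("_date", "_year")] = year
        elif column == "age":
            # aggregate all ages over 89 into one category
            out[column] = "90+" if over_89 else value
        elif column == "zip_code":
            zip3 = value[:3]
            out["zip3"] = "000" if zip3 in restricted_zip3 else zip3
        else:
            out[column] = value
    return out


# Example with a made-up record.
record = {
    "subject_id": "S001",
    "name": "Jane Doe",
    "birth_date": "1930-04-12",
    "age": "93",
    "zip_code": "03601",
    "diagnosis": "J45",
}
print(deidentify_row(record, restricted_zip3={"036"}))
```

A script like this only covers identifiers that sit in known columns. Free-text fields, rare combinations of values, and the remaining identifier types still need manual review.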

This paper discusses some special considerations for clinical data that accompany genomic data: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005873

Finally, a very thorough data dictionary is needed to accompany the dataset to explain any data coding, categories, or other calculations. Column headings should describe the content of each column and contain only numbers, letters, and underscores, with no spaces or special characters. Lowercase letters are preferred. Row names should be consistent with those used in the article and in other related datasets. You can check the formatting of tables with goodtables.io.
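
The column heading rules above are easy to check automatically. Here is a small sketch in Python, assuming plain ASCII headings; the function names heading_problems and suggest_heading are made up for this example and are not part of goodtables or any other tool.

```python
import re

# Letters, numbers, and underscores only (ASCII assumed).
ALLOWED = re.compile(r"[A-Za-z0-9_]+")


def heading_problems(heading):
    """List the ways a column heading departs from the guidance above."""
    problems = []
    if not ALLOWED.fullmatch(heading):
        problems.append(
            "contains characters other than letters, numbers, and underscores"
        )
    if heading != heading.lower():
        problems.append("contains uppercase letters (lowercase is preferred)")
    return problems


def suggest_heading(heading):
    """Suggest a cleaned heading: lowercase, with each run of other
    characters replaced by a single underscore."""
    cleaned = re.sub(r"[^a-z0-9]+", "_", heading.strip().lower())
    return cleaned.strip("_")


for heading in ["subject_id", "Age (years)", "Date of Birth"]:
    print(heading, heading_problems(heading), suggest_heading(heading))
```

If you rename a column this way, use the new name in the data dictionary as well, so that the two stay in step.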

Most suitable repositories:
Many repositories are suitable for tabular data, such as the International Clinical Trials Registry Platform, ClinicalTrials.gov, the Inter-university Consortium for Political and Social Research, and the Qualitative Data Repository. Your institution, journal, or funder may recommend specific repositories; otherwise, data can be added to any repository able to host generic file types, detailed here.

Best practice for indicating re-use of existing data:
For public datasets, please provide a DOI or other stable identifier for the dataset itself *and* include a citation for the dataset in the reference list. Be sure to indicate exactly which data have been re-used, particularly when multiple versions of the dataset exist. In many cases, this is best achieved by sharing the code used to extract the part of the data that you analyzed. In some cases it may be best to share the exact dataset(s) you analyzed as well.
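
As an illustration of what shareable extraction code might look like, here is a minimal sketch in Python using only the standard csv module. The identifier and version strings are placeholders, and the column names and the age filter are invented for the example; a real script would record the actual DOI and version of the dataset that was re-used.

```python
import csv
import io

# Placeholders: replace with the real identifier and version you re-used.
DATASET_DOI = "10.xxxx/placeholder"
DATASET_VERSION = "v2"


def extract_subset(source, keep_columns, keep_row):
    """Yield the rows and columns that were analyzed from a CSV file object."""
    for row in csv.DictReader(source):
        if keep_row(row):
            yield {column: row[column] for column in keep_columns}


def write_subset(rows, keep_columns, destination):
    """Write the extracted rows as CSV with a header line."""
    writer = csv.DictWriter(
        destination, fieldnames=keep_columns, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Example with a made-up table: keep participants aged 65 or older.
source = io.StringIO(
    "subject_id,age,site,score\n"
    "S001,34,A,12\n"
    "S002,71,B,15\n"
    "S003,66,A,9\n"
)
columns = ["subject_id", "age", "score"]
subset = extract_subset(source, columns, lambda row: int(row["age"]) >= 65)
result = io.StringIO()
write_subset(subset, columns, result)
print(result.getvalue())
```

Keeping the dataset identifier, the version, and the selection rule together in one script lets a reader reproduce the same subset from the public source.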

For access-controlled data, authors should provide a link to instructions for obtaining access (e.g. the information page for ADNI, the Alzheimer's Disease Neuroimaging Initiative: http://adni.loni.usc.edu/data-samples/access-data/).

When re-using a private dataset from a previous study, please contact the data owners to discuss how the data can be made public.

Most suitable repositories:
Not applicable

data_type/tabular_data/research_subjects.txt · Last modified: 2021/05/27 00:17 by samantha