Genetic Data - Feature Table

MeSH ID: n/a

Number of reads (or another abundance measure) of each feature from different types of assays: microarray, amplicon sequencing, proteomics, metagenomic shotgun sequencing, metabolomics, etc. Generally formatted as a data table with features as rows and samples as columns (or the reverse), where each cell gives the number of reads of that feature in that sample. Usually shared in Excel or .csv format, although .biom files are also used.
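Because the same table may be stored with features as rows or as columns, it helps to read either orientation into one features-by-samples structure. The following is a minimal sketch using only the Python standard library. It assumes a plain .csv file holding whole-number read counts, with IDs in the first row and first column. The function name `read_feature_table` and its `features_as_rows` flag are hypothetical and not part of any existing tool. Excel and .biom files would need their own readers.

```python
import csv
import io


def read_feature_table(text, features_as_rows=True):
    """Parse CSV text into a nested dict {feature_id: {sample_id: count}}.

    Assumes the first row holds the column IDs and the first column holds
    the row IDs. For a file stored with samples as rows, pass
    features_as_rows=False; the result is keyed by feature either way.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0][1:]  # column IDs, skipping the corner cell
    table = {}
    for row in rows[1:]:
        if not row:
            continue  # ignore blank lines
        row_id, values = row[0], row[1:]
        if len(values) != len(header):
            raise ValueError(
                f"row {row_id!r} has {len(values)} values, expected {len(header)}"
            )
        for col_id, value in zip(header, values):
            count = int(value)  # raises ValueError if the cell is not a whole number
            if count < 0:
                raise ValueError(f"negative count at {row_id!r}/{col_id!r}")
            feature, sample = (row_id, col_id) if features_as_rows else (col_id, row_id)
            table.setdefault(feature, {})[sample] = count
    return table


# The same data stored in both orientations
by_feature = "feature_id,sample_a,sample_b\nasv_1,10,0\nasv_2,3,7\n"
by_sample = "sample_id,asv_1,asv_2\nsample_a,10,3\nsample_b,0,7\n"
assert read_feature_table(by_feature) == read_feature_table(by_sample, features_as_rows=False)
print(read_feature_table(by_feature)["asv_2"]["sample_b"])  # 7
```

Normalising the orientation at read time means that downstream code does not need to know how the file was stored. The row-length and sign checks catch common export problems before the table is shared.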

Best practice for sharing this type of data:
Features should be shared at the lowest level collected (e.g. sequence variants or operational taxonomic units (OTUs)), so that grouping into higher-level categories (e.g. genus or phylum) can be repeated if desired. Metadata should state which variant-calling or denoising pipeline was used (e.g. DADA2) and which reference database was used to assign labels to the variants (e.g. Greengenes).

Ideally, the read count data should be accompanied by any other experimental, environmental, clinical, or demographic data needed to recreate the analyses in the manuscript. If these are provided as a separate data table, it is vital that the sample ID codes in the count table can be linked to the ID codes used for the sample-level data. If the study involves human data, ethical considerations around sharing need to be evaluated (see Subject Data Table (Tabular data)).

Column headings should describe the content of each column and contain only letters, numbers, and underscores (no spaces or special characters). Lowercase letters are preferred. Row names should be consistent with those used in the article and in other related datasets.
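Sharing the lowest level matters because collapsing is a one-way operation. Genus-level totals can be recomputed from sequence variants, but sequence variants cannot be recovered from genus-level totals. The following is a minimal sketch of that collapse. It assumes counts keyed by feature and a taxonomy lookup that maps each feature to its label at each rank, as might come from a reference database such as Greengenes. The name `collapse_to_rank` and this data layout are hypothetical and do not describe the behaviour of any particular pipeline.

```python
from collections import defaultdict


def collapse_to_rank(counts, taxonomy, rank="genus", unassigned="unassigned"):
    """Sum feature-level counts into higher-level taxa.

    counts:   {feature_id: {sample_id: count}} at the lowest level collected
              (e.g. sequence variants).
    taxonomy: {feature_id: {rank_name: label}} (hypothetical layout).
    Features with no label at `rank` are pooled under `unassigned`.
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, per_sample in counts.items():
        label = taxonomy.get(feature_id, {}).get(rank) or unassigned
        for sample_id, count in per_sample.items():
            collapsed[label][sample_id] += count
    # convert the nested defaultdicts to plain dicts for a clean return value
    return {label: dict(samples) for label, samples in collapsed.items()}


counts = {
    "asv_1": {"s1": 10, "s2": 0},
    "asv_2": {"s1": 3, "s2": 7},
    "asv_3": {"s1": 1, "s2": 2},
}
taxonomy = {
    "asv_1": {"genus": "Bacteroides"},
    "asv_2": {"genus": "Bacteroides"},
    "asv_3": {"phylum": "Firmicutes"},
}
print(collapse_to_rank(counts, taxonomy))
# {'Bacteroides': {'s1': 13, 's2': 7}, 'unassigned': {'s1': 1, 's2': 2}}
```

In this example, `asv_1` and `asv_2` merge into a single genus total. Once the data have been collapsed, there is no way to split that total back into the two variants.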
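Both the column-heading convention and the linkage between sample IDs can be checked automatically before deposit. The following is a minimal sketch. The function names are hypothetical. The heading rule reflects the guidance above: letters, numbers, and underscores are allowed, and uppercase letters trigger a warning only, because lowercase is preferred rather than required.

```python
import re

# Letters, digits and underscores only; no spaces or special characters.
VALID_HEADER = re.compile(r"[A-Za-z0-9_]+")


def check_headers(headers):
    """Return (errors, warnings) for a list of column headings."""
    errors, warnings = [], []
    for h in headers:
        if not VALID_HEADER.fullmatch(h):
            errors.append(f"{h!r}: use only letters, numbers and underscores")
        elif h != h.lower():
            warnings.append(f"{h!r}: lowercase is preferred")
    return errors, warnings


def check_sample_links(count_sample_ids, metadata_sample_ids):
    """Report sample IDs that appear in only one of the two tables."""
    counts, meta = set(count_sample_ids), set(metadata_sample_ids)
    return {
        "missing_from_metadata": sorted(counts - meta),
        "missing_from_counts": sorted(meta - counts),
    }


errors, warnings = check_headers(["sample_id", "Age_years", "bmi kg/m2"])
print(errors)    # ["'bmi kg/m2': use only letters, numbers and underscores"]
print(warnings)  # ["'Age_years': lowercase is preferred"]
print(check_sample_links(["s1", "s2", "s3"], ["s1", "s2", "s4"]))
# {'missing_from_metadata': ['s3'], 'missing_from_counts': ['s4']}
```

Any ID listed by `check_sample_links` is a sample whose counts cannot be matched to its experimental or clinical data, or the reverse. These mismatches are worth resolving before the tables are shared.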

Most suitable repositories:
Tables containing feature data can be deposited in the European Variation Archive, NCBI Gene, NCBI Genome, or any repository able to host generic file types (e.g. Dryad, Zenodo, and FigShare).

Best practice for indicating re-use of existing data:
For public datasets, please provide a DOI or other stable identifier for the dataset itself *and* include a citation for the dataset in the reference list. Be sure to indicate exactly which data have been re-used, particularly when multiple versions of the dataset exist. In many cases, this is best achieved by sharing the code used to extract the part of the data that you analyzed. In some cases, it may be best to share the exact dataset(s) you analyzed as well.
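A short extraction script can serve as the record of exactly which part of a public dataset was analyzed. The following is a minimal sketch using only the Python standard library. It assumes the source is a features-as-rows .csv table. The DOI, the version string, and the function name `extract_subset` are hypothetical placeholders, to be replaced with the identifier and version of the dataset actually re-used.

```python
import csv
import io

# Hypothetical provenance placeholders: replace with the DOI and exact
# version of the public dataset that was actually re-used.
SOURCE_DOI = "10.xxxx/example-dataset"
SOURCE_VERSION = "v2"


def extract_subset(table_csv, sample_ids):
    """Keep only the requested sample columns of a features-as-rows CSV table.

    Returns the subset as CSV text plus a provenance record (which could be
    saved alongside it, e.g. as JSON) stating exactly what was extracted.
    """
    reader = csv.reader(io.StringIO(table_csv))
    header = next(reader)
    missing = [s for s in sample_ids if s not in header[1:]]
    if missing:
        raise KeyError(f"samples not found in source table: {missing}")
    # column 0 holds the feature IDs; search from index 1 for sample columns
    keep = [0] + [header.index(s, 1) for s in sample_ids]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    for row in reader:
        if row:  # skip blank lines
            writer.writerow([row[i] for i in keep])
    provenance = {
        "source_doi": SOURCE_DOI,
        "source_version": SOURCE_VERSION,
        "samples": list(sample_ids),
    }
    return out.getvalue(), provenance


source = "feature_id,s1,s2,s3\nasv_1,10,0,4\nasv_2,3,7,1\n"
subset, provenance = extract_subset(source, ["s3", "s1"])
print(subset)
# feature_id,s3,s1
# asv_1,4,10
# asv_2,1,3
```

Keeping the provenance record next to the extracted file ties the analyzed subset to a specific source version. Missing sample IDs fail loudly rather than being silently dropped.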

For access-controlled data, authors should provide a link to instructions for obtaining access (e.g. the information page for ADNI, the Alzheimer's Disease Neuroimaging Initiative).

When re-using a private dataset from a previous study, please contact the data owners to discuss how the data can be made public.

data_type/genetic_data/feature_table.txt · Last modified: 2022/07/08 05:31 by souad