Genetic Data - Read Counts

MeSH ID: n/a

Number of reads of a feature, obtained from different types of assays: microarray, amplicon sequencing, proteomics, metagenomic shotgun sequencing, metabolomics, etc. The data are generally formatted as a table with features as rows and samples as columns (or the reverse), where each cell gives the number of reads of that feature in that sample. Tables are usually in Excel or .csv format, although .biom files are also used.
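
As a small illustration of the layout described above, the following Python sketch parses a count table with features as rows and samples as columns, then flips it to the reverse orientation. The feature and sample names (asv_1, sample_a, and so on) and the function names read_counts and transpose are hypothetical, made up for this example; real tables will differ.

```python
import csv
import io

# Hypothetical example table: features as rows, samples as columns.
EXAMPLE_CSV = """feature_id,sample_a,sample_b
asv_1,10,0
asv_2,3,7
"""


def read_counts(text):
    """Parse a features-by-samples CSV into {feature: {sample: count}}."""
    reader = csv.DictReader(io.StringIO(text))
    id_column = reader.fieldnames[0]  # assumes the first column holds feature IDs
    counts = {}
    for row in reader:
        feature = row[id_column]
        counts[feature] = {
            sample: int(value)
            for sample, value in row.items()
            if sample != id_column
        }
    return counts


def transpose(counts):
    """Flip to the reverse orientation: {sample: {feature: count}}."""
    flipped = {}
    for feature, per_sample in counts.items():
        for sample, count in per_sample.items():
            flipped.setdefault(sample, {})[feature] = count
    return flipped


counts = read_counts(EXAMPLE_CSV)
by_sample = transpose(counts)
```

Either orientation carries the same information, so it is enough to state which one a shared file uses.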

Best practice for sharing this type of data:
Features should be shared at the lowest level collected (e.g. sequence variants, OTUs), so that combining them into higher-level categories (e.g. genus or phylum) can be repeated if desired. Metadata should state which variant calling protocol was used (e.g. DADA2) and which database was used to find labels for the variants (e.g. Greengenes).

Ideally, the read count data would be accompanied by any other experimental, environmental, clinical, or demographic data needed to recreate the analyses in the manuscript. If these are included as a separate data table, it is vital that the ID codes for the samples can be linked to the ID codes for the sample-level data. If the study involves human data, then ethical considerations around sharing need to be evaluated; see Subject Data Table (Tabular data).

Column headings should describe the content of each column and contain only numbers, letters, and underscores, with no spaces or special characters. Lowercase letters are preferred. Row names should be consistent with those used in the article and in other related datasets.
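
The heading and ID-linking recommendations above can be checked with a few lines of code. The following Python sketch is one possible approach, not a required tool: clean_header and unlinked_sample_ids are hypothetical helper names, and the exact clean-up rule (replace anything that is not a letter or number with an underscore) is an assumption.

```python
import re


def clean_header(name):
    """Lowercase a column heading and keep only letters, numbers, underscores."""
    lowered = name.strip().lower()
    # Replace every run of other characters (spaces, symbols) with one underscore.
    cleaned = re.sub(r"[^a-z0-9]+", "_", lowered)
    return cleaned.strip("_")


def unlinked_sample_ids(count_ids, metadata_ids):
    """Return the IDs found in only one of the two tables.

    The first list holds IDs present in the count table but missing from
    the sample-level data; the second list holds the reverse.
    """
    count_set, meta_set = set(count_ids), set(metadata_ids)
    return sorted(count_set - meta_set), sorted(meta_set - count_set)


# Hypothetical headings and sample IDs, for illustration only.
headers = [clean_header(h) for h in ["Sample ID (#1)", "Read_Count", " Body Site "]]
only_in_counts, only_in_metadata = unlinked_sample_ids(
    ["s1", "s2", "s3"], ["s2", "s3", "s4"]
)
```

If both returned lists are empty, every sample in the count table can be linked to a row of sample-level data and vice versa.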

Most suitable repositories:
Tables containing read count data may be added to Database of Genotypes and Phenotypes, NCBI Gene, Reference Sequence Database, Genetic Testing Registry, European Mouse Mutant Archive, European Nucleotide Archive, European Variation Archive, Fungal and Oomycete Genomics Resource, GenBank, NCBI Genome, Genome Sequence Archive, Genomic Expression Archive, Japanese Genotype-Phenotype Archive, MGnify, NCBI Trace Archives, and Sequence Read Archive.

Best practice for indicating re-use of existing data:
For public datasets, please provide a DOI or other stable identifier for the dataset itself *and* include a citation for the dataset in the reference list. Be sure to indicate exactly which data have been re-used, particularly when multiple versions of the dataset exist. In many cases, this is best achieved by sharing the code used to extract the part of the data that you analyzed. In some cases it may be best to share the exact dataset(s) you analyzed as well.
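
As an illustration of sharing extraction code, the Python sketch below keeps only the sample columns that were analyzed from a larger public table. The table contents and the names extract_subset and checksum are hypothetical. Recording a SHA-256 checksum of the source file is an assumption of this example, offered as one possible way to show which version of a dataset was used; it does not replace a DOI or citation.

```python
import csv
import hashlib
import io


def extract_subset(csv_text, keep_samples):
    """Keep the first (feature ID) column plus the listed sample columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    keep_idx = [0] + [
        i for i, name in enumerate(header) if i > 0 and name in keep_samples
    ]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[i] for i in keep_idx])
    for row in reader:
        writer.writerow([row[i] for i in keep_idx])
    return out.getvalue()


def checksum(csv_text):
    """SHA-256 hex digest of the source table, to record the version used."""
    return hashlib.sha256(csv_text.encode("utf-8")).hexdigest()


# Hypothetical public table, for illustration only.
PUBLIC_TABLE = "feature_id,sample_a,sample_b,sample_c\nasv_1,10,0,4\nasv_2,3,7,1\n"
subset = extract_subset(PUBLIC_TABLE, {"sample_a", "sample_c"})
source_sha256 = checksum(PUBLIC_TABLE)
```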

For access-controlled data, authors should provide a link to instructions for obtaining access (e.g. the information page for ADNI, the Alzheimer's Disease Neuroimaging Initiative).

When re-using a private dataset from a previous study, please contact the data owners to discuss how the data can be made public.

Most suitable repositories:
Not applicable

data_type/genetic_data/read_counts.txt · Last modified: 2021/05/28 21:33 by samantha