Genetic Data - Sequence Assembly

MeSH ID: n/a

Sequence assembly refers to aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence. (Source: Wikipedia)

Best practice for sharing this type of data:
A compressed FASTA file (.fasta) containing contigs, scaffolds, or pseudomolecules. An assembly is usually accompanied by several other files: annotation file(s) (.gff) and protein-coding regions (.gff plus .fasta files of the nucleotide sequences and their corresponding peptides).
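Before depositing an assembly, it can help to confirm that the compressed FASTA file is readable and contains the expected number of sequences. The following is a minimal sketch using only the Python standard library; the function names `read_fasta` and `summarize_assembly` are illustrative and not part of any repository's tooling, and the check assumes a gzip-compressed file with a `.gz` suffix.

```python
import gzip


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file."""
    # Assumption: compressed files carry a .gz suffix; anything else is read as plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                if header is None:
                    raise ValueError("sequence data found before the first '>' header")
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_assembly(path):
    """Return the number of sequences and the total length in bases."""
    lengths = [len(seq) for _, seq in read_fasta(path)]
    return {"n_sequences": len(lengths), "total_length": sum(lengths)}
```

Calling `summarize_assembly("assembly.fasta.gz")` (a hypothetical file name) returns a small dictionary that can be compared against the figures reported in the accompanying manuscript.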

Most suitable repositories:
Data may be deposited in the DNA Data Bank of Japan, Ensembl, NCBI Gene, Reference Sequence Database, NCBI Genome, The Consensus CDS, European Nucleotide Archive, NCBI Assembly, GenBank, and NCBI BioProject.

Best practice for indicating re-use of existing data:
For public datasets, please provide a DOI or other stable identifier for the dataset itself *and* include a citation for the dataset in the reference list. Be sure to indicate exactly which data has been re-used, particularly when multiple versions of the dataset exist. In many cases, this is best achieved by sharing the code used to extract the part of the data that you analyzed. In some cases, it may be best to share the exact dataset(s) you analyzed as well.
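As an illustration of sharing extraction code, the sketch below copies a chosen set of sequences out of a gzip-compressed FASTA file and fails loudly if any requested ID is absent from that version of the dataset. The function name `extract_records` and the convention that the sequence ID is the first whitespace-delimited word of the header are assumptions made for this example, not requirements of any repository.

```python
import gzip


def extract_records(fasta_gz_path, wanted_ids, out_path):
    """Copy records whose ID is in wanted_ids to out_path; return the IDs found, in file order."""
    wanted = set(wanted_ids)
    found = []
    keep = False
    with gzip.open(fasta_gz_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Assumption: the ID is the first word after '>'.
                fields = line[1:].split()
                seq_id = fields[0] if fields else ""
                keep = seq_id in wanted
                if keep:
                    found.append(seq_id)
            if keep:
                dst.write(line)
    # Report requested IDs that this version of the dataset does not contain.
    missing = wanted - set(found)
    if missing:
        raise KeyError(f"IDs not found in dataset: {sorted(missing)}")
    return found
```

Publishing a script like this together with the list of IDs and the dataset's identifier and version lets readers reproduce exactly the subset that was analyzed.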

For access-controlled data, authors should provide a link to instructions for obtaining access (e.g., the information page for ADNI, the Alzheimer's Disease Neuroimaging Initiative).

When re-using a private dataset from a previous study, please contact the data owners to discuss how the data can be made public.

data_type/genetic_data/assembly.txt · Last modified: 2022/07/08 05:31 by souad