Difference between revisions of "Molecular pathology"


1,273 bytes added , 11:02, 3 May 2019

→‎Data formats: NCBI repositories


@@ Line 119: / Line 119: @@
 | rarely done; follows karyotyping to better characterize unusual cases; can be thought of as a karyotype and a simultaneous ISH
 |}
+==Data formats==
+Standardized human gene names are assigned by the HUGO Gene Nomenclature Committee (HGNC): https://www.genenames.org/
+DNA data repositories
+* NCBI: National Center for Biotechnology Information
+**Standard sequencing data is usually located in the Nucleotide database: https://www.ncbi.nlm.nih.gov/nuccore
+**Next-generation sequencing data is in the Sequence Read Archive (SRA): https://www.ncbi.nlm.nih.gov/sra/docs/submit/
+* EMBL: European Molecular Biology Laboratory
+* DDBJ: DNA Data Bank of Japan
+DNA sequence data formats
+* GenBank: human-readable flat file that can also be processed by computer (fixed-width layout; the first 10 columns of a line hold a field keyword such as LOCUS or ORIGIN).
+** The NCBI Reference Sequence (RefSeq) project provides sequence records and related information.
+** In a RefSeq accession number, the prefix AC_ denotes genomic data and NM_ denotes mRNA.
+* FASTA: plain-text format for sequence information
+** Header starts with > and is followed by a sequence ID.
+** Sequence lines should all wrap at the same width.
+** Lower-case letters may indicate repetitive regions.
+* FASTQ: Current standard for sequencing data
+** It is essentially FASTA with quality values for the sequence.
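+** Each base has a Phred quality score, typically from 0 to 40, encoded as a single ASCII character.
+** Upper-case letters ABCDEFGHI indicate high quality (scores 32 to 40 in the common Phred+33 encoding).
+** Punctuation characters !"#$%&'()*+,-. indicate low quality (scores 0 to 13 in the same encoding).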
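The FASTA and FASTQ conventions listed above can be illustrated with a minimal Python sketch. It assumes the Phred+33 quality encoding (ASCII code minus 33), which is consistent with the letters A to I being high quality and punctuation being low quality; older Phred+64 files would need a different offset. It also assumes each FASTQ record occupies exactly four lines. The function names are hypothetical and exist only for illustration; for real data an established parser such as Biopython's SeqIO is a safer choice.

```python
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {sequence ID: sequence} from FASTA-formatted text."""
    records: Dict[str, str] = {}
    seq_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            records[seq_id] = ""
        elif seq_id is not None:
            # Wrapped sequence lines are joined back into one string.
            records[seq_id] += line
    return records


def phred33_scores(quality: str) -> List[int]:
    """Decode a quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality]


def parse_fastq(text: str) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read ID, sequence, scores), assuming 4 lines per record."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # FASTQ headers start with '@' instead of FASTA's '>'.
        yield header[1:], seq, phred33_scores(qual)


if __name__ == "__main__":
    # Lower-case bases (possible repeat regions) are kept as-is.
    print(parse_fasta(">seq1 example\nACGTacgt\nGGCC\n"))
    for read_id, seq, scores in parse_fastq("@read1\nACGT\n+\nI5.!\n"):
        print(read_id, seq, scores)  # read1 ACGT [40, 20, 13, 0]
```

In this sketch the character I decodes to 40 and ! decodes to 0, matching the two ends of the quality scale described above.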
 ==Polymerase chain reaction-based techniques==