Question

genome assembly vs genome annotation

0

4.5 years ago

Zeinab.mokhtar • 0

Hello everyone, I need help understanding the difference between a genome assembly and a genome annotation, and which one I should use for mapping (I'm using HISAT2). Also, which genome is better to use, GRCh38 or GRCh37? Thanks in advance.

RNA-Seq alignment genome • 1.3k views

0

4.1 years ago

a.alnawfal.1992 ▴ 360

Hi Zeinab, it does not matter whether you are using HISAT2, BWA, or any other alignment program. The most important questions are: what do you want to achieve, which organism are you working with, and is your data DNA or RNA? From the references you mention, I assume it's human.

I use GRCh37/hg19 for my DNA reads (Illumina). There, annotation is performed after the variant-calling step: once I have the variants, I want to know more about the impact of each one.

If your data is RNA, you need both the reference and the annotation in GFF format. Why? To align the reads to the transcripts. In other words, you give the aligner the annotation alongside the reference genome so that it can place each read in the protein-coding region it originally came from. The GENCODE database is the best place to download the data (reference genome and GFF). Make sure you use the same version of both.

Note: it's possible to convert coordinates from GRCh38 to GRCh37.
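One quick sanity check on the "same version of both" advice is to confirm that every sequence name used in the annotation also exists in the reference. The sketch below is only an illustration: the function names and the toy FASTA/GFF text are made up, and matching names do not prove that the two files come from the same release. A mismatch such as `chr1` versus `1`, however, is a common sign that the files come from different sources.

```python
def fasta_names(fasta_text):
    """Collect sequence names: the first word after '>' on each header line."""
    return {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def gff_seqids(gff_text):
    """Collect column 1 (seqid) of every non-empty, non-comment GFF line."""
    return {
        line.split("\t")[0]
        for line in gff_text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def missing_from_reference(fasta_text, gff_text):
    """Return annotation seqids that have no matching sequence in the reference."""
    return sorted(gff_seqids(gff_text) - fasta_names(fasta_text))


# Toy data (hypothetical): the second feature uses "1" instead of "chr1".
toy_fasta = ">chr1 toy sequence\nACGT\n>chr2\nGGCC\n"
toy_gff = (
    "##gff-version 3\n"
    "chr1\texample\tgene\t1\t4\t.\t+\t.\tID=g1\n"
    "1\texample\tgene\t1\t4\t.\t+\t.\tID=g2\n"
)

missing = missing_from_reference(toy_fasta, toy_gff)
print(missing)
```

On real data you would read the two files from disk instead of using inline strings; an empty result means every annotated feature sits on a sequence the aligner will actually see.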

Wish you all the best!

score 2 · Accepted Answer · 2020-07-06

2

4.5 years ago

lieven.sterck 15k

Genome assembly is the process of piecing sequencing reads together so that the result resembles, as closely as possible, the original biological sequences the reads were derived from.

Genome annotation adds knowledge to that assembly: genes, RNAs, binding sites, and so on, thus linking biological features to the assembly.
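To make the distinction concrete, here is a minimal sketch with made-up data: the "assembly" is just a named sequence, and the "annotation" is a GFF3-style line that points at coordinates on that sequence. The sequence, the gene `geneA`, and the helper names are all hypothetical; real files are far larger and features on the minus strand would also need reverse-complementing, which is left out here.

```python
# Toy "assembly": sequence name -> bases (normally read from a FASTA file).
assembly = {"chrToy": "ACGTACGTTTGACCATGGCATAAGGCTA"}

# Toy "annotation": one GFF3-style feature line.
# Columns are tab-separated; start/end are 1-based and inclusive.
gff_line = "chrToy\texample\tgene\t5\t12\t.\t+\t.\tID=geneA"


def parse_gff_line(line):
    """Split a GFF3 feature line into the fields used in this example."""
    seqid, source, ftype, start, end, score, strand, phase, attrs = line.split("\t")
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attrs,
    }


def feature_sequence(assembly, feature):
    """Return the bases a plus-strand feature covers in the assembly."""
    seq = assembly[feature["seqid"]]
    # Convert 1-based inclusive coordinates to a Python slice.
    return seq[feature["start"] - 1 : feature["end"]]


feature = parse_gff_line(gff_line)
gene_seq = feature_sequence(assembly, feature)
print(feature["attributes"], gene_seq)  # ID=geneA ACGTTTGA
```

The annotation contains no bases of its own; it only makes sense together with the exact assembly whose coordinates it refers to.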

Typically it's good practice to use the latest version available (though I don't work with that 'species' myself).

0

So for mapping to a reference genome, should I use the genome annotation, or both of them?

Reply by Zeinab.mokhtar • 4.5 years ago

1

For mapping you need at least the genome assembly (that is, the actual sequence); further down the line it also makes sense to include the annotation. If you are mapping RNA-seq data (a gene expression study, for instance), it makes sense to use the annotation as well, since it lets you link your mapping results to specific genes.
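As a rough sketch of how this could look with HISAT2, the helper below only assembles the command lines and does not run anything. The function name `hisat2_plan` and all file names are placeholders, and single-end reads are assumed. Note that HISAT2's splice-site helper script takes a GTF file, so this assumes you downloaded the annotation in GTF form.

```python
import shlex


def hisat2_plan(genome_fasta, reads_fastq, index_base, out_sam, annotation_gtf=None):
    """Return the steps of a basic HISAT2 run as argument lists (nothing is executed)."""
    steps = []
    # Alignment needs only the index built from the assembly.
    align = ["hisat2", "-x", index_base, "-U", reads_fastq, "-S", out_sam]

    if annotation_gtf is not None:
        # Optional: turn the annotation into a list of known splice sites.
        # The helper script prints to stdout, so the output path is kept separately.
        splice_file = index_base + ".splice_sites.txt"
        steps.append(
            {"argv": ["hisat2_extract_splice_sites.py", annotation_gtf], "stdout": splice_file}
        )
        align += ["--known-splicesite-infile", splice_file]

    # Build the index from the genome assembly (the FASTA sequence).
    steps.append({"argv": ["hisat2-build", genome_fasta, index_base], "stdout": None})
    steps.append({"argv": align, "stdout": None})
    return steps


# Placeholder file names, for illustration only.
plan = hisat2_plan(
    "genome.fa", "reads.fq", "genome_index", "aligned.sam", annotation_gtf="annotation.gtf"
)
for step in plan:
    line = shlex.join(step["argv"])
    if step["stdout"]:
        line += " > " + shlex.quote(step["stdout"])
    print(line)
```

Leaving out `annotation_gtf` gives the assembly-only version: build the index, then align. The annotation is used again later, when read counts are assigned to genes.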

Reply by lieven.sterck 15k • 4.5 years ago