Genome assembly vs. genome annotation
4.5 years ago

Hello everyone, I need help understanding the difference between a genome assembly and a genome annotation, and which one I have to use for mapping (I'm using HISAT2). Also, which genome build is better to use, GRCh38 or GRCh37? Thanks in advance.

RNA-Seq alignment genome
4.5 years ago

Genome assembly is the process of putting sequencing reads together so that they resemble, as closely as possible, the original biological sequences the reads were derived from.
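As a toy illustration of that idea (not how production assemblers work), the sketch below greedily merges the pair of reads with the longest suffix/prefix overlap until no overlaps remain. It assumes error-free reads and a genome without repeats longer than the minimum overlap; the function names and example reads are made up for illustration.

```python
from __future__ import annotations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check that the rest of a (from that point on) matches the start of b
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the best-overlapping pair of contigs."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads fully contained in another read; they add no new sequence
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: the remaining contigs are the result
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

For example, `greedy_assemble(["TTAGCT", "AACCGG", "CGGTTA"])` returns `["AACCGGTTAGCT"]`: the three reads, given out of order, are stitched back into one contig. Real assemblers must also cope with sequencing errors, repeats and uneven coverage, which is why they use overlap graphs or de Bruijn graphs instead of this greedy loop.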

Genome annotation is adding knowledge to that assembly: adding genes, RNAs, binding sites, ..., thus linking biological features to positions on the assembly.
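To show what "linking features to the assembly" means in practice: an annotation line (here GTF-style, 9 tab-separated columns) only stores a sequence name, coordinates, strand and attributes, and the feature's actual sequence is read back from the assembly. This is a minimal sketch; the toy sequence `chrT`, the gene IDs and the helper names are hypothetical. GTF coordinates are 1-based and inclusive.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Feature:
    seqname: str  # sequence (chromosome/contig) name in the assembly
    feature: str  # feature type, e.g. "gene" or "exon"
    start: int    # 1-based, inclusive (GTF convention)
    end: int      # 1-based, inclusive
    strand: str   # "+" or "-"
    attributes: dict[str, str]


def parse_gtf_line(line: str) -> Feature:
    """Parse one tab-separated GTF line (9 columns) into a Feature."""
    cols = line.rstrip("\n").split("\t")
    attrs: dict[str, str] = {}
    for field in cols[8].split(";"):
        field = field.strip()
        if field:
            key, value = field.split(" ", 1)
            attrs[key] = value.strip('"')
    return Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], attrs)


def feature_sequence(assembly: dict[str, str], feat: Feature) -> str:
    """Read the feature's sequence back from the assembly it is annotated on."""
    seq = assembly[feat.seqname][feat.start - 1 : feat.end]  # 1-based -> Python slice
    if feat.strand == "-":
        # Reverse complement for features on the minus strand
        seq = seq[::-1].translate(str.maketrans("ACGT", "TGCA"))
    return seq
```

For example, an exon line on `chrT` spanning positions 3 to 8 of the toy assembly `{"chrT": "AACCGGTTAGCT"}` yields `"CCGGTT"`. The annotation is useless without the matching assembly, which is why both must come from the same build.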

Typically it's good practice to use the latest version available (of the two you mention, that is GRCh38), though I'm not a user of that 'species' myself.


So for mapping to a reference genome, should I use the genome annotation, or both of them?


For mapping you will need at least the genome assembly (i.e., the actual sequence); further down the line it also makes sense to include the annotations. If you are mapping RNA-seq data (for a gene expression study, for instance), it makes sense to also use the annotations, as this allows you to link your mapping results to specific genes.
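A minimal sketch of how annotations link mapping results to genes: each aligned read is assigned to the gene whose annotated interval it overlaps. The input shapes (`(chrom, start, end)` tuples, 1-based inclusive) and the `ambiguous`/`no_feature` labels are assumptions for illustration. Real counting tools such as featureCounts or htseq-count work on exons, handle strandedness and multi-mapping reads, and use interval indexes instead of this naive scan.

```python
from collections import Counter


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] overlap."""
    return a_start <= b_end and b_start <= a_end


def count_reads_per_gene(alignments, genes) -> Counter:
    """Assign each aligned read to the single gene it overlaps.

    alignments: iterable of (chrom, start, end) for aligned reads
    genes: list of (gene_id, chrom, start, end) taken from the annotation
    """
    counts: Counter = Counter()
    for chrom, r_start, r_end in alignments:
        hits = {
            gene_id
            for gene_id, g_chrom, g_start, g_end in genes
            if g_chrom == chrom and overlaps(r_start, r_end, g_start, g_end)
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif hits:
            counts["ambiguous"] += 1   # read overlaps more than one gene
        else:
            counts["no_feature"] += 1  # read falls outside all annotated genes
    return counts
```

Without the annotation, the aligner output only tells you *where* each read landed; the gene intervals are what turn those positions into per-gene counts.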

4.1 years ago

Hi Zeinab, it does not matter whether you are using HISAT2, BWA or any other alignment program. The most important question is what you want to achieve. The second question is which organism, and the third is whether your data is DNA or RNA. For the reference you mentioned, I believe it's human.

For my DNA reads (Illumina) I'm using GRCh37/hg19. There, annotation is performed after the variant calling step: I already have the variants, but I want to know more about the impact of each particular variant.

But if your data is RNA, you need both the reference and the annotation (in GTF or GFF format). Why? Because RNA-seq reads come from spliced transcripts, so many of them span exon-exon junctions. In other words, you provide the annotation together with the reference genome so that the aligner can place each read on the exons (e.g., the protein-coding regions) it originally came from. The GENCODE database is the best place to download both the reference genome (FASTA) and the annotation (GTF/GFF). Make sure you use the same version of both. Note:

  1. It's possible to convert coordinates between GRCh38 and GRCh37 (e.g., with a liftover tool such as UCSC liftOver or CrossMap).
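Coordinate conversion tools work from chain files that describe which blocks of one assembly align to which blocks of the other. The sketch below shows only the core mapping step for a single chromosome with made-up, ungapped blocks (not real GRCh37/GRCh38 offsets); real chains can also involve strand changes and different target chromosomes, which this ignores. Positions here are 0-based, and `Block`/`lift_position` are hypothetical names.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import NamedTuple, Optional


class Block(NamedTuple):
    src_start: int  # 0-based start in the source assembly (e.g. the old build)
    dst_start: int  # 0-based start in the target assembly (e.g. the new build)
    size: int       # number of aligned bases in this ungapped block


def lift_position(blocks: list[Block], pos: int) -> Optional[int]:
    """Map a 0-based position through aligned blocks sorted by src_start.

    Returns None when the position falls outside every block, i.e. it has
    no counterpart in the target assembly.
    """
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    b = blocks[i]
    if pos < b.src_start + b.size:
        return b.dst_start + (pos - b.src_start)
    return None  # position lies in a gap between aligned blocks
```

With the hypothetical blocks `[Block(0, 0, 1000), Block(1500, 1200, 2000)]` (500 source bases with no counterpart in the target), position 1600 maps to 1300, while position 1200 cannot be lifted. This is why a liftover can drop some variants or features entirely.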

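For the RNA-seq case in this answer, one concrete way an aligner uses the annotation is through known introns (splice sites) derived from the exon records. HISAT2 ships helper scripts (`hisat2_extract_splice_sites.py`, `hisat2_extract_exons.py`) whose output can be passed to `hisat2-build` with `--ss` and `--exon`; check the HISAT2 manual for exact usage. The sketch below only illustrates the idea of deriving introns from consecutive exons of each transcript; it does not reproduce the output format of those scripts, and the toy GTF lines used to test it are hypothetical.

```python
from collections import defaultdict


def introns_from_gtf(lines):
    """Return sorted unique introns (chrom, start, end, strand), 1-based inclusive,
    derived from consecutive exons of each transcript in GTF-formatted lines."""
    exons = defaultdict(list)  # (chrom, strand, transcript_id) -> [(start, end), ...]
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if cols[2] != "exon":
            continue  # only exon records define splice sites
        attrs = dict(f.strip().split(" ", 1) for f in cols[8].split(";") if f.strip())
        tx = attrs["transcript_id"].strip('"')
        exons[(cols[0], cols[6], tx)].append((int(cols[3]), int(cols[4])))

    introns = set()  # a set, because transcripts of one gene often share introns
    for (chrom, strand, _tx), spans in exons.items():
        spans.sort()  # exons may be listed in any order in the file
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            # The intron runs from the base after one exon to the base before the next
            introns.add((chrom, prev_end + 1, next_start - 1, strand))
    return sorted(introns)
```

Because these junctions come straight from the annotation coordinates, a FASTA and an annotation from different builds or releases would give the aligner splice sites in the wrong places; this is the practical reason to download matching versions of both.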
Wish you all the best!
