Hello everyone, I need help understanding the difference between genome assembly and genome annotation, and which one I have to use for mapping (I'm using HISAT2). Also, which genome is better to use, GRCh38 or GRCh37? Thanks in advance.
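Genome assembly is the process of putting sequencing reads together so that they resemble, as closely as possible, the original biological sequences the reads were derived from. The result of that process, such as GRCh38 or GRCh37 for human, is itself also called an assembly.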
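To make the idea of assembly concrete, here is a toy greedy overlap assembler in Python. It is a minimal sketch, not how real assemblers work (those use overlap graphs or de Bruijn graphs and must cope with sequencing errors and repeats); the function names `overlap` and `greedy_assemble` and the example reads are hypothetical. It repeatedly merges the pair of reads with the longest suffix/prefix overlap until no overlaps remain.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Only overlaps of at least `min_len` bases count; returns 0 otherwise.
    """
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check whether the whole remainder of a is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len: int = 3):
    """Toy greedy assembly: merge the best-overlapping pair until none overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads
```

With the reads `GATTACA`, `TACAGGC` and `GGCTTAC`, the first two merge on their 4-base overlap and the result then merges with the third, giving the single contig `GATTACAGGCTTAC`.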
Genome annotation is adding knowledge to that assembly: genes, RNAs, binding sites, ..., thus linking biological features to positions in the assembly.
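Annotation files record this linkage as coordinates on the assembly. The sketch below, assuming the common 9-column, tab-separated GTF layout with 1-based inclusive coordinates, parses one record and asks which features cover a position. The record, the `chrZ` sequence name and `GENE_A` are made-up illustrations, not real annotation, and the attribute parser ignores edge cases such as semicolons inside quoted values.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    seqname: str   # sequence (chromosome) name in the assembly
    feature: str   # feature type, e.g. "gene" or "exon"
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    strand: str
    attributes: dict = field(default_factory=dict)


def parse_gtf_line(line: str) -> Feature:
    """Parse one GTF record (simplified attribute handling)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs = {}
    for item in cols[8].split(";"):
        item = item.strip()
        if item:
            # Attributes look like: gene_id "GENE_A"
            key, _, value = item.partition(" ")
            attrs[key] = value.strip('"')
    return Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], attrs)


def features_at(features, seqname, pos):
    """Return the features that cover 1-based position `pos` on `seqname`."""
    return [f for f in features if f.seqname == seqname and f.start <= pos <= f.end]


if __name__ == "__main__":
    record = 'chrZ\tdemo\tgene\t1000\t2000\t.\t+\t.\tgene_id "GENE_A"; gene_name "GeneA";'
    gene = parse_gtf_line(record)
    print(gene.attributes["gene_id"], [f.feature for f in features_at([gene], "chrZ", 1500)])
```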
Typically it's good practice to use the latest version available, which here is GRCh38 (though I don't work with that species myself).
Hi Zeinab, it does not matter whether you are using HISAT2, BWA or any other alignment program. The most important question is what you want to achieve. The second question is which organism, and the third is whether it is DNA or RNA. For the reference you mentioned, I believe it's human.
For me, I'm using GRCh37/hg19 for my DNA reads (Illumina). Annotation is performed after the variant-calling step, when I already have the variants but want to know more about the impact of each particular variant.
But if your data is RNA, you will want both the reference and the annotation (GENCODE distributes it in GTF and GFF3 format; HISAT2's helper scripts read GTF). Why? To align the reads to the transcripts. In other words, giving the aligner the annotation alongside the reference genome helps it align each read back to the transcribed (for example, protein-coding) region it originally came from, including across known splice junctions; HISAT2 can still align spliced reads without it. The GENCODE database is the best place for you to download the data (reference genome and annotation). Make sure you use the same release of both. Note:
Wish you all the best!
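For the RNA case with HISAT2, one way to give the aligner the annotation is to build the index with known splice sites and exons extracted from a GTF file, using the helper scripts that ship with HISAT2. The Python sketch below only assembles the command lines; all file names are hypothetical placeholders, and the flags should be checked against the HISAT2 manual for your installed version. Building such an index for the human genome needs much more memory than a plain `hisat2-build` run; a lighter alternative is to pass the splice-site file at alignment time with `hisat2 --known-splicesite-infile`.

```python
import subprocess


def hisat2_steps(fasta, gtf, index_prefix, read1, read2, out_sam, threads=4):
    """Return (argv, stdout_path) pairs for an annotation-aware HISAT2 run.

    `fasta` and `gtf` should come from the same release (e.g. one GENCODE release).
    stdout_path is None when the tool writes its own output files.
    """
    ss = f"{index_prefix}.ss"      # known splice sites extracted from the GTF
    exon = f"{index_prefix}.exon"  # exon coordinates extracted from the GTF
    return [
        (["hisat2_extract_splice_sites.py", gtf], ss),
        (["hisat2_extract_exons.py", gtf], exon),
        (["hisat2-build", "-p", str(threads), "--ss", ss, "--exon", exon,
          fasta, index_prefix], None),
        (["hisat2", "-p", str(threads), "-x", index_prefix,
          "-1", read1, "-2", read2, "-S", out_sam], None),
    ]


def run_steps(steps):
    """Execute each step in order, stopping on the first failure (needs HISAT2 on PATH)."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as fh:
                subprocess.run(argv, check=True, stdout=fh)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv, stdout_path in hisat2_steps(
        "genome.fa", "annotation.gtf", "genome_tran",
        "reads_1.fq.gz", "reads_2.fq.gz", "aligned.sam",
    ):
        print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))
```

Calling `run_steps(hisat2_steps(...))` with real paths would execute the pipeline; keeping command construction separate from execution makes it easy to inspect the commands first.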
So for mapping to a reference genome, should I use the genome annotation, or both of them?
For mapping you will need at least the genome assembly (== the actual sequence); further down the line it also makes sense to include the annotations. If you are mapping RNA-seq data (a gene expression study, for instance), it makes sense to also use the annotations, as this allows you to link your mapping results to specific genes.
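As a rough illustration of linking mapping results to genes, the sketch below assigns aligned read intervals to annotated gene intervals by overlap. It is hypothetical and deliberately simplified (no strand, no exon structure, a linear scan); in practice tools such as featureCounts or htseq-count do this from the alignment file and the GTF. The `__ambiguous` and `__no_feature` labels echo htseq-count's special counters, and the gene IDs and coordinates are made up.

```python
from collections import Counter


def count_reads_per_gene(genes, reads):
    """Count aligned reads per gene by interval overlap.

    genes: list of (gene_id, chrom, start, end), 1-based inclusive.
    reads: list of (chrom, start, end) alignment spans, 1-based inclusive.
    A read overlapping exactly one gene is counted for that gene; reads hitting
    several genes or none are tallied under "__ambiguous" / "__no_feature".
    """
    counts = Counter()
    for chrom, r_start, r_end in reads:
        hits = [gid for gid, g_chrom, g_start, g_end in genes
                if g_chrom == chrom and r_start <= g_end and r_end >= g_start]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts


if __name__ == "__main__":
    genes = [("GENE_A", "chrZ", 1000, 2000), ("GENE_B", "chrZ", 1800, 3000)]
    reads = [("chrZ", 1100, 1200), ("chrZ", 1900, 1950),
             ("chrZ", 5000, 5100), ("chrZ", 2500, 2600)]
    print(dict(count_reads_per_gene(genes, reads)))
```

Here the read at 1900-1950 falls where the two made-up genes overlap, so it is counted as ambiguous rather than assigned to either gene.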