3.4 years ago
Arindam Ghosh
The Ensembl human reference genome sequence and annotation (GTF/GFF) contain contigs and scaffolds in addition to the chromosomes. Should these be removed before aligning RNA-Seq or BS-Seq reads?
It might be useful to keep such sequences to see whether any reads align to them and then detect gene expression/methylation there. But keeping them might also cause multimapping.
See: Why is human genome FASTA file on GENCODE much smaller than that on ENSEMBL?
That gives me an idea that I am at least using the correct file, the PRIMARY sequence. But the primary sequence contains Chr1-22, X, Y, MT and a few others. So do I keep these few other sequences or not?
For standard RNA-seq you can keep everything in the primary sequence.
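If it helps to see exactly which "few others" are in the file before deciding, here is a minimal Python sketch that splits the FASTA header names into main chromosomes and everything else. It assumes Ensembl-style names without a "chr" prefix (1-22, X, Y, MT). The function name `split_sequence_names` and the simplified headers in the example are illustrative only, not taken from the actual file.

```python
import io

# Assumption: Ensembl-style sequence names without a "chr" prefix.
PRIMARY_CHROMS = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def split_sequence_names(fasta_handle):
    """Return (chromosomes, others) read from the header lines of a FASTA stream."""
    chromosomes, others = [], []
    for line in fasta_handle:
        if not line.startswith(">"):
            continue  # skip sequence lines, only headers carry names
        fields = line[1:].split()
        if not fields:
            continue  # skip an empty header line
        name = fields[0]  # the name is the first token after ">"
        if name in PRIMARY_CHROMS:
            chromosomes.append(name)
        else:
            others.append(name)
    return chromosomes, others


# Tiny in-memory example with simplified, made-up headers.
example = io.StringIO(
    ">1 dna:chromosome\nACGT\n"
    ">MT dna:chromosome\nACGT\n"
    ">KI270728.1 dna:scaffold\nACGT\n"
)
chroms, others = split_sequence_names(example)
print("chromosomes:", chroms)
print("other sequences:", others)
```

For a real genome, pass an open file handle instead of the `io.StringIO` object (for a gzipped FASTA, `gzip.open(path, "rt")` gives a text handle that works the same way).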