Dear all,
I have RNA-seq data and want to map it against the reference genome. I downloaded the human genome from ftp://ftp.ensembl.org/pub/release-91/fasta/homo_sapiens/dna/
Toplevel sequences unmasked: Homo_sapiens.GRCh38.dna.toplevel.fa.gz
In the same directory, I also found other versions of the file, such as
Toplevel soft/hard masked sequences: Homo_sapiens.GRCh38.dna_sm.toplevel.fa.gz Homo_sapiens.GRCh38.dna_rm.toplevel.fa.gz
Is it a good choice to go with the repeat-masked version, Homo_sapiens.GRCh38.dna_rm.toplevel.fa.gz, and map my RNA-seq reads to this repeat-masked human genome with STAR?
Heng Li recently posted some thoughts about this very topic.
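As background on the files mentioned in the question: in Ensembl's naming, the dna_sm files are soft-masked (repeats written in lowercase) and the dna_rm files are hard-masked (repeats replaced by N), so a hard-masked genome gives the aligner no sequence in those regions. The sketch below is only an illustration for seeing how much of a FASTA file is masked; the function name `masking_summary` and the demo sequence are made up for this example and are not part of any tool mentioned here.

```python
import gzip


def masking_summary(fasta_lines):
    """Count total bases, lowercase (soft-masked) bases, and uppercase N
    bases (hard-masked or gap) in an iterable of FASTA-formatted lines."""
    counts = {"total": 0, "lowercase": 0, "N": 0}
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and header lines such as ">1 dna:chromosome ..."
        if not line or line.startswith(">"):
            continue
        counts["total"] += len(line)
        counts["lowercase"] += sum(1 for base in line if base.islower())
        counts["N"] += line.count("N")
    return counts


# Tiny in-memory demo (hypothetical sequence, not real genome data).
demo = [">demo", "ACGTacgtNNNN", "acGT"]
print(masking_summary(demo))

# For a real file, something like this should work (path is an assumption):
# with gzip.open("Homo_sapiens.GRCh38.dna_rm.toplevel.fa.gz", "rt") as fh:
#     print(masking_summary(fh))
```

Running it on the unmasked, dna_sm, and dna_rm files side by side shows how much sequence each version hides from the aligner.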