I am trying to align a set of paired-end FASTQ files to the hg38 reference genome. Before alignment, I need to remove reads that map to chrM. I would also like to remove reads that map to alpha satellite repeats, Alu repeats, ribosomal DNA repeats, etc. I read the bowtie2 manual, but I'm not sure how to remove repeat and mitochondrial reads. Can anyone help me with this?
So far I have used this command with the very sensitive options:
bowtie2 -k 1 -D 20 -R 3 -N 0 -L 20 -i S,1,0.50 -x my_index -1 mate1.fastq -2 mate2.fastq
You cannot do that before alignment: you have to know where a read fits best before deciding whether to filter it out. For mitochondrial hits, you can use samtools to remove reads falling on chrM. For all other kinds of repeats, I believe the reads will be considered multimapped. Take a look at the samtools flags.
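As a rough sketch of the chrM step, something like the following could work. It assumes the mitochondrial contig is named chrM in your index (as in the UCSC hg38 naming) and filters the SAM text stream with awk. The function name drop_chrM and the file names aligned.bam and no_chrM.bam are made up for illustration.

```shell
# Drop alignments where the read (SAM column 3, RNAME) or its mate
# (SAM column 7, RNEXT) is placed on chrM.
# Header lines (starting with '@') are passed through unchanged.
# Assumption: the mitochondrial contig is called "chrM" in the reference.
drop_chrM() {
    awk -F '\t' '/^@/ || ($3 != "chrM" && $7 != "chrM")'
}

# Hypothetical usage (file names are placeholders):
#   samtools view -h aligned.bam | drop_chrM | samtools view -b -o no_chrM.bam -
```

Checking the mate column as well means a pair is dropped when either end lands on chrM. If you only want to drop the individual chrM alignments, remove the `$7` condition.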