11 months ago
Chironex
Hi, I have a question. I am processing a ChIP-seq experiment on the mm10 genome. I did quality control, trimming, alignment, and duplicate removal. The "problem" is that I did not remove the uncharacterized chromosomes from the reference FASTA genome. I was planning to remove them after peak calling. The question is: should I repeat the analysis after removing them from the reference FASTA file used as input to bowtie2, or can I move forward with the analysis because it does not affect the results much? What do you think?
It's fine, and actually good, not to remove them before alignment. Reads can genuinely originate from these chromosomes, so removing them from the reference takes away the true origin of those reads, potentially leading to spurious alignments to other contigs. You can remove them from the called peaks, or from the BAM files before calling peaks. Either is fine.
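If you go the "remove from called peaks" route, here is a minimal sketch of the filtering step in Python. It assumes UCSC-style mm10 names (chr1 to chr19, chrX, chrY, chrM) and a tab-separated BED/narrowPeak-like file with the chromosome in the first column. The function name `keep_canonical` and the example peak lines are made up for illustration, not taken from any tool.

```python
import re

# Assumption: UCSC-style mm10 naming; canonical = chr1..chr19, chrX, chrY, chrM.
# Names such as chrUn_* or chr4_*_random do not match because of the $ anchor.
CANONICAL = re.compile(r"^chr(?:[1-9]|1[0-9]|X|Y|M)$")

def keep_canonical(lines):
    """Yield tab-separated peak lines whose first column is a canonical chromosome."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom = line.split("\t", 1)[0]
        if CANONICAL.match(chrom):
            yield line

# Hypothetical peak records, for illustration only.
peaks = [
    "chr1\t3000\t3500\tpeak1\n",
    "chrUn_GL456239\t100\t400\tpeak2\n",
    "chr4_GL456216_random\t50\t300\tpeak3\n",
    "chrX\t7000\t7600\tpeak4\n",
]
kept = list(keep_canonical(peaks))
```

With a real file you would pass the open file handle instead of the list. Whether to keep chrM is a separate choice; drop `|M` from the pattern if you want it gone too.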
Thank you very much! I was concerned about losing reads that can map to both (one canonical and one non-canonical chromosome, for example), which will be discarded in downstream analysis (Picard) because they are flagged as multi-mapped by bowtie2. So I could potentially lose some reads (maybe not a significant number). But, as you say, they can also be ambiguous. I tried removing them from the BAM, but then the read1 and read2 counts reported by 'samtools flagstat' are no longer identical (another point I would like to understand: is it normal for that to happen? I suppose yes, because the number of reads that fall on non-canonical chromosomes is not the same for each mate of a pair...). For this reason I planned to remove them after peak calling, with the blacklist regions, but people suggest doing it before, so I am a 'little' confused about the best and right way to do it.
That's a multimapper, and those are usually discarded. I don't think there is any problem with that.
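On the flagstat numbers: a read1/read2 mismatch after filtering the BAM by chromosome is expected whenever the two mates of a pair sit on different contigs, because filtering record by record keeps one mate and drops the other. The toy sketch below illustrates this with plain Python tuples rather than a real BAM. The `Read` type, the function names, and the example reads are all hypothetical, and the canonical set again assumes UCSC-style mm10 names.

```python
from collections import namedtuple

# Toy stand-in for an alignment record: read name, its contig,
# its mate's contig, and whether it is the first mate of the pair.
Read = namedtuple("Read", "name chrom mate_chrom is_read1")

# Assumption: UCSC-style mm10 canonical chromosomes.
CANONICAL = {f"chr{i}" for i in range(1, 20)} | {"chrX", "chrY", "chrM"}

def filter_per_record(reads):
    """Keep each record whose own contig is canonical (mate is ignored)."""
    return [r for r in reads if r.chrom in CANONICAL]

def filter_per_pair(reads):
    """Keep a record only if both it and its mate are on canonical contigs."""
    return [r for r in reads
            if r.chrom in CANONICAL and r.mate_chrom in CANONICAL]

def count_mates(reads):
    """Return (number of read1 records, number of read2 records)."""
    n_read1 = sum(1 for r in reads if r.is_read1)
    return n_read1, len(reads) - n_read1

# Hypothetical reads: pairA is fully canonical, pairB is split across contigs.
reads = [
    Read("pairA", "chr1", "chr1", True),
    Read("pairA", "chr1", "chr1", False),
    Read("pairB", "chr2", "chrUn_GL456239", True),
    Read("pairB", "chrUn_GL456239", "chr2", False),
]

per_record_counts = count_mates(filter_per_record(reads))
per_pair_counts = count_mates(filter_per_pair(reads))
```

Per-record filtering leaves the chr2 mate of pairB as an orphan, so the two counts diverge. Dropping the whole pair keeps them balanced. Which behaviour you want depends on whether your peak caller expects intact pairs.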