8.5 years ago
prp291
Hi, I have control and treatment FASTQ files in triplicate. The alignment percentage for the control files is 44%, while for the treatment files it is 77%. My question: when I count the reads from control and treatment for differential expression analysis, won't there always be a bias toward treatment because its alignment percentage is higher? How can I solve this problem? Thanks
Thanks. Can you point out some probable causes of this discrepancy?
Is there a difference in (sample) quality? Have you tried BLASTing the non-mapping reads to find out what they are? (Contamination or very bad quality?) Do you have library QC metrics? Have you used FastQC?
Thanks for your help. All libraries have more than 97% of bases at Q30 or above. They also passed all FastQC modules except the k-mer one. I will try to BLAST the unmapped reads.
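For the BLAST step, a small sample of the unmapped reads is usually enough to see what they are. Below is a minimal sketch that converts the first few FASTQ records to FASTA, which is the format BLAST takes as a query. It assumes the unmapped reads are already in a plain four-line-per-record FASTQ file; the function name `fastq_to_fasta` and the `max_reads` parameter are made up for illustration and do not come from any tool.

```python
from itertools import islice


def fastq_to_fasta(fastq_lines, max_reads=100):
    """Convert up to max_reads FASTQ records to FASTA text.

    Assumes strict four-line records: header, sequence, '+', qualities.
    """
    records = []
    it = iter(fastq_lines)
    while len(records) < max_reads:
        block = list(islice(it, 4))
        if len(block) < 4:
            break  # end of input (or a truncated final record)
        header = block[0].strip()
        seq = block[1].strip()
        if not header.startswith("@"):
            raise ValueError("not a FASTQ header: " + header)
        # Replace the leading '@' with '>' and drop the quality lines.
        records.append(">" + header[1:] + "\n" + seq)
    return "\n".join(records) + ("\n" if records else "")


if __name__ == "__main__":
    # Tiny inline example; with a real file, pass the open file handle instead.
    example = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
    print(fastq_to_fasta(example), end="")
```

The resulting FASTA text can then be pasted into the BLAST web form or saved to a file for a local search.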
If the data are RNA-Seq, the likeliest explanation is ribosomal RNA contamination of the controls. Repetitive sequences such as rRNA are typically masked and not reported as aligned.
Thanks. How can I get rid of them?
Informatically: don't count reads that align to features you don't want to consider (e.g. rDNA). Does your reference contain rDNA?
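As a rough sketch of "don't count those features": once you have a per-feature count table, you can drop the unwanted feature IDs before handing the counts to your differential expression tool, and check how much of each library they made up. The function name `drop_features` and the IDs in the example are hypothetical; you would build the real list of rRNA IDs from your own annotation.

```python
def drop_features(counts, unwanted_ids):
    """Return (filtered counts, fraction of reads that were removed).

    counts: dict mapping feature ID -> read count for one sample.
    unwanted_ids: set of feature IDs to exclude (e.g. rRNA genes).
    """
    kept = {fid: n for fid, n in counts.items() if fid not in unwanted_ids}
    total = sum(counts.values())
    removed = total - sum(kept.values())
    fraction_removed = removed / total if total else 0.0
    return kept, fraction_removed


if __name__ == "__main__":
    # Hypothetical counts for one sample; the IDs are placeholders.
    sample = {"geneA": 60, "geneB": 20, "rRNA_1": 20}
    kept, frac = drop_features(sample, {"rRNA_1"})
    print(kept, frac)
```

Comparing the removed fraction between control and treatment libraries would also show whether rRNA really accounts for the difference between them.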
Ideally rRNA should have been removed (by an appropriate experimental method) during sample prep itself.
I didn't use a ribo-depletion method for library preparation, so I assume I have rRNA sequences in the data.