Hi all,
I'm a beginner in RNA-seq and bioinformatics. I found something strange in my raw RNA-seq reads from an insect. I have 4 samples: A1, A2, B1, and B2. A1 and A2 are biological replicates, as are B1 and B2.
I ran FastQC to check the quality of the raw reads and got an 'overrepresented sequences' warning for sample B2. I then randomly picked 50,000 reads from each sample and ran blastn. The results were:
Reads matching my model organism in the database: A1: 90%, A2: 93%, B1: 86%, B2: 47%
Reads with 'No hit': A1: 9%, A2: 4%, B1: 12%, B2: 51%
The rest matched something else.
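For anyone who wants to repeat this kind of check, here is a minimal sketch of drawing a uniform random subsample of reads from a FASTQ file in a single pass (reservoir sampling) and writing it out as FASTA for blastn. The function names and the fixed seed are hypothetical choices for illustration, not what was used for the numbers above, and the sketch assumes an uncompressed FASTQ with exactly four lines per record.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence) pairs from a FASTQ stream with 4 lines per record."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        name = record[0].rstrip("\n")[1:]  # drop the leading '@'
        seq = record[1].rstrip("\n")
        yield name, seq


def reservoir_sample(records, k, seed=0):
    """Return up to k records chosen uniformly at random in one pass."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def to_fasta(sample):
    """Format sampled (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sample)
```

With a real file, something like `reservoir_sample(read_fastq(open("B2.fastq")), 50000)` would give the subsample, where `B2.fastq` is a placeholder file name. Using the same seed for every sample keeps the comparison repeatable.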
These results suggest that sample B2 has a problem. I checked 100 of the 'No hit' reads and found that a portion of each read (20-70 bp out of 101 bp) matches bacterial, viral, or fungal sequences with 80-90% identity.
I understand that these might come from an endosymbiont, but why do these bacterial, viral, or fungal sequences appear in the middle of my reads, and why doesn't the replicate (B1) show the same pattern?
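To make the "match in the middle of the read" pattern easier to quantify across many reads, a minimal sketch like the following could summarize, for each read, how many bases lie before, inside, and after its best alignment. It assumes blastn was run with the standard 12-column tabular output (`-outfmt 6`), which is an assumption on my part, and the function names are hypothetical. The 101 bp read length comes from the description above.

```python
def best_hits(blast_lines):
    """Keep the highest-bitscore hit per read from BLAST tabular (outfmt 6) lines."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        # Columns 7 and 8 are qstart and qend (1-based, inclusive).
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        bitscore = float(fields[11])
        if qseqid not in best or bitscore > best[qseqid]["bitscore"]:
            best[qseqid] = {
                "sseqid": sseqid,
                "pident": pident,
                "qstart": qstart,
                "qend": qend,
                "bitscore": bitscore,
            }
    return best


def flank_report(best, read_len=101):
    """Map each read to (unaligned bases on the left, aligned bases, unaligned on the right)."""
    report = {}
    for qseqid, hit in best.items():
        left = hit["qstart"] - 1
        aligned = hit["qend"] - hit["qstart"] + 1
        right = read_len - hit["qend"]
        report[qseqid] = (left, aligned, right)
    return report
```

Reads where both the left and right values are large are the ones in which only an internal stretch aligns. Comparing how often that happens in B1 and B2 would show whether the pattern is really specific to B2.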
Can someone explain what happened to my sample B2? Is it contamination? What should I do?
Thank you in advance for your kind help,
Based on your description this does not quite sound like it is a bioinformatics issue.
Have you gone back and checked your experimental records to see whether an obvious problem may have been overlooked? You would want to focus on pre-library RNA QC and library QC. Since it affects only one of your biological replicates, it does not seem to be a systematic issue with condition B. It is also possible that the sample simply failed somewhere along the process.
Thank you for your suggestion. I have checked the pre-library RNA QC and library QC reports, but the quality of B2 is similar to that of the other samples, with a RIN of 10. I prepared the total RNA and sent it to the sequencing company, which performed library preparation and sequencing for me. Is it possible that something happened during library preparation or sequencing?
If anything, it would most likely have happened during library prep. Start by contacting the sequence provider, explain your results, and go from there.
I have already contacted them. They said this issue is not contamination, but they cannot explain the cause.
Are those B2 matches to ribosomal genes, by any chance?
Yes. When I looked at the 47% of reads that matched my model organism, some of them matched ribosomal genes. Among the 'No hit' reads, I also found matches to ribosomal genes of other organisms, but only 20-70 bp in the middle of each 101 bp read matched. Have you ever experienced an issue like this?
Thank you for your reply
That definitely sounds odd. I would suggest contacting the sequencing company and letting them know about your observations. See if they are willing to troubleshoot this with you.