Question

Something strange in RNA-seq raw reads: is it contamination or not?

5.6 years ago

kamoltip.lao ▴ 30

Hi all,

I'm a beginner in RNA-seq and bioinformatics. I found something strange in my RNA-seq raw reads from an insect. I have 4 samples: A1, A2, B1, and B2. A1 and A2 are biological replicates, as are B1 and B2.

I ran FastQC to assess the quality of the raw reads and got an 'overrepresented sequences' warning for sample B2. I then randomly picked 50,000 reads from each sample and ran blastn. The results were:

Percentage of reads matching my model organism in the database: A1: 90%, A2: 93%, B1: 86%, B2: 47%

Percentage of reads with 'No hit': A1: 9%, A2: 4%, B1: 12%, B2: 51%

The rest matched something else.

From these results, sample B2 has a problem. I checked 100 reads with 'No hit' status and found that part of each read (20-70 bp out of 101 bp) matches bacterial, viral, or fungal sequences at 80-90% identity.

I understand that it might be an endosymbiont, but why are these bacterial, viral, or fungal sequences inserted in the middle of my reads, and why doesn't the replicate (B1) show the same pattern?

Can someone explain what happened with my sample B2? Is it contamination? What should I do?

Thank you in advance for your kind help.
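
For anyone who wants to reproduce the random subsampling step, here is a minimal sketch using reservoir sampling, which picks k reads uniformly at random in a single pass without loading the whole file. It assumes unwrapped FASTQ (exactly 4 lines per read); the function names and the seed are illustrative and not taken from the original post.

```python
import random
from typing import Iterable, Iterator, List, Tuple

FastqRecord = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records. Assumes unwrapped FASTQ (4 lines per read)."""
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping the same iterator four times groups consecutive lines into records;
    # an incomplete trailing record is silently dropped.
    for header, seq, plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, plus, qual


def reservoir_sample(records: Iterable[FastqRecord], k: int, seed: int = 0) -> List[FastqRecord]:
    """Return k records chosen uniformly at random (all records if fewer than k)."""
    rng = random.Random(seed)
    sample: List[FastqRecord] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)      # fill the reservoir first
        else:
            j = rng.randint(0, i)   # inclusive on both ends
            if j < k:
                sample[j] = rec     # replace with probability k / (i + 1)
    return sample
```

Usage would look like `reservoir_sample(read_fastq(open("B2.fastq")), 50_000)`, where `B2.fastq` is a hypothetical file name. For paired-end data, running it with the same seed on both mate files should select the same read positions, provided both files contain the same number of records in the same order.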
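
One way to put numbers on the "only part of the read matches" observation is to compute, for each read, how much of the query the best alignment covers. The sketch below assumes blastn was run with tabular output (`-outfmt 6`) and its default 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The 90% coverage cutoff and the 5 bp flank are arbitrary illustrative choices; the 101 bp read length is the one mentioned above.

```python
from typing import Dict, Iterable, Tuple

# read_id -> (qstart, qend, pident, bitscore) for the best-scoring alignment
BestHit = Tuple[int, int, float, float]


def best_hits(blast_lines: Iterable[str]) -> Dict[str, BestHit]:
    """Keep the highest-bitscore alignment per read from default outfmt 6 lines."""
    best: Dict[str, BestHit] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qseqid, pident = f[0], float(f[2])
        qstart, qend, bitscore = int(f[6]), int(f[7]), float(f[11])
        if qseqid not in best or bitscore > best[qseqid][3]:
            best[qseqid] = (qstart, qend, pident, bitscore)
    return best


def classify(read_ids: Iterable[str], best: Dict[str, BestHit],
             read_len: int = 101, min_cov: float = 0.9, flank: int = 5) -> Dict[str, int]:
    """Count reads by how much of the read the best alignment covers."""
    counts = {"full": 0, "partial_internal": 0, "partial_edge": 0, "no_hit": 0}
    for rid in read_ids:
        if rid not in best:
            counts["no_hit"] += 1
            continue
        qstart, qend, _, _ = best[rid]
        coverage = (abs(qend - qstart) + 1) / read_len
        if coverage >= min_cov:
            counts["full"] += 1
        elif qstart > flank and qend <= read_len - flank:
            # alignment sits in the middle, with both read ends left unaligned
            counts["partial_internal"] += 1
        else:
            counts["partial_edge"] += 1
    return counts
```

Comparing these counts between B1 and B2 would show whether the internal partial matches are specific to B2 or are present at a lower level in every sample.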

RNA-Seq • 1.3k views

ADD COMMENT • link updated 5.6 years ago by shawn.w.foley ★ 1.3k • written 5.6 years ago by kamoltip.lao ▴ 30

Based on your description, this does not quite sound like a bioinformatics issue.

Have you gone back and checked your experimental records to see if an obvious problem may have been overlooked? You would want to focus on pre-library RNA QC and library QC. Since it affects only one of your biological replicates, it does not seem to be a systematic issue with condition B. It is also possible that the sample just failed somewhere along the process.

ADD REPLY • link 5.6 years ago by GenoMax 147k

Thank you for your suggestion. I have checked the pre-library RNA QC and library QC reports, but the quality of B2 is similar to the other samples, with a RIN of 10. I prepared the total RNA and sent it to a sequencing company, which performed library preparation and sequencing for me. Is it possible that something happened during library preparation or sequencing?

ADD REPLY • link 5.6 years ago by kamoltip.lao ▴ 30

"Is it possible that something happened during library preparation or sequencing?"

If anything, it would likely be during library prep. Start by contacting your sequencing provider, explain your results, and go from there.

ADD REPLY • link 5.6 years ago by GenoMax 147k

I already contacted them. They said this issue is not contamination, but they could not explain why.

ADD REPLY • link 5.6 years ago by kamoltip.lao ▴ 30

Are those matches of B2 to ribosomal genes, by chance?

ADD REPLY • link 5.6 years ago by WouterDeCoster 47k

Yes. Among the 47% of reads that matched my model organism, some matched ribosomal genes. Among the reads with 'No hit' status, I also found matches to ribosomal genes of other organisms, but only 20-70 bp in the middle of each 101 bp read matched. Have you ever experienced an issue like this?

Thank you for your reply.

ADD REPLY • link 5.6 years ago by kamoltip.lao ▴ 30

"but just 20-70 bp in the middle of 101 bp reads are matched."

That definitely sounds odd. I would suggest contacting the sequencing company and letting them know about your observations. See if they are willing to troubleshoot this with you.

ADD REPLY • link 5.6 years ago by GenoMax 147k

score 0 · Answer 1 · 2019-04-23

5.6 years ago

shawn.w.foley ★ 1.3k

From your description, it sounds like you have some contaminating material in sample B2. I don't know enough about the system to comment on how this happened; intuitively, I'd guess that bacteria or fungi were introduced at some point in the wet lab. As GenoMax said, it looks less like a systematic error or a phenotype than bad luck.

It would be worth performing the mapping and running a PCA to see how your samples cluster. It's possible that this contamination does no more than reduce the effective sequencing depth. You can get some preliminary analyses out of these samples, but repeating the experiment, or at the very least repeating the B sample, would be the best way to proceed. The current data could be a useful in-house resource for hypothesis generation (with a big asterisk because of the contamination), but the experiment would need to be repeated before publication.
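
As a rough illustration of the PCA check, here is a sketch that uses only the Python standard library. It assumes you already have a genes x samples matrix of raw counts from mapping; it converts them to log2(CPM + 1), centers each gene, and extracts the leading components by power iteration on the small samples x samples Gram matrix. In practice you would normally use an established package for this, and the function names and the +1 pseudocount here are illustrative choices only.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def log_cpm(counts: Matrix) -> Matrix:
    """counts[g][s] raw counts (genes x samples) -> log2(counts per million + 1)."""
    n = len(counts[0])
    lib = [sum(row[s] for row in counts) for s in range(n)]  # library sizes
    return [[math.log2(row[s] / lib[s] * 1e6 + 1.0) for s in range(n)] for row in counts]


def sample_gram(values: Matrix) -> Matrix:
    """Center each gene across samples; return the samples x samples Gram matrix."""
    n = len(values[0])
    gram = [[0.0] * n for _ in range(n)]
    for row in values:
        mean = sum(row) / n
        c = [v - mean for v in row]
        for i in range(n):
            for j in range(n):
                gram[i][j] += c[i] * c[j]
    return gram


def top_component(gram: Matrix, iters: int = 500, seed: int = 0) -> Tuple[float, List[float]]:
    """Power iteration: approximate leading eigenvalue and unit eigenvector."""
    rng = random.Random(seed)
    n = len(gram)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n)) for i in range(n))
    return max(eigval, 0.0), v


def pca_scores(counts: Matrix, n_components: int = 2) -> List[Tuple[List[float], float]]:
    """Return [(per-sample scores, fraction of variance explained), ...] per component."""
    gram = sample_gram(log_cpm(counts))
    n = len(gram)
    total = sum(gram[i][i] for i in range(n))
    components = []
    for _ in range(n_components):
        eigval, vec = top_component(gram)
        scores = [math.sqrt(eigval) * x for x in vec]
        components.append((scores, eigval / total if total > 0 else 0.0))
        # deflate: remove the component just found before searching for the next one
        gram = [[gram[i][j] - eigval * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return components
```

If B2 sits close to B1 on the first component, the contamination may mostly be costing depth; if B2 sits far from all three other samples, its expression profile itself is probably affected.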

ADD COMMENT • link 5.6 years ago by shawn.w.foley ★ 1.3k

I'll perform the PCA to see the pattern, as you suggested. I plan to prepare a new sample B, so it would help to know the likely cause so that I can avoid it next time.

Thank you so much.

ADD REPLY • link 5.6 years ago by kamoltip.lao ▴ 30

If you just redo B on its own, is that not going to add batch effects that may be difficult to deal with?

ADD REPLY • link 5.6 years ago by GenoMax 147k

It would be better to sequence both conditions in the same batch, right? If I cannot collect enough material for replicates and have only 1 new sample per condition, is that enough? My samples are very difficult to collect, which is why I could prepare only 2 replicates per condition in the first batch.

Thanks again for your help.

ADD REPLY • link 5.6 years ago by kamoltip.lao ▴ 30

More replicates are always better, but you should be all right repeating one replicate for each condition. When you do the analysis, be sure to perform a batch correction so that you control for the fact that rep1 came from your first batch and rep2 from the second.
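
To make the batch idea concrete, here is a small standard-library sketch of one common form of it: fitting, for a single gene, log expression = intercept + condition effect + batch effect by ordinary least squares. This is an illustration of the principle, not what any particular package does internally; differential expression tools such as DESeq2 and limma let you add batch as a term in the design instead. The 0/1 coding and the function names are assumptions made for the example.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # happens when batch and condition cannot be told apart
            raise ValueError("design is rank deficient: batch is confounded with condition")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_condition_batch(y: Sequence[float], condition: Sequence[int],
                        batch: Sequence[int]) -> List[float]:
    """Return [intercept, condition effect, batch effect] for 0/1 indicators."""
    design = [[1.0, float(c), float(b)] for c, b in zip(condition, batch)]
    p = 3
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)
```

With one A and one B sample in each batch (condition `[0, 1, 0, 1]`, batch `[0, 0, 1, 1]`), the condition and batch effects can be estimated separately. If all A samples came from one batch and all B samples from the other, the fit fails with the rank-deficiency error, which is the confounding problem raised earlier in the thread.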

ADD REPLY • link 5.6 years ago by shawn.w.foley ★ 1.3k