Any way to save contaminated samples?
3.8 years ago

Hello

I aligned a set of reads to the C. elegans genome. The alignment rates were around 80%, except for two samples, which came in at 40%. I BLASTed the unaligned reads and they seem to come from Drosophila (we have no idea why). I aligned the samples again, this time to the Drosophila genome, and those two samples got an alignment rate of around 40% as well. Because the sample size is small, I have been considering discarding the unmapped reads instead of discarding the whole samples. I assume a normalization like TMM could reduce the possible noise caused by the reduced counts, and if the PCA clusters make sense, I would use the data in downstream analysis. Any opinions on this? Should I just discard those samples?
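To illustrate what I mean by normalization absorbing the reduced counts, here is a minimal sketch. It uses plain counts-per-million scaling, which is simpler than TMM and is not what edgeR does, and the gene counts are made up for illustration. It only shows that a sample sequenced at half the depth ends up with the same scaled values, assuming the surviving reads are a representative subsample of the C. elegans library, which is exactly the part I cannot verify.

```python
def counts_per_million(gene_counts):
    """Scale raw gene counts so that they sum to one million."""
    library_size = sum(gene_counts)
    if library_size == 0:
        raise ValueError("sample has no counts")
    return [count * 1_000_000 / library_size for count in gene_counts]


# Made-up counts for four genes (illustrative only, not real data).
# The second sample is the first one at exactly half the depth.
clean_sample = [400, 100, 300, 200]
half_depth_sample = [200, 50, 150, 100]

print(counts_per_million(clean_sample))
print(counts_per_million(half_depth_sample))
```

If the contamination changed the composition of the library rather than just its depth, this kind of scaling would not correct for it.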

RNA-Seq • 694 views

I would be very skeptical of the reads unless you figure out why there was so much contamination from an exogenous organism. Was your sequencing run shared with anyone else? Perhaps there could have been a mixup with barcodes or something.


We do have Drosophila samples that we sent to the same place for sequencing, so mislabeling was the initial suspicion. I tried aligning the samples to the Drosophila genome: the clean samples had less than 1% alignment and the dirty samples had ~40%. I also aligned the Drosophila samples to the C. elegans genome and got less than 1% there too, versus ~94% alignment to Drosophila. Very impressive from HISAT2, I guess. So I think it is more likely that the samples actually got mixed somehow.


Sounds like there was definitely some sample mixup or problem somewhere. I wouldn't be confident in the reads you did recover from the samples, because there is no guarantee the labels are correct for those.

Your best bet would be to talk to the sequencing provider and also go back and see if there was any problem during sample collection.


If you are sure that your original samples were actually from C. elegans, then you can ask your sequencing provider to re-make and resequence the libraries, or at least to check that nothing went amiss on their end.


I guess one possibility, if you are sure you have both C. elegans and Drosophila, would be to combine the two reference genomes you are aligning against, align all the reads to this 'hybrid' reference, and see whether they partition between the two species. The alignment rate, which you have already looked at, should improve and serve as an indicator. I would consider that the safest way of being able to use the reads. After that, it depends on what analysis is planned downstream.
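A rough sketch of the partitioning step, assuming you have already aligned to the combined reference and have SAM-formatted output. The contig prefixes `ce_` and `dm_` are hypothetical: they assume you renamed the contigs when concatenating the two FASTA files. The `min_mapq` threshold is also an assumption and should be tuned to how your aligner reports multi-mapping reads. The function counts alignment records, so paired-end mates are counted separately.

```python
from collections import Counter

# Hypothetical contig-name prefixes, assumed to have been added when the two
# FASTA files were concatenated into the combined reference.
SPECIES_PREFIXES = {"ce_": "C. elegans", "dm_": "Drosophila"}


def partition_by_species(sam_lines, prefixes=SPECIES_PREFIXES, min_mapq=10):
    """Count primary alignment records per species from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x100 or flag & 0x800:  # skip secondary/supplementary records
            continue
        if flag & 0x4 or rname == "*":  # read did not align anywhere
            counts["unmapped"] += 1
            continue
        if mapq < min_mapq:  # low confidence, possibly maps to both genomes
            counts["ambiguous"] += 1
            continue
        species = next(
            (name for prefix, name in prefixes.items() if rname.startswith(prefix)),
            "other",
        )
        counts[species] += 1
    return counts


# Tiny made-up SAM records, for illustration only.
example_sam = [
    "@HD\tVN:1.0",
    "r1\t0\tce_chrI\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tdm_chr2L\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tce_chrII\t5\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tdm_chrX\t300\t1\t4M\t*\t0\t0\tACGT\tIIII",
]
print(partition_by_species(example_sam))
```

For real data you would stream the alignments into this, for example from `samtools view` output, and compare the per-species fractions across the clean and dirty samples.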
