Virus sequence detected in my RNA-seq reads
7.3 years ago
mfalco • 0

Hi, I recently received the fastq files from a sequencing service. This was an experiment with human cell lines, and along with the sequences they gave me the following warning:

"we detected the presence of Xenotropic murine leukemia virus sequences in some of the samples, resulting in a higher than expected percentage of no matches in the mapping statistics."

Do you think I should remove these sequences from my reads before alignment? If so, how can I do it?

Thank you

RNA-Seq alignment virus • 1.7k views

An easy way to identify those reads and put them aside is to add the XMLV sequence as an extra chromosome in your reference file. You can then build the index from this combined reference, map your reads, and look at the coverage of the contaminant.

Before removing them, check in a genome browser (e.g. IGV) whether they really are what the sequencing service said.
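The extra-chromosome idea can be sketched roughly as follows. This is only a minimal illustration: the function name `build_combined_reference`, the file names, and the contig name `XMLV` are made up for the example, and it assumes the viral FASTA holds a single record. The combined file would then be indexed with whatever aligner you already use.

```python
def build_combined_reference(host_fasta, virus_fasta, out_fasta, virus_name="XMLV"):
    """Append a single-record viral FASTA to the host reference as one extra contig.

    All names here are hypothetical; adapt them to your own files.
    """
    with open(out_fasta, "w") as out:
        # Copy the host reference unchanged.
        last_line = ""
        with open(host_fasta) as host:
            for line in host:
                out.write(line)
                last_line = line
        # Make sure the viral header starts on its own line.
        if last_line and not last_line.endswith("\n"):
            out.write("\n")

        # Append the viral genome under a short, easy-to-spot contig name.
        n_records = 0
        with open(virus_fasta) as virus:
            for line in virus:
                if line.startswith(">"):
                    n_records += 1
                    if n_records > 1:
                        raise ValueError("expected exactly one record in the viral FASTA")
                    out.write(f">{virus_name}\n")
                elif line.strip():
                    out.write(line.rstrip("\n") + "\n")
    return out_fasta
```

Giving the viral contig a short fixed name makes it easy to pull out its reads or coverage later, since every read mapped to it is reported against that contig.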


Contamination of NGS data with sequences of unknown provenance is not unheard of. If you search PubMed you will find many reports. A few XMLV reads may be acceptable, as opposed to massive contamination (what fraction of your reads are XMLV?). Do you or anyone nearby work with mice? If not, the contamination could also have originated at the sequencing provider (if they made the libraries).
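To put a number on "what fraction of reads are XMLV", one option is to count mapped reads per contig after aligning against a reference that includes the virus. The sketch below assumes you have the tab-separated per-contig table that `samtools idxstats` prints (contig name, length, mapped count, unmapped count) and that the viral contig was named `XMLV`; the function name is hypothetical.

```python
def contig_read_fraction(idxstats_text, contig="XMLV"):
    """Return the fraction of mapped reads assigned to `contig`.

    `idxstats_text` is expected to be tab-separated with four columns per line:
    contig name, contig length, mapped reads, unmapped reads.
    The contig name "XMLV" is an assumption for this example.
    """
    mapped = {}
    for line in idxstats_text.strip().splitlines():
        name, _length, n_mapped, _n_unmapped = line.split("\t")
        mapped[name] = int(n_mapped)

    total_mapped = sum(mapped.values())
    if total_mapped == 0:
        return 0.0
    return mapped.get(contig, 0) / total_mapped
```

Running this per sample gives a quick table of which samples are affected and how badly. Note that the denominator here is mapped reads only, so unmapped reads are not counted.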


You could try to make the situation work in your favor: check how the patients/cell lines carrying the virus differ from the rest of the samples in their transcriptomic profile.

7.3 years ago

Those reads will most probably not influence your alignment, since they simply fail to map to the human reference, and as such they won't influence your downstream analysis either.

What is much worse is that the viral infection itself will influence your biological results. Infected cells behave differently and show a different transcriptomic profile, e.g. more antiviral genes will be expressed. So that's a problem.

The best you can do is identify which samples are affected and include this as a covariate in your model for differential expression analysis.
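As a rough sketch of the covariate idea: flag each sample as affected or not, and check that the flag is not completely tied to your condition of interest, because in that case the two effects cannot be separated. The field names, function names, and the 0.001 cutoff below are all assumptions for illustration; pick a threshold that makes sense for your data.

```python
def annotate_samples(samples, threshold=0.001):
    """Add a yes/no 'xmlv' covariate to each sample.

    `samples` is a list of dicts with keys 'name', 'condition' and
    'xmlv_fraction'. The threshold is an arbitrary example value.
    """
    rows = []
    for s in samples:
        status = "yes" if s["xmlv_fraction"] >= threshold else "no"
        rows.append({"sample": s["name"], "condition": s["condition"], "xmlv": status})
    return rows


def covariate_is_estimable(rows):
    """Return False if 'xmlv' is constant within every condition.

    In that case the xmlv indicator is a linear combination of the
    condition indicators, so its effect cannot be estimated separately.
    """
    statuses_by_condition = {}
    for r in rows:
        statuses_by_condition.setdefault(r["condition"], set()).add(r["xmlv"])
    return any(len(statuses) > 1 for statuses in statuses_by_condition.values())
```

If the check passes, the `xmlv` column can go into the sample sheet and be used in the model, for example with an R-style design formula such as `~ xmlv + condition` in tools that accept one. If it fails (say, all treated samples are infected and no controls are), adding the covariate will not rescue the comparison.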
