Question

Virus sequence detected in my RNA-seq reads

7.2 years ago

mfalco • 0

Hi, I recently received the FASTQ files from a sequencing service. The experiment used human cell lines, and along with the sequences they gave me the following warning:

"we detected the presence of Xenotropic murine leukemia virus sequences in some of the samples, resulting in a higher than expected percentage of no matches in the mapping statistics."

Do you think I should remove these sequences from my reads before alignment? If so, how can I do it?

Thank you

RNA-Seq alignment virus • 1.7k views

updated 7.2 years ago by WouterDeCoster 47k • written 7.2 years ago by mfalco • 0

An easy way to identify those reads and set them aside is to add the XMLV sequence as an extra chromosome in your reference file. You can then build the index with it, map your reads, and look at the coverage of this contamination.

Before removing them, check in a genome browser (e.g. IGV) whether they really are what the sequencing service said.
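
The "extra chromosome" step could be sketched roughly as below. This is only an illustration: the function name `append_contig`, the file names and the contig name `XMLV` are made up here, and it assumes the viral FASTA holds a single record. The index still has to be rebuilt afterwards with whatever aligner you use.

```python
def append_contig(reference_fasta, viral_fasta, out_fasta, contig_name="XMLV"):
    """Copy the reference FASTA and append the viral sequence as one extra contig.

    Assumes viral_fasta contains exactly one record (hypothetical helper).
    """
    seen = set()
    with open(out_fasta, "w") as out:
        # Copy the reference unchanged while collecting its contig names.
        with open(reference_fasta) as ref:
            last = "\n"
            for line in ref:
                if line.startswith(">"):
                    fields = line[1:].split()
                    if fields:
                        seen.add(fields[0])
                out.write(line)
                last = line
            if not last.endswith("\n"):
                out.write("\n")  # make sure the new header starts on its own line

        if contig_name in seen:
            raise ValueError(f"contig name {contig_name!r} already in reference")

        # Append the virus under a short, recognisable contig name.
        n_records = 0
        with open(viral_fasta) as vir:
            for line in vir:
                if line.startswith(">"):
                    n_records += 1
                    if n_records > 1:
                        raise ValueError("expected a single viral record")
                    out.write(f">{contig_name}\n")
                elif line.strip():
                    out.write(line.rstrip("\n") + "\n")
    return out_fasta


# Example call (paths are placeholders):
# append_contig("human.fa", "xmlv.fa", "human_plus_xmlv.fa")
```

Reads that map to the `XMLV` contig can then be inspected in the browser or filtered out by contig name, while everything else stays in the usual human coordinates.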

7.2 years ago by VHahaut ★ 1.2k

Contamination of NGS data with sequences of unknown provenance is not unheard of. If you search PubMed you will find many reports. Detecting a few XMLV reads may be acceptable, as opposed to massive contamination (what fraction of reads are XMLV?). Do you or anyone nearby work with mouse? If not, the contamination could also have originated at the sequencing provider (if they made the libraries).
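
To put a number on that fraction, something like the sketch below might do. It assumes the reads were mapped against a reference that includes the virus as a contig (called `XMLV` here purely for illustration) and that per-contig counts are available in the four-column, tab-separated layout that `samtools idxstats` prints (name, length, mapped, unmapped). The function name and the counts in the example are made up.

```python
def viral_fraction(idxstats_text, contig="XMLV"):
    """Fraction of all counted reads that mapped to the given contig.

    Expects tab-separated lines: name, length, mapped, unmapped.
    """
    viral = 0
    total = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, unmapped = line.split("\t")[:4]
        mapped, unmapped = int(mapped), int(unmapped)
        total += mapped + unmapped
        if name == contig:
            viral += mapped
    return viral / total if total else 0.0


# Toy counts, not real data: 50 of 1000 reads land on the viral contig.
example = "chr1\t1000\t900\t0\nXMLV\t8000\t50\t0\n*\t0\t0\t50\n"
print(viral_fraction(example))  # 0.05
```

Running this per sample shows quickly whether you are looking at a handful of stray reads or a substantial share of the library.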

7.2 years ago by GenoMax 147k

You could try to make the situation work in your favor. Try to see how the patients/cell lines with the virus differ from the rest of the samples in their transcriptomic profile.

7.2 years ago by Chirag Nepal ★ 2.4k

score 2 · Answer 1 · 2017-08-25

Those reads will most probably not influence your alignment, and as such won't influence your downstream analysis either.

But what is way worse is that this viral infection will influence your biological results. Infected cells behave differently and will show a different transcriptomic profile, e.g. more antiviral genes will be expressed. So that's a problem.

The best you can do is identify which samples are affected and include this as a covariate in your model for differential expression analysis.
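
As a rough sketch of what "infection status as a covariate" means in practice: each sample gets an extra 0/1 column in the design matrix next to the condition of interest. The sample names, the two-condition layout and both helper functions below are hypothetical; the actual model fit would be done by whichever differential expression tool you use.

```python
def build_design(samples, infected, reference_condition):
    """Return {sample_id: [intercept, condition, infected]} with 0/1 coding.

    samples: list of (sample_id, condition) pairs, two conditions assumed.
    infected: set of sample ids flagged as virus-positive.
    """
    design = {}
    for sample_id, condition in samples:
        design[sample_id] = [
            1,
            int(condition != reference_condition),
            int(sample_id in infected),
        ]
    return design


def covariate_is_estimable(design):
    """True if the infection effect can be separated from the condition effect.

    With an intercept and two 0/1 columns, the matrix has full rank exactly
    when at least three distinct (condition, infected) combinations occur.
    """
    combos = {(row[1], row[2]) for row in design.values()}
    return len(combos) >= 3


# Hypothetical layout: four samples, one infected sample in each condition.
samples = [("ctrl_1", "control"), ("ctrl_2", "control"),
           ("trt_1", "treated"), ("trt_2", "treated")]
design = build_design(samples, infected={"ctrl_2", "trt_1"},
                      reference_condition="control")
print(design["trt_1"])                 # [1, 1, 1]
print(covariate_is_estimable(design))  # True
```

If the infected samples happen to be exactly the samples of one condition, the check returns False: infection and condition are then confounded and no covariate can separate them.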