Hi, I recently received the fastq files from a sequencing service. This was an experiment with human cell lines, and along with the sequences they gave me the following warning:
"we detected the presence of Xenotropic murine leukemia virus sequences in some of the samples, resulting in a higher than expected percentage of no matches in the mapping statistics."
Do you think I should remove these sequences from my reads before alignment? If so, how can I do it?
Thank you
An easy way to identify those reads and set them aside is to add the XMLV sequence as an extra chromosome in your reference file. Build the index with it included, map your reads, and then look at the coverage of the contaminant.
Before removing them, check in a genome browser (e.g. IGV) whether they really are what the sequencing service said.
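The "extra chromosome" step can be sketched in a few lines of Python. This is only a minimal illustration: the function name `add_contig`, the file names, and the contig label `XMLV` are made up for the example, and it assumes plain (uncompressed) FASTA files. Indexing and mapping are then done with whatever aligner you normally use.

```python
def add_contig(reference_fasta, extra_fasta, combined_fasta, contig_name="XMLV"):
    """Write the reference followed by the extra sequence(s) as one FASTA file.

    Headers of the extra records are replaced by a short, predictable name
    (hypothetical default "XMLV") so they are easy to spot after mapping.
    """
    with open(combined_fasta, "w") as out:
        # Copy the host reference unchanged.
        last = ""
        with open(reference_fasta) as ref:
            for line in ref:
                out.write(line)
                last = line
        # Make sure the appended header starts on its own line.
        if last and not last.endswith("\n"):
            out.write("\n")

        # Append the contaminant records with renamed headers.
        n_records = 0
        with open(extra_fasta) as extra:
            for line in extra:
                if line.startswith(">"):
                    n_records += 1
                    name = contig_name if n_records == 1 else f"{contig_name}_{n_records}"
                    out.write(f">{name}\n")
                else:
                    out.write(line if line.endswith("\n") else line + "\n")
    return combined_fasta


# Example call (file names are placeholders):
# add_contig("human_reference.fa", "xmlv.fa", "human_plus_xmlv.fa")
```

After mapping against the combined file, reads from the contaminant should pile up on the added contig, which you can inspect in IGV like any other chromosome.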
Contamination of NGS data with sequences of unknown provenance is not unheard of. If you search PubMed you will find many reports. A few XMLV reads may be acceptable, as opposed to massive contamination (what fraction of reads are XMLV?). Do you or anyone nearby work with mouse? If not, the contamination could also have originated at the sequencing provider (if they made the libraries).
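To put a number on "what fraction of reads are XMLV?", one option is to map against a reference that includes the viral sequence and then summarize the per-contig counts. The sketch below assumes you have the text output of `samtools idxstats` (tab-separated: contig name, length, mapped reads, unmapped reads) and that the viral contig is called `XMLV`; both the function name and the contig name are assumptions for illustration.

```python
def contig_fraction(idxstats_text, contig="XMLV"):
    """Return the fraction of mapped reads assigned to `contig`.

    `idxstats_text` is expected to look like `samtools idxstats` output:
    name <TAB> length <TAB> mapped <TAB> unmapped, one contig per line.
    """
    mapped_total = 0
    on_contig = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        mapped = int(mapped)
        mapped_total += mapped
        if name == contig:
            on_contig += mapped
    # Avoid dividing by zero when nothing mapped at all.
    return on_contig / mapped_total if mapped_total else 0.0


# Made-up numbers, only to show the input format:
example = "chr1\t1000\t900\t0\nXMLV\t8000\t100\t0\n*\t0\t0\t50\n"
print(contig_fraction(example))
```

Running this per sample gives a quick way to tell a handful of stray reads from substantial contamination.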
You could try to make the situation work in your favor: see whether the patients/cell lines with the virus differ from the rest of the samples in their transcriptomic profile.