How to check if a Fastq file is contaminated with other strains?
3.9 years ago
askif4 ▴ 20

I have some FASTQ files from mouse (later mapped to the B6 = mm10 reference sequence).

But when I looked at the BAM files in IGV, some reads turned out to be from rat (https://www.ncbi.nlm.nih.gov/assembly/GCF_000001895.5).

I have used the web version of blastn to check some of the reads, but it is impossible to check all the reads one by one.

I installed blastn on Linux, but I couldn't figure out how to restrict the comparison to a limited set of reference sequences (in my case, I want to compare the reads against only the mouse and rat reference sequences).

If you could help me, I would be grateful.

Thank you.

gene next-gen sequencing assembly • 2.7k views

But when I looked at the BAM files in IGV, some reads turned out to be from rat

How did you decide that? Did you realign the data to the rat genome, or did you just select the rat genome instead of mouse in IGV? I am surprised IGV allowed you to choose an unrelated genome to view an alignment.

Rat is not a mouse strain but a different species.


Ah, I was searching for mutation sites with low VAFs and found some suspicious reads. So I copied the read sequences and pasted them into web blastn. That's how I found out that the sample was contaminated.

It looked like this:

https://ibb.co/5jD8qtV


Maybe the two genomes do share some sequences. You can check by mapping the reads to both the mouse and rat reference sequences with bowtie2; blastn is too slow for this.
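To make the bowtie2 suggestion concrete, here is a minimal Python sketch that only builds the command lines (it does not run them) and parses the summary that bowtie2 prints to stderr. The file names `mm10.fa`, `rat.fa`, and `sample.fastq.gz` are placeholders, and single-end reads (`-U`) are an assumption; paired-end data would use `-1`/`-2` instead.

```python
import re
import shlex


def bowtie2_build_cmd(fasta, index_prefix):
    """Command that builds a bowtie2 index from a reference FASTA."""
    return ["bowtie2-build", fasta, index_prefix]


def bowtie2_align_cmd(index_prefix, fastq, sam_out, threads=4):
    """Command that aligns single-end reads (assumption) against one index."""
    return ["bowtie2", "-p", str(threads), "-x", index_prefix,
            "-U", fastq, "-S", sam_out]


def overall_alignment_rate(bowtie2_summary):
    """Extract the 'overall alignment rate' percentage from bowtie2's stderr text."""
    match = re.search(r"([\d.]+)% overall alignment rate", bowtie2_summary)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # Placeholder reference files; replace with your own downloads.
    for species, fasta in [("mouse", "mm10.fa"), ("rat", "rat.fa")]:
        print(shlex.join(bowtie2_build_cmd(fasta, species)))
        print(shlex.join(bowtie2_align_cmd(
            species, "sample.fastq.gz", f"sample.{species}.sam")))
```

Because mouse and rat share conserved sequence, many reads will probably align to both references, so the two overall rates alone may not settle the question. Reads that align to rat but not to mouse are likely the more telling signal.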


Thank you for your reply. I will try that.


In addition to what was said, you might also consider using the BlobToolKit pipeline (paper). I have never used it myself; I have only read the paper, but it seems that it can provide useful insights in cases of contamination. That said, the other options mentioned seem more straightforward to follow.


You can check it with the Kaiju web server: you simply upload the FASTQ files and get the output.

http://kaiju.binf.ku.dk/

3.9 years ago

Use FastQ Screen (fastq_screen).

  1. Download the genomes of the suspected organisms.
  2. Index them.
  3. Use FastQ Screen to check for contamination.

By default, FastQ Screen checks against a few model genomes and contaminating vector sequences. You can also supply your own genomes and sequences and screen against those. In general, FastQ Screen checks only a subset of the reads, and you can increase that number.
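The steps above could be sketched roughly as follows. This Python snippet only generates a FastQ Screen configuration and the matching command line; it does not run anything. The index prefixes, file names, and output directory are hypothetical, and it assumes the indexes were built with bowtie2.

```python
import shlex


def fastq_screen_config(databases, threads=4):
    """Build config text with one DATABASE line per genome (name -> index prefix)."""
    lines = [f"THREADS\t{threads}"]
    for name, index_prefix in databases.items():
        lines.append(f"DATABASE\t{name}\t{index_prefix}")
    return "\n".join(lines) + "\n"


def fastq_screen_cmd(conf_path, fastq, subset=100000, outdir="fastq_screen_out"):
    """Build the fastq_screen call; --subset sets how many reads are sampled."""
    return ["fastq_screen", "--conf", conf_path, "--aligner", "bowtie2",
            "--subset", str(subset), "--outdir", outdir, fastq]


if __name__ == "__main__":
    # Hypothetical bowtie2 index prefixes for the two suspected genomes.
    config_text = fastq_screen_config({
        "Mouse": "/refs/mouse/mm10",
        "Rat": "/refs/rat/rat",
    })
    with open("fastq_screen.conf", "w") as handle:
        handle.write(config_text)
    print(shlex.join(fastq_screen_cmd("fastq_screen.conf", "sample.fastq.gz")))
```

Raising `subset` trades run time for a more precise estimate, which may matter when the contamination is at a low level.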


Thank you! I will try that tool.


This worked perfectly! Thank you again.
