Question

Quick way to identify contamination in a FASTQ file

3.5 years ago

Sara ▴ 260

I have a FASTQ file from human data, but we think a mistake happened when our colleague did the sample prep, and the sample is now contaminated with mouse material. Is there any quick way to tell whether such a mistake happened (other than aligning to mouse and human separately)?

alignment • 1.4k views

updated 3.5 years ago by MatthewP ★ 1.4k • written 3.5 years ago by Sara ▴ 260

You could subsample some reads, for example 10,000, and align only those to mouse. That makes the alignment very fast.

3.5 years ago by MatthewP ★ 1.4k
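A minimal sketch of the subsampling step, assuming an uncompressed single-end FASTQ with exactly four lines per record (for paired files you would need to keep the same records from both mates). It uses reservoir sampling so the file is read only once; the function names are illustrative, and existing tools such as seqtk's sample subcommand also do this job.

```python
import io
import random
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TextIO, Tuple

# One FASTQ record: (header, sequence, separator, quality)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain 4-line-per-record FASTQ stream."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) < 4 or not lines[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield (lines[0], lines[1], lines[2], lines[3])


def reservoir_sample(records: Iterable[Record], k: int,
                     seed: Optional[int] = None) -> List[Record]:
    """Pick k records uniformly at random in a single pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)          # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                sample[j] = rec         # replace with probability k/(i+1)
    return sample


def write_fastq(records: Iterable[Record], handle: TextIO) -> None:
    """Write records back out in 4-line FASTQ form."""
    for rec in records:
        handle.write("\n".join(rec) + "\n")


# Tiny in-memory demo; for a real file, open it and pass k=10000.
demo = io.StringIO("".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(100)))
subset = reservoir_sample(read_fastq(demo), k=10, seed=1)
print(len(subset))  # 10
```

For a gzipped file, open it with gzip.open(path, "rt") and pass that handle to read_fastq instead.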

score 1 · Answer 1 · 2021-05-31

3.5 years ago

boczniak767 ▴ 870

Hi,

I don't know of a "quick way", but there is the RemoveHuman script from the BBMap suite. The linked post has an extensive discussion about it.

Maybe just BLASTing your reads against the mouse genome could give you some information, but mouse does, after all, share many sequences with human.
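Because of that shared sequence, a hit to mouse on its own says little; comparing each read's best mouse hit with its best human hit is more informative. A rough sketch, assuming you have already run blastn on a subset of reads (converted to FASTA) against a human and a mouse database with -outfmt 6, whose default 12 tab-separated columns end with the bitscore. The function names are made up for this example, and an equal best bitscore is simply counted as a tie.

```python
from typing import Dict, TextIO


def best_bitscores(handle: TextIO) -> Dict[str, float]:
    """Best bitscore per query read from BLAST tabular (-outfmt 6) output."""
    best: Dict[str, float] = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, bitscore = fields[0], float(fields[11])  # column 12 = bitscore
        if bitscore > best.get(query, float("-inf")):
            best[query] = bitscore
    return best


def compare_species(human: Dict[str, float],
                    mouse: Dict[str, float]) -> Dict[str, int]:
    """Count reads whose best hit is better in human, better in mouse, or tied."""
    counts = {"human": 0, "mouse": 0, "tie": 0}
    missing = float("-inf")  # the read had no hit in that database
    for read in set(human) | set(mouse):
        h, m = human.get(read, missing), mouse.get(read, missing)
        if m > h:
            counts["mouse"] += 1
        elif h > m:
            counts["human"] += 1
        else:
            counts["tie"] += 1
    return counts
```

Reads with no hit in either database do not appear in the counts. Reads in conserved regions tend to land in "tie", which is exactly the human/mouse overlap problem.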

score 1 · Answer 2 · 2021-05-31

You can also use bbsplit.sh from the BBMap suite to bin the reads. Depending on the actual level of contamination, this could end up being tricky.

bbsplit.sh in1=reads1.fq.gz in2=reads2.fq.gz ref=human.fa,mouse.fa ambiguous2=toss basename=out_%.fq refstats=Statistics_%.txt

The example above removes reads that map to both genomes, but you can look at the ambiguous2= options if you want to keep them.

ambiguous2=<best>   Set behavior only for reads that map ambiguously to multiple different references.
                    Normal 'ambiguous=' controls behavior on all ambiguous reads;
                    Ambiguous2 excludes reads that map ambiguously within a single reference.
                       best   (use the first best site)
                       toss   (consider unmapped)
                       all   (write a copy to the output for each reference to which it maps)
                       split   (write a copy to the AMBIGUOUS_ output for each reference to which it maps)
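To make the ambiguous2 options a bit more concrete, here is a toy model of the binning decision, assuming each read already has one best alignment score per reference (None if it did not align). It only illustrates the behaviours listed in the help text above and is not BBMap's actual code; the function names and the score input are hypothetical, and "ambiguous" is simplified to mean an exact tie in score. The last function turns the bins into a rough contaminant fraction.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical input: best alignment score of one read against each reference
# (None = the read did not align to that reference).
Scores = Dict[str, Optional[int]]

MODES = {"best", "toss", "all", "split"}


def bin_reads(read_scores: Dict[str, Scores], refs: List[str],
              ambiguous2: str = "toss") -> Dict[str, List[str]]:
    """Toy model of splitting reads between references (not BBMap's algorithm)."""
    if ambiguous2 not in MODES:
        raise ValueError(f"unknown ambiguous2 mode: {ambiguous2}")
    bins: Dict[str, List[str]] = defaultdict(list)
    for read_id, scores in read_scores.items():
        hits = {r: scores[r] for r in refs if scores.get(r) is not None}
        if not hits:
            bins["unmapped"].append(read_id)
            continue
        top = max(hits.values())
        # Keep the order of refs so "best" picks the first best reference.
        best_refs = [r for r in refs if hits.get(r) == top]
        if len(best_refs) == 1 or ambiguous2 == "best":
            bins[best_refs[0]].append(read_id)
        elif ambiguous2 == "toss":
            bins["unmapped"].append(read_id)      # considered unmapped
        elif ambiguous2 == "all":
            for r in best_refs:
                bins[r].append(read_id)           # one copy per reference
        else:  # "split"
            for r in best_refs:
                bins[f"AMBIGUOUS_{r}"].append(read_id)
    return dict(bins)


def contaminant_fraction(bins: Dict[str, List[str]],
                         contaminant: str = "mouse", host: str = "human") -> float:
    """Share of reads in the contaminant bin among the host + contaminant bins."""
    c = len(bins.get(contaminant, []))
    h = len(bins.get(host, []))
    return c / (c + h) if c + h else 0.0
```

With toss, reads that match both genomes equally well are set aside instead of being counted for either species, which keeps the fraction from being inflated by conserved sequence; with all, tied reads are counted twice.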