Quick way to identify contamination from a FASTQ file
3.5 years ago
Sara ▴ 260

I have a FASTQ file from a human sample, but we suspect that a mistake during sample prep contaminated it with a mouse sample. Is there a quick way to check whether this happened, other than aligning to the mouse and human genomes separately?

alignment • 1.4k views

You can subsample the reads (for example, 10,000 of them) and align only those to the mouse genome; this makes the alignment very fast.
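As a rough illustration of the subsampling step, here is a minimal Python sketch that draws a uniform random sample of reads with reservoir sampling. The function names `open_fastq` and `sample_fastq` are made up for this example, and it assumes a standard four-line-per-record FASTQ file (no wrapped sequence lines), plain or gzip-compressed.

```python
import gzip
import random
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def sample_fastq(in_path, out_path, n=10000, seed=0):
    """Write a uniform random sample of up to n reads; return how many were written.

    Assumes every record is exactly 4 lines (header, sequence, '+', qualities).
    """
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    with open_fastq(in_path) as handle:
        while True:
            record = list(islice(handle, 4))  # read one 4-line record
            if len(record) < 4:
                break  # end of file (or a truncated trailing record)
            if seen < n:
                reservoir.append(record)
            else:
                # Reservoir sampling: keep the new record with probability n / (seen + 1)
                j = rng.randint(0, seen)  # inclusive on both ends
                if j < n:
                    reservoir[j] = record
            seen += 1
    with open(out_path, "w") as out:
        for record in reservoir:
            out.writelines(record)
    return len(reservoir)
```

For paired-end data, calling `sample_fastq` on both mate files with the same `seed` should select the same read positions, provided both files contain the same number of reads in the same order. The resulting small file can then be aligned to the mouse genome with whichever aligner you already use.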

3.5 years ago
boczniak767 ▴ 870

Hi,

I don't know of a "quick way", but there is the RemoveHuman script from the BBMap suite. The linked post has an extensive discussion of it.

Simply BLASTing your reads against the mouse genome might also give you some information, but keep in mind that mouse shares many sequences with human.

3.5 years ago
GenoMax 147k

You can also use bbsplit.sh from the BBMap suite to bin the reads. Depending on the actual level of contamination, this could end up being tricky.

bbsplit.sh in1=reads1.fq.gz in2=reads2.fq.gz ref=human.fa,mouse.fa ambiguous2=toss basename=out_%.fq refstats=Statistics_%.txt

The example above discards reads that map to both genomes, but take a look at the ambiguous2= options if you want to keep them.

ambiguous2=<best>   Set behavior only for reads that map ambiguously to multiple different references.
                    Normal 'ambiguous=' controls behavior on all ambiguous reads;
                    Ambiguous2 excludes reads that map ambiguously within a single reference.
                       best   (use the first best site)
                       toss   (consider unmapped)
                       all   (write a copy to the output for each reference to which it maps)
                       split   (write a copy to the AMBIGUOUS_ output for each reference to which it maps)
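Once the reads are binned, a quick way to gauge the level of contamination is to count the reads in each output file. The sketch below is only an illustration: the helper names `count_fastq_reads` and `binned_fractions` are invented here, and the file names in the usage note assume that `basename=out_%.fq` produced `out_human.fq` and `out_mouse.fq` (check the names your run actually wrote). It also assumes four lines per FASTQ record.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def binned_fractions(counts):
    """Turn a {reference_name: read_count} dict into fractions of all binned reads."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count / total for name, count in counts.items()}
```

For example, `binned_fractions({"human": count_fastq_reads("out_human.fq"), "mouse": count_fastq_reads("out_mouse.fq")})` gives the share of binned reads assigned to each genome. Reads that were tossed as ambiguous or left unmapped are not included in these fractions; the refstats file written by the command above is the place to look for the tool's own summary.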