How to remove reads from a FASTQ file that match a set of reads in my FASTA file?
6.9 years ago
MAPK ★ 2.1k

I have a FASTQ file that appears to be contaminated with sequences from the reagents used during library preparation. If I know which sequences came from the reagents and have them in FASTA format, can I remove the matching reads from my FASTQ file? How can I work this out?

fastq fasta contamination • 5.4k views
- Assess and QC the FASTQ file
- Convert the FASTQ to FASTA
- BLAST the reads against the reagent FASTA
- Parse the BLAST results and remove the reads with hits to reagents from the FASTA (converted from FASTQ)
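The last step above can be sketched in Python. This is a minimal illustration, not a tested pipeline: it assumes BLAST was run with tabular output (`-outfmt 6`, where column 1 is the query ID), that the FASTQ uses plain 4-line records, and that read IDs are the first whitespace-delimited token of the header. The function names `read_hit_ids` and `filter_fastq` and the toy data are made up for this example.

```python
import io


def read_hit_ids(blast_tab):
    """Collect query IDs (column 1) from BLAST tabular output (-outfmt 6)."""
    hits = set()
    for line in blast_tab:
        line = line.strip()
        if line and not line.startswith("#"):
            hits.add(line.split("\t")[0])
    return hits


def filter_fastq(fastq_in, fastq_out, hit_ids):
    """Copy 4-line FASTQ records whose ID is not in hit_ids.

    Returns a (kept, removed) tuple of record counts.
    """
    kept = removed = 0
    while True:
        header = fastq_in.readline()
        if not header:          # end of file
            break
        if not header.strip():  # skip stray blank lines
            continue
        record = [header] + [fastq_in.readline() for _ in range(3)]
        read_id = header[1:].split()[0]
        if read_id in hit_ids:
            removed += 1
        else:
            fastq_out.writelines(record)
            kept += 1
    return kept, removed


# Toy data: read2 has a BLAST hit against a (hypothetical) reagent sequence.
blast = io.StringIO("read2\tadapterA\t100.0\t4\t0\t0\t1\t4\t1\t4\t1e-20\t93.5\n")
fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2 extra\nTTTT\n+\nIIII\n")
clean = io.StringIO()
counts = filter_fastq(fastq, clean, read_hit_ids(blast))
```

With real data, the `io.StringIO` objects would be replaced by open file handles. Whether any BLAST hit should disqualify a read, or only hits above some identity or length threshold, is a choice the sketch leaves open.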

I think you forgot to include a link to the program that does this.


The OP asked "How can I work this out?". The above comment outlines a pipeline for completing the OP's task, so it is not one single program.

  1. FastQC and a quality trimmer
  2. Converter program from FASTQ to FASTA (several exist, e.g. FASTX-Toolkit)
  3. BLAST
  4. Several posts on Biostars are available for reference in parsing out sequences from FASTA files based on BLAST results
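For step 2, the conversion itself is small enough to sketch by hand if no converter is installed. This is only an illustration, assuming plain 4-line FASTQ records (no wrapped sequences); `fastq_to_fasta` is a hypothetical helper name, not part of any of the tools listed.

```python
import io


def fastq_to_fasta(fastq_in, fasta_out):
    """Write each 4-line FASTQ record as a 2-line FASTA record.

    Returns the number of records converted.
    """
    n = 0
    while True:
        header = fastq_in.readline()
        if not header:          # end of file
            break
        if not header.strip():  # skip stray blank lines
            continue
        seq = fastq_in.readline().rstrip("\n")
        fastq_in.readline()     # '+' separator line, discarded
        fastq_in.readline()     # quality line, discarded
        # Replace the leading '@' with '>' and keep the rest of the header.
        fasta_out.write(">" + header[1:].rstrip("\n") + "\n" + seq + "\n")
        n += 1
    return n


fasta = io.StringIO()
n_records = fastq_to_fasta(
    io.StringIO("@r1 desc\nACGT\n+\nIIII\n@r2\nTTGG\n+\nIIII\n"), fasta
)
```

Keeping the header unchanged matters here, because the read IDs reported by BLAST must later be matched back to the original FASTQ.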

I was unaware, until your answer @genomax, that the BBMap suite had this option.


The way you wrote that made it seem like you had copied/pasted it from the GitHub description of a package :-)

BTW: @Brian includes a sequencing_artifacts.fa.gz file (in the resources directory) that I assume contains contaminants (which may be seen elsewhere, but which I assume are seen at JGI).

Various things that the BBMap suite can do are here, if you have not seen this post before.


Trying to avoid black box/turn key solutions, so one can learn in the process.


Hi, did you find a solution for that? If my contaminant reads and true reads are both in a FASTQ file, how can I remove the contaminant reads?


I gave you two additional answers in the other thread you posted this in: C: Subtracting one FASTAq file Reads from other FASTAq reads

6.9 years ago
GenoMax 147k

Use bbduk.sh from BBMap. Provide the contaminants as a multi-FASTA file with the ref= option.
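As a rough sketch of what such a call could look like, the snippet below assembles a bbduk.sh command line from Python. The file names are placeholders, `build_bbduk_command` and `run_bbduk` are hypothetical helper names, and `k=31` is an assumed k-mer length chosen for illustration rather than a recommendation from this answer. `out=` receives reads that do not match the reference and `outm=` receives the reads that do.

```python
import subprocess


def build_bbduk_command(reads, contaminants, clean_out, matched_out, k=31):
    """Assemble a bbduk.sh call that splits reads by k-mer match to `contaminants`."""
    return [
        "bbduk.sh",
        f"in={reads}",          # input FASTQ
        f"ref={contaminants}",  # multi-FASTA of contaminant sequences
        f"out={clean_out}",     # reads with no match to the reference
        f"outm={matched_out}",  # reads that matched (the contaminants)
        f"k={k}",               # k-mer length used for matching (assumed value)
    ]


def run_bbduk(*args, **kwargs):
    """Run the assembled command; requires bbduk.sh to be on PATH."""
    subprocess.run(build_bbduk_command(*args, **kwargs), check=True)


# Placeholder file names, for illustration only.
cmd = build_bbduk_command("reads.fq", "contaminants.fa", "clean.fq", "matched.fq")
print(" ".join(cmd))
```

Keeping the matched reads in a separate file makes it easy to check how many reads were removed and whether they really look like reagent sequences.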

6.9 years ago

My NanoLyse script is written for that, using the minimap2 aligner under the hood. It's mainly intended for long reads (Oxford Nanopore/PacBio).
