Remove reads from fastq file
7.9 years ago
varsha619 ▴ 90

Hi, could someone please help me remove reads that come from a specific genomic location from a FASTQ file? So far I have only found methods for removing reads from a specific chromosome in an aligned SAM file using samtools, or from a FASTQ file using sequence IDs. I would like to remove PCR contaminants from my FASTQ files by giving specific genome coordinates. I appreciate your help!

sequencing • 3.5k views

See cutadapt, Trimmomatic, or FASTX-Toolkit for removing adapters/primers.

7.9 years ago

FASTQ files do not contain coordinates, so it is not possible to remove data based on that parameter. You would need to align and then filter, or filter by the sequence with one of the adapter-trimming tools (e.g., BBDuk or Trimmomatic).
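As a rough sketch of the align-then-filter route: once the alignment has identified the reads that overlap the contaminant region, their names can be used to drop the matching records from the original FASTQ. The helper name drop_reads_by_name and all file names here are hypothetical, and the sketch assumes 4-line FASTQ records (no wrapped sequence lines) and a names file with one read name per line, without the leading "@".

```shell
# Hypothetical helper: print every FASTQ record whose read name is NOT listed.
# Usage: drop_reads_by_name names.txt reads.fastq > kept.fastq
drop_reads_by_name() {
    local names="$1" fastq="$2"
    awk '
        FILENAME == ARGV[1] { drop[$1] = 1; next }  # first file: names to remove
        FNR % 4 == 1 {                              # header line of each record
            name = substr($1, 2)                    # strip "@", ignore text after a space
            sub(/\/[12]$/, "", name)                # strip a trailing /1 or /2 mate suffix
            keep = !(name in drop)
        }
        keep                                        # print all four lines of kept records
    ' "$names" "$fastq"
}
```

One possible way to build names.txt, assuming a coordinate-sorted BAM and a BED file of the regions to remove, is: samtools view -L regions.bed aln.sorted.bam | cut -f1 | sort -u > names.txt. For paired-end data, running the helper on both the R1 and R2 files with the same names list should keep the pairs in sync.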


@harold.smith.tarheel, that makes sense. For example, can something like "samtools view -b input.bam chr1:1-100 > output.bam" be used to remove these sequences from the original file, instead of extracting the region to a new file?


From the manual:

-U FILE Write alignments that are not selected by the various filter options to FILE. When this option is used, all alignments (or all alignments intersecting the regions specified) are written to either the output file or this file, but never both.

It looks like you're using the syntax from an older version of SAMtools; I recommend updating to the current version.
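A minimal sketch of how the -U option quoted above might be used to split a BAM file. Going by the parenthetical in the manual text, when a region such as chr1:1-100 is given on the command line, only the alignments intersecting it appear to be considered, so this sketch selects by a BED file with -L instead, which scans the whole file. The function name split_by_bed and the file names are hypothetical.

```shell
# Hypothetical helper: split a BAM into alignments overlapping a BED file and the rest.
# Usage: split_by_bed in.sorted.bam regions.bed inside.bam outside.bam
split_by_bed() {
    local in_bam="$1" bed="$2" inside="$3" outside="$4"
    # -L selects alignments overlapping the BED intervals (written to -o);
    # -U receives every alignment that was not selected.
    samtools view -b -L "$bed" -o "$inside" -U "$outside" "$in_bam"
}

# Possible follow-up for single-end data: convert the retained alignments back to FASTQ.
# samtools fastq outside.bam > cleaned.fastq
```

For paired-end data, mates can end up in different output files when only one of them overlaps a region, so it is worth checking the pairing before converting back to FASTQ.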


@harold.smith.tarheel, just to clarify, I used: samtools view in.sorted.bam -b -h -o inRegions.bam -U outRegions.bam -L Regions.bed. So here the -o file contains the reads in the "chr:start-stop" regions, while the -U file excludes those regions and retains the rest? Thank you for your help!

7.9 years ago
GenoMax 148k

Instead of depending on genome coordinates, you may want to use clumpify.sh from the BBMap suite to identify duplicates (optical, PCR, and other kinds) independently of alignments. Then, depending on the severity of the issue, decide what to do with them (just mark or remove). See this post for additional details on how to use this tool: A: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files
