bbmap split paired-end reads back into separated fastq files?
1
1
Entering edit mode
6.2 years ago
c.e.chong ▴ 60

Hi,

I am trying to remove human reads from my metagenomic sequences (skin samples).

I have used the command recommended in the bbmap documentation:

bbmap.sh minid=0.95 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 path=/path/to/hg19masked/ qtrim=rl trimq=10 untrim -ea -Xmx100g in=reads.fq outu=clean.fq outm=human.fq threads=12

My input is Illumina HiSeq 4000 Paired-end, 2x150 bp reads, so I want to split paired-end reads back into separated fastq files .._r1 .._r2.

Is there a way to do this using BBTools?

Thanks in advance!

genome next-gen-sequencing bbmap alignment • 7.8k views
ADD COMMENT
0
Entering edit mode

As shown your command is only using one read file (in=). So unless that file is interleaved there is no chance that out*= files will be interleaved. @ATPoint shows what an interleaved file should look like. Note: Your headers may look different but Read 1 and 2 from one clusters should be one after the other in the file.

ADD REPLY
0
Entering edit mode

Sorry that was a mistake, I have actually used in=r1.fq in2= r2.fq!

Thanks!

ADD REPLY
3
Entering edit mode
6.2 years ago
GenoMax 147k

Use reformat.sh from BBMap suite. This assumes your clean.fq is interleaved.

reformat.sh in=clean.fq out1=clean_R1.fq out2=clean_R2.fq
ADD COMMENT
0
Entering edit mode

Thank you very much for your help!

How would I find out if the clean.fq file is interleaved?

ADD REPLY
0
Entering edit mode

Check if two adjacent read pairs always have the same read name. This is an example of an interleaved file from a HiSeq3000:

@HWI-ST-J00104_BSF_0515:8:1101:10003:10036#DMSO_rep2_S43794/1
GGAAGAAAGACAGTCTGTGGCCCTGCCTGGGGACCTACACTGTCTGCTGTAACAGGCTTTCCCTTCATCTCAAGAG
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@HWI-ST-J00104_BSF_0515:8:1101:10003:10036#DMSO_rep2_S43794/2
GGTTGAAAAGTGGCCTCGGTGTTCCAGACCTACCTGGTTCACTTAGCTTTTTCCTCCTTTCCTTCCTTTTATACTC
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJFJJJJJJJFJJJJJJJJJ
@HWI-ST-J00104_BSF_0515:8:1101:10003:10141#DMSO_rep2_S43794/1
GTGATTCTCTTCAAGGCCACTCCTAAACAACTGTTGAATCCTTGATCCAGGCACCAAGCCCAGAGGTCCCTATCAG
+
AAAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<FJJJJJJJJJJJJJFJJJJJJJJJJJ
@HWI-ST-J00104_BSF_0515:8:1101:10003:10141#DMSO_rep2_S43794/2
GTCTAGGAGGAAACCAGGCTGTTAACTCCTCAGCAAAGACAAAGGAGGAGTTGGGCGGGGGCAAGACTAACTCCTA
+
AAFFFJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
ADD REPLY
0
Entering edit mode

Sorry to necro this but is there a difference in using reformat.sh vs bbsplit.sh to do this operation?

ADD REPLY
0
Entering edit mode

reformat.sh is simply separating R1/R2 reads that are interleaved as shown in @ATPoint's example above. bbsplit.sh on the other hand uses reference(s) genome(s) to bin the reads depending on how they align to those genomes.

ADD REPLY
0
Entering edit mode

can reformat.sh also be used to separate R1/R2 reads that are merged ?

ADD REPLY
0
Entering edit mode

No program can do that if the reads are already merged. Use following alternate method.

You could get the fastq headers from the merged files and then use them to retrieve reads from original R1/R2 files using filterbyname.sh program from BBMap suite.

ADD REPLY
0
Entering edit mode

Again sorry to necro this post, but is the correct way to work with paired end data to use in1= in2= and then have a single outu and outm , then run reformat.sh to get the paired unmapped reads when trying to remove human contaminants?

ADD REPLY
0
Entering edit mode

You can specify two output files as in outu1= outu2= etc. If you provide only one output file then the data will be written interleaved. Not many programs understand interleaved data and thus you will need to separate them into two files using reformat.sh.

ADD REPLY
0
Entering edit mode

It's validating that reformat.sh is the approach that the library author recommends here: Tool to separate human and mouse rna seq reads

ADD REPLY
0
Entering edit mode

Can reformat.sh also be used to merge the replicate file?

like-

reformat.sh in1=file1_R1.fastq in2=file1_R2.fastq in1=file2_R1.fastq in2=file2_R2.fastq out=merged_R1.fastq out2=merged_R2.fastq

where file1_R1.fastq and file1_R2.fastq are the paired-end reads from the first replicate, and file2_R1.fastq and file2_R2.fastq are the paired-end reads from the second replicate?

Is it the correct way to merge the replicate?

ADD REPLY
0
Entering edit mode

How is this question an answer to the original question? I'm moving it to a comment this time, please be more mindful in the future.

ADD REPLY
0
Entering edit mode

If you simply want to merge "technical" sequencing replicates then you can do the following

cat file1_R1.fastq file1_R1.fastq > merged_R1.fastq
cat file1_R2.fastq file2_R2.fastq > merged_R2.fastq
ADD REPLY

Login before adding your answer.

Traffic: 1827 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6