bwa mapping after fastx-trimmer
3
0
Entering edit mode
9.6 years ago
User6891 ▴ 330

Hi everyone,

I have some Illumina MiSeq data (sequenced beginning 2015). The reads are paired-end 2x250 bp. I need to cut 100 bp of the end of each read. Therefore I used fastx-toolkit trimmer using the following code

fastx_trimmer -t 100 -Q33 -i file.fastq -o file_trimmed.fastq

Although the files were sequenced quite recent, I needed to use the Q33 parameter, otherwise fastx_trimmer was not working.

Now I want to map these data using bwa-mem. I get however the following error message:

[mem_sam_pe] paired reads have different names: "M01441:126:000000000-ADL98:1:1104:10392:3758", "M01441:126:000000000-ADL98:1:1104:23060:3763"

So bwa-mem ended with an error. Does anyone know why this is happening and how to solve it?

fastx-toolkit bwa • 4.0k views
ADD COMMENT
2
Entering edit mode
9.6 years ago

Fastx cannot handle paired reads; if you use it for trimming or filtering of paired reads, your data will get corrupted. It's possible to fix this, but the best solution is to use a different program such as BBDuk or seqtk, and process both reads simultaneously.

ADD COMMENT
0
Entering edit mode

I have seqtk installed. Would this work to trim the last 100 bases?

./seqtk trimfq -e 100 file.fastq.gz > file.trimmed.fastq

You're saying to process both reads simultaneously ... but how can I do that in seqtk?

I have used the above command on both R1 and R2 .fastq files seperately. BWA-mem now finishes successfully. However GATK is now ending with an error:

ERROR MESSAGE: SAM/BAM file SAMFileReader{file_sorted.bam} is malformed: the BAM file has a read with no stored bases (i.e. it uses '*') which is not supported in the GATK; see the --filter_bases_not_stored argument. Offender: M01441:126:000000000-ADL98:1:2104:23734:22483
ADD REPLY
0
Entering edit mode

Ahh! Sorry, my mistake. I could have sworn that seqtk processed files as pairs, but according to the manual, I don't see any option for that. You can do it with BBDuk, though:

bbduk.sh in1=r1.fq.gz in2=r2.fq.gz out1=trimmed1.fq.gz out2=trimmed2.fq.gz ftr2=100

By default, BBDuk will always leave a minimum of 1bp in a read to prevent the above problem. However, you can use the flag minlen=30 to, for example, throw away all read pairs that end up shorter than 30bp after trimming.

ADD REPLY
0
Entering edit mode

And can you also use it in the following situation:

I want to remove the first 4 bases & then keep the next 120?

ADD REPLY
0
Entering edit mode

Yes, in that case:

bbduk.sh in1=r1.fq.gz in2=r2.fq.gz out1=trimmed1.fq.gz out2=trimmed2.fq.gz ftl=4 ftr=123

ftl=X means trim the leftmost X bases; ftr2=X means trim the rightmost X bases; and ftr=X means trim all the rightmost bases after position X, 0-based.

ADD REPLY
0
Entering edit mode
9.6 years ago

I'm not sure BWA likes the fastqs when length(sequence|quality)==0. Try to put the following awk script after fastx:

awk '{if(NR%4==2 && length($0)==0) { printf("N\n");} else if(NR%4==0 && length($0)==0) { printf("#\n");} else {print;}}'
ADD COMMENT
0
Entering edit mode
9.6 years ago

If you just want to trim off the last 100, i.e. keep the first 150 bp you could do something like this:

zcat reads.fq.gz \
| paste  - - - - \
| awk -v OFS='\n' -v FS='\t' '{print $1, substr($2, 1, 150), $3, substr($4, 1, 150)}' \
| gzip > fq1.trim.fq.gz

(NB: There is no adapter or quality trimming here)

ADD COMMENT

Login before adding your answer.

Traffic: 995 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6