I'm running a perl script (clipPairedEnd.pl) which uses cutadapt to trim Illumina adapters from paired-end fastq files. I then use bwa aln, bwa sampe, and samtools view to generate aln.bam, but this bam file has only 248 lines. When I use the same process on the uncut fastq files I get 5M lines in the bam file. After some digging in my log files I found this:
[bwa_sai2sam_pe_core] print alignments...
[bwa_sai2sam_pe_core] paired reads have different names: "HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898", "HWI-ST1293:246:HFG23ADXX:1:1101:9432:1843"
When I look for this read in the fastq files (before and after adapter trimming), here is what I see. Untrimmed fastq:
less R1.fastq
495 +
496 #1=DDFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJFHIJHH>GIIIIJJIJJIGHHCEHFFFDBDFEEDDBB##############################################################################
497 @HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898 1:N:0:TCGCAGG
498 GGCTTTCCGGGTGTGTGTTTAAATTTTTTTTCTATTTAATAATGTTTTTTATTTGTGTTGTAGAATGCCAGAGGACTTGGATCTGAGCTAAAGGACAGTATTCCAGTTACTGAACTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACT
less R2.fastq
495 +
496 #######################################################################################################################################################
497 @HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898 2:N:0:TCGCAGG
498 AGTTCAGTAACTGGAATACTGTCCTTTAGCTCAGATCCAAGTCCTCTGGCATTCTACAACACAAATAAAAAACATTATTAAATAGAAAAAAAATTTAAACACACACCCGGAAAGCCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAG
Adapter trimmed fastq
less R1.fastq
495 +
496 !
497 @HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898 1:N:0:TCGCAGG
498 GGCTTTCCGGGTGTGTGTTTAAATTTTTTTTCTATTTAATAATGTTTTTTATTTGTGTTGTAGAATGCCAGAGGACTTGGATCTGAGCTAAAGGACAGTATTCCAGTTACTGAACT
less R2.fastq
495 +
496 #######################################################################################################################################################
497 @HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898 2:N:0:TCGCAGG
498 AGTTCAGTAACTGGAATACTGTCCTTTAGCTCAGATCCAAGTCCTCTGGCATTCTACAACACAAATAAAAAACATTATTAAATAGAAAAAAAATTTAAACACACACCCGGAAAGCCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAG
Can anyone tell me if the ! is causing the "paired reads have different names" error message? If so, any ideas on how to fix this? I find about 2000 lines that begin with ! in my adapter-cut R1.fastq, and none in R2.fastq.
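One way to test whether the two trimmed files are still in sync is to walk both in parallel and report the first record where the read names differ, which is what bwa sampe complains about. The following is only a minimal sketch: it assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function names read_names and first_mismatch are made up for illustration.

```python
import io
from itertools import zip_longest


def read_names(handle):
    """Yield (name, sequence_length) for each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        # Drop the leading '@' and anything after the first whitespace
        fields = header[1:].split()
        name = fields[0] if fields else ""
        # Older headers mark the mate with a trailing /1 or /2
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        yield name, len(seq)


def first_mismatch(r1_handle, r2_handle):
    """Return (record_number, name1, name2) for the first out-of-sync pair, or None."""
    pairs = zip_longest(read_names(r1_handle), read_names(r2_handle),
                        fillvalue=(None, 0))
    for index, ((name1, _), (name2, _)) in enumerate(pairs, start=1):
        if name1 != name2:
            return index, name1, name2
    return None


if __name__ == "__main__":
    # Tiny in-memory example; for real data, open R1.fastq and R2.fastq instead
    r1 = io.StringIO("@a 1:N:0:X\nACGT\n+\nIIII\n@b 1:N:0:X\nAC\n+\nII\n")
    r2 = io.StringIO("@a 2:N:0:X\nACGT\n+\nIIII\n@c 2:N:0:X\nAC\n+\nII\n")
    print(first_mismatch(r1, r2))
```

If this returns None for the trimmed files, the names agree record by record and the problem lies elsewhere; if it returns a record number, multiplying it by 4 gives the approximate line to inspect in each file.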
Here is my trimming command:
clipPairedEnd.pl -m1 read1.fastq -m2 read2.fastq -o1 R1.fastq -o2 R2.fastq -a1 AGATCGGAAGAGCACACGTCTGAACTCCAGTC -a2 TCTAGCCTTCTCGCAGCACATCC -s1 R1.stat -s2 R2.stat
Seeing lines 491-502 might be helpful for a little more context. There's nothing obviously wrong with the files from what you have posted, although that exclamation point was not an original quality score, and the reads were trimmed to different lengths, which is odd.
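To put numbers on both observations, a rough sketch like the one below could count how many records were trimmed down to a single base or less, and how many have a quality string whose length differs from the sequence. It assumes 4-line FASTQ records and Phred+33 quality encoding (under which ! decodes to quality 0); summarize_lengths and phred33 are hypothetical helper names, not part of any tool mentioned here.

```python
import io


def phred33(char):
    """Decode one quality character, assuming Phred+33 encoding."""
    return ord(char) - 33


def summarize_lengths(handle):
    """Count records, very short reads, and sequence/quality length mismatches."""
    total = short = inconsistent = 0
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        total += 1
        if len(seq) <= 1:
            short += 1  # read trimmed to one base or to nothing
        if len(seq) != len(qual):
            inconsistent += 1  # malformed record
    return {"total": total, "short": short, "inconsistent": inconsistent}


if __name__ == "__main__":
    # In-memory example; for real data, open the trimmed R1.fastq instead
    example = io.StringIO("@a\nACGT\n+\nIIII\n@b\nN\n+\n!\n@c\nACG\n+\nII\n")
    print(summarize_lengths(example))
    print(phred33("!"), phred33("#"))
```

Running it on the untrimmed and trimmed versions of the same file shows whether the number of records changed and how many reads were reduced to almost nothing by the trimming step.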
I just realized you asked for lines 491-502; this seems like quite a few lines.
R1.fastq
R2.fastq
Trimmed fastqs
R1.fastq
R2.fastq