Hello,
I'm hoping someone can help me with this one, as I have not yet found a solution anywhere online.
I generated sam files using 'bwa mem' as follows:
bwa mem -M -t 28 mm10bwaidx 1.fastq.gz 2.fastq.gz > output.sam
The data were PE 75bp reads, and as I had only one pair of fastq files per sample I chose not to include any read group (RG).
I expected the QNAME in the sam file to be the Illumina FASTQ sequence header/ID, for example:
K00103:94:H73C2BBXX:7:1103:14194:9737
Rather, what I have is QNAMEs that look like this:
ERR174324.81165065
This seems to be causing me problems as far as detecting and marking optical duplicates using Picard is concerned.
Does anyone know why this is happening and how to redress the issue?
Best Wishes
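To make the problem above concrete: optical duplicate detection needs the tile and x/y coordinates, which the Illumina-style name carries in its last three colon-separated fields and the ERR-style name does not carry at all. Below is a minimal sketch of that check. The regular expression and the function name `tile_xy` are illustrative assumptions for the 7-field name shown above; this is not Picard's own parsing code.

```python
import re

# Assumed layout of the 7-field Illumina name from the post:
# instrument:run:flowcell:lane:tile:x:y
ILLUMINA_QNAME = re.compile(r"^[^:\s]+:\d+:[^:\s]+:\d+:(\d+):(\d+):(\d+)$")


def tile_xy(qname):
    """Return (tile, x, y) as ints, or None if the name has no coordinates."""
    match = ILLUMINA_QNAME.match(qname)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


print(tile_xy("K00103:94:H73C2BBXX:7:1103:14194:9737"))  # (1103, 14194, 9737)
print(tile_xy("ERR174324.81165065"))                     # None
```

Running this over a handful of QNAMEs from output.sam is a quick way to confirm whether the names in a file can support optical duplicate marking at all.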
How and where did you download this data from? SRA or EBI? Using the -F option with fastq-dump would have given you the fastq headers in original Illumina format.
Note: ENA fastq version has these headers
fastq-dump with -F produces
Thank you very much for your response, the data were indeed downloaded from the EBI ENA.
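If re-downloading with fastq-dump -F is not convenient, the headers can sometimes be rewritten in place before alignment. The sketch below rests on an assumption: that each ENA header has the form `@ACCESSION.N ORIGINAL_NAME/1`, with the original instrument name as the second whitespace-separated token. That should be checked against the actual files first. The example header in the comments is hypothetical (built from the two names quoted in this thread), and the function names are made up for illustration. Since bwa keeps only the text up to the first whitespace as QNAME, moving the original name to the front is what puts it into the sam file.

```python
import gzip


def restore_header(header):
    """Rewrite '@ACC.N ORIGINAL[/1|/2]' to '@ORIGINAL'.

    Headers without a second token are returned unchanged.
    """
    if not header.startswith("@"):
        return header
    fields = header[1:].split()
    if len(fields) < 2:
        return header  # nothing to restore (assumed layout not present)
    original = fields[1]
    # Drop the mate suffix so both reads of a pair share one name.
    if original.endswith(("/1", "/2")):
        original = original[:-2]
    return "@" + original


def rewrite_fastq(src, dst):
    """Copy a gzipped FASTQ, rewriting every header line.

    Assumes strict 4-line records (no wrapped sequence lines).
    """
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for i, line in enumerate(fin):
            if i % 4 == 0:
                line = restore_header(line.rstrip("\n")) + "\n"
            fout.write(line)


# Hypothetical header, for illustration only:
# restore_header("@ERR174324.81165065 K00103:94:H73C2BBXX:7:1103:14194:9737/1")
# gives "@K00103:94:H73C2BBXX:7:1103:14194:9737"
```

Both fastq files of a pair would need the same treatment, e.g. `rewrite_fastq("1.fastq.gz", "1.renamed.fastq.gz")`, where the output file name is just a placeholder.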