ERROR: Duplicate entry "..." in sam header; samtools view: failed to add PG line to the header
12 days ago
Sony ▴ 10

Hello everyone,

I tried to map paired-end reads to the reference genome using BWA-MEM and got a SAM file. When I sorted the SAM file with SAMtools, I got this error:

[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 1210764 reads in 479.833 CPU sec, 40.282 real sec
[E::sam_hrecs_update_hashes] Duplicate entry "scf7180000010076" in sam header
samtools view: failed to add PG line to the header

These are the commands that I ran for mapping and sorting:

 bwa mem -t 8 -M -R '@RG\tID:SAMPLE_PE\tPL:ILLUMINA\tSM:ERR3890922' /opt/data/sony/thesis/pangenome_Oi_update_MaSuRCA/3_pan_MaS.fasta /opt/data/sony/thesis/dataset/indica/trimmed_read/ERR3890922_1_paired.fastq.gz /opt/data/sony/thesis/dataset/indica/trimmed_read/ERR3890922_2_paired.fastq.gz > ERR3890922.sam

samtools sort -o ERR3890922_sorted.bam ERR3890922.sam

I am currently using samtools 1.19.2 and BWA 0.7.17.

I don't understand why the SAM header has a "Duplicate entry" and what I should do in this case. Thank you, everyone.

sort. SAMtools. BAM. SAM. • 274 views

I don't understand why the SAM header has a "Duplicate entry" and what I should do in this case.

Your reference must contain FASTA headers that are identical up to the first space character (there may be additional text on the line, but most aligners truncate sequence names at the first space). You can change those spaces to _, which should make all FASTA headers unique.
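A minimal sketch of that renaming step, assuming the headers differ only in the text after the first space. The function name `underscore_fasta_headers` and the output file name in the usage line are made up for illustration.

```shell
# Hypothetical helper: print a FASTA file with spaces in header lines replaced by "_"
underscore_fasta_headers() {
    # only lines starting with ">" are edited; sequence lines pass through unchanged
    sed '/^>/ s/ /_/g' "$1"
}
```

It could be used as `underscore_fasta_headers 3_pan_MaS.fasta > renamed.fasta`. The renamed file needs a fresh `bwa index` before mapping again. If two headers are identical over the whole line, this will not separate them and the names have to be made unique some other way.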

If you do

grep "^>" /opt/data/sony/thesis/pangenome_Oi_update_MaSuRCA/3_pan_MaS.fasta | sort 

you should see any duplicate entries next to each other in the sorted output (compare the names up to the first space in each header).
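To print only the repeated names instead of scanning the sorted list by eye, something like the following sketch may help. The function name `dup_fasta_names` is hypothetical; it compares headers only up to the first whitespace, which is the part that ends up in the SAM header.

```shell
# Hypothetical helper: print each sequence name that occurs more than once in a FASTA file
dup_fasta_names() {
    # take the first whitespace-delimited field of every header line, drop the leading ">",
    # then let "uniq -d" report each name that appears at least twice
    awk '/^>/ {print substr($1, 2)}' "$1" | sort | uniq -d
}
```

For example: `dup_fasta_names /opt/data/sony/thesis/pangenome_Oi_update_MaSuRCA/3_pan_MaS.fasta`. Empty output would mean that no name is repeated.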
