8 months ago
Sony
Hello everyone,
I tried to map paired-end reads to the reference genome using BWA-MEM and got a SAM file. When I sort the SAM file with SAMtools, I get this error:
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 1210764 reads in 479.833 CPU sec, 40.282 real sec
[E::sam_hrecs_update_hashes] Duplicate entry "scf7180000010076" in sam header
samtools view: failed to add PG line to the header
And this is the command I ran for mapping:
bwa mem -t 8 -M -R '@RG\tID:SAMPLE_PE\tPL:ILLUMINA\tSM:ERR3890922' /opt/data/sony/thesis/pangenome_Oi_update_MaSuRCA/3_pan_MaS.fasta /opt/data/sony/thesis/dataset/indica/trimmed_read/ERR3890922_1_paired.fastq.gz /opt/data/sony/thesis/dataset/indica/trimmed_read/ERR3890922_2_paired.fastq.gz > ERR3890922.sam
samtools sort -o ERR3890922_sorted.bam ERR3890922.sam
I am currently using samtools 1.19.2 and BWA 0.7.17.
I don't understand why the SAM header has a "Duplicate entry", or what I should do in this case. Thank you, everyone.
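To confirm which reference names are repeated, one option is to list the duplicated `SN:` values among the `@SQ` lines of the SAM header. This is a minimal sketch assuming a plain-text SAM file like the one written by the `bwa mem` command above; the function name `dup_sq_names` is hypothetical. It reads only the header and stops at the first alignment line, so it stays fast on a large SAM file.

```shell
# Hypothetical helper: print reference names that occur in more than one
# @SQ line of a SAM file's header.
dup_sq_names() {
  awk 'BEGIN { FS = "\t" }
       !/^@/ { exit }                      # header lines start with "@"; stop at the first alignment
       $1 == "@SQ" {
         for (i = 2; i <= NF; i++)
           if ($i ~ /^SN:/) print substr($i, 4)   # drop the "SN:" tag, keep the name
       }' "$1" | sort | uniq -d
}
```

Running `dup_sq_names ERR3890922.sam` should then print every repeated name, which would be expected to include `scf7180000010076` from the error message.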
Your reference FASTA most likely has headers that are identical up to the first space character (a header line can carry additional text after the name, but most aligners truncate sequence names at the first space). You can change those spaces to `_`, and that should make all FASTA headers unique. If you do
you should see any duplicate entries (names compared up to the first space in the header).
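A minimal sketch of both steps, assuming a FASTA file whose header lines start with `>`: the first function lists names that collide once each header is cut at the first whitespace (roughly how BWA names its references), and the second replaces the spaces in header lines with `_` as suggested above. Both function names are hypothetical and not part of any tool.

```shell
# Hypothetical helper: print sequence names (header text up to the first
# whitespace) that occur more than once in a FASTA file.
fasta_dup_names() {
  awk '/^>/ { print substr($1, 2) }' "$1" | sort | uniq -d
}

# Hypothetical helper: replace every space in header lines with "_" and
# print the whole FASTA (sequence lines unchanged) to stdout.
fasta_spaces_to_underscores() {
  awk '/^>/ { gsub(/ /, "_") } { print }' "$1"
}
```

For example, `fasta_spaces_to_underscores 3_pan_MaS.fasta > 3_pan_MaS.renamed.fasta` (output name hypothetical), then check that `fasta_dup_names 3_pan_MaS.renamed.fasta` prints nothing. The renamed reference has to be indexed again with `bwa index` before re-running `bwa mem`. If duplicates remain, those header lines are identical in full and need to be renamed by other means.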