Question

HTseq-count finding single-end alignment after samtools markdup

0

Entering edit mode

2.6 years ago

Cory ▴ 10

I found that one of my samples had many more reads than the others (~113 million vs 30 million in the flagstat output for the BAM files). I ran samtools markdup after sorting the .bam file by name, tagging with fixmate, then sorting by position for markdup. Below is the markdup stats output, showing that I ended up with about 33 million reads in the markdup.bam file after duplicate removal.

COMMAND:

samtools markdup -r -s Gerson-54_paired_pec.positionsort.bam Gerson-54_paired_pec.markdup.bam
READ: 113282791
WRITTEN: 33036290
EXCLUDED: 0
EXAMINED: 113282791
PAIRED: 106910760
SINGLE: 6372031
DUPLICATE PAIR: 74415464
DUPLICATE SINGLE: 5831037
DUPLICATE PAIR OPTICAL: 0
DUPLICATE SINGLE OPTICAL: 0
DUPLICATE NON PRIMARY: 0
DUPLICATE NON PRIMARY OPTICAL: 0
DUPLICATE PRIMARY TOTAL: 80246501
DUPLICATE TOTAL: 80246501
ESTIMATED_LIBRARY_SIZE: 16975904

I then sorted this markdup.bam file by name and tried to run HTseq (htseq-count -f bam -r name --stranded=yes), but I got the following message:

Error occured when processing BAM input (record #6 in file Gerson-54_paired_pec.markdup.sorted.bam): Sequence of paired-end alignments expected, but got single-end alignment.

I see that in the markdup stats that there are singles, but I'm not sure how to deal with these as they appear to be interfering with the HTseq-count. Am I doing something wrong? Is there a way to eliminate the singe-end alignments, or is there a better way to deal with the issue?

Thanks for the help!

samtools rna-seq htseq • 755 views

ADD COMMENT • link updated 17 months ago by Ram 44k • written 2.6 years ago by Cory ▴ 10

score 1 · Answer 1 · 2022-04-14

1

Entering edit mode

2.6 years ago

Istvan Albert 102k

try filtering the BAM file to retain paired alignments only

samtools view -b -F 1 input.bam > output.bam

where

samtools flags 1
0x1 1   PAIRED