remove duplicates with markdup
1
1
Entering edit mode
5.8 years ago
dimitrischat ▴ 210

Hello all. I am trying to do a paired-end analysis. Ran the 2 fastq files with hisat2:

hisat2 -p 10 -x '..index_hg19/indexed' -1 R1_001.fastq.gz -2 R2_001.fastq.gz -S hisat2output.sam

Then :

samtools view -@ 10 -bS hisat2output.sam > hisat2.bam
samtools sort -@ 12 -n hisat2.bam > testSort.bam
samtools fixmate -m -@ 12 testSort.bam testSort_fixmate.bam

then i try to remove duplicates :

samtools markdup -r -@ 12 testSort_fixmate.bam > testSort_fixmate-markdup.bam

but returns problem :

ERROR: queryname sorted, must be sorted by coordinate.

WHEN i don't use -n option in samtools sort , samtools markdup returns problem:

ERROR: Coordinate sorted, require grouped/sorted by queryname.

I cant find where is the problem. Any help ?

RNA-Seq samtools • 10k views
ADD COMMENT
1
Entering edit mode

I formatted your post (again) with code / quote markdown, to improve readability. Please use the formatting buttons in the future.

ADD REPLY
0
Entering edit mode

What does samtools view -H testSort.bam | head -5 show?

ADD REPLY
0
Entering edit mode
@HD VN:1.0  SO:queryname
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@SQ SN:chr16    LN:90354753
@SQ SN:chr17    LN:81195210
@SQ SN:chr18    LN:78077248
@SQ SN:chr19    LN:59128983
@SQ SN:chr20    LN:63025520
@SQ SN:chr21    LN:48129895
@SQ SN:chr22    LN:51304566
@SQ SN:chrM LN:16571
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@PG ID:hisat2   PN:hisat2   VN:2.1.0    CL:"/home/app/Desktop/apps/hisat2-2.1.0/hisat2-align-s --wrapper basic-0 -p 10 -x /home/app/Desktop/jim/indexes/hisat2/index_hg19/indexed -S hisat2output.sam -1 /tmp/32541.inpipe1 -2 /tmp/32541.inpipe2"
ADD REPLY
0
Entering edit mode

Likely not the solution to your problem, but you can make these commands a lot shorter and avoid intermediate files:

hisat2 -p 10 -x '..index_hg19/indexed' -1 R1_001.fastq.gz -2 R2_001.fastq.gz -S hisat2output.sam

Then :

samtools view -@ 10 -bS hisat2output.sam > hisat2.bam

samtools sort -@ 12 -n hisat2.bam > testSort.bam

By using pipes and directly going to samtools sort:

hisat2 -p 10 -x '..index_hg19/indexed' -1 R1_001.fastq.gz -2 R2_001.fastq.gz | samtools sort -n -o testSort.bam

This requires a reasonably recent version of samtools.

ADD REPLY
0
Entering edit mode

yes i do use them like you suggested. but now i am trying to figure out where the problem was. thats why i was running the commands 1 by 1

ADD REPLY
6
Entering edit mode
5.8 years ago
h.mon 35k

edited answer:

You have to name-sort for samtools fixmate, and coordinate-sort for samtools markdup. Here is the correct order of operations, from samtools man file:

# The first sort can be omitted if the file is already name ordered
samtools sort -n -o namesort.bam example.bam

# Add ms and MC tags for markdup to use later
samtools fixmate -m namesort.bam fixmate.bam

# Markdup needs position order
samtools sort -o positionsort.bam fixmate.bam

# Finally mark duplicates
samtools markdup positionsort.bam markdup.bam

original answer

The problem is you are sorting by name (samtools sort -n), and you should sort by coordinate:

samtools sort -@ 12 hisat2.bam > testSort.bam
ADD COMMENT
0
Entering edit mode

when i sort by coordinate it returns this problem :

[bam_mating_core] ERROR: Coordinate sorted, require grouped/sorted by queryname.

ADD REPLY
1
Entering edit mode

I edited my answer in view of this comment. Probably one error stems from samtools fixmate, and the other, from samtools markdup.

ADD REPLY
0
Entering edit mode

that worked. But could you tell me why my other pipeline was wrong? Do i need to do the one you suggested when its paired-end ?

ADD REPLY

Login before adding your answer.

Traffic: 2497 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6