Picard MarkDuplicates and samtools markdup: issues with samtools view and htseq
3.4 years ago
aka

Hi,

I am processing RNA-Seq data: I performed trimming and aligned the reads with BWA, then sorted the resulting BAM by coordinate.
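
For context, the alignment and sorting step looked roughly like this (a sketch with placeholder file names, not my exact command):

        # Sketch of the alignment step (placeholder file names, not my exact call):
        # align the trimmed reads with BWA-MEM and coordinate-sort the output
        bwa mem -t {threads} ref.fa trimmed_R1.fastq.gz trimmed_R2.fastq.gz \
            | samtools sort -@ {threads} -o aligned.sorted.bam -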

The lab told me they had to increase the number of PCR cycles, so I wanted to see how many duplicates there were and remove them if necessary. I used this command:

        "java -Xmx4g -jar $PICARD MarkDuplicates I={input.bam_sort} O={output.final_bam_MARKUP} M={output.metric} ASO=coordinate && samtools flagstat {output.final_bam_MARKUP} > {output.flag} "

Running samtools flagstat on the result gave the output below, and the duplicate count is huge:

174209662 + 0 in total (QC-passed reads + QC-failed reads)
3114348 + 0 secondary
0 + 0 supplementary
59374462 + 0 duplicates
112681507 + 0 mapped (64.68% : N/A)
171095314 + 0 paired in sequencing
85547657 + 0 read1
85547657 + 0 read2
94945264 + 0 properly paired (55.49% : N/A)
100496082 + 0 with itself and mate mapped
9071077 + 0 singletons (5.30% : N/A)
5415164 + 0 with mate mapped to a different chr
4365520 + 0 with mate mapped to a different chr (mapQ>=5)

I then wanted to filter with samtools view to keep only the properly paired reads:

        " samtools index -@ {threads} {input.final_bam_MARKUP} && samtools view -q 20 -f 2 -F 3840 --threads {threads} -b {input.final_bam_MARKUP} -> {output.final_bam}  && samtools flagstat {output.final_bam} > {output.flag2} "

but it fails: no filtering is done, and it prints this error:

[main_samview] region "-" specifies an invalid region or unknown reference. Continue anyway.

I tried samtools markdup with this method, but it gave me the same error as with Picard. I then filtered for the proper pairs first and removed the duplicates directly with samtools markdup; in that order it works. The recipe I followed is sketched below, and the two flagstat outputs follow it.
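
It is essentially the standard pipeline from the samtools documentation (placeholder file names):

        # Standard samtools markdup pipeline (placeholder file names):
        # fixmate needs name-sorted input and adds the ms/MC tags that markdup uses
        samtools sort -n -o namesort.bam input.bam
        samtools fixmate -m namesort.bam fixmate.bam
        samtools sort -o possort.bam fixmate.bam
        samtools markdup possort.bam markdup.bam    # add -r to remove rather than mark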

Flagstat after filtering for proper pairs, with duplicates marked:

91372054 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
48231215 + 0 duplicates
91372054 + 0 mapped (100.00% : N/A)
91071666 + 0 paired in sequencing
45535833 + 0 read1
45535833 + 0 read2
91071666 + 0 properly paired (100.00% : N/A)
91071666 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Flagstat after samtools markdup removed the duplicates from the proper pairs:

43140839 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
43140839 + 0 mapped (100.00% : N/A)
43119758 + 0 paired in sequencing
21559879 + 0 read1
21559879 + 0 read2
43119758 + 0 properly paired (100.00% : N/A)
43119758 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

However, when I try to compute counts with htseq on this filtered BAM, counting fails: it seems that my GFF file does not correspond to my SAM/BAM. Yet when I use htseq on a BAM that never went through Picard, i.e. without duplicate marking, everything works.
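
For reference, the counting step is roughly this (a sketch; the option values are what I would typically use, not necessarily my exact call):

        # Typical htseq-count call on a coordinate-sorted BAM (placeholder file names):
        # -r pos matches the coordinate sorting; -i gene_id assumes the GFF uses that
        # attribute; set -s to the library's actual strandedness
        htseq-count -f bam -r pos -s no -t exon -i gene_id filtered.bam annotation.gff > counts.txt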

I wonder whether it could be due to a non-conformant RG (read group) in my BAM, or whether it is something else. I confess I don't understand.

I hope you can help me.

Thank you in advance,

Have a nice day,

Aka

Markduplicates Picard RNASeq samtools

but it fails: no filtering is done, and it prints this error:

{input.final_bam_MARKUP} -> {output.final_bam} 

I imagine you wanted

{input.final_bam_MARKUP}  > {output.final_bam} 

Oh! I'm sorry, I was sure the error was in the BAM file... Thank you very much, it works!

A naive question, but aren't -> and > the same thing?


> is redirection; -> is nothing in bash. The shell splits it into a literal - argument, which samtools view then interprets as a region (hence the "invalid region" error), followed by the > redirection.
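
In other words (placeholder file names):

        # bash splits "->" into the word "-" plus the redirection "> out.bam",
        # so samtools view receives "-" as a region argument
        samtools view -b in.bam -> out.bam    # error: region "-" is invalid
        samtools view -b in.bam > out.bam     # correct: stdout goes to out.bam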
