Why does my STAR-generated BAM file show zero PCR duplicates, both before and after fixmate/duplicate removal?
2.1 years ago
mohsamir2016 ▴ 30

Dear All,

I wonder if any of you could help me solve this puzzle. I used STAR to align paired-end reads with the following command:

STAR --runMode alignReads \
     --genomeDir IndexRef/GRCg6a/ \
     --outSAMtype BAM SortedByCoordinate \
     --readFilesIn R0629-S0002_L10AU2_A56593_1_HGFCJDSX2_TCGTCTGA-TCAAGGAC_L003_R1_trimmed.fastq R0629-S0002_L10AU2_A56593_1_HGFCJDSX2_TCGTCTGA-TCAAGGAC_L003_R2_trimmed.fastq \
     --outFileNamePrefix mapped/L10/BAM_L10_GRC6a/L10A2 \
     --runThreadN 16
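For reference, STAR can set the duplicate flag itself, but only in a separate post-alignment pass (`--runMode inputAlignmentsFromBAM`), not during an `alignReads` run like the one above. The following is a minimal sketch, assuming a coordinate-sorted paired-end BAM from a previous STAR run; the function name and its arguments are hypothetical. In this mode STAR marks duplicates rather than deleting them and, as far as I know, writes the result to `<prefix>Processed.out.bam`.

```shell
#!/usr/bin/env bash
# Sketch: let STAR mark duplicates in an existing alignment.
# Assumes a coordinate-sorted, paired-end BAM (STAR's duplicate mode
# only supports that case). Function and argument names are hypothetical.

star_mark_duplicates() {
  local sorted_bam="$1"   # e.g. a *Aligned.sortedByCoord.out.bam file
  local out_prefix="$2"   # output expected at ${out_prefix}Processed.out.bam
  # UniqueIdentical: flag all multi-mappers plus unique mappers whose
  # coordinates, FLAG and CIGAR are identical.
  STAR --runMode inputAlignmentsFromBAM \
       --inputBAMfile "$sorted_bam" \
       --bamRemoveDuplicatesType UniqueIdentical \
       --outFileNamePrefix "$out_prefix"
}

# Usage (hypothetical paths):
#   star_mark_duplicates L10A2Aligned.sortedByCoord.out.bam dedup/L10A2_
```

`UniqueIdenticalNotMulti` is the alternative setting if multi-mappers should be left unflagged.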

I did not use the --bamRemoveDuplicatesType option. The resulting BAM file did not contain any duplicates, as revealed by this command:

samtools view -c -f 1024 file.BAM 
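`-f 1024` keeps only records whose FLAG has bit 0x400 (1024, "PCR or optical duplicate") set. A count of 0 therefore only means that no tool has set that bit yet, not that the library is free of duplicated fragments. A small illustrative sketch of the same bit test in bash; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: check whether a SAM FLAG value has the duplicate bit
# (0x400 = 1024) set. This is the bit that `samtools view -f 1024` selects.

has_duplicate_bit() {
  local flag="$1"
  (( (flag & 0x400) != 0 ))
}

# 1107 = 1024 + 64 + 16 + 2 + 1: a properly paired, reverse-strand,
# first-in-pair read that has also been marked as a duplicate.
if has_duplicate_bit 1107; then echo "1107: duplicate"; fi
# 83 = 64 + 16 + 2 + 1: the same read without the duplicate bit.
if ! has_duplicate_bit 83; then echo "83: not a duplicate"; fi
```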

This was surprising, since I expected some duplicate reads. I then thought that they might be hidden in the BAM file and therefore not visible (am I right about this?), so I ran fixmate after sorting by read name, using these commands:

  1. Sort the BAM file by read name: samtools sort -n L10AAligned.sortedByCoord.out.bam -o L10A_sortedbynames.bam
  2. Apply fixmate with -m: samtools fixmate -m L10A_sortedbynames.bam filefixmate.bam
  3. Sort it again by coordinate: samtools sort filefixmate.bam -o filefixmatesortedbycordinates.bam
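These three steps only prepare the file: `fixmate -m` adds mate-score (`ms`) tags and, by default, mate-CIGAR (`MC`) tags, but sets no duplicate flags. In the example workflow of the samtools markdup manual page, the step that actually sets 0x400 is `samtools markdup` on the coordinate-sorted output. A sketch of the full chain, with hypothetical function and file names:

```shell
#!/usr/bin/env bash
# Sketch of the sort -n -> fixmate -m -> sort -> markdup chain.
# fixmate only adds mate tags; samtools markdup is the step that sets
# FLAG bit 0x400. Function and file names are hypothetical.
set -euo pipefail

mark_duplicates() {
  local in_bam="$1" prefix="$2"
  # 1. Group mates together by read name (fixmate needs name-grouped input).
  samtools sort -n -o "${prefix}.namesorted.bam" "$in_bam"
  # 2. Add mate-score (ms) and mate-CIGAR (MC) tags used by markdup.
  samtools fixmate -m "${prefix}.namesorted.bam" "${prefix}.fixmate.bam"
  # 3. Restore coordinate order, which markdup requires.
  samtools sort -o "${prefix}.coordsorted.bam" "${prefix}.fixmate.bam"
  # 4. Mark duplicates (no -r, so nothing is removed).
  samtools markdup "${prefix}.coordsorted.bam" "${prefix}.markdup.bam"
  # 5. The "duplicates" line should now be non-zero if any were found.
  samtools flagstat "${prefix}.markdup.bam"
}

# Usage (hypothetical names):
#   mark_duplicates L10AAligned.sortedByCoord.out.bam L10A
```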

On the resulting file, I checked for duplicates again and the count was still 0. I also double-checked with samtools flagstat, which also reported 0.
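`samtools flagstat` reports the duplicate count on a line of the form `<QC-passed> + <QC-failed> duplicates`. Like `view -f 1024`, it only counts reads that already carry the 0x400 flag, so it agrees with the `view` count on an unmarked file. A sketch for extracting that number, assuming this line format; newer samtools releases also print a separate `primary duplicates` line, which this skips. The function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: pull the duplicate count out of `samtools flagstat` output.
# Assumes lines like "<passed> + <failed> duplicates"; the 5-field
# "<passed> + <failed> primary duplicates" line is ignored.

duplicate_count() {
  # Reads flagstat text on stdin; prints QC-passed + QC-failed duplicates.
  awk '$2 == "+" && $4 == "duplicates" && NF == 4 { print $1 + $3 }'
}

# Usage (hypothetical file name):
#   samtools flagstat filefixmatesortedbycordinates.bam | duplicate_count
```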

How can I make sure that PCR duplicates are removed after running fixmate? Or is it possible that the BAM file produced by STAR contains no duplicates by default?

Any comments?

Thanks

RNA-seq STAR • 2.5k views
2.1 years ago
ATpoint 85k

You have to run a dedicated duplicate-marking tool such as samblaster, Picard MarkDuplicates, or samtools markdup to set that flag in the BAM. STAR does not assign it during alignment, and neither does fixmate, so this result is expected and normal.
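As an alternative to samtools markdup, Picard MarkDuplicates flags duplicates by default (removal is opt-in) and writes a metrics file that includes `PERCENT_DUPLICATION`. A sketch assuming a recent Picard release that accepts the `-I`/`-O`/`-M` argument style, a jar at the hypothetical path `picard.jar`, and coordinate-sorted input; the function name is also hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: Picard MarkDuplicates as an alternative to samtools markdup.
# Assumes a recent Picard (new -I/-O/-M syntax); the jar path defaults to a
# hypothetical ./picard.jar and can be overridden with PICARD_JAR.

picard_mark_duplicates() {
  local in_bam="$1" prefix="$2"
  local jar="${PICARD_JAR:-picard.jar}"
  # Duplicates are flagged (0x400), not removed, unless REMOVE_DUPLICATES
  # is enabled; the metrics file summarises the duplication rate.
  java -jar "$jar" MarkDuplicates \
    -I "$in_bam" \
    -O "${prefix}.picard_markdup.bam" \
    -M "${prefix}.picard_metrics.txt"
}

# Usage (hypothetical names):
#   PICARD_JAR=/opt/picard/picard.jar picard_mark_duplicates sorted.bam L10A
```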


I just got a note that you already asked this here: Confused about % of mapped and unmapped reads output from STAR aligner, and the user there told you exactly the same thing as I did. STAR does not set duplicate flags, so why ask again? What is unclear? Without UMIs you cannot (or should not) really remove duplicates in RNA-seq anyway.


Thanks for the answer. The answer there was not as clear as yours here, which is why I asked again, this time specifically about duplicates. As for removing duplicates, I think almost all of the literature that only calls variants from RNA-seq (rather than quantifying genes/transcripts) includes a duplicate marking and removal step before variant calling.

Thanks


Maybe a last question: I have now run fixmate on a name-sorted BAM file. If I run samtools markdup -r on that file, duplicates will be marked and removed. However, I want to mark them and count them in the fixmate-processed BAM file. Is that possible? This is just to make sure that there were duplicates and to compare the file before and after removal.

Thanks


markdup -r removes duplicates; without -r it does not remove them but only marks them. Once they are marked, flagstat will tell you how many there are.
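To compare counts before and after removal, one option is to mark without `-r`, count the flagged records, and then drop them with `samtools view -F 1024`, which filters on the same bit that `-r` would remove. A sketch assuming the input is the coordinate-sorted, `fixmate -m` processed BAM; function and file names are hypothetical. Note that `samtools view -c` counts alignment records, which for STAR output can include multi-mapper records, not just unique reads.

```shell
#!/usr/bin/env bash
# Sketch: mark duplicates, count them, then drop them and count again.
# Input must be coordinate-sorted and processed with fixmate -m.
# Function and file names are hypothetical.
set -euo pipefail

compare_before_after() {
  local in_bam="$1" prefix="$2"
  local marked="${prefix}.marked.bam" dedup="${prefix}.dedup.bam"

  samtools markdup "$in_bam" "$marked"            # no -r: duplicates only flagged (0x400)
  samtools view -b -F 1024 -o "$dedup" "$marked"  # keep records without the 0x400 bit
  # (roughly what `samtools markdup -r "$in_bam" "$dedup"` would produce in one step)

  echo "records before: $(samtools view -c "$marked")"
  echo "marked duplicates: $(samtools view -c -f 1024 "$marked")"
  echo "records after removal: $(samtools view -c "$dedup")"
}

# Usage (hypothetical names):
#   compare_before_after filefixmatesortedbycordinates.bam L10A
```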


Thanks a lot for the clear and direct answer.
