Question

How to calculate duplicated reads in single-cell RNA 10x Genomics data


16 months ago

archanaverma433 ▴ 10

Hi,

I have single-cell RNA-seq data and the Cell Ranger count output. How can I calculate the number of duplicated reads in each sample?

How do these duplicated reads affect the quality or further analysis?

singlecellRNA 10x-genomics • 1.6k views


GenoMax · Answer 1 · 2023-12-04


16 months ago

rpolicastro 13k

10x reads include a UMI (a random sequence), which is later used to generate the count matrix. Counting unique UMIs instead of reads avoids counting PCR duplicates, because you generally don't expect the same random sequence to appear more than once by chance.
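To make the idea concrete, here is a minimal Python sketch of UMI-based counting. It assumes each read has already been reduced to a (cell barcode, UMI, gene) triple; the function name `count_umis` and the toy barcodes are made up for illustration, and real pipelines such as Cell Ranger also handle sequencing errors in barcodes and UMIs, which this sketch ignores.

```python
from collections import defaultdict

def count_umis(reads):
    """Count reads and unique UMIs per (cell, gene).

    reads: iterable of (cell_barcode, umi, gene) triples (hypothetical input format).
    """
    read_counts = defaultdict(int)
    umis_seen = defaultdict(set)
    for cell, umi, gene in reads:
        read_counts[(cell, gene)] += 1
        # A repeated UMI for the same cell and gene is treated as a PCR duplicate,
        # so it only enters the set once.
        umis_seen[(cell, gene)].add(umi)
    umi_counts = {key: len(umis) for key, umis in umis_seen.items()}
    return dict(read_counts), umi_counts

# Toy data: three reads share one UMI, so they collapse to a single molecule.
reads = [
    ("AAAC", "GGTT", "GeneA"),
    ("AAAC", "GGTT", "GeneA"),  # PCR duplicate
    ("AAAC", "GGTT", "GeneA"),  # PCR duplicate
    ("AAAC", "CCAT", "GeneA"),  # different UMI: a second molecule
    ("TTTG", "GGTT", "GeneA"),  # same UMI but another cell: counted separately
]
read_counts, umi_counts = count_umis(reads)
for key in sorted(umi_counts):
    print(key, "reads:", read_counts[key], "UMIs:", umi_counts[key])
```

The UMI count, not the read count, is what ends up in the count matrix, which is why a high read-level duplication rate does not inflate expression values.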



Thank you for the reply.

In this article they mention a method that uses "samtools flagstat": https://kb.10xgenomics.com/hc/en-us/articles/115003646912

samtools flagstat pbmc_1k_v3_possorted_genome_bam.bam
76920923 + 0 in total (QC-passed reads + QC-failed reads)
10319036 + 0 secondary
0 + 0 supplementary
24785461 + 0 duplicates
73840063 + 0 mapped (95.99% : N/A)

...

In this output there are more duplicate reads than secondary alignments.

I have also run the same command on my sample:

samtools flagstat sample_alignments.bam
31616795 + 0 in total (QC-passed reads + QC-failed reads)
31616795 + 0 primary
0 + 0 secondary
0 + 0 supplementary
19909293 + 0 duplicates
19909293 + 0 primary duplicates
31596859 + 0 mapped (99.94% : N/A)
31596859 + 0 primary mapped (99.94% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

In my BAM you can see that the number of duplicated reads is much higher. Is this the correct way to calculate it, and what does it mean that 19909293 of the 31596859 mapped reads are duplicates?
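For reference, the duplicate fraction can be computed directly from the flagstat text. This is a minimal Python sketch, assuming the usual `N + M description` line layout of `samtools flagstat`; the helper name `parse_flagstat` is made up for illustration, and the input is an excerpt of the output above.

```python
import re

def parse_flagstat(text):
    """Parse samtools flagstat text into {description: (qc_passed, qc_failed)}."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+) \+ (\d+) (.+)", line)
        if not match:
            continue
        passed, failed = int(match.group(1)), int(match.group(2))
        # Drop a trailing parenthetical such as "(99.94% : N/A)".
        key = re.sub(r"\s*\(.*\)\s*$", "", match.group(3)).strip()
        # Keep the first occurrence if two lines reduce to the same key.
        counts.setdefault(key, (passed, failed))
    return counts

flagstat_text = """31616795 + 0 in total (QC-passed reads + QC-failed reads)
31616795 + 0 primary
19909293 + 0 duplicates
31596859 + 0 mapped (99.94% : N/A)
"""

counts = parse_flagstat(flagstat_text)
total = counts["in total"][0]
duplicates = counts["duplicates"][0]
dup_fraction = duplicates / total
print(f"{duplicates} of {total} reads are flagged as duplicates ({dup_fraction:.1%})")
```

For this sample that works out to roughly 63% of reads carrying the duplicate flag. The number only reflects how many reads have that flag set in the BAM; it says nothing by itself about whether the UMI counts are affected.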

Does this affect my downstream analysis?

Thank you!!!

written 16 months ago by archanaverma433 ▴ 10 • updated 16 months ago by GenoMax 150k


Honestly, don't do these sorts of analyses. Single-cell data have tremendous duplication, and this is expected. That's why one uses UMIs, and that is all that you can do about it. Continue with downstream analysis; there is most likely no novel and interesting biology you are going to generate from counting duplicate reads.

written 16 months ago by ATpoint 87k