How to calculate duplicate reads in 10x Genomics single-cell RNA-seq data
11 months ago

Hi,

I have single-cell RNA-seq data and the Cell Ranger count output. How can I calculate the number of duplicate reads in each sample?

How do these duplicate reads affect data quality or downstream analysis?

singlecellRNA 10x-genomics
11 months ago

10x reads include a UMI (a short random sequence), which is later used to generate the count matrix. Counting unique UMIs instead of reads avoids counting PCR duplicates, because you don't expect the same random sequence to appear more than once in the same cell and gene by chance (generally speaking).
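The following minimal Python sketch makes the UMI argument concrete using made-up reads. It assumes each confidently mapped read has already been reduced to a (cell barcode, UMI, gene) tuple. In a Cell Ranger BAM these values would come from tags such as CB, UB and GX, but the sketch does not extract them. Reads that share all three values collapse to one molecule, so duplicates do not inflate the counts. The `saturation` value takes the same form as 10x's sequencing-saturation metric (1 - unique / total reads). Unlike Cell Ranger, the sketch does no UMI or barcode error correction.

```python
from collections import Counter

# Hypothetical toy reads: one (cell_barcode, umi, gene) tuple per confidently mapped read.
reads = [
    ("AAACCTGA", "TTGCAGTCAA", "CD3E"),
    ("AAACCTGA", "TTGCAGTCAA", "CD3E"),   # PCR duplicate of the read above
    ("AAACCTGA", "GGATCCTAGC", "CD3E"),   # different UMI, so a different molecule
    ("AAACCTGA", "TTGCAGTCAA", "MS4A1"),  # same UMI, different gene
    ("CCGTTAGA", "TTGCAGTCAA", "CD3E"),   # same UMI and gene, different cell
    ("CCGTTAGA", "ACGTACGTAC", "LYZ"),
    ("CCGTTAGA", "ACGTACGTAC", "LYZ"),    # duplicate
    ("CCGTTAGA", "ACGTACGTAC", "LYZ"),    # duplicate
]


def count_matrix(reads):
    """Count unique UMIs per (cell, gene); duplicate reads collapse to one molecule."""
    molecules = set(reads)
    return Counter((cell, gene) for cell, _umi, gene in molecules)


def duplication_summary(reads):
    """Compare raw read counts with deduplicated molecule counts."""
    n_reads = len(reads)
    n_unique = len(set(reads))
    return {
        "reads": n_reads,
        "unique_molecules": n_unique,
        "duplicate_reads": n_reads - n_unique,
        "saturation": 1 - n_unique / n_reads,  # same form as 10x's saturation formula
    }


print(duplication_summary(reads))
print(count_matrix(reads))
```

Eight reads reduce to five molecules here. Duplication changes how deeply each molecule was sampled, not the UMI counts themselves.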

Thank you for the reply.

This article describes a method that uses "samtools flagstat": https://kb.10xgenomics.com/hc/en-us/articles/115003646912

samtools flagstat pbmc_1k_v3_possorted_genome_bam.bam
76920923 + 0 in total (QC-passed reads + QC-failed reads)
10319036 + 0 secondary
0 + 0 supplementary
24785461 + 0 duplicates
73840063 + 0 mapped (95.99% : N/A)

...

Here the number of duplicates is larger than the number of secondary alignments.

I ran the same command on my sample:

samtools flagstat sample_alignments.bam
31616795 + 0 in total (QC-passed reads + QC-failed reads)
31616795 + 0 primary
0 + 0 secondary
0 + 0 supplementary
19909293 + 0 duplicates
19909293 + 0 primary duplicates
31596859 + 0 mapped (99.94% : N/A)
31596859 + 0 primary mapped (99.94% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

In my BAM the duplicate count is much higher. Is this the correct way to calculate it, and what does it mean that 19909293 of the 31596859 mapped reads are duplicates?

Does this affect my downstream analysis?

Thank you!!!
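Below is a sketch of a small parser for the default text output of samtools flagstat. It pulls out the QC-passed "in total", "mapped" and "duplicates" counts and reports duplicates as a fraction of each. The function names are hypothetical. flagstat counts alignment records rather than reads, so when secondary alignments are present (as in the PBMC example) both denominators include them.

```python
import re

# Matches default flagstat lines such as "19909293 + 0 duplicates".
LINE = re.compile(r"^\s*(\d+) \+ (\d+) (.*)$")


def flagstat_counts(text):
    """Map each flagstat description to its QC-passed count (QC-failed is ignored)."""
    counts = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            counts[m.group(3)] = int(m.group(1))
    return counts


def lookup(counts, name):
    """Find e.g. 'mapped' in 'mapped (99.94% : N/A)' without matching 'primary mapped'."""
    for desc, n in counts.items():
        if desc == name or desc.startswith(name + " ("):
            return n
    raise KeyError(name)


def duplicate_fractions(text):
    """Duplicate-flagged records as a fraction of all records and of mapped records."""
    counts = flagstat_counts(text)
    dups = lookup(counts, "duplicates")
    return {
        "of_total": dups / lookup(counts, "in total"),
        "of_mapped": dups / lookup(counts, "mapped"),
    }


# The first lines of the flagstat output from the question.
sample = """\
31616795 + 0 in total (QC-passed reads + QC-failed reads)
31616795 + 0 primary
0 + 0 secondary
0 + 0 supplementary
19909293 + 0 duplicates
19909293 + 0 primary duplicates
31596859 + 0 mapped (99.94% : N/A)
31596859 + 0 primary mapped (99.94% : N/A)
"""

fractions = duplicate_fractions(sample)
print(f"{fractions['of_mapped']:.1%} of mapped records are flagged as duplicates")
```

For the numbers in the question, 19909293 / 31596859 is about 0.630, so roughly 63% of mapped records carry the duplicate flag. If the flag was set in a UMI-aware way (same barcode, UMI and gene), this is approximately the share of reads that add no new molecule to the count matrix.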

Honestly, don't do this sort of analysis. Single-cell data have tremendous duplication, and this is expected. That's why UMIs are used, and collapsing reads by UMI is all you can do about it. Continue with the downstream analysis; you are most likely not going to find novel or interesting biology by counting duplicate reads.
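If a single per-sample number is still wanted, cellranger count already reports one. The Sequencing Saturation metric appears in web_summary.html and outs/metrics_summary.csv. 10x defines it as 1 - (deduplicated reads / total reads), computed over confidently mapped reads with valid barcodes and UMIs. A minimal sketch for reading it follows. The column name matches what Cell Ranger writes, but treat the exact file layout (one header row, one value row, percentages stored as strings) as an assumption to check against your own output.

```python
import csv
import io


def sequencing_saturation(csv_text):
    """Return 'Sequencing Saturation' from a metrics_summary.csv as a fraction.

    Assumes a single-sample file: one header row and one value row, with
    percentages written as strings such as "70.1%".
    """
    row = next(csv.DictReader(io.StringIO(csv_text)))
    return float(row["Sequencing Saturation"].strip().rstrip("%")) / 100


# Hypothetical, truncated example of the file layout; the values are made up.
example = (
    '"Estimated Number of Cells","Number of Reads","Sequencing Saturation"\n'
    '"1,000","50,000,000","70.1%"\n'
)
print(sequencing_saturation(example))
```

On real output, pass in the contents of the file, for example `open("outs/metrics_summary.csv").read()`. The `outs/` location is where cellranger count normally writes this file, but check the path for your own run.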
