My paired end data became single end data after mapping
11 months ago
jude • 0

Dear community,

Something weird happened to me. My public dataset is clearly paired-end: this is stated in the 'metadata' section of the ENA database, and each sequencing run has two separate FASTQ files (R1 and R2) plus an index file (I1). After mapping the reads to the reference genome with cellranger count, I performed a typical scRNA-seq downstream analysis and applied stringtie and featureCounts to compare miRNA expression across cell types. The problem is that, while trying to identify the strandedness of my data, I ran infer_experiment.py, which reported:

infer_experiment.py -r hg38_GENCODE_V42_Basic.bed -i my_bam_file.bam

This is SingleEnd Data

Fraction of reads failed to determine: 0.0670

Fraction of reads explained by "++,--": 0.8406

Fraction of reads explained by "+-,-+": 0.0924

So I double-checked whether this was real with

samtools view -c -f 1 my_bam_file.bam

which yielded 0 while

samtools view -c -F 1 my_bam_file.bam

yielded 97581274, which made me think that the aligned BAM files (all of the BAM files generated during downstream analysis) are actually single-end data. The problem might have arisen from cellranger count, but there were no errors during mapping and no warnings in the summary.html output file (and I made sure to include all the R1 and R2 FASTQ files as input). I really can't understand why this is happening. Any help will be appreciated.

Best,

cellranger stringtie • 833 views
11 months ago
ATpoint 85k

Normal and expected. In 10x scRNA-seq, R1 carries the cellular barcode (CB) and unique molecular identifier (UMI), and R2 carries the gene expression (cDNA) sequence, so technically (from a gene expression standpoint) it is indeed single-end.
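
A quick way to see this in the raw files is to tabulate read lengths for R1 and R2. The sketch below assumes standard four-line FASTQ records; `fastq_length_table` and the file names are made up for illustration. For 10x 3' libraries R1 is commonly 26 or 28 bases (a 16-base barcode plus a 10- or 12-base UMI, depending on chemistry), while R2 is much longer, though some submitters sequence or upload R1 at a different length.

```shell
# Hypothetical helper: tabulate read lengths from FASTQ text on stdin.
# Assumes four lines per record; prints "<length> <count>" sorted by length.
fastq_length_table() {
  awk 'NR % 4 == 2 { n[length($0)]++ }
       END { for (len in n) print len, n[len] }' | sort -n
}

# Usage (file names are placeholders for your own run):
#   gzip -dc sample_S1_L001_R1_001.fastq.gz | head -n 400000 | fastq_length_table
#   gzip -dc sample_S1_L001_R2_001.fastq.gz | head -n 400000 | fastq_length_table
```

If R1 comes out as a single short length and R2 as the long read, only R2 can contribute cDNA sequence to the alignment.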

So you mean data generated by 10x scRNA-seq is paired-end in terms of sequencing but effectively single-end for gene expression... That confused me at first, but it makes sense. Thank you.

Yes. Why is it confusing? If you look at 10x libraries (below) you see that the left-hand side of each fragment contains the CB and UMI and the right-hand side contains the cDNA. Hence, R1, which "comes from the left", picks up the CB/UMI, and R2, "from the right", picks up the cDNA. So technically it's paired-end because you use two reads on the same fragment, but there is only one read (R2) for the gene expression, so it's single-end in that regard, and the aligner in the end only uses R2.

[Image: 10x library structure, with the CB and UMI on the left-hand side of the fragment and the cDNA on the right-hand side]
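
The same thing can be checked in the BAM itself. As a rough sketch (the helper name `sam_pairing_summary` is hypothetical, and it assumes Cell Ranger's usual `CB:Z:` tag for the corrected cell barcode), the function below counts how many records have the paired flag (0x1) set and how many carry a barcode tag. In a Cell Ranger BAM you would expect no paired records, with the barcode from R1 attached to the R2 alignment as a tag; reads without a valid barcode may have no `CB` tag.

```shell
# Hypothetical helper: summarise SAM records read from stdin.
# Counts all records, those with the "paired" flag bit (0x1) set,
# and those carrying a CB:Z: tag (assumed Cell Ranger barcode tag).
sam_pairing_summary() {
  awk -F '\t' '
    /^@/ { next }                       # skip header lines if present
    {
      total++
      if (int($2) % 2 == 1) paired++    # odd FLAG means bit 0x1 is set
      for (i = 12; i <= NF; i++)        # optional tags start at column 12
        if ($i ~ /^CB:Z:/) { with_cb++; break }
    }
    END { printf "total=%d paired=%d with_CB=%d\n", total, paired, with_cb }'
}

# Usage (assumes samtools is installed; the BAM name is a placeholder):
#   samtools view my_bam_file.bam | head -n 100000 | sam_pairing_summary
```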
