Very low RNA splicing rate for pulmonary AT2 cells
9 months ago
e.r.zakiev ▴ 230

I observe a very low mRNA splicing rate in AT2 cells (~6%).


I followed the very nice tutorial on RNA velocity analysis with scVelo by Sam Morabito, and up to this point it went smoothly.
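For readers new to this: the "splicing rate" here is the share of spliced counts among spliced, unspliced and ambiguous counts. As I understand it, scVelo's `scv.utils.show_proportions` computes each layer's fraction per cell and then averages those fractions over cells. The sketch below re-implements that idea in plain NumPy under that assumption. `layer_proportions` is a hypothetical helper, not an scVelo function, and unlike scVelo it skips cells with zero counts.

```python
import numpy as np


def layer_proportions(spliced, unspliced, ambiguous):
    """Mean per-cell fraction of spliced, unspliced and ambiguous molecules.

    Inputs are cells x genes count matrices (NumPy arrays or SciPy sparse
    matrices). Illustrative sketch of the idea, not scVelo's own code.
    """
    # Total molecules per cell in each layer; np.asarray(...).ravel() turns the
    # (n_cells, 1) matrix returned by SciPy sparse .sum(axis=1) into a 1-D array.
    per_cell = np.stack(
        [np.asarray(m.sum(axis=1)).ravel() for m in (spliced, unspliced, ambiguous)]
    )  # shape: (3, n_cells)
    totals = per_cell.sum(axis=0)
    has_counts = totals > 0  # skip empty cells to avoid dividing by zero
    fractions = per_cell[:, has_counts] / totals[has_counts]
    # Average each layer's per-cell fraction across the non-empty cells
    return dict(zip(("spliced", "unspliced", "ambiguous"), fractions.mean(axis=1)))


# Toy data: 3 cells x 2 genes; the third cell is empty and gets skipped
s = np.array([[6, 0], [1, 1], [0, 0]])
u = np.array([[2, 0], [6, 0], [0, 0]])
a = np.zeros((3, 2), dtype=int)
props = layer_proportions(s, u, a)
print({k: round(float(v), 3) for k, v in props.items()})
# prints {'spliced': 0.5, 'unspliced': 0.5, 'ambiguous': 0.0}
```

On real data you could pass `ldata.layers['spliced']`, `ldata.layers['unspliced']` and `ldata.layers['ambiguous']`; `.sum(axis=1)` works on both NumPy arrays and SciPy sparse matrices, so the layers never need to be densified.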

A few important notes:

  • This is the first time I'm trying RNA velocity, so I might have dun goofed
  • Multiple papers in reputable journals report RNA velocity in pulmonary tissues, but they never mention the splicing rate they observed
  • I didn't use a masked gtf, because I already had the CellRanger counts and the bam files aligned to the Ensembl transcriptome/genome, and re-aligning everything to the UCSC genome (which does provide a masked genome) is not feasible

But where do I even start digging?

Maybe it's because my loom files were generated from the possorted_genome_bam.bam files, which contain all the cells, whereas I do a lot of dead-cell filtering when preparing the sparse count matrix and the UMAP embedding from the classic scRNA-seq counts in Seurat?

RNA-velocity scVelo scRNA-seq splicing

Can you add some real numbers? Just saying 6% gives no indication of how many genes you were able to detect overall, how many reads aligned, etc.


Hello GenoMax and thank you for your engagement! Where should I look this up?

adata

# AnnData object with n_obs × n_vars = 27639 × 19
#    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'cNMF_signature_k15_1', 'cNMF_signature_k15_2', 'cNMF_signature_k15_3', 'cNMF_signature_k15_4', 'cNMF_signature_k15_5', 'cNMF_signature_k15_6', 'cNMF_signature_k15_7', 'cNMF_signature_k15_8', 'cNMF_signature_k15_9', 'cNMF_signature_k15_10', 'cNMF_signature_k15_11', 'cNMF_signature_k15_12', 'cNMF_signature_k15_13', 'cNMF_signature_k15_14', 'cNMF_signature_k15_15', 'cNMF_signature_k8_1', 'cNMF_signature_k8_2', 'cNMF_signature_k8_3', 'cNMF_signature_k8_4', 'cNMF_signature_k8_5', 'cNMF_signature_k8_6', 'cNMF_signature_k8_7', 'cNMF_signature_k8_8', 'Schiebinger_MEF.identity', 'Schiebinger_Pluripotency', 'Schiebinger_Proliferation', 'Schiebinger_ER.stress', 'Schiebinger_Epithelial.identity', 'Schiebinger_ECM.rearrangement', 'Schiebinger_Apoptosis', 'Schiebinger_Senescence', 'Schiebinger_Neural.identity', 'Schiebinger_Trophoblast.identity', 'Schiebinger_X.reactivation', 'Schiebinger_XEN', 'Schiebinger_Trophoblast.progenitors', 'Schiebinger_Spiral.Artery.Trophpblast.Giant.Cells', 'Schiebinger_Spongiotrophoblasts', 'Schiebinger_Oligodendrocyte.precursor.cells.(OPC)', 'Schiebinger_Astrocytes', 'Schiebinger_Cortical.Neurons', 'Schiebinger_RadialGlia-Id3', 'Schiebinger_RadialGlia-Gdf10', 'Schiebinger_RadialGlia-Neurog2', 'Schiebinger_Long-term.MEFs', 'Schiebinger_Embryonic.mesenchyme', 'Schiebinger_Cxcl12.co-expressed', 'Schiebinger_Ifitm1.co-expressed', 'Schiebinger_Matn4.co-expressed', 'Schiebinger_2c', 'PanglaoDB_Airwayepithelialcells_Endoderm', 'PanglaoDB_Airwaygobletcells_Mesoderm', 'PanglaoDB_Alveolarmacrophages_Mesoderm', 'PanglaoDB_Ciliatedcells_Endoderm', 'PanglaoDB_Claracells_Endoderm', 'PanglaoDB_Ionocytes_Mesoderm', 'PanglaoDB_PulmonaryalveolartypeIcells_Endoderm', 'PanglaoDB_PulmonaryalveolartypeIIcells_Endoderm', 'CellsPositiveFor_cNMFsignature_k8_1', 'CellsPositiveFor_cNMFsignature_k8_2', 'CellsPositiveFor_cNMFsignature_k8_3', 'CellsPositiveFor_cNMFsignature_k8_4', 'CellsPositiveFor_cNMFsignature_k8_5', 'CellsPositiveFor_cNMFsignature_k8_6', 'CellsPositiveFor_cNMFsignature_k8_7', 'CellsPositiveFor_cNMFsignature_k8_8', 'CellsPositiveFor_cNMFsignature_k15_1', 'CellsPositiveFor_cNMFsignature_k15_2', 'CellsPositiveFor_cNMFsignature_k15_3', 'CellsPositiveFor_cNMFsignature_k15_4', 'CellsPositiveFor_cNMFsignature_k15_5', 'CellsPositiveFor_cNMFsignature_k15_6', 'CellsPositiveFor_cNMFsignature_k15_7', 'CellsPositiveFor_cNMFsignature_k15_8', 'CellsPositiveFor_cNMFsignature_k15_9', 'CellsPositiveFor_cNMFsignature_k15_10', 'CellsPositiveFor_cNMFsignature_k15_11', 'CellsPositiveFor_cNMFsignature_k15_12', 'CellsPositiveFor_cNMFsignature_k15_13', 'CellsPositiveFor_cNMFsignature_k15_14', 'CellsPositiveFor_cNMFsignature_k15_15', 'Clusterk8_maxscore', 'Clusterk15_maxscore', 'barcode', 'UMAP_1', 'UMAP_2', 'DIFFMAP_1', 'DIFFMAP_2', 'sample_batch', 'batch', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts'
#    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
#    uns: 'orig.ident_colors', 'log1p', 'neighbors'
#    obsm: 'X_pca', 'X_umap'
#    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', 'Ms', 'Mu'
#    obsp: 'distances', 'connectivities'
adata.layers['spliced']

# <27639x19 sparse matrix of type '<class 'numpy.float32'>'
#   with 161 stored elements in Compressed Sparse Row format>
adata.layers['unspliced']

# <27639x19 sparse matrix of type '<class 'numpy.float32'>'
#   with 11156 stored elements in Compressed Sparse Row format>

Here is the concatenated .loom data object that was spewed out by the velocyto run10x command:

ldata

# AnnData object with n_obs × n_vars = 59095 × 36601
#    obs: 'batch', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'sample_batch'
#    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
#    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'
ldata.layers['spliced']

# <59095x36601 sparse matrix of type '<class 'numpy.uint16'>'
#   with 4198501 stored elements in Compressed Sparse Row format>
ldata.layers['unspliced']

# <59095x36601 sparse matrix of type '<class 'numpy.uint16'>'
#   with 58874869 stored elements in Compressed Sparse Row format>
ldata.layers['matrix']

# <59095x36601 sparse matrix of type '<class 'numpy.float32'>'
#   with 62767930 stored elements in Compressed Sparse Row format>
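A quick sanity check that needs no extra tooling: the "stored elements" figures above count nonzero matrix entries, not molecules, so they are only a rough proxy, but they already show how few entries land in the spliced layer, both in the raw loom and in the filtered object (which keeps only 19 genes). `spliced_share` is a hypothetical helper for this arithmetic.

```python
def spliced_share(spliced_nnz, unspliced_nnz):
    """Share of spliced entries among spliced + unspliced stored (nonzero) entries."""
    return spliced_nnz / (spliced_nnz + unspliced_nnz)


# "stored elements" reported above for the spliced and unspliced layers
print(round(spliced_share(4198501, 58874869), 3))  # raw loom (ldata): prints 0.067
print(round(spliced_share(161, 11156), 3))  # filtered adata (19 genes): prints 0.014
```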

Someone knowledgeable about this should be along to help. I asked you to add that info in anticipation since they would want to know some numbers instead of just a % value.


Is it possible that this is snRNA-seq data?


Hmm, it shouldn't be. It's supposed to be a classic 10x Single Cell 3' v3 scRNA-seq assay, with several samples multiplexed in one plate. Encapsulation was performed for each sample separately.

8 months ago
e.r.zakiev ▴ 230

Hello people, sorry, I am dumb: the problem lay in the fact that I had mouse cells but my .gtf transcriptome file was for human. With the proper reference I now get a ~80% splicing rate, as expected.
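A check that could catch this early: velocyto stores the gene IDs from the gtf in `var['Accession']`, and, assuming the reference uses Ensembl gene IDs (as the 10x references do), the ID prefix reveals the annotation's species: `ENSG` for human, `ENSMUSG` for mouse. The gene count is another hint: the loom above has 36601 genes, which I believe matches 10x's human GRCh38-2020-A reference. `species_of_accessions` is a hypothetical helper.

```python
from collections import Counter

# Ensembl gene-ID prefixes: ENSG = human, ENSMUSG = mouse (other species differ).
PREFIXES = {"ENSMUSG": "mouse", "ENSG": "human"}


def species_of_accessions(accessions):
    """Count gene IDs per species based on their Ensembl prefix."""
    counts = Counter()
    for acc in accessions:
        species = next(
            (name for prefix, name in PREFIXES.items() if acc.startswith(prefix)),
            "other",
        )
        counts[species] += 1
    return counts


# Illustrative IDs only; with real data pass e.g. ldata.var["Accession"]
ids = ["ENSG00000000001", "ENSG00000000002", "ENSMUSG00000000001", "gene_x"]
print(species_of_accessions(ids))
```

Since these IDs come from the gtf passed to velocyto, the result tells you which species the annotation is for, which you can then compare against the organism you actually sequenced.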


For anyone who might find it useful: a colleague of mine, who was not dumb enough to use the wrong annotation gtf as I did, ran into another important issue: mismatched chromosome names between the gtf and the bam files. It is accompanied by this velocyto warning:

WARNING - The .bam file refers to a chromosome ‘M+' not present in the annotation (.gtf) file

The gtf contained "chrM" for the mitochondrial chromosome, while in the bam files it was denoted "chrMT". After using sed to rename "chrM" to "chrMT" in the reference gtf file, his splicing rate improved drastically (from ~15% to ~80%).
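To catch this kind of naming mismatch before running velocyto, one can compare the reference names in the bam header with the sequence names (first column) of the gtf, and rename only that column. The sketch below is illustrative: the helper names are hypothetical and the toy lists stand in for real files. With real files, the bam names could come from, e.g., `pysam.AlignmentFile(path, "rb").references` or `samtools view -H`.

```python
def gtf_seqnames(gtf_lines):
    """Set of sequence names (first column) used in a GTF, skipping comments."""
    return {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }


def missing_contigs(bam_contigs, gtf_lines):
    """BAM reference names that never appear in the GTF annotation."""
    return sorted(set(bam_contigs) - gtf_seqnames(gtf_lines))


def rename_seqname(gtf_lines, old, new):
    """Yield GTF lines with column 1 renamed from `old` to `new` (exact match only)."""
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        if fields[0] == old:
            fields[0] = new
        yield "\t".join(fields)


# Toy data standing in for a real gtf and bam header
gtf = [
    "#!genome-build toy",
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";',
    'chrM\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "chrM_g2";',
]
bam = ["chr1", "chrMT"]
print(missing_contigs(bam, gtf))  # prints ['chrMT']
fixed = list(rename_seqname(gtf, "chrM", "chrMT"))
print(missing_contigs(bam, fixed))  # prints []
```

Matching the first column exactly, instead of a blanket `sed 's/chrM/chrMT/g'`, leaves "chrM" strings inside the attribute column untouched, and rerunning it is harmless, whereas a blanket substitution applied twice (or to a file that already contains "chrMT") would produce "chrMTT".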

