Hi, I'm new to scRNA-seq analysis and have a few questions about processing an scRNA-seq BAM file. While looking for paired reads in my scRNA-seq data, I ran samtools flagstat to check the BAM file's statistics. Here is the command and its output.
samtools flagstat ./02_cellranger/count_luc_bsd/Vehicle/outs/possorted_genome_bam.bam
Output:
1013_20240619/02_cellranger/count_luc_bsd/Vehicle/outs/possorted_genome_bam.bam
359526689 + 0 in total (QC-passed reads + QC-failed reads)
359526689 + 0 primary
0 + 0 secondary
0 + 0 supplementary
93469059 + 0 duplicates
93469059 + 0 primary duplicates
339612936 + 0 mapped (94.46% : N/A)
339612936 + 0 primary mapped (94.46% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
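For anyone who wants to check the numbers in this report programmatically, here is a minimal Python sketch that parses a few of the lines above and recomputes the mapped and duplicate fractions. The helper name `qc_passed_count` and the `FLAGSTAT_TEXT` constant are made up for illustration, and it assumes the "N + M label" line layout shown in the output above.

```python
import re

# A few lines copied verbatim from the flagstat output above.
FLAGSTAT_TEXT = """\
359526689 + 0 in total (QC-passed reads + QC-failed reads)
93469059 + 0 duplicates
339612936 + 0 mapped (94.46% : N/A)
0 + 0 paired in sequencing
"""

# Each flagstat line has the shape "<QC-passed> + <QC-failed> <label>".
LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def qc_passed_count(text, label_prefix):
    """Return the QC-passed count of the first line whose label starts with label_prefix.

    Hypothetical helper for illustration; not part of samtools.
    """
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match and match.group(3).startswith(label_prefix):
            return int(match.group(1))
    raise KeyError(label_prefix)


total = qc_passed_count(FLAGSTAT_TEXT, "in total")
mapped = qc_passed_count(FLAGSTAT_TEXT, "mapped")
duplicates = qc_passed_count(FLAGSTAT_TEXT, "duplicates")
paired = qc_passed_count(FLAGSTAT_TEXT, "paired in sequencing")

print(f"mapped fraction:    {mapped / total:.2%}")
print(f"duplicate fraction: {duplicates / total:.2%}")
print(f"paired in sequencing: {paired}")
```

The recomputed mapped fraction should agree with the 94.46% that flagstat itself prints, which is a quick sanity check that the report was read correctly.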
The output says there are zero reads paired in sequencing and no read1 or read2, which is weird because the scRNA-seq was done with 150-bp paired-end sequencing and I have both R1 and R2 FASTQs. Also, when I ran flagstat on a BAM file from bulk RNA-seq, the output said '142669467 + 0 paired in sequencing', which means there are paired reads in that file. I wouldn't expect the difference in methods between bulk and single-cell RNA-seq to change the flagstat output, but I'm not sure why I got this result.
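As a point of reference, flagstat's pairing lines come entirely from the FLAG field of each BAM record, not from how many FASTQ files went into the aligner. Under the SAM specification, a record counts as "paired in sequencing" only if bit 0x1 is set, and as read1 or read2 only if 0x40 or 0x80 is set as well. The Python sketch below mimics that counting in simplified form (it ignores flagstat's handling of secondary and supplementary records). The function name `pairing_summary` and the example FLAG values are illustrative and are not taken from this BAM.

```python
# SAM FLAG bits relevant to flagstat's pairing lines (values from the SAM specification).
PAIRED = 0x1   # template has multiple segments in sequencing
READ1 = 0x40   # first segment in the template
READ2 = 0x80   # last segment in the template


def pairing_summary(flags):
    """Count paired/read1/read2 records from an iterable of integer SAM FLAGs.

    Simplified illustration of how the pairing lines are derived;
    hypothetical helper, not samtools code.
    """
    counts = {"paired": 0, "read1": 0, "read2": 0}
    for flag in flags:
        if not flag & PAIRED:
            # Records written as single-end never reach any pairing counter.
            continue
        counts["paired"] += 1
        if flag & READ1:
            counts["read1"] += 1
        if flag & READ2:
            counts["read2"] += 1
    return counts


# Made-up single-end style flags: forward, reverse, duplicate, unmapped.
single_end_flags = [0, 16, 1024, 4]
# Made-up paired-end style flags: a properly paired read1/read2 couple.
paired_end_flags = [99, 147]

print(pairing_summary(single_end_flags))  # {'paired': 0, 'read1': 0, 'read2': 0}
print(pairing_summary(paired_end_flags))  # {'paired': 2, 'read1': 1, 'read2': 1}
```

So a report with zeros on every pairing line, like the one above, means the records in the BAM do not carry the 0x1 bit, whatever the layout of the input FASTQs was.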
The company that performed the scRNA-seq ran cellranger mkfastq, and I started processing and alignment from cellranger count. Here is the command I used to produce the position-sorted BAM file that I then ran flagstat on. I have already been doing downstream analysis with Seurat and have had no problems so far.
cellranger count --sample V-1_F4 --id Vehicle --fastqs /mnt/bigHDD/kjh_mouse_scrna_240620/01_rawreads/V-1/ --transcriptome /mnt/bigHDD/cellranger_customized_reference/
I would really appreciate any help or advice :)
Also, to add a question for the OP: why did you do 2x150? 10X and other single-cell technologies have very clear sequencing guidelines, and none of the ones I'm aware of require 2x150 bp. You can save a ton of money and time by using the 100-cycle kits.
People don't always have control over the runs their samples go on. If you have to share a run with other paired-end samples that need 2x150, then that's what you get. Cellranger doesn't care.