fragments file generation via Sinto from CellRanger output
13 months ago
ntsopoul ▴ 60

Hi,

I am following the instructions for the PASTA package (https://satijalab.org/seurat/articles/pasta_vignette.html), which infers alternative polyadenylation usage from scRNA-seq data. Among its many input files, it requires a fragment file. The authors state the following about how to obtain this file:

A fragment file (produced from the aligned BAM file, we recommend using the blocks function in sinto).

So I successfully installed Sinto, but when I followed the instructions on the Sinto website I got a 0-byte file. As I understand it, the input should be the possorted_genome_bam.bam file from the Cell Ranger output.

This is my command:

sinto fragments -b possorted_genome_bam.bam  -f my.bed --barcode_regex "[^:]*" -p 4

and this is the output:

Function run_fragments called with the following arguments:

bam possorted_genome_bam.bam
fragments   my.bed
min_mapq    30
nproc   4
barcodetag  CB
cells   None
barcode_regex   [^:]*
use_chrom   (?i)^chr
max_distance    5000
min_distance    10
chunksize   500000
shift_plus  4
shift_minus -5
collapse_within False
func    <function run_fragments at 0x7fe9a85dfa60>

Function completed in  0.0 m 0.59 s

I also ran it without specifying barcode_regex but I got the same result.
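A quick way to see what the --barcode_regex pattern would pick up is to apply it to one of the read names shown in the samtools output further down. This is only a sketch with Python's re module; it assumes the pattern is matched against the start of the read name, which may not be exactly how sinto applies it internally.

```python
import re

# Read name and CB tag value copied from the first record in the samtools output.
read_name = "SRR17534016.9003356"
cb_tag = "CCGTTCATCGCTGTTC-1"

# Assumption: the regex is applied from the start of the read name.
pattern = re.compile(r"[^:]*")
extracted = pattern.match(read_name).group(0)

print(extracted)            # SRR17534016.9003356
print(extracted == cb_tag)  # False
```

The read names here contain no colon and no barcode, so the pattern returns the whole read name rather than anything resembling the value in the CB tag.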

In case you wonder what the possorted_genome_bam looks like: this is the output for samtools view possorted_genome_bam.bam | head -n 3

SRR17534016.9003356 16 chr1 3014927 0 63M30S 0 0 CCGACTAGGCCATCTTTTGATACATATGCAGCTAGAGACAAGAGCTCCGGGGTACTAGTTAGTCCCATGTACTCTGCGTTGATACCACTGCTT CCCCCCCCCC;C-CCCCCC-CCCCCCCCC-CC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC NH:i:10 HI:i:1 AS:i:62 nM:i:0 ts:i:30 RG:Z:young2F:0:1:unknow_flowcell:0 RE:A:I xf:i:0 CR:Z:CCGTTTATCGCTGTTC CY:Z:CC-CCCCCCCCCCCCC CB:Z:CCGTTCATCGCTGTTC-1 UR:Z:TTCAAGGTTCCA UY:Z:CCCCCCCCCCCC UB:Z:TTCAAGGTTCCA
SRR17534016.11370570 16 chr1 3018673 1 92M 0 0 TTTGTTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTG CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:young2F:0:1:unknow_flowcell:0 RE:A:I xf:i:0 CR:Z:CCGTAGGGTAGGATAT CY:Z:CCCCCCCCCCCCCCCC CB:Z:CCGTAGGGTAGGATAT-1 UR:Z:TGAATGGCTTCT UY:Z:CCCCCCCCCCCC UB:Z:TGAATGGCTTCT
SRR17534018.9851676 16 chr1 3018686 1 63M30S 0 0 AAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCATGTACTCTGCGTTGATACCACTGCTT CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC NH:i:3 HI:i:1 AS:i:62 nM:i:0 ts:i:30 RG:Z:young2F:0:1:unknow_flowcell:0 RE:A:I xf:i:0 CR:Z:CACGAATAGACCAGCA CY:Z:CCCCCCCCCCCCCCCC CB:Z:CACGAATAGACCAGCA-1 UR:Z:AGCTCCCGGGAT UY:Z:CCCC;CCCCCCC UB:Z:AGCTCCCGGGAT

(biotools2) [nt793@eris2n4 outs]$ samtools view possorted_genome_bam.bam | head -n 1
SRR17534016.9003356 16 chr1 3014927 0 63M30S 0 0 CCGACTAGGCCATCTTTTGATACATATGCAGCTAGAGACAAGAGCTCCGGGGTACTAGTTAGTCCCATGTACTCTGCGTTGATACCACTGCTT CCCCCCCCCC;C-CCCCCC-CCCCCCCCC-CC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC NH:i:10 HI:i:1 AS:i:62 nM:i:0 ts:i:30 RG:Z:young2F:0:1:unknow_flowcell:0 RE:A:I xf:i:0 CR:Z:CCGTTTATCGCTGTTC CY:Z:CC-CCCCCCCCCCCCC CB:Z:CCGTTCATCGCTGTTC-1 UR:Z:TTCAAGGTTCCA UY:Z:CCCCCCCCCCCC UB:Z:TTCAAGGTTCCA

I understand that Sinto was written for scATAC-seq, and I guess some additional steps are needed to prepare the BAM file for Sinto. I am thankful for any help I can get.

scATAC alignment scRNA Sinto 10x • 1.4k views

What aligner are you using? My guess is that the minimum MAPQ threshold in sinto is set too high :(
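To check this guess against the three records posted in the question, the MAPQ column (fifth field of a SAM record) can be compared with the min_mapq value of 30 from the sinto log. A minimal sketch: the records are truncated to their first six columns, and treating the threshold as inclusive (>=) is an assumption, not something confirmed from the sinto source.

```python
# First six columns of the three SAM records posted in the question.
records = [
    "SRR17534016.9003356 16 chr1 3014927 0 63M30S",
    "SRR17534016.11370570 16 chr1 3018673 1 92M",
    "SRR17534018.9851676 16 chr1 3018686 1 63M30S",
]
MIN_MAPQ = 30  # min_mapq value reported in the sinto log


def mapq(sam_line: str) -> int:
    """Return the MAPQ value, the fifth whitespace-separated SAM column."""
    return int(sam_line.split()[4])


# Assumption: a read passes when its MAPQ is at least the threshold.
passing = [line.split()[0] for line in records if mapq(line) >= MIN_MAPQ]
print(len(passing), "of", len(records), "reads pass MAPQ >=", MIN_MAPQ)
```

None of the three reads would pass, but all three are multi-mapped (NH greater than 1), so they are probably not representative of the whole file.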


Hi, I am using the output from Cell Ranger, which itself uses STAR. I just want to emphasize that I am dealing with RNA-seq and not ATAC-seq.


I ran

sinto fragments -b possorted_genome_bam.bam  -f my.bed --barcode_regex "[^:]*" -p 4 -m 0

the output is:

Function run_fragments called with the following arguments:

bam possorted_genome_bam.bam
fragments   my.bed
min_mapq    0
nproc   4
barcodetag  CB
cells   None
barcode_regex   [^:]*
use_chrom   (?i)^chr
max_distance    5000
min_distance    10
chunksize   500000
shift_plus  4
shift_minus -5
collapse_within False
func    <function run_fragments at 0x7fa1c01e7a60>

Function completed in  2.0 m 6.64 s

and the file is again 0 bytes. At least it was running longer now :P


I also tried with --shift_plus 0 --shift_minus 0, but the result is the same:

(base) [nt793@dn018 outs]$ sinto fragments -b possorted_genome_bam.bam  -f my.bed --barcode_regex "[^:]*" -p 4 -m 0 --shift_plus 0 --shift_minus 0
Function run_fragments called with the following arguments:

bam possorted_genome_bam.bam
fragments   my.bed
min_mapq    0
nproc   4
barcodetag  CB
cells   None
barcode_regex   [^:]*
use_chrom   (?i)^chr
max_distance    5000
min_distance    10
chunksize   500000
shift_plus  0
shift_minus 0
collapse_within False
func    <function run_fragments at 0x2b3125ad53a0>

Hi, sorry for taking long to respond.

Since this is RNA data (and, I assume, 3'-based at that), the sinto fragments command won't work. It requires paired-end reads, and in 3' scRNA-seq one of the two reads usually carries the cell barcode and UMI rather than transcript sequence, so only one read of each pair ends up aligned. Do you really need a fragment file for RNA-seq?
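The FLAG column of the records posted in the question is consistent with this. Here is a small sketch that decodes the relevant bits, using the bit values from the SAM specification; the helper names are just for illustration.

```python
# SAM flag bits, as defined in the SAM specification.
FLAG_PAIRED = 0x1    # read is part of a pair
FLAG_REVERSE = 0x10  # read aligned to the reverse strand


def is_paired(flag: int) -> bool:
    """True when the paired bit is set in the SAM flag."""
    return bool(flag & FLAG_PAIRED)


def is_reverse(flag: int) -> bool:
    """True when the reverse-strand bit is set in the SAM flag."""
    return bool(flag & FLAG_REVERSE)


# Second column of the three records posted in the question.
flags = [16, 16, 16]
print([is_paired(f) for f in flags])   # [False, False, False]
print([is_reverse(f) for f in flags])  # [True, True, True]
```

A flag of 16 means a reverse-strand alignment with the paired bit unset, so at least these three reads are stored as single-end alignments.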


The Satija lab does recommend sinto, though, in the link OP provided in the original post.

What they say is to use the blocks function in sinto, and I am not sure how that relates to the sinto fragments command.


The Satija lab just commented on GitHub.

I will try their code and see if it works. In any case I will let you know in this thread.
