Question

Velocyto: cell and UMI barcodes not found in the entries of the BAM file

3.4 years ago

bs58 ▴ 10

I'm trying to get the fraction of spliced and unspliced genes so that I can then calculate RNA velocity with velocyto.

When I run this command:

velocyto run -u Gene -o ./Data_RNAv ./data1.bam ./GenomeIndex/gencodev38annotation.gtf

I get the following error message:

 The bam file does not contain cell and umi barcodes appropriatelly formatted.

This is my workflow so far:

Downloaded the two fastq files using the sratoolkit
Downloaded hg38.fa and the reference .gtf file
Created the genome index using STAR

Like this:

STAR --runMode genomeGenerate  --genomeDir ./GenomeIndex --genomeFastaFiles ./GenomeInde /hg38.fa --sjdbGTFfile ./GenomeIndex/gencodev38annotation.gtf

Aligned the reads to the genome using STAR

Like this:

STAR --runThreadN 24 --genomeDir ./GenomeIndex --sjdbGTFfile ./GenomeIndex/gencodev38annotation.gtf --sjdbOverhang 100 --outSAMtype BAM Unsorted --readFilesIn ./data_Day4/SRR9127057_S1_L001_R1_001.fastq ./H9_D4/SRR9127057_S1_L001_R2_001.fastq

Used velocyto.py to write out a standard loom file. This is where I get the error saying that the UMI is not found in the BAM file.

What did I do wrong?

velocyto scRNA-seq UMI RNA • 2.3k views


The SRA data probably does not have the UMI/barcode sequence in the read header. You can check this by looking at the headers in the FASTQ file.
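
One way to do that check is sketched below. It is only a rough aid, not part of velocyto or STAR: the helper names `fastq_headers` and `fastq_length_table` are made up for illustration. The first prints the header lines so you can see whether a barcode or UMI has been appended to the read names. The second tabulates read lengths, since in the standard 10x layout the barcode read is a separate, short mate (typically 26 or 28 bp) rather than part of the header.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, input is a FASTQ stream on stdin.

# Print only the header lines (line 1 of every 4-line FASTQ record).
fastq_headers() {
    awk 'NR % 4 == 1'
}

# Tabulate read lengths: prints "<length> <count>" per line, sorted by length.
fastq_length_table() {
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' | sort -n
}

# Example (not run here), using the file name from the question:
#   head -n 8     ./data_Day4/SRR9127057_S1_L001_R1_001.fastq | fastq_headers
#   head -n 40000 ./data_Day4/SRR9127057_S1_L001_R1_001.fastq | fastq_length_table
```

If one mate is uniformly short and the other is long, the short one most likely carries the cell barcode and UMI as sequence, and the aligner has to be told how to extract them.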

Reply • 3.4 years ago by Nitin Narwade ★ 1.6k

This is the beginning of the FASTQ file that I have:

>head ./data_Day4/SRR9127057_S1_L001_R1_001.fastq
@SRR9127057.1 A00291:31:H5W5MDMXX:2:1101:1579:1000 length=25
TGTTACCCNGCTCGTCGTTATGCCG
+SRR9117954.1 A00291:31:H5W5MDMXX:2:1101:1579:1000 length=25
,#FFFFFFF:FF:FFFFFFFFFFFF
@SRR9127057.2 A00291:31:H5W5MDMXX:2:1101:2338:1000 length=25
NGCTGTCCAAGGAAGCTAGTCCACT
+SRR9117954.2 A00291:31:H5W5MDMXX:2:1101:2338:1000 length=25
F#FFFFFFFFFFFFFFFF:FFFFFF
@SRR9127057.3 A00291:31:H5W5MDMXX:2:1101:3007:1000 length=25
TNAACTTTGCGTGGTCTCCTCAAGC

How can I know if it has the UMI/Barcode?

Reply • 3.4 years ago by bs58 ▴ 10

score 0 · Answer 1 · 2021-06-16

3.4 years ago

swbarnes2 14k

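From the description, this is a 10x Genomics single-cell dataset. The cell barcode and UMI are in the short barcode read (read 1 in the standard 10x layout, which matches the 25 bp reads in your R1 file), but a plain STAR alignment doesn't understand that, so the BAM file never gets the cell barcode and UMI tags that velocyto looks for. Use either STARsolo or Cell Ranger instead.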
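
A rough sketch of the STARsolo route follows. It makes several assumptions: the function names `run_starsolo` and `count_cb_ub` are hypothetical, the barcode whitelist file has to be supplied by you, and the barcode and UMI lengths are left as arguments because they depend on the chemistry and on the reads actually deposited in SRA. STARsolo expects the cDNA read first and the barcode read second in `--readFilesIn`, and it only writes the `CB`/`UB` tags when the output is a coordinate-sorted BAM.

```shell
#!/usr/bin/env bash
# Sketch only: paths, whitelist and barcode geometry are assumptions, not taken from the dataset.

# Re-align with STARsolo so that CB (cell barcode) and UB (UMI) tags end up in the BAM.
# Arguments: cDNA FASTQ, barcode FASTQ, whitelist file, barcode length, UMI length.
run_starsolo() {
    local cdna_fastq=$1 barcode_fastq=$2 whitelist=$3 cb_len=$4 umi_len=$5
    STAR --runThreadN 24 \
        --genomeDir ./GenomeIndex \
        --soloType CB_UMI_Simple \
        --soloCBwhitelist "$whitelist" \
        --soloCBstart 1 --soloCBlen "$cb_len" \
        --soloUMIstart "$((cb_len + 1))" --soloUMIlen "$umi_len" \
        --readFilesIn "$cdna_fastq" "$barcode_fastq" \
        --outSAMtype BAM SortedByCoordinate \
        --outSAMattributes NH HI nM AS CR UR CB UB
}

# Count SAM records on stdin that carry both a CB and a UB tag (header lines are skipped).
count_cb_ub() {
    awk '!/^@/ { if ($0 ~ /\tCB:Z:/ && $0 ~ /\tUB:Z:/) n++ } END { print n + 0 }'
}

# Example (not run here; file names are placeholders):
#   run_starsolo cdna.fastq barcode.fastq whitelist.txt 16 10
#   samtools view Aligned.sortedByCoord.out.bam | head -n 1000 | count_cb_ub
```

If `count_cb_ub` reports 0 for the first few thousand records, velocyto will most likely fail with the same error, so it is worth running this check before starting `velocyto run`.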
