Question

Is it possible to use the salmon alevin tool for total RNA-seq samples?

0

17 months ago

PK ▴ 130

Hi All,

I'm interested in quantifying the proportions of spliced vs. unspliced transcripts at the transcript level, as described on this page (https://combine-lab.github.io/alevin-tutorial/2020/alevin-velocity/). That tutorial uses single-cell RNA sequencing data, so it uses salmon alevin. What I did was follow the tutorial up to the indexing step and then use that index with salmon quant. I'm using an HPC with 200 GB of RAM, and after 24 hours the job was still running (6 fastq files). I suspect there is some problem, so I'm wondering: is it possible to run salmon alevin on total RNA-seq data for the quantification?

Thanks

salmon alignment • 1.5k views

ADD COMMENT • link 17 months ago by PK ▴ 130

score 0 · Answer 1 · 2023-06-19

0

17 months ago

i.sudbery 20k

Instead of using Alevin, I would just add transcripts to the annotation that represent the unspliced isoform of each transcript, and then use standard salmon.
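Once standard salmon has been run against such an annotation, the spliced vs. unspliced proportions can be read off quant.sf. The following is only a sketch: the `-U` suffix marking unspliced entries is a hypothetical naming convention (use whatever suffix your own annotation has), the directory and file names are placeholders, and the toy quant.sf exists only so the snippet runs on its own. The column layout (Name, Length, EffectiveLength, TPM, NumReads) is the standard quant.sf one.

```shell
#!/bin/sh
set -eu

# Toy quant.sf so the example is self-contained (values are made up).
demo=quant_demo
mkdir -p "$demo"
printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n' > "$demo/quant.sf"
printf 'tx1\t1000\t800\t10\t90\n'   >> "$demo/quant.sf"
printf 'tx1-U\t5000\t4800\t5\t10\n' >> "$demo/quant.sf"
printf 'tx2\t2000\t1800\t20\t30\n'  >> "$demo/quant.sf"
printf 'tx2-U\t8000\t7800\t10\t30\n' >> "$demo/quant.sf"

# Pair each transcript with its "-U" partner (assumed suffix) and report
# unspliced NumReads / (spliced + unspliced NumReads) per transcript.
awk -F '\t' -v suf='-U' '
NR == 1 { next }                      # skip the header line
{
  name = $1; reads = $5
  n = length(name); s = length(suf)
  if (n > s && substr(name, n - s + 1) == suf) {
    base = substr(name, 1, n - s)     # strip the suffix to get the transcript ID
    unspliced[base] += reads
  } else {
    base = name
    spliced[base] += reads
  }
  seen[base] = 1
}
END {
  for (b in seen) {
    total = spliced[b] + unspliced[b]
    if (total > 0) printf "%s\t%.3f\n", b, unspliced[b] / total
    else           printf "%s\tNA\n", b
  }
}' "$demo/quant.sf" | sort > "$demo/unspliced_fraction.tsv"

cat "$demo/unspliced_fraction.tsv"
```

This uses NumReads; swapping `$5` for `$4` gives TPM-based fractions instead, which will differ because unspliced entries are much longer.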

ADD COMMENT • link 17 months ago by i.sudbery 20k

0

If I understood correctly, I should make annotations for spliced and unspliced transcripts separately and index the genome separately, followed by salmon quant. Is that correct? The index I built previously has both sets of information together, and it is taking much longer than I expected from salmon.

ADD REPLY • link 17 months ago by PK ▴ 130

0

Make a transcript annotation file that contains both the spliced and the unspliced transcripts and the genome, with only the genome entries marked as decoys.

I'm afraid it probably will take longer, because there will be a lot more sequence included in the annotation (exons only make up a small % of the total length of transcripts).
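One way to assemble that file, as a sketch rather than a definitive recipe: the file names (spliced.fa, unspliced.fa, genome.fa) and the demo directory are placeholders, and the toy sequences are there only so the snippet runs on its own. The decoy list is simply the genome's sequence names, and the genome goes last in the combined FASTA. The `salmon index` call is only printed unless the (hypothetical) `RUN_SALMON=1` switch is set.

```shell
#!/bin/sh
set -eu

# Toy stand-ins for the real files (hypothetical names and sequences).
demo=salmon_decoy_demo
mkdir -p "$demo"
printf '>tx1\nACGTACGTAC\n'         > "$demo/spliced.fa"
printf '>tx1-U\nACGTAAAAAACGTAC\n'  > "$demo/unspliced.fa"
printf '>chr1 1\nTTTTACGTAAAAAACGTACTTTT\n>chr2 2\nGGGGCCCC\n' > "$demo/genome.fa"

# Decoy names: genome headers without ">" and without anything after the first space.
grep '^>' "$demo/genome.fa" | cut -d ' ' -f 1 | sed 's/^>//' > "$demo/decoys.txt"

# Transcripts (spliced and unspliced) first, genome (the decoys) last.
cat "$demo/spliced.fa" "$demo/unspliced.fa" "$demo/genome.fa" > "$demo/gentrome.fa"

# Print the command by default; run it only when RUN_SALMON=1 and real data are used.
run() {
  if [ "${RUN_SALMON:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}
run salmon index -t "$demo/gentrome.fa" -d "$demo/decoys.txt" -i "$demo/salmon_idx" -p 6
```

Only the names listed in decoys.txt are treated as decoys, so the spliced and unspliced entries are still quantified.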

ADD REPLY • link 17 months ago by i.sudbery 20k

0

Oh, I think that's exactly what I did, but it is still running on my HPC.

salmon index --gencode -t <(cat /my/dir/GRCh38_expanded.fa /my/dir/GRCh38_primary_assembly.fa) -i /my/dir/GRCh38.dna.primary_assembly_expanded.sidx -p 6 -d /my/GRCh38.dna.primary_assembly_chrnames.txt

ADD REPLY • link 17 months ago by PK ▴ 130

0

First, you shouldn't hog 200 GB of memory on the HPC. You need less than one-tenth of that.

Second, increasing the number of threads will improve runtime.

Third, it appears you're indexing the genome; you should be indexing the targets (e.g. each unspliced transcript and each spliced transcript gets its own FASTA entry).

ADD REPLY • link 17 months ago by dsull ★ 6.9k

0

I think the command I attached wasn't clear. I used:

grl <- eisaR::getFeatureRanges (intron,spliced)

genome <- Biostrings::readDNAStringSet( "GRCm38.primary_assembly.genome.fa" )
names(genome) <- sapply(strsplit(names(genome), " "), .subset, 1)
seqs <- GenomicFeatures::extractTranscriptSeqs( x = genome, transcripts = grl )
Biostrings::writeXStringSet( seqs, filepath = "GRCh38_expanded.fa" )

Then I used GRCh38_expanded.fa together with the actual GRCh38_primary_assembly.fa. This is what i.sudbery suggested, right? Or am I making a mistake here? But you are suggesting that I create spliced.fa and unspliced.fa, then index them separately, followed by quantification.

ADD REPLY • link 17 months ago by PK ▴ 130

1

Yes, create them and put spliced.fa and unspliced.fa into one FASTA file. You shouldn't be indexing genome.fa at all (you're "mapping" against "targets", not "aligning" to the "genome").
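A minimal sketch of that, under stated assumptions: the demo directory, the toy sequences, the read file names (reads_1.fastq.gz, reads_2.fastq.gz, assuming paired-end data) and the thread count are all placeholders, and the salmon commands are only printed unless the (hypothetical) `RUN_SALMON=1` switch is set.

```shell
#!/bin/sh
set -eu

# Toy stand-ins for spliced.fa and unspliced.fa (hypothetical sequences).
demo=salmon_targets_demo
mkdir -p "$demo"
printf '>tx1\nACGTACGTAC\n'        > "$demo/spliced.fa"
printf '>tx1-U\nACGTAAAAAACGTAC\n' > "$demo/unspliced.fa"

# One FASTA holding every target; no genome.fa involved.
cat "$demo/spliced.fa" "$demo/unspliced.fa" > "$demo/targets.fa"

# Target names must be unique, otherwise spliced and unspliced entries collide.
dups=$(grep '^>' "$demo/targets.fa" | cut -d ' ' -f 1 | sort | uniq -d)
if [ -n "$dups" ]; then
  echo "duplicate target names: $dups" >&2
  exit 1
fi

# Print the commands by default; run them only when RUN_SALMON=1 and real data are used.
run() {
  if [ "${RUN_SALMON:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}
threads=16
run salmon index -t "$demo/targets.fa" -i "$demo/targets_idx" -p "$threads"
run salmon quant -i "$demo/targets_idx" -l A \
    -1 reads_1.fastq.gz -2 reads_2.fastq.gz \
    -p "$threads" -o "$demo/quant"
```

For single-end reads, `-r reads.fastq.gz` would replace the `-1`/`-2` pair.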