Is it possible to use the salmon alevin tool for total RNA-seq samples?
1
0
Entering edit mode
17 months ago
PK ▴ 130

Hi All,

I'm interested in quantifying the proportions of spliced vs unspliced transcripts at the transcript level, as described on this page (https://combine-lab.github.io/alevin-tutorial/2020/alevin-velocity/). That tutorial uses single-cell RNA sequencing data, so it runs salmon alevin. What I did was follow the tutorial up to the indexing step and then use that index with salmon quant. I'm using an HPC with 200 GB of RAM, and after 24 hrs the job was still running (6 fastq files). I suspect there is some problem, so I'm wondering: is it possible to run salmon alevin on total RNA-seq data for the quantification?

Thanks

salmon alignment • 1.5k views
17 months ago

Instead of using Alevin, I would just add transcripts to the annotation that represent the unspliced isoform of each transcript, and then use standard salmon.


If I understood correctly: make the annotation for spliced and unspliced transcripts separately, index them separately, and then run salmon quant. Is that correct? The index I built previously has both sets of information together. It is taking much longer than I expected from salmon.


Make a transcript annotation file that contains the spliced transcripts, the unspliced transcripts, and the genome, with only the genome entries marked as decoys.

I'm afraid it probably will take longer, because there will be a lot more sequence included in the annotation (exons only make up a small % of the total length of transcripts).
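A minimal sketch of the layout described above, using toy FASTA files so that it runs without any real data. All file names and sequences here (spliced.fa, unspliced.fa, genome.fa, gentrome.fa, decoys.txt, expanded_index) are placeholders and not taken from this thread. The decoy list holds only the genome sequence names, and the genome is concatenated after the transcript targets. The final salmon index call is left as a comment because it needs salmon installed; its thread count and index name are assumptions.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Toy inputs standing in for the real files (hypothetical names and sequences).
printf '>tx1\nACGTACGT\n' > spliced.fa
printf '>tx1-U\nACGTTTTTACGT\n' > unspliced.fa
printf '>chr1 1 dna\nACGTTTTTACGTAAAA\n>chr2 2 dna\nGGGGCCCC\n' > genome.fa

# Decoy list: genome sequence names only, without '>' and without the
# description that follows the first space in the header.
grep '^>' genome.fa | cut -d ' ' -f 1 | sed 's/^>//' > decoys.txt

# Transcript targets first, genome (decoy) sequences last.
cat spliced.fa unspliced.fa genome.fa > gentrome.fa

# With salmon installed, the index could then be built along these lines:
# salmon index -t gentrome.fa -d decoys.txt -i expanded_index -p 8
```

Only chr1 and chr2 end up in decoys.txt, so the spliced and unspliced entries remain quantifiable targets.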


Ohh, I think that's exactly what I did, but it is still running on my HPC.

salmon index --gencode -t <(cat /my/dir/GRCh38_expanded.fa /my/dir/GRCh38_primary_assembly.fa) -i /my/dir/GRCh38.dna.primary_assembly_expanded.sidx -p 6 -d /my/GRCh38.dna.primary_assembly_chrnames.txt


First, you shouldn't hog 200 GB of memory on the HPC. You need less than one-tenth of that.

Second, increasing the number of threads will improve runtime.

Third, it appears you're indexing the genome -- you should be indexing the targets (e.g. each unspliced transcript and each spliced transcript gets their own fasta entry).


I think the command I attached isn't clear. I used:

grl <- eisaR::getFeatureRanges (intron,spliced)

genome <- Biostrings::readDNAStringSet( "GRCm38.primary_assembly.genome.fa" )
names(genome) <- sapply(strsplit(names(genome), " "), .subset, 1)
seqs <- GenomicFeatures::extractTranscriptSeqs( x = genome, transcripts = grl )
Biostrings::writeXStringSet( seqs, filepath = "GRCh38_expanded.fa" )

Then I used GRCh38_expanded.fa together with the actual GRCh38_primary_assembly.fa. This is what sudbery suggested, right? Or am I making a mistake here? But you are suggesting that I create spliced.fa and unspliced.fa, then index them separately, followed by quantification.


Yes, create them and put spliced.fa and unspliced.fa into one fasta file. You shouldn't be indexing genome.fa at all (you're "mapping" against "targets", not "aligning" to the "genome").
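As a rough sketch of this targets-only approach, assuming spliced.fa and unspliced.fa already exist: the toy files, the "-U" suffix on unspliced names, the read file names, and the index and output names below are all hypothetical. The block merges the two FASTA files and checks that every target name is unique, since each spliced and unspliced transcript needs its own entry. The salmon calls are left as comments because they need salmon and real reads.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Toy inputs standing in for the real files (hypothetical names and sequences).
printf '>tx1\nACGTACGT\n>tx2\nGGCCGGCC\n' > spliced.fa
printf '>tx1-U\nACGTTTTTACGT\n>tx2-U\nGGCCAAAAGGCC\n' > unspliced.fa

# One FASTA holding every target; the genome is not included.
cat spliced.fa unspliced.fa > targets.fa

# Each target needs a unique name, otherwise spliced and unspliced
# versions of a transcript cannot be told apart after quantification.
dups=$(grep '^>' targets.fa | cut -d ' ' -f 1 | sort | uniq -d)
if [ -n "$dups" ]; then
  echo "duplicate target names: $dups" >&2
  exit 1
fi

n_targets=$(grep -c '^>' targets.fa)
echo "targets: $n_targets"

# With salmon installed, indexing and quantification could then look like:
# salmon index -t targets.fa -i targets_index -p 8
# salmon quant -i targets_index -l A -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -p 8 -o sample_quant
```

In the resulting quant.sf, spliced and unspliced abundances can then be separated by target name.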


Okay, I will do that. Thanks for the answer.
