I am trying to create a salmon index for GRCh38 using GENCODE v35. When I ran the quantification, even though I supplied a GTF file, quant.genes.sf shows only transcript IDs, not gene IDs. Can you please tell me how to solve this issue? Is there a way to create a GTF-based salmon index for GRCh38? I am using salmon version 0.9.1.
salmon index -t /salmon/GRCh38/gencode.v35.pc_transcripts.fa -i /salmon/GRCh38/salmon_index --type quasi -k 31 --gencode
salmon quant -i /salmon/GRCh38/salmon_index -l A -1 ${FASTQ1} -2 ${FASTQ2} -o transcripts_quant -g ${GTF} --seqBias --validateMappings --useVBOpt --numBootstraps 100
Thanks Parvathi.
Thank you for the reply.
1) Sure, I will upgrade the tool. I was using the old version because it was already available on the HPC cluster.
2) I will use the full transcript FASTA. I didn't completely understand the need for a decoy. If I am only looking for gene-level and transcript-level expression, is a decoy needed? Also, which GTF file from GENCODE would be better to use: gencode.v35.annotation.gtf, gencode.v35.basic.annotation.gtf, or gencode.v35.primary_assembly.annotation.gtf? When we make the salmon index, is there a way to create a GTF-based index? I read somewhere that kallisto has an option to create a GTF-based index, but I didn't see any tutorial for salmon. Will using a decoy help in this case?
3) I will use tximport.
Thanks Parvathi
2a) The decoy strategy aims to remove false-positive alignments/quantifications. The idea, e.g. when using the whole genome as decoy, is that a read which matches a sequence in the genome better than any transcript is not counted towards the transcriptome. This can account for genomic DNA contamination and random background transcription. It might be beneficial in some cases (check the recent salmon papers), but it is not strictly necessary. The main findings will probably be similar with and without decoys. It is a nice feature; I personally simply used the entire genome as decoy by adding the FASTA file to the transcriptome with cat as described in the manual, but again, this is optional. If you find it too cumbersome, skip it.
2b) You only need the transcriptome for the indexing, not the GTF. I am not familiar with kallisto, so I cannot comment on it. In salmon you index the transcriptome and then later use e.g. tximport to summarize the transcript-level counts to the gene level in case you want a gene-level analysis, e.g. with DESeq2 or edgeR. If you want transcript-level differential analysis, check e.g. the swish method from the fishpond paper.
I understood. Thank you!
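On the gene-level summarization mentioned in 2b): tximport expects a two-column transcript-to-gene table (usually called tx2gene), and the GTF is where that table comes from, not the index. Below is a minimal sketch, assuming a GENCODE-style GTF in which transcript records carry gene_id and transcript_id attributes. The function name make_tx2gene and the file names are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical input name; point this at whichever GENCODE GTF you downloaded.
GTF="${GTF:-gencode.v35.annotation.gtf}"

# Write a tab-separated transcript_id -> gene_id table from a GTF file.
# Only "transcript" feature lines are used, so each transcript appears once.
make_tx2gene() {
  awk -F '\t' '$3 == "transcript" {
    tx = ""; gene = ""
    # Pull the quoted values out of the attribute column (column 9).
    if (match($9, /transcript_id "[^"]+"/)) tx = substr($9, RSTART + 15, RLENGTH - 16)
    if (match($9, /gene_id "[^"]+"/)) gene = substr($9, RSTART + 9, RLENGTH - 10)
    if (tx != "" && gene != "") print tx "\t" gene
  }' "$1" > "$2"
}

# Only run when the (assumed) GTF is actually present.
if [ -f "$GTF" ]; then
  make_tx2gene "$GTF" tx2gene.tsv
fi
```

The resulting tx2gene.tsv can then be read into R and passed to tximport as the transcript-to-gene mapping. The transcript IDs in it keep their version suffix, so they need to match the names in quant.sf.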
For decoys, see: C: How does salmon deal with decoy?
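The "add the genome to the transcriptome with cat" approach from 2a) can be sketched roughly as follows. This assumes a salmon release recent enough to support decoys (1.0 or later, not the 0.9.1 from the question) and uncompressed FASTA files. The file names and the helper functions make_decoys and make_gentrome are hypothetical; the salmon call only runs if salmon is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical file names; adjust to your own downloads.
TRANSCRIPTS="${TRANSCRIPTS:-gencode.v35.transcripts.fa}"
GENOME="${GENOME:-GRCh38.primary_assembly.genome.fa}"

# List the genome sequence names (header up to the first space, without '>').
# salmon uses this list to know which index entries are decoys.
make_decoys() {
  grep '^>' "$1" | cut -d ' ' -f 1 | sed 's/^>//' > "$2"
}

# Concatenate transcripts first, then the genome, so the decoys come last.
make_gentrome() {
  cat "$1" "$2" > "$3"
}

# Only run when both (assumed) input files are present.
if [ -f "$TRANSCRIPTS" ] && [ -f "$GENOME" ]; then
  make_decoys "$GENOME" decoys.txt
  make_gentrome "$TRANSCRIPTS" "$GENOME" gentrome.fa
  if command -v salmon >/dev/null 2>&1; then
    # --gencode trims the '|'-separated GENCODE headers down to the transcript ID.
    salmon index -t gentrome.fa -d decoys.txt -i salmon_index -k 31 --gencode
  fi
fi
```

Indexing the whole genome as decoy needs noticeably more memory and time than a transcriptome-only index, which is one reason it is reasonable to treat it as optional.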