Question

Kallisto in snakePipes instead of salmon

1

4.8 years ago

arunprasanna83 ▴ 60

Hi,

The Salmon version installed inside the snakePipes env seems to be 0.7.x, whereas the latest version is 1.1.0. The following errors were encountered:

After it starts computing gene-level abundance, the log fills with many lines like:

[jointLog] [warning] Feature has no GFF ID
[jointLog] [info] There were 0 transcripts mapping to 0 genes
[jointLog] [warning] couldn't find transcritpt named [xxxx] in transcript <-> gene map; returning transcript as it's own gene
[jointLog] [warning] NOTE: We recommend using tximport for aggregating transcript-level salmon abundance...

Apparently the warning in the last line is not straightforward to resolve; it is an issue discussed in https://github.com/COMBINE-lab/salmon/issues/198 and https://github.com/COMBINE-lab/salmon/issues/98

I am not sure what the solution is.

@ATpoint Please find the snippets below with 1 example:

My original gtf file:

CC7scaffold1    AUGUSTUS    exon    23573   23678   .   -   .   transcript_id "AIPCC7_15333.t1"; gene_id "AIPCC7_15333"; gene_name "AIPCC7_15333.t1";
CC7scaffold1    AUGUSTUS    exon    24472   24635   .   -   .   transcript_id "AIPCC7_15333.t1"; gene_id "AIPCC7_15333"; gene_name "AIPCC7_15333.t1";

genes.filtered.gtf

CC7scaffold1    stdin   exon    23573   23678   .   -   .   gene_id "AIPCC7_15333"; transcript_id "AIPCC7_15333.t1"; exon_number "1"; exon_id "AIPCC7_15333.t1.1"; gene_name "AIPCC7_15333.t1";
CC7scaffold1    stdin   CDS 23576   23678   .   -   1   gene_id "AIPCC7_15333"; transcript_id "AIPCC7_15333.t1"; exon_number "1"; exon_id "AIPCC7_15333.t1.1"; gene_name "AIPCC7_15333.t1";
CC7scaffold1    stdin   exon    24472   24635   .   -   .   gene_id "AIPCC7_15333"; transcript_id "AIPCC7_15333.t1"; exon_number "2"; exon_id "AIPCC7_15333.t1.2"; gene_name "AIPCC7_15333.t1";

genes.filtered.fa

AIPCC7_15333.t1 ATGGGAAACGGTTCGTGGATCGACCAATGCACCAGTCTTGGATCTAAAGGCTCGAACTTGCTTCTGATGGCAA...

gene.filtered.t2g

AIPCC7_15333.t1 AIPCC7_15333 AIPCC7_15333.t1

Thanks in advance.

salmon snakePipes • 1.5k views

ADD COMMENT • link updated 20 months ago by Ram 44k • written 4.8 years ago by arunprasanna83 ▴ 60

Ram · Answer 1 · 2020-02-09

7

4.8 years ago

Devon Ryan 104k

I'm not going to put kallisto in snakePipes because I don't want to deal with Lior (the other authors of the tool seem fine). The Salmon authors have always been amazingly responsive to questions and bug reports, which is among the reasons we use it.

The most recent snakePipes release uses salmon 0.13.1; we'll update that for the next release, since the newer versions have some very nice new features. In general, pipelines should be rather slow to update software versions.

The warning in your message suggests that the input GTF file had no gene ID for a transcript. That happens; some GTF files are malformed.
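One way to check a GTF for the kind of problem described above is to scan it for records that lack a `gene_id` or `transcript_id` attribute. The following is only a minimal sketch, not part of snakePipes or salmon: the function names are hypothetical, it assumes tab-separated GTF with `key "value";` attributes in column 9, and the second example record is a deliberately malformed line made up for illustration.

```python
import re

# Matches `key "value"` pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'([A-Za-z_][\w.]*)\s+"([^"]*)"')


def parse_attributes(field):
    """Return the attribute column as a dict, e.g. {'gene_id': 'AIPCC7_15333'}."""
    return dict(ATTR_RE.findall(field))


def find_records_missing_ids(lines):
    """Return (line_number, message) tuples for records lacking required IDs.

    Hypothetical helper: 'gene' features only need gene_id; every other
    feature type is expected to carry both gene_id and transcript_id.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((lineno, "expected 9 tab-separated columns, found %d" % len(cols)))
            continue
        required = ("gene_id",) if cols[2] == "gene" else ("gene_id", "transcript_id")
        attrs = parse_attributes(cols[8])
        missing = [key for key in required if not attrs.get(key)]
        if missing:
            problems.append((lineno, "missing " + ", ".join(missing)))
    return problems


if __name__ == "__main__":
    example = [
        'CC7scaffold1\tAUGUSTUS\texon\t23573\t23678\t.\t-\t.\t'
        'transcript_id "AIPCC7_15333.t1"; gene_id "AIPCC7_15333";',
        # Made-up malformed record: no gene_id attribute.
        'CC7scaffold1\tAUGUSTUS\texon\t24472\t24635\t.\t-\t.\t'
        'transcript_id "AIPCC7_15333.t1";',
    ]
    for lineno, message in find_records_missing_ids(example):
        print("line %d: %s" % (lineno, message))
```

To scan a real file, the lines could be passed in with `find_records_missing_ids(open(path))`, where `path` points at the GTF in question. An empty result would suggest the attributes themselves are present and the cause lies elsewhere.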

ADD COMMENT • link 4.8 years ago by Devon Ryan 104k

1

Wow. Your paper on snakePipes states "snakePipes provides a set of best-practices workflows". Yet it seems you're not interested at all in providing users best-practice workflows, rather you're using your tool to exercise a personal vendetta against me. Your excuse rings hollow because for starters, incorporating kallisto in snakePipes wouldn't require any interaction with me at all. Too bad for your users.

ADD REPLY • link updated 4.8 years ago by Ram 44k • written 4.8 years ago by Lior Pachter ▴ 700

0

Hi @Devon Ryan, please take a look at my edit. I have added snippets of the original GTF as well as the annotation folder contents for your reference. Another observation is that DE analysis fails not only if sample names have numbers but also if they contain special characters. For instance, one of my samples had the name "C23-32L_R1.fastq.gz". Sleuth_salmon failed with the log message "unable to find file or directory: /path/to/salmon/C23.32L.quant.sf". I then replaced the '-' with 't', after which it worked. Nevertheless, the GTF error remains the same. Kindly let me know the preferred GTF format.

ADD REPLY • link 4.8 years ago by arunprasanna83 ▴ 60

0

Ah, yeah, R really doesn't like some characters in column names, so "-" ends up getting converted to ".". I thought we printed a warning about that to the screen, but I should double check and probably make it an error, since so much is sent to the screen that a warning would be hard to notice.

I've reworked the GTF handling in the next release and will double check how this particular step is working to ensure this issue goes away.
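Until such a check exists in the pipeline, sample names could be screened before a run. The sketch below is an assumption-laden approximation of how R's `make.names()` rewrites names (ASCII only, reserved words ignored). Whether the sleuth step uses exactly that routine is not confirmed in this thread, and the function names are hypothetical.

```python
import re


def r_style_name(name):
    """Approximate R's make.names() for ASCII input (reserved words ignored)."""
    # R prepends "X" when the name does not start with a letter,
    # or with a dot that is not followed by a digit.
    needs_prefix = not re.match(r"[A-Za-z]|\.(?![0-9])", name)
    # Every character other than letters, digits, "." and "_" becomes ".".
    fixed = re.sub(r"[^A-Za-z0-9._]", ".", name)
    return "X" + fixed if needs_prefix else fixed


def names_changed_by_r(names):
    """Return {original: converted} for the names that R would alter."""
    return {n: r_style_name(n) for n in names if r_style_name(n) != n}


if __name__ == "__main__":
    samples = ["C23-32L", "C23t32L", "23L"]
    for original, converted in names_changed_by_r(samples).items():
        print("%s would become %s" % (original, converted))
```

Renaming any flagged sample before starting the run should avoid the mismatch between the converted name and the quantification file path.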

ADD REPLY • link 4.8 years ago by Devon Ryan 104k