Hi,
The Salmon version installed inside the snakePipes env seems to be 0.7.x, whereas the latest version is 1.1.0. The following errors were encountered:
After it starts computing gene-level abundance, there are many lines like:
[jointLog] [warning] Feature has no GFF ID
[jointLog] [info] There were 0 transcripts mapping to 0 genes
[jointLog] [warning] couldn't find transcritpt named [xxxx] in transcript <-> gene map; returning transcript as it's own gene
[jointLog] [warning] NOTE: We recommend using tximport for aggregating transcript-level salmon abundance...
The warning in the last line does not seem to be straightforward; it appears to be an issue discussed in https://github.com/COMBINE-lab/salmon/issues/198 and https://github.com/COMBINE-lab/salmon/issues/98
I am not sure what the solution is.
@ATpoint Please find the snippets below with 1 example:
My original gtf file:
CC7scaffold1 AUGUSTUS exon 23573 23678 . - . transcript_id "AIPCC7_15333.t1"; gene_id "AIPCC7_15333"; gene_name "AIPCC7_15333.t1";
CC7scaffold1 AUGUSTUS exon 24472 24635 . - . transcript_id "AIPCC7_15333.t1"; gene_id "AIPCC7_15333"; gene_name "AIPCC7_15333.t1";
genes.filtered.gtf
CC7scaffold1 stdin exon 23573 23678 . - . gene_id "AIPCC7_15333"; transcript_id "AIPCC7_15333.t1"; exon_number "1"; exon_id "AIPCC7_15333.t1.1"; gene_name "AIPCC7_15333.t1";
CC7scaffold1 stdin CDS 23576 23678 . - 1 gene_id "AIPCC7_15333"; transcript_id "AIPCC7_15333.t1"; exon_number "1"; exon_id "AIPCC7_15333.t1.1"; gene_name "AIPCC7_15333.t1";
CC7scaffold1 stdin exon 24472 24635 . - . gene_id "AIPCC7_15333"; transcript_id "AIPCC7_15333.t1"; exon_number "2"; exon_id "AIPCC7_15333.t1.2"; gene_name "AIPCC7_15333.t1";
genes.filtered.fa
AIPCC7_15333.t1 ATGGGAAACGGTTCGTGGATCGACCAATGCACCAGTCTTGGATCTAAAGGCTCGAACTTGCTTCTGATGGCAA.............................................................................................................................................................................................................
genes.filtered.t2g
AIPCC7_15333.t1 AIPCC7_15333 AIPCC7_15333.t1
Thanks in advance.
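For anyone hitting the same warnings, one quick sanity check is to confirm that every transcript ID in the FASTA also appears with a gene_id in the GTF. The following is a minimal sketch, not how Salmon or snakePipes parse the file: the function names are hypothetical, and it assumes a tab-separated GTF whose ninth column holds key "value"; pairs, as in the snippets above.

```python
import re

# Matches attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def transcript_to_gene(gtf_lines):
    """Build a transcript_id -> gene_id map from GTF lines (hypothetical helper)."""
    t2g = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF record
        attrs = parse_attributes(cols[8])
        tid = attrs.get("transcript_id")
        gid = attrs.get("gene_id")
        if tid and gid:
            t2g[tid] = gid
    return t2g


def missing_transcripts(fasta_ids, t2g):
    """Return FASTA transcript IDs that have no entry in the map."""
    return [tid for tid in fasta_ids if tid not in t2g]


# Example record taken from the GTF above, with tabs between the 9 columns.
example_gtf = [
    "CC7scaffold1\tAUGUSTUS\texon\t23573\t23678\t.\t-\t.\t"
    'transcript_id "AIPCC7_15333.t1"; gene_id "AIPCC7_15333"; '
    'gene_name "AIPCC7_15333.t1";\n'
]
example_t2g = transcript_to_gene(example_gtf)
example_missing = missing_transcripts(
    ["AIPCC7_15333.t1", "not_in_gtf.t1"], example_t2g
)
```

If the list of missing IDs is non-empty for a real annotation, the FASTA headers and the GTF transcript_id values do not line up, which would be consistent with the "couldn't find transcript" warnings.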
Wow. Your paper on snakePipes states "snakePipes provides a set of best-practices workflows". Yet it seems you're not interested at all in providing users best-practice workflows, rather you're using your tool to exercise a personal vendetta against me. Your excuse rings hollow because for starters, incorporating kallisto in snakePipes wouldn't require any interaction with me at all. Too bad for your users.
Hi @Devon Ryan, please take a look at my edit. I have added snippets of the original GTF as well as the annotation folder contents for your reference. Another observation is that DE analysis fails not only if sample names contain numbers but also if they contain special characters. For instance, one of my samples was named "C23-32L_R1.fastq.gz". Sleuth_salmon failed with the log message "unable to find file or directory: /path/to/salmon/C23.32L.quant.sf". I then replaced the '-' with 't', after which it worked. Nevertheless, the GTF error remains the same. Kindly let me know the preferred GTF format.
Ah, yeah, R really doesn't like some characters in column names, so '-' ends up getting converted to '.'. I thought we had a warning about that printed to the screen, but I should double check and probably make it an error, since there's so much sent to the screen that it'd be hard to notice.
I've reworked the GTF handling in the next release and will double check how this particular step is working to ensure this issue goes away.
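Until such a check exists in the pipeline, sample names can be screened before a run. The sketch below approximates the renaming that R's make.names applies to column names. It is an assumption that the pipeline's renaming follows these rules, the function names are hypothetical, and R's reserved words are not handled.

```python
import re


def approx_make_names(name):
    """Approximate R's make.names: invalid characters become '.', and an
    'X' is prepended unless the name starts with a letter or with a dot
    that is not followed by a digit. Reserved words are not handled."""
    cleaned = re.sub(r"[^A-Za-z0-9._]", ".", name)
    if not re.match(r"^([A-Za-z]|\.(?![0-9]))", cleaned):
        cleaned = "X" + cleaned
    return cleaned


def names_r_would_change(names):
    """Map each sample name that would be altered to its altered form."""
    changed = {}
    for name in names:
        converted = approx_make_names(name)
        if converted != name:
            changed[name] = converted
    return changed


# "C23-32L" is the sample name from the report above; the others are made up.
flagged = names_r_would_change(["C23-32L", "C23t32L", "23sample"])
for original, converted in flagged.items():
    print(f"{original!r} would become {converted!r}")
```

Any name that shows up here would no longer match its quant.sf path after the conversion, so renaming the input files beforehand avoids the failure.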