I am a new user of StringTie, so this question is probably very simple, but I still don't get it. I have sorted BAM files (HISAT2 output, genome v19), and this is my StringTie command (v1.3.4):

    stringtie hisat2_work/hisat2/alignments.sorted.bam -o stringtie_results/transcripts.gtf -G genes.GRCh37.gtf --rf -A stringtie_results/gene_abund.tab

As a result I get two output files: gene abundances (gene_abund.tab) and a transcript annotation file (transcripts.gtf). For example, in gene_abund.tab I see this line:

    Gene ID          Gene Name  Reference  Strand  Start  End    Coverage  FPKM      TPM
    ENSG00000223972  DDX11L1    1          +       11869  14412  0.180934  0.129907  0.341143

But if I search transcripts.gtf for this gene name (DDX11L1) or gene ID, I can't find it: it's absent. At the same time, other genes from gene_abund.tab do appear in transcripts.gtf, for example:
Line in gene_abund.tab:

    ENSG00000227232  WASH7P  1  -  14363  29806  16.906973  12.345821  32.420803

Corresponding line in transcripts.gtf:

    StringTie transcript 14363 29370 1000 - . gene_id "STRG.2"; transcript_id "STRG.2.2"; reference_id "ENST00000423562"; ref_gene_id "ENSG00000227232"; ref_gene_name "WASH7P"; cov "1.478912"; FPKM "1.061831"; TPM "2.788425";

What could be the problem here? Why are some genes from gene_abund.tab missing from my transcripts.gtf file?
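Instead of searching for genes one at a time, the comparison described above can be done in bulk. Below is a minimal Python sketch that lists every Gene ID from gene_abund.tab that never appears as a `ref_gene_id` on a transcript record in transcripts.gtf. It assumes both files are tab-separated as StringTie writes them, that the abundance table has the header shown above, and that the GTF attribute column uses the `key "value";` layout seen in the example line. The function names are hypothetical, and assembled transcripts that carry no `ref_gene_id` are simply ignored.

```python
import csv
import re
import sys

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def abund_gene_ids(path):
    """Return the set of 'Gene ID' values in a StringTie -A abundance table."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row["Gene ID"] for row in reader}


def gtf_ref_gene_ids(path):
    """Return the set of ref_gene_id values found on 'transcript' records."""
    ids = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith("#"):  # skip StringTie header comments
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "transcript":
                continue
            attrs = dict(ATTR_RE.findall(fields[8]))
            if "ref_gene_id" in attrs:
                ids.add(attrs["ref_gene_id"])
    return ids


def missing_genes(abund_path, gtf_path):
    """Gene IDs listed in the abundance table but absent from the GTF."""
    return sorted(abund_gene_ids(abund_path) - gtf_ref_gene_ids(gtf_path))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python missing_genes.py gene_abund.tab transcripts.gtf
    for gene_id in missing_genes(sys.argv[1], sys.argv[2]):
        print(gene_id)
```

Run it with the two output paths from the command above (for example `stringtie_results/gene_abund.tab stringtie_results/transcripts.gtf`) to get the full list of affected genes rather than a single example.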
Hello and welcome to Biostars,
to show the commands you use and file contents, please use the code button (the one labeled 101 010). This makes your post much more readable.
This time I did it for you.
fin swimmer