3.0 years ago
bart
Hi,
I'm using StringTie for transcript assembly in Galaxy, with the gene abundance estimation output turned on so that TPM and FPKM values are also written out. What seems strange to me is that the gene abundance estimation files of some samples contain more lines than those of others. In other words, for a reason I don't understand, some genes are reported in one sample but not in another. Is this normal, and why does it happen?
I just found this answer by @Kevin Blighe (Stringtie output files):
"The one that was not included has coverage that falls below the threshold. It is virtually not expressed at all. Modify the -C and -c parameter to StringTie:
-C <cov_refs.gtf> StringTie outputs a file with the given name with all transcripts in the provided reference file that are fully covered by reads (requires -G).
-c <float> Sets the minimum read coverage allowed for the predicted transcripts. A transcript with a lower coverage than this value is not shown in the output. Default: 2.5"
Now my question is: can I simply fill in 0 for the FPKM and TPM values of all the genes that are missing from a sample's file? I'm trying to use these output files for differential expression analysis.
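To make the question concrete, the zero-filling being asked about could be sketched roughly as below. This is only an illustration under assumptions: it assumes each gene abundance file is tab-separated with a header row containing `Gene ID` and `TPM` columns and one row per gene ID, and the helper names (`read_values`, `merge_zero_filled`), sample names, gene IDs and values are all made up. It shows the mechanics only; whether a missing gene can validly be treated as 0 is exactly what is being asked.

```python
import csv
import io


def read_values(handle, value_column="TPM"):
    """Read one gene abundance table and return {gene_id: value}.

    Assumes a tab-separated table with a header row that contains
    'Gene ID' and the requested value column, one row per gene ID.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Gene ID"]: float(row[value_column]) for row in reader}


def merge_zero_filled(per_sample):
    """Combine {sample: {gene_id: value}} into {gene_id: [values per sample]}.

    Genes absent from a sample get 0.0 for that sample (the assumption
    under discussion, not an established rule).
    """
    samples = list(per_sample)
    all_genes = sorted(set().union(*(per_sample[s].keys() for s in samples)))
    matrix = {g: [per_sample[s].get(g, 0.0) for s in samples] for g in all_genes}
    return samples, matrix


# Made-up toy tables standing in for two per-sample output files.
HEADER = "Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM\n"
file_a = io.StringIO(
    HEADER
    + "g1\tGENE1\tchr1\t+\t1\t100\t5.0\t10.0\t20.0\n"
    + "g2\tGENE2\tchr1\t-\t200\t300\t1.2\t2.5\t5.0\n"
)
file_b = io.StringIO(
    HEADER
    + "g1\tGENE1\tchr1\t+\t1\t100\t4.5\t9.0\t18.0\n"
)

samples, matrix = merge_zero_filled(
    {"sample_a": read_values(file_a), "sample_b": read_values(file_b)}
)

for gene, values in matrix.items():
    print(gene, dict(zip(samples, values)))
```

Here `g2` is missing from the second table, so it ends up with a value of 0.0 for `sample_b`. With real files, the `io.StringIO` objects would be replaced by opened file handles.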