I ran TransTermHP with default parameters. The results are similar to this:
Seqid Source Type Start End Score Strand Phase Attributes
##gff-version 3
##sequence-region NZ_CP027858.1 1 6748591
NZ_CP027858.1 annotation remark 1 6748591 . . . gff-version=3;sequence-region=%28%27NZ_CP027858.1%27%2C 0%2C 6748591%29,%28%27NZ_CP027859.1%27%2C 0%2C 1795495%29;species=https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi%3Fid%3D1901
NZ_CP027858.1 TransTermHP_2.09 terminator 3074 3111 90 + . ID=TERM 1
NZ_CP027858.1 TransTermHP_2.09 terminator 15364 15401 80 + . ID=TERM 2
NZ_CP027858.1 TransTermHP_2.09 terminator 53418 53453 77 + . ID=TERM 3
NZ_CP027858.1 TransTermHP_2.09 terminator 63742 63775 91 + . ID=TERM 4
NZ_CP027858.1 TransTermHP_2.09 terminator 93259 93300 88 + . ID=TERM 5
As you can see, all the predictions are on the + strand. Do you have any ideas about this outcome?
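A quick way to confirm the strand distribution over the whole file, rather than the first few rows, is to tally the strand column. This is a minimal sketch, assuming the output is GFF3-like with the strand in the seventh field and `terminator` as the feature type, as in the rows above. The function name `strand_counts` is made up for illustration.

```python
from collections import Counter


def strand_counts(gff_lines, feature_type="terminator"):
    """Tally the strand column (7th field) of GFF3-like feature lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.strip()
        # Skip blank lines and GFF3 directives/comments.
        if not line or line.startswith("#"):
            continue
        # Split on any whitespace, keeping the attributes field whole.
        fields = line.split(None, 8)
        if len(fields) < 8 or fields[2] != feature_type:
            continue
        counts[fields[6]] += 1
    return counts


example = [
    "##gff-version 3",
    "NZ_CP027858.1 TransTermHP_2.09 terminator 3074 3111 90 + . ID=TERM 1",
    "NZ_CP027858.1 TransTermHP_2.09 terminator 15364 15401 80 + . ID=TERM 2",
]
print(strand_counts(example))
```

Running the same tally on the raw TransTermHP output and on the reformatted file would show whether the strands are lost during reformatting or were never predicted.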
The .coords file that I used contains only the following fields:
gene_ID start end ID
Thus, I cannot figure out how the software detects the strand when that information is not provided.
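One thing worth checking here, stated as an assumption rather than a confirmed answer: my understanding is that TransTermHP has no strand column in .coords and instead treats a gene whose start is greater than its end as lying on the reverse strand. Please verify this against the .coords section of USAGE.txt. If that holds, a .coords file written with start < end for every gene would describe all genes as forward-strand. The sketch below, with a hypothetical function name `gff3_to_coords`, writes .coords rows from tab-separated GFF3 CDS lines and swaps the coordinates for minus-strand features.

```python
def gff3_to_coords(gff_lines, feature_type="CDS"):
    """Return 'gene_id<TAB>start<TAB>end<TAB>seqid' rows from GFF3 lines.

    ASSUMPTION (verify in USAGE.txt): reverse-strand genes are encoded
    by writing start > end, since .coords has no strand column.
    """
    rows = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        seqid, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Parse key=value attributes; fall back to a positional ID.
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        gene_id = attrs.get("ID", f"{seqid}_{start}_{end}")
        if strand == "-":
            start, end = end, start  # encode reverse strand as start > end
        rows.append(f"{gene_id}\t{start}\t{end}\t{seqid}")
    return rows
```

Comparing a few minus-strand genes in the GFF3 file against their rows in the .coords file shows quickly whether the file in use follows this convention.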
On the other hand, when I ran the software with a .ptt file instead of a .coords file, on a different genome, the results were as expected, with predictions on both the minus and plus strands.
By the way, I cannot use a .ptt file for my genome of interest because one is not available for download.
I would appreciate any ideas you can give me.
There's a section in the USAGE.txt called '10. USING TRANSTERM WITHOUT GENOME ANNOTATIONS'. Did you try that approach to see whether it changes the outcome? That section also explains specifically how the gene information is used.
I do note, though, that the output you show and the example under section '3. FORMAT OF THE TRANSTERM OUTPUT' of the USAGE.txt don't match, so maybe the documentation is outdated in some ways? It seems to show the strand information before the confidence score. And there should be a loc in the results, which is a letter indicating placement relative to the gene. I don't see that in yours. All that is to say that some things are not consistent, so I hope the approach of running without the annotations works.
Thanks for your response.
Yes, I am currently using the approach you mentioned, because it gives me the location (start-end) of each terminator in the genome, and with that information I can carry out the subsequent analyses I need. What is a little intriguing is that the number of predicted terminators increased slightly.
Regarding your second comment, I used a Python script (downloaded from GitHub) to format the output of TransTermHP into that tabular format. It uses GFF3 to create a .coords file instead of using .ptt, which, to my knowledge, is no longer supported.
As described in the USAGE.txt, a difference in the number of candidate terminators is possible when you don't specify genes, because of how it handles the 'background' GC percentage when computing the scores. The 'background' GC percentage comes from contrasting genes with intergenic regions. Hence, the scores are adjusted differently if the GC percentages of your genes and intergenic regions differ. That must be the case in your genome, because it makes a difference.
So what you posted as the results of TransTermHP weren't the direct results? They were reformatted via a script? It is probably best to share the full process when posting to these sorts of forums, so those helping you know what is involved. It is also best to link to the resources so that they can be tested or used by others looking to assist or learn. Are there plus- and minus-strand predictions in the output direct from TransTermHP? Does the script handle the output flawlessly?
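On the background GC point, a rough way to see whether genes and intergenic regions actually differ in GC content for a given genome is sketched below. It is only an illustrative check and not TransTermHP's own scoring. It assumes the genome is a plain sequence string and the genes are 1-based inclusive (start, end) pairs, in either order. The function names are made up for the example.

```python
def gc_percent(seq):
    """GC percentage over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def genic_vs_intergenic_gc(genome, genes):
    """Return (genic GC%, intergenic GC%) for one sequence.

    genes: iterable of 1-based inclusive (start, end) pairs; the order
    of start and end does not matter.
    """
    mask = bytearray(len(genome))  # 1 where a position is inside a gene
    for start, end in genes:
        lo, hi = sorted((start, end))
        mask[lo - 1:hi] = b"\x01" * (hi - lo + 1)
    genic = "".join(base for base, m in zip(genome, mask) if m)
    intergenic = "".join(base for base, m in zip(genome, mask) if not m)
    return gc_percent(genic), gc_percent(intergenic)


# Toy example: two GC-rich "genes" flanking an AT-rich spacer.
print(genic_vs_intergenic_gc("GGCCATATGCGC", [(1, 4), (12, 9)]))
```

A clear gap between the two percentages would be consistent with the scores, and so the number of candidates, shifting when the run is done without gene annotations.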
By the way, this paper found the results of transtermhp to be inferior to other software.
Thank you very much for the paper. It looks quite interesting.
I will take into account your recommendation about the 'background' GC percentage.
I did not show the original results of TransTermHP because they are exactly the same, so I considered it unnecessary. The script only presents them in a more human-readable way.
This is the script mentioned.
https://github.com/galaxyproject/tools-iuc/blob/master/tools/transtermhp/transtermhp.py