Question: TransTermHP predictions are all on the "+" strand

Asked 2.8 years ago by Carlos Caicedo

I ran TransTermHP with default parameters. The results look similar to this:

Seqid   Source  Type    Start   End Score   Strand  Phase   Attributes
##gff-version 3
##sequence-region NZ_CP027858.1 1 6748591
NZ_CP027858.1   annotation  remark  1   6748591 .   .   .   gff-version=3;sequence-region=%28%27NZ_CP027858.1%27%2C 0%2C 6748591%29,%28%27NZ_CP027859.1%27%2C 0%2C 1795495%29;species=https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi%3Fid%3D1901
NZ_CP027858.1   TransTermHP_2.09    terminator  3074    3111    90  +   .   ID=TERM 1
NZ_CP027858.1   TransTermHP_2.09    terminator  15364   15401   80  +   .   ID=TERM 2
NZ_CP027858.1   TransTermHP_2.09    terminator  53418   53453   77  +   .   ID=TERM 3
NZ_CP027858.1   TransTermHP_2.09    terminator  63742   63775   91  +   .   ID=TERM 4
NZ_CP027858.1   TransTermHP_2.09    terminator  93259   93300   88  +   .   ID=TERM 5

As you can see, all the predictions are on the + strand. Do you have any idea why this happens? The .coords file that I used contains only the following columns:

gene_ID    start    end    ID

So I cannot figure out how the software detects the strand when this information is not provided. On the other hand, when I ran the software on a different genome using a .ptt file instead of a .coords file, the results were as expected, with predictions on both the minus and plus strands. Unfortunately, I cannot use a .ptt file for my genome of interest because one is not available for download.

I would appreciate any ideas you can give me.

Tags: genomics, prediction, terminator


There's a section in USAGE.txt called '10. USING TRANSTERM WITHOUT GENOME ANNOTATIONS'. Did you try that approach to see whether it changes the outcome? That section also explains specifically how the gene information is used.

I do note, though, that the output you show doesn't match the example under section '3. FORMAT OF THE TRANSTERM OUTPUT' of USAGE.txt, so maybe the documentation is outdated in some ways? The documented format seems to show the strand information before the confidence score, and there should also be a loc field in the results: a letter indicating the placement relative to the gene. I don't see that in yours. All this is to say that some things are inconsistent, so I hope the approach of running without annotations works.
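To compare the raw output against that documented layout, here is a minimal Python sketch that parses one TERM line into its fields. The regular expression assumes the column order of the example in section 3 of USAGE.txt (name, start - end, strand, loc, confidence, hairpin score, tail score, then notes); it is not taken from TransTermHP's own code, column spacing may differ between versions, and `parse_term_line` and `Terminator` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMED layout of a raw TransTermHP TERM line (after USAGE.txt section 3):
#   TERM 19  15310 - 15327   -      F     99      -12.7 -4.0 |bidir
TERM_RE = re.compile(
    r"^\s*TERM\s+(?P<num>\d+)\s+"
    r"(?P<start>\d+)\s+-\s+(?P<end>\d+)\s+"
    r"(?P<strand>[+-])\s+"
    r"(?P<loc>\S+)\s+"
    r"(?P<conf>\d+)\s+"
    r"(?P<hp>-?\d+(?:\.\d+)?)\s+"
    r"(?P<tail>-?\d+(?:\.\d+)?)"
)


@dataclass
class Terminator:
    num: int
    start: int
    end: int
    strand: str  # "+" or "-"
    loc: str     # letter code for placement relative to the genes
    conf: int
    hp: float
    tail: float


def parse_term_line(line: str) -> Optional[Terminator]:
    """Return a Terminator for a TERM line, or None for any other line."""
    m = TERM_RE.match(line)
    if m is None:
        return None
    return Terminator(
        num=int(m["num"]),
        start=int(m["start"]),
        end=int(m["end"]),
        strand=m["strand"],
        loc=m["loc"],
        conf=int(m["conf"]),
        hp=float(m["hp"]),
        tail=float(m["tail"]),
    )
```

If the strand and loc columns come back as expected from the raw file, any missing fields in the tabular version were most likely introduced by the reformatting step rather than by TransTermHP.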

Reply by Wayne, 2.8 years ago

Thanks for your response.

Yes, I am currently using the approach you mentioned, because it gives me the location (start-end) of each terminator in the genome, and with that information I can carry out the downstream analyses I need. What is a little intriguing is that the number of predicted terminators increased slightly.

Regarding your second comment, I used a Python script (downloaded from GitHub) to convert the TransTermHP output into that tabular format. The script uses the GFF3 annotation to create a .coords file instead of a .ptt file, which, to my knowledge, is no longer supported.
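Because the .coords file shown in the question has no strand column, one thing worth checking is how the strand gets encoded. My reading of the .coords description in USAGE.txt is that a gene's strand is inferred from the order of its coordinates, with reverse-strand genes written as start > end; please verify this against your copy of the documentation. Under that assumption, a minimal Python sketch of a GFF3 to .coords conversion could look like this (`gff3_to_coords` is a hypothetical helper, not part of the Galaxy script):

```python
from typing import Iterable, Iterator


def gff3_to_coords(gff_lines: Iterable[str], feature_type: str = "CDS") -> Iterator[str]:
    """Yield .coords lines of the form: gene_id start end seq_id.

    ASSUMPTION: TransTermHP takes a gene's strand from the coordinate
    order, so reverse-strand features are written with start > end.
    """
    for line in gff_lines:
        # Skip blank lines and GFF3 directives/comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # Use the ID= attribute as the gene name, else a positional fallback.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("ID", f"{seqid}_{start}_{end}")
        if strand == "-":
            start, end = end, start  # encode reverse strand as start > end
        yield f"{name} {start} {end} {seqid}"
```

Writing these lines to a file gives a .coords input; the last column is presumably meant to match the sequence ID in the FASTA header. If a converter writes every gene as start < end, all genes would look like forward-strand genes under this assumption.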

Reply by Carlos Caicedo, 2.8 years ago

As described in USAGE.txt, a difference in the number of candidate terminators is possible when you don't specify genes, because of how the program handles the 'background' GC percentage when computing scores. The 'background' GC percentage comes from contrasting genes with intergenic regions, so the scores are adjusted differently if the GC percentages of your genes and intergenic regions do indeed differ. That must be the case in your genome, since it makes a difference.
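To check whether this is plausible for a given genome, one can compare the GC fraction inside annotated genes with that of the remaining intergenic sequence. A minimal Python sketch, assuming the genome is a single sequence string and genes are 1-based inclusive (start, end) pairs such as those in a .coords file; it is only a diagnostic and does not reproduce how TransTermHP computes its scores. The function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def genic_vs_intergenic_gc(genome: str, genes) -> tuple:
    """Return (genic GC, intergenic GC) for 1-based inclusive gene spans.

    Spans may be given with start > end (reverse strand); overlapping
    genes are fine, and spans are clipped to the genome length.
    """
    in_gene = bytearray(len(genome))  # 1 marks a position covered by a gene
    for start, end in genes:
        lo = max(1, min(start, end))
        hi = min(max(start, end), len(genome))
        if lo > hi:
            continue
        in_gene[lo - 1:hi] = b"\x01" * (hi - lo + 1)
    genic = "".join(c for c, flag in zip(genome, in_gene) if flag)
    intergenic = "".join(c for c, flag in zip(genome, in_gene) if not flag)
    return gc_fraction(genic), gc_fraction(intergenic)
```

For example, `genic_vs_intergenic_gc("AAAAGGGGCCCCAAAA", [(12, 5)])` returns `(1.0, 0.0)`: positions 5 to 12 are all G or C, and the flanks are all A. A large gap between the two values on a real genome would be consistent with the explanation above.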

So what you posted as the results of TransTermHP weren't the direct results? They were reformatted by a script? It's best to share the full process when posting to these sorts of forums, so that those helping you know what is involved. It's also best to link to the resources so they can be tested or used by others looking to assist or learn. Are there plus- and minus-strand terminators in the output that comes directly from TransTermHP? Does the script handle the output flawlessly?
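One quick way to answer that is to tally the strands in both files and see whether they agree. A minimal Python sketch: the raw-line regex assumes the TERM layout from section 3 of USAGE.txt, and the GFF side assumes terminators are written with type `terminator`, as in the table in the question. Both function names are hypothetical.

```python
import re
from collections import Counter

# ASSUMED raw TERM layout (USAGE.txt section 3):
#   TERM 19  15310 - 15327   -      F     99  ...
RAW_STRAND_RE = re.compile(r"^\s*TERM\s+\d+\s+\d+\s+-\s+\d+\s+([+-])\s")


def raw_strand_counts(lines) -> Counter:
    """Count '+' and '-' terminators in raw TransTermHP output lines."""
    counts = Counter()
    for line in lines:
        m = RAW_STRAND_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def gff_strand_counts(lines, feature_type: str = "terminator") -> Counter:
    """Count values in the strand column (7th) of matching GFF3 features."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        # Split on any whitespace, at most 8 times, so the attributes
        # column (which may contain spaces, e.g. "ID=TERM 1") stays whole.
        cols = line.rstrip("\n").split(None, 8)
        if len(cols) >= 9 and cols[2] == feature_type:
            counts[cols[6]] += 1
    return counts
```

If `raw_strand_counts` shows minus-strand terminators but `gff_strand_counts` does not, the strand is being lost during reformatting; if both show only "+", the issue is upstream, in the input files or the run itself.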

By the way, this paper found the results of TransTermHP to be inferior to those of other software.

Reply by Wayne, 2.8 years ago

Thank you very much for the paper. It looks quite interesting.

I will take your point about the 'background' GC percentage into account.

I did not show the original TransTermHP results because they are exactly the same, so I didn't think it was necessary. The script only presents them in a more human-readable way.

This is the script I mentioned:

https://github.com/galaxyproject/tools-iuc/blob/master/tools/transtermhp/transtermhp.py

Reply by Carlos Caicedo, 2.8 years ago