Dear users,
Can anyone suggest a script or pipeline to calculate LTR insertion time? I don't think codeml or PAML will work here. I have already wasted a lot of time on this. Please help me.
Thank you
The LTRpred pipeline now includes a rough dating of insertion time when de novo annotating functional and potentially mobile LTR retrotransposons.
In case you don't wish to de novo annotate an entire genome, but want to work with existing annotations, the function LTRpred::ltr_age_estimation() may be useful for this purpose.
In general, an example LTRpred run and the corresponding output looks as follows:
# de novo functional annotation of LTR retrotransposons in the Human Y chromosome
LTRpred::LTRpred(genome.file = system.file("Hsapiens_ChrY.fa", package = "LTRpred"))
The output table then includes the following information:
Observations: 21
Variables: 92
$ species <chr> "Hsapiens_ChrY", "Hsapien…
$ ID <chr> "Hsapiens_ChrY_LTR_retrot…
$ dfam_target_name <chr> NA, NA, NA, NA, NA, NA, N…
$ ltr_similarity <dbl> 80.73, 89.85, 79.71, 83.2…
$ ltr_age_mya <dbl> 0.7936246, 0.2831139, 0.7…
$ similarity <chr> "(80,82]", "(88,90]", "(7…
$ protein_domain <chr> "RVT_1", "RVT_1", NA, NA,…
$ orfs <int> 1, 1, 0, 0, 0, 0, 0, 1, 0…
$ chromosome <chr> "NC000024.10Homosa", "NC0…
$ start <int> 3143582, 3275798, 3313536…
$ end <int> 3162877, 3299928, 3318551…
$ strand <chr> "-", "-", "+", "+", "-", …
$ width <int> 19296, 24131, 5016, 12952…
$ annotation <chr> "LTR_retrotransposon", "L…
$ pred_tool <chr> "LTRpred", "LTRpred", "LT…
$ frame <chr> ".", ".", ".", ".", ".", …
$ score <chr> ".", ".", ".", ".", ".", …
$ lLTR_start <int> 3143582, 3275798, 3313536…
$ lLTR_end <int> 3143687, 3276408, 3313665…
$ lLTR_length <int> 106, 611, 130, 126, 218, …
$ rLTR_start <int> 3162769, 3299338, 3318414…
$ rLTR_end <int> 3162877, 3299928, 3318551…
$ rLTR_length <int> 109, 591, 138, 137, 219, …
$ lTSD_start <int> 3143578, 3275794, 3313532…
$ lTSD_end <int> 3143581, 3275797, 3313535…
$ lTSD_motif <chr> "acag", "ttgt", "ttag", "…
$ rTSD_start <int> 3162878, 3299929, 3318552…
$ rTSD_end <int> 3162881, 3299932, 3318555…
$ rTSD_motif <chr> "acag", "ttgt", "ttag", "…
$ PPT_start <int> NA, NA, NA, NA, NA, 34660…
$ PPT_end <int> NA, NA, NA, NA, NA, 34660…
$ PPT_motif <chr> NA, NA, NA, NA, NA, "agag…
$ PPT_strand <chr> NA, NA, NA, NA, NA, "+", …
$ PPT_offset <int> NA, NA, NA, NA, NA, 23, N…
$ PBS_start <int> NA, NA, 3313667, 3372512,…
$ PBS_end <int> NA, NA, 3313677, 3372522,…
$ PBS_strand <chr> NA, NA, "+", "+", "-", "+…
$ tRNA <chr> NA, NA, "Homo_sapiens_tRN…
$ tRNA_motif <chr> NA, NA, "aattagctgga", "c…
$ PBS_offset <int> NA, NA, 1, 3, 0, 5, 2, 5,…
$ tRNA_offset <int> NA, NA, 1, 0, 2, 5, 1, 5,…
$ `PBS/tRNA_edist` <int> NA, NA, 1, 1, 1, 1, 1, 1,…
$ orf.id <chr> "NC000024.10Homosa_314358…
$ repeat_region_length <int> 19304, 24139, 5024, 12960…
$ PPT_length <int> NA, NA, NA, NA, NA, 27, N…
$ PBS_length <int> NA, NA, 11, 11, 11, 11, 1…
$ dfam_acc <chr> NA, NA, NA, NA, NA, NA, N…
$ dfam_bits <dbl> NA, NA, NA, NA, NA, NA, N…
$ dfam_e_value <dbl> NA, NA, NA, NA, NA, NA, N…
$ dfam_bias <dbl> NA, NA, NA, NA, NA, NA, N…
$ `dfam_hmm-st` <dbl> NA, NA, NA, NA, NA, NA, N…
$ `dfam_hmm-en` <dbl> NA, NA, NA, NA, NA, NA, N…
$ dfam_strand <chr> NA, NA, NA, NA, NA, NA, N…
$ `dfam_ali-st` <dbl> NA, NA, NA, NA, NA, NA, N…
$ `dfam_ali-en` <dbl> NA, NA, NA, NA, NA, NA, N…
$ `dfam_env-st` <dbl> NA, NA, NA, NA, NA, NA, N…
$ `dfam_env-en` <dbl> NA, NA, NA, NA, NA, NA, N…
$ dfam_modlen <dbl> NA, NA, NA, NA, NA, NA, N…
$ dfam_target_description <chr> NA, NA, NA, NA, NA, NA, N…
$ Clust_Cluster <chr> NA, NA, NA, NA, NA, NA, N…
$ Clust_Target <chr> NA, NA, NA, NA, NA, NA, N…
$ Clust_Perc_Ident <dbl> NA, NA, NA, NA, NA, NA, N…
$ Clust_cn <int> NA, NA, NA, NA, NA, NA, N…
As you can see, the column ltr_age_mya stores the roughly estimated insertion time in millions of years.
I hope this helps.
Did you mean age estimation of an extant LTR retrotransposon (i.e. time passed since insertion)? If you are interested in a method based on 5'-3' LTR homology, I think you can find a function that calculates it in this as yet unpublished R tool from my collaborator: https://github.com/HajkD/LTRpred
You will waste more time unless you explain your problem in detail. What kind of data do you have? What have you tried so far?
Have a look at this tutorial: Ageing LTR insertions. Does it help?
There is some additional info here: make fasta sequence which is multiple of three ;)
Interesting link, btw, @h.mon
I have already tried it and I am getting an error.
That's a warning, not an error. It's generated by the pandas python module.
My pipeline was: 1) pairwise alignment of the LTR sequences using MUSCLE; 2) now I have to determine the Ks value for each pairwise alignment file.
Is the error you reported above related to the Python script in the last part of the pipeline? More specifically, this one?
I would also be tempted to go for Ks estimation (lots of software can do this), but this is probably not the best way forward, as Ks (the synonymous substitution rate) is meant for comparisons of protein-coding sequences. Since LTRs do not code for a protein, there is no selection pressure on a protein to speak of, and the basic assumption for applying Ks is not valid. I would thus go for the 'difference' calculations described on the GitHub page.
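To make the 'difference' idea concrete, here is a minimal sketch of how the proportion of differing sites between the two LTRs of one element could be computed from a pairwise alignment. It is not the script from the GitHub page; the helper names (read_fasta, p_distance) and the toy alignment are made up for illustration, and it assumes the aligned FASTA (e.g. from MUSCLE) contains exactly the 5' and 3' LTR with '-' as the gap character.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring columns that contain a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    diffs = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


# Toy alignment (hypothetical data, not from a real genome)
alignment = """>ltr5
ACGTAC-TACGT
>ltr3
ACGAACGTACGA
"""
seqs = read_fasta(alignment)
p = p_distance(seqs["ltr5"], seqs["ltr3"])
print(f"p-distance: {p:.4f}")
```

The raw proportion underestimates divergence for older elements because multiple hits at the same site are not counted, which is why a substitution model such as Kimura 2-parameter is usually applied on top.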
The first thing you should understand is that SettingWithCopyWarning is a warning, not an error. The real problem behind the warning is that it is generally difficult to predict whether a view or a copy is returned. In most cases, the warning is raised because you have chained two indexing operations together; SettingWithCopyWarning was created to flag such "chained assignment" operations. This is easy to spot when you have used [] (square brackets) twice in a row, but the same applies if you chain other access methods such as .loc[], .iloc[] and so on.
Moreover, you can change the behaviour of SettingWithCopyWarning using pd.options.mode.chained_assignment, which accepts three values: None, "raise" or "warn".
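As an illustration of the pattern that usually avoids the warning, here is a small sketch. The DataFrame and its column names (ltr_id, identity, young) are invented for the example and have nothing to do with the script in question; the chained form is shown only in a comment.

```python
import pandas as pd

# Hypothetical table of LTR pairs and their percent identity
df = pd.DataFrame(
    {"ltr_id": ["a", "b", "c"], "identity": [80.7, 89.9, 79.7]}
)

# Chained indexing (two [] steps) is what typically triggers the warning,
# because the first step may return a copy:
#     df[df["identity"] > 85]["young"] = True

# Single-step assignment with .loc on the original frame instead:
df["young"] = False
df.loc[df["identity"] > 85, "young"] = True

# If a subset is meant to be modified independently, copy it explicitly:
subset = df[df["identity"] > 85].copy()
subset["identity"] = subset["identity"].round()

print(df)
print(subset)
```

With the explicit .copy(), changes to subset are not expected to propagate back to df, so the intent is unambiguous.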
One recently published paper estimated the insertion time as follows: "DNA divergence between the sequence was estimated with the baseml program from PAML ver. 4.8 (Yang 2007) using the Kimura-2-parameter base substitution model". Since I am new to bioinformatics, and especially to evolution, I do not understand how the authors used the baseml output to calculate LTR insertion time.
Thanks for your time, guys.
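On the baseml question: the usual approach is to take the divergence K between the 5' and 3' LTR of the same element and convert it to time with T = K / (2r), where r is the substitution rate per site per year. The factor 2 is there because both LTRs were identical at insertion and have accumulated substitutions independently since then. The sketch below uses the closed-form Kimura 2-parameter distance as a stand-in for the maximum-likelihood estimate that baseml would report, so it is not the paper's exact procedure. The function names are made up, and the rate of 1.3e-8 is only a placeholder; choose a rate appropriate for your species.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences.

    Columns with gaps or ambiguous bases are skipped.
    """
    sites = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        # Same class (purine/purine or pyrimidine/pyrimidine) = transition
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q)
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P formula")
    return -0.5 * math.log(inner)


def insertion_time(k, rate):
    """Years since insertion, T = K / (2 * r)."""
    return k / (2 * rate)


# Toy example (hypothetical sequences): one transition, one transversion
ltr5 = "ACGTACGTACGTACGTACGT"
ltr3 = "GAGTACGTACGTACGTACGT"
k = k2p_distance(ltr5, ltr3)
rate = 1.3e-8  # ASSUMPTION: placeholder substitutions/site/year
print(f"K = {k:.4f}, T = {insertion_time(k, rate) / 1e6:.2f} million years")
```

If you run baseml instead, the pairwise distance it reports under the K80 model would take the place of k in the last step.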