6.4 years ago
paraskevopou
Hi all! I would like to estimate the dN/dS ratio for each of my genes separately. I annotated my SNPs with vcfannotator and ended up with a tab-delimited .txt file like the following (the complete table has 50,000 rows). "SYN" stands for synonymous and "NSY" for non-synonymous substitutions. I don't have an alignment between the two populations, so PAML is not an option for me. Is there a program or a custom script that estimates the dN/dS ratio from this kind of data? Any help would be very much appreciated. Thank you.
TRINITY_DN10000_c0_g1.p1 NSY
TRINITY_DN10000_c0_g1.p1 SYN
TRINITY_DN10000_c0_g1.p1 NSY
TRINITY_DN10000_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 NSY
TRINITY_DN10001_c0_g1.p1 NSY
TRINITY_DN10001_c0_g1.p1 NSY
TRINITY_DN10001_c0_g1.p1 NSY
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 NSY
TRINITY_DN10001_c0_g1.p1 NSY
TRINITY_DN10001_c0_g1.p1 NSY
TRINITY_DN10001_c0_g1.p1 NSY
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c0_g1.p1 SYN
TRINITY_DN10001_c1_g1.p1 SYN
TRINITY_DN10001_c1_g1.p1 SYN
TRINITY_DN10001_c1_g1.p1 SYN
TRINITY_DN10001_c1_g1.p1 SYN
TRINITY_DN10001_c1_g1.p1 NSY
TRINITY_DN10001_c1_g1.p1 SYN
TRINITY_DN10001_c1_g1.p1 SYN
TRINITY_DN10001_c1_g1.p1 NSY
TRINITY_DN10001_c1_g1.p1 SYN
TRINITY_DN10002_c0_g1.p1 SYN
TRINITY_DN10002_c0_g1.p1 NSY
TRINITY_DN10002_c0_g1.p1 NSY
TRINITY_DN10005_c0_g1.p1 SYN
TRINITY_DN10005_c0_g1.p1 NSY
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 NSY
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 NSY
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 SYN
TRINITY_DN10006_c0_g1.p1 NSY
TRINITY_DN10007_c6_g1.p1 NSY
TRINITY_DN10008_c0_g1.p1 SYN
TRINITY_DN10008_c0_g1.p1 SYN
TRINITY_DN10008_c0_g1.p1 SYN
TRINITY_DN10008_c0_g1.p1 SYN
TRINITY_DN10008_c0_g1.p1 SYN
TRINITY_DN10008_c0_g1.p1 NSY
TRINITY_DN10008_c0_g1.p1 SYN
TRINITY_DN10008_c0_g1.p1 SYN
TRINITY_DN10008_c0_g1.p1 SYN
TRINITY_DN10008_c0_g1.p1 SYN
TRINITY_DN10008_c0_g1.p1 SYN
TRINITY_DN10008_c0_g1.p1 SYN
TRINITY_DN10009_c0_g1.p1 SYN
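A table like the one above can at least be tallied per gene. The following is a minimal Python sketch, assuming each row is a gene ID and an effect label separated by whitespace; the function name `count_effects` and the sample rows are illustrative only. Note that the raw NSY/SYN count ratio is not dN/dS, because the counts are not yet normalized by the number of non-synonymous and synonymous sites in each gene.

```python
from collections import Counter, defaultdict


def count_effects(lines):
    """Tally effect labels per gene from 'gene<whitespace>effect' rows.

    Hypothetical helper: returns {gene: Counter({'SYN': n, 'NSY': m})}.
    """
    counts = defaultdict(Counter)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed rows
        gene, effect = parts
        counts[gene][effect] += 1
    return counts


# A few rows copied from the table in the post.
rows = [
    "TRINITY_DN10000_c0_g1.p1\tNSY",
    "TRINITY_DN10000_c0_g1.p1\tSYN",
    "TRINITY_DN10000_c0_g1.p1\tNSY",
    "TRINITY_DN10000_c0_g1.p1\tSYN",
    "TRINITY_DN10002_c0_g1.p1\tSYN",
    "TRINITY_DN10002_c0_g1.p1\tNSY",
    "TRINITY_DN10002_c0_g1.p1\tNSY",
]

counts = count_effects(rows)
for gene, c in counts.items():
    # Columns: gene, non-synonymous count, synonymous count
    print(gene, c["NSY"], c["SYN"], sep="\t")
```

For the full file, the same function could be fed an open file handle instead of the `rows` list.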
I am a little confused by the table, as it doesn't seem to indicate the positions of the NSY and SYN SNPs. You could potentially generate an alternate reference sequence with BCFtools consensus, using the original reference that the SNPs were called against and the VCF file of variants. You would then need to align the original and alternate references.
Sorry, maybe my description was not precise. The real output looks like the entry below, with the total length of the CDS, the position of the SNP in the CDS, and the codon position of the SNP. I would like to avoid making alignments between the reference and the alternative and running PAML. Is there another way to estimate dN/dS per gene (e.g. "TRINITY_DN10000_c0_g1.p1") with this kind of information?
Example of one entry of the output file:
TRINITY_DN10000_c0_g1.p1 CDS ORF type:complete len:316 (+) score=92.01 GLCM_MOUSE|50.804|7.06e-99 trans_orient:+ loc_in_cds:232 codon_pos:1 codon:Gaa-Aaa Glu-78-Lys (NSY)
Thanks a lot
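The fields in an entry like the one above can be pulled out with a regular expression. This is only a sketch, assuming every line follows the same layout as that single example; the pattern `ENTRY_RE` and the function `parse_entry` are hypothetical names, and other entries in the real file may need a looser pattern.

```python
import re

# Pattern written against the single example entry in the post (an assumption).
ENTRY_RE = re.compile(
    r"^(?P<gene>\S+).*?\blen:(?P<len>\d+).*?\bloc_in_cds:(?P<loc>\d+)"
    r"\s+codon_pos:(?P<codon_pos>[123])"
    r"\s+codon:(?P<ref_codon>[A-Za-z]{3})-(?P<alt_codon>[A-Za-z]{3})"
    r".*\((?P<effect>SYN|NSY)\)\s*$"
)


def parse_entry(line):
    """Return a dict of fields for one annotated SNP line, or None if no match."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return {
        "gene": m.group("gene"),
        "len": int(m.group("len")),              # length as reported in the header
        "loc_in_cds": int(m.group("loc")),       # 1-based position in the CDS
        "codon_pos": int(m.group("codon_pos")),  # 1, 2 or 3
        "ref_codon": m.group("ref_codon").upper(),
        "alt_codon": m.group("alt_codon").upper(),
        "effect": m.group("effect"),
    }


example = (
    "TRINITY_DN10000_c0_g1.p1 CDS ORF type:complete len:316 (+) score=92.01 "
    "GLCM_MOUSE|50.804|7.06e-99 trans_orient:+ loc_in_cds:232 codon_pos:1 "
    "codon:Gaa-Aaa Glu-78-Lys (NSY)"
)
record = parse_entry(example)
# The codon number follows from the CDS position: (232 - 1) // 3 + 1
codon_number = (record["loc_in_cds"] - 1) // 3 + 1
print(record["gene"], record["effect"], record["ref_codon"], record["alt_codon"], codon_number)
```

The derived codon number can be compared with the amino-acid change in the entry (Glu-78-Lys) as a sanity check on the parsing.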
I am sorry, but I am not aware of any tools that can calculate dN/dS with the input file you describe.
My initial idea was to count the (SYN) and (NSY) SNPs per gene and then calculate the number of synonymous and non-synonymous sites in order to estimate dN/dS. But I realized that this is not that straightforward. I have now used BCFtools consensus to create the alternative reference and created the individual alignments, and I want to make pairwise comparisons. Do you think I can use yn00 from PAML for that?
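For reference, the site-counting idea described above can be sketched in the style of the Nei-Gojobori (1986) method. This is a rough illustration and not a replacement for yn00 or KaKs_Calculator. It assumes the standard genetic code, treats a change to a stop codon as non-synonymous (one of several conventions), treats every SNP as an independent difference, and applies a Jukes-Cantor correction. All function names are hypothetical.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Expected number of synonymous sites (0 to 3) in one codon."""
    aa = CODON_TABLE[codon]
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Assumption: a change to a stop codon counts as non-synonymous.
            if CODON_TABLE[mutant] == aa:
                s += 1 / 3
    return s


def count_sites(cds):
    """Return (S, N): synonymous and non-synonymous site totals for a CDS."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Skip stop codons and codons with ambiguous bases.
    codons = [c for c in codons if CODON_TABLE.get(c, "*") != "*"]
    s_total = sum(syn_sites(c) for c in codons)
    return s_total, 3 * len(codons) - s_total


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p; NaN when p >= 0.75."""
    if p >= 0.75:
        return float("nan")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(cds, n_syn, n_nsy):
    """Rough dN/dS from a CDS and its SYN/NSY SNP counts; NaN if undefined."""
    s_total, n_total = count_sites(cds)
    if s_total == 0 or n_total == 0:
        return float("nan")
    d_s = jukes_cantor(n_syn / s_total)
    d_n = jukes_cantor(n_nsy / n_total)
    if math.isnan(d_s) or math.isnan(d_n) or d_s == 0:
        return float("nan")
    return d_n / d_s


# Toy CDS (made up for illustration): ATG GGG TTA
S, N = count_sites("ATGGGGTTA")
print(round(S, 3), round(N, 3))
```

Genes with very few SNPs (many in the table have only one or two) will give undefined or very noisy ratios with this kind of estimate, which is one reason the alignment-based tools discussed in this thread are usually preferred.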
See the comment in "A: Best Practices/Softwares To Calculate Ka/Ks Ratio" about using PAML to calculate pairwise dN/dS. I've used KaKsCalculator2 (https://sourceforge.net/projects/kakscalculator2/) for this purpose, though.
Sorry for spamming you again. Do you have any suggestions as to which of the models available in the KaKs-calculator package I should use? I thought at first that the YN model would be sufficient, but then I came across a paper describing γ models such as γ-YN and got a bit confused. MA is the default model, but can it also be applied to pairwise analysis? Thanks!
I am not sure what to recommend. I've used the NG model when I have had, say, one hundred thousand different regions for which to calculate Ka/Ks in pairwise comparisons. Maybe others will chime in. Sorry.