Question

Estimate dN/dS ratio for multiple genes

1

6.8 years ago

paraskevopou ▴ 20

Hi all! I would like to estimate the dN/dS ratio for each of my genes separately. I annotated my SNPs with vcfannotator and ended up with a tab-delimited .txt file like the following (the complete table has 50000 rows). "SYN" stands for synonymous and "NSY" for non-synonymous substitutions. I don't have an alignment between the two populations, so PAML is not an option for me. Is there a program or a custom script that estimates the dN/dS ratio from this kind of data? Any help would be very much appreciated. Thank you.

TRINITY_DN10000_c0_g1.p1    NSY
TRINITY_DN10000_c0_g1.p1    SYN
TRINITY_DN10000_c0_g1.p1    NSY
TRINITY_DN10000_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    NSY
TRINITY_DN10001_c0_g1.p1    NSY
TRINITY_DN10001_c0_g1.p1    NSY
TRINITY_DN10001_c0_g1.p1    NSY
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    NSY
TRINITY_DN10001_c0_g1.p1    NSY
TRINITY_DN10001_c0_g1.p1    NSY
TRINITY_DN10001_c0_g1.p1    NSY
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c0_g1.p1    SYN
TRINITY_DN10001_c1_g1.p1    SYN
TRINITY_DN10001_c1_g1.p1    SYN
TRINITY_DN10001_c1_g1.p1    SYN
TRINITY_DN10001_c1_g1.p1    SYN
TRINITY_DN10001_c1_g1.p1    NSY
TRINITY_DN10001_c1_g1.p1    SYN
TRINITY_DN10001_c1_g1.p1    SYN
TRINITY_DN10001_c1_g1.p1    NSY
TRINITY_DN10001_c1_g1.p1    SYN
TRINITY_DN10002_c0_g1.p1    SYN
TRINITY_DN10002_c0_g1.p1    NSY
TRINITY_DN10002_c0_g1.p1    NSY
TRINITY_DN10005_c0_g1.p1    SYN
TRINITY_DN10005_c0_g1.p1    NSY
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    NSY
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    NSY
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    SYN
TRINITY_DN10006_c0_g1.p1    NSY
TRINITY_DN10007_c6_g1.p1    NSY
TRINITY_DN10008_c0_g1.p1    SYN
TRINITY_DN10008_c0_g1.p1    SYN
TRINITY_DN10008_c0_g1.p1    SYN
TRINITY_DN10008_c0_g1.p1    SYN
TRINITY_DN10008_c0_g1.p1    SYN
TRINITY_DN10008_c0_g1.p1    NSY
TRINITY_DN10008_c0_g1.p1    SYN
TRINITY_DN10008_c0_g1.p1    SYN
TRINITY_DN10008_c0_g1.p1    SYN
TRINITY_DN10008_c0_g1.p1    SYN
TRINITY_DN10008_c0_g1.p1    SYN
TRINITY_DN10008_c0_g1.p1    SYN
TRINITY_DN10009_c0_g1.p1    SYN

snp • 2.9k views

ADD COMMENT • link updated 6.7 years ago by Biostar 20 • written 6.8 years ago by paraskevopou ▴ 20
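
As a first step with a two-column table like the one in the question, the SYN and NSY calls can be tallied per gene. The sketch below is only an illustration: the function name `count_by_gene` and the inline `sample` rows (copied from the table above) are assumptions for the example, not part of vcfannotator. Note that these raw counts are not yet dN/dS, because they are not normalized by the number of synonymous and non-synonymous sites in each gene.

```python
from collections import Counter, defaultdict


def count_by_gene(lines):
    """Tally SYN and NSY labels per gene from 'gene<whitespace>label' lines."""
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gene, label = fields
        counts[gene][label] += 1
    return counts


# A few rows copied from the table in the question (hypothetical subset).
sample = """\
TRINITY_DN10000_c0_g1.p1    NSY
TRINITY_DN10000_c0_g1.p1    SYN
TRINITY_DN10000_c0_g1.p1    NSY
TRINITY_DN10000_c0_g1.p1    SYN
TRINITY_DN10002_c0_g1.p1    SYN
TRINITY_DN10002_c0_g1.p1    NSY
TRINITY_DN10002_c0_g1.p1    NSY
"""

counts = count_by_gene(sample.splitlines())
for gene, c in sorted(counts.items()):
    print(gene, "NSY:", c["NSY"], "SYN:", c["SYN"])
```

For the full 50000-row file, the same function could be fed an open file handle instead of `sample.splitlines()`.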

1

I am a little confused by the table, as it doesn't seem to indicate the positions of the NSY and SYN SNPs. You could potentially generate an alternate reference sequence with BCFtools consensus, using the original reference that the SNPs were called against and the VCF file of variants. You would then need to align the original and alternate reference sequences.

ADD REPLY • link 6.8 years ago by jean.elbers ★ 1.7k

0

Sorry, maybe my description was not precise. The real output looks like the entry below: it includes the total length of the CDS, the position of the SNP in the CDS, and the codon position of the SNP. I would like to avoid aligning the reference and alternative sequences and running PAML. Is there another way to estimate dN/dS per gene (e.g. "TRINITY_DN10000_c0_g1.p1") with this kind of information?

Example of one entry from the output file:

TRINITY_DN10000_c0_g1.p1 CDS ORF type:complete len:316 (+) score=92.01 GLCM_MOUSE|50.804|7.06e-99 trans_orient:+ loc_in_cds:232 codon_pos:1 codon:Gaa-Aaa Glu-78-Lys (NSY)

Thanks a lot

ADD REPLY • link 6.8 years ago by paraskevopou ▴ 20
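
The fields in the example entry above (position in the CDS, codon position, codon change, SYN/NSY label) can be pulled out with a regular expression. This is a minimal sketch that assumes every line follows exactly the layout of that single example; `ENTRY_RE` and `parse_entry` are hypothetical names, and the pattern may need adjusting for other entries in the real file.

```python
import re

# Pattern modelled on the single example entry in the thread (an assumption:
# other lines in the real file may deviate from this layout).
ENTRY_RE = re.compile(
    r"^(?P<gene>\S+)\s.*?\blen:(?P<len>\d+)\b.*?"
    r"\bloc_in_cds:(?P<loc_in_cds>\d+)\s+"
    r"codon_pos:(?P<codon_pos>[123])\s+"
    r"codon:(?P<ref_codon>[A-Za-z]{3})-(?P<alt_codon>[A-Za-z]{3})\s+"
    r"(?P<aa_change>\S+)\s+\((?P<kind>SYN|NSY)\)\s*$"
)


def parse_entry(line):
    """Return the annotation fields as a dict, or None if the line does not match."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    for key in ("len", "loc_in_cds", "codon_pos"):
        entry[key] = int(entry[key])
    return entry


example = (
    "TRINITY_DN10000_c0_g1.p1 CDS ORF type:complete len:316 (+) score=92.01 "
    "GLCM_MOUSE|50.804|7.06e-99 trans_orient:+ loc_in_cds:232 codon_pos:1 "
    "codon:Gaa-Aaa Glu-78-Lys (NSY)"
)

entry = parse_entry(example)
print(entry["gene"], entry["loc_in_cds"], entry["ref_codon"], entry["alt_codon"], entry["kind"])
```

The parsed values are internally consistent for this entry: position 232 in the CDS falls in codon (232 - 1) // 3 + 1 = 78, at codon position 1, matching `Glu-78-Lys`.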

0

I am sorry, but I am not aware of any tools that can calculate dN/dS with the input file you describe.

ADD REPLY • link 6.8 years ago by jean.elbers ★ 1.7k

0

My initial idea was to count the (SYN) and (NSY) SNPs per gene and then calculate the number of synonymous and non-synonymous sites in order to estimate dN/dS. But I realized that this is not that straightforward. I have now used BCFtools consensus to create the alternative reference and built the individual alignments, and I want to make pairwise comparisons. Do you think I can use yn00 from PAML for that?

ADD REPLY • link 6.8 years ago by paraskevopou ▴ 20
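
To illustrate the "count the sites" idea described above, here is a rough sketch in the spirit of the Nei-Gojobori counting scheme: each codon position contributes the fraction of its three possible single-base changes that leave the amino acid unchanged. It makes several simplifying assumptions that should be stated loudly: the standard genetic code, changes to a stop codon counted as non-synonymous, no correction for multiple hits, and a toy CDS. The names `syn_sites`, `site_counts`, and `pn_ps` are hypothetical. With SNP counts from population data, the resulting ratio is often called pN/pS rather than dN/dS. This is not a substitute for yn00 or a dedicated Ka/Ks tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    aa = CODON_TABLE[codon]
    synonymous_changes = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Assumption: a change to a stop codon counts as non-synonymous.
            if CODON_TABLE[mutant] == aa:
                synonymous_changes += 1
    return synonymous_changes / 3


def site_counts(cds):
    """Return (S, N): synonymous and non-synonymous site totals for a CDS."""
    cds = cds.upper()
    s_total, n_codons = 0.0, 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if CODON_TABLE.get(codon, "*") == "*":
            continue  # skip stop codons and codons with ambiguous bases
        s_total += syn_sites(codon)
        n_codons += 1
    return s_total, 3 * n_codons - s_total


def pn_ps(nsy, syn, n_sites, s_sites):
    """Uncorrected ratio (NSY / N) / (SYN / S); None when it is undefined."""
    if syn == 0 or n_sites == 0 or s_sites == 0:
        return None
    return (nsy / n_sites) / (syn / s_sites)


# Toy CDS for illustration only (hypothetical, not from the thread's data).
S, N = site_counts("ATGGAAGGG")
ratio = pn_ps(nsy=2, syn=2, n_sites=N, s_sites=S)
print(S, N, ratio)
```

Genes with zero synonymous SNPs give an undefined ratio, which is why `pn_ps` returns `None` in that case; with only a handful of SNPs per gene the estimate will also be very noisy.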

1

See "A: Best Practices/Softwares To Calculate Ka/Ks Ratio" for a comment on using PAML to calculate pairwise dN/dS. I've used KaKsCalculator2 (https://sourceforge.net/projects/kakscalculator2/) for this purpose, though.

ADD REPLY • link 6.8 years ago by jean.elbers ★ 1.7k

0

Sorry for spamming you again. Do you have any suggestions as to which of the models available in the KaKs-calculator package I should use? I thought at first that the YN model would be sufficient, but then I came across a paper describing the γ models, such as γ-YN, and got a bit confused. MA is the default model, but can it also be applied to pairwise analysis? Thanks!

ADD REPLY • link 6.8 years ago by paraskevopou ▴ 20

0

I am not sure what to recommend. I've used the NG model when I have had, say, one hundred thousand different regions for which to calculate pairwise Ka/Ks. Maybe others will chime in. Sorry.

ADD REPLY • link 6.8 years ago by jean.elbers ★ 1.7k