Finding genes under positive selection in non-model species - Phylotranscriptomics approach?
3.7 years ago
sunnykevin97

Hi,

I work with non-model organisms and am trying to identify genes under positive selection in non-model fish species.

I assembled the transcripts de novo with Trinity.

I removed redundant transcripts with CD-HIT.

I predicted ORFs with TransDecoder.
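Since codeml will eventually need codons rather than amino acids, it may help to keep TransDecoder's nucleotide CDS output (the `.cds` file written alongside the `.pep` file) and pair each protein with its CDS. The sketch below is a minimal illustration, assuming that the first whitespace-delimited word of each FASTA header is the ORF ID shared by both files; `read_fasta` and `pair_pep_cds` are hypothetical helpers, not part of TransDecoder.

```python
from typing import Dict, Iterable, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal FASTA reader: maps the first word of each header to its sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]  # assumed: ORF ID is the first word
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def pair_pep_cds(pep: Dict[str, str], cds: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Keep ORFs present in both files whose CDS length fits the protein length."""
    pairs = {}
    for orf_id, protein in pep.items():
        nt = cds.get(orf_id)
        if nt is None:
            continue
        residues = len(protein.rstrip("*"))  # ignore a terminal stop symbol, if any
        # The CDS should cover every residue; a trailing stop codon may add 3 nt.
        if len(nt) in (3 * residues, 3 * residues + 3):
            pairs[orf_id] = (protein, nt)
    return pairs
```

Usage (file names hypothetical): `pairs = pair_pep_cds(read_fasta(open("assembly.pep")), read_fasta(open("assembly.cds")))`.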

The longest ORFs were passed to OrthoFinder to obtain the multiple sequence alignment (MSA) and the species tree (RAxML-NG).

Judging from the species tree and the MSA file, OrthoFinder aligned the orthologs shared among the nine non-model fish species.

To find the genes under positive selection, I converted the MSA.fa alignment file to PHYLIP format and ran it through codeml from the PAML package. It does not work on my data; I suppose this is because the dataset is large.
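For context, a common way to ask "is this gene under positive selection?" with codeml is to fit nested site models to one gene's codon alignment (for example M1a vs M2a, or M7 vs M8, chosen through the NSsites setting) and compare their log-likelihoods with a likelihood-ratio test. The sketch below assumes you have already read the two lnL values from codeml's output for one gene. Both model pairs differ by two parameters, and the chi-square p-value with 2 degrees of freedom has the closed form exp(-x/2), so no statistics library is needed. Because one test is run per gene, a Benjamini-Hochberg correction across genes is included as well.

```python
import math
from typing import List, Tuple


def lrt_df2(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """LRT for two nested codeml site models that differ by 2 free parameters
    (e.g. M1a vs M2a, or M7 vs M8). Returns (2 * delta lnL, p-value)."""
    # A negative difference usually signals an optimisation problem; clamp to 0.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    return stat, math.exp(-stat / 2.0)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `lrt_df2(-2451.3, -2445.9)` with made-up lnL values gives a statistic of about 10.8.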

Error from codeml:

    386273 nucleotides, not a multiple of 3!
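This kind of error can be checked before running codeml. A rough sketch, assuming you pass the column count from the PHYLIP header (the second number on the first line) and a stretch of one sequence: it flags a length that is not divisible by 3 and a sequence made mostly of non-nucleotide letters. The 90% threshold is an arbitrary choice for illustration.

```python
from typing import List

# Symbols expected in a nucleotide alignment (gaps and missing data included).
NUCLEOTIDE_SYMBOLS = set("ACGTUN-?")


def diagnose_alignment(nsites: int, sample: str, min_nt_fraction: float = 0.9) -> List[str]:
    """Return reasons why codeml's codon models would likely reject the input."""
    problems = []
    if nsites % 3 != 0:
        problems.append(f"{nsites} columns is not a multiple of 3 ({nsites % 3} left over)")
    seq = "".join(sample.split()).upper()  # drop the spaces between blocks of 10
    if seq:
        nt_fraction = sum(ch in NUCLEOTIDE_SYMBOLS for ch in seq) / len(seq)
        if nt_fraction < min_nt_fraction:
            problems.append(
                f"only {nt_fraction:.0%} of characters are nucleotide symbols; "
                "this looks like a protein alignment"
            )
    return problems
```

Called as `diagnose_alignment(386273, "MSPGVELIKM KTEITTAVGF")` with values from the file below, it reports both problems.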

I need some help. Is my approach correct?

phylip file:

    9 386273
    SRR363205 MSPGVELIKM KTEITTAVGF ITRLLRTTGL ISDEQLQHFS ESLETSLAEH
    SRR363207 MSPGVELIKM KTEITTAVGF ITRLLRTTGL ISDEQLQHFS ESLEKSLAEH
    SRR363206 MSPGVELIKM KTEITTAVGF ITRLLRTTGL ISDEQLQHFS ESLEKSLAEH
    SRR363205 MSPGVELIKM KTEITTAVGF ITRLLRTTGL ISDEQLQHFS ESLEKSLAEH
    SRR363202 MSPGVELIKM KTEITTAVGF ITRLLRTTGL ISDEQLQHFS ESLEKSLAEH
    SRR363201 MSPGVELIKM KTEITTAVGF ITRLLRTTGL ISDEQLQHFS ESLEKSLAEH
    SRR363204 MSPGVELIKM KTEITTAVGF ITRLLRTTGL ISDEQLQHFS ESLEKSLAEH
    SRR363203 MSPGVELIKM KTEITTAVGF ITRLLRTTGL ISDEQLQHFS ESLEKSLAEH
    SRR363205 MSPGVELIKM KTEITTAVGF ITRLLRTTGL ISDEQLQHFS ESLEKSLAEH

               YRHHWFPHMP CKGSGYRCIR INHKMDPLIA RASNIIGLSS QQLFQLLPSE
               YRHHWFPHMP CKGSGYRCIR INHKMDPLIA RASNIIGLSS QQLFQLLPSE
               YRHHWFPHMP CKGSGYRCIR INHKMDPLIA RASNIIGLSS QQLFQLLPSE
               YRHHWFPHMP CKGSGYRCIR INHKMDPLIA RASNIIGLSS QQLFQLLPSE
               YRHHWFPHMP CKGSGYRCIR INHKMDPLIA RASNIIGLSS QQLFQLLPSE
               YRHHWFPHMP CKGSGYRCIR INHKMDPLIA RASNIIGLSS QQLFQLLPSE
               YRHHWFPHMP CKGSGYRCIR INHKMDPLIA RASNIIGLSS QQLFQLLPSE
               YRHHWFPHMP CKGSGYRCIR INHKMDPLIA RASNIIGLSS QQLFQLLPSE
               YRHHWFPHMP CKGSGYRCIR INHKMDPLIA RASNIIGLSS QQLFQLLPSE

tree file:

    ((SRR363206:0.012416,SRR363205:0.069747):0.0036785,((SRR363205:0.013234,(SRR363205:0.00518,((SRR363203:0.00817,SRR363201:0.002449):0.000959,(SRR363202:0.003255,SRR363204:0.003052):0.001105):0.002049):0.005519):0.005375,SRR363207:0.016243):0.0036785);
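codeml has to match every tip in the tree to exactly one sequence in the alignment, so it is worth checking the labels; in the PHYLIP excerpt and tree above, SRR363205 appears three times. The sketch below pulls tip labels out of a Newick string with a regular expression. This is a simplification that assumes plain labels without quotes or spaces; a proper Newick parser would be safer for unusual labels.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set

# A tip label follows "(" or "," and runs until one of ( ) , : ; or whitespace.
TIP_LABEL = re.compile(r"[(,]\s*([^(),:;\s]+)")


def tip_labels(newick: str) -> List[str]:
    return TIP_LABEL.findall(newick)


def duplicated_tips(newick: str) -> Dict[str, int]:
    counts = Counter(tip_labels(newick))
    return {name: n for name, n in counts.items() if n > 1}


def unmatched_names(newick: str, alignment_names: Iterable[str]) -> Dict[str, Set[str]]:
    """Names present on only one side (tree vs alignment)."""
    tips = set(tip_labels(newick))
    names = set(alignment_names)
    return {"only_in_tree": tips - names, "only_in_alignment": names - tips}
```

Running `duplicated_tips` on the tree above reports SRR363205 three times.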

Suggestions please.

Thanks
Kevin

Tags: gene, ortholog, assembly, rna-seq

3.7 years ago
pinn

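To test for positive selection (dN/dS), codeml's codon models need a nucleotide (codon) alignment in PHYLIP or PAML format; they do not work with amino-acid sequences. codeml does accept amino-acid alignments (seqtype = 2), but only for amino-acid substitution models, which cannot estimate dN/dS. This is also why you got that error: codeml read your 386273 amino-acid columns as nucleotides, and 386273 is not a multiple of 3.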
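The usual route from a protein MSA to a codon alignment is to thread each sequence's CDS through its aligned protein, writing three nucleotides per residue and `---` per gap; PAL2NAL is a widely used tool for this. Below is a minimal sketch of the same idea, assuming that the alignment contains only residues and `-` (no `*`), that each aligned name has a matching CDS (for example from TransDecoder's `.cds` output), and that a single trailing stop codon may be dropped. Unlike PAL2NAL, it does not check that each codon actually encodes the aligned amino acid. `backtranslate` and `codon_alignment` are hypothetical helper names.

```python
from typing import Dict

STOP_CODONS = {"TAA", "TAG", "TGA"}


def backtranslate(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned CDS through an aligned protein (no codon/residue check)."""
    cds = cds.upper().replace("U", "T")
    codons = []
    pos = 0
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")  # alignment gap becomes a codon-sized gap
            continue
        codon = cds[pos:pos + 3]
        if len(codon) < 3:
            raise ValueError("CDS is shorter than the aligned protein")
        codons.append(codon)
        pos += 3
    leftover = cds[pos:]
    # Allow exactly one trailing stop codon; anything else suggests a mismatch.
    if leftover and leftover not in STOP_CODONS:
        raise ValueError(f"unexpected trailing CDS sequence: {leftover!r}")
    return "".join(codons)


def codon_alignment(protein_aln: Dict[str, str], cds: Dict[str, str]) -> Dict[str, str]:
    """Back-translate every sequence of one orthogroup's protein alignment."""
    return {name: backtranslate(seq, cds[name]) for name, seq in protein_aln.items()}
```

Since the goal is to find individual genes, this would be applied per orthogroup alignment rather than to the concatenated species-tree alignment.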

I realized that later. I have an MSA file of aligned orthologs; how do I convert it into PAML format for positive selection analysis?

Suggestions please.
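A protein MSA cannot be converted to PAML format directly; it first has to become a codon alignment. Once that exists, writing PAML's sequential format is simple: a header line with the number of sequences and the number of nucleotide columns, then one name-plus-sequence line per species. This sketch assumes one codon alignment per orthogroup, held in a Python dict, so each gene is tested separately. It separates names from sequences with two spaces, which, as far as I know, is how PAML marks the end of a sequence name. It also writes a `codeml.ctl` for site models: the option names are standard codeml.ctl keys, but the chosen values and the file names are only an illustrative starting point, not a recommendation for this dataset.

```python
from typing import Dict


def to_paml_sequential(alignment: Dict[str, str]) -> str:
    """Write one codon alignment in PAML/PHYLIP sequential format."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    nsites = lengths.pop()
    if nsites == 0 or nsites % 3 != 0:
        raise ValueError(f"{nsites} columns; a codon alignment needs a positive multiple of 3")
    lines = [f"{len(alignment)} {nsites}"]
    for name, seq in alignment.items():
        lines.append(f"{name}  {seq}")  # two spaces end the sequence name
    return "\n".join(lines) + "\n"


def codeml_site_models_ctl(seqfile: str, treefile: str, outfile: str) -> str:
    """Illustrative codeml.ctl for site models M0, M1a, M2a, M7 and M8."""
    settings = [
        ("seqfile", seqfile),
        ("treefile", treefile),
        ("outfile", outfile),
        ("runmode", "0"),          # 0 = use the user-supplied tree
        ("seqtype", "1"),          # 1 = codon sequences
        ("CodonFreq", "2"),        # 2 = F3x4 codon frequencies
        ("model", "0"),            # one omega across branches
        ("NSsites", "0 1 2 7 8"),  # site models to fit in one run
        ("icode", "0"),            # 0 = standard genetic code
        ("fix_omega", "0"),        # estimate omega
        ("omega", "0.5"),          # starting value for omega
        ("cleandata", "1"),        # drop columns with gaps or ambiguities
    ]
    return "\n".join(f"{key} = {value}" for key, value in settings) + "\n"
```

With the two files written next to a tree whose tip labels match the sequence names, running `codeml codeml.ctl` would fit all five site models for that one gene.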
