GERP++ (gerpcol) error on test data
8.3 years ago
kirill1984 ▴ 10

Hi guys,

I'm wondering whether there is a tutorial for GERP++, or any experienced users here, as I'm doing something wrong and need your help.

In particular, I can't get GERP++ to work even on the test data set.

For example, when I run the first part of the pipeline as

gerpcol -f ENr111.aln -t ENr111.tree -e rfbat -j human

I get the following:

Tree species armadillo not present in alignment and therefore ignored.  
Tree species baboon not present in alignment and therefore ignored.  
Tree species cat not present in alignment and therefore ignored.  
Tree species chimp not present in alignment and therefore ignored.  
Tree species colobus_monkey not present in alignment and therefore ignored.  
Tree species cow not present in alignment and therefore ignored.  
Tree species dog not present in alignment and therefore ignored.  
Tree species dusky_titi not present in alignment and therefore ignored.  
Tree species elephant not present in alignment and therefore ignored.  
Tree species galago not present in alignment and therefore ignored.  
Tree species guinea_pig not present in alignment and therefore ignored.  
Tree species hedgehog not present in alignment and therefore ignored.  
Tree species human not present in alignment and therefore ignored.  
Tree species macaque not present in alignment and therefore ignored.  
Tree species marmoset not present in alignment and therefore ignored.  
Tree species monodelphis not present in alignment and therefore ignored.  
Tree species mouse not present in alignment and therefore ignored.  
Tree species mouse_lemur not present in alignment and therefore ignored.  
Tree species owl_monkey not present in alignment and therefore ignored.  
Tree species rabbit not present in alignment and therefore ignored.  
Tree species rat not present in alignment and therefore ignored.  
Tree species rfbat not present in alignment and therefore ignored.  
Tree species sbbat not present in alignment and therefore ignored.  
Tree species shrew not present in alignment and therefore ignored.  
Tree species st_squirrel not present in alignment and therefore ignored.  
Tree species tenrec not present in alignment and therefore ignored.  
Processed alignment of 0 positions.

This is odd, as the alignment does contain those sequences:

grep ">" ENr111.aln
>human  
>chimp  
>colobus_monkey  
>baboon  
>macaque  
>dusky_titi  
>owl_monkey  
>marmoset  
>mouse_lemur  
>galago  
>rat  
>mouse  
>guinea_pig  
>st_squirrel  
>rabbit  
>cow  
>cat  
>dog  
>sbbat  
>rfbat  
>hedgehog  
>shrew  
>armadillo  
>tenrec  
>elephant  
>monodelphis

Do you have any idea what is wrong and how to fix it?

Thanks

GERP • 5.1k views

Hi, I know this is a bit off topic, but what is the structure of the alignment files? I am not able to download the test data, and my alignment file is not being accepted by the program. Thanks!

8.2 years ago
robert.young ▴ 20

I just found your question while googling the same error message. In case you're still struggling, I discovered that you need to add the '-a' flag to get gerpcol to read .fa files.

So, if you modify your command to:

gerpcol -a -f ENr111.aln -t ENr111.tree -e rfbat -j

it should run. It did for me at least!
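Since the warning itself only says that tree species are "not present in alignment", it can help to rule out a genuine name mismatch before settling on the format flag as the cause. The following is a minimal sketch, not part of GERP++: the helper names (`fasta_names`, `newick_leaf_names`, `tree_names_missing_from_alignment`) are hypothetical, and the Newick parsing assumes simple unquoted leaf labels.

```python
import re


def fasta_names(fasta_text):
    """Return the sequence names found on FASTA header lines.

    Only the first whitespace-delimited token after '>' is kept, so
    trailing spaces on a header line do not affect the comparison.
    """
    return {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def newick_leaf_names(newick_text):
    """Return leaf labels from a Newick string.

    Assumption: labels are unquoted. A leaf label directly follows '('
    or ',', whereas internal node labels follow ')' and branch lengths
    follow ':', so neither of those is captured.
    """
    return set(re.findall(r"[(,]\s*([^\s():,;]+)", newick_text))


def tree_names_missing_from_alignment(newick_text, fasta_text):
    """Return the sorted tree leaf names that have no FASTA header."""
    return sorted(newick_leaf_names(newick_text) - fasta_names(fasta_text))
```

If this returns an empty list for the tree and alignment in question, the names agree and the problem lies elsewhere, for example in how the alignment format is being read.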


Your message solved it, but now I'm getting this error:

Segmentation fault (core dumped)

Or could you tell me what your reference sequence is? It could be that I'm not providing the correct one.


Did you by any chance solve this problem? I am getting a similar error:

gpp-gerpcol
Thu 14 Sep 2023 17:27:38 AEST
Neutral rate computed from tree file = 6163.56
Neutral rate after rescaling = 12.3271

Processing /directory/HiC_scaffold_26.fasta, output will be written to /directory/HiC_scaffold_26.fasta.rates
Nucleotide frequencies:  A = 0.262003, C = 0.234787, G = 0.238278, T = 0.264931
Processing alignment of 6867301 positions, maximum neutral rate is 12.3271
Segmentation fault

I solved the problem in the end. I was under the impression that the reference sequence had to be provided in a separate fasta file:

cat alignment.fasta
>Sp1
AAACA
>Sp2
AAAAG
>Sp3
TAAAA

cat ref.fasta
>ref
AAAAA

And I was trying to run it like this:

gerpcol -v -a -f alignment.fasta -t tree.nwk -e ref.fasta

But the correct way is to include the reference sequence in the alignment file:

cat alignment.fasta
>ref
AAAAA
>Sp1
AAACA
>Sp2
AAAAG
>Sp3
TAAAA

And then specify only the NAME of the sequence with -e (you can exclude it with the -j parameter):

gerpcol -v -a -f alignment.fasta -t tree.nwk -e ref -j
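The step of moving the reference into the alignment file could be scripted roughly as follows. This is only a sketch under assumptions: the function names (`read_fasta`, `add_reference`) are hypothetical, and it assumes the reference is already aligned to the other sequences, so that simply prepending it is valid. It refuses to merge if the sequence lengths differ.

```python
def read_fasta(fasta_text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first token of the header as the name.
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def add_reference(alignment_text, reference_text):
    """Return FASTA text with the reference placed first in the alignment.

    Raises ValueError if the reference file does not hold exactly one
    sequence, if its name is already used, or if lengths disagree.
    """
    reference = read_fasta(reference_text)
    alignment = read_fasta(alignment_text)
    if len(reference) != 1:
        raise ValueError("expected exactly one reference sequence")
    ref_name, ref_seq = reference[0]
    if ref_name in {name for name, _ in alignment}:
        raise ValueError(f"{ref_name!r} is already in the alignment")
    if {len(seq) for _, seq in alignment} != {len(ref_seq)}:
        raise ValueError("reference and alignment lengths differ")
    merged = [(ref_name, ref_seq)] + alignment
    return "".join(f">{name}\n{seq}\n" for name, seq in merged)
```

Applied to the `alignment.fasta` and `ref.fasta` contents shown above, this produces the combined file with `>ref` as the first record, and the name to pass to `-e` is then `ref`.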

Hi, I have run into the same problem. However, reading the latest reply and looking back at the original question, where the command included -e rfbat, I assume rfbat was set as the reference sequence, and it was present in the alignment file. This somewhat contradicts the solution you provided (if I understand correctly, the problem there was the reference sequence NOT being in the alignment file?).

Anyway, I have set galGal6 as the reference sequence; it is present in both the alignment file and the tree file:

gerpcol -v -a -f msa_fasta/chr16.maf.fa -t  tree.nh -e galGal6

I still get an error message:

gpp-gerpcol
Sat Sep 14 13:15:30 CST 2024
Neutral rate computed from tree file = 4.61399
Neutral rate after rescaling = 4.61399

Processing msa_fasta/chr16.maf_rm.fa, output will be written to msa_fasta/chr16.maf_rm.fa.rates
terminate called after throwing an instance of 'BIO::E_InvalidCharacterEx'
Aborted (core dumped)

Help is greatly appreciated!
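One way to narrow down an invalid-character error like the one above is to scan the alignment for symbols outside an expected alphabet. The sketch below is a diagnostic only. Which characters gerpcol actually accepts is not stated in this thread, so the allowed set (`ACGTN`, upper or lower case, plus `-`) is an assumption to adjust, and the function name `unexpected_characters` is hypothetical.

```python
from collections import Counter

# Assumption: this alphabet is a guess, not gerpcol's documented set.
ASSUMED_ALLOWED = set("ACGTNacgtn-")


def unexpected_characters(fasta_text, allowed=ASSUMED_ALLOWED):
    """Map each sequence name to a Counter of characters not in `allowed`.

    Lines are split on '\n' only, so stray carriage returns and trailing
    spaces inside sequence lines are reported rather than silently lost.
    """
    found = {}
    name = None
    for line in fasta_text.split("\n"):
        if line.startswith(">"):
            name = line[1:].strip()
            continue
        if name is None:
            continue  # ignore anything before the first header
        bad = Counter(ch for ch in line if ch not in allowed)
        if bad:
            found.setdefault(name, Counter()).update(bad)
    return found


def report(path):
    """Print the unexpected characters per sequence in the file at `path`."""
    # newline="" keeps carriage returns instead of translating them away.
    with open(path, newline="") as handle:
        for name, counts in unexpected_characters(handle.read()).items():
            print(name, dict(counts))
```

Calling `report` on the alignment file lists, per sequence, any characters outside the assumed alphabet; an empty result means the file only uses those characters and the cause is likely something else.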


Not me; in the end I switched to snpEff! Good luck!
