SIFT NCBI executables set up incorrectly?
10.1 years ago
arronslacey ▴ 320

EDIT: Solved. Although I had correctly defined the path to the NCBI executables in SIFT_for_submitting_fasta_seq.csh, I had not set it correctly in seqs_chosen_via_median_info.csh.

It worked after I modified this second file. As I mention in the comments below, the test files running fine does not mean that SIFT has been configured properly. When you run the test files, any incorrect path or database configuration goes unnoticed because the required database/protein alignment files are already provided within the SIFT directory. I find this a bit misleading, but there we go. Something to consider.
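
One way to catch a path that is set in one script but not in another is to list every line in SIFT's csh scripts that mentions the BLAST location. The following is a minimal sketch in POSIX sh, not part of SIFT itself. The function name is made up for illustration, and both the script directory (bin) and the search pattern (ncbi or blast) are assumptions about how the scripts are laid out.

```shell
#!/bin/sh
# List every line in the .csh scripts under a directory that mentions
# NCBI or BLAST, so that differing path definitions stand out.
# Assumption: the scripts sit in one directory (e.g. bin/) and name the
# BLAST location on a line containing "ncbi" or "blast".
list_blast_path_lines() {
    script_dir=$1
    # -n prints line numbers, -i ignores case, -E enables alternation.
    # /dev/null makes grep print file names even if only one script exists.
    grep -n -i -E 'ncbi|blast' "$script_dir"/*.csh /dev/null
}

# Usage (run from the SIFT installation directory):
#   list_blast_path_lines bin
```

Any script whose path differs from the others, or carries an extra trailing directory, is the one to edit.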

ORIGINAL QUESTION

SIFT does not produce a prediction file for my own protein, and I cannot determine what is causing this.

I am successfully using standalone SIFT. I can run SIFT using the test files provided:

$ csh bin/SIFT_for_submitting_fasta_seq.csh test/lacI.fasta db/uniref.fa test/lacI.subst

tail is lacI.fasta
query is /home/arron/Phd/programs/sift5.2.1/tmp/lacI.fasta.query
/usr/share/ncbi-blast+/bin//bin//psiblast: Command not found.
exiting because stauts not equal to 0
tell me i've entered
info_on_seqs
*** The following sequences have been removed because they  were found to be over 100% identical with your protein query: *** The following sequences have been removed because they  were found to be over 100% identical with your protein query: QUERY has 100 identity
UniRef90_P03023Lacto has 100 identity
UniRef90_A8AKB7Putat has 81 identity
UniRef90_C1M7F8Lacre has 81 identity
UniRef90_D2TK52Lacto has 78 identity
UniRef90_E8XR69Trans has 59 identity
UniRef90_D2C396Trans has 55 identity
UniRef90_A9MQ83Putat has 76 identity
UniRef90_C6DD30Trans has 53 identity
UniRef90_E0SHJ3Trans has 56 identity
UniRef90_D2ZG46Ribos has 54 identity
UniRef90_C6CD23Trans has 51 identity
UniRef90_A4W7D1Trans has 54 identity
UniRef90_E8XW29Trans has 46 identity
UniRef90_A1RAY2Trans has 38 identity
UniRef90_D6DVN0Trans has 53 identity
UniRef90_C4S4I8Lacto has 52 identity
UniRef90_C9XUV2Lacto has 45 identity
UniRef90_E3G6U0Trans has 54 identity
UniRef90_C4UWB0Lacto has 52 identity
UniRef90_C4UNI7Lacto has 48 identity
UniRef90_D4GHV0LacIn has 50 identity
UniRef90_C4TZY6Lacto has 50 identity
UniRef90_D7CXK6Trans has 36 identity
UniRef90_F0KTF2Lacre has 50 identity
UniRef90_E1SIH9HTH-t has 50 identity
UniRef90_D4E812LacIf has 44 identity
UniRef90_D6YQA5HTH-t has 42 identity
UniRef90_D5CE50Sugar has 43 identity
UniRef90_C6CQF2Trans has 43 identity
UniRef90_C4X4N1Trans has 43 identity
UniRef90_A4TIF9Trans has 44 identity
UniRef90_C4T764Lacto has 52 identity
UniRef90_D1RRC7Trans has 41 identity
UniRef90_E6WEY3Trans has 49 identity
UniRef90_D8MUA6Lacto has 51 identity
UniRef90_C9XWJ1Lacto has 45 identity
UniRef90_D5CJ02Lacre has 54 identity
UniRef90_D1RT73Trans has 43 identity
 UniRef90_P03023Lacto, UniRef90_P03023Lacto,.

.
before seg fault?
9
10
13
14
15
16
18
19
20
21
22
23
25
30
34
45
50
53
56
65
76
127
166
179
187
188
197
201
205
218
220
241
247
249
250
252
256
272
274
284
286
288
326
356
357
358
359
360
filename is /home/arron/Phd/programs/sift5.2.1/blimps/docs/default.diri
about to make predictions
not including UniRef90_C4T764Lacto with X at 1
not including UniRef90_C4T764Lacto with X at 2
not including UniRef90_C4T764Lacto with X at 14
done checking all subst
trying to free things here
unalias: rm not found
Output in /home/arron/Phd/programs/sift5.2.1/tmp/lacI.SIFTprediction

and produces a SIFT prediction file as expected.

However, when I try this with one of my own proteins of interest, the SIFT prediction file is not created.

$ csh bin/SIFT_for_submitting_fasta_seq.csh test/NP_000162.2.fasta db/uniref.fa test/glra1.subst 
tail is NP_000162.2.fasta
query is /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.fasta.query
/usr/share/ncbi-blast+/bin//bin//psiblast: Command not found.
exiting because stauts not equal to 0
tell me i've entered
info_on_seqs
cannot open file /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.alignedfasta 
Output in /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.SIFTprediction

The clue here is in:

cannot open file /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.alignedfasta

where it appears that the psiblast alignment could not be made. This file should have been produced, but I cannot find it.

How could this be?

For reference I include my

--1) test files (fasta and substitution file)

lacI.fasta

>gi|2506562|sp|P03023|LACI_ECOLI   LACTOSE OPERON REPRESSOR
MKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQQLAGKQ
SLLIGVATSSLALHAPSQIVAAIKSRADQLGASVVVSMVERSGVEACKAAVHNLLAQRVS
GLIINYPLDDQDAIAVEAACTNVPALFLDVSDQTPINSIIFSHEDGTRLGVEHLVALGHQ
QIALLAGPLSSVSARLRLAGWHKYLTRNQIQPIAEREGDWSAMSGFQQTMQMLNEGIVPT
AMLVANDQMALGAMRAITESGLRVGADISVVGYDDTEDSSCYIPPLTTIKQDFRLLGQTS
VDRLLQLSQGQAVKGNQLLPVSLVKRKTTLAPNTQTASPRALADSLMQLARQVSRLESGQ

lacI.subst

    K2S  
    P3M

--2) my protein files

>gi|119372310|ref|NP_000162.2| glycine receptor subunit alpha-1 isoform 2 precursor [Homo sapiens]
MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCN
IFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEIT
TDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGL
TLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPAR
VGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRR
HHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISR
IGFPMAFLIFNMFYWIIYKIVRREDVHNQ

glra1.subst

P35R

Any advice would be greatly appreciated.

blast sift SNP • 3.3k views

It would seem that the PSI-BLAST step is meant to produce the .alignedfasta file. I think the lacI.alignedfasta file might have existed already, so it did not complain. Maybe for your actual run the program could not find or create the .alignedfasta file, so it quit.

Yes, it must be using a pre-aligned file for the test, so psiblast doesn't need to be run. My psiblast path may be incorrect for when it is called in the absence of a pre-aligned file. I will check.

Hello arronslacey!

It appears that your post has been cross-posted to another site: http://stackoverflow.com/questions/26663738/fasta-file-not-comptible-with-sift

This is typically not recommended as it runs the risk of annoying people in both communities.


Thanks Pierre - duly noted.

10.1 years ago
smilefreak ▴ 420
 /usr/share/ncbi-blast+/bin//bin//psiblast: Command not found.
    exiting because status not equal to 0,

I think your problem may be that psiblast cannot be run, which then causes a cascade of failures in your csh script. I also note that you have /bin//bin in your path to psiblast. Do you actually mean it to be that, or is it meant to be /bin? If the latter, change

 /usr/share/ncbi-blast+/bin//bin//psiblast

to

/usr/share/ncbi-blast+/bin//psiblast
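
A quick way to confirm this diagnosis is to test whether the path the script builds actually points at an executable. The following is a minimal sketch in POSIX sh. The function name check_psiblast is made up for illustration, and the two paths in the usage note are the ones quoted in this thread.

```shell
#!/bin/sh
# Report whether psiblast exists and is executable in the given directory.
# Repeated slashes are harmless to the shell; a duplicated "bin" is not,
# because it names a different (and probably nonexistent) directory.
check_psiblast() {
    blast_dir=$1
    if [ -x "$blast_dir/psiblast" ]; then
        echo "ok: $blast_dir/psiblast"
    else
        echo "missing: $blast_dir/psiblast"
        return 1
    fi
}

# Usage, with the two paths from this thread:
#   check_psiblast /usr/share/ncbi-blast+/bin//bin/
#   check_psiblast /usr/share/ncbi-blast+/bin/
```

If the first call reports "missing" and the second reports "ok", the extra "bin" is the cause rather than the doubled slashes.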
Yes, I think this is it, and as Ram pointed out, an aligned file for my test file might have already existed, hence it did not complain. Thank you!

I can't seem to find out why the extra "bin" is being generated.

My .csh files and config_env.txt file explicitly define the path to the ncbi executables as

/usr/share/ncbi-blast+/bin/

where psiblast is found. However, something is adding this extra "bin", and I just can't find the file responsible.


Are you able to share a copy of your csh file?


I have solved this now: there is another .csh file called "seqs_chosen_via_median_info.csh" where the path was incorrect. It works now, but a word of caution to anyone: when I re-ran the analysis, it complained that the BLAST database wasn't formatted correctly (which I fixed immediately). However, running the provided test files ignores any database problems, just as it ignores path problems, because the files needed to make a SIFT prediction for the test proteins are already provided. This is misleading and makes you think your SIFT setup is good to go when it might not be. I will provide the answer at the top of my question. Thanks for your help.
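
Because the bundled test files can mask both problems, it may help to check the database directly before a real run. The following is a minimal sketch in POSIX sh. It assumes a protein database formatted in place, so that BLAST index files such as .pin (or a .pal alias file for a multi-volume database) sit next to the FASTA file. The function name is hypothetical.

```shell
#!/bin/sh
# Return success if index files for a protein BLAST database sit next to
# the FASTA file. Assumption: the database was built in place, so the
# index files share the FASTA file's path as their prefix.
blast_db_is_formatted() {
    db=$1
    # Single-volume databases have <db>.pin; multi-volume ones have <db>.pal.
    [ -f "$db.pin" ] || [ -f "$db.pal" ]
}

# Usage (run from the SIFT installation directory):
#   blast_db_is_formatted db/uniref.fa || echo "db/uniref.fa is not formatted"
```

This only checks that index files exist, not that they are valid or up to date with the FASTA file.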
