EDIT: Solved. Whilst I had correctly defined the path to the NCBI executables in SIFT_for_submitting_fasta_seq.csh, I had not set it correctly in seqs_chosen_via_median_info.csh.
It worked after I modified this second file. As I mention in the comments below, the test files running fine does not mean that your SIFT installation has been configured properly. With the test files, incorrect path/database settings go unnoticed because the required database and protein alignment files are already provided within the SIFT directory. I find this a bit misleading, but there we go. Something to consider.
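For anyone hitting the same thing, a quick check along these lines would have caught it. This is only a sketch: `check_blast_dir` and `show_blast_settings` are hypothetical helper names, not part of SIFT, and the grep pattern just looks for lines mentioning "ncbi" or "blast" because I am not asserting the exact variable name the scripts use. As a guess from the log, the doubled `bin//bin//` in the error message suggests the configured directory already ended in `bin` and the script appended another `bin/`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the SIFT distribution.

# Succeeds only if an executable psiblast sits directly in the given directory.
check_blast_dir() {
    dir=$1
    if [ -x "$dir/psiblast" ]; then
        echo "ok: $dir/psiblast"
    else
        echo "missing: $dir/psiblast"
        return 1
    fi
}

# Print every line in the given scripts that mentions NCBI or BLAST,
# prefixed with file name and line number, so the settings in each
# script can be compared by eye. /dev/null forces file names to be shown
# even when only one script is passed.
show_blast_settings() {
    grep -n -i -E 'ncbi|blast' "$@" /dev/null
}
```

Running `show_blast_settings` on both SIFT_for_submitting_fasta_seq.csh and seqs_chosen_via_median_info.csh, and then `check_blast_dir` on whatever directory each script ends up calling psiblast from, should show whether the two scripts agree.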
ORIGINAL QUESTION
I am using standalone SIFT. It runs on the test files provided but fails on my own protein, and I cannot determine what is causing this. The test run looks like this:
$ csh bin/SIFT_for_submitting_fasta_seq.csh test/lacI.fasta db/uniref.fa test/lacI.subst
tail is lacI.fasta
query is /home/arron/Phd/programs/sift5.2.1/tmp/lacI.fasta.query
/usr/share/ncbi-blast+/bin//bin//psiblast: Command not found.
exiting because stauts not equal to 0
tell me i've entered
info_on_seqs
*** The following sequences have been removed because they were found to be over 100% identical with your protein query:
*** The following sequences have been removed because they were found to be over 100% identical with your protein query:
QUERY has 100 identity
UniRef90_P03023Lacto has 100 identity
UniRef90_A8AKB7Putat has 81 identity
UniRef90_C1M7F8Lacre has 81 identity
UniRef90_D2TK52Lacto has 78 identity
UniRef90_E8XR69Trans has 59 identity
UniRef90_D2C396Trans has 55 identity
UniRef90_A9MQ83Putat has 76 identity
UniRef90_C6DD30Trans has 53 identity
UniRef90_E0SHJ3Trans has 56 identity
UniRef90_D2ZG46Ribos has 54 identity
UniRef90_C6CD23Trans has 51 identity
UniRef90_A4W7D1Trans has 54 identity
UniRef90_E8XW29Trans has 46 identity
UniRef90_A1RAY2Trans has 38 identity
UniRef90_D6DVN0Trans has 53 identity
UniRef90_C4S4I8Lacto has 52 identity
UniRef90_C9XUV2Lacto has 45 identity
UniRef90_E3G6U0Trans has 54 identity
UniRef90_C4UWB0Lacto has 52 identity
UniRef90_C4UNI7Lacto has 48 identity
UniRef90_D4GHV0LacIn has 50 identity
UniRef90_C4TZY6Lacto has 50 identity
UniRef90_D7CXK6Trans has 36 identity
UniRef90_F0KTF2Lacre has 50 identity
UniRef90_E1SIH9HTH-t has 50 identity
UniRef90_D4E812LacIf has 44 identity
UniRef90_D6YQA5HTH-t has 42 identity
UniRef90_D5CE50Sugar has 43 identity
UniRef90_C6CQF2Trans has 43 identity
UniRef90_C4X4N1Trans has 43 identity
UniRef90_A4TIF9Trans has 44 identity
UniRef90_C4T764Lacto has 52 identity
UniRef90_D1RRC7Trans has 41 identity
UniRef90_E6WEY3Trans has 49 identity
UniRef90_D8MUA6Lacto has 51 identity
UniRef90_C9XWJ1Lacto has 45 identity
UniRef90_D5CJ02Lacre has 54 identity
UniRef90_D1RT73Trans has 43 identity
UniRef90_P03023Lacto, UniRef90_P03023Lacto,.
.
before seg fault?
9
10
13
14
15
16
18
19
20
21
22
23
25
30
34
45
50
53
56
65
76
127
166
179
187
188
197
201
205
218
220
241
247
249
250
252
256
272
274
284
286
288
326
356
357
358
359
360
filename is /home/arron/Phd/programs/sift5.2.1/blimps/docs/default.diri
about to make predictions
not including UniRef90_C4T764Lacto with X at 1
not including UniRef90_C4T764Lacto with X at 2
not including UniRef90_C4T764Lacto with X at 14
done checking all subst
trying to free things here
unalias: rm not found
Output in /home/arron/Phd/programs/sift5.2.1/tmp/lacI.SIFTprediction
This produces a SIFT prediction file, as expected.
However, when I try this with one of my own proteins of interest, the SIFT prediction file is not created.
$ csh bin/SIFT_for_submitting_fasta_seq.csh test/NP_000162.2.fasta db/uniref.fa test/glra1.subst
tail is NP_000162.2.fasta
query is /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.fasta.query
/usr/share/ncbi-blast+/bin//bin//psiblast: Command not found.
exiting because stauts not equal to 0
tell me i've entered
info_on_seqs
cannot open file /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.alignedfasta
Output in /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.SIFTprediction
The clue here is in:
cannot open file /home/arron/Phd/programs/sift5.2.1/tmp/NP_000162.2.alignedfasta
It appears that the psiblast alignment could not be made. This file should have been produced, but I cannot find it.
How could this be?
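Note that both runs above print `Command not found` and `exiting because stauts not equal to 0`, yet the test run still finishes, so the final "Output in ..." line on its own is not a reliable sign of success. A minimal sketch for spotting this, assuming the run's output has been saved to a file first (for example by redirecting stdout and stderr): `scan_sift_log` is a hypothetical helper name, and the marker strings are simply copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: scan a saved SIFT run log for the failure markers
# seen in the output above. Prints matching lines with their line numbers
# and returns 1 if any marker is present; otherwise returns 0.
scan_sift_log() {
    log=$1
    if grep -n -E 'Command not found|cannot open file|exiting because' "$log"; then
        return 1
    fi
    echo "no failure markers found in $log"
}
```

On the lacI test run this would flag the psiblast line even though a prediction file was written, which is the hint that the alignment step never actually ran.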
For reference, I include:
--1) the test files (FASTA and substitution file)
lacI.fasta
>gi|2506562|sp|P03023|LACI_ECOLI LACTOSE OPERON REPRESSOR
MKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQQLAGKQ
SLLIGVATSSLALHAPSQIVAAIKSRADQLGASVVVSMVERSGVEACKAAVHNLLAQRVS
GLIINYPLDDQDAIAVEAACTNVPALFLDVSDQTPINSIIFSHEDGTRLGVEHLVALGHQ
QIALLAGPLSSVSARLRLAGWHKYLTRNQIQPIAEREGDWSAMSGFQQTMQMLNEGIVPT
AMLVANDQMALGAMRAITESGLRVGADISVVGYDDTEDSSCYIPPLTTIKQDFRLLGQTS
VDRLLQLSQGQAVKGNQLLPVSLVKRKTTLAPNTQTASPRALADSLMQLARQVSRLESGQ
lacI.subst
K2S
P3M
--2) my protein files
NP_000162.2.fasta
>gi|119372310|ref|NP_000162.2| glycine receptor subunit alpha-1 isoform 2 precursor [Homo sapiens]
MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCN
IFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEIT
TDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGL
TLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPAR
VGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRR
HHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISR
IGFPMAFLIFNMFYWIIYKIVRREDVHNQ
glra1.subst
P35R
Any advice would be greatly appreciated.
It would seem that the PSI-BLAST step would produce an aligned.fasta file. I think the lacI.aligned.fasta file might have existed already, so it did not complain. Maybe your actual run had a file for which the program could not find/create the aligned.fasta file, so it quit.
Hello arronslacey!
It appears that your post has been cross-posted to another site: http://stackoverflow.com/questions/26663738/fasta-file-not-comptible-with-sift
This is typically not recommended as it runs the risk of annoying people in both communities.
Thanks Pierre - duly noted.