I wrote a FASTA file (homo_ref.faa) using the R code below. I need to use that file in the terminal with
makeblastdb -in homo_ref.faa -dbtype prot
but I get "BLAST options error: File homo_ref.faa does not exist". What changes would you recommend to my code to fix this?
library(seqinr)
library(Biostrings)
library(data.table)
library(tidyverse)
#read homo_tabular format
homo_tab = read.csv("proteins_homo.csv", header = TRUE, sep = ",")
homo_tab_1 = homo_tab[,c(7,9:11)]
colnames(homo_tab_1)[2]="ID"
#select longest locus
son <- homo_tab_1 %>%
  group_by(Locus) %>%
  slice_max(Length, n = 1) %>%
  slice_head(n = 1)
#read homo protein fasta file and convert list to df/dt
human_prot <- read.fasta(file= "homo_s.faa", seqtype="AA", as.string =TRUE, set.attributes =TRUE)
human_prot = unlist(human_prot)
human_prot = as.data.frame(human_prot)
human_prot = setDT(human_prot, keep.rownames = "ID")
#rename column
colnames(human_prot)[1] ="ID"
colnames(human_prot)[2] ="seq"
#merge csv and fasta file
merged = merge(human_prot , son , by="ID", all.x=TRUE)
#remove na rows
library(dplyr)
merged_1 <- na.omit(merged)
#delete column
merged_2 = subset(merged_1, select = -c(3,4,5) )
write.fasta(sequences = merged_2, names = names(merged_3), file.out = "homo_ref.faa")
Where I got the tabular and FASTA files:
Note: I tried to fix the problem by adding the file path but it didn't work.
When I tried what you suggested and ran the makeblastdb command again, I got a different error:
"BLAST options error: homo_ref.faa does not match input format type, default input type is FASTA"
... so there is a clear error message which says that your file is not a FASTA file...
I understand what the error message is trying to convey. But I am writing my file as FASTA from R, and when I look at the file properties, it shows as FASTA. Since I could not solve this problem, I shared my code and asked for your help.
Can you post the output of the following command, please?
Computers don't have a sense of humor, nor are they trying to mess with you. Even if the file looks like FASTA to you, what matters is that it looks like FASTA to the program. That means troubleshooting whatever it is in the file format that deviates from the expected format, and no amount of code sharing will help.
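One way to see what the program sees is to inspect the file contents directly instead of relying on the file properties. The following is only a rough sketch, assuming a POSIX shell; `looks_like_fasta` is a hypothetical helper name, and the check is much looser than whatever validation makeblastdb actually performs.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST+): rough check that a file looks like FASTA.
# It only verifies that the first non-blank line is a ">" header and that
# at least one non-header line contains letters.
looks_like_fasta() {
    file="$1"
    # First non-blank line of the file.
    first=$(grep -v '^[[:space:]]*$' "$file" | head -n 1)
    case "$first" in
        '>'*) ;;  # starts with ">", as a FASTA header should
        *) echo "first non-blank line is not a '>' header: $first"
           return 1 ;;
    esac
    # Count header lines and non-header lines that contain letters.
    headers=$(grep -c '^>' "$file" || true)
    seqlines=$(grep -v '^>' "$file" | grep -c '[A-Za-z]' || true)
    if [ "$seqlines" -eq 0 ]; then
        echo "no sequence lines found"
        return 1
    fi
    echo "$headers header(s), $seqlines sequence line(s)"
    return 0
}
```

For example, `looks_like_fasta homo_ref.faa` could be run before makeblastdb, and `head -n 4 homo_ref.faa` shows the first few lines for a visual check.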
It appears that the original problem was in read access, and I think we solved it. The format issue you may have to figure out on your own, or simply follow Pierre's suggestions.
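The thread does not record which step resolved the read-access problem. As a minimal sketch, assuming a POSIX shell, a check like the one below separates "the file is not in this directory" from "the file exists but cannot be read or is empty"; `check_readable` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report why a file might not be usable from the current directory.
check_readable() {
    file="$1"
    if [ ! -e "$file" ]; then
        # Most often the terminal is in a different directory than R's working directory.
        echo "missing: $file (current directory: $(pwd))"
        return 1
    elif [ ! -r "$file" ]; then
        echo "not readable: $file"
        return 1
    elif [ ! -s "$file" ]; then
        echo "empty: $file"
        return 1
    fi
    echo "ok: $file"
    return 0
}
```

Running `check_readable homo_ref.faa` from the directory where makeblastdb is invoked shows which of these cases applies; in R, `getwd()` shows where `write.fasta` placed a file given by a relative path.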