I borrowed a script from here to extract some sequences from a fasta nt file test.fa using the positions stored in the file positiontest.txt
. I was getting a NULL returned I think at the apply() step, so I made some simpler files to test it on. Now executing it returns Error in infas[[seq_id]] : subscript out of bounds, Calls: apply -> FUN
. If you could help me debug and find where I'm asking for something that isn't there that will help, and hopefully I can proceed to the NULL problem.
#!/usr/bin/Rscript
library(seqinr)
infas=read.fasta("test.fa")
posi=read.delim("positiontest.txt", header=FALSE, sep="\t")
outfas=file("orf2seqs.fa", 'w')
doitall=function(x) {
seq_id <- x[[1]]; start <- x[[2]]; stop <- x[[3]]; seq <- infas[[seq_id]]
seqv <- seq[start:stop]
header <- paste(sep="", ">", attr(seq, "name"), "_", start, "_", stop, "\n")
cat(file=outfas, header)
cat(file=outfas, toupper(paste(sep="", collapse="", seqv)), "\n")
}
apply(posi, 1, doitall)
close(outfas)
test.fa
:
>1234567
AATGATAATGATGATGATGAT
>1357924
GTGATGATGATGATGAAGATGGATGATG
positionstest.txt
:
1234567 2 9
1357924 5 10
(There are probably other scripts / approaches, but for the sake of learning I would rather persevere with this script)
Thanks. Inserting
as.character()
removes the Out of bounds error and that makes sense. Now I'm back to the NULL.How would strand be defined?
Heh that's a good question. There's nothing in your data that defines the strand of the sequences. You can just remove this variable from the header. You'll see that the output will be written to the file
orf2seqs.fa
.Oh I'm sorry, this is removed in the script, previously I was trying it with strand info but I simplified it to troubleshoot.
Ignore my last comment, the NULL seems to be there regardless, the output is there as it should be now!
I fixed the
NULL
(or rather hid it) withinvisible()
wrapping.Thanks.