Hello, I have a question about extracting upstream and downstream regions. I ran blastn with short 30 nt query sequences against a database of 500 nt sequences. After running blastn, how can I extract the regions upstream and downstream of where each 30 nt query aligns?
If you have the coordinate of the alignment, e.g. loc, then you can extract the sequence from the 500 nt reference sequence starting at loc-30 with a length of 30 nt + 2*30 nt, provided that there are no insertions or deletions in the alignment. If there are indels, just add the corresponding length.
To actually do the extraction, you can use the substr function available in most languages, e.g. R, Perl, awk.
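The coordinate arithmetic above can be sketched as follows. This is a minimal Python illustration, not the awk/R/Perl code referred to in the answer; the function name `extract_with_flanks` and the example sequence are made up, and it assumes `loc` is the 1-based start of a gap-free alignment on the reference.

```python
def extract_with_flanks(reference, loc, aln_len=30, flank=30):
    """Return the aligned region plus `flank` bases on each side.

    `loc` is assumed to be the 1-based start of the alignment on the
    reference, and the alignment is assumed to contain no indels.
    """
    start = max(loc - flank, 1)                            # clip at the 5' end
    end = min(loc + aln_len - 1 + flank, len(reference))   # clip at the 3' end
    return reference[start - 1:end]                        # 1-based inclusive -> Python slice


reference = "ACGT" * 125                 # made-up 500 nt reference
region = extract_with_flanks(reference, loc=101)
print(len(region))                       # 30 nt hit + 2 * 30 nt flanks
```

Near either end of the 500 nt sequence the result is simply shorter than 90 nt, because the flank is clipped to the sequence bounds.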
written 9.5 years ago by Sam • updated 2.1 years ago by Ram
But I want to extract the regions after aligning with standalone BLAST. Also, how can I tell which sequence in my local database is the reference for each small query sequence?
After the alignment, you should get an alignment file, which should contain the sequence ID of the reference sequence. You can then get the reference sequence FASTA file and use it to extract the required sequence.
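As an illustration of reading the reference (subject) ID from the alignment file, here is a minimal Python sketch. It assumes blastn was run with `-outfmt 6` (tabular output) and its default column order; the function name `read_hits` and the example line are made up for the illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle):
    """Yield one dict per hit: query ID, subject ID and subject coordinates."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue                      # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        sstart, send = int(hit["sstart"]), int(hit["send"])
        yield {
            "query": hit["qseqid"],
            "subject": hit["sseqid"],
            # Minus-strand hits are reported with sstart > send, so normalise.
            "start": min(sstart, send),
            "end": max(sstart, send),
            "strand": "+" if sstart <= send else "-",
        }


# Made-up example line; in practice pass an open file handle instead.
example = io.StringIO("q1\tref42\t100.000\t30\t0\t0\t1\t30\t201\t230\t1e-10\t56.5\n")
hits = list(read_hits(example))
print(hits[0]["subject"], hits[0]["start"], hits[0]["end"])
```

The `subject` value is the ID to look up in the database FASTA file, and `start`/`end` are the coordinates to extend with flanking sequence.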
Yes, I got the subject ID along with the query ID. Thank you. Is there an (understandable) R or Perl script for extracting a particular region of a sequence?
I normally do it using awk, but I guess you can try the Biostrings package in R.
Hello, I'm stuck with the blast command. When I run blastn with a 300 bp sequence it returns results, but when I run blastn with a 20-25 nt sequence it returns nothing. Do you have any idea why? I want to blast small sequences against my local database.
Maybe you can post a new question to get an answer to that? I am not familiar with BLAST, so I cannot give you a definitive answer. However, the most straightforward guess is that there is simply no sequence similarity between your small sequence and your local database.
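Another possible cause, not confirmed for this poster's data: BLAST+ blastn defaults to the megablast task, whose word size of 28 is longer than a 20-25 nt query, so no hit can be seeded. BLAST+ provides `-task blastn-short` for such queries. The Python sketch below only builds such a command line; the file and database names are placeholders, and the e-value is an assumption to adjust.

```python
import subprocess


def short_query_command(query_fasta, db, out_tsv, evalue="10"):
    """Build a blastn command line intended for short (< 50 nt) queries."""
    return [
        "blastn",
        "-task", "blastn-short",   # smaller word size than the default megablast task
        "-query", query_fasta,
        "-db", db,
        "-evalue", evalue,         # assumed threshold; short hits have weak e-values
        "-outfmt", "6",            # tabular output
        "-out", out_tsv,
    ]


# Placeholder names, for illustration only.
cmd = short_query_command("queries.fa", "my_local_db", "hits.tsv")
print(" ".join(cmd))
# To actually run it (requires BLAST+ on the PATH and an existing database):
# subprocess.run(cmd, check=True)
```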
Can you post the awk command?
Hello Sam Sir, could you run your command(s) on my file to extract the upstream and downstream regions? I would be thankful. I tried, but I failed, and it is taking me too much time. Alternatively, if you can provide the awk command, I can try that myself. Please help me out here.
Assuming your input is a FASTA file containing only your target sequence, you can use this script:
Where loc is the coordinate of the alignment. If you need to do this for different sequences and different coordinates, you will need a more complicated script.
The first part of the script essentially concatenates the whole sequence into one string, and the second part (substr) is where you extract the region of interest.
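The two steps described here (concatenate the sequence lines, then take a substring) can be sketched in Python as below. This is not the awk script from the answer; `read_single_fasta`, the awk-like `substr` helper, the demo file and the value of `loc` are all made up for illustration, and a gap-free 30 nt alignment with 30 nt flanks is assumed.

```python
import os
import tempfile


def read_single_fasta(path):
    """Join all non-header lines of a one-record FASTA file into one string."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))


def substr(sequence, start, length):
    """awk-style substr: 1-based start, fixed length, clipped to the sequence."""
    begin = max(start, 1)
    end = max(start + length - 1, 0)      # 1-based inclusive end, never negative
    return sequence[begin - 1:end]


# Made-up demo file: one header line, sequence wrapped over 50 lines of 10 nt.
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(">target\n" + "\n".join(["ACGTACGTAC"] * 50) + "\n")

sequence = read_single_fasta(tmp.name)
loc = 101                                 # hypothetical 1-based alignment start
region = substr(sequence, loc - 30, 30 + 2 * 30)
print(len(sequence), len(region))
os.remove(tmp.name)
```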
Actually, I have an Excel file containing the sequence ID, start and stop positions (generated by blastn) in separate columns, and a separate FASTA file containing the sequences. I want to extract those regions from the FASTA file.
This command gives an error.
I tried this command, but it also generates an error.
Could we connect over TeamViewer, so that you can solve it there? Thank you :)
If you put your Excel content into a tab-delimited text file, you can do the following:
Your input file needs to be in the format
<id>\t<chr>\t<start>\t<end>
If your start and end positions don't account for the flanking region, then you can use this:
The result will be printed to output.fa, so delete that file, or rename it in the awk code, before each run; otherwise the results will keep being appended to the file.
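For readers who prefer Python to awk, the same idea can be sketched as follows. This is not the awk command from the answer. It assumes the `<id>\t<chr>\t<start>\t<end>` layout described above with no header row, that `<chr>` matches the first word of a FASTA header, and that coordinates are 1-based and inclusive; the function names and the optional `flank` parameter are made up for the illustration.

```python
import csv


def read_fasta(path):
    """Return {sequence_id: sequence} for a (multi-)FASTA file."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]      # ID = header up to first whitespace
                records[name] = []
            elif name is not None and line:
                records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def extract_regions(fasta_path, regions_path, out_path, flank=0):
    """Write one FASTA record per row of a tab-delimited regions file."""
    sequences = read_fasta(fasta_path)
    with open(regions_path, newline="") as regions, open(out_path, "w") as out:
        for row in csv.reader(regions, delimiter="\t"):
            if len(row) < 4:
                continue                        # skip blank or malformed rows
            region_id, seq_id = row[0], row[1]
            start, end = int(row[2]), int(row[3])
            if seq_id not in sequences:
                continue                        # ID not present in the FASTA file
            seq = sequences[seq_id]
            lo, hi = sorted((start, end))       # tolerate minus-strand ordering
            lo = max(lo - flank, 1)             # clip to the sequence bounds
            hi = min(hi + flank, len(seq))
            out.write(f">{region_id} {seq_id}:{lo}-{hi}\n{seq[lo - 1:hi]}\n")
```

A call such as `extract_regions("sequences.fa", "regions.txt", "output.fa", flank=30)` (placeholder file names) would add 30 nt on each side of every region. Unlike the appending behaviour described above, this sketch opens the output file in write mode, so each run overwrites it.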
And for your information, see: Extract User Defined Region From An Fasta File
Do I need to install samtools for this?