Hi pals, I've got a couple of problems here.
I have an Excel file with several genomic coordinates in the format "strand:start:end", and I also have access to my genome of interest in different formats (fna, gbk...). I want to extract the sequences corresponding to my coordinates into a single file. I don't mind whether the sequences have a header or a custom ID; I just want them in the same order as in my Excel file. Is there any tool that can do this? It would be much appreciated, since I have to extract more than 2000 sequences and doing this manually is not feasible.
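For illustration, the extraction step could be sketched in plain Python roughly like this. It rests on several assumptions that the post does not state: coordinates are 1-based and inclusive, the strand is written as "+" or "-", and the genome is a single-record FASTA (.fna) file. The function names and the `seqN_` header scheme are made up for the example.

```python
# Sketch: pull subsequences out of a genome using "strand:start:end" strings.
# Assumptions (not stated in the post): coordinates are 1-based and inclusive,
# strand is "+" or "-", and the genome is a single-record FASTA (.fna) file.

# Translation table used to complement DNA bases (N is left as N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_single_fasta(path):
    """Return the sequence of a one-record FASTA file as a single string."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))


def parse_coordinate(text):
    """Split 'strand:start:end' into (strand, start, end)."""
    strand, start, end = text.strip().split(":")
    return strand, int(start), int(end)


def extract(genome, text):
    """Return the subsequence for one coordinate string."""
    strand, start, end = parse_coordinate(text)
    if start > end:  # tolerate coordinates given in reverse order
        start, end = end, start
    fragment = genome[start - 1:end]  # 1-based inclusive -> Python slice
    if strand == "-":
        # Minus strand: reverse complement of the plus-strand fragment.
        fragment = fragment.translate(COMPLEMENT)[::-1]
    return fragment


def write_fasta(genome, coordinates, out_path):
    """Write one FASTA record per coordinate, preserving the input order."""
    with open(out_path, "w") as out:
        for index, text in enumerate(coordinates, start=1):
            out.write(f">seq{index}_{text.strip()}\n{extract(genome, text)}\n")
```

To use it, the coordinate column would first be saved from Excel as a plain text file with one coordinate per line; the lines of that file can then be passed to `write_fasta` together with the string returned by `read_single_fasta`. If the coordinates turn out to be 0-based, or the genome has several contigs, the slicing and the FASTA reader would need adjusting.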
The next step is to run BLAST against another genome, which is also available in several formats. I'm using the NCBI blastn program with these options: -task blastn -query "query" -subject "subject" -outfmt 6 >name
How can I make the output file contain only the most relevant matches (e.g. only the matches with an E-value lower than 0.0001)?
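One option is to set the threshold at search time: blastn accepts an `-evalue` option (for example `-evalue 1e-4`). Alternatively, an existing tabular file can be filtered afterwards. The sketch below assumes the default 12-column `-outfmt 6` layout, in which the 11th column is the E-value; the function name is hypothetical.

```python
# Sketch: keep only BLAST tabular (-outfmt 6) hits below an E-value cutoff.
# Assumes the default 12-column layout, where column 11 holds the E-value.

def filter_hits(lines, max_evalue=1e-4):
    """Return the hit lines whose E-value is strictly below max_evalue."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        if float(fields[10]) < max_evalue:
            kept.append(line.rstrip("\n"))
    return kept
```

Applied to the output file from the command above, this would mean opening the file, passing the file object to `filter_hits`, and writing the returned lines to a new file. Hit order is preserved, so the results stay grouped by query as BLAST wrote them.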
Thanks in advance!
Hi Matt Shirley,
I tried your approach, and although it's comfortable to work in Windows, I find Bedtools on Linux more suitable for my current project. Thanks for your help!!