Eva_Maria, 9.0 years ago
I have a tab-delimited file like this:
GCF_000707685.1_2840 0 145
GCF_000706885.1_542 0 150
GCF_000593365.1_489 0 156
GCF_000593345.1_3957 256 289
GCF_000593325.1_3041 780 958
I want to extract the sequences at these positions (e.g. 0 to 145, 256 to 289), giving output like this:
>GCF_000707685.1_2840
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>GCF_000706885.1_542
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
How can I do this, with awk or Perl?
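A minimal Python sketch of the extraction, assuming the first column matches the first word of each FASTA header and that the coordinates are 0-based and half-open as in BED (so "0 145" yields 145 bases). The script name extract_regions.py and the function names are hypothetical. It streams the FASTA one record at a time, so only the largest single record plus the region table has to fit in memory.

```python
import sys
from collections import defaultdict
from typing import Dict, Iterator, List, TextIO, Tuple


def read_regions(handle: TextIO) -> Dict[str, List[Tuple[int, int]]]:
    """Parse 'name start end' lines (tab- or space-separated) into {name: [(start, end), ...]}."""
    regions: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        regions[fields[0]].append((int(fields[1]), int(fields[2])))
    return regions


def iter_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) one record at a time; name is the first word after '>'."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract(fasta: TextIO, regions: Dict[str, List[Tuple[int, int]]], out: TextIO) -> None:
    """Write one FASTA record per region, in FASTA order, using 0-based half-open slices."""
    for name, seq in iter_fasta(fasta):
        for start, end in regions.get(name, []):
            out.write(f">{name}\n{seq[start:end]}\n")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python extract_regions.py genome.fa list.txt", file=sys.stderr)
    else:
        with open(sys.argv[1]) as fasta, open(sys.argv[2]) as listing:
            extract(fasta, read_regions(listing), sys.stdout)
```

Usage would be something like `python extract_regions.py genome.fa list.txt > regions.fa`. Each output header is just the record name, matching the desired output in the question, so a record with several regions appears once per region; if your coordinates are 1-based and inclusive, subtract 1 from the start before slicing.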
Hi,
The getfasta tool in bedtools can accomplish this job. See http://bedtools.readthedocs.org/en/latest/content/tools/getfasta.html for details.
It's not working for me (the files are about 7 GB). Is there any other program, or an awk solution?
Hi, I wrote a Perl script as follows.
Usage: perl extr_seq.pl genome.fa list.txt
I am just wondering why bedtools does not work for you. Personally, I think it's a good tool.
Thanks, it works.
If bedtools is not working for you, then try https://github.com/mdshw5/pyfaidx#cli-script-faidx
I think bedtools might read the entire FASTA into memory before extracting, but I'm not sure.
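On the memory question, an indexed approach avoids holding sequences in memory at all. The sketch below builds a per-record index with the same five fields that samtools faidx writes to a .fai file (name, length, byte offset of the first base, bases per line, bytes per line) and then seeks directly to each region. It assumes every sequence line in a record except the last has the same length, which faidx also requires. The function names are hypothetical; this illustrates the idea and is not the actual code of pyfaidx or bedtools.

```python
from typing import BinaryIO, Dict, NamedTuple


class FaiEntry(NamedTuple):
    length: int      # number of bases in the record
    offset: int      # byte offset of the record's first base
    line_bases: int  # bases per full sequence line
    line_bytes: int  # bytes per full sequence line, including the line ending


def build_index(handle: BinaryIO) -> Dict[str, FaiEntry]:
    """One pass over the file, recording layout only (no sequence is kept)."""
    index: Dict[str, FaiEntry] = {}
    name = None
    length = offset = line_bases = line_bytes = 0
    pos = 0  # byte position of the current line
    for line in handle:  # binary lines keep their b"\n", so byte counts are exact
        if line.startswith(b">"):
            if name is not None:
                index[name] = FaiEntry(length, offset, line_bases, line_bytes)
            name = (line[1:].split() or [b""])[0].decode()
            length, offset, line_bases, line_bytes = 0, pos + len(line), 0, 0
        elif name is not None:
            bases = len(line.rstrip(b"\r\n"))
            if line_bases == 0:  # first sequence line defines the line width
                line_bases, line_bytes = bases, len(line)
            length += bases
        pos += len(line)
    if name is not None:
        index[name] = FaiEntry(length, offset, line_bases, line_bytes)
    return index


def fetch(handle: BinaryIO, entry: FaiEntry, start: int, end: int) -> str:
    """Return bases [start, end) (0-based, half-open) by seeking, not scanning."""
    end = min(end, entry.length)
    if start >= end:
        return ""

    def byte_pos(i: int) -> int:
        # Skip whole lines (each line_bytes long), then move within the line.
        return entry.offset + (i // entry.line_bases) * entry.line_bytes + i % entry.line_bases

    first = byte_pos(start)
    last = byte_pos(end - 1) + 1
    handle.seek(first)
    raw = handle.read(last - first)
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()
```

With a real file you would open it in binary mode, e.g. `fh = open("genome.fa", "rb")`, call `build_index(fh)` once, and then call `fetch(fh, idx[name], start, end)` for each line of the region list. Building the index still reads the file once, but memory use stays proportional to the number of records rather than the genome size.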
Here is a one-line awk solution to your problem.
If this is your file:
The output is:
I want to do this as batch processing, because I have a 20 MB text file and a 256 MB FASTA file.
What do you mean by "batch processing"?