4.2 years ago
genomes_and_MGEs
Hey everyone,
I have identified several regions of interest, using GenBank files as input. The tool I used only outputs the start and stop positions of these regions and doesn't provide an option to extract the sequences. Could you please let me know how to do this?
I have an output text file like this:
region organism contig start stop genes
NC_002516.2_17 GCF_000006765.1_ASM676v1_genomic.gbk NC_002516.2_Pa 230543 237111 6
NC_002516.2_0 GCF_000006765.1_ASM676v1_genomic.gbk NC_002516.2_Pa 675861 703058 32
NC_002516.2_4 GCF_000006765.1_ASM676v1_genomic.gbk NC_002516.2_Pa 786074 797598 16
NC_002516.2_14 GCF_000006765.1_ASM676v1_genomic.gbk NC_002516.2_Pa 895824 901046 7
...
I would like to extract these regions into a single- or multi-FASTA file.
Thanks in advance for taking the time!
You can take columns 3, 4, and 5 and use BEDTools (bedtools getfasta) to extract the desired regions. Just make sure the contig names are consistent with those in the reference genome you're extracting from.
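As a rough sketch of that suggestion, the Python below turns the table into BED rows (contig, start, stop, plus the region ID as a name column). It rests on a few assumptions that are not confirmed in this thread: that the tool reports 1-based inclusive coordinates (BED starts are 0-based, so the start is shifted by one), that a FASTA of the same assembly is available, and that file names such as regions.bed and genome.fasta are placeholders. The function names are made up for illustration.

```python
def table_to_bed(lines, one_based=True):
    """Convert whitespace-delimited region rows into BED tuples:
    (contig, start, stop, region_id)."""
    bed_rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not a full record.
        if len(fields) < 6 or fields[0] == "region":
            continue
        region, _organism, contig, start, stop = fields[:5]
        start, stop = int(start), int(stop)
        # ASSUMPTION: the tool's coordinates are 1-based and inclusive.
        # BED uses a 0-based start and an exclusive end, so only the start shifts.
        if one_based:
            start -= 1
        bed_rows.append((contig, start, stop, region))
    return bed_rows


def write_bed(rows, path):
    """Write BED tuples to a tab-separated file."""
    with open(path, "w") as handle:
        for row in rows:
            handle.write("\t".join(str(value) for value in row) + "\n")


# Two rows copied from the question, plus the header line.
example = [
    "region organism contig start stop genes",
    "NC_002516.2_17 GCF_000006765.1_ASM676v1_genomic.gbk NC_002516.2_Pa 230543 237111 6",
    "NC_002516.2_0 GCF_000006765.1_ASM676v1_genomic.gbk NC_002516.2_Pa 675861 703058 32",
]
rows = table_to_bed(example)

# To write the file (placeholder name):
#   write_bed(rows, "regions.bed")
# Then, on the command line (genome.fasta is a placeholder):
#   bedtools getfasta -fi genome.fasta -bed regions.bed -fo regions.fasta
```

If the contig column (here NC_002516.2_Pa) does not match the sequence headers in the FASTA exactly, bedtools will not find those records, so the names may need to be adjusted on one side first. Adding -name to the getfasta call uses the fourth BED column in the output headers, though the exact header format varies between BEDTools versions.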