Hi, I would like to extract subsequences from a large FASTA file and write them to a new FASTA file, preferably using Python.
I have a CSV file with the following format:
id, start, stop, header
id1, 3, 10, Contig0
id2, 12, 25, Contig1
id3, 19, 40, Contig2
The input FASTA file has the following format:
>Contig0
(Contig0 sequence)
>Contig1
(Contig1 sequence)
>Contig2
(Contig2 sequence)
I would like a FASTA output file with the following format:
>id1
(Contig0 sequence from bp 3-10)
>id2
(Contig1 sequence from bp 12-25)
>id3
(Contig2 sequence from bp 19-40)
If anyone has any suggestions or a script that can do this, any help would be greatly appreciated.
Thanks for the help! The script I wrote was inefficient and ran very slowly, so I did some more research and found bedtools getfasta, which worked for me.
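For anyone who would rather stay in Python, here is a minimal sketch of one way to do the extraction with only the standard library. It is not the script mentioned above. The function names read_fasta and extract_subsequences are made up for illustration. It assumes the start/stop values are 1-based and inclusive (the post does not say), that the CSV has the header row shown above with a space after each comma, and that the whole FASTA file fits in memory. Each contig is stored once in a dict, so every CSV row is a single lookup instead of a rescan of the FASTA file.

```python
import csv


def read_fasta(handle):
    """Return a dict mapping each record name (first word after '>') to its sequence."""
    sequences = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if name is not None:
                sequences[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def extract_subsequences(csv_handle, fasta_handle, out_handle):
    """Write one FASTA record per CSV row (columns: id, start, stop, header)."""
    sequences = read_fasta(fasta_handle)
    # skipinitialspace=True drops the space that follows each comma.
    reader = csv.DictReader(csv_handle, skipinitialspace=True)
    for row in reader:
        start = int(row["start"])
        stop = int(row["stop"])
        seq = sequences[row["header"].strip()]
        # Assumption: coordinates are 1-based and inclusive,
        # so bp 3-10 corresponds to the Python slice [2:10].
        out_handle.write(f">{row['id'].strip()}\n{seq[start - 1:stop]}\n")
```

To run it on real files, open the three files and pass the handles in, for example: with open("regions.csv", newline="") as c, open("input.fasta") as f, open("output.fasta", "w") as o: extract_subsequences(c, f, o). Those file names are placeholders. If the coordinates are actually 0-based, the slice needs to be adjusted accordingly.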