Hi, I am trying to remove a span of nucleotides from the middle of my contigs. I have the contig IDs and the start..end positions, as shown below, and I would like to remove all the bases between start and end.
Example:
> ID length start..end
>Contig1 100 20..35
>Contig2 30 3..12
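A minimal sketch of reading these coordinate lines into a dictionary, assuming every line has the form `>ID length start..end` separated by whitespace. `parse_coordinates` and `COORD_LINE` are hypothetical names; the length column is read but not used.

```python
import re

# Assumed line format: ">ID length start..end", e.g. ">Contig1 100 20..35".
# Optional whitespace after ">" is tolerated (e.g. "> Contig2 30 3..12").
COORD_LINE = re.compile(r"^>\s*(\S+)\s+(\d+)\s+(\d+)\.\.(\d+)\s*$")

def parse_coordinates(lines):
    """Return {contig_id: (start, end)} from coordinate lines.

    Lines that do not match the pattern, such as the
    "> ID length start..end" header line, are skipped.
    """
    regions = {}
    for line in lines:
        m = COORD_LINE.match(line.strip())
        if m:
            contig_id, _length, start, end = m.groups()
            regions[contig_id] = (int(start), int(end))
    return regions

example = [
    "> ID length start..end",
    ">Contig1 100 20..35",
    "> Contig2 30 3..12",
]
print(parse_coordinates(example))  # {'Contig1': (20, 35), 'Contig2': (3, 12)}
```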
If Contig1 looks like the sequence below, I want to remove the bases in bold (marked with ***) by splitting the sequence there and then joining the bases on either side of the excluded region into a new sequence.
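The split-and-join step itself is just two string slices. A minimal sketch, assuming start..end are 1-based and inclusive (so removing 20..35 keeps bases 1..19 and 36 onward); if your coordinates are 0-based, adjust the slice by one. `excise` is a hypothetical helper name.

```python
def excise(seq: str, start: int, end: int) -> str:
    """Remove bases start..end (1-based, inclusive) and join the flanks."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"region {start}..{end} is outside 1..{len(seq)}")
    # seq[:start - 1] keeps bases 1..start-1; seq[end:] keeps bases end+1..len(seq).
    return seq[:start - 1] + seq[end:]

print(excise("AAAACCCCGGGG", 5, 8))  # AAAAGGGG
```

Raising an error on out-of-range coordinates is a design choice: Python slicing would otherwise silently clip them, which could hide a mismatch between the coordinate file and the contigs.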
Input:
>Contig1
TTGTTCAACGGATCCACCT***GTTGCCAAGAGTGCTTCAGTACATTGCTCACGGCTGAA***TCCCATATCCATCAAAGCACAAGATTTGAATTCACTCGAGGATCTGCTTCGTCGACCATTGGAAATGAAAAAATTACAATTACACATTGAATTTGTAAAGCTTGAAATTAATGA
Output:
>Newcontig
TTGTTCAACGGATCCACCTTCCCATATCCATCAAAGCACAAGATTTGAATTCACTCGAGGATCTGCTTCGTCGACCATTGGAAATGAAAAAATTACAATTACACATTGAATTTGTAAAGCTTGAAATTAATGA
Most of the tools I have come across only trim bases from the ends of a sequence. Please let me know if you have any suggestions for specifying a start..end range and creating a new sequence that excludes that region.
I have about 100 contigs to clean this way. Any help would be greatly appreciated. Thank you so much!
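For ~100 contigs, one option is a small stdlib-only script that reads the contig FASTA and the coordinate file and writes a cleaned FASTA. This is a sketch under stated assumptions: coordinates are 1-based and inclusive, each contig has at most one region, the contig ID is the first word of its FASTA header, and the original header is kept (rather than renamed to something like >Newcontig). `read_fasta`, `clean_contigs`, `main` and the file names are hypothetical.

```python
import re

# Assumed coordinate line format: ">ID length start..end".
COORD_LINE = re.compile(r"^>\s*(\S+)\s+\d+\s+(\d+)\.\.(\d+)\s*$")

def read_fasta(lines):
    """Yield (header, sequence) pairs; header excludes the leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def clean_contigs(fasta_lines, coord_lines, width=60):
    """Return FASTA lines with each listed start..end region removed.

    Contigs without a listed region are passed through unchanged.
    Sequences are re-wrapped at `width` bases per line.
    """
    regions = {}
    for line in coord_lines:
        m = COORD_LINE.match(line.strip())
        if m:
            regions[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    out = []
    for header, seq in read_fasta(fasta_lines):
        contig_id = header.split()[0] if header else ""
        if contig_id in regions:
            start, end = regions[contig_id]
            if not 1 <= start <= end <= len(seq):
                raise ValueError(f"{contig_id}: {start}..{end} outside 1..{len(seq)}")
            # Keep bases 1..start-1 and end+1..len(seq), joined together.
            seq = seq[:start - 1] + seq[end:]
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return out

def main(fasta_path, coords_path, out_path):
    """Read both input files and write the cleaned FASTA to out_path."""
    with open(fasta_path) as fa, open(coords_path) as co:
        cleaned = clean_contigs(fa, co)
    with open(out_path, "w") as out:
        out.write("\n".join(cleaned) + "\n")
```

Usage would be something like `main("contigs.fa", "regions.txt", "cleaned.fa")`. If a contig can have several regions to remove, the dictionary would need to hold a list of ranges, applied from the rightmost range to the leftmost so earlier cuts do not shift later coordinates.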
The post is confusing to me without the input sequences, their format, the input files and the expected output. Could you please elaborate with an example input and the expected output? Thanks.
Hi, I have modified the post. I hope it makes sense now. Thanks.
So the coordinates to be excluded are given in a FASTA-like format (one ">ID length start..end" line per contig). Is that correct?
Yes, that's right.