Removing a specified range of bases from the middle of contigs and creating new sequences
3.3 years ago

Hi, I am trying to exclude a span of nucleotides from the middle of my contigs. I have the contig IDs and the start:end positions, as shown below. I would like to remove all the bases between start and end.

Example:

>ID length start..end

>Contig1 100 20..35

>Contig2 30 3..12

If Contig1 looks like the sequence below, I want to exclude the bases in bold (marked with ***) by splitting the sequence, and then create a new sequence from the bases on either side of the excluded region.

Input:

>Contig1
TTGTTCAACGGATCCACCT***GTTGCCAAGAGTGCTTCAGTACATTGCTCACGGCTGAA***TCCCATATCCATCAAAGCACAAGATTTGAATTCACTCGAGGATCTGCTTCGTCGACCATTGGAAATGAAAAAATTACAATTACACATTGAATTTGTAAAGCTTGAAATTAATGA

Output:

>Newcontig
TTGTTCAACGGATCCACCTTCCCATATCCATCAAAGCACAAGATTTGAATTCACTCGAGGATCTGCTTCGTCGACCATTGGAAATGAAAAAATTACAATTACACATTGAATTTGTAAAGCTTGAAATTAATGA

Most of the tools that I have encountered only remove bases from the trailing ends. Please let me know if you have any suggestions for specifying start:end and creating a new sequence that excludes that region.

I have about 100 contigs to clean this way. Any help would be highly appreciated. Thank you so much!
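For what it's worth, the core operation described above (drop start..end and join the two flanks) can be sketched in a few lines of Python. This is only a sketch: the function name `remove_span` is made up for illustration, and it assumes the coordinates are 1-based and inclusive, which the post does not actually state.

```python
def remove_span(seq, start, end):
    """Return seq without positions start..end.

    Assumption: start and end are 1-based and inclusive.
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"span {start}..{end} is outside a sequence of length {len(seq)}")
    # Left flank: everything before `start`; right flank: everything after `end`.
    return seq[:start - 1] + seq[end:]


if __name__ == "__main__":
    # Toy example: 19 flanking bases, 6 bases to remove, 10 flanking bases.
    contig = "TTGTTCAACGGATCCACCT" + "GTTGCC" + "TCCCATATCC"
    print(remove_span(contig, 20, 25))  # prints TTGTTCAACGGATCCACCTTCCCATATCC
```

If the coordinates turn out to be 0-based or end-exclusive, only the two slice indices need to change.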

Contigs Assembly fastafile • 1.6k views

The post is confusing to me without the input sequences, their format, the input files, and the expected output. Could you please elaborate with example input and expected output? Thanks.


Hi, I have modified the post. Hope it makes sense now. Thanks.


The coordinates to be excluded are given in FASTA format. Is that correct?


Yes, that's right.
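Given that the coordinates sit in FASTA-style header lines, a rough sketch for processing all contigs in one pass might look like the following. It uses only the Python standard library. The function names (`parse_coords`, `read_fasta`, `clean_contigs`) are hypothetical, and it assumes each coordinate line looks like `>ID length start..end` with 1-based, inclusive positions and one span per contig.

```python
import re

# Assumed coordinate line layout: ">ID length start..end" (optional space after ">").
COORD_RE = re.compile(r"^>\s*(\S+)\s+(\d+)\s+(\d+)\.\.(\d+)\s*$")


def parse_coords(lines):
    """Map contig ID -> (start, end); lines that do not match are skipped."""
    coords = {}
    for line in lines:
        match = COORD_RE.match(line.strip())
        if match:
            name, _length, start, end = match.groups()
            coords[name] = (int(start), int(end))
    return coords


def read_fasta(lines):
    """Map contig ID (first word of the header) -> sequence as one string."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def clean_contigs(fasta_lines, coord_lines):
    """Remove the listed span from each contig; unlisted contigs pass through."""
    coords = parse_coords(coord_lines)
    cleaned = {}
    for name, seq in read_fasta(fasta_lines).items():
        if name in coords:
            start, end = coords[name]
            # 1-based inclusive span (assumption): keep both flanks.
            seq = seq[:start - 1] + seq[end:]
        cleaned[name] = seq
    return cleaned
```

To run it on real files, pass open file handles as the two arguments (for example `clean_contigs(open("contigs.fasta"), open("coords.txt"))`, where both file names are placeholders), then write each ID and sequence back out as FASTA.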
