Hello
I ran the consensus function in samtools and got a consensus FASTA sequence from a BAM file, with missing positions replaced by 'N'.
I would like to find the start and end positions of my target sequence. My sequences look like this:
>chr1
NNNNAGTATANNNTATGNNNNN
The head and tail of each sequence are all N, and my target sequence is in the middle. Some N can also occur inside the target sequence, but I only want the first and last positions where the base is not 'N'.
For example, in my case the start and end positions are chr1:5-17 (including the three N in the middle of the target sequence).
I have about 1,000 sequences like this example in the same FASTA file. Does anyone know how to find the start and end positions for each sequence?
Thanks a lot
Maybe map the sequences with some aligner like gmap, and then convert the output BAM file to BED so that you get the mapping coordinates of your sequence?
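If the goal is only the first and last non-N position within each FASTA record, alignment may not be needed: trimming the leading and trailing N and counting what was removed should be enough. Here is a minimal Python sketch under that assumption. The function names `parse_fasta` and `non_n_span` and the `chr:start-end` output format are my own choices for illustration, and the coordinates are 1-based and inclusive so they match the chr1:5-17 example in the question.

```python
import sys


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def non_n_span(seq):
    """Return the 1-based inclusive (start, end) of the region between the
    first and last non-N base, or None if the sequence is all N or empty."""
    leading = len(seq) - len(seq.lstrip("Nn"))  # number of N at the head
    end = len(seq.rstrip("Nn"))                 # index of last non-N, 1-based
    if leading >= end:
        return None
    return leading + 1, end


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python non_n_span.py consensus.fasta
    with open(sys.argv[1]) as handle:
        for name, seq in parse_fasta(handle):
            span = non_n_span(seq)
            if span is None:
                print(f"{name}\tall N")
            else:
                print(f"{name}:{span[0]}-{span[1]}")
```

Internal N are left alone, since only the ends are trimmed. Sequences wrapped over several lines are joined before the span is computed, so line wrapping in the FASTA file should not affect the coordinates.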