Dear all,
I have a reference that looks like this
AAAAATTTTTTTNNNNNNGGGGGGCCCCCC
and on this reference reads have been mapped
ref: AAAAATTTTTTT-**NNNNNN**-GGGGGGCCCCCC
read1: AAAAATTTTTTT-**ATGCAA**-GGGGGGCCCCCC
....
read2: AAAAATTTTTTT-**CCGTAA**-GGGGGGCCCCCC
and then I obtained the BAM file. I want to extract the read sequences that map to positions 13-18 of the reference. For this I used the following command:
samtools view -b -h file.sorted.bam Reference_barcodes:13-18 > test.bam
Now, from this BAM, I want to write to a FASTA file only the exact bases that mapped to positions 13-18, so that my data look like this:
ref: NNNNNN
read1:ATGCAA
read2:CCGTAA
Have you looked at samtools ampliconclip (http://www.htslib.org/doc/samtools-ampliconclip.html)? You will need a newer version of samtools.
samtools faidx can extract regions of a FASTA file: http://www.htslib.org/doc/samtools-faidx.html
The extracted file will also be in FASTA format, so a bit of postprocessing may be needed to get the layout you want (see the concept "linearize FASTA file").
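As a different route from the two tools above, here is a minimal sketch that trims each read to the region directly from the SAM text. It relies on the fact that a region query with samtools view returns every read overlapping 13-18 in full, not just the overlapping bases, so the trimming has to be done from POS and CIGAR. The function name extract_region and the output file name barcodes.fasta are made up for illustration, and the sketch assumes that bases inserted relative to the reference should be dropped and that deleted positions simply contribute nothing.

```shell
# Hypothetical helper (not part of samtools): reads SAM records on stdin and
# prints, as FASTA, the read bases aligned to reference positions $1..$2 (1-based).
extract_region() {
  awk -v start="$1" -v end="$2" -F '\t' '
    /^@/ { next }                                  # skip SAM header lines
    {
      refpos = $4; cigar = $6; seq = $10
      if (cigar == "*" || seq == "*") next         # unmapped or no sequence stored
      qpos = 1; out = ""
      # Walk the CIGAR one operation at a time, e.g. "2S10M" -> 2S, then 10M
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op  = substr(cigar, RLENGTH, 1)
        cigar = substr(cigar, RLENGTH + 1)
        if (op == "M" || op == "=" || op == "X") { # consumes read and reference
          for (i = 0; i < len; i++) {
            if (refpos >= start && refpos <= end) out = out substr(seq, qpos, 1)
            refpos++; qpos++
          }
        } else if (op == "I" || op == "S") {       # consumes read only (dropped)
          qpos += len
        } else if (op == "D" || op == "N") {       # consumes reference only
          refpos += len
        }                                          # H and P consume neither
      }
      if (out != "") printf(">%s\n%s\n", $1, out)
    }'
}

# Possible usage (needs samtools; barcodes.fasta is an assumed output name):
#   samtools view file.sorted.bam Reference_barcodes:13-18 | extract_region 13 18 > barcodes.fasta
```

Each record is written on two lines (name, then bases), so no FASTA linearization is needed for the reads. The reference slice itself can come from samtools faidx with the same Reference_barcodes:13-18 region string.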