Dear all,
I have a bam file from which I want to extract the exact part of DNA sequences that map within the following region 54-78 to my reference .
I have tried so far unsuccessfully bedtools, samtools, custom scripts and I was wondering if there is a quick way to do it with RSamtools or any other tool in R.
samtools command:
samtools faidx Reference_barcodes.fasta "Reference_barcodes:54-78" | grep -v '^>' | tr -d '\n' && echo " REF" && java -Xmx200g -jar /data/biosoftware/sam4weblogo/jvarkit/dist/sam4weblogo.jar --interval "Reference_barcodes:54-78" -F tabular test_4_seqs_sorted.bam
bedmap --echo --fraction-map 1 <(bam2bed < test_4_seqs_sorted.bam) <(echo -e "Reference_barcodes\t54\t78") > answer.bed
Here is a link to my data which is tiny, only 4 sequences, https://1drv.ms/u/s!ArOzUuixE-mggfskZPX_Y0cpyQH4RA?e=hAa0W3 and this is the desired result that I want to take out
TACGGTTATATTGACAGACCGAGGG
TAATTCGAAACCATAAGGCTCGCAA
TACGGTTATATTGACAGACCGAGGG
TAATTCGAAACCATAAGGCTCGCAA
I am at a desperate point so any serious help and guidance is more than appreciated and I am happy to tip for it if possibly though "Buy me a coffee" or patreon format
What you are trying to do is not trivial, and I don't think you'll find a simple command to do that (but maybe I'm wrong). If I had to do that, I would write my own script using some library for parsing bam files. You'll need to consider the alignment start position and the CIGAR string, so you can extract matches within the required range. Sorry this is not really an answer, but more of a general direction. Good luck.