Entering edit mode
7.0 years ago
KVC_bioinfo
▴
600
Hello All,
I have sam and bam file from alignment of reads (single-end) obtained from Oxford nanopore technology(human genome as a reference) I want to extract reads that aligned to given coordinates. This should only include reads that aligned and not the clipped reads. I have used following command which gives me reads with clipping:
samtools file.bam "chr1:region1-region2" > file_out.bam
Can someone help me here? Thank you in advance
Thank you. I really appreciate it. But I am absolutely unfamiliar to java.
Is there any other way of doing it?
sure, you can try awk:
Thanks! what does ref:1-100 mean here?
it's your "chr1:region1-region2"
I am using the command you suggested:
samtools view -h sorted.bam "chr1:region1-region2" | awk -F '\t' '/^@/ {print;next;} (!($6 ~ /[HS]/ )){print;} '
However, this just gives me names such as:
@HD VN:1.3 SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
I want to extract the reads that mapped in that region but I don't want the soft clipped reads. I assume the alignment bam file already excludes the hard clipped dreads.
-h includes the header. The rest of your reads should be after the header. The important bit in that command is !($6 ~ /[HS]/ , which is going to remove any reads where the CIGAR string indicates clipping.
Thank you very much. I am not getting any reads in resulting file. I am interested in obtaining reads that mapped to the reference where I do not want the clipped reads in it.
Suppose, original read length is 2000bp and only 50 map to the reference rest of the reads are clipped. Then I am interested in those 50 reads. I guess the above command removes entire 2000bp read.
Also, I am not sure why the header includes all chr when I have specified only chr1 in my command.
if your region is really "chr1:region1-region2" then, please, try to understand what you're doing...