I have RNA-seq data and would like to get the reads from specific region for example -/+ 50 reads before and after stop site. do you know how I can get it from bam file
I have RNA-seq data and would like to get the reads from specific region for example -/+ 50 reads before and after stop site. do you know how I can get it from bam file
Easiest way is SAMtools view, you can specify a single region:
samtools view -h <your.bam> chr1:1000-2000
or you can give a BED file (I think in tab-delimited format):
samtools view -h -L <your.bed> <your.bam>
Make sure to use same chromosomal naming (either with 'chr' or without) as in the BAM. You can check with
samtools view -H <your.bam>
There is likely some 'random' script out there that can do this. Perhaps even BEDTools has, these days, some way to do this.
What you need to do is start with the POS field in the BAM/SAM file, which represents the left-most co-ordinate to which your read aligns / maps. Then, you have to parse the CIGAR string to understand how many more bases are involved in the alignment.
...or, if all of your reads are 50bp and your interval of interest is 100bp, then you just need to keep any read whose POS value starts between base 1 and 50.
I am having trouble making it work:
samtools view -h -q 10 input.bam chrX:280-350 | awk 'BEGIN{OFS="\t"}{if($1 ~ /^"@"/) {print} else {if($4 >=280 || $4+length($10)-1 <= 350 && length($10) > 60) {print} else {}}}' | samtools view -Sbo output3.bam -
could you please give a look at my question: Extract reads overlapping a specific region in bam file
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Hello Sara, You may find some info in this post: C: alignment containing sequences from position a to b