Counting reads starting at a coordinate in bam file
0
0
Entering edit mode
5.2 years ago
aboyd003 • 0

I am attempting to count the number of reads in my bam at a list of specific coordinates. I am using samtools. Queryid specifies chromosome and mylookup specifies the coordinate. The coordinate is a single position and not a feature. A GFF file is not available to run standard programs like featureCount. The resulting count is 663504 which is more than 661089. Does anyone have ideas about this?

samtools view -c $FILE "$queryid":"$mylookup"-"$mylookup"
RNA-Seq samtools CAGEseq • 1.4k views
ADD COMMENT
0
Entering edit mode

featureCounts accepts custom position lists in its SAF format, check its manual. From this you can quantify any given intervals in the genome.

ADD REPLY
0
Entering edit mode

I'm unsure of precisely what you want, but chr:pos-pos returns all alignments that overlap pos and not all alignments starting at pos.

Your title implies you wanted the latter. If this is the case, I don't think such a tool exists but you could use mpileup to get a single coordinate and then count the '^' symbols. Eg:

samtools mpileup -r 1:101376-101376 -B -Q 0 in.cram|awk '{gsub("[^^]","",$5);print length($5)}'

The regexp replaces anything not "^" with "" ([^x] is not x, so [^^] is not ^) and then counts the number of remaining symbols. "-Q0" is needed to avoid removal of low quality bases.

That may still remove duplicates unless other mpileup options are used? I forget now how it works.

Alternatively just use normal samtools view and count the number of lines exactly starting at the desired pos.

samtools view in.cram 1:101376-101376 | awk '$4==101376 {n++} END {print n}'

5

ADD REPLY

Login before adding your answer.

Traffic: 2910 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6