Count the numfer of reads supporting a pattern from a BAM file
1
0
Entering edit mode
3.5 years ago
User000 ▴ 710

Dear all,

I have a BAM file and I want to count how many reads have the following 4 patterns at chromosome region around 250bp:

AAAAAAAGAAAAA
AAAAAAAGAAAAAA
AAAAAAAAGAAAAAA
AAAAAAAAGAAAAA

I have downloaded ASCIIgenome software and tried something with samtools mpileup. But I don't know how to do what I am looking for.

samtools • 1.3k views
ADD COMMENT
0
Entering edit mode

I would do it in this way:

1) extract your reads from this region Extract Reads From A Bam File That Fall Within A Given Region

2) transform BAM to FASTA

3) write a python script that mechanically counts these patterns from FASTA, should be 20 lines.

ADD REPLY
0
Entering edit mode

support the following pattern at region around 250:

i don't understand that sentence.

ADD REPLY
0
Entering edit mode

I edited the question.

ADD REPLY
0
Entering edit mode

that's still not clear. What is the 'pattern' ? how is it different from running

grep -E '(AAAAAAAGAAAAA|AAAAAAAGAAAAAA|AAAAAAAAGAAAAAA|AAAAAAAAGAAAAA)'

ADD REPLY
0
Entering edit mode

I am looking for A indels near 250 bp of the chromosome region. So I want to count how many reads support each of the 4 patterns in order to identify how many do have indesl and so on.

ADD REPLY
0
Entering edit mode

Would something like this work?

samtools view in.bam | grep -E 'AAAAAAAGAAAAA' | wc -l
ADD REPLY
2
Entering edit mode
3.4 years ago
jkbonfield ★ 1.3k

Just use grep. Your pattern is "AAAAAAAGAAAAA" with an optional A either side, so AAAAAAAGAAAAA alone matches the other 3 also.

Hence either:

samtools view in.bam | grep -c AAAAAAAGAAAAA

Or with a modern samtools:

samtools view -c -e 'seq=~"AAAAAAAGAAAAA"' in.bam

If you wish to explicitly add a region, then that can be done on the command line too, either adding one arg at a time (eg 1:250-250 which will be reads overlapping that coord) or with a -L regs.bed option.

Edit: the question is poorly worded, but it dawns on me maybe you're looking for AAAAAAAGAAAAA and not flanking A? If so that's regexp [^A]AAAAAAAGAAAAA[^A]. Read the egrep man page is probably the best starting point.

ADD COMMENT

Login before adding your answer.

Traffic: 1719 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6