Find patterns in DNA sequence
4
0
8.7 years ago
kindlychung ▴ 60

I want to be able to count the number of occurrences of a given sequence (for example ACTTTAG) in the GRCh38 reference genome. Is there an existing tool for doing this? Thanks!

sequence dna • 2.6k views
2
8.7 years ago
dago ★ 2.8k

You can use the Biostrings package from Bioconductor.

The function countPattern should do the job. Just check whether it counts overlapping matches (i.e. whether it uses a sliding window) or not.
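To illustrate why the sliding-window question matters, here is a minimal Python sketch (not Biostrings code) comparing an overlapping count with a non-overlapping one. The function name `count_overlapping` and the toy sequence are made up for illustration.

```python
def count_overlapping(sequence: str, pattern: str) -> int:
    """Count every start position where pattern occurs, overlaps included."""
    count = 0
    start = sequence.find(pattern)
    while start != -1:
        count += 1
        # Advance by one base only, so overlapping hits are not skipped.
        start = sequence.find(pattern, start + 1)
    return count


seq = "ATATATA"  # toy sequence, chosen so that hits overlap
print(count_overlapping(seq, "ATA"))  # 3 (start positions 0, 2 and 4)
print(seq.count("ATA"))               # 2 (str.count skips overlapping hits)
```

For a motif like ACTTTAG, which cannot overlap itself, both approaches give the same number; the difference only shows up for self-overlapping patterns.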

0
8.7 years ago
5heikki 11k

Jellyfish is pretty nice for kmer counting.
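For anyone unfamiliar with the idea, k-mer counting means tallying every substring of length k. The sketch below shows the concept in plain Python on a toy string; it is not how Jellyfish works internally, the helper name `count_kmers` is hypothetical, and an in-memory counter like this would likely be far too slow and memory-hungry for a whole genome.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Tally every length-k substring (k-mer) of the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


toy = "ACTTTAGACTTTAG"  # toy sequence, not real genome data
counts = count_kmers(toy, 7)
print(counts["ACTTTAG"])  # 2
```

Once all 7-mers are counted, looking up ACTTTAG is a single dictionary access, which is why a k-mer counter fits this question when the pattern has a fixed length.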

0
8.7 years ago

I wrote a simple script for finding patterns (regular expressions, in fact) in FASTA files: fastaRegexFinder.py. I also mention it in this post: Quadruplex sequence batch prediction

If you just want to count the number of occurrences you can do

fastaRegexFinder.py -f genome.fa -r 'ACTTTAG' | wc -l
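If you would rather not depend on an external script, the same kind of count can be sketched in a few lines of Python. This is not the code of fastaRegexFinder.py, just a rough equivalent under some assumptions: the helper names `read_fasta` and `count_regex_hits` are made up, matching is case-insensitive because many genome FASTA files use lowercase for soft-masked regions, and only the forward strand is searched.

```python
import io
import re
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs, joining sequences wrapped over lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_regex_hits(handle: TextIO, pattern: str) -> Dict[str, int]:
    """Count regex hits per FASTA record, overlapping hits included."""
    # The lookahead makes each match zero-width, so overlaps are counted.
    regex = re.compile(f"(?=(?:{pattern}))", re.IGNORECASE)
    return {
        name: sum(1 for _ in regex.finditer(seq))
        for name, seq in read_fasta(handle)
    }


# Toy input standing in for genome.fa; the motif spans a line break.
toy = io.StringIO(">chr_toy\nACTTT\nAGACT\nTTAG\n>chr_other\nGGGG\n")
print(count_regex_hits(toy, "ACTTTAG"))  # {'chr_toy': 2, 'chr_other': 0}
```

For a real file you would pass `open("genome.fa")` instead of the `StringIO` object. Each record is held in memory in full, so expect roughly one chromosome's worth of memory use. To include the reverse strand, you could run it again with the reverse complement of the motif (CTAAAGT for ACTTTAG).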
0
8.7 years ago

You can also use bowtie1.

It is especially good at finding (mapping) short sequences.

