I want to be able to count the number of occurrences of a given sequence (for example ACTTTAG) in the GRCh38 reference genome. Is there an existing tool for doing this? Thanks!
I wrote a simple script for finding patterns (regular expressions, in fact) in FASTA files. It's called fastaRegexFinder.py, and I also mention it in this post: Quadruplex sequence batch prediction
If you just want to count the number of occurrences, you can do:
fastaRegexFinder.py -f genome.fa -r 'ACTTTAG' | wc -l
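If you would rather not depend on an external script, the same kind of count can be sketched in a few lines of plain Python. This is only a minimal illustration, not how fastaRegexFinder.py is implemented: the function names (read_fasta, count_occurrences) are made up for this example, it assumes a literal motif rather than a regular expression, it counts overlapping hits, and it assumes each sequence fits in memory one record at a time (which should hold for GRCh38 chromosomes on a typical workstation).

```python
import re

# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Sequence lines of a record are joined, so a motif that spans a
    line break in the file is still found.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_occurrences(lines, motif, both_strands=True):
    """Count (possibly overlapping) occurrences of a literal motif.

    Matching is case-insensitive, so soft-masked (lower-case) regions
    are included. With both_strands=True the reverse complement is
    counted too, unless the motif is its own reverse complement.
    """
    motif = motif.upper()
    rc = reverse_complement(motif)
    # A zero-width lookahead lets finditer report overlapping matches.
    fwd = re.compile("(?=%s)" % re.escape(motif), re.IGNORECASE)
    rev = re.compile("(?=%s)" % re.escape(rc), re.IGNORECASE)
    total = 0
    for _, seq in read_fasta(lines):
        total += sum(1 for _ in fwd.finditer(seq))
        if both_strands and rc != motif:
            total += sum(1 for _ in rev.finditer(seq))
    return total


# Tiny in-memory example; for a real genome, pass an open file handle
# instead, e.g. count_occurrences(open("genome.fa"), "ACTTTAG").
example = [">chr1", "ACTT", "TAGcc", ">chr2", "CTAAAGT"]
print(count_occurrences(example, "ACTTTAG"))
```

Whether to count both strands and whether overlapping hits should count separately depends on what you need the number for, so set both_strands accordingly and compare against the output of whichever tool you end up using.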
You can also use bowtie1.
It is especially good at finding (mapping) short sequences.