I'm Looking for a Bioinformatics Problem Which Could Benefit from a New or Improved Algorithm, or from Being Mapped to the GPU Architecture
11.4 years ago
tyneuroth ▴ 50

Hello. I am an undergrad majoring in computer science. I am learning GPGPU programming right now and would like to find a good problem to apply it to. Can someone point me toward a relatively simple but important bioinformatics problem whose solution I could contribute to by writing an improved or massively parallel algorithm?

algorithm • 4.3k views

Short-read alignment or genome assembly.

11.4 years ago

Some ideas:

  • How about the calculation of linkage disequilibrium (LD)? It's relatively basic math (check these slides for a quick introduction) and takes about a second for just a couple of SNPs. However, a genome-wide LD analysis over many chromosomes with several hundred thousand SNPs takes hours, because every SNP is compared with every other SNP. This could benefit from a pre-processing step that splits the SNPs into several sets, assigns each set to a different thread, and then calculates LD on several CPUs (or GPUs).
  • How about BLAST? As far as I know there is already one GPU-BLAST implementation, but it can only align protein sequences, so you could go for nucleotide alignment. The problem here might be that the algorithms involved are a bit more complicated (I guess you could skip the database creation and just go for the alignment itself?).
  • Ab initio gene prediction: there are a couple of programs that predict genes based on hidden Markov models, such as SNAP or Augustus, but none of them has a parallel (or GPU) implementation.
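To make the LD idea concrete, here is a minimal Python sketch of the "split the SNP pairs into sets" step. It assumes phased haplotypes coded 0/1 (one list per SNP) and uses r² as the LD measure; the function names (`r_squared`, `split_pairs`, `ld_for_chunk`, `pairwise_ld`) are made up for illustration. The chunks are processed sequentially here, but each one is independent, so they are the units you would hand to CPU workers or GPU thread blocks.

```python
from itertools import combinations


def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs, given phased haplotypes coded 0/1."""
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n  # both alleles are 1
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either SNP is monomorphic.
    return d * d / denom if denom > 0 else float("nan")


def split_pairs(n_snps, n_chunks):
    """Split all SNP index pairs (i < j) into n_chunks independent sets."""
    pairs = list(combinations(range(n_snps), 2))
    return [pairs[k::n_chunks] for k in range(n_chunks)]


def ld_for_chunk(haplotypes, chunk):
    """Compute r^2 for one set of pairs; needs no data from other chunks."""
    return {(i, j): r_squared(haplotypes[i], haplotypes[j]) for i, j in chunk}


def pairwise_ld(haplotypes, n_chunks=4):
    """All-vs-all r^2, computed chunk by chunk (sequentially in this sketch)."""
    result = {}
    for chunk in split_pairs(len(haplotypes), n_chunks):
        result.update(ld_for_chunk(haplotypes, chunk))
    return result


if __name__ == "__main__":
    # Toy data: three SNPs observed on four haplotypes.
    haps = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
    for pair, value in sorted(pairwise_ld(haps, n_chunks=2).items()):
        print(pair, value)
```

With several hundred thousand SNPs the pair list itself would not fit comfortably in memory, so a real implementation would more likely tile the SNP-by-SNP matrix into blocks rather than enumerate pairs up front.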

+1 for GPU-blastn. I've been trying to get the current blastx running on CUDA, with considerable difficulty, but blastn is what I really want!


Thanks for the suggestions. I'm not exactly an expert in GPGPU programming yet, but I will look into blastx and see if it is something I can work towards.

11.4 years ago

Hello,

There are currently no good publicly available tools for identifying footprints in DNase-seq data. The algorithm is pretty straightforward: look within regions of DNase hypersensitivity for short segments of DNA protected from cleavage by bound transcription factors. It can be done in parallel, with many nodes inspecting different locations simultaneously. If you are interested, let me know and I can help get you started.
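As a rough illustration of that scan, here is a minimal Python sketch. It assumes the per-base cleavage counts for one hypersensitive region are already available as a list, and it uses a naive heuristic that is my own placeholder, not a published method: a window counts as a footprint if its mean cut count is at most `ratio` times the mean of the weaker flank. The function name `find_footprints` and the parameters `width`, `flank`, and `ratio` are hypothetical.

```python
def find_footprints(cuts, width=10, flank=5, ratio=0.25):
    """Return half-open (start, end) windows that look protected from cleavage.

    cuts  -- per-base DNase cut counts within one hypersensitive region
    width -- footprint size to test, in bases
    flank -- number of bases on each side used as the local background
    ratio -- window mean must be <= ratio * the weaker flank mean
    """
    hits = []
    # Every window is scored independently of the others, so this loop is
    # the part that could be spread over many threads or nodes.
    for start in range(flank, len(cuts) - width - flank + 1):
        end = start + width
        center = sum(cuts[start:end]) / width
        left = sum(cuts[start - flank:start]) / flank
        right = sum(cuts[end:end + flank]) / flank
        background = min(left, right)
        if background > 0 and center <= ratio * background:
            hits.append((start, end))
    return hits


if __name__ == "__main__":
    # Toy profile: 10 protected bases flanked by heavily cut DNA.
    profile = [10] * 5 + [0] * 10 + [10] * 5
    print(find_footprints(profile))
```

A real tool would also need to test several footprint widths, merge overlapping hits, and correct for DNase sequence bias, none of which is attempted here.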


That sounds like something I might be interested in.


I am also interested in this concept. I see some relevant publications, but I suppose you mean that no code is available yet, right? I am asking because 10 months have passed since this post.

Thanks


Since I wrote this, the following paper was published: http://nar.oxfordjournals.org/content/41/21/e201.long. It provides software for DNase-seq analysis. I have not tried the software yet, but I did speak with the author of the paper, and it should address many of the common DNase-seq analysis questions.

11.4 years ago

scan_for_matches could really use an overhaul (reimplementation!). Let me know if you are going for this one.

http://blog.theseed.org/servers/2010/07/scan-for-matches.html


Thank you for your suggestion. I'll look into it.
