Genome data on Hadoop at chromosome level
9.9 years ago

Hello Everyone,

I am an IT student working with Hadoop on a human genome project. My first problem: how do I store the genome data in a Hadoop cluster, and how do I store it chromosome-wise?

We have a cluster of 30 machines running Hadoop, and we plan to use it to process data from the human genome project. The data is in the form of BAM files. I know that if I load the data into HDFS, it will automatically be split into blocks and stored across the data nodes. That is the problem: I can't have the data split arbitrarily like that. I need to split it chromosome-wise so that we can run bioinformatics algorithms on each chromosome.

An example of such an algorithm: bisulfite methylation extraction.

Currently we use bismap (a Python tool). Is there a way to store the data chromosome-wise on Hadoop and run the bismap command as MapReduce jobs?
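
To make the question concrete, here is a minimal sketch of the kind of chromosome-wise grouping I have in mind, written as a Hadoop Streaming style mapper. It assumes the BAM files have first been converted to SAM text (for example with `samtools view`), since Streaming mappers read plain lines. The names `chromosome_key` and `map_lines` are made up for illustration, and the sample records are invented. The mapper emits the reference name (SAM column 3, RNAME) as the key, so the shuffle would send all reads of one chromosome to the same reducer. Running the actual analysis tool in the reducer is left out.

```python
def chromosome_key(sam_line):
    """Return the reference name (SAM column 3, RNAME), or None to skip the line."""
    if not sam_line or sam_line.startswith("@"):
        return None  # header lines carry no alignment record
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None  # a SAM record has at least 11 mandatory fields
    rname = fields[2]
    # "*" in RNAME means the read has no reference (unmapped)
    return None if rname == "*" else rname


def map_lines(lines):
    """Yield 'chromosome<TAB>record' strings; Streaming treats the text
    before the first tab as the key used for grouping."""
    for line in lines:
        key = chromosome_key(line)
        if key is not None:
            yield key + "\t" + line.rstrip("\n")


# Invented sample input, for illustration only.
sample = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tchr2\t250\t60\t4M\t*\t0\t0\tTTGA\tIIII\n",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII\n",
]
for out in map_lines(sample):
    print(out)
```

In a real Streaming job the script would iterate over `sys.stdin` instead of `sample`. Note that one reducer may still receive several chromosomes, so the reducer would have to detect key changes itself.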

nga hadoop genome chromosome • 2.9k views
Hello shalini.ravishankar!

It appears that your post has been cross-posted to another site: SEQanswers.

This is typically not recommended as it runs the risk of annoying people in both communities.

Thanks, Alex. Sure, I will look into that.
I am sorry for the inconvenience. I will delete the other threads.
Cross-posted on Quora: http://qr.ae/6EsvP

9.9 years ago

You might start with Michael Hoffman's Genomedata paper, code and documentation.
