Hello everyone,
I am an IT student working with Hadoop on human genome data. My first problem is how to store the genome data in a Hadoop cluster. How do I store the data chromosome-wise?
We have a cluster of 30 machines running Hadoop, and we are planning to use it to process human genome data. The data is in the form of BAM files. I know that if I load the data into HDFS, it will automatically be split into blocks and stored across the data nodes. That is the problem: I can't have the data split arbitrarily like that. I need it split chromosome-wise so that we can run bioinformatics algorithms on each chromosome.
An example of such an algorithm is bisulfite methylation extraction.
Currently we use bismap (a Python tool). Is there a way to store the data chromosome-wise on Hadoop and run the bismap command as MapReduce jobs?
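To make the goal concrete, here is a minimal Python sketch of the chromosome-wise split I have in mind. It assumes that samtools and the hdfs command-line client are installed and that the BAM file is coordinate-sorted and indexed (region queries need the index). The function names, file names, and HDFS directory are hypothetical, and nothing is executed unless dry_run is set to False.

```python
import subprocess
from pathlib import Path


def split_commands(bam_path, chromosomes, out_dir):
    """Build one `samtools view` command per chromosome.

    Returns a list of (output_file, command) pairs. The input BAM must be
    indexed for the region argument (e.g. "chr1") to work.
    """
    stem = Path(bam_path).stem  # "sample.bam" -> "sample"
    pairs = []
    for chrom in chromosomes:
        out_file = f"{out_dir}/{stem}.{chrom}.bam"
        cmd = ["samtools", "view", "-b", "-o", out_file, str(bam_path), chrom]
        pairs.append((out_file, cmd))
    return pairs


def put_command(local_path, hdfs_dir):
    """Build the `hdfs dfs -put` command that uploads one local file."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def split_and_upload(bam_path, chromosomes, out_dir, hdfs_dir, dry_run=True):
    """Plan (and optionally run) the split followed by the upload.

    hdfs_dir is a hypothetical target directory that must already exist.
    """
    plan = []
    for out_file, cmd in split_commands(bam_path, chromosomes, out_dir):
        plan.append(cmd)
        plan.append(put_command(out_file, hdfs_dir))
    if not dry_run:
        for cmd in plan:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return plan


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in split_and_upload("sample.bam", ["chr1", "chr2"],
                                "/tmp/split", "/genome/by_chrom"):
        print(" ".join(cmd))
```

Each per-chromosome file would still be stored as HDFS blocks internally, so the job would also have to treat every file as a single unit of work instead of relying on the default input splits.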
Hello shalini.ravishankar!
It appears that your post has been cross-posted to another site: SEQanswers.
This is typically not recommended as it runs the risk of annoying people in both communities.
Cross-posted on SO: http://stackoverflow.com/questions/27958594/hadoop-for-human-genome-data
Cross-posted on Quora: http://qr.ae/6EsvP