Hello everyone,
This is a question I asked on Stack Overflow. I am wondering whether anyone here knows the solution.
http://stackoverflow.com/questions/28128408/python-subprocess-with-hdfs
I am trying to modify the bsmap methratio.py script so that it can read files from HDFS.
If anyone has any idea about this, please let me know.
It's only been 5 hours since you posted on Stack Overflow, OP. You need to be a bit more patient. Also, cross-posting is discouraged because it doesn't sit well with folks on either forum.
And you'll benefit a lot if you add comments to the code: you'll understand what each part of it does, and others won't have to keep every line in their heads when they try to debug it.
Some of us (well, me actually) have contributed code to bsmap at one point or another. However, what you're looking for is someone familiar with (A) bsmap or, more generally, (B) bisulfite sequencing and analysis as well as (C) the inner workings of Hadoop and (D) efficient methods for BAM processing on Hadoop. I hate to break it to you, but the number of people in the world fitting that description can be counted on one hand. In fact, it's quite likely that they can be counted on one hand with no fingers (i.e., perhaps no such person exists). So, congratulations, you're probably going to have to become that person if you want to actually do this. I should note that it'd be in your best interest to drop Python. Hadoop-BAM apparently provides a Java API, so if you don't know Java yet then you're going to "get" to learn.
BTW, this process isn't going to just take a couple days. You're looking at weeks and likely months of coding, debugging and optimizing ahead of you. I don't mean to discourage you with this, I just want you to know what you're in for.
Thanks, Ryan. I understand that Hadoop-BAM provides a Java API, but what I am looking for is a way to wrap methratio.py with Hadoop. Could you give me some insight into how to do that?
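For what it's worth, one way to get started without a full Hadoop port is to stream the file out of HDFS with the `hdfs dfs -cat` command and read it through a pipe from Python. Below is a minimal sketch, assuming Python 3.7+, that the `hdfs` client is on the PATH, and that the input is text (SAM). The helper names `hdfs_cat_command` and `stream_lines` are made up for illustration and are not part of bsmap or methratio.py.

```python
import subprocess


def hdfs_cat_command(hdfs_path):
    """Build the argv that streams one HDFS file to stdout.

    Assumes the standard `hdfs` command-line client is installed.
    """
    return ["hdfs", "dfs", "-cat", hdfs_path]


def stream_lines(command):
    """Run `command` and yield its stdout one text line at a time.

    Hypothetical helper: the output is read through a pipe, so the
    whole file is never held in memory at once.
    """
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        # Always release the pipe and reap the child process.
        proc.stdout.close()
        returncode = proc.wait()
    # Only reached when the stream was read to the end.
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, command)
```

With these in place, something like `for line in stream_lines(hdfs_cat_command("/user/me/sample.sam")): ...` would iterate over the records (the path here is only an example). A BAM file is binary, so it would first have to be decoded, for instance by piping it through `samtools view -h -`. Note that this only moves the input onto HDFS; it does not parallelize methratio.py across the cluster.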