Hi everyone
I am working on RNA SEQ data for pombe genome. I mapped the reads with the pombe genome and created sam/bam files.
Now my question is i have a list of pombe genes with its coordinates.
I want to find the number of read counts for those genes.
How can i do that. Any code would be really helpful.
I will explain a bit more.
Suppose my gene coordinate for gene SPAC227.11c is 567876 568100
Now i want to find the read coordinates for the gene SPAC227.11c.
How should i do it.
Hope to hear from you guys soon
Regards Varun
actually the contig-name your read was mapped to is written in the third column of your sam file.
Hey David
This works. But i will explain you how my yourGenes.file looks like column1 = chromosome column2 = gene_start(ofcourse 1-based) column3 = gene_end(ofcourse 1 -based)
Now for a bed file we just want to change column2 such that we have value 1 less than the original number i.e if column2 = gene_start is 567876 then in the output of bed file it would be 567875 and column3 = gene_end would remain the same which is 568100.
Is Your perl code above doing the same thing?
Let me know if i am wrong somewhere.
Varun
Sorry, but I actually have no idea what you are talking about! ;)
But if you just want to subtract 1 of your start or end position, you can do it like this:
sorry David,
i have a bed file came from converting the accepted_hits.bam (tophat2 output) but in the column one there is no gene feature or just Latin numbers I, II, II, IV, so one...i used gtf file but i don't know where is the gene coordinate... then how i can get how many reads have mapped to each gene????
thank you