I am new to bioinformatics and python,Now I have downloaded nucleosome-seq data and TSS-seq(which also names CAGE) data,so I try to draw a picture like this:
http://www-huber.embl.de/users/anders/HTSeq/doc/tss.html
The difference was I have the TSS data,so I don't need to subtract the TSS feature from Human GTF annotation. But now I face the problems---the question I want to ask is:
I don't know how to write python script to convent my bed file to instead of GTF.
My TSS data are bed format like this:chrom,chromStart,strand.PS:the read length is 27bp;
Here is my script,could anybody tell me why I am fault?
import HTSeq
import pysam
from matplotlib import pyplot
bamfile = HTSeq.BAM_Reader("DRR000367.bam" )
bedfile = HTSeq.BED_Reader("Poly.bed")
coverage = HTSeq.GenomicArray( "auto", stranded=False, typecode="i" )
for almnt in bamfile:
if almnt.aligned:
almnt.iv.length = fragmentsize
coverage[ almnt.iv ] += 1
list(bedfile)
profile = numpy.zeros( 2*halfwinwidth, dtype='i' )
for p in tsspos:
window = HTSeq.GenomicInterval( p.chrom, p.pos - halfwinwidth, p.pos + halfwinwidth, "." )
wincvg = numpy.fromiter( coverage[window], dtype='i', count=2*halfwinwidth )
if p.strand == "+":
profile += wincvg
else:
profile += wincvg[::-1]
pyplot.plot( numpy.arange( -halfwinwidth, halfwinwidth ), profile )
pyplot.show()
Thanks very much.Any who give me any information would be pretty appreciate.
You might give deepTools a try.
I don't know if I am getting this right, but it seems that your TSS isn't in bed format. This is how bed format should look like.
BTW: Cross-posted on biology.stackexchange
I know the standard format of bed file, but I think the reads are all 27bp ,so the chromEnd seem no use, so eventually I use the three
chrom,chromStart,strand
to set my bed file.Please post this as comment.
What if you try to reformat your bed file into something like this:
OK, thank you.Now I find http://www-huber.embl.de/users/anders/HTSeq/doc/features.html can use my own annotation in a flat file, with each line describing a feature and giving coordinates, so I write python script like this:
but it's error:
I am sorry, the error I show was the false bed format as you pointed
I've reformatted your post so the python should render correctly. You might double check this, since I really didn't go through the code to ensure proper nesting.
Thank you .These are my complete python script.
You must have either left something out or changed something. The following does nothing:
Also,
tsspos
is never defined, soprofile
will just be a bunch of zeroes.Thank you,Actually I want to make my TSS annotation file to produce formats as following:
1. by using
HTSeq.GenomicArray
function to produce a format like this:Below is an example which I imitated from
http://www-huber.embl.de/users/anders/HTSeq/doc/features.html#HTSeq.GenomicFeature.attr
2. Then I can use HTSeq.GenomicInterval function to produce like this
<GenomicInterval object 'chr3', [123203,127245), strand '+'>
. Eventually I can continue to run the scripts to make the plot.I don't know if you could understand what I said.
Yes, I understand that, I was pointing out that you never actually used
genes
oriv
after setting them. And you just write overiv
every time rather than appending to it.Thank you.
genes
andiv
which I set had made some errors, so I can't continue, I have the bed file,so if I follow the instructions as what the guides told,I could succeed. But I was wrong.I've tried to make similar figures, try Genomation R pacakge
Thank you. And these problems had confused me a lot, I will try.