Plot the nucleosome distance to TSS
1
0
Entering edit mode
10.1 years ago
khl0798 • 0

I am new to bioinformatics and python,Now I have downloaded nucleosome-seq data and TSS-seq(which also names CAGE) data,so I try to draw a picture like this:

http://www-huber.embl.de/users/anders/HTSeq/doc/tss.html

The difference was I have the TSS data,so I don't need to subtract the TSS feature from Human GTF annotation. But now I face the problems---the question I want to ask is:

I don't know how to write python script to convent my bed file to instead of GTF.

My TSS data are bed format like this:chrom,chromStart,strand.PS:the read length is 27bp;

Here is my script,could anybody tell me why I am fault?

import HTSeq
import pysam
from matplotlib import pyplot

bamfile = HTSeq.BAM_Reader("DRR000367.bam" )
bedfile = HTSeq.BED_Reader("Poly.bed")
coverage = HTSeq.GenomicArray( "auto", stranded=False, typecode="i" )

for almnt in bamfile:
    if almnt.aligned:
        almnt.iv.length = fragmentsize
        coverage[ almnt.iv ] += 1

list(bedfile)
profile = numpy.zeros( 2*halfwinwidth, dtype='i' )
for p in tsspos:
    window = HTSeq.GenomicInterval( p.chrom, p.pos - halfwinwidth, p.pos + halfwinwidth, "." )
    wincvg = numpy.fromiter( coverage[window], dtype='i', count=2*halfwinwidth )
    if p.strand == "+":
        profile += wincvg
    else:
        profile += wincvg[::-1]

pyplot.plot( numpy.arange( -halfwinwidth, halfwinwidth ), profile )
pyplot.show()

Thanks very much.Any who give me any information would be pretty appreciate.

ChIP-Seq • 4.3k views
ADD COMMENT
1
Entering edit mode

You might give deepTools a try.

ADD REPLY
0
Entering edit mode

my TSS data are bed format like this:chrom,chromStart,strand

I don't know if I am getting this right, but it seems that your TSS isn't in bed format. This is how bed format should look like.

BTW: Cross-posted on biology.stackexchange

ADD REPLY
0
Entering edit mode

I know the standard format of bed file, but I think the reads are all 27bp ,so the chromEnd seem no use, so eventually I use the three chrom,chromStart,strand to set my bed file.

ADD REPLY
0
Entering edit mode

Please post this as comment.

What if you try to reformat your bed file into something like this:

chromosome start start+27 end
ADD REPLY
0
Entering edit mode

OK, thank you.Now I find http://www-huber.embl.de/users/anders/HTSeq/doc/features.html can use my own annotation in a flat file, with each line describing a feature and giving coordinates, so I write python script like this:

import csv
genes = HTSeq.GenomicArray( ["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19","chr20","chr21","chr22","chrX","chrY" ],typecode = "O" )
for feature in bedfile:                                                              
      iv = HTSeq.GenomicInterval( feature.chrom,int (feature.start) - 1000 , int (feature.end) - 1000,feature.strand )

but it's error:

ValueError                                Traceback (most recent call last)
<ipython-input-6-aa6393877b3a> in <module>()
----> 1 for p in bedfile:
      2            iv = HTSeq.GenomicInterval(p.chrom,int (p.start) - 1000 , int (p.start) - 1000,p.strand )
      3 
/home/khl/.local/lib/python2.7/site-packages/HTSeq-0.6.1-py2.7-linux-x86_64.egg/HTSeq/__init__.pyc in __iter__(self)
   1025          if len(fields) > 9:
   1026             raise ValueError, "BED file line contains more than 9 fields"
-> 1027          iv = GenomicInterval( fields[0], int(fields[1]), int(fields[2]), fields[5] if len(fields) > 5 else "." )
   1028          f = GenomicFeature( fields[3] if len(fields) > 3 else "unnamed", "BED line", iv )
   1029          f.score = float( fields[4] ) if len(fields) > 4 else None
ADD REPLY
0
Entering edit mode

I am sorry, the error I show was the false bed format as you pointed

ADD REPLY
0
Entering edit mode

I've reformatted your post so the python should render correctly. You might double check this, since I really didn't go through the code to ensure proper nesting.

ADD REPLY
0
Entering edit mode

Thank you .These are my complete python script.

import HTSeq
import pysam
from matplotlib import pyplot
fragmentsize = 200
bamfile = HTSeq.BAM_Reader("DRR000367.bam" )
coverage = HTSeq.GenomicArray( "auto", stranded=False, typecode="i" )
for almnt in bamfile:
   if almnt.aligned:
      almnt.iv.length = fragmentsize
      coverage[ almnt.iv ] += 1

import csv
genes = HTSeq.GenomicArray( ["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16",
"chr17","chr18","chr19","chr20","chr21","chr22","chrX","chrY" ],typecode = "O" ,stranded=True)
for (chrom ,start,end,strand ) in \ csv.reader( open ("Poly.csv"),delimiter = "\t" ):
    iv = HTSeq.GenomicInterval( chrom, int(start -1000),int(start + 1000),strand )

profile = numpy.zeros( 2*halfwinwidth, dtype='i' )      
for p in tsspos:
   window = HTSeq.GenomicInterval( p.chrom, p.pos - halfwinwidth, p.pos + halfwinwidth, "." )
   wincvg = numpy.fromiter( coverage[window], dtype='i', count=2*halfwinwidth )
   if p.strand == "+":
      profile += wincvg
   else:
      profile += wincvg[::-1]
pyplot.plot( numpy.arange( -halfwinwidth, halfwinwidth ), profile )  
pyplot.show()
ADD REPLY
0
Entering edit mode

You must have either left something out or changed something. The following does nothing:

genes = HTSeq.GenomicArray( ["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16",
"chr17","chr18","chr19","chr20","chr21","chr22","chrX","chrY" ],typecode = "O" ,stranded=True)
for (chrom ,start,end,strand ) in \ csv.reader( open ("Poly.csv"),delimiter = "\t" ):
    iv = HTSeq.GenomicInterval( chrom, int(start -1000),int(start + 1000),strand )

Also, tsspos is never defined, so profile will just be a bunch of zeroes.

ADD REPLY
0
Entering edit mode

Thank you,Actually I want to make my TSS annotation file to produce formats as following:

1. by using HTSeq.GenomicArray function to produce a format like this:

{'chr2': {'.': <ChromVector object, chr2:[0,Inf)/+, step>},
 'chr1': {'.': <ChromVector object, chr1:[0,Inf)/+, step>}}

Below is an example which I imitated from

http://www-huber.embl.de/users/anders/HTSeq/doc/features.html#HTSeq.GenomicFeature.attr

>>> ga = HTSeq.GenomicArray( [ "chr1", "chr2" ], stranded=False )
>>> ga.chrom_vectors
{'chr2': {'.': <ChromVector object, chr2:[0,Inf)/+, step>},
 'chr1': {'.': <ChromVector object, chr1:[0,Inf)/+, step>}}

2. Then I can use HTSeq.GenomicInterval function to produce like this <GenomicInterval object 'chr3', [123203,127245), strand '+'>. Eventually I can continue to run the scripts to make the plot.

I don't know if you could understand what I said.

ADD REPLY
0
Entering edit mode

Yes, I understand that, I was pointing out that you never actually used genes or iv after setting them. And you just write over iv every time rather than appending to it.

ADD REPLY
0
Entering edit mode

Thank you. genes and iv which I set had made some errors, so I can't continue, I have the bed file,so if I follow the instructions as what the guides told,I could succeed. But I was wrong.

ADD REPLY
0
Entering edit mode

I've tried to make similar figures, try Genomation R pacakge

ADD REPLY
0
Entering edit mode

Thank you. And these problems had confused me a lot, I will try.

ADD REPLY
0
Entering edit mode
10.0 years ago
chemcehn ▴ 210

You can try HOMER. It is doing the task you want. The important thing is that you need to have your data in some standard format such as e.g. BED file, as was pointed out by the others.

ADD COMMENT
0
Entering edit mode

Thank you! At this moment I haven't solved this problem ,I will have a try.

ADD REPLY

Login before adding your answer.

Traffic: 3414 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6