I have a bed file with peaks called on chipseq data with each line in the format,
1 15520461 15520630 samplex_peak_1
I want to count reads across the peaks called using htseq count. I am new to using HTSeq count, I have installed the package and I'm trying to edit the count.py script. I came across class HTSeq.BED_Reader, and I have edited the following part of the code:
bed = HTSeq.BED_Reader( bed_filename )
I = 0
try:
for f in bed:
if f.type == feature_type:
try:
feature_id = f.attr[ id_attribute ]
except KeyError:
raise ValueError, ( "Feature %s does not contain a '%s' attribute" %
( f.name, id_attribute ) )
if stranded != "no" and f.iv.strand == ".":
raise ValueError, ( "Feature %s at %s does not have strand information but you are "
"running htseq-count in stranded mode. Use '--stranded=no'." %
( f.name, f.iv ) )
features[ f.iv ] += feature_id
counts[ f.attr[ id_attribute ] ] = 0
I += 1
if I % 100000 == 0 and not quiet:
sys.stderr.write( "%d BED lines processed.\n" % I )
except:
sys.stderr.write( "Error occured when processing BED file (%s):\n" % bed.get_line_number_string() )
raise
if not quiet:
sys.stderr.write( "%d BED lines processed.\n" % I )
I'm not sure what parameters to pass for feature type and id attribute. Also, I'm not sure if other parts of the code need to be edited. Could any one help me with this? Thanks!