Question

editing count.py script of HTSeq to work with bed file

0

Entering edit mode

8.9 years ago

Kssr ▴ 110

I have a bed file with peaks called on chipseq data with each line in the format,

1    15520461    15520630    samplex_peak_1

I want to count reads across the peaks called using htseq count. I am new to using HTSeq count, I have installed the package and I'm trying to edit the count.py script. I came across class HTSeq.BED_Reader, and I have edited the following part of the code:

 bed = HTSeq.BED_Reader( bed_filename )
   I = 0
   try:
      for f in bed:
         if f.type == feature_type:
            try:
               feature_id = f.attr[ id_attribute ]
            except KeyError:
               raise ValueError, ( "Feature %s does not contain a '%s' attribute" %
                  ( f.name, id_attribute ) )
            if stranded != "no" and f.iv.strand == ".":
               raise ValueError, ( "Feature %s at %s does not have strand information but you are "
                  "running htseq-count in stranded mode. Use '--stranded=no'." %
                  ( f.name, f.iv ) )
            features[ f.iv ] += feature_id
            counts[ f.attr[ id_attribute ] ] = 0
         I += 1
         if I % 100000 == 0 and not quiet:
            sys.stderr.write( "%d BED lines processed.\n" % I )
   except:
      sys.stderr.write( "Error occured when processing BED file (%s):\n" % bed.get_line_number_string() )
      raise

   if not quiet:
      sys.stderr.write( "%d BED lines processed.\n" % I )

I'm not sure what parameters to pass for feature type and id attribute. Also, I'm not sure if other parts of the code need to be edited. Could any one help me with this? Thanks!

htseq bed • 2.2k views

ADD COMMENT • link updated 2.3 years ago by Ram 44k • written 8.9 years ago by Kssr ▴ 110

Ram · Answer 1 · 2015-12-16

1

Entering edit mode

8.9 years ago

sunhanice ▴ 240

Agree with Istvan, it is easier to convert bed to gtf instead of changing the source code.

Please refer to this post, how to use a bed file with htseq-count

make sure the gtf file has 9 columns like the following,

scaffold00001    UNCC    exon    488375    489180    .    -    0    gene_id "101_g";

ADD COMMENT • link updated 4.9 years ago by Ram 44k • written 8.9 years ago by sunhanice ▴ 240

score 0 · Answer 2 · 2015-12-16

0

Entering edit mode

8.9 years ago

Istvan Albert 101k

I would suggest converting your BED file to GTF rather than editing the program. You'll get by much quicker.

The cufflinks suite has a program called gffread that can do that.

ADD COMMENT • link 8.9 years ago by Istvan Albert 101k