Hello
I used HTSeq-Count for counting the reads that overlap with introns and exon. I also need to count the reads that overlap with 5' or 3' splice sites not junctions. I want to count the reads that overlap 5 base pairs with intron and 5 base pairs with exon in 5' or 3' splice site. For example, if start site of my intron (5'ss) is nucleotide 20 on chromosome X, I want to count the reads that overlap from nucleotide 15 up to 24 (10 nucelotides length). I created a GTF file that have these coordinates X 16 24 5'SS, and I think what I should do is asking HTSeq-count to count 5'SS reads with minimum overlap of 10 nucleotides. However, HTSeq-count does not have option to set minimum overlapping bases. I started to use featureCount but it gives me unexpected results. When I set minimum overlap bases in featurecount to 11 or 12, that is bigger than 10 (length of 5'ss), it still reports overlapping reads. Is there a way to count the reads that overlap with few base pairs upstream and downstream of splice sites?
This is probably one of those cases where
bedtools intersect
is more useful. I think it has a convenient option for minimum amount of overlap.Thank you Devon, bedtools intersect -wo returns the number of overlapping bases between reads and features. My data is paired end and featureCount/HTSeq-Count has option to deal with paired-end reads and count read pairs only once for a given feature. However, bedtools intersect -wo will count paired end reads twice for 5'ss and 3'ss. In this case I can not compare read coverage that I did get for exons, introns and junctions (from featureCount) with 5'ss and 3'ss.. I am studying splicing and I need to know read coverage over 5'ss and 3'ss for some of my analysis.
You might have to write something to meet your exact needs. A combination of pysam and pybedtools might make things a bit simpler.
Thanks Devon,
I found what I was doing wrong. I should set featureCount to deal with my paired-end data as single-end data. In this case, featureCount finds single reads that have 10bp overlap with my coordinates.
This might be useful for those who might have the same question.