How to count the reads that overlap with 3' splice sites or 5' splice sites?
0
0
Entering edit mode
8.5 years ago
vaslanzadeh ▴ 20

Hello

I used HTSeq-Count for counting the reads that overlap with introns and exon. I also need to count the reads that overlap with 5' or 3' splice sites not junctions. I want to count the reads that overlap 5 base pairs with intron and 5 base pairs with exon in 5' or 3' splice site. For example, if start site of my intron (5'ss) is nucleotide 20 on chromosome X, I want to count the reads that overlap from nucleotide 15 up to 24 (10 nucelotides length). I created a GTF file that have these coordinates X 16 24 5'SS, and I think what I should do is asking HTSeq-count to count 5'SS reads with minimum overlap of 10 nucleotides. However, HTSeq-count does not have option to set minimum overlapping bases. I started to use featureCount but it gives me unexpected results. When I set minimum overlap bases in featurecount to 11 or 12, that is bigger than 10 (length of 5'ss), it still reports overlapping reads. Is there a way to count the reads that overlap with few base pairs upstream and downstream of splice sites?

alignment RNA-Seq rna-seq • 3.2k views
ADD COMMENT
1
Entering edit mode

This is probably one of those cases where bedtools intersect is more useful. I think it has a convenient option for minimum amount of overlap.

ADD REPLY
0
Entering edit mode

Thank you Devon, bedtools intersect -wo returns the number of overlapping bases between reads and features. My data is paired end and featureCount/HTSeq-Count has option to deal with paired-end reads and count read pairs only once for a given feature. However, bedtools intersect -wo will count paired end reads twice for 5'ss and 3'ss. In this case I can not compare read coverage that I did get for exons, introns and junctions (from featureCount) with 5'ss and 3'ss.. I am studying splicing and I need to know read coverage over 5'ss and 3'ss for some of my analysis.

ADD REPLY
0
Entering edit mode

You might have to write something to meet your exact needs. A combination of pysam and pybedtools might make things a bit simpler.

ADD REPLY
1
Entering edit mode

Thanks Devon,

I found what I was doing wrong. I should set featureCount to deal with my paired-end data as single-end data. In this case, featureCount finds single reads that have 10bp overlap with my coordinates.

This might be useful for those who might have the same question.

ADD REPLY

Login before adding your answer.

Traffic: 2173 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6