Find overlap between bam and gff
2
0
Entering edit mode
6.4 years ago

Hello I have a .bam with RNA-seq data, a .gff with the regions I would like to study and another .gff with repeatmasked positions.

I would like to get a file where I have the regions from my .gff which overlap (completely and not) with at least X reads from my .bam and with no overlap within the repeatmasked positions.

Any tips?

Many thanks in advance!

RNA-Seq • 1.9k views
ADD COMMENT
2
Entering edit mode
6.4 years ago
h.mon 35k

Use bedtools subtract to get gff of interest minus gff repeats, then use featureCounts or bedtools coverage using the resulting gff to count reads mapping to the remaining features.

ADD COMMENT
0
Entering edit mode

Thank you I will give it a try!

ADD REPLY
2
Entering edit mode
6.4 years ago

Convert to BED via convert2bed helper scripts:

$ gff2bed < annotations.gff > annotations.bed
$ gff2bed < rmsk.gff > rmsk.bed
$ bam2bed < reads.bam > reads.bed

If you want at least X reads that overlap annotations that do not overlap repeatmasked regions:

$ X=1234
$ bedmap --count --echo --delim '\t' annotations.bed reads.bed | awk -vX=${X} '$1 >= X' | cut -f2- | bedops -n 1 - rmsk.bed > answer.bed

(Replace X=1234 with whatever threshold you want.)

The file answer will contain annotations that meet your read threshold and which do not overlap repeatmasked regions.

You could instead do conversion, mapping, and filtering with the following one-liner, which avoids making intermediate files and so will be even faster than the usual BEDOPS speedup:

$ gff2bed < annotations.gff | bedmap --count --echo --delim '\t' - <(bam2bed < reads.bam) | awk -vX=${X} '$1 >= X' | cut -f2- | bedops -n 1 - <(gff2bed < rmsk.gff) > answer.bed
ADD COMMENT
0
Entering edit mode

I will try this one as well, so I can compare both, thank you very much

ADD REPLY

Login before adding your answer.

Traffic: 1362 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6