bedtool closest
3.6 years ago
CHINMAYA ▴ 10

I have a reference BED file and a second BED file containing ncRNAs.

I want to compute the distance from each ncRNA to the nearest protein-coding gene, and remove any ncRNA located within 250 bp of a protein-coding gene on the same strand.

Can anyone help?

bedtools • 1.1k views

Side note: It's bedtools, not "bedtool".

3.6 years ago

If you can use the BEDOPS toolkit, its closest-features tool can solve this:

  1. Separate reference and ncRNA files by strand and sort:

    $ awk -v FS="\t" -v OFS="\t" '($6 == "+")' reference.bed | sort-bed - > reference.for.bed
    $ awk -v FS="\t" -v OFS="\t" '($6 == "-")' reference.bed | sort-bed - > reference.rev.bed
    $ awk -v FS="\t" -v OFS="\t" '($6 == "+")' ncRNA.bed | sort-bed - > ncRNA.for.bed
    $ awk -v FS="\t" -v OFS="\t" '($6 == "-")' ncRNA.bed | sort-bed - > ncRNA.rev.bed
    
  2. Filter by distance for forward-strand elements. As I read your question, you want to remove elements that are within 250 nt of the reference element's TSS (its start position), so the distance reported from --dist --closest will be negative, and you keep only distances below -250:

    $ closest-features --dist --closest reference.for.bed ncRNA.for.bed | awk -v FS="|" -v OFS="\t" '($3 < -250)' > filtered.for.bed
    
  3. Repeat for reverse-strand elements. In this case the threshold is positive, because the TSS of a reverse-strand element is at its stop position:

    $ closest-features --dist --closest reference.rev.bed ncRNA.rev.bed | awk -v FS="|" -v OFS="\t" '($3 > 250)' > filtered.rev.bed
    
  4. Take the union of the filtered results:

    $ bedops --everything filtered.for.bed filtered.rev.bed > filtered.bed
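
To see how the awk threshold in steps 2 and 3 behaves, here is a small sketch that runs the forward-strand filter on made-up lines shaped like closest-features --dist --closest output (reference element, closest element, signed distance, separated by "|"). The coordinates, gene names and the NA row are hypothetical, and the NA row assumes closest-features reports NA when a reference element has no neighbour. The extra regex test is my addition, not part of the steps above: a bare `$3 > 250` comparison lets a non-numeric field such as NA through, because awk falls back to string comparison.

```shell
# Hypothetical sample rows: reference | closest ncRNA | signed distance.
# The last row stands in for a reference element with no neighbour.
sample=$(
  printf 'chr1\t1000\t2000\tgeneA\t0\t+|chr1\t100\t400\tnc1\t0\t+|-600\n'
  printf 'chr1\t5000\t6000\tgeneB\t0\t+|chr1\t4900\t4950\tnc2\t0\t+|-50\n'
  printf 'chr1\t9000\t9500\tgeneC\t0\t+|chr1\t9800\t9900\tnc3\t0\t+|300\n'
  printf 'chr2\t1000\t2000\tgeneD\t0\t+|NA|NA\n'
)

# Keep rows whose third "|"-delimited field is an integer below -250.
# The regex guard drops NA rows before the numeric comparison runs.
filtered=$(printf '%s\n' "$sample" \
  | awk -F'|' '($3 ~ /^-?[0-9]+$/) && ($3 + 0 < -250)')

printf '%s\n' "$filtered"
```

Only the geneA row survives here. For the reverse-strand filter the same guard applies with the comparison flipped to `$3 + 0 > 250`.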
    
