Exon parsing from bed file
6.9 years ago
1769mkc ★ 1.2k

This is my BED file of exon coordinates (a small subset is shown below). I want to extract all the exons that belong to a given gene. For example, if a gene on chr1 spans chr1 11868 12227, I want to parse out all the exons that fall between 11868 and 12227.

This is my small subset:

cat exon.bed | head -10
chr1    11868   12227   +   exon
chr1    11871   12227   +   exon
chr1    11873   12227   +   exon
chr1    12009   12057   +   exon
chr1    12178   12227   +   exon
chr1    12594   12721   +   exon
chr1    12612   12697   +   exon
chr1    12612   12721   +   exon
chr1    12612   12721   +   exon
chr1    12974   13052   +   exon

How do I parse these out? I mostly use R and a bit of shell scripting, but I'm not sure whether I can use R for this; maybe a few lines of Perl or shell script could solve my problem.

Any help or suggestions would be highly appreciated.

rna-seq • 2.3k views

How about just using awk?

awk '($1=="chr1"  && int($2)>=11868 && int($3)<=12227 && $5=="exon")' input.bed

If you need a faster solution, query your file using tabix.
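
The same filter can be written so that the gene coordinates are not hard-coded. This is a minimal sketch, assuming exon.bed is tab-separated with the five columns shown in the question (its first rows are recreated here so the example runs on its own); the variable names chrom, gstart and gend and the output name gene_exons.bed are only illustrative:

```shell
# Recreate the first rows of the question's exon.bed (tab-separated).
printf '%s\t%s\t%s\t%s\t%s\n' \
    chr1 11868 12227 + exon \
    chr1 11871 12227 + exon \
    chr1 11873 12227 + exon \
    chr1 12009 12057 + exon \
    chr1 12178 12227 + exon \
    chr1 12594 12721 + exon \
    chr1 12612 12697 + exon > exon.bed

# Keep exons on the given chromosome that lie entirely inside the gene interval.
awk -F'\t' -v chrom=chr1 -v gstart=11868 -v gend=12227 \
    '$1 == chrom && $2 >= gstart && $3 <= gend && $5 == "exon"' \
    exon.bed > gene_exons.bed
```

Passing the coordinates with -v makes it easy to call the same command from a loop or a wrapper script.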


@Pierre thank you very much for the quick solution; it gives me a starting point. What if I have to do this for all the genes, each with its own coordinates? Some genes might have one exon and some might have multiple exons. I hope I am making my problem clear.
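
One way the awk approach could be extended to many genes is to read a gene table first and then test every exon against it. This is only a sketch under stated assumptions: genes.bed is a hypothetical tab-separated file (chrom, start, end, gene name) that does not appear in the thread, the gene names geneA and geneB are made up, and a few exon rows from the question are recreated so the example runs on its own:

```shell
# Hypothetical gene table: chrom, start, end, gene name (tab-separated).
printf '%s\t%s\t%s\t%s\n' \
    chr1 11868 12227 geneA \
    chr1 12594 12721 geneB > genes.bed

# A few exon rows taken from the question.
printf '%s\t%s\t%s\t%s\t%s\n' \
    chr1 11868 12227 + exon \
    chr1 12009 12057 + exon \
    chr1 12178 12227 + exon \
    chr1 12612 12697 + exon \
    chr1 12974 13052 + exon > exon.bed

# First file (NR == FNR): store every gene interval.
# Second file: print each exon that lies entirely inside a gene on the
# same chromosome, with the gene name appended as a sixth column.
awk -F'\t' -v OFS='\t' '
    NR == FNR {
        chrom[NR] = $1; gstart[NR] = $2 + 0; gend[NR] = $3 + 0
        name[NR] = $4; n = NR
        next
    }
    {
        for (i = 1; i <= n; i++)
            if ($1 == chrom[i] && $2 + 0 >= gstart[i] && $3 + 0 <= gend[i])
                print $0, name[i]
    }
' genes.bed exon.bed > exons_by_gene.bed
```

Every exon is compared against every gene, which is fine for small files; for genome-scale annotations a dedicated interval tool is likely to be much faster.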


You might also want to look at the TxDb packages in Bioconductor.


Take a look at the rtracklayer Bioconductor package and its import function. Then, after importing the BED file, look at the Bioconductor GenomicRanges %over% method. These are big hammers for a small problem, but if you use R and are doing genomics, GenomicRanges can quickly become your best friend.


Okay, that sounds really cool. Yes, I mostly use R for all my genomics work. I will try the library and let you know.


Hello krushnach80!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/3239/parse-out-exon-coordinates-from-bed-file-for-each-gene

This is typically not recommended as it runs the risk of annoying people in both communities.


@Pierre I regret that I posted it elsewhere earlier, but as I didn't get any response I posted in both communities. I will keep in mind not to repeat it.


Oh you didn't get a response after 2 hours on a Sunday, that is indeed unreasonably long. Quite a lazy community indeed, next thing you know we'll have a personal life to take care of.


@WouterDeCoster I'm sorry about that. I was talking about this question, which I asked earlier; it is related to this one but was not very specific:

Parse out exon for divergent primer design

6.9 years ago

Via BEDOPS bedops -e (--element-of) and Unix I/O streams:

$ echo -e "chr1\t11868\t12227" | bedops -e 1 exon.bed - > answer.bed

Or, if you have your genes in a BED file called genes.bed:

$ bedops -e 1 exon.bed genes.bed > answer.bed

If you have your genes in some other format, like GFF or GTF, you can use gff2bed or gtf2bed, e.g.:

$ bedops -e 1 exon.bed <(gff2bed < genes.gff) > answer.bed

Or:

$ bedops -e 1 exon.bed <(gtf2bed < genes.gtf) > answer.bed

The file answer.bed will contain exons that overlap a gene annotation by at least one base. Use -e 100% instead of -e 1 to keep only exons that lie entirely within a gene. (The -n option, --not-element-of, does the opposite and reports exons that do not overlap a gene annotation.)
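
To make the overlap criterion concrete, the following plain awk sketch (not BEDOPS itself) classifies each exon against a single gene interval as lying entirely inside it, overlapping it only partially, or not overlapping it at all. It assumes half-open BED coordinates; the row chr1 12200 12300 is made up to show a partial overlap, and the output name classified.bed is only illustrative:

```shell
# Two exon rows from the question, one invented partial overlap, one outside.
printf '%s\t%s\t%s\t%s\t%s\n' \
    chr1 11868 12227 + exon \
    chr1 12009 12057 + exon \
    chr1 12200 12300 + exon \
    chr1 12594 12721 + exon > exon.bed

# Append a sixth column: none, within, or partial.
awk -F'\t' -v OFS='\t' -v chrom=chr1 -v gstart=11868 -v gend=12227 '
    # Different chromosome, or the intervals do not share a single base.
    $1 != chrom || $3 <= gstart || $2 >= gend { print $0, "none"; next }
    # Exon lies entirely inside the gene interval.
    $2 >= gstart && $3 <= gend               { print $0, "within"; next }
    # Anything left overlaps the gene but extends past one of its ends.
    { print $0, "partial" }
' exon.bed > classified.bed
```

Whether partially overlapping exons should count as belonging to the gene depends on the question being asked, so it is worth deciding which of the two stricter or looser criteria is wanted before filtering.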
