Hey everyone,
I have a data frame with my selected exon coordinates (myexons) and a GTF data frame (gtf). I first want to find my exons in the GTF and then, for each match, extract the previous exon from the GTF as output. An example is given below.
dataframe my exons
start end exon_num width gene geneid_exon_num_start_end
61178981 61179171 2 191 AHSA3 AHSA3 -2-61178981-61179171
243613671 243613739 7 69 AKT3 AKT3 -7-243613671-243613739
36952886 36953296 3 411 AATF AATF -3-36952886-36953296
dataframe gtf
start end exon_num width gene geneid_exon_num_start_end
61177472 61177618 1 147 AHSA3 AHSA3 -1-61177472-61177618
61178668 61179171 1 504 AHSA3 AHSA3 -1-61178668-61179171
61185526 61185618 1 93 AHSA3 AHSA3 -1-61185526-61185618
61178981 61179171 2 191 AHSA3 AHSA3 -2-61178981-61179171
36950214 36950405 2 192 AATF AATF -2-36950214-36950405
37056601 37056858 2 258 AATF AATF -2-37056601-37056858
36952886 36953296 3 411 AATF AATF -3-36952886-36953296
19958014 19958571 4 558 AATF AATF -4-19958014-19958571
dataframe required output
start end exon_num width gene geneid_exon_num_start_end
61177472 61177618 1 147 AHSA3 AHSA3 -1-61177472-61177618
36950214 36950405 2 192 AATF AATF -2-36950214-36950405
I am using the following to find my exons in the GTF and, once found, take the row above to get the previous exon:
previous_exon=gtf[which(gtf$geneid_exon_num_start_end %in% myexons$geneid_exon_num_start_end)+c(-1),]
As you can see, there are multiple previous exons with different coordinates, so I want the row in which the end coordinate of the previous exon is smaller than the start of my exon, within the same gene (see the required output). I have a huge data frame, so I can't do this one by one and need help getting this output.
OP input and expected output are confusing. Please post appropriate example input data and matching output. In addition to exon number, gene (gene symbol) also must be considered.
It is indeed a bit confusing. Is this what you want?
Create a unique key for each region:
Find the row indices of your exons in the GTF, and also those indices minus 1:
Then subset GTF to select the previous exon:
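A minimal base-R sketch of these three steps, using the toy tables from the question. The `key` column and the `make_key` helper are illustrative names, not taken from the thread, and only the columns needed for the lookup are included.

```r
# Toy tables copied from the question (only the columns needed here)
gtf <- data.frame(
  start    = c(61177472, 61178668, 61185526, 61178981,
               36950214, 37056601, 36952886, 19958014),
  end      = c(61177618, 61179171, 61185618, 61179171,
               36950405, 37056858, 36953296, 19958571),
  exon_num = c(1, 1, 1, 2, 2, 2, 3, 4),
  gene     = c(rep("AHSA3", 4), rep("AATF", 4)),
  stringsAsFactors = FALSE
)
myexons <- data.frame(
  start    = c(61178981, 243613671, 36952886),
  end      = c(61179171, 243613739, 36953296),
  exon_num = c(2, 7, 3),
  gene     = c("AHSA3", "AKT3", "AATF"),
  stringsAsFactors = FALSE
)

# Unique key per region: gene, exon number, start, end (hypothetical helper)
make_key <- function(df) paste(df$gene, df$exon_num, df$start, df$end, sep = "-")
gtf$key     <- make_key(gtf)
myexons$key <- make_key(myexons)

# Row indices of the selected exons in the GTF, and the indices one row above
idx      <- which(gtf$key %in% myexons$key)
idx_prev <- idx - 1
idx_prev <- idx_prev[idx_prev >= 1]  # guard against a match on the first row

# Subset the GTF to the rows directly above each match
previous_exon <- gtf[idx_prev, ]
```

With the example data the matches are rows 4 and 7, so this returns rows 3 and 6. Those rows start after the selected exons, which shows that "one row above" depends entirely on row order and does not check coordinates.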
@Kevin Blighe I have corrected the input data frames again; I realized there was a mistake. I did make a unique key, i.e. geneid_exon_num_start_end, and then found the matches between my exons and the GTF using gtf[which(gtf$geneid_exon_num_start_end %in% myexons$geneid_exon_num_start_end),]
Now that I have the matched rows in the GTF, I want to get the row within the same gene where the end coordinate is smaller than the matched start coordinate.
@Kevin Blighe can you kindly check the corrected input and output, and help me with the solution?
Thanks for the update. How about this? Note that I use GenomicRanges here, which meant I had to create a dummy chromosome ID.
Now get what you want:
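One possible way to express this filter, sketched in base R rather than GenomicRanges so that no dummy chromosome ID is needed. It assumes that "previous exon" means the GTF row from the same gene with exon number one less than the selected exon and an end coordinate below the selected exon's start; the `.mine` and `.gtf` suffixes are illustrative names.

```r
# Toy tables copied from the question (only the columns needed here)
gtf <- data.frame(
  start    = c(61177472, 61178668, 61185526, 61178981,
               36950214, 37056601, 36952886, 19958014),
  end      = c(61177618, 61179171, 61185618, 61179171,
               36950405, 37056858, 36953296, 19958571),
  exon_num = c(1, 1, 1, 2, 2, 2, 3, 4),
  gene     = c(rep("AHSA3", 4), rep("AATF", 4)),
  stringsAsFactors = FALSE
)
myexons <- data.frame(
  start    = c(61178981, 243613671, 36952886),
  end      = c(61179171, 243613739, 36953296),
  exon_num = c(2, 7, 3),
  gene     = c("AHSA3", "AKT3", "AATF"),
  stringsAsFactors = FALSE
)

# Pair every selected exon with every GTF row of the same gene (inner join)
pairs <- merge(myexons, gtf, by = "gene", suffixes = c(".mine", ".gtf"))

# Keep GTF rows with the preceding exon number that end before the selected exon starts
keep <- pairs$exon_num.gtf == pairs$exon_num.mine - 1 &
        pairs$end.gtf < pairs$start.mine

previous_exon <- pairs[keep, c("start.gtf", "end.gtf", "exon_num.gtf", "gene")]
names(previous_exon) <- c("start", "end", "exon_num", "gene")
previous_exon$width  <- previous_exon$end - previous_exon$start + 1
previous_exon <- previous_exon[order(previous_exon$start), ]
```

On the example data this keeps one row for AATF (36950214 to 36950405, exon 2) and one for AHSA3 (61177472 to 61177618, exon 1); AKT3 drops out because it has no rows in the GTF. The join is per gene, so the intermediate table grows with the number of exons per gene rather than with the whole GTF.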
@cpad0112 I have corrected the input and output now.
np and thanks