I want to find the features in annotWS240.gff3 that overlap the coordinates in query.gff3, using the following command:
intersectBed -a query.gff3 -b annotWS240.gff3 -wb -wa >OUT_gff3
I want to extract the exact coordinates from annotWS240 that overlap query.gff3. However, I am getting some extra coordinates. For example, the sequence AAB48626_GHR-10017@H6 has the coordinates 1834890 1835393, but I am getting features of transcript C53H9.1 with the coordinates 1834883 1835439 (please see the file below).
- How can I get features with the exact coordinates, such as 1834890 1835393, from C53H9.1?
- Are there any tools available to extract the FASTA sequence using the current OUT_gff3 file? I want the mature transcript (without introns).
Thanks for your valuable suggestions.
OUT_gff3 file
I ePCR AAB48626_GHR-10017@H6 1834890 1835393 . + . I WormBase CDS 1835092 1835229 . + 0 ID=CDS:C53H9.1;Parent=Transcript:C53H9.1
I ePCR AAB48626_GHR-10017@H6 1834890 1835393 . + . I WormBase CDS 1835278 1835394 . + 0 ID=CDS:C53H9.1;Parent=Transcript:C53H9.1
I ePCR AAB48626_GHR-10017@H6 1834890 1835393 . + . I WormBase intron 1835045 1835091 . + . Parent=Transcript:C53H9.1;Note=Confirmed_EST yk491b8.5 %3B Confirmed_cDNA U89308 %3B Confirmed_EST OSTF036F4_1 %3B Confirmed_EST OSTF036F4_1 %3B
I ePCR AAB48626_GHR-10017@H6 1834890 1835393 . + . I WormBase exon 1835092 1835229 . + . Parent=Transcript:C53H9.1
I ePCR AAB48626_GHR-10017@H6 1834890 1835393 . + . I WormBase intron 1835230 1835277 . + . Parent=Transcript:C53H9.1;Note=Confirmed_EST yk491b8.5 %3B Confirmed_cDNA U89308 %3B Confirmed_EST OSTF036F4_1 %3B Confirmed_EST OSTF036F4_1 %3B
I ePCR AAB48626_GHR-10017@H6 1834890 1835393 . + . I WormBase exon 1835278 1835439 . + . Parent=Transcript:C53H9.1
I ePCR AAB48626_GHR-10017@H6 1834890 1835393 . + . I WormBase gene 1834883 1835439 . + . ID=Gene:WBGene00004441;Name=WBGene00004441;locus=rpl-27;sequence_name=C53H9.1;biotype=protein_coding
I ePCR AAB48626_GHR-10017@H6 1834890 1835393 . + . I WormBase mRNA 1834883 1835439 . + . ID=Transcript:C53H9.1;Parent=Gene:WBGene00004441;Name=C53H9.1;wormpep=WP:CE19381;locus=rpl-27
I ePCR AAB48626_GHR-10017@H6 1834890 1835393 . + . I WormBase exon 1834883 1835044 . + . Parent=Transcript:C53H9.1
I ePCR AAB48626_GHR-10017@H6 1834890 1835393 . + . I WormBase CDS 1834889 1835044 . + 0 ID=CDS:C53H9.1;Parent=Transcript:C53H9.1;Name=C53H9.1;wormpep=WP:CE19381;locus=rpl-27
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase gene 8407802 8410804 . - . ID=Gene:WBGene00003014;Name=WBGene00003014;locus=lin-28;sequence_name=F02E9.2;biotype=protein_coding
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase mRNA 8407802 8409976 . - . ID=Transcript:F02E9.2b;Parent=Gene:WBGene00003014;Name=F02E9.2b;wormpep=WP:CE24880;locus=lin-28
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase mRNA 8407802 8410804 . - . ID=Transcript:F02E9.2a;Parent=Gene:WBGene00003014;Name=F02E9.2a;wormpep=WP:CE24879;locus=lin-28
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase exon 8407802 8408496 . - . Parent=Transcript:F02E9.2b
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase exon 8407802 8408496 . - . Parent=Transcript:F02E9.2a
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase CDS 8408161 8408496 . - 0 ID=CDS:F02E9.2a;Parent=Transcript:F02E9.2a;Name=F02E9.2a;wormpep=WP:CE24879;locus=lin-28
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase CDS 8409134 8409340 . - 0 ID=CDS:F02E9.2a;Parent=Transcript:F02E9.2a
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase CDS 8410663 8410803 . - 0 ID=CDS:F02E9.2a;Parent=Transcript:F02E9.2a
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase CDS 8408161 8408496 . - 0 ID=CDS:F02E9.2b;Parent=Transcript:F02E9.2b;Name=F02E9.2b;wormpep=WP:CE24880;locus=lin-28
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase CDS 8409134 8409340 . - 0 ID=CDS:F02E9.2b;Parent=Transcript:F02E9.2b
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase CDS 8409684 8409731 . - 0 ID=CDS:F02E9.2b;Parent=Transcript:F02E9.2b
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase intron 8408497 8409133 . - . Parent=Transcript:F02E9.2a;Note=Confirmed_EST yk1158b04.5 %3B Confirmed_cDNA U75912 %3B Confirmed_EST OSTR155H1_1 %3B Confirmed_EST OSTR155H1_1 %3B
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase intron 8408497 8409133 . - . Parent=Transcript:F02E9.2b;Note=Confirmed_EST yk1158b04.5 %3B Confirmed_cDNA U75912 %3B Confirmed_EST OSTR155H1_1 %3B Confirmed_EST OSTR155H1_1 %3B
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase exon 8409134 8409340 . - . Parent=Transcript:F02E9.2a
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase exon 8409134 8409340 . - . Parent=Transcript:F02E9.2b
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase intron 8409341 8409683 . - . Parent=Transcript:F02E9.2b;Note=Confirmed_EST yk117g6.5 %3B Confirmed_EST OSTR073F8_1 %3B
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase intron 8409341 8410662 . - . Parent=Transcript:F02E9.2a;Note=Confirmed_EST yk1030a12.5 %3B Confirmed_cDNA U75912 %3B Confirmed_EST OSTF155H1_1 %3B Confirmed_EST OSTF155H1_1 %3B
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase exon 8409684 8409976 . - . Parent=Transcript:F02E9.2b
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase five_prime_UTR 8409732 8409976 . - . Parent=Transcript:F02E9.2b
I ePCR AAB49759_GHR-11053@F05 8408162 8410802 . + . I WormBase exon 8410663 8410804 . - . Parent=Transcript:F02E9.2a
What do you mean by "exact overlapping"? That a feature in B has to fit entirely within A?
Yes, I want the same coordinate boundaries for both A and B. Basically, I want to extract the sequence of the mature mRNA that falls between the coordinates in the A GFF3.
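You can set the minimum overlap fraction to one with -f 1.0. Note that -f is measured as a fraction of each feature in A (the -a file), so with -f 1.0 a hit is reported only when the A feature is covered completely by the B feature.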
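If the goal is to have the annotation features trimmed to the query boundaries (the 1834890 1835393 case in the question), one option is to post-process the -wa -wb output rather than rely on the overlap filter alone. Below is a minimal sketch in Python, not a bedtools feature. It assumes the output is tab-separated and that the last nine columns of each line are the annotation (B) record, as in the OUT_gff3 lines above; `clip_to_query` is a hypothetical helper name.

```python
def clip_to_query(line):
    """Clip the B feature of one `-wa -wb` output line to the A interval.

    Assumptions (not checked against bedtools itself): columns are
    tab-separated, A start/end are columns 4 and 5, and the last nine
    columns are the full GFF3 record of the B feature.
    Returns (clipped B columns, fraction of A covered by the overlap).
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 17:
        raise ValueError(f"expected at least 17 columns, got {len(cols)}")

    a_start, a_end = int(cols[3]), int(cols[4])
    b = cols[-9:]
    b_start, b_end = int(b[3]), int(b[4])

    # GFF3 coordinates are 1-based and inclusive on both ends.
    start, end = max(a_start, b_start), min(a_end, b_end)
    if start > end:
        raise ValueError("A and B do not overlap")

    frac_of_a = (end - start + 1) / (a_end - a_start + 1)
    # The CDS phase column is copied unchanged; it may need recomputing
    # if the start of a CDS segment is trimmed.
    clipped = b[:3] + [str(start), str(end)] + b[5:]
    return clipped, frac_of_a


# Example built from one exon line of the posted output.
query = ["I", "ePCR", "AAB48626_GHR-10017@H6", "1834890", "1835393", ".", "+", "."]
exon = ["I", "WormBase", "exon", "1835278", "1835439", ".", "+", ".",
        "Parent=Transcript:C53H9.1"]
clipped, frac_of_a = clip_to_query("\t".join(query + exon))
print("\t".join(clipped))
```

For the exon in the example, the end coordinate is trimmed from 1835439 to 1835393. The returned fraction is the quantity that -f thresholds on, so only lines where it equals 1.0 (the gene and mRNA lines for this query) would survive -f 1.0.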