Extract intergenic coordinate from gff
2
2
Entering edit mode
8.3 years ago
Prasad ★ 1.6k

Hi,

I have a gff file only with gene feature in it.

NZ_CP009361.1   RefSeq  gene    544 1905    .   +   .   ID=gene0;Name=KQ76_RS00005;gbkey=Gene;gene_biotype=protein_coding;locus_tag=KQ76_RS00005;old_locus_tag=KQ76_00005
NZ_CP009361.1   RefSeq  gene    2183    3316    .   +   .   ID=gene1;Name=KQ76_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=KQ76_RS00010;old_locus_tag=KQ76_00010

I want to extract intergenic coordinate with the strand infromation. Output something like,

 NZ_CP009361.1 1906 2182 + KQ76_RS00005-KQ76_RS00010

I have written a code which extract it. i could not get a way to handle strand when one gene is on + strand and another is not - strand. Is there a tool/way through which i can get the desire result.

Note: I just want the coordinates, not sequences

Thanks

gff intergenic feature • 4.6k views
ADD COMMENT
0
Entering edit mode

I think the output format has a problem with requiring strand information, but gaps don't have a direction, they are just gaps. A gap doesn't become more of a '+ gap' just because the neighbors are on +. Therefore, I don't see a biological meaningful way of assigning a strand to a gap, if that was what you were asking.

ADD REPLY
0
Entering edit mode

I wanted the strand for intergenic region, just to know the neighboring genes are from same strand or from different strand

ADD REPLY
0
Entering edit mode

Do you want to do operon prediction? It is possible to extract the information of the gap neighbors after the gaps have been computed and then give it a '+', '-', or '.' depending on the strand of the enclosing genes. See help('nearest-methods') and use methods precede and follow to get the flanking genes for each gap, then iterate and check if strand precede == strand follow.

ADD REPLY
0
Entering edit mode

thank you for the suggestions. I will go through

ADD REPLY
1
Entering edit mode
8.3 years ago
Michael 55k

In R, you can use the function gaps (in packages IRanges and GenomicRanges) after importing the GFF file. See http://www.rdocumentation.org/packages/GenomicRanges/versions/1.24.1/topics/inter-range-methods

ADD COMMENT
0
Entering edit mode

thanks. i will check how it works. Does it take care of strand information (especially neighboring genes are on different strand )

ADD REPLY
1
Entering edit mode

Gaps have no sensible strand information, but all gaps will be reported by GRanges irrespective of the strand of neighboring genes and features in the input (+- or none), which is in my opinion the correct way . I am not sure how to interpret strand information otherwise. Do you want gaps only on + strand, even if a gene is overlapping them on the opposite strand?

ADD REPLY
0
Entering edit mode
8.3 years ago
michael.ante ★ 3.9k

You can use the bedtools with subtractBed to get the intervals by using the chrom.sizes table from e.g. the UCSC genome browser:

subtractBed -s -A hg38.chrom.sizes -B my_genes.gff > intergenic.bed

Afterwards, you may check the results of your script with the resulting file.

Cheers,

Michael

ADD COMMENT
0
Entering edit mode

this did not work for me. But got the intergenic regions from complementBed.

ADD REPLY

Login before adding your answer.

Traffic: 2469 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6