Extract coordinates for a list of genes from a GFF file
4.5 years ago
the_cowa

I have a list of genes, and I need the coordinates of those genes from a GFF file.

I tried:

grep -wFf gene_list sample.gff

but it takes far too long to finish (the GFF file is 20 GB). Is there a faster way to extract the coordinates?
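One common alternative to `grep -wFf` is a single awk pass that loads the gene list into a hash (awk associative array) and checks each GFF line with a constant-time lookup instead of matching it against many patterns. This is a minimal sketch, assuming a GFF3 file where gene features have type `gene` in column 3 and carry their identifier as `ID=<gene>` in column 9, and a gene list with one ID per line; `extract_gene_coords` is a hypothetical helper name. It prints ID, seqid, start, end and strand. It still has to read all 20 GB once, so disk speed sets a floor on the runtime.

```shell
# Hypothetical helper: print ID, seqid, start, end, strand for listed genes.
# Assumes GFF3 gene lines (column 3 == "gene") with "ID=<gene>" in column 9.
extract_gene_coords() {
  local gene_list=$1 gff=$2
  LC_ALL=C awk -F'\t' -v OFS='\t' '
    # First file (the gene list): remember each wanted ID as a hash key.
    FILENAME == ARGV[1] { if ($1 != "") want[$1] = 1; next }
    # Skip header/comment lines and any feature that is not a gene.
    /^#/ || $3 != "gene" { next }
    {
      # Column 9 holds key=value attributes separated by ";".
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^ID=/) {
          id = substr(attrs[i], 4)   # text after "ID="
          # Constant-time hash lookup instead of testing every pattern.
          if (id in want) print id, $1, $4, $5, $7
          break
        }
      }
    }' "$gene_list" "$gff"
}
```

Usage: `extract_gene_coords gene_list sample.gff > gene_coords.tsv`. If your file stores the identifier under another key (for example `Name=`), change the `/^ID=/` test and the `substr` offset to match.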

Tags: gene gff awk grep python

Try to make your pattern as specific as possible. For example, grep GSBRNA2T00155995001 sample.gtf will be slightly slower than grep 'gene_id "GSBRNA2T00155995001' sample.gtf. How much you gain from this depends on the structure of your GTF file.
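The same idea can be applied to the whole gene list. This is a minimal sketch, assuming a GTF file whose attributes look like `gene_id "ID";`: each ID is rewritten into that exact fixed string before calling `grep -F`. The helper names `make_gtf_patterns` and `grep_genes_gtf` are hypothetical. Running grep under `LC_ALL=C` is a separate, commonly used speed-up that avoids multibyte locale handling; how much it helps depends on your system.

```shell
# Hypothetical helper: wrap each ID from the list as   gene_id "ID";
make_gtf_patterns() {
  # Drop blank lines, then replace each whole line (&) with the wrapped form.
  sed -e '/^$/d' -e 's/.*/gene_id "&";/' "$1"
}

# Hypothetical helper: fixed-string grep of a GTF using the wrapped patterns.
grep_genes_gtf() {
  local gene_list=$1 gtf=$2 pat status
  pat=$(mktemp) || return 1
  make_gtf_patterns "$gene_list" > "$pat"
  # -F: fixed strings (no regex); -f: read one pattern per line from $pat.
  # LC_ALL=C: byte-wise matching, avoiding multibyte locale overhead.
  LC_ALL=C grep -F -f "$pat" "$gtf"
  status=$?
  rm -f "$pat"
  return "$status"
}
```

Usage: `grep_genes_gtf gene_list sample.gtf > hits.gtf`. Because each pattern includes the closing quote and semicolon, `G1` no longer matches `G10`, so `-w` is not needed.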


If @Pierre's answer in "Bed file grepping from the list" worked for you, have you tried using it here? Also, programs written in Python and similar languages are unlikely to be faster than a system utility like grep for extracting data.


I tried join, but that is also too slow.


Split your GFF file into several pieces and then run the search on each piece.
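A sketch of that suggestion, assuming GNU coreutils `split` (for `-n l/N`, which makes N chunks without breaking lines) and an `xargs` that supports `-P` for parallel jobs; `parallel_grep_genes` is a hypothetical helper. Splitting only pays off if several CPU cores are free and the disk can keep them fed: `split` writes a full second copy of the file (about 20 GB of temporary space here), so on a single slow disk this may be no faster than one grep.

```shell
# Hypothetical helper: split a GFF into N line-aligned chunks, grep in parallel.
# Assumes GNU split (-n l/N) and an xargs with -P.
parallel_grep_genes() {
  local gene_list=$1 gff=$2 jobs=${3:-4} workdir
  workdir=$(mktemp -d) || return 1
  # l/N: N chunks, never splitting a line across two chunks.
  split -n "l/$jobs" "$gff" "$workdir/chunk_"
  # One grep per chunk; each writes its own .hits file so that output from
  # concurrent processes cannot interleave.
  printf '%s\n' "$workdir"/chunk_* |
    xargs -P "$jobs" -I{} sh -c 'LC_ALL=C grep -wFf "$1" "$2" > "$2.hits"' _ "$gene_list" {}
  # Concatenate in chunk order (chunk_aa, chunk_ab, ...) to keep file order.
  cat "$workdir"/chunk_*.hits
  rm -rf "$workdir"
}
```

Usage: `parallel_grep_genes gene_list sample.gff 8 > hits.gff`. The matching itself is the same `grep -wFf` as in the question; only the work is divided.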
