Retrieving features from a GenBank file
I have downloaded a GenBank file from NCBI that contains multiple sequences.
I want to convert this file into a table (data.frame) with column headings such as LOCUS, ACCESSION, FEATURES, etc.
Can somebody recommend a solution?
R
sequence
To create a tab-delimited table, fetch the records as XML and transform them with XSLT, for example with this stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.1">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/">
    <xsl:apply-templates select="GBSet"/>
  </xsl:template>
  <xsl:template match="GBSet">
    <xsl:apply-templates select="GBSeq"/>
  </xsl:template>
  <xsl:template match="GBSeq">
    <xsl:for-each select="GBSeq_feature-table/GBFeature">
      <xsl:value-of select="../../GBSeq_locus"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="../../GBSeq_primary-accession"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="GBFeature_key"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="GBFeature_location"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Example run:
$ curl -s "https://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=NC_001664.2,AE001273&retmode=xml" | xsltproc --novalid transorm.xsl -
NC_001664	NC_001664	source	1..159322
NC_001664	NC_001664	repeat_region	1..8088
NC_001664	NC_001664	repeat_region	56..342
NC_001664	NC_001664	gene	501..6850
NC_001664	NC_001664	CDS	join(501..759,843..2653)
NC_001664	NC_001664	gene	4725..6850
NC_001664	NC_001664	CDS	join(4725..5028,5837..6720)
NC_001664	NC_001664	regulatory	6845..6850
NC_001664	NC_001664	repeat_region	7655..8008
NC_001664	NC_001664	misc_feature	8009..151234
(...)
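If xsltproc is not available, the same table can probably be produced with Python's standard library. The following is only a sketch, not part of the answer above: it assumes the efetch XML uses the same element names as the stylesheet (GBSet, GBSeq, GBSeq_locus, GBSeq_primary-accession, GBFeature_key, GBFeature_location). The function name feature_rows and the inline SAMPLE document are made up for illustration.

```python
import csv
import sys
import xml.etree.ElementTree as ET


def feature_rows(xml_bytes):
    """Yield (locus, accession, feature key, location) for every feature."""
    root = ET.fromstring(xml_bytes)
    # iter() also matches the root itself, so a lone GBSeq document works too.
    for seq in root.iter("GBSeq"):
        locus = seq.findtext("GBSeq_locus", default="")
        accession = seq.findtext("GBSeq_primary-accession", default="")
        for feat in seq.findall("GBSeq_feature-table/GBFeature"):
            yield (
                locus,
                accession,
                feat.findtext("GBFeature_key", default=""),
                feat.findtext("GBFeature_location", default=""),
            )


# Tiny hand-written sample in the assumed GBSet layout (not real efetch output).
SAMPLE = b"""<GBSet>
  <GBSeq>
    <GBSeq_locus>NC_001664</GBSeq_locus>
    <GBSeq_primary-accession>NC_001664</GBSeq_primary-accession>
    <GBSeq_feature-table>
      <GBFeature>
        <GBFeature_key>source</GBFeature_key>
        <GBFeature_location>1..159322</GBFeature_location>
      </GBFeature>
      <GBFeature>
        <GBFeature_key>gene</GBFeature_key>
        <GBFeature_location>501..6850</GBFeature_location>
      </GBFeature>
    </GBSeq_feature-table>
  </GBSeq>
</GBSet>"""

# Write one tab-separated line per feature, like the XSLT output.
writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
writer.writerows(feature_rows(SAMPLE))
```

To run it on real data, replace SAMPLE with sys.stdin.buffer.read() and pipe the curl output into the script. The resulting file should load in R with read.delim(), provided header = FALSE is set, since no header row is written.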
Can this tutorial help you?
This was a very useful tutorial, but I'm more interested in metadata, e.g. isolation source, location, latitude, longitude, country, date, etc.
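In GenBank records, this kind of metadata is usually stored as qualifiers on the "source" feature, so it can be pulled from the same XML that efetch returns with retmode=xml. A rough Python sketch follows. It assumes the qualifiers sit under GBFeature_quals/GBQualifier with GBQualifier_name and GBQualifier_value children, and that the qualifier names listed in WANTED are the ones your records use. Some records may use geo_loc_name rather than country, and many records lack some or all of these. The function name source_metadata and the sample record are invented for illustration.

```python
import csv
import sys
import xml.etree.ElementTree as ET

# Assumed qualifier names; adjust to whatever your records actually contain.
WANTED = ("isolation_source", "country", "geo_loc_name", "lat_lon", "collection_date")


def source_metadata(xml_bytes, wanted=WANTED):
    """Return one dict per sequence with the wanted 'source' qualifiers."""
    root = ET.fromstring(xml_bytes)
    records = []
    for seq in root.iter("GBSeq"):
        row = {"accession": seq.findtext("GBSeq_primary-accession", default="")}
        row.update({name: "" for name in wanted})  # empty string when absent
        for feat in seq.findall("GBSeq_feature-table/GBFeature"):
            if feat.findtext("GBFeature_key") != "source":
                continue
            for qual in feat.findall("GBFeature_quals/GBQualifier"):
                name = qual.findtext("GBQualifier_name")
                if name in wanted:
                    row[name] = qual.findtext("GBQualifier_value", default="")
        records.append(row)
    return records


# Invented record in the assumed layout; the values are placeholders.
SAMPLE = b"""<GBSet><GBSeq>
  <GBSeq_primary-accession>XX000001</GBSeq_primary-accession>
  <GBSeq_feature-table><GBFeature>
    <GBFeature_key>source</GBFeature_key>
    <GBFeature_quals>
      <GBQualifier><GBQualifier_name>country</GBQualifier_name>
        <GBQualifier_value>Kenya</GBQualifier_value></GBQualifier>
      <GBQualifier><GBQualifier_name>isolation_source</GBQualifier_name>
        <GBQualifier_value>soil</GBQualifier_value></GBQualifier>
      <GBQualifier><GBQualifier_name>collection_date</GBQualifier_name>
        <GBQualifier_value>2015-06-01</GBQualifier_value></GBQualifier>
    </GBFeature_quals>
  </GBFeature></GBSeq_feature-table>
</GBSeq></GBSet>"""

# Write a CSV with a header row, one line per sequence.
out = csv.DictWriter(sys.stdout, fieldnames=["accession", *WANTED], lineterminator="\n")
out.writeheader()
out.writerows(source_metadata(SAMPLE))
```

The CSV has a header row, so read.csv() in R should give a data frame with one row per sequence and one column per qualifier.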
I have written a Python script that creates a .csv file, which you can then open in R to create a data frame: https://github.com/dewshr/NCBI-Genbank-file-parser