Download genomes within a given GC content interval
Hey guys,
Does anyone know how to download only complete genomes with a given GC content from NCBI? For example, all complete genomes with a GC content between 40% and 50%.
Thank you!
NCBI publishes genome reports for various organisms on its FTP site.
Let's use the prokaryotic genome report, prokaryotes.txt.
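The awk commands in this answer read GC% from column 8 and the FTP path from column 21. Because the report layout can change over time, it may be worth listing the header fields with their numbers before filtering. This is a minimal sketch: `list_columns` is just an illustrative name, and the download URL in the comment is the report's location on the NCBI FTP site as I understand it, so please check it against the site.

```shell
# list_columns: print each tab-separated field of a report's header line
# together with its 1-based column number
list_columns() {
    head -n 1 "$1" | tr '\t' '\n' | cat -n
}

# Example (assumed location of the report; downloads into the current directory):
# curl -O https://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt
# list_columns prokaryotes.txt
```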
Parsing this file with awk gives you the genomes with a GC% between 40 and 50 (column 8 holds the GC% and column 21 the FTP path of the assembly):
$ awk -F '\t' '$8 >= 40 && $8 <= 50 {print $1 "\t" $21}' prokaryotes.txt | head -5
Yersinia pestis CO92 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/009/065/GCA_000009065.1_ASM906v1
Tropheryma whipplei str. Twist ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/007/485/GCA_000007485.1_ASM748v1
Actinobacillus pleuropneumoniae serovar 5b str. L20 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/015/885/GCA_000015885.1_ASM1588v1
Chlamydia pneumoniae CWL029 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/008/745/GCA_000008745.1_ASM874v1
Vibrio vulnificus ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/215/135/GCA_002215135.1_ASM221513v1
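The command above filters on GC% only, while the question asks for complete genomes. The report also has an assembly status column, so the filter can be extended. This sketch assumes that column is number 16 (header "Status") and that complete assemblies carry the value `Complete Genome`; verify both against the header line first. `complete_gc_paths` is a hypothetical helper name.

```shell
# complete_gc_paths REPORT MIN MAX
# Print name and FTP path of complete genomes whose GC% lies in [MIN, MAX].
# Assumed columns: $1 = name, $8 = GC%, $16 = Status, $21 = FTP Path
complete_gc_paths() {
    local report=$1 min=$2 max=$3
    awk -F '\t' -v min="$min" -v max="$max" '
        NR > 1 && $16 == "Complete Genome" && $8 != "-" && $21 != "-" &&
        $8 + 0 >= min + 0 && $8 + 0 <= max + 0 { print $1 "\t" $21 }
    ' "$report"
}

# Usage:
# complete_gc_paths prokaryotes.txt 40 50 | head -5
```

`NR > 1` skips the header line, and rows with a missing GC% or FTP path (shown as `-` in the report) are dropped.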
In each of those directories you'll find a *_genomic.fna.gz file with the genome sequence.
This variation gets you all the way to downloadable URLs by appending <directory name>_genomic.fna.gz to each FTP path:
$ awk -F '\t' '$8 >= 40 && $8 <= 50 && $21 != "-" {print $21}' prokaryotes.txt | awk -F '/' '{print $0 "/" $NF "_genomic.fna.gz"}' | head -5
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/009/065/GCA_000009065.1_ASM906v1/GCA_000009065.1_ASM906v1_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/007/485/GCA_000007485.1_ASM748v1/GCA_000007485.1_ASM748v1_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/015/885/GCA_000015885.1_ASM1588v1/GCA_000015885.1_ASM1588v1_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/008/745/GCA_000008745.1_ASM874v1/GCA_000008745.1_ASM874v1_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/215/135/GCA_002215135.1_ASM221513v1/GCA_002215135.1_ASM221513v1_genomic.fna.gz
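Once you have the list, something like the following could fetch the files. It is a sketch under two assumptions: NCBI serves the same paths over HTTPS (https://ftp.ncbi.nlm.nih.gov/...), which is usually easier to reach than plain FTP, and `ftp_to_genomic_url` is a hypothetical helper name. It builds each *_genomic.fna.gz URL from the last component of the directory path, the same way as the awk command above.

```shell
# ftp_to_genomic_url: read assembly directory paths (one per line) on stdin and
# print the HTTPS URL of each directory's *_genomic.fna.gz file
ftp_to_genomic_url() {
    awk '$0 != "" && $0 != "-" {
        path = $0
        sub(/^ftp:\/\//, "https://", path)   # assumes the same path is served over HTTPS
        sub(/\/+$/, "", path)                # drop any trailing slash
        n = split(path, parts, "/")          # last component = assembly directory name
        print path "/" parts[n] "_genomic.fna.gz"
    }'
}

# Usage (wget -i - reads the URL list from standard input and downloads
# into the current directory):
# awk -F '\t' '$8 >= 40 && $8 <= 50 {print $21}' prokaryotes.txt |
#     ftp_to_genomic_url | wget -i -
```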
Thanks, really appreciate that!