I am trying to compare the locus of a homologous gene across a set of EMBL files. Is there an existing tool that takes a set of protein_ids and GenBank files as input and outputs a diagram of the genes within +/-3 positions of each protein_id?
I'm not sure what you really want (what is the content of the GenBank files?). Here is a simple script that queries the UCSC MySQL server for the gene NOTCH2, finds the 3 transcripts to its left and right, and transforms the MySQL XML output to SVG. The stylesheet ucsc-sql2svg.xsl is available at: https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ucsc/ucsc-sql2svg.xsl
MYSQL="mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -N "
${MYSQL} -e 'select K.chrom,K.txStart,K.txEnd from knownGene as K, kgXref as X where K.name=X.kgId and X.geneSymbol="NOTCH2" limit 1' |\
awk '{
printf("select name from knownGene where chrom=\"%s\" and txEnd<%s order by txEnd desc limit 3;\n",$1,$2);
printf("select name from knownGene where chrom=\"%s\" and NOT(txStart>=%s or txEnd<%s);\n",$1,$3,$2);
printf("select name from knownGene where chrom=\"%s\" and txStart>%s order by txStart asc limit 3;\n",$1,$3);}' |\
${MYSQL} | sort | uniq |\
awk 'BEGIN {printf("select * from knownGene where name in(\"xxx\"");} {printf(",\"%s\"",$0);} END {printf(");\n");}' |\
${MYSQL} -X |\
xsltproc ucsc-sql2svg.xsl - > output.svg
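If the gene coordinates come from your own files rather than from UCSC, the same "target plus 3 on each side" selection can be sketched locally with sort and awk. This is only a rough sketch under assumptions: it expects a hypothetical tab-delimited table (name, contig, start, end) that you would first have to extract from the GenBank/EMBL records, the function name `flank` is made up for illustration, and it picks neighbours by rank along the contig instead of using the overlap queries above.

```shell
#!/bin/sh
# flank FILE NAME K : print the row for NAME plus up to K rows on either side.
# FILE is a hypothetical tab-delimited table: name, contig, start, end.
flank() {
    file=$1
    target=$2
    k=$3
    # Find the contig that carries the target gene.
    chrom=$(awk -F'\t' -v t="$target" '$1 == t { print $2; exit }' "$file")
    [ -n "$chrom" ] || return 1
    # Keep that contig only, order by start, then print the window around the target.
    awk -F'\t' -v c="$chrom" '$2 == c' "$file" |
    sort -t "$(printf '\t')" -k3,3n |
    awk -F'\t' -v t="$target" -v k="$k" '
        { line[NR] = $0; if ($1 == t) pos = NR }
        END {
            if (!pos) exit 1
            lo = pos - k; if (lo < 1) lo = 1
            hi = pos + k; if (hi > NR) hi = NR
            for (i = lo; i <= hi; i++) print line[i]
        }'
}
```

Something like `flank genes.tsv NOTCH2 3` (with `genes.tsv` being your own table) would then print at most 7 rows, fewer when the target sits near the end of a contig.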
Thank you. The exact term would be synteny.