Hi,
Is there an easy way to display a sequence like "ATCC" as "red blue green green" colors on a figure, when red = A, blue = T, and green = C? I am thinking something like a heatmap in R if I can assign color to discrete values. Thanks.
Ugly & quick HTML hack:
transform it to HTML, e.g. through sed:
sed 's/[ACTG]/<span class="&">&<\/span>/gi' seq.color > seq.html
Attach a stylesheet to it, e.g.:
<head>
<style TYPE="text/css">
.A {
color: red;
background: red;
font-family: monospace;
font-size: 40px;
}
.C {
color: green;
background: green;
font-family: monospace;
font-size: 40px;
}
.G {
color: orange;
background: orange;
font-family: monospace;
font-size: 40px;
}
.T {
color: blue;
background: blue;
font-family: monospace;
font-size: 40px;
}
</style>
</head>
The result file should look like:
<head>
<style TYPE="text/css">
.A {
color: red;
background: red;
font-family: monospace;
font-size: 40px;
}
.C {
color: green;
background: green;
font-family: monospace;
font-size: 40px;
}
.G {
color: orange;
background: orange;
font-family: monospace;
font-size: 40px;
}
.T {
color: blue;
background: blue;
font-family: monospace;
font-size: 40px;
}
</style>
</head>
<span class="A">A</span><span class="G">G</span><span class="G">G</span><span class="C">C</span><span class="T">T</span><span class="T">T</span><span class="T">T</span><span class="A">A</span><span class="G">G</span><span class="t">t</span><span class="g">g</span><span class="c">c</span><span class="a">a</span>
Open in a web browser
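The steps above can be combined into one small script. This is only a sketch: seq_to_html is a hypothetical helper name, the colours are the ones from the stylesheet above, and the input is upper-cased first so that it does not depend on GNU sed's case-insensitive flag or on how the browser matches class names.

```shell
#!/bin/sh
# Sketch only: seq_to_html is a hypothetical helper, not part of any tool.
# Reads a sequence on stdin and writes a small HTML page on stdout.
seq_to_html() {
    # Stylesheet: one rule per base, text and background in the same
    # colour so that each letter renders as a solid block.
    printf '<head>\n<style type="text/css">\n'
    for rule in A:red C:green G:orange T:blue; do
        base=${rule%%:*}     # part before the colon, e.g. A
        colour=${rule#*:}    # part after the colon, e.g. red
        printf '.%s { color: %s; background: %s; font-family: monospace; font-size: 40px; }\n' \
            "$base" "$colour" "$colour"
    done
    printf '</style>\n</head>\n'
    # Upper-case a/c/g/t, then wrap every base in a span whose class
    # is the base letter itself.
    tr 'acgt' 'ACGT' | sed 's/[ACGT]/<span class="&">&<\/span>/g'
}
```

Usage would be something like: seq_to_html < seq.color > seq.html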
Thanks. Sorry I didn't make it clearer. This is what I meant to look like. http://realtamortgage.com/gfx/colors.gif
Well... I'm not sure what you mean by "on a figure", but in the past I've done this with HTML like Giovanni said, or (slightly dumber) with a script that puts a <font color="#FF0000"></font> around all A's or whatever.
In R, you can use the text() command to put text on a plot, or just use pch='A' and col='red' to make points on a plot red A's, for example.
seq<-"ATCGTACG"
seqlist<-strsplit(seq,"")
cols<-c(A='red',T='blue',C='green',G='purple')  # name each color by its base
plot(seq_along(seqlist[[1]]),rep(1,times=length(seqlist[[1]])),pch=seqlist[[1]],col=cols[seqlist[[1]]])
This would take some fiddling for a longer sequence, but you get the idea.
EDIT: after reading your comment above, it's actually easier.
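seq<-"ATCGTACG"
seqlist<-strsplit(seq,"")
cols<-c(A='red',T='blue',C='green',G='purple')
# fix the level order so that codes 1..4 always mean A, T, C, G
codes<-as.numeric(factor(seqlist[[1]],levels=names(cols)))
# explicit breaks keep each code on its own color even if a base is absent
image(matrix(codes),col=unname(cols),breaks=0.5+0:4)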
A quick hack (not fully tested, but you'll get the idea): the following C program reads a sequence from stdin and generates a PostScript file with the colored rectangles laid out on a square grid:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <ctype.h>
int main(int argc,char** argv)
{
int i,j,k,n;
double SIZE=500.0;
double side=0;
int c;
int len=0;
char* s=malloc(sizeof(char));
if(s==NULL) return EXIT_FAILURE;
while((c=fgetc(stdin))!=EOF)
{
if(isspace(c)) continue;
s=realloc(s,sizeof(char)*(len+2));
if(s==NULL)
{
fprintf(stderr,"Out of memory\n");
return EXIT_FAILURE;
}
s[len++]=c;
}
s[len]=0;
if(len==0) return EXIT_FAILURE;
n=ceil(sqrt(len));
side=SIZE/n;
k=0;
printf("%%!PS\n");
printf("/dside %f def\n",side);
printf("/box { 2 dict begin /y exch def /x exch def "
"newpath "
"y dside mul x dside mul moveto "
"dside 0 rlineto "
"0 dside rlineto "
"dside -1 mul 0 rlineto "
"0 dside -1 mul rlineto "
"closepath "
"fill "
" end} bind def\n");
printf("/red { 1 0 0 setrgbcolor box } bind def\n");
printf("/green { 0 1 0 setrgbcolor box } bind def\n");
printf("/blue { 0 0 1 setrgbcolor box } bind def\n");
printf("/yellow { 1 1 0 setrgbcolor box } bind def\n");
printf("/black { 0 0 0 setrgbcolor box } bind def\n");
for(i=0;i< n && k<len;i++)
{
for(j=0;j<n && k<len;++j)
{
printf("%d %d",i,j);
switch(toupper(s[k++]))
{
case 'A': fputs(" red\n",stdout); break;
case 'T': fputs(" green\n",stdout); break;
case 'C': fputs(" yellow\n",stdout); break;
case 'G': fputs(" blue\n",stdout); break;
default: fputs(" black\n",stdout); break;
}
}
}
printf("showpage\n");
return 0;
}
Compilation:
gcc -o biostar12763 -Wall source.c -lm
Execution:
echo "ATAGCTAGCATCAGTCTAGCTTAGCTAGCGCNNACTAGCT" | ./biostar12763 > file.ps
ghostview file.ps ## or evince file.ps or... etc...
Jalview is excellent for creating figures of proteins and nucleotides. Even if you do not have an alignment, you can still enter a single sequence. It has lots of export options as well, including wrapped text and export to PDF.
I don't know how to do this with R, but I think you can open the sequences in MEGA or ClustalX, in which the nucleotides are colored, and then take a screenshot.
This is pretty hacktastic, but I don't know a better way:
library(ggplot2)
dna <- "ATAGCATCGACTAG"
bases <- unlist(strsplit(dna, ""))
col_scheme <- c("red", "green", "yellow", "blue")
names(col_scheme) <- c("A", "T" ,"C", "G")
p <- ggplot(data.frame(pos=seq_along(bases), base=bases), aes(pos, 1, fill=unname(col_scheme[base])))
p + geom_tile() + scale_fill_identity()
You should really think about whether you want to use (simple) red and green in the same plot: ~8% of males can't tell the difference. Is there a good reason for colouring these bases but ignoring those people?
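One possible alternative, sketched here for the HTML approach from the earlier answer: take the colours from the Okabe-Ito palette, which was designed to stay distinguishable under the common forms of colour-vision deficiency. Which base gets which colour is an arbitrary choice made for this sketch, and cb_safe_css is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: cb_safe_css is a hypothetical helper name.
# Prints one CSS rule per base using four Okabe-Ito colours:
#   #D55E00 vermillion, #0072B2 blue, #F0E442 yellow, #009E73 bluish green
# The base-to-colour assignment is arbitrary.
cb_safe_css() {
    for rule in A:#D55E00 C:#0072B2 G:#F0E442 T:#009E73; do
        base=${rule%%:*}     # part before the colon, e.g. A
        colour=${rule#*:}    # part after the colon, e.g. #D55E00
        printf '.%s { color: %s; background: %s; font-family: monospace; font-size: 40px; }\n' \
            "$base" "$colour" "$colour"
    done
}
```

The four printed rules can be pasted between the <style> tags in place of the red/green/orange/blue ones.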
Dear,
If you are working on proteins, you can use I-PV just as shown in the link here.
You will need to make your sequence file in a txt editor, ms excel etc. Here is an example.
You can visit the main website for more information.
I hope this helps,
Good luck with your research,
What do you mean by "on a figure"?
What I meant was to display ATCC as colored squares in a row. Sort of like this figure. http://realtamortgage.com/gfx/colors.gif