make a different color for a given amino acid
1
1
Entering edit mode
5.8 years ago
Learner ▴ 280

I have a xls file which I want to color some strings differently

My xlsx file is like below lets call it sample.xlsx

>sp|Q96B97|SH3K1_HUMAN SH3 domain-containing kinase-binding protein 1 OS=Homo sapiens OX=9606 GN=SH3KBP1 PE=1 SV=2
MVEAIVEFDYQAQHDDELTISVGEIITNIRKEDGGWW
>sp|Q9UBS4|DJB11_HUMAN DnaJ homolog subfamily B member 11 OS=Homo sapiens OX=9606 GN=DNAJB11 PE=1 SV=1
KLALQLHPDRNPDDPQAQEKFQDLGAAYEVLSDSEKRKQYD
>sp|P61916|NPC2_HUMAN NPC intracellular cholesterol transporter 2 OS=Homo sapiens OX=9606 GN=NPC2 PE=1 SV=1
CQLSKGQSYSVNVTFTSNIQSKSSKAVVHGILMGVP

I want to read it and then save it when I color some letters. I want to color D in a yellow color

>sp|Q96B97|SH3K1_HUMAN SH3 domain-containing kinase-binding protein 1 OS=Homo sapiens OX=9606 GN=SH3KBP1 PE=1 SV=2
MVEAIVEF**D**YQAQH**DD**ELTISVGEIITNIRKE**D**GGWW
>sp|Q9UBS4|DJB11_HUMAN DnaJ homolog subfamily B member 11 OS=Homo sapiens OX=9606 GN=DNAJB11 PE=1 SV=1
KLALQLHPDRNP**DD**PQAQEKFQ**D**LGAAYEVLS**D**SEKRKQY**D**
>sp|P61916|NPC2_HUMAN NPC intracellular cholesterol transporter 2 OS=Homo sapiens OX=9606 GN=NPC2 PE=1 SV=1
CQLSKGQSYSVNVTFTSNIQSKSSKAVVHGILMGVP

Now I can read the xls and print them

#!/usr/bin/perl

use strict;
use warnings;
use Excel::Reader::XLSX;

    my $reader   = Excel::Reader::XLSX->new();
    my $workbook = $reader->read_file( 'sample.xlsx' );
    if ( !defined $workbook ) {
        die $reader->error(), "\n";
    }
    for my $worksheet ( $workbook->worksheets() ) {
        my $sheetname = $worksheet->name();
        print "Sheet = $sheetname\n";
        while ( my $row = $worksheet->next_row() ) {
            while ( my $cell = $row->next_cell() ) {
                my $row   = $cell->row();
                my $col   = $cell->col();
                my $value = $cell->value();
                print "  Cell ($row, $col) = $value\n";
            }
        }
    }

#Sheet = Sheet1
#  Cell (0, 0) = >sp|Q96B97|SH3K1_HUMAN SH3 domain-containing kinase-binding protein 1 OS=Homo sapiens OX=9606 GN=SH3KBP1 PE=1 SV=2
#  Cell (1, 0) = MVEAIVEFDYQAQHDDELTISVGEIITNIRKEDGGWW
#  Cell (2, 0) = >sp|Q9UBS4|DJB11_HUMAN DnaJ homolog subfamily B member 11 OS=Homo sapiens OX=9606 GN=DNAJB11 PE=1 SV=1
#  Cell (3, 0) = KLALQLHPDRNPDDPQAQEKFQDLGAAYEVLSDSEKRKQYD
#  Cell (4, 0) = >sp|P61916|NPC2_HUMAN NPC intracellular cholesterol transporter 2 OS=Homo sapiens OX=9606 GN=NPC2 PE=1 SV=1
#  Cell (5, 0) = CQLSKGQSYSVNVTFTSNIQSKSSKAVVHGILMGVP

Now here I want to focus on the Cell(1,0) and Cell (3, 0) and Cell (5, 0)

perl • 2.3k views
ADD COMMENT
0
Entering edit mode

I'm confused, are you dealing with fasta or xlsx? If you want to color in the terminal you can print in color, see https://stackoverflow.com/questions/287871/print-in-terminal-with-colors for instance.

ADD REPLY
0
Entering edit mode

@Asaf I converted the Fasta to xlsx , it is because I am doing some other analysis. No I am not interested in coloring them in terminal. I want to save them with another color in xlsx

ADD REPLY
0
Entering edit mode

would an R-based solution also work for you? The openxlsx package is great for formatting, see their exhaustive set of examples.

My guess is your code may look something along those lines (taken from the documentation linked above):

wb <- createWorkbook()
addWorksheet(wb, "cellIs")

# add data without styling
writeData(wb, "cellIs", -5:5)
writeData(wb, "cellIs", LETTERS[1:11], startCol=2)

# styling
negStyle <- createStyle(fontColour = "#9C0006", bgFill = "#FFC7CE") # define the colors
conditionalFormatting(wb, "cellIs", cols=1, rows=1:11, rule="!=0", style = negStyle) # define the condition for applying the style you just defined. Here the rule is that there shouldn't be a zero, in your case you may want to require letters
ADD REPLY
0
Entering edit mode

irrespective of this, it may actually be easier to write a VBA script, like this one here or here. That way you won't have to deal with reading in and writing out etc.

I also thought this solution described here sounded nifty:

IF you had a consistent word you were looking for, you could create another column (and hide if needed) and use the formula =Find("Text",CellName) command to find a specific text in another cell. It would return a number of where that word starts. You could then conditional format based on if you received a number back or if it gave you a zero, the word is not there.

ADD REPLY
0
Entering edit mode

@Friederike R could be a good choice, can you put more info there? a better script , it has no start no end ;-)

ADD REPLY
0
Entering edit mode

well, the start would be something like:

library(openxslx)
indata <- read.xlsx(xlsxFile, sheet = 1, startRow = 1, colNames = TRUE, rowNames = FALSE)

I'm actually not 100% sure if that would allow you to specify part of a string within a cell to be colored because I have a feeling that openxlsx mostly reproduces the formula capabilities of Excel (which do not seem to allow for conditional formatting of parts of strings within the same cell). I would, in fact, suggest to look into an Excel-based solution to this unless you have compelling reasons not to use those.

ADD REPLY
0
Entering edit mode

@Friederike this package is not even available for unix!!! use library("xlsx") if you can please

ADD REPLY
0
Entering edit mode

sorry, you will need to install the library first, of course:

install.packages("openxlsx")

Works fine on my unix server

ADD REPLY
0
Entering edit mode

@Friederike I just ran your code! then what ? it is doing nothing ? what are you trying to do with R exactly ? I am a bit confused

ADD REPLY
0
Entering edit mode

Please, please, don't use Excel.

ADD REPLY
0
Entering edit mode

@jrj.healey ok, lets do it in text , what is your solution ?

ADD REPLY
0
Entering edit mode

what do you want to achieve with the coloring?

ADD REPLY
0
Entering edit mode

@Friederike part of a code that I am writing

ADD REPLY
0
Entering edit mode

I can give you a solution, but it won’t be in Perl... (see answer below).

ADD REPLY
4
Entering edit mode
5.8 years ago
Joe 21k

Here’s a bash Function I keep handy:

quickcolor(){
 sed -e "s/A/$(tput setaf 1)A$(tput sgr0)/g" \
     -e "s/T/$(tput setaf 2)T$(tput sgr0)/g" \
     -e "s/C/$(tput setaf 3)C$(tput sgr0)/g" \
     -e "s/G/$(tput setaf 4)G$(tput sgr0)/g" $1 | cat
}

I can’t remember off then top of my head which nucleotide gets which colour, but you can change the number given to setaf to tweak it (it will need some modification to read a fasta/ignore headers)

Edit:

Sorry just noticed this thread was about amino acids. The same approach will work, you'll just have to modify the substitutions to suit.

Edit II:

Incorporated Asaf's addressing of headers, and added the ability to pass multiple fastas at once (e.g. $ quickcolor *.fasta)

quickcolor(){
 for i in "$@" ; do
  sed -e "/^>/!s/A/$(tput setaf 1)A$(tput sgr0)/g" \
      -e "/^>/!s/T/$(tput setaf 2)T$(tput sgr0)/g" \
      -e "/^>/!s/C/$(tput setaf 3)C$(tput sgr0)/g" \
      -e "/^>/!s/G/$(tput setaf 4)G$(tput sgr0)/g" "$i" | cat
 done
}
ADD COMMENT
1
Entering edit mode

I think this one should receive "post of the year".

ADD REPLY
1
Entering edit mode

Added to my .bashrc should be handy. And added a condition to avoid fasta headers:

quickcolor(){
 sed -e "/^>/!s/A/$(tput setaf 1)A$(tput sgr0)/g" \
 -e "/^>/!s/T/$(tput setaf 2)T$(tput sgr0)/g" \
 -e "/^>/!s/C/$(tput setaf 3)C$(tput sgr0)/g" \
 -e "/^>/!s/G/$(tput setaf 4)G$(tput sgr0)/g" $1 | cat
 }
ADD REPLY
0
Entering edit mode

That's the syntax I was looking for! ;)

ADD REPLY

Login before adding your answer.

Traffic: 1722 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6