Sequence Logo For Different Alleles Or Generated From Sam/Bam
3
1
Entering edit mode
11.5 years ago

I have a SAM/BAM file and instead of displaying it in IGV or another browser, I would like to make a sequence logo out of it. My favourite software WebLogo does not support SAM/BAM file as an input (instead it supports CLUSTALW, FASTA, plain flatfile, MSF, NBRF, PIR, NEXUS and PHYLIP)

I would like to avoid parsing SAM file by myself and adding '-' when needed. Do you have any idea if there is already some software that could do such a thing? Thanks a lot.

The second question is about different alleles. I would like to visualize sequences (reads) from each allele with another color. I have found many coloring schemes, but none dividing sequences into groups together with corresponding coloring. Thanks a lot.

sam • 6.6k views
ADD COMMENT
3
Entering edit mode
11.5 years ago

I quickly wrote a java tool sam4weblogo: https://github.com/lindenb/jvarkit#sam4weblogo using the picard library.

enter image description here

$ java -jar dist/sam4weblogo.jar I=path/to/samtools-0.1.18/examples/sorted.bam R=seq1:80-110 2> /dev/null 

>B7_593:4:106:316:452/1
TGTTG--------------------------
>B7_593:4:106:316:452a/1
TGTTG--------------------------
>B7_593:4:106:316:452b/1
TGTTG--------------------------
>B7_589:8:113:968:19/2
TGGGG--------------------------
>B7_589:8:113:968:19a/2
TGGGG--------------------------
>B7_589:8:113:968:19b/2
TGGGG--------------------------
>EAS54_65:3:321:311:983/1
TGTGGG-------------------------
>EAS54_65:3:321:311:983a/1
TGTGGG-------------------------
>EAS54_65:3:321:311:983b/1
TGTGGG-------------------------
>B7_591:6:155:12:674/2
TGTGGGGG-----------------------
>B7_591:6:155:12:674a/2
TGTGGGGG-----------------------
>B7_591:6:155:12:674b/2
TGTGGGGG-----------------------
>EAS219_FC30151:7:51:1429:1043/2
TGTGGGGGGCGCCG-----------------
>EAS219_FC30151:7:51:1429:1043a/2
TGTGGGGGGCGCCG-----------------
>EAS219_FC30151:7:51:1429:1043b/2
TGTGGGGGGCGCCG-----------------
>B7_591:5:42:540:501/1
TGTGGGGGCCGCAGTG---------------
>EAS192_3:5:223:142:410/1
TGGGGGGGGCGCAGT----------------
>B7_591:5:42:540:501a/1
TGTGGGGGCCGCAGTG---------------
>EAS192_3:5:223:142:410a/1
TGGGGGGGGCGCAGT----------------
>B7_591:5:42:540:501b/1
TGTGGGGGCCGCAGTG---------------
>EAS192_3:5:223:142:410b/1
TGGGGGGGGCGCAGT----------------
ADD COMMENT
1
Entering edit mode

Pierre, you are definitely Tolkien of bioinformatics! Thanks.

ADD REPLY
0
Entering edit mode

Please help me in running sam4weblogo

I downloaded jvarkit-master.zip and unzip how to use it

ADD REPLY
0
Entering edit mode

you need to compile it; Read the manual please.

ADD REPLY
0
Entering edit mode

I am having problems installing berkeleydb.jar, could you please give me short advice how to install it? (I am ubuntu user)

ADD REPLY
0
Entering edit mode

you don't need bdb, if you have filled the path to the picard jars. Just type:

ant sam4weblogo

should work.

ADD REPLY
1
Entering edit mode
11.5 years ago

There are many tools to convert a SAM file to FASTA:

You should be able to use one of these to convert your alignment to FASTA, and then make use of WebLogo.

ADD COMMENT
1
Entering edit mode

Almost there:) sam2fasta does what I want except dealing with insertions (I would like to gap reference when insertions are introduced). SamToFastq does not seem to do the alignment.

ADD REPLY
0
Entering edit mode

Thank you! I will check how they deal with insertions tomorrow, I am quite curious about that.

ADD REPLY
0
Entering edit mode
11.5 years ago

To solve this another way, without custom parsing (scripts do all the work for you):

  1. Grab the FASTA files for your genomic build-of-interest (e.g., grab and extract all the gzipped FASTA files for hg19 here).
  2. Convert your SAM to BED
  3. Convert the resulting BED to FASTA, incorporating the whole-genome FASTA file you made from Step 1
  4. Run the FASTA file through WebLogo to generate the sequence logo
ADD COMMENT
0
Entering edit mode

Thank you, interesting. Unfortunately I have no reference for my species.

ADD REPLY

Login before adding your answer.

Traffic: 2406 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6