Hi all,
My boss has given me a task to do the following:
- Take FASTA files and parse them using Biopython.
- Extract the part of each sequence that lies between two restriction sites (KpnI and BamHI).
- Plot the extracted sequences together on one plot, sorted by length, and highlight the bases that encode a particular kind of amino acid sequence.
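A minimal sketch of the first two steps. It assumes the KpnI site (GGTACC) appears upstream of the BamHI site (GGATCC) on the strand stored in the FASTA file, and that "between" means the bases strictly between the two recognition sequences rather than between the actual cut positions. `extract_insert` and `load_inserts` are hypothetical helper names; `Bio.SeqIO.parse` is Biopython's standard FASTA reader.

```python
# Recognition sequences (5'->3', top strand) of the two enzymes named in the task.
KPNI_SITE = "GGTACC"
BAMHI_SITE = "GGATCC"


def extract_insert(seq, left=KPNI_SITE, right=BAMHI_SITE):
    """Return the bases strictly between the first `left` site and the next
    `right` site downstream of it, or None if either site is missing.

    Assumption: KpnI lies upstream of BamHI on the given strand; the reverse
    complement is not searched.
    """
    seq = seq.upper()
    start = seq.find(left)
    if start == -1:
        return None
    start += len(left)  # skip past the KpnI recognition sequence itself
    end = seq.find(right, start)
    if end == -1:
        return None
    return seq[start:end]


def load_inserts(fasta_path):
    """Parse a FASTA file with Biopython and return (record id, insert) pairs,
    sorted from shortest to longest insert. Records where either site is
    missing are skipped."""
    from Bio import SeqIO  # imported here so extract_insert works without Biopython

    pairs = []
    for record in SeqIO.parse(fasta_path, "fasta"):
        insert = extract_insert(str(record.seq))
        if insert is not None:
            pairs.append((record.id, insert))
    return sorted(pairs, key=lambda pair: len(pair[1]))
```

Usage would be something like `for name, insert in load_inserts("clones.fasta"): print(name, len(insert))`, where the file name is a placeholder. Searching with `str.find` keeps the extraction logic independent of Biopython, so it can be checked on plain strings.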
The end result should look like a prettier version of this (the parts in brackets are the highlighted bases):
Seq 1 | ATCGGATC .... [ATCG .. ] ...
Seq 2 | ACCATC ... [ some more highlighted bases, not necessarily in the same position, or with the same length] ...
..
Seq p | Some more bases.
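One way to find the bases to highlight is to translate each insert, search the protein for the motif, and map each amino acid match back to nucleotide coordinates (amino acid i covers bases 3i to 3i + 3 in the reading frame). This sketch assumes the motif can be written as a regular expression; `H{6}` below is only a hypothetical placeholder (a His-tag), since the actual amino acid pattern isn't given. `motif_spans` and `bracket` are illustrative helper names, and `bracket` reproduces the text mock-up above.

```python
import re


def motif_spans(protein, motif, frame=0):
    """Return [start, end) nucleotide spans for each regex match of `motif`.

    Assumes `protein` is the translation of the insert starting at offset
    `frame`, so amino acid i covers bases frame + 3*i up to frame + 3*i + 3.
    """
    return [(frame + 3 * m.start(), frame + 3 * m.end())
            for m in re.finditer(motif, protein)]


def bracket(seq, spans):
    """Wrap each highlighted span of `seq` in [ ] (spans must not overlap)."""
    out, prev = [], 0
    for start, end in sorted(spans):
        out.append(seq[prev:start])
        out.append("[" + seq[start:end] + "]")
        prev = end
    out.append(seq[prev:])
    return "".join(out)


insert = "ATGAAACACCACCACCACCACCACGGC"
protein = "MKHHHHHHG"  # with Biopython: str(Seq(insert).translate())
print(bracket(insert, motif_spans(protein, "H{6}")))  # ATGAAA[CACCACCACCACCACCAC]GGC
```

The same spans can then be passed to whatever plotting code draws the highlighted bases.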
My boss would like this to be done in Python, preferably with matplotlib. I'm a lowly statistician by training and could probably knock something like this out in R, but I'm not as familiar with matplotlib.
From looking at some examples, I imagine I could adapt something like this plot:
http://matplotlib.org/examples/lines_bars_and_markers/marker_fillstyle_reference.html
but I'm unsure how to get started. Has anyone come across a similar problem?
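Rather than markers, a simpler starting point might be one horizontal track per sequence drawn with `Axes.broken_barh`, which takes (start, width) pairs, with the highlighted spans drawn on top in a second colour. This is a sketch under the assumption that each row is a `(name, sequence, spans)` tuple with `[start, end)` nucleotide spans; `plot_inserts` and `to_bar_ranges` are hypothetical names. Individual letters are only drawn for short inserts, where they stay legible.

```python
def to_bar_ranges(spans):
    """Convert [start, end) spans into the (start, width) pairs broken_barh expects."""
    return [(start, end - start) for start, end in spans]


def plot_inserts(rows, ax=None):
    """Draw one track per insert, shortest at the top, with highlighted spans.

    rows: list of (name, sequence, spans) tuples.
    """
    import matplotlib.pyplot as plt  # imported here so to_bar_ranges works without matplotlib

    rows = sorted(rows, key=lambda row: len(row[1]))  # sort by insert length
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 0.5 * len(rows) + 1))
    for y, (name, seq, spans) in enumerate(rows):
        # Full-length grey track for the whole insert
        ax.broken_barh([(0, len(seq))], (y - 0.4, 0.8), facecolors="lightgrey")
        # Highlighted bases drawn on top of the track
        if spans:
            ax.broken_barh(to_bar_ranges(spans), (y - 0.4, 0.8), facecolors="tab:orange")
        # Base letters, centred in each 1-bp cell; skipped for long inserts
        if len(seq) <= 80:
            for x, base in enumerate(seq):
                ax.text(x + 0.5, y, base, ha="center", va="center",
                        fontsize=6, family="monospace")
    ax.set_yticks(range(len(rows)))
    ax.set_yticklabels([name for name, _, _ in rows])
    ax.invert_yaxis()  # row 0 (shortest insert) at the top, like the text mock-up
    ax.set_xlabel("Position in insert (bp)")
    return ax
```

Calling `plot_inserts(rows)` followed by `plt.savefig("inserts.png")` would write the figure to disk. Bars scale to any insert length, which is why they are used here instead of one marker per base.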
a. If your boss has given you a task, you should try something yourself before asking for help.
b. If you're asking for help, you should give us a lot more details, including the steps you've taken to solve the problem, so we know you're not taking a shortcut.
Hahaha, sorry, I hit Enter too quickly.