Hi. This is a casual question.
I'll be giving a presentation about bioinformatics analysis to people who are not familiar with this academic field.
In the presentation, I'll demonstrate some kind of bioinformatics analysis on the command line in about five minutes. Any ideas on which analysis would be most impressive?
For example, going from Bowtie2 mapping to an IGV view, or SNP detection.
Could you please suggest anything and everything?
I would download the human genome (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz) and unzip it as HUMAN_GENOME.fa
Run
grep 'GATTACA' --color HUMAN_GENOME.fa
for a few seconds while they look at the pretty colours and the sheer amount of data being processed.
Run
grep -o 'A' HUMAN_GENOME.fa | wc -l
and explain how little blocks can be put together into pipelines that make prototyping easier.
While that's running (it takes about 5-7 min and should eventually print 845,903,867), open up a Python interpreter and run
print(open('HUMAN_GENOME.fa','rb').read().count(b'T'))
This should take just under a minute (much faster than the grep), although it reads the whole file into memory. Use it to explain that while building blocks are fun, a basic knowledge of programming can result in much faster programs.
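If you want to show the "basic knowledge of programming" point one step further, here is a minimal sketch that counts all four bases in one pass without loading the whole file into memory. The function name count_bases is made up for this example, and skipping the '>' header lines and upper-casing the sequence are my own assumptions, not something the commands above do. I have not timed it against the one-liner.

```python
def count_bases(path, bases=("A", "C", "G", "T")):
    """Count the given bases on the sequence lines of a FASTA file.

    Hypothetical helper for the demo: header lines (starting with '>')
    are skipped, and lower-case (soft-masked) bases are counted too.
    """
    wanted = [b.encode("ascii") for b in bases]
    counts = {b: 0 for b in bases}
    # Iterating over the file handle reads one line at a time,
    # so memory use stays small even for a multi-gigabyte genome.
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                continue  # FASTA header, not sequence
            line = line.upper()
            for base, key in zip(bases, wanted):
                counts[base] += line.count(key)
    return counts
```

Calling count_bases('HUMAN_GENOME.fa') returns a dict with one count per base. Because header lines are skipped, the numbers may differ slightly from the grep and one-liner results above, which count every matching character in the file.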
Personally, I find that things like mapping are way over the top for most biologists, since these are really abstract concepts to them. Plots are great, but only if they know what they're looking at. I've sat in far too many bioinformatics presentations (particularly at Uni) where the presenter wanted to impress us with things he/she found interesting, but it didn't spark anyone else's imagination. Great, we've aligned some DNA. So what?
Is the purpose of this to train them or impress them?
The purpose is to show them how useful the analysis is in medicine or biology. Sorry for being ambiguous.
I would show an R session, e.g., going from raw expression count data to nice graphs and figures such as heatmaps or PCA plots. Or even to pathways or networks.
I second this. Not to be biased, but biologists in general are very drawn to colorful representations of data (like heatmaps). Having said that, it must be easy to digest in one go, or else the interest might be lost.
Thanks b.nota and Amitm.
A colorful representation of the data (heatmaps or PCA) is a good idea!
I'll find a suitable dataset for creating some vivid figures.
I'd recommend using the data and instructions from "Count-based differential expression analysis of RNA sequencing data using R and Bioconductor". That way, you can even give them the link if they want to reproduce it.
What about something related to the Ebola or the Zika outbreaks? e.g. this analysis