I am halfway through my PhD and I am considering pursuing a career in bioinformatics. My experience in bioinformatics is limited: I started coding three years ago but have progressed slowly, and my PhD is half bioinformatics, half laboratory-based. I currently have experience in bash and R. I am also trying to learn Python.
I am trying to find out how best to improve and prepare my CV for when I apply for jobs after my PhD. Any recommendations as to which skills in particular I should focus on would be greatly appreciated.
Becoming adept in Python is very useful; one might say essential. You should be adept in some language other than R, and Python is a good choice. Becoming fluent with a wide variety of commonly used bioinformatics software, and with data from different sequencing platforms, is also a big plus. So, be able to take raw Illumina data and map it, call variants, and look at the data in IGV; de novo assemble the data and calculate coverage on the assembly; call peaks with ChIP-seq; analyze differential expression with RNA-seq; taxonomically classify unknown contaminant reads; annotate variants with the predicted effects of mutations on proteins; look for structural variations with long-read (PacBio/Nanopore) data; that sort of thing. It helps in interviews if you have personally done each of these on several different datasets, so that you know some of their nuances.
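To give a flavour of the "calculate coverage" step in Python, here is a minimal sketch. It assumes tab-separated per-base depth lines with three columns (contig, position, depth), which is the layout `samtools depth` writes. The function name `mean_depth_per_contig` and the example values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable


def mean_depth_per_contig(lines: Iterable[str]) -> Dict[str, float]:
    """Average the depth column per contig.

    Assumes each line is 'contig<TAB>position<TAB>depth'.
    Only positions present in the input are averaged, so positions
    omitted by the upstream tool are not counted as zero.
    """
    total = defaultdict(int)   # summed depth per contig
    count = defaultdict(int)   # number of positions seen per contig
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        contig, _position, depth = line.split("\t")[:3]
        total[contig] += int(depth)
        count[contig] += 1
    return {contig: total[contig] / count[contig] for contig in total}


# Hypothetical example data, not real sequencing output.
example = [
    "contig_1\t1\t10",
    "contig_1\t2\t20",
    "contig_2\t1\t5",
]
print(mean_depth_per_contig(example))
```

Because the function takes any iterable of lines, you can pass it an open file handle and it will stream through the file rather than load it all into memory.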
I agree that being able to use published tools, especially beyond simply running them with the defaults, is a big plus. A reasonable understanding of the theory behind the algorithms is also advisable, so that you can adapt them to non-standard datasets. Still, I (also a half-half student, far from being a proper programmer) much prefer R to Python or any other language, as Bioconductor offers a plethora of customizable packages that require a proper knowledge of R for efficient use. Beyond that, being able to write proper bash scripts to efficiently manipulate large files and automate repetitive tasks is a must. Experience in using a Linux cluster also helps, but how much depends on whether your field requires handling large data.
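Since the original poster is learning Python, here is a rough sketch of the "manipulate large files" idea in that language rather than bash. It streams a FASTQ file one record at a time and reports the read count and mean read length. It assumes simple four-line FASTQ records (no wrapped sequences); `read_fastq` and `summarize` are illustrative names, not part of any library.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) tuples.

    Assumes every record is exactly four lines:
    '@name', sequence, '+', quality.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], sequence, quality


def summarize(handle: TextIO) -> Tuple[int, float]:
    """Return (number of reads, mean read length) without loading the file."""
    n_reads = 0
    total_bases = 0
    for _name, sequence, _quality in read_fastq(handle):
        n_reads += 1
        total_bases += len(sequence)
    mean_length = total_bases / n_reads if n_reads else 0.0
    return n_reads, mean_length


# Tiny in-memory example standing in for a real file.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
print(summarize(demo))
```

For a real gzipped file you could open it in text mode with the standard library, e.g. `gzip.open("reads.fastq.gz", "rt")` (the file name is hypothetical), and pass the handle to `summarize`.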
Here are some resources I'm working through:
Bioinformatics tools/resources https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources
An online bioinformatics tutorial that has a nice Python section