I am a Python developer currently working on a project that requires knowledge of biostatistics, specifically human genomics data from the 1000 Genomes Project. I would like to understand more about the various data formats (e.g. VCF, BAM, FASTA) and how information can be extracted from them and analysed. A lot of this data is already stored quite efficiently in Google BigQuery, so knowledge of SQL and BigQuery is a plus. I am looking for someone who can help explain these concepts to me. I live in central London and would prefer to do this in person. If that is not possible, I am also open to using Skype.
Thanks
Sounds kind of interesting to me. If you post your email, I might drop you a message.
My email is iwegbue@gmail.com
There is a bioinformatics meetup group in London. The next meeting is this Thursday (but I won't be able to come this time).
Thanks Giovanni. Where and what time are the meetups in London?
https://www.meetup.com/Bioinformatics-London
Thanks. I already joined. See you on Feb 23rd.
I would argue that BigQuery and HDFS are good storage systems for genomics data, but unfortunately few NGS analysis tools support them, so there is currently not much advantage to using them.