Is there a command line tool or R package to efficiently retrieve datasets from ICGC?
I am not sure whether people are still interested in this topic, or whether the ICGC portal has been updated since the last thread here. However, I came across your question, @anshupa.vssut, while looking for exactly this: a way to retrieve data from ICGC.
So, for those who don't mind downloading the data from the portal, one can do it directly from here: https://dcc.icgc.org/releases/release_27/Projects
I found it very useful and straightforward, and I hope it is helpful to others as well. And of course, thank you for raising this topic! (:
Hello, I want to retrieve simple somatic mutation data in VCF format from ICGC. Did you find any solution for retrieving data from ICGC?
See if this helps: How to download a whole ICGC release of processed data?
Thank you. I also found this in ICGC [http://icgc-data-parser.readthedocs.io/en/master/icgc-ssm-file.html]
I downloaded "simple_somatic_mutation.aggregated.vcf.gz", which contains an aggregate of the information on all simple somatic mutations found across all patients in all cancer projects in ICGC. But from this I only need the mutation data of one particular project.
This is how it looks:
I only need mutation data of "OCCURRENCE=BRCA-EU". How can I extract that?
Start with
grep "OCCURRENCE=BRCA-EU" file.vcf
I'm not getting the column names when I run it that way. Do I need to give any specific options to get the column names?
Try
grep -e "#CHROM" -e "OCCURRENCE=BRCA-EU" file.vcf
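Since the download is gzip-compressed, the same filter can also be run without unpacking the file first. Below is a minimal sketch, assuming the project name appears right after "OCCURRENCE=" as in the grep above; `filter_project` is a hypothetical helper name, not part of any ICGC tool. It keeps every header line (all lines starting with "#", not only "#CHROM") so the output remains a usable VCF. If a record lists several projects in its OCCURRENCE field, this pattern may miss records where the project is not the first one listed, so check a few lines of your file before relying on it.

```shell
# Hypothetical helper: keep all VCF header lines plus the records whose
# line contains "OCCURRENCE=<project>" (same match as the grep above).
filter_project() {
    project="$1"   # project code, e.g. BRCA-EU
    vcf_gz="$2"    # gzip-compressed VCF, e.g. simple_somatic_mutation.aggregated.vcf.gz
    # gzip -dc streams the decompressed file; awk keeps lines that start
    # with '#' (headers) or contain the literal string "OCCURRENCE=<project>".
    gzip -dc "$vcf_gz" | awk -v pat="OCCURRENCE=${project}" '/^#/ || index($0, pat) > 0'
}
```

Usage, assuming the file is in the current directory:

filter_project BRCA-EU simple_somatic_mutation.aggregated.vcf.gz > BRCA-EU.vcf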
This works. Thank you.