Data retrieval icgc
7.8 years ago

Is there a command-line tool or R package to efficiently retrieve datasets from ICGC?

icgc r-package • 3.7k views

Hello, I want to retrieve simple somatic mutation data in VCF format from ICGC. Did you find a solution for retrieving data from ICGC?


I downloaded "simple_somatic_mutation.aggregated.vcf.gz", which contains aggregated information on all simple somatic mutations found across all patients in all cancer projects in ICGC. However, I only need the mutation data for one particular project.

This is how it looks:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       1000000 MU88749506      T       .       .       .       CONSEQUENCE=.;OCCURRENCE=NKTL-SG|23|23|1.00000;affected_donors=23;mutation=T>T;project_count=1;studies=.;tested_donors=12198
1       100000022       MU39532371      C       T       .       .       CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=SKCA-BR|1|80|0.01250;affected_donors=1;mutation=C>T;project_count=1;studies=.;tested_donors=12198
1       100000049       MU87095619      TA      T       .       .       CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=MALY-DE|1|241|0.00415;affected_donors=1;mutation=A>-;project_count=1;studies=.;tested_donors=12198
1       100000110       MU82202760      G       A       .       .       CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=LICA-FR|2|249|0.00803;affected_donors=2;mutation=G>A;project_count=1;studies=.;tested_donors=12198
1       100000128       MU85052896      A       C       .       .       CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=MALY-DE|1|241|0.00415;affected_donors=1;mutation=A>C;project_count=1;studies=.;tested_donors=12198
1       10000015        MU91785757      A       G       .       .       CONSEQUENCE=NMNAT1|ENSG00000173614|+|NMNAT1-001|ENST00000377205||upstream_gene_variant||,LZIC|ENSG00000162441|1|LZIC-005|ENST00000377213||intron_variant||,LZIC|ENSG00000162441|1|LZIC-001|ENST00000377223||intron_variant||,LZIC|ENSG00000162441|1|LZIC-201|ENST00000400903||intron_variant||,NMNAT1|ENSG00000173614|+|NMNAT1-002|ENST00000403197||upstream_gene_variant||,RP11-84A14.4|ENSG00000228150|+|RP11-84A14.4-001|ENST00000445884||upstream_gene_variant||,NMNAT1|ENSG00000173614|+|NMNAT1-005|ENST00000462686||upstream_gene_variant||,LZIC|ENSG00000162441|1|LZIC-004|ENST00000488540||upstream_gene_variant||,NMNAT1|ENSG00000173614|+|NMNAT1-004|ENST00000492735||upstream_gene_variant||,LZIC|ENSG00000162441|1|LZIC-202|ENST00000541052||intron_variant||;OCCURRENCE=BOCA-UK|1|130|0.00769;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198

I only need the mutation records with "OCCURRENCE=BRCA-EU". How can I extract them?


Start with grep "OCCURRENCE=BRCA-EU" file.vcf.


I'm not getting the column names when I run it that way. Do I need to pass any specific options to get the column names?


Try grep -e "#CHROM" -e "OCCURRENCE=BRCA-EU" file.vcf


This works. Thank you.
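
For anyone who wants something a bit more defensive than the grep above, here is a minimal sketch. It rests on two assumptions that are not confirmed in this thread: that a mutation seen in several projects (project_count greater than 1) lists them as comma-separated entries inside the single OCCURRENCE value, and that the file may still be gzipped as downloaded. The function name filter_icgc_project is made up for illustration. It keeps every header line (anything starting with "#") and any record whose OCCURRENCE value names the project in any position, not only the first.

```shell
# Hypothetical helper: print header lines plus records for one ICGC project.
# Usage: filter_icgc_project BRCA-EU simple_somatic_mutation.aggregated.vcf.gz > BRCA-EU.vcf
filter_icgc_project() {
    # $1 = project code (e.g. BRCA-EU), $2 = path to a plain or gzipped VCF
    case "$2" in
        *.gz) gzip -cd "$2" ;;   # decompress to stdout, leave the file untouched
        *)    cat "$2" ;;
    esac | awk -v proj="$1" '
        /^#/ { print; next }                      # keep all header lines
        match($0, /OCCURRENCE=[^;]*/) {
            # Value after "OCCURRENCE=" (11 characters), up to the next ";"
            occ = substr($0, RSTART + 11, RLENGTH - 11)
            # Assumption: several projects appear as comma-separated entries
            n = split(occ, entry, ",")
            for (i = 1; i <= n; i++) {
                split(entry[i], field, "|")       # first field is the project code
                if (field[1] == proj) { print; next }
            }
        }'
}
```

Comparing the project code exactly, rather than as a substring, also avoids accidentally matching a project whose code merely starts with the same letters.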

5.9 years ago
dodausp ▴ 190

I am not sure whether this is still a topic people are looking for, or whether the ICGC portal has been updated since the last post in this thread. However, I came across your question, @anshupa.vssut, while looking for exactly this: a way to retrieve data from ICGC.

So, for those who don't mind downloading the data from the portal, it can be done directly from here: https://dcc.icgc.org/releases/release_27/Projects

I found it very useful and straightforward, and I hope it is helpful to others as well. And of course, thank you for raising this topic! (:
