Question

Data retrieval from ICGC

1

7.8 years ago

anshupa.vssut ▴ 50

Is there a command-line tool or R package to efficiently retrieve datasets from ICGC?

icgc r-package • 3.7k views

updated 5.9 years ago by dodausp ▴ 190 • written 7.8 years ago by anshupa.vssut ▴ 50

0

Hello, I want to retrieve simple somatic mutation data in VCF format from ICGC. Did you find a way to retrieve data from ICGC?

7.0 years ago by Vasu ▴ 790

2

See if this thread helps: "How to download a whole ICGC release of processed data?"

7.0 years ago by GenoMax 147k

0

Thank you. I also found this page about the ICGC SSM file: http://icgc-data-parser.readthedocs.io/en/master/icgc-ssm-file.html

7.0 years ago by Vasu ▴ 790

0

I downloaded "simple_somatic_mutation.aggregated.vcf.gz", which contains aggregated information on all simple somatic mutations found across all patients in all cancer projects in ICGC. However, I only need the mutation data for one particular project.

This is how it looks:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       1000000 MU88749506      T       .       .       .       CONSEQUENCE=.;OCCURRENCE=NKTL-SG|23|23|1.00000;affected_donors=23;mutation=T>T;project_count=1;studies=.;tested_donors=12198
1       100000022       MU39532371      C       T       .       .       CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=SKCA-BR|1|80|0.01250;affected_donors=1;mutation=C>T;project_count=1;studies=.;tested_donors=12198
1       100000049       MU87095619      TA      T       .       .       CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=MALY-DE|1|241|0.00415;affected_donors=1;mutation=A>-;project_count=1;studies=.;tested_donors=12198
1       100000110       MU82202760      G       A       .       .       CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=LICA-FR|2|249|0.00803;affected_donors=2;mutation=G>A;project_count=1;studies=.;tested_donors=12198
1       100000128       MU85052896      A       C       .       .       CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=MALY-DE|1|241|0.00415;affected_donors=1;mutation=A>C;project_count=1;studies=.;tested_donors=12198
1       10000015        MU91785757      A       G       .       .       CONSEQUENCE=NMNAT1|ENSG00000173614|+|NMNAT1-001|ENST00000377205||upstream_gene_variant||,LZIC|ENSG00000162441|1|LZIC-005|ENST00000377213||intron_variant||,LZIC|ENSG00000162441|1|LZIC-001|ENST00000377223||intron_variant||,LZIC|ENSG00000162441|1|LZIC-201|ENST00000400903||intron_variant||,NMNAT1|ENSG00000173614|+|NMNAT1-002|ENST00000403197||upstream_gene_variant||,RP11-84A14.4|ENSG00000228150|+|RP11-84A14.4-001|ENST00000445884||upstream_gene_variant||,NMNAT1|ENSG00000173614|+|NMNAT1-005|ENST00000462686||upstream_gene_variant||,LZIC|ENSG00000162441|1|LZIC-004|ENST00000488540||upstream_gene_variant||,NMNAT1|ENSG00000173614|+|NMNAT1-004|ENST00000492735||upstream_gene_variant||,LZIC|ENSG00000162441|1|LZIC-202|ENST00000541052||intron_variant||;OCCURRENCE=BOCA-UK|1|130|0.00769;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198

I only need the records with "OCCURRENCE=BRCA-EU". How can I extract them?

7.0 years ago by Vasu ▴ 790

1

Start with: grep "OCCURRENCE=BRCA-EU" file.vcf
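
Since the download mentioned above is a .vcf.gz file, the same filter can probably be applied without unpacking it on disk first. A minimal sketch, assuming a tiny stand-in file (sample.vcf.gz, with made-up records and a made-up output name) in place of the real download:

```shell
# Build a tiny gzip-compressed stand-in for the real download (hypothetical records).
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > sample.vcf
printf '1\t100\tMU1\tC\tT\t.\t.\tOCCURRENCE=BRCA-EU|1|569|0.00176;project_count=1\n' >> sample.vcf
printf '1\t200\tMU2\tG\tA\t.\t.\tOCCURRENCE=MALY-DE|1|241|0.00415;project_count=1\n' >> sample.vcf
gzip -f sample.vcf   # produces sample.vcf.gz and removes sample.vcf

# Decompress to a stream and filter it; the .gz file on disk is left untouched.
gzip -dc sample.vcf.gz | grep "OCCURRENCE=BRCA-EU" > brca_eu_records.txt
cat brca_eu_records.txt
```

For the real file, only the name after gzip -dc would need to change.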

7.0 years ago by GenoMax 147k

0

I'm not getting the column names when I run it that way. Do I need any specific options to keep the column names?

7.0 years ago by Vasu ▴ 790

1

Try grep -e "#CHROM" -e "OCCURRENCE=BRCA-EU" file.vcf
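
One caveat: the pattern OCCURRENCE=BRCA-EU only matches records where BRCA-EU is the first project listed in that field. The sketch below is one possible variant, under the assumption (not shown in the sample above) that records with project_count greater than 1 list several projects separated by commas inside OCCURRENCE. It also keeps every line starting with "#", so any "##" meta lines survive along with the column names. The file sample.vcf and its records are made up for illustration:

```shell
# Tiny stand-in file; the records and the "##" line are hypothetical.
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\tMU1\tC\tT\t.\t.\tOCCURRENCE=BRCA-EU|1|569|0.00176;project_count=1\n'
  printf '1\t200\tMU2\tG\tA\t.\t.\tOCCURRENCE=MALY-DE|1|241|0.00415,BRCA-EU|2|569|0.00351;project_count=2\n'
  printf '1\t300\tMU3\tA\tC\t.\t.\tOCCURRENCE=LICA-FR|2|249|0.00803;project_count=1\n'
} > sample.vcf

project="BRCA-EU"
# Keep every header line (^#), plus records whose OCCURRENCE field lists the
# project either first or after a comma; the trailing \| avoids matching a
# longer project code that merely starts with the same characters.
grep -E "^#|OCCURRENCE=([^;]*,)?${project}\|" sample.vcf > "${project}.vcf"
cat "${project}.vcf"
```

Here MU1 and MU2 are kept and MU3 is dropped. If the real file never lists more than one project per record, the simpler two-pattern grep gives the same result.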

7.0 years ago by GenoMax 147k

0

This works. Thank you.

7.0 years ago by Vasu ▴ 790

score 2 · Answer 1 · 2018-12-18

I am not sure whether people are still looking for this, or whether the ICGC portal has been updated since the last post in this thread. However, I came across your question, @anshupa.vssut, while looking for exactly this: a way to retrieve data from ICGC.

For those who don't mind downloading the data from the portal, it can be done directly from here: https://dcc.icgc.org/releases/release_27/Projects

I found it very useful and straightforward, and I hope it helps others as well. And of course, thank you for raising this topic! (: