Question

TCGA SNP data

6.3 years ago

Omics data mining ▴ 260

Dear all

In one of my projects, I need to take the SNPs from TCGA (PAAD), convert them to PLINK format, and use them for further analysis. I ran into several issues:

I was not able to capture all the sites while converting .maf to .vcf, even though I used exactly the same genome version as the reference build stated in the .maf file.
I went ahead with the remaining sites, converted .maf to .vcf, and processed the files in PLINK. The problem then was a lot of missing data at the individual level, so I ended up with no usable result.

Queries:

Which data (SNPs or mutations) should I start from to work on the tumour and normal sample profiles in PAAD? Is it the .maf files, the copy number data, or GWAS? I found many data type files at http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/PAAD/20160128/ . Can anyone suggest which to use?

I would appreciate any suggestions.

Thank you

Archana

TCGA maf2vcf SNPs PAAD • 2.0k views

ADD COMMENT • link updated 6.3 years ago by Kevin Blighe 88k • written 6.3 years ago by Omics data mining ▴ 260

score 0 · Answer 1 · 2018-08-03

6.3 years ago

Kevin Blighe 88k

Hello,

You do not have to convert MAF to VCF for the purposes of input to PLINK. The MAF format was an 'unfortunate' development.

Read the PLINK documentation for creating a custom PED and MAP file, and then you will be able to create your PLINK dataset. You already have all of the information that you need in the MAF file.
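As a rough illustration of the custom PED and MAP route, here is a minimal Python sketch. It assumes a GDC-style MAF with the columns `Chromosome`, `Start_Position`, `Variant_Type`, `Reference_Allele`, `Tumor_Seq_Allele1`, `Tumor_Seq_Allele2` and `Tumor_Sample_Barcode`; the function name `maf_to_ped_map` and the `absent_as_ref` switch are hypothetical. A MAF only lists sites where a sample carries a variant, so any site not listed for a sample is written as missing (`0 0`) unless you explicitly choose to assume homozygous reference, which is an assumption and not something the MAF tells you.

```python
import csv


def maf_to_ped_map(maf_text, absent_as_ref=False):
    """Return (ped_rows, map_rows) built from the SNP records of a MAF.

    Hypothetical helper: column names assume a GDC-style MAF.
    """
    # Skip '#' comment lines such as the '#version' header.
    lines = [l for l in maf_text.splitlines() if l and not l.startswith("#")]
    snps = [r for r in csv.DictReader(lines, delimiter="\t")
            if r["Variant_Type"] == "SNP"]

    ref_at = {}     # (chrom, pos) -> reference allele
    calls = {}      # sample -> {(chrom, pos): (allele1, allele2)}
    for r in snps:
        site = (r["Chromosome"], int(r["Start_Position"]))
        ref_at.setdefault(site, r["Reference_Allele"])
        calls.setdefault(r["Tumor_Sample_Barcode"], {})[site] = (
            r["Tumor_Seq_Allele1"], r["Tumor_Seq_Allele2"])

    sites = sorted(ref_at)  # same site order is used for MAP and PED

    # MAP columns: chromosome, variant ID, genetic distance, bp position.
    map_rows = []
    for chrom, pos in sites:
        bare = chrom[3:] if chrom.startswith("chr") else chrom
        map_rows.append([bare, f"{chrom}:{pos}", "0", str(pos)])

    # PED columns: FID, IID, father, mother, sex, phenotype, then allele pairs.
    ped_rows = []
    for sample in sorted(calls):
        row = [sample, sample, "0", "0", "0", "-9"]  # sex/phenotype unknown
        for site in sites:
            if site in calls[sample]:
                row.extend(calls[sample][site])
            elif absent_as_ref:
                row.extend([ref_at[site], ref_at[site]])  # ASSUMPTION: hom-ref
            else:
                row.extend(["0", "0"])  # PLINK missing genotype
        ped_rows.append(row)
    return ped_rows, map_rows
```

Writing each row joined by spaces or tabs to `mydata.ped` and `mydata.map` (file names are placeholders) gives a pair that `plink --file mydata` can read. Sites with more than two alleles across samples would need separate handling.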

Kevin

ADD COMMENT • link 6.3 years ago by Kevin Blighe 88k

Dear Kevin

In one of my earlier projects, I converted VCF to PLINK and performed the downstream analysis, so I thought of following the same strategy for TCGA as well. I will check the "custom PED and MAP" route. Thanks for your suggestion.

Archana

ADD REPLY • link 6.3 years ago by Omics data mining ▴ 260

Here is information on PED and MAP

By the way, if you still want to use MAF -> VCF -> PLINK, then you should create your own custom FAM file, and then specify this in every PLINK command with the --fam flag.

When converting from VCF -> PLINK, there is no way for PLINK to know what your phenotypes are.
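A custom FAM file can be generated from the sample barcodes. The sketch below is only an illustration: it relies on the TCGA barcode convention in which the first two digits of the fourth field give the sample type (01 to 09 tumour, 10 to 19 normal), and it assumes you want tumour coded as case (2) and normal as control (1), which may not suit every analysis. The function name `tcga_fam_line` is hypothetical.

```python
def tcga_fam_line(barcode):
    """Return one FAM line (FID IID father mother sex phenotype) for a TCGA barcode.

    ASSUMPTION: tumour = case (2), normal = control (1), anything else missing (-9).
    """
    fields = barcode.split("-")
    sample_type = int(fields[3][:2])  # e.g. '01C' -> 1
    if 1 <= sample_type <= 9:
        phenotype = "2"   # tumour sample
    elif 10 <= sample_type <= 19:
        phenotype = "1"   # normal sample
    else:
        phenotype = "-9"  # control analyte or unknown
    # The barcode is used as both family and individual ID; sex is left unknown (0).
    return " ".join([barcode, barcode, "0", "0", "0", phenotype])
```

The lines must be written in the same order as the samples appear in the genotype data, since the FAM file is matched to genotypes by position.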

ADD REPLY • link 6.3 years ago by Kevin Blighe 88k

Hello

My problem is that during conversion from MAF to VCF with maf2vcf.pl I am losing many SNP sites, which is not normal. I tried to fix it but did not succeed. I will now follow your suggestion and create custom PED and MAP files. I have already created a .fam file for the dataset.

Thanks again

Archana

ADD REPLY • link 6.3 years ago by Omics data mining ▴ 260

Could you send information on some of the variants that are being filtered out? Also, can you link me to the specific MAF file on the GDC that you are using?
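One possible way to collect the dropped variants is to compare the SNP records in the MAF against the records in the converted VCF. The sketch below is an illustration only: it assumes GDC-style MAF column names, takes `Tumor_Seq_Allele2` as the alternate allele, ignores a `chr` prefix when matching, and is restricted to SNPs because indels are represented differently in the two formats. The function name `missing_snp_sites` is hypothetical.

```python
import csv


def _strip_chr(chrom):
    return chrom[3:] if chrom.startswith("chr") else chrom


def missing_snp_sites(maf_text, vcf_text):
    """Return sorted (chrom, pos, ref, alt) SNP sites in the MAF but not in the VCF."""
    vcf_sites = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip VCF meta and header lines
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic records list several ALTs
            vcf_sites.add((_strip_chr(chrom), int(pos), ref, alt))

    maf_lines = [l for l in maf_text.splitlines() if l and not l.startswith("#")]
    missing = set()
    for r in csv.DictReader(maf_lines, delimiter="\t"):
        if r["Variant_Type"] != "SNP":
            continue
        site = (_strip_chr(r["Chromosome"]), int(r["Start_Position"]),
                r["Reference_Allele"], r["Tumor_Seq_Allele2"])
        if site not in vcf_sites:
            missing.add(site)
    return sorted(missing)
```

Posting a handful of the returned sites, together with the matching MAF rows, would make it easier to see why they are being dropped.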

ADD REPLY • link 6.3 years ago by Kevin Blighe 88k