TCGA SNP data
6.3 years ago

Dear all

In one of my projects, I need to take the SNPs from TCGA (PAAD), convert them into PLINK format, and use them for further analysis. I ran into several issues:

  1. I was not able to capture all of the sites when converting .maf to .vcf, even though I used exactly the same genome build as the reference listed in the .maf file.
  2. I continued with the remaining sites, converted the .maf to .vcf files, and processed them in PLINK. This time the problem was a large amount of missing data at the individual level, and the analysis ended up with no usable result.
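The missingness in point 2 is plausibly a side effect of pooling per-sample mutation calls: a somatic MAF lists only the variants called in each tumour, so most samples have no entry at most of the pooled sites. The minimal sketch below illustrates that effect under one stated assumption (not confirmed in the thread): every site a sample has no call at ends up as a missing genotype after conversion. It computes the per-individual missing fraction, the same quantity PLINK's `--missing` reports in its `.imiss` file. The function name and toy sample IDs are hypothetical.

```python
def per_sample_missing_rate(calls):
    """Fraction of all pooled sites that each sample has no call at.

    calls: dict mapping a sample ID to the set of (chrom, pos) sites reported
    for that sample. ASSUMPTION: a site not reported for a sample becomes a
    missing genotype, not homozygous reference.
    """
    all_sites = set().union(*calls.values())
    n = len(all_sites)
    if n == 0:
        return {sample: 0.0 for sample in calls}
    return {sample: (n - len(sites)) / n for sample, sites in calls.items()}


# Hypothetical toy data: three samples, three distinct sites in total.
toy = {
    "S1": {("1", 100), ("1", 200)},
    "S2": {("1", 100)},
    "S3": {("2", 50)},
}
for sample, rate in sorted(per_sample_missing_rate(toy).items()):
    print(sample, round(rate, 2))
```

With hundreds of tumours that share few recurrent mutations, almost every sample ends up missing almost every site, which would leave little or nothing after PLINK's per-individual missingness filters.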

Queries:

  1. Which data (SNP or mutation) should I start from to work on the tumour and normal sample profiles in PAAD? Is it the .maf, the copy number data, or GWAS data? I found files for many data types here: http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/PAAD/20160128/ Can anyone advise?

I would appreciate any suggestions.

Thank you

Archana

TCGA maf2vcf SNPs PAAD
6.3 years ago

Hello,

You do not have to convert MAF to VCF for the purposes of input to PLINK. The MAF format was an 'unfortunate' development.

Read the PLINK documentation for creating a custom PED and MAP file, and then you will be able to create your PLINK dataset. You already have all of the information that you need in the MAF file.

Kevin
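The MAF-to-PED/MAP route suggested above can be sketched in Python. This is a minimal sketch, not PLINK or GDC tooling, and it rests on several assumptions: a tab-separated GDC-style MAF with the columns `Chromosome`, `Start_Position`, `Reference_Allele`, `Tumor_Seq_Allele2`, `Variant_Type` and `Tumor_Sample_Barcode`; only single-base SNPs are kept; each tumour barcode is one individual (FID = IID); a carrier is coded heterozygous ref/alt; and, most importantly, a sample with no call at a site is coded homozygous reference rather than missing. The function names are hypothetical.

```python
import csv


def read_maf(handle):
    """Yield MAF data rows as dicts, skipping '#' comment lines such as '#version'."""
    lines = (line for line in handle if not line.startswith("#"))
    yield from csv.DictReader(lines, delimiter="\t")


def maf_to_ped_map(maf_rows):
    """Build PLINK 1.x PED and MAP lines from MAF rows (SNPs only).

    ASSUMPTIONS: carrier -> ref/alt; no call at a site -> ref/ref.
    """
    carriers = {}  # (chrom, pos, ref, alt) -> set of samples carrying that SNP
    samples = set()
    for row in maf_rows:
        sample = row["Tumor_Sample_Barcode"]
        samples.add(sample)
        if row["Variant_Type"] != "SNP":
            continue  # PED alleles here are single bases, so indels are skipped
        chrom = row["Chromosome"]
        chrom = chrom[3:] if chrom.startswith("chr") else chrom
        site = (chrom, int(row["Start_Position"]),
                row["Reference_Allele"], row["Tumor_Seq_Allele2"])
        carriers.setdefault(site, set()).add(sample)

    def site_order(site):
        chrom, pos = site[0], site[1]
        # numeric chromosomes first (in numeric order), then X, Y, MT, ...
        chrom_key = (0, int(chrom), "") if chrom.isdigit() else (1, 0, chrom)
        return chrom_key, pos

    sites = sorted(carriers, key=site_order)
    # MAP columns: chromosome, variant ID, genetic distance (0 = unknown), bp position
    map_lines = [f"{c}\t{c}:{p}:{r}:{a}\t0\t{p}" for c, p, r, a in sites]
    ped_lines = []
    for sample in sorted(samples):
        # PED columns: FID, IID, father, mother, sex (0 = unknown), phenotype (-9 = missing)
        fields = [sample, sample, "0", "0", "0", "-9"]
        for site in sites:
            ref, alt = site[2], site[3]
            fields += [ref, alt] if sample in carriers[site] else [ref, ref]
        ped_lines.append(" ".join(fields))
    return ped_lines, map_lines


def write_ped_map(maf_path, out_prefix):
    """Write <out_prefix>.ped and <out_prefix>.map from an uncompressed MAF file."""
    with open(maf_path) as maf:
        ped_lines, map_lines = maf_to_ped_map(read_maf(maf))
    with open(f"{out_prefix}.ped", "w") as ped:
        ped.write("\n".join(ped_lines) + "\n")
    with open(f"{out_prefix}.map", "w") as mp:
        mp.write("\n".join(map_lines) + "\n")
```

Coding absent calls as `0 0` (missing) instead would reproduce the individual-level missingness described in the question, which is why the sketch makes the opposite choice; reference is only a fair guess where the site was actually covered in that sample. The resulting text files could then be converted with something like `plink --file paad --make-bed --out paad`, where `paad` is a hypothetical prefix.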

Dear Kevin

In one of my other projects, I converted VCF to PLINK and performed the downstream analysis, so I thought I would follow the same strategy for TCGA. I will look into the custom PED and MAP approach. Thanks for your suggestion.

Archana

Here is information on PED and MAP

By the way, if you still want to go MAF -> VCF -> PLINK, you should create your own custom FAM file and then specify it in every PLINK command with the --fam flag.

When converting VCF -> PLINK, there is no way for PLINK to know what your phenotypes are.
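A custom FAM file is six whitespace-separated columns per sample: family ID, individual ID, father, mother, sex (1 = male, 2 = female, 0 = unknown) and phenotype (1 = control, 2 = case, -9 = missing). The sketch below builds those lines from TCGA barcodes, using the two-digit sample-type code in the fourth barcode field (01-09 tumour, 10-19 normal in the TCGA barcode convention) as the case/control phenotype. FID = IID = sample ID, the tumour/normal coding and the helper names `tcga_phenotype` and `fam_lines` are assumptions for illustration. The lines must be written in the same order as the samples in the VCF so that they line up with the genotype data.

```python
def tcga_phenotype(barcode):
    """Map a TCGA barcode to a PLINK phenotype code.

    Reads the sample-type code from the fourth field (e.g. '01A' in
    TCGA-XX-XXXX-01A-...): 01-09 tumour -> 2 (case),
    10-19 normal -> 1 (control), anything else -> -9 (missing).
    """
    fields = barcode.split("-")
    if len(fields) < 4 or not fields[3][:2].isdigit():
        return -9
    code = int(fields[3][:2])
    if 1 <= code <= 9:
        return 2
    if 10 <= code <= 19:
        return 1
    return -9


def fam_lines(sample_ids, sex=None):
    """One FAM line per sample, in the order given (must match the VCF/.bed order).

    ASSUMPTION: FID = IID = sample ID; sex defaults to 0 (unknown).
    """
    sex = sex or {}
    return [
        f"{sid} {sid} 0 0 {sex.get(sid, 0)} {tcga_phenotype(sid)}"
        for sid in sample_ids
    ]
```

Writing `"\n".join(fam_lines(ids))` to a file (for example a hypothetical `paad_custom.fam`) gives the file to pass with `--fam`.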

Hello

My problem is that, when converting MAF to VCF with maf2vcf.pl, I lose many SNP sites and their data, which is not normal. I tried to fix it but did not succeed. I will now follow your suggestion and create custom PED and MAP files. I have already created a .fam file for the dataset.

Thanks again

Archana

Could you send information on some of the variants that are being filtered out? Also, can you link me to the specific MAF file on the GDC that you are using?
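To collect the information asked for here, one option is to list the MAF records whose position never appears in the maf2vcf.pl output and tally them by `Variant_Type`. This is a minimal sketch assuming an uncompressed VCF and the GDC MAF columns `Chromosome`, `Start_Position` and `Variant_Type`; the function names are hypothetical, and the rule for non-SNPs (accepting a VCF POS equal to the MAF start or one base before it, to allow for the VCF anchor base) is a loose heuristic, not a specification rule.

```python
import csv


def norm_chrom(chrom):
    """Drop a leading 'chr' so 'chr1' and '1' compare equal."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def vcf_sites(vcf_lines):
    """Collect (chrom, pos) for every data line of a VCF."""
    sites = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information and header lines
        chrom, pos = line.split("\t")[:2]
        sites.add((norm_chrom(chrom), int(pos)))
    return sites


def dropped_maf_rows(maf_lines, vcf_lines):
    """Return MAF rows with no matching (chrom, pos) in the VCF."""
    present = vcf_sites(vcf_lines)
    rows = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")), delimiter="\t"
    )
    dropped = []
    for row in rows:
        chrom = norm_chrom(row["Chromosome"])
        pos = int(row["Start_Position"])
        candidates = {(chrom, pos)}
        if row["Variant_Type"] != "SNP":
            # heuristic: indel POS in VCF may be the anchor base one position earlier
            candidates.add((chrom, pos - 1))
        if not candidates & present:
            dropped.append(row)
    return dropped


def dropped_by_type(dropped):
    """Count dropped rows per Variant_Type (SNP, DEL, INS, ...)."""
    counts = {}
    for row in dropped:
        counts[row["Variant_Type"]] = counts.get(row["Variant_Type"], 0) + 1
    return counts
```

Calling it with open file handles, e.g. `dropped_maf_rows(open(maf_path), open(vcf_path))`, and posting a few of the returned rows together with the per-type counts would show whether the losses are concentrated in one variant class or chromosome naming scheme.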
