6.3 years ago
Omics data mining
Dear all,
In one of my projects, I need to take the SNPs from TCGA (PAAD), convert them to PLINK format, and use them for further analysis. I have run into several issues:
- I was not able to capture all the sites while converting .maf to .vcf, even though I used exactly the same genome version as the reference build stated in the .maf.
- I carried on with the remaining sites, converted the .maf to .vcf files, and processed them in PLINK. There was then a lot of missing data at the individual level, and I ended up with no result.
Queries:
- Which data (SNPs or mutations) should I start from to work on the tumour and normal sample profiles in PAAD? Is it the .maf, the copy number data, or GWAS? I found many data type files at http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/PAAD/20160128/ Can anyone suggest which to use?
I would appreciate any suggestions.
Thank you
Archana
Dear Kevin,
In one of my projects, I converted VCF to PLINK and performed the downstream analysis, so I thought of following the same strategy for TCGA as well. As for the "custom PED and MAP", yes, I will check. Thanks for your suggestion.
Archana
Here is information on PED and MAP
By the way, if you still want to use MAF -> VCF -> PLINK, then you should create your own custom FAM file and specify it in every PLINK command with the --fam flag. When converting from VCF -> PLINK, there is no way for PLINK to know what your phenotypes are.
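As a rough illustration of the custom FAM file, here is a minimal Python sketch that derives the six FAM columns (FID, IID, father, mother, sex, phenotype) from TCGA sample barcodes. It assumes the sample-type code in the fourth barcode field marks tumour (01 to 09) versus normal (10 to 19), and that you want tumour coded as case (2) and normal as control (1). The function names and the choice of patient ID as FID are my own assumptions, not anything PLINK requires, and the rows must be written in the same sample order as your genotype files.

```python
def phenotype_from_barcode(barcode):
    """Return a PLINK phenotype code: 2 = tumour (case), 1 = normal (control), -9 = unknown."""
    parts = barcode.split("-")
    # The fourth field holds the sample-type code plus a vial letter, e.g. "01A".
    if len(parts) < 4 or not parts[3][:2].isdigit():
        return -9
    code = int(parts[3][:2])
    if 1 <= code <= 9:
        return 2
    if 10 <= code <= 19:
        return 1
    return -9


def fam_line(barcode):
    """Build one FAM row: FID IID PID MID SEX PHENO (sex 0 = unknown)."""
    # Assumption: use the patient-level part of the barcode as the family ID.
    patient = "-".join(barcode.split("-")[:3])
    return f"{patient} {barcode} 0 0 0 {phenotype_from_barcode(barcode)}"


def write_fam(barcodes, path):
    """Write one FAM row per barcode, keeping the order given."""
    with open(path, "w") as handle:
        for barcode in barcodes:
            handle.write(fam_line(barcode) + "\n")


# Usage (made-up barcodes, for illustration only):
# write_fam(["TCGA-XX-0001-01A-11D-A000-08",
#            "TCGA-XX-0001-10A-01D-A000-08"], "paad_custom.fam")
```

The IID column has to match the sample IDs that PLINK read from your VCF, so adjust `fam_line` if your VCF uses shortened names.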
Hello
My problem is that during the conversion from MAF to VCF with maf2vcf.pl I am losing many SNP sites, which is not normal. I tried to fix it but did not succeed. I will now follow your suggestion and create a custom PED and MAP. I have already created the .fam for the dataset.
Thanks again
Archana
Could you send information on some of the variants that are being filtered out? Also, can you link me to the specific MAF file on the GDC that you are using?
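If it helps to collect that information, something like the following Python sketch could list the single-nucleotide variants that are in the MAF but not in the converted VCF. It assumes the standard MAF column names (Chromosome, Start_Position, Reference_Allele, Tumor_Seq_Allele2), an uncompressed tab-delimited VCF, and it deliberately skips indels, because MAF and VCF represent those with different coordinates. The function names are made up for this example.

```python
import csv


def strip_chr(chrom):
    """Normalise chromosome names so that 'chr12' and '12' compare equal."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def maf_snv_keys(maf_path):
    """Collect (chrom, pos, ref, alt) for every single-base substitution in a MAF."""
    keys = set()
    with open(maf_path, newline="") as handle:
        # Skip '#' comment lines such as '#version 2.4' before the header row.
        rows = (line for line in handle if not line.startswith("#"))
        for row in csv.DictReader(rows, delimiter="\t"):
            ref, alt = row["Reference_Allele"], row["Tumor_Seq_Allele2"]
            if len(ref) == 1 and len(alt) == 1 and "-" not in (ref, alt):
                keys.add((strip_chr(row["Chromosome"]), int(row["Start_Position"]), ref, alt))
    return keys


def vcf_keys(vcf_path):
    """Collect (chrom, pos, ref, alt) for every record in a plain-text VCF."""
    keys = set()
    with open(vcf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
            for alt in alts.split(","):
                keys.add((strip_chr(chrom), int(pos), ref, alt))
    return keys


def missing_snvs(maf_path, vcf_path):
    """Return the sorted SNVs that are present in the MAF but absent from the VCF."""
    return sorted(maf_snv_keys(maf_path) - vcf_keys(vcf_path))


# Usage (file names are placeholders):
# for variant in missing_snvs("paad.maf", "paad.vcf"):
#     print(*variant, sep="\t")
```

Posting a handful of the rows it reports, together with the MAF file name, should make it easier to see why those sites are being dropped.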