SNP Data Quality control
7.7 years ago
mms140130 ▴ 60

Hi,

I have genotype data for 905,460 SNPs from 1,096 patients. Using the "HardyWeinberg" R package, I first removed SNPs with a minor allele frequency (MAF) below 0.05, which reduced the data to 746,907 SNPs. I then applied the HWExact test with alpha = 0.001, which reduced the data further to 384,660 SNPs.
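For reference, the two filtering steps described above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the "HardyWeinberg" R package itself: the function names (`maf`, `hwe_exact_pvalue`, `qc_filter`) and the toy genotype counts are hypothetical, and the exact test uses one common definition of the p-value (the total probability of all heterozygote counts that are no more likely than the observed one), which may differ from the package defaults.

```python
from math import exp, lgamma, log


def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / n_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact HWE p-value, conditional on the observed allele counts."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n - n_a      # copies of allele B

    def log_prob(n_het):
        # Log-probability of seeing n_het heterozygotes under HWE.
        hom_a = (n_a - n_het) // 2
        hom_b = (n_b - n_het) // 2
        return (n_het * log(2) + lgamma(n + 1)
                - lgamma(hom_a + 1) - lgamma(n_het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Heterozygote counts must share the parity of the allele count.
    probs = {h: exp(log_prob(h)) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    p_obs = probs[n_ab]
    # Small relative tolerance so the observed outcome always counts itself.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def qc_filter(genotype_counts, maf_min=0.05, hwe_alpha=0.001):
    """Keep SNPs with MAF >= maf_min and HWE exact p-value >= hwe_alpha."""
    kept = []
    for snp_id, counts in genotype_counts.items():
        if maf(*counts) < maf_min:
            continue  # dropped by the MAF filter
        if hwe_exact_pvalue(*counts) < hwe_alpha:
            continue  # dropped by the HWE filter
        kept.append(snp_id)
    return kept


# Toy genotype counts (AA, AB, BB), invented purely for illustration.
toy_snps = {
    "rs_demo_1": (25, 50, 25),  # common allele, in HWE
    "rs_demo_2": (50, 0, 50),   # no heterozygotes at all
    "rs_demo_3": (98, 2, 0),    # MAF of 0.01
}
kept = qc_filter(toy_snps)
print("kept:", kept)

# Fraction of SNPs retained at each step, using the counts from this post.
n_start, n_after_maf, n_after_hwe = 905460, 746907, 384660
frac_maf = n_after_maf / n_start
frac_hwe = n_after_hwe / n_after_maf
print(f"retained after MAF filter: {frac_maf:.1%}")
print(f"retained after HWE filter: {frac_hwe:.1%} of the MAF-filtered set")
```

Printing the retained fraction after each step makes it easier to see which filter accounts for most of the loss.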

Is this OK, or was too much data lost?

I'm doing a GWAS analysis. The data was provided by my advisor and concerns breast cancer.

Please help me and recommend what I should do.

Thanks,

SNP R • 2.0k views

Generally, I wouldn't focus too much on the final number of SNPs; in the end, one true positive SNP is worth more than 100 SNPs that still need confirmation. Keep in mind that this is cancer, and random mutations can happen at any stage. Moreover, it makes a difference whether these SNPs came from SNP arrays or from SNP-calling methods. The type of outcome and the appropriate quality control depend heavily on that, and so does how much data you should expect to lose. Also, how many samples did you have, and how many were control vs. tumor? Did you have relatives among your samples? If you are doing a GWAS analysis, you probably know the importance of taking these parameters into account.


As @Hasani pointed out, you shouldn't focus on the final number, because it tells you very little on its own. For example, what happens if you divide it by the number of bases in the organism's genome? Do you get a high mutation rate or one in the expected range?

Other discussion points are:


I recommend choosing a more descriptive title. "help and recommendation please" doesn't tell anyone what your thread is about, and it might just be ignored because it's not very specific.

Also, your post doesn't contain enough information. You didn't state how the data was obtained or what the aim of your analysis is.


Done. I have changed the title and added some information.


I don't think @wouter wanted to know who gave you the data, but rather where it comes from. For example: do you expect a lot of SNPs in a breast cancer sample? (I would, but I'm not a breast cancer expert, so this is just my guess.)


My question also concerned the technology used to obtain the data. In addition, if this is a cancer dataset, you should also know whether these are germline mutations (e.g. from blood) or somatic mutations (from the tumor itself) and, if the latter, what tumor purity you expect.


For GWAS, in my experience, sequencing errors (including systematic ones) matter far less than failing to understand your population structure and the inheritance pattern upfront and taking them into account. If your population consists of, say, "normal mothers" and their "abnormal children" from ten different ethnic groups, and the phenotype is recessive, then running a straightforward PLINK GWAS will find nothing, since the allele is present in most mothers as well. Even if you take this into account but ignore ethnicity, you will most likely miss the right mutation, or its p-value will not be low enough, because in different ethnic groups different mutations, kept stable within the population by some compensatory positive effect, can cause the same phenotype.

Overall, ending up with half as many SNPs because of overly stringent filtering can do more harm than good. I would focus on filtering only after preliminary analyses have produced a few SNPs that can be confirmed or at least explained, and only if my original ratio of SNPs per kilobase is far higher than expected (since this might raise questions during the review of your paper).
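As a rough illustration of the SNPs-per-kilobase ratio mentioned above, here is a small sketch using the SNP counts from the original question. The genome length of about 3.1 billion bases is my assumption for the human genome, and the names `HUMAN_GENOME_BP` and `snps_per_kb` are hypothetical. If the data come from a genotyping array, this number mostly reflects the array design rather than a mutation rate, so treat it as a sanity check only.

```python
# Assumption: approximate length of the human genome in base pairs.
HUMAN_GENOME_BP = 3.1e9


def snps_per_kb(n_snps, genome_bp=HUMAN_GENOME_BP):
    """Average number of SNPs per kilobase across the whole genome."""
    return n_snps / (genome_bp / 1000.0)


# SNP counts reported in the original question at each QC stage.
snp_counts = {
    "before QC": 905460,
    "after MAF filter": 746907,
    "after HWE filter": 384660,
}

density = {stage: snps_per_kb(n) for stage, n in snp_counts.items()}
for stage, value in density.items():
    print(f"{stage}: {value:.3f} SNPs per kb")
```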


Thank you all. The issue is that I'm new to genetics and I'm trying to understand how to analyze such data.


Great. Welcome to bioinformatics! And thank you for an interesting question that might help others too. You can accept one of the answers to show others that your question is solved. You can also bookmark any answer, or your question, to find it in your bookmarks for future reference.


Since all reactions here were posted as comments, none of them can be accepted as the answer that resolves this question. I'm not sure which comment here would be a satisfying answer that could then be moved and accepted...
