I am working with NCBI2R. I have SNPs generated by plink from whole genome sequencing; however, I would like to consider only the SNPs covered by exome sequencing. I would greatly appreciate it if someone could help me.
Two options: One is the answer suggested by Tky, which would be to select only those SNPs that appear in both groups. The downside is that a SNP may appear in one group while no member of the other group carries it, and that is still a relevant comparison.
The other option would be to take the targeted intervals from your exome sequencing and extract from both groups only the SNPs that fall within those regions. There would be several options to do this, and it would all depend on how you currently have your data in terms of format.
I don't know about working with plink-encoded files, but if you have VCFs of your SNP calls it is fairly trivial to extract only the segments of the VCF that fall within defined regions, using a combination of bedtools and perhaps IntervalTree from bx-python. I have code that will do this based on genomic regions of interest, but I have never tried it with an interval for every targeted exon.
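To make the idea concrete, here is a minimal sketch of the region filter in plain Python. It is not the code I mentioned, and it uses a sorted list with `bisect` in place of bedtools or bx-python. The function names (`load_bed`, `in_targets`, `filter_vcf`) are made up for illustration. It assumes a tab-separated BED file (0-based, half-open intervals), an uncompressed VCF (1-based POS), and identical chromosome naming in both files (e.g. "chr1" in both, not "chr1" in one and "1" in the other).

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Build {chrom: (starts, ends)} from BED lines, merging overlapping intervals."""
    raw = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        raw[chrom].append((int(start), int(end)))
    index = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_targets(index, chrom, pos):
    """Return True if the 1-based VCF position lies inside a target interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    zero_based = pos - 1  # convert VCF POS to BED coordinates
    i = bisect.bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


def filter_vcf(vcf_lines, index):
    """Yield header lines plus every record whose POS falls in a target interval."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if in_targets(index, chrom, int(pos)):
            yield line
```

Usage would be along the lines of opening the exome target BED with `load_bed(open(...))` and writing out whatever `filter_vcf(open(...), index)` yields for each group's VCF. The lookup is a binary search per SNP, so a few hundred thousand SNPs against every targeted exon should not be a problem.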
I guess your question is how to filter out common/annotated SNPs. If that is the case, you may use ANNOVAR, check here.
I recommend that you phrase your question more carefully and watch for typos (e.g. bu plink/helo me). In case you don't know, the moderators on our site are keen on closing questions :-)
Thanks. Really sorry for the typos. I am using a new computer and am not yet used to its keyboard. I probably did not describe my question properly. I am trying to compare the SNPs that differ between two groups. Unfortunately, one of the groups comes from whole genome sequencing (group a) and the other from exome sequencing (group b). To make the two groups comparable, I would like to compare only the SNPs that fall within the exome-sequenced regions in both. My question is: is there an easy way to extract the SNPs that fall within the exome-sequenced regions from the whole genome sequencing results (the results from group a)?
Now I am testing plink. I have two folders: one contains all the SNPs from the patients and the other contains those from the control group. I would like to compare these two groups. I tried plink --file mydata --assoc. However, what format should mydata be in?
Thanks a lot! The suggestions are really helpful. "Select only those SNPs that appear in both groups" does not seem to be the best choice, since there should be some SNPs that appear in only one group and not in the other, and those are also interesting for us.
The other option would be to take the targeted intervals from your exome sequencing and extract from both groups only the SNPs that fall within those regions.
This seems to be the best choice. However, do you know of any fast tools that can find which region each known SNP belongs to? I have more than 100 thousand SNPs.
I have VCF files and will read up on bedtools. Thanks a lot.
Probably the quickest way is to write a small script that uses bedtools to parse the appropriate BED file. Use tabix to index your VCF files, then call tabix in a script to retrieve from each VCF file anything that overlaps your chromosomal regions, which will be the targeted exons.
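A rough sketch of the tabix route, driven from Python, might look like the following. The helper names and file paths are placeholders, and it assumes `bgzip` and `tabix` (from htslib) are on your PATH and that the VCF is sorted by chromosome and position. Note that `bgzip` replaces the input file with a `.gz` version, so work on a copy if you want to keep the uncompressed VCF.

```python
import subprocess


def index_commands(vcf_path):
    """Commands to compress a VCF with bgzip and index it with tabix."""
    gz_path = vcf_path + ".gz"
    return [
        ["bgzip", vcf_path],             # produces vcf_path + ".gz"
        ["tabix", "-p", "vcf", gz_path],  # writes the .tbi index next to it
    ]


def extract_command(gz_path, bed_path):
    """Command to print the header (-h) and all records overlapping the BED regions (-R)."""
    return ["tabix", "-h", "-R", bed_path, gz_path]


def extract_targets(vcf_path, bed_path, out_path):
    """Compress, index, and write the records inside the targeted regions to out_path."""
    for cmd in index_commands(vcf_path):
        subprocess.run(cmd, check=True)
    with open(out_path, "w") as out:
        subprocess.run(extract_command(vcf_path + ".gz", bed_path),
                       check=True, stdout=out)
```

You would call something like `extract_targets("group_a.vcf", "exome_targets.bed", "group_a.exome.vcf")` once per VCF, where all three file names are hypothetical.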