5.3 years ago
rebecca08238
Hello! I am working on nephrotic syndrome patients and need to check which SNPs are linked with nephrotic syndrome. The study involves 2 groups: control and nephrotic syndrome.
Work done so far:
We used the DNA AmpliSeq Illumina library kit.
Source: genomic DNA
Let me briefly describe what I did after getting the raw data from the Illumina MiSeq. We used the samtools pipeline:
- Reference genome used: hg19
- 1st step: Mapping was done using BWA-MEM
- 2nd step: Conversion of SAM to BAM (using the fixmate command)
- 3rd step: BAM sorting and BAM indexing
- 4th step: Variant calling using bcftools (v1.9)
- We created 2 group files: group 1 with 30 control individuals and group 2 with 60 syndromic patients. We used the `bcftools isec` command. Is this a correct way to generate the VCF files, or do we have to make a VCF file for each sample of a group and then combine all samples groupwise?
- So now we have two VCF files, group1.vcf and group2.vcf. I used SnpEff to annotate these two group files.
- I observed that some of the SNPs are present in both the control group and the syndromic group. Should I remove the SNPs that are common to both groups using the bedtools subtract command? Please let me know whether I am going about this the right way.
- I also tried creating individual VCF files for all 90 samples (30 control and 60 syndromic), but after that I am stuck. I used the `bcftools isec` command to combine all SNPs of one group into a single VCF file, but it keeps only a few SNPs. Why is that? Can you please tell me which command I should use to combine all SNPs of one group?
- Can you please let me know if you have any idea how we can say that a particular SNP is responsible for a particular disease?
Thank you!
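One possible explanation for the "only a few SNPs" result, offered as an assumption since the exact command is not shown: `bcftools isec` computes intersections and complements of site lists, so if it is asked for sites shared by every input file, only variants called in all samples of a group survive. `bcftools merge`, by contrast, combines files with different samples into one multi-sample file and keeps sites seen in any of them. The toy sketch below (made-up sample names and variants, plain Python sets rather than bcftools itself) shows how quickly a strict intersection shrinks compared with a union.

```python
# Toy illustration only: each sample's calls as a set of CHR:POS:REF:ALT keys.
# Sample names and variants are invented for this example.
sample_calls = {
    "control_01": {"chr1:1000:A:G", "chr1:2000:C:T", "chr2:500:G:A"},
    "control_02": {"chr1:1000:A:G", "chr2:500:G:A", "chr3:42:T:C"},
    "control_03": {"chr1:1000:A:G", "chr1:2000:C:T"},
}


def sites_in_all(calls):
    """Sites present in every sample (what a strict intersection keeps)."""
    return set.intersection(*calls.values())


def sites_in_any(calls):
    """Sites present in at least one sample (what a group-wide merge keeps)."""
    return set.union(*calls.values())


shared = sites_in_all(sample_calls)
combined = sites_in_any(sample_calls)
print(len(shared), "site(s) found in all samples:", sorted(shared))
print(len(combined), "site(s) found in any sample:", sorted(combined))
```

With 30 or 60 samples per group, the intersection can only get smaller as samples are added, which would match the behaviour described in the question.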
Hi rebecca08238, thanks for the details, it is good to provide background to understand the question. I suggest you add the problematic `isec` command and the command for the initial variant calling. The experienced `bcftools` folks will then probably see right away where the issue is.
Why did you use hg19? It's an old, poor genome which has been superseded by both GRCh37 (added baits, better SNV calling) and GRCh38 (better baits/Alts), updated chromosomes. Many groups have seen improved SNV results using more up-to-date genomes. Might be a point for remapping in the future.
More details:
https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use
Yes, I am not sure that `bcftools isec` is the correct choice - each time that I use `bcftools isec`, I am reminded of why I stopped using it previously.
Whether or not to filter out variants also present in controls is your choice. If your hypothesis is that there is a single variant of high penetrance that causes Nephrotic Syndrome, then it may help to filter out any variant found in the controls. If your hypothesis is that the syndrome has complex pathophysiology and complex patterns of inheritance, then variants in controls are to be expected.
Is Nephrotic Syndrome not caused by variants in NPHS1 or NPHS2?
Previously, when I did this, I merged all samples into the same VCF and then imported the VCF to PLINK, where I generated association test results.
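To make "association test" concrete, here is a minimal sketch of one common per-variant case/control test: a two-sided Fisher's exact test on a 2x2 table of allele counts. This is an illustration in plain Python, not PLINK's implementation, and the allele counts in the example are hypothetical (chosen only to match 60 patients and 30 controls, i.e. 120 and 60 alleles).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Here rows are patients/controls and columns are alt/ref allele counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x alt alleles in row 1
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts for one SNP: 30 alt / 90 ref alleles in patients,
# 3 alt / 57 ref alleles in controls.
p_value = fisher_exact_two_sided(30, 90, 3, 57)
print(f"p = {p_value:.3g}")
```

In a real analysis this test would be run once per variant, so the resulting p-values need multiple-testing correction, and with only 90 samples the power to detect modest effects is limited.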
Technically, you can also have 2 VCFs for controls and patients, and then compare these by using a key, like `CHR:POS:REF:ALT`, and using `awk` or even `grep` (or R / Python).
Thank you so much for your reply! Yes, NPHS1 and NPHS2 are responsible for nephrotic syndrome, but there are some other genes that are also involved in nephrotic syndrome.
Which package should we use for SNP analysis in R?
What do you mean? You have not really explained what you want to do. So far, it seems that you just want to filter variants based on their frequencies in controls and patients. You should use other programs like ANNOVAR and Ensembl VEP in order to annotate your variants for other things, like functionality predictors (CADD, FATHMM, PolyPhen-2, etc.).
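For the filtering itself, the `CHR:POS:REF:ALT` key comparison suggested earlier in the thread can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the two small VCF snippets are invented, the helper name `vcf_keys` is hypothetical, and multi-allelic ALT fields are simply split on commas with no further normalisation.

```python
def vcf_keys(lines):
    """Return the set of CHR:POS:REF:ALT keys found in VCF text lines."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # one key per ALT allele
            keys.add(f"{chrom}:{pos}:{ref}:{allele}")
    return keys


# Invented example records standing in for the control and patient files.
controls_vcf = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t1000\t.\tA\tG\t50\tPASS\t.
chr2\t500\t.\tG\tA\t60\tPASS\t.
"""
patients_vcf = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t1000\t.\tA\tG\t55\tPASS\t.
chr1\t2000\t.\tC\tT,G\t70\tPASS\t.
"""

control_keys = vcf_keys(controls_vcf.splitlines())
patient_keys = vcf_keys(patients_vcf.splitlines())

shared = patient_keys & control_keys        # seen in both groups
patient_only = patient_keys - control_keys  # seen only in patients
print("shared:", sorted(shared))
print("patient only:", sorted(patient_only))
```

For real data, the same function can be given an open file handle, for example `vcf_keys(open("group2.vcf"))`. Note that the keys only match reliably if both files describe variants the same way, so indels and multi-allelic sites may need to be normalised first.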
Yes, previously I asked about filtering variants in controls and patients. I used the samtools pipeline (on Linux). Now I want to ask: can we do SNP analysis in RStudio, and if so, which package should we download?