6.0 years ago
Hello, I need help with my bioinformatics project, which is due in 4 days. I'm trying to find the overlapping SNPs between Alzheimer's Disease and sex hormones in R, but I can't seem to find code that does it. Does anyone know how to do this, and what code to use? THANK YOU SO MUCH!!!
Try the findOverlaps() function from the GenomicRanges package in R.
Well, it is a real shame that you have this deadline. However, by providing absolutely no information on the format of your input data, how can any of us help you in any great way? Nicolas has made a reasonable attempt.
Hello, thanks for your response.
I have data from AD patients in .txt format, which includes the Chromosome Number, Position, Marker Name, Affected Allele, Non-Affected Allele, Beta, Standard Error, and Pvalue.
I apologize, I'm just a beginner in R, so I'm not sure what files and other things I would need.
Cool, but what is the part about 'sex hormones'? Do you have genomic co-ordinates (Chromosome, Start position, End position) for genes that encode the sex hormones?
Hello, yes, I now also have the files for testosterone and estradiol, with Chromosome Number, Position, Marker Name, Major Allele, Minor Allele, and Pvalue.
Okay, cool, then GenomicRanges is indeed the correct approach.
Create 2 separate GenomicRanges objects and then overlap them with the findOverlaps() function. If you take a look at the example here, it may help: GRanges : setdiff and keep extra columns
Hello, I've tried doing it, but I don't understand the IRanges part. Is that the position number? My position column only has a single numerical value per SNP.
Thank you so much!
Try to think it through... what do you need to overlap with what? Try to visualise the regions.
If you only have a single co-ordinate, why not use that for both the start and end co-ordinate?
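The advice above (two GenomicRanges objects, with the single position used as both start and end) could be sketched roughly as follows. This is only an illustration under assumptions: the data frames, column names, marker names, and values are invented toy data standing in for the AD and hormone files, and to_granges() is a hypothetical helper, not part of GenomicRanges.

```r
# Toy data only: column names and values are invented for illustration.
library(GenomicRanges)  # Bioconductor package; also provides IRanges()

ad <- data.frame(
  Chromosome = c("1", "2", "19"),
  Position   = c(1000L, 5000L, 9000L),
  MarkerName = c("rsA", "rsB", "rsC"),
  Pvalue     = c(0.01, 0.20, 0.001),
  stringsAsFactors = FALSE
)
hormone <- data.frame(
  Chromosome = c("1", "2", "19"),
  Position   = c(1500L, 5000L, 9000L),
  MarkerName = c("rsD", "rsB", "rsC"),
  Pvalue     = c(0.03, 0.50, 0.04),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a SNP is a single base, so start == end == Position.
to_granges <- function(df) {
  GRanges(
    seqnames   = df$Chromosome,
    ranges     = IRanges(start = df$Position, end = df$Position),
    MarkerName = df$MarkerName,  # extra arguments become metadata columns
    Pvalue     = df$Pvalue
  )
}

ad_gr      <- to_granges(ad)
hormone_gr <- to_granges(hormone)

# Each hit pairs a row of ad_gr (query) with a row of hormone_gr (subject).
hits <- findOverlaps(ad_gr, hormone_gr)

# Pull the matching rows back out of the original data frames.
shared <- data.frame(
  ad[queryHits(hits), c("Chromosome", "Position", "MarkerName")],
  AD_Pvalue      = ad$Pvalue[queryHits(hits)],
  Hormone_Pvalue = hormone$Pvalue[subjectHits(hits)],
  row.names = NULL
)
print(shared)
```

This only matches on chromosome and position, so it is worth checking first that both files use the same chromosome naming (e.g. "1" vs "chr1") and the same genome build.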
Assuming you have variants loaded into a data frame in R in the format Chrom Pos Ref Alt, I typically do this by making a new column in the data frame where I concatenate those fields into a single text string. Then I make a new list out of such columns for all the datasets I am overlapping, and run it through R's VennDiagram package. You can get the exact overlaps with calculate.overlap() I believe, which is also used by the main venn.diagram() function. This assumes that the same SNP in multiple datasets will be represented by the exact same character string of "Chrom Pos Ref Alt" in each dataset. In real life this may not be the case, so you should double-check: it's common for one set or the other to be (or not be) left-normalized, notation may be slightly different if you're looking at .vcf vs. annotation tables, or for anything >1bp, etc.
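A minimal sketch of the concatenated-key idea, using only base R so the two-set overlap can be checked without plotting anything. The data frames, column names, and values are invented toy data, and make_key() is a hypothetical helper. The resulting list is the kind of named list of character vectors one would pass to a Venn diagram function.

```r
# Toy data only: values and column names are invented for illustration.
ad <- data.frame(
  Chrom = c("1", "2", "19"),
  Pos   = c(1000L, 5000L, 9000L),
  Ref   = c("A", "C", "G"),
  Alt   = c("G", "T", "A"),
  stringsAsFactors = FALSE
)
hormone <- data.frame(
  Chrom = c("1", "2", "19"),
  Pos   = c(1500L, 5000L, 9000L),
  Ref   = c("T", "C", "G"),
  Alt   = c("C", "T", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one text key per variant, e.g. "2:5000:C:T".
make_key <- function(df) paste(df$Chrom, df$Pos, df$Ref, df$Alt, sep = ":")
ad$key      <- make_key(ad)
hormone$key <- make_key(hormone)

# One character vector of keys per dataset, collected in a named list.
variant_sets <- list(AD = ad$key, Hormone = hormone$key)

# Keys present in every dataset in the list.
shared_keys <- Reduce(intersect, variant_sets)
print(shared_keys)

# Rows of the AD table that are shared with the hormone table.
shared_rows <- ad[ad$key %in% shared_keys, ]
print(shared_rows)
```

In this toy example the chromosome 19 variant sits at the same position in both tables but has a different Alt allele, so it is not counted as shared. That is the exact-string behaviour described above, and it is why allele notation needs to be consistent between the datasets.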