Getting the intersection of two dbSNP lists
9.9 years ago
devenvyas ▴ 760

I have two lists of rsIDs: one for an old chip I have data from (Illumina Human CNV370), and one for a new chip that I will be running sooner or later (Affymetrix Axiom Human Origins 1). I want to find out how much they overlap.

I was wondering, how could I go about figuring out how many dbSNP rsids are common between the two? I have tried Excel, but that was a dumb idea as it exceeded the amount of memory the program could use. Any suggestions on how to do this? Thanks!

snp • 2.2k views
9.9 years ago

The following will count how many SNPs are in both files some_snps.txt and other_snps.txt:

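$ grep -Fwf some_snps.txt other_snps.txt | wc -l

The -F flag treats each SNP ID as a fixed string rather than a regular expression, and -w restricts hits to whole-word matches, so rs123 will not match rs1234. If both files hold exactly one ID per line, -x (whole-line match) can be used in place of -w.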


Thanks! I'll try it out sometime tomorrow (when I have access to a terminal). So to be clear, this will give me the number of rsIDs that are found in both lists (so the rsIDs found in only one list will not be counted), right?

Is there any way I can get an output listing specifically which rsIDs were found in both of the lists?


It is worth reading the manual for grep (use 'man grep'). It's quite long, so I'll summarize. grep -f searches the given file, other_snps.txt, for every entry in some_snps.txt and reports the matching lines. Piping to wc -l (word count, with a lowercase L flag) counts the lines reported. These nuances are important: if your search list has "rs123", a plain grep will also find it in the lines "rs1234" and "rs123100", seriously inflating the results. There's another flag you can give grep, capital F, which makes it treat each entry as a fixed string rather than a regular expression. On its own, -F still allows partial matches, so combine it with -w (whole word) or -x (whole line) to match exactly what was given in the input list.
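
The partial-match problem is easy to reproduce on two tiny made-up files. This is only a sketch: the directory name grep_demo and the rsIDs are invented for illustration, and it assumes one rsID per line. It uses grep -c, which counts matching lines much like piping to wc -l.

```shell
# Scratch directory with two small, invented lists (one rsID per line).
mkdir -p grep_demo
printf 'rs123\nrs999\n' > grep_demo/some_snps.txt
printf 'rs1234\nrs123100\nrs123\nrs555\n' > grep_demo/other_snps.txt

# -F only: IDs are fixed strings, but rs123 still matches inside longer IDs.
loose=$(grep -c -F -f grep_demo/some_snps.txt grep_demo/other_snps.txt)

# -F plus -x: a line counts only if it equals one of the IDs exactly.
exact=$(grep -c -F -x -f grep_demo/some_snps.txt grep_demo/other_snps.txt)

echo "substring matches: $loose"
echo "exact matches: $exact"
```

With these inputs the first count includes rs1234 and rs123100, while the second counts only the genuine rs123 hit.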


If you want the list, take out the | wc -l portion that counts the number of lines coming out of the grep statement. And thanks to Karl for pointing out the missing flag in my answer.
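
As a rough sketch of that, the matching lines can be redirected to a file and counted afterwards, so the list and the count come from one grep run. The directory shared_demo, the output name shared_snps.txt and the rsIDs are made up for illustration, and one rsID per line is assumed (hence -x).

```shell
# Two small invented lists in a scratch directory.
mkdir -p shared_demo
printf 'rs10\nrs20\nrs30\n' > shared_demo/some_snps.txt
printf 'rs20\nrs30\nrs40\n' > shared_demo/other_snps.txt

# Without "| wc -l", grep prints the matching lines themselves.
# They come from other_snps.txt, in that file's order.
grep -F -x -f shared_demo/some_snps.txt shared_demo/other_snps.txt > shared_demo/shared_snps.txt

# Show the shared IDs, then count them from the saved file.
cat shared_demo/shared_snps.txt
wc -l < shared_demo/shared_snps.txt
```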

9.9 years ago
Floris Brenk ★ 1.0k

Another option is comm (when using a Linux terminal). comm expects sorted input, so first sort both lists in the terminal:

sort listA.txt > sorted_listA.txt
sort listB.txt > sorted_listB.txt

then

comm -12 sorted_listA.txt sorted_listB.txt > common_snps.txt
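
Here is a small worked run of the same steps, as a sketch under a few assumptions: the directory comm_demo and the rsIDs are invented, sort -u is used so that a duplicated ID within one list is only counted once, and LC_ALL=C is set so that sort and comm agree on the ordering.

```shell
# Two small invented lists; listA contains a duplicate on purpose.
mkdir -p comm_demo
printf 'rs300\nrs100\nrs200\nrs100\n' > comm_demo/listA.txt
printf 'rs200\nrs400\nrs100\n' > comm_demo/listB.txt

# Sort each list and drop duplicate IDs (-u), using plain byte order.
LC_ALL=C sort -u comm_demo/listA.txt > comm_demo/sorted_listA.txt
LC_ALL=C sort -u comm_demo/listB.txt > comm_demo/sorted_listB.txt

# -12 suppresses column 1 (only in A) and column 2 (only in B),
# leaving just the lines present in both files.
LC_ALL=C comm -12 comm_demo/sorted_listA.txt comm_demo/sorted_listB.txt > comm_demo/common_snps.txt

# Show the shared IDs, then count them.
cat comm_demo/common_snps.txt
wc -l < comm_demo/common_snps.txt
```

Because comm compares whole lines, rs100 here cannot be confused with a longer ID such as rs1000.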