How To Find The Snps Which Are Merged More Than Once In The Past?
2
1
Entering edit mode
13.4 years ago
Edi ▴ 120

Hi,

I downloaded the RsMergeArch from the NCBI. I want to find out the SNPs which are merged more than once out of all SNPs. I wrote the below SQL query:

insert into rsmergearch_filtered
SELECT rsmergearch.rsHigh,rsmergearch.rsLow
from rsmergearch
where rsmergearch.rsLow
in (select rsmergearch.rsHigh from rsmergearch);

But this didn't give the complete list of SNPs which are merged more than once where I double checked later. Please advice me on this regard.

Thanking in advance.

snp merge • 2.2k views
ADD COMMENT
2
Entering edit mode
13.4 years ago

to get the rsLow merged more than once:

create temporary table snp2(rsLow int PRIMARY KEY);

insert into snp2(rsLow)
select
 rsLow
from
 rsmergearch
group by
 rsLow
having
 count(*)>1

then use this table to find the rsHigh:

   select rsmergearch.* from snp2,rsmergearch where rsmergearch.rsLow=snp2.rsLow;

EDIT:

or using unix cmd line:

sort  -t '  '-k2,2 RsMergeArch.bcp > sortLow.txt
cut -d '    ' -f 2 sortLow.txt | uniq -d > twice.txt
join  -t '  ' -1 1 -2 2 twice.txt sortLow.txt
ADD COMMENT
0
Entering edit mode

Thank you for the reply. Now I got the correct output.

ADD REPLY
2
Entering edit mode
13.4 years ago

I'm not an expert on SQL queries, so I will suggest an alternative based on command line awk scripting.

if you take a look to the RsMergeArch table description you will find that apart of the rsHigh (the one that disappear, column 1) and rsLow (the one that stays, column 2) columns there's another one where the current rsId is reported (rsCurrent, column 7). therefore looking for SNPs that have been merged more than once is equivalent to look through the table for SNPs whose rsLow id do not correspond to the rsCurrent id.

having said that, and considering that you downloaded RsMergeArch.bcp.gz file, you can run the following line in order to retrieve those SNPs:

zcat RsMergeArch.bcp.gz | awk -F"\t" '$2!=$7 {print $1}' > multimerged.txt

if what you need is just the count of those SNPs, you could simply run this faster one:

zcat RsMergeArch.bcp.gz | awk -F"\t" '$2!=$7 {cnt++} END {print cnt}'

a simple "cat multimerged.txt | wc -l" after running the first command will report a number of 110214 SNPs that were merged more than once. of course this the same count that the second command would report.

ADD COMMENT
0
Entering edit mode

a simple "cat multimerged.txt | wc -l" after running the first command will report a number of 110214 SNPs that were merged more than once. of course this the same count that the second command would report.

ADD REPLY

Login before adding your answer.

Traffic: 2197 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6