check duplicates in two columns
9.4 years ago

Hi all!

I have two different files: a .map file (from Illumina BeadChip genotyping) and a .vcf file (from NGS of a pool of individuals). I want to find the variants that are present in both files, so I need to match column 1 (#CHROM) and column 4 (POS) of the .map file against column 1 (#CHROM) and column 2 (POS) of the .vcf file. I tried using awk, but without success. Any suggestions would be much appreciated.

Greetings

Marco

ChIP-Seq sequencing SNP • 2.2k views

On stackoverflow.com you will find "thousands" of questions related to this issue.

Yup, that's true, but not thousands :-P.

You win. Technically.

Can you post your awk command?

Thank you for the answers. My awk command is:

awk -F'\t' 'NR==FNR{c[$1$2]++;next};c[$1$4] > 0' file.vcf file.map

where $1 and $2 are #CHROM and POS in the .vcf file, and $1 and $4 are #CHROM and POS in the .map file.
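A few things in a command like this can go wrong, offered here as guesses rather than a diagnosis of these particular files. Joining the key as `$1$2` is ambiguous (chromosome 1 at position 23 and chromosome 12 at position 3 both become `123`), `-F'\t'` fails if the .map file is space-delimited, and the two files may name chromosomes differently (`chr1` in one, `1` in the other). The sketch below is one way to guard against all three. It assumes a PLINK-style .map layout (chromosome, marker ID, genetic distance, position), assumes that stripping a leading `chr` is enough to reconcile the chromosome names, and uses small made-up input files purely for illustration.

```shell
#!/bin/sh
# Toy inputs with invented values, for illustration only.
tmpdir=$(mktemp -d)

# Minimal VCF: header lines start with '#', data lines are CHROM POS ID REF ALT.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t250\t.\tC\tT\nchr12\t3\t.\tG\tA\n' > "$tmpdir/file.vcf"

# PLINK-style .map (assumed layout): chromosome, marker ID, genetic distance, position.
printf '1\tsnpA\t0\t100\n1\tsnpB\t0\t23\n2\tsnpC\t0\t250\n12\tsnpD\t0\t3\n' > "$tmpdir/file.map"

common=$(awk '
    { sub(/\r$/, "") }                        # drop a trailing carriage return, if any
    /^#/ { next }                             # skip VCF header lines
    { chrom = $1; sub(/^chr/, "", chrom) }    # assumption: names differ only by a "chr" prefix
    NR == FNR { seen[chrom, $2] = 1; next }   # first file (.vcf): remember each CHROM,POS pair
    (chrom, $4) in seen                       # second file (.map): print lines whose pair was seen
' "$tmpdir/file.vcf" "$tmpdir/file.map")

printf '%s\n' "$common"
rm -rf "$tmpdir"
```

With these toy files, only the .map lines for snpA (1:100) and snpD (12:3) are printed. snpB at 1:23 is correctly left out, even though the concatenated key `123` would have matched the VCF record at chr12:3. Using `seen[chrom, $2]` keeps the two fields apart with awk's built-in subscript separator, and leaving the field separator at its default lets the same command read tab- or space-delimited input.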
