Merge two SNP vcf files
6.1 years ago
fufuyou

file1:

#CHROM  POS     ID      REF_Zv  ALT_lm
chr1A   219620  .       T       A
chr1A   219648  .       A       G
chr1A   219867  .       A       G

file2:

#CHROM  POS     ID      REF_Zv  ALT_RV
chr1A   219457  .       C       T
chr1A   219670  .       A       G
chr1A   219867  .       A       C

file3 (desired output):

#CHROM  POS     ID      REF_Zv  ALT_lm  ALT_RV
chr1A   219620  .       T       A       NA
chr1A   219648  .       A       G       NA
chr1A   219867  .       A       G       C
chr1A   219457  .       C       NA      T
chr1A   219670  .       A       NA      G

My command is

awk 'FNR==NR{a[$1,$2];next} {if(a[$1,$2]==""){a[$1,$2]=0};print $1,$2,$3,$4,$5, a[$4,$5]} ' file1 file2 > file3

However, this does not produce the file3 I want. Could you help me improve the command? Thanks, Fuyou

SNP vcf awk

Did you try VCFtools? It has a merge option: http://vcftools.sourceforge.net/perl_module.html


But my files are not in standard VCF format, so vcf-merge does not work.


@Kevin has examples of the right tool for this in: Merging vcf files (intersection and union)


My SNP vcf files do not have the other columns, such as "GT". Thanks, Fuyou


Then those are not VCF files, and you make things harder by not using standardised file formats.

6.1 years ago
zx8754

Using merge() in R:

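# example files; for real data use e.g. read.table("file1", header = TRUE, ...)
# comment.char = "" is needed so the "#CHROM" header line is not skipped
file1 <- read.table(text = "#CHROM  POS     ID      REF_Zv  ALT_lm
chr1A   219620  .       T       A
chr1A   219648  .       A       G
chr1A   219867  .       A       G", header = TRUE, stringsAsFactors = FALSE,
                    comment.char = "")
file2 <- read.table(text = "#CHROM  POS     ID      REF_Zv  ALT_RV
chr1A   219457  .       C       T
chr1A   219670  .       A       G
chr1A   219867  .       A       C", header = TRUE, stringsAsFactors = FALSE,
                    comment.char = "")

# full outer join (all = TRUE) on the shared key columns; "#CHROM" becomes
# "X.CHROM" because read.table() makes syntactic column names
merge(file1, file2, by = c("X.CHROM", "POS", "ID", "REF_Zv"), all = TRUE)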
#   X.CHROM    POS ID REF_Zv ALT_lm ALT_RV
# 1   chr1A 219457  .      C   <NA>      T
# 2   chr1A 219620  .      T      A   <NA>
# 3   chr1A 219648  .      A      G   <NA>
# 4   chr1A 219670  .      A   <NA>      G
# 5   chr1A 219867  .      A      G      C
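For comparison, here is an awk sketch closer to the question's own command. The original one-liner has three problems: a[$1,$2] only records which positions occur in file1 and never stores their ALT allele, a[$4,$5] then looks up the REF/ALT values instead of the position, and only file2 lines are ever printed, so rows unique to file1 are lost. This version keys on the same four columns as the R merge (CHROM, POS, ID, REF), fills gaps with NA and writes a tab-separated file3. It assumes whitespace-separated input with a single "#CHROM" header line per file and a non-empty file2 (the FNR == NR test relies on that); the example files are recreated from the tables above.

```shell
#!/bin/sh
# Example input (the tables from the question); replace with your real files.
printf '%s\n' \
  '#CHROM  POS     ID      REF_Zv  ALT_lm' \
  'chr1A   219620  .       T       A' \
  'chr1A   219648  .       A       G' \
  'chr1A   219867  .       A       G' > file1
printf '%s\n' \
  '#CHROM  POS     ID      REF_Zv  ALT_RV' \
  'chr1A   219457  .       C       T' \
  'chr1A   219670  .       A       G' \
  'chr1A   219867  .       A       C' > file2

# Full outer join on CHROM, POS, ID and REF (columns 1-4).
# file2 is read first as a lookup table; file1 is then streamed in its own
# order, and rows found only in file2 are appended at the end.
awk '
BEGIN { OFS = "\t" }
FNR == NR {                                   # pass 1: file2
    if ($1 ~ /^#/) { alt2_name = $5; next }   # header: remember "ALT_RV"
    key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4
    alt2[key] = $5
    n++; order[n] = key; row[n] = $1 OFS $2 OFS $3 OFS $4
    next
}
$1 ~ /^#/ {                                   # pass 2: file1 header
    print $1, $2, $3, $4, $5, alt2_name
    next
}
{                                             # pass 2: file1 data rows
    key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4
    print $1, $2, $3, $4, $5, ((key in alt2) ? alt2[key] : "NA")
    used[key] = 1
}
END {                                         # rows present only in file2
    for (i = 1; i <= n; i++)
        if (!(order[i] in used))
            print row[i], "NA", alt2[order[i]]
}' file2 file1 > file3

cat file3
```

Rows keep the file1 order, followed by the file2-only rows in file2 order, which matches the layout of the desired file3 rather than the sorted R output. Because REF is part of the key, a position where the two files disagree on REF ends up as two separate rows, as it would with merge().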