Hi,
I am working with two very large VCF files (over 10 GB each, so copy-pasting is not feasible, and I want to include the step in a script for future studies) and need to replace the values in the "ID" column of one of them so that the IDs match for merging. First, I removed all rows containing ## to get a simple matrix with no information lines. When I try replacing the column using awk, after first converting the VCF to a .txt file (
awk 'FNR==NR{a[NR]=$3;next}{$3=a[FNR]}1' file2.txt file1.txt > output.txt
and then converting back to a vcf), it does not work. When I remove the first 3 columns of the vcf and convert to a .txt and try using a simple
paste file2.txt file1.txt > output.txt
(where file2.txt is the CHROM, POS and new ID columns) and converting back to a vcf, the contents are not put in the same row, but rather one row after the other. So, I tried the following command afterwards to try to merge every other row together, but it is not working either (
awk '{getline b;printf("%s %s\n",$0,b)}' output.txt > final.txt
). Any help would be appreciated.
Unfortunately, this still gives me the same problem: the contents of the two files end up on different lines rather than together on the same line. Thanks, though.
uhh ??? .....
Yeah, I can't figure out why it's doing that. I ended up doing it in R; it took forever, but it worked.
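For anyone landing here with the same symptom, two things may be worth ruling out, although nothing in this thread confirms either as the cause. First, if one of the files has Windows-style line endings, the stray carriage return at the end of each line can make pasted rows look as if they sit on separate lines. Second, assigning to $3 in awk without setting the output separator rebuilds each row with spaces instead of tabs, which is no longer valid VCF. Below is a minimal sketch that handles both. The function name replace_vcf_ids is made up for illustration, and it assumes the two files list the same variants in the same order, with the new ID in column 3 of the first file.

```shell
# Hypothetical helper (not from the thread): replace the ID column (column 3)
# of a VCF with column 3 of a second tab-delimited file, matched by row order.
# Assumption: both files contain the same variants in the same order.
replace_vcf_ids() {
    ids_file=$1   # e.g. file2.txt: CHROM, POS, new ID
    vcf_file=$2   # VCF; ## meta lines and the #CHROM header may stay in place
    awk 'BEGIN { FS = OFS = "\t" }            # read and write tab-delimited rows
         { sub(/\r$/, "") }                   # drop a trailing carriage return, if any
         FNR == NR {                          # first file: collect the new IDs
             if ($0 !~ /^#/) id[++m] = $3     # skip header lines starting with #
             next
         }
         /^#/ { print; next }                 # VCF header lines pass through unchanged
         { $3 = id[++n]; print }              # data rows: swap in the n-th new ID
    ' "$ids_file" "$vcf_file"
}
```

Usage would look like replace_vcf_ids file2.txt file1.vcf > output.vcf, with no need to strip the header or convert to .txt first. Note that it holds all new IDs in memory, just as the original one-liner does, so for very large files it is worth checking that the row counts of the two inputs match before trusting the output.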