Hello,
My final goal is to replace, for each line of the GQPDOMB_impute_copie.vcf file, the INFO column (column 8) with the contents of columns 2 and 3 of the matching line in the formating.txt file.
Here is my idea:
For each line of the GQPDOMB_impute_copie.vcf file
do
the variable rs takes the rsID of the current line (column 3 of the GQPDOMB_impute_copie.vcf file)
the variable VAR1 holds the line of formating.txt that contains the value of rs, if there is one
if VAR1 is not empty (the rsID of the current line has been found in formating.txt)
then
the variable ra takes the contents of columns 2 and 3 of that line of formating.txt
the content of column 8 of the current line is replaced by the content of ra
fi
done
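The steps above could also be done in a single awk pass instead of a shell loop. This is only a sketch under some assumptions: the VCF is tab-separated, formating.txt is semicolon-separated with the rsID in its first field, and `annotate_info` is a made-up function name, not an existing tool.

```bash
#!/bin/bash
# Sketch: replace INFO (column 8) of a VCF with columns 2 and 3 of a lookup file.
# annotate_info is a hypothetical helper; $1 = lookup file, $2 = VCF file.
annotate_info() {
    awk -F '\t' -v OFS='\t' '
        # First file: semicolon-separated lookup, stored by rsID.
        NR == FNR { split($0, f, ";"); info[f[1]] = f[2] ";" f[3]; next }
        # Second file: header lines (starting with #) are printed unchanged.
        /^#/ { print; next }
        # Data lines: overwrite column 8 only when the rsID was found.
        { if ($3 in info) $8 = info[$3]; print }
    ' "$1" "$2"
}
```

It would be called as `annotate_info formating.txt GQPDOMB_impute_copie.vcf > annotated.vcf`, where annotated.vcf is just an example output name. The lookup file is loaded into memory once, so each VCF line costs one array lookup rather than one grep over the whole lookup file.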
GQPDOMB_impute_copie.vcf:
##fileformat=VCFv4.3
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AERMQNK-paris-8400326-P6-recipient_AERMQNK-paris-8400326-P6-recipient ....
1 783071 rs142849724 C T . PASS TYPED;RefPanelAF=0.018571;AN=80;AC=5;INFO=1 GT 0|0 0|0 1|0 0|0 1|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 0|0 0|1 0|0 0|0 0|0
1 783186 rs141989890 G C . PASS RefPanelAF=0.000323375;AN=80;AC=0;INFO=1 GT 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
1 783632 rs193023236 G A . PASS RefPanelAF=0.00040037;AN=80;AC=0;INFO=1 GT 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
formating.txt:
rs142849724;ENSG00000228794;ENST00000624927|ENST00000623808|ENST00000445118|ENST00000448975|ENST00000610067|ENST00000608189|ENST00000609139|ENST00000449005|ENST00000416570|ENST00000623070|ENST00000609009|ENST00000622921
rs141989890;ENSG00000228794;ENST00000624927|ENST00000623808|ENST00000445118|ENST00000448975|ENST00000610067|ENST00000608189|ENST00000609139|ENST00000449005|ENST00000416570|ENST00000623070|ENST00000609009|ENST00000622921
rs193023236;ENSG00000228794;ENST00000624927|ENST00000623808|ENST00000445118|ENST00000448975|ENST00000610067|ENST00000608189|ENST00000609139|ENST00000449005|ENST00000416570|ENST00000623070|ENST00000609009|ENST00000622921
After a lot of research on the internet, here is the code I came up with:
#!/bin/bash
while read line
do
rs=$(awk -F '\t' '{print $3}' GQPDOMB_impute_copie.vcf) # retrieve the rsID
VAR1=$(grep "${rs}" formating.txt) # check whether the rsID of the current line is found in formating.txt
if [ -n "$VAR1" ] ; # if the rsID of the current line has been found
then
ra=$(grep "${rs}" formating.txt | awk -F ';' '{print $2,";",$3}') # retrieve the contents of columns 2 and 3 of formating.txt into a single variable
awk -F '\t' -v t="\"$ra\"" '{$8=t; print }' OFS='\t' GQPDOMB_impute_copie.vcf # replace the content of column 8 (INFO) with the content of the previous variable
fi
done < GQPDOMB_impute_copie.vcf
However, I think the program does not read the VCF file line by line and fails to create the variable VAR1. Here is the error it returned:
./script-info.sh: line 16: /usr/bin/grep: Argument list too long
./script-info.sh: line 16: /usr/bin/grep: Argument list too long
./script-info.sh: line 16: /usr/bin/grep: Argument list too long
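If I read the error correctly, a likely cause is that the awk call inside the loop reads the whole VCF file instead of the current `$line`, so rs holds every rsID at once and that string is too long to pass to grep as one argument. A minimal sketch of a loop that splits the line `read` has already fetched is shown below. It assumes tab-separated fields and rsIDs without regex metacharacters, and `lookup_per_line` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: for each data line of a VCF, print "rsID<TAB>columns 2;3"
# when the rsID is found in the lookup file. $1 = lookup file, $2 = VCF file.
lookup_per_line() {
    local chrom pos rs rest hit
    # read splits the current line on tabs; rs receives column 3 of that line only.
    while IFS=$'\t' read -r chrom pos rs rest
    do
        # Skip header lines, which start with '#'.
        case $chrom in '#'*) continue ;; esac
        # Anchor on "rsID;" so that rs1 does not also match rs12.
        hit=$(grep -m1 "^${rs};" "$1" || true)
        if [ -n "$hit" ]; then
            # Strip the leading "rsID;" to keep columns 2 and 3.
            printf '%s\t%s\n' "$rs" "${hit#*;}"
        fi
    done < "$2"
}
```

This only shows the per-line lookup, not the replacement of column 8, and it still starts one grep per VCF line, so it may be slow on a large file.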
How can I get this script to work, and, if possible, make it as efficient as possible?
Thank you in advance.