Dear ALL
I have about 200 tab delimited and in first step I want merge them to have a comprehensive file, then want to summarize file to have another out put, so in summarize file just keep unique ID in 1st ,2nd and 3rd columns and for pos1 and pos2 add the number of number of lines with repetitive ID after dash symbol respectively but for other columns convert tab to / in each line and add the data of lines with repetitive ID after dash symbol respectively. I would greatly appreciate it if you kindly give me some feedback on this.
Thanks
annoPos gene chr pos1 pos2 ref alt genotype quality MQ ref-depth alt-depth
UTR3 Eucgr.A00214.1.v2.0,Eucgr.A00214.2.v2.0 Chr01 82411 82411 G T hom 84.5 60 0 4
UTR3 Eucgr.A00214.1.v2.0,Eucgr.A00214.2.v2.0 Chr01 82437 82437 A C hom 87.5 60 0 4
exonic Eucgr.A00214.1.v2.0,Eucgr.A00214.2.v2.0 Chr01 84195 84195 C T hom 183 60 0 17
exonic Eucgr.A00214.1.v2.0,Eucgr.A00214.2.v2.0 Chr01 84209 84209 G C hom 182 60 0 17
exonic Eucgr.A00214.1.v2.0 Chr01 84631 84631 C T het 118 60 5 7
intronic Eucgr.A00214.1.v2.0,Eucgr.A00214.2.v2.0 Chr01 84718 84718 C T het 80 60 7 5
intronic Eucgr.A00214.1.v2.0,Eucgr.A00214.2.v2.0 Chr01 84754 84754 C T het 85 60 7 5
intronic Eucgr.A00214.1.v2.0,Eucgr.A00214.2.v2.0 Chr01 89492 89492 C T het 78 60 2 4
convert to this :
annoPos gene chr pos1 pos2 ref alt genotype quality MQ ref-depth alt-depth
UTR3 Eucgr.A00214.1.v2.0,Eucgr.A00214.2.v2.0 Chr01 82411-82437 82411-82437 G/T-A/C hom-hom 0-0 4-4
exonic Eucgr.A00214.1.v2.0,Eucgr.A00214.2.v2.0 Chr01 84195-84209-84631 84195-84209-84631 C/T-G/C-C/T hom-hom-het 0-0-5 17-17-7
intronic Eucgr.A00214.1.v2.0,Eucgr.A00214.2.v2.0 Chr01 84718-84754-89492 84718-84754-89492 C/T-C/T-C/T het-het-het 7-7-2 5-5-4
alleles are G/T-A/C but for this command "awk -v OFS="\t" '{print $1,$2,$3,$4,$5,$6"/"$7,$8,$11,$12}' test.txt | datamash -g1 unique 2 unique 3 collapse 4-9 | awk -v OFS="\t" '{gsub(",","-");gsub("-",",",$2)}1' "
I got error : datamash: -H or --header-in must be used with named columns
@sam: If your data is same as the text posted in OP, it should not give any problems. Try the code on text furnished in OP. This is because no where in furnished code, datamash uses header information. I re ran the code no issues with furnished data in OP and output. Check your data again. Btw, datamash version is 1.3.