Hello all,
I am at a loss here.
I am making a bash script as generic as possible and I want to replace one column in a csv file (test1.csv
) with a column from another csv file (test4b.csv
).
I know how to do this using the number of column in awk, however, I am trying to generate a very generic script as these files may present a variable number of fields while the header of each field will stay unvaried. I want to do this using the header of the field.
I have asked on forums and all I got was partial replies (with no explaination) and rude comments telling me to LEARN first and ask question after (?).
Well, I am learning... I am a biologist with no background in coding and I have not found any documentation to help me with this.
Putting together some suggestions from forums, this is what I have so far but it is not work.
awk -v col="index2" -v rc="2xedni" '
BEGIN{
FS=OFS=","
}
NR==1{
for(i=1;i<=NF;i+=1){
columns[$i]=i
};
print
}
NR>1{
$(columns[col])=rc;
print
}
NR>1' test1.csv test4b.csv > test5.csv
cat test1.csv
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
1,O_bru,1,A01,UDI_55,GACAGTAA,UDI_55,GTAATCGC,,
2,A_lych,1,B01,UDI_56,CCTTCGCA,UDI_56,AAGTGATG,,
3,A_litu,1,C01,UDI_57,CATGATCG,UDI_57,GGAGAGTA,,
cat test4b.csv
GCGATTAC
CATCACTT
TACTCTCC
The desired output is
cat test5.csv
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
1,O_bru,1,A01,UDI_55,GACAGTAA,UDI_55,GCGATTAC,,
2,A_lych,1,B01,UDI_56,CCTTCGCA,UDI_56,CATCACTT,,
3,A_litu,1,C01,UDI_57,CATGATCG,UDI_57,TACTCTCC,,
The output I get with the line of code above
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
1,O_bru,1,A01,UDI_55,GACAGTAA,UDI_55,1,,
2,A_lych,1,B01,UDI_56,CCTTCGCA,UDI_56,2,,
3,A_litu,1,C01,UDI_57,CATGATCG,UDI_57,3,,
4,A_trag,1,D01,UDI_58,TCCTTGGT,UDI_58,4,,
5,C_trap,1,E01,UDI_59,GTCATCTA,UDI_59,5,,
6,P_pen,1,F01,UDI_60,GAACCTAG,UDI_60,6,,
7,A_aura,1,G01,UDI_61,CAGCAAGG,UDI_61,7,,
8,O_goth,1,H01,UDI_62,CGTTACCA,UDI_62,8,,
9,C_pasi,1,A02,UDI_63,TCCAGCAA,UDI_63,9,,
10,A_hera,1,B02,UDI_64,CAGGAGCC,UDI_64,10,,
11,I_dimi,1,C02,UDI_65,TTACGCAC,UDI_65,11,,
12,I_fusc,1,D02,UDI_66,AGGTTATC,UDI_66,12,,
13,M_tith,1,E02,UDI_67,TCGCCTTG,UDI_67,13,,
14,H_post,1,F02,UDI_68,CCAGAGCT,UDI_68,14,,
15,D_punc,1,G02,UDI_69,TACTTAGC,UDI_69,15,,
16,D_nigr,1,H02,UDI_70,GTCTGATG,UDI_70,16,,
17,E_abbr,1,A03,UDI_71,TCTCGGTC,UDI_71,17,,
18,P_scab,1,B03,UDI_72,AAGACACT,UDI_72,18,,
19,L_rube,1,C03,UDI_73,CTACCAGG,UDI_73,19,,
20,IsMShag_M,1,D03,UDI_74,ACTGTATC,UDI_74,20,,
21,IsMShag_F,1,E03,UDI_75,CTGTGGCG,UDI_75,21,,
22,Laura_Hayes,1,F03,UDI_76,TGTAATCA,UDI_76,22,,
23,Owen_Trimming,1,G03,UDI_77,TTATATCT,UDI_77,23,,
GCGATTAC,,,,,,,GCGATTAC
CATCACTT,,,,,,,CATCACTT
TACTCTCC,,,,,,,TACTCTCC
I am aware that there are multiple issues here. One of them could possibly be the formatting of test4b.csv
? Is it a valid csv?
I could really do with some help here. Suggestions/help/examples are welcome.
Looks like you are trying to reverse complement the index sequences for use with
bcl2fastq
.Have you considered using
bcl-convert
instead. It is smarter thanbcl2fastq
(i.e. it will automatically do RC part depending on the sequencer being used) and faster. It is also the "new" (not so much anymore) software that Illumina will keep supporting.Hi GenoMax Yes, I am aware of tools/ways to get the RC done easily, but I am learning bash scripting and this was an easy enough challenge for me to get familiar with awk etc.
Thanks.