If file1 is exactly as follows (tab-delimited, with the row number in the second column):
chr13:32914976-32919384 1
chr13:32900191-32900291 2
chr22:19137233-19152094 3
chr4:69401231-69493479 4
chr17:44171207-44237068 5
and file2 is exactly as follows (tab-delimited):
chr16:77201441-77224685 numsnp=10 length=23,245 state5,cn=3sample.split220startsnp=kgp5597583endsnp=kgp11117565conf=24.089
chr4:69401231-69493479 numsnp=145 length=92,249 state5,cn=3sample.split171startsnp=4:69401231_CNV_UGT2B17endsnp=rs17663945conf=118.433
chr17:41267632-41276134 numsnp=85 length=8,503 state5,cn=3sample.split207startsnp=rs8176100endsnp=rs886040902conf=29.833
chr19:1220367-1234676 numsnp=84 length=14,310 state5,cn=3sample.split90startsnp=rs567202367endsnp=19:1234676-GAconf=88.991
chr19:41385767-41385775 numsnp=6 length=9 state1,cn=0sample.split90startsnp=19:41385767_CNV_CYP2A7endsnp=19:41385775_CNV_CYP2A7_Ilmndup2conf=21.121
chr17:44171207-44237068 numsnp=14 length=65,862 state5,cn=3sample.split166startsnp=JHU_17.44171206endsnp=rs2532292conf=21.762
chr1:1266841-1269724 numsnp=27 length=2,884 state5,cn=3sample.split60startsnp=rs577691125.1endsnp=rs536836467.2conf=20.617
chr1:2381164-2389702 numsnp=14 length=8,539 state5,cn=3sample.split60startsnp=1:2381164endsnp=1:2389702conf=20.634
chr1:228497228-228612939 numsnp=71 length=115,712 state5,cn=3sample.split60startsnp=rs761649292endsnp=rs531385963.3conf=23.547
chr13:32914976-32919384 numsnp=92 length=4,409 state5,cn=3sample.split78startsnp=rs886040659endsnp=rs191253965conf=55.675
chr22:24301643-24301695 numsnp=16 length=53 state1,cn=0sample.split209startsnp=22:24301643_CNV_GSTT2B_Ilmndup1endsnp=22:24301695_CNV_GSTT2B_Ilmndup1conf=66.493
chr22:24302601-24302603 numsnp=7 length=3 state1,cn=0sample.split209startsnp=22:24302601_CNV_GSTT2Bendsnp=22:24302603_CNV_GSTT2B_Ilmndup3conf=25.047
chr22:42522134-42522313 numsnp=12 length=180 state1,cn=0sample.split209startsnp=22:42522134_CNV_CYP2D6endsnp=22:42522313_CNV_CYP2D6_Ilmndup2conf=43.298
chr13:32900191-32900291 numsnp=31 length=101 state5,cn=3sample.split147startsnp=rs81002842endsnp=rs276174848conf=25.818
chr13:32903511-32906684 numsnp=101 length=3,174 state5,cn=3sample.split147startsnp=rs61948377endsnp=rs886040343conf=49.625
chr13:32913339-32913896 numsnp=122 length=558 state5,cn=3sample.split147startsnp=rs398122786endsnp=rs80358763conf=65.213
chr13:32914891-32921037 numsnp=153 length=6,147 state5,cn=3sample.split147startsnp=rs80359581endsnp=rs876661201conf=143.004
chr22:19137233-19152094 numsnp=13 length=14,862 state2,cn=1sample.split26startsnp=rs540621015endsnp=kgp5062455conf=20.778
chr22:24301643-24302227 numsnp=36 length=585 state5,cn=3sample.split26startsnp=22:24301643_CNV_GSTT2B_Ilmndup1endsnp=22:24302227_CNV_GSTT2B_Ilmndup1conf=35.287
chr22:24374342-24386612 numsnp=126 length=12,271 state1,cn=0sample.split26startsnp=22:24374342_CNV_GSTT1endsnp=22:24386612_CNV_GSTT1conf=317.105
then this command should work:
awk '{str=substr($0,1,index($0,"\t"))} FNR==NR{a[str];next} (str in a)' file1 file2
Result:
chr4:69401231-69493479 numsnp=145 length=92,249 state5,cn=3sample.split171startsnp=4:69401231_CNV_UGT2B17endsnp=rs17663945conf=118.433
chr17:44171207-44237068 numsnp=14 length=65,862 state5,cn=3sample.split166startsnp=JHU_17.44171206endsnp=rs2532292conf=21.762
chr13:32914976-32919384 numsnp=92 length=4,409 state5,cn=3sample.split78startsnp=rs886040659endsnp=rs191253965conf=55.675
chr13:32900191-32900291 numsnp=31 length=101 state5,cn=3sample.split147startsnp=rs81002842endsnp=rs276174848conf=25.818
chr22:19137233-19152094 numsnp=13 length=14,862 state2,cn=1sample.split26startsnp=rs540621015endsnp=kgp5062455conf=20.778
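To show how the one-liner above works, here is the same awk program spread over several lines with comments, wrapped in a shell function. The function name `keep_matching_rows` is hypothetical and only used for illustration; the awk logic is unchanged from the command above.

```shell
# keep_matching_rows FILE1 FILE2: hypothetical wrapper around the one-liner above.
keep_matching_rows() {
  awk '
    {
      # Everything up to and including the first tab, i.e. the
      # "chr:start-end" key plus its trailing tab.
      str = substr($0, 1, index($0, "\t"))
    }
    # While reading the first file (FNR == NR), store each key in array a
    # and skip to the next line.
    FNR == NR { a[str]; next }
    # While reading the second file, print the line if its key was stored.
    (str in a)
  ' "$1" "$2"
}
```

Usage: `keep_matching_rows file1 file2`. Lines are printed in file2's order, which is why the result above follows file2 rather than file1. One known caveat of the `FNR == NR` idiom: if file1 is empty, the condition is also true for every line of file2, so nothing is printed.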
Hi, to help you better, could you paste the output of the following commands, if possible:
bash (Linux):
head -n 20 ./yourfile.cnv
PowerShell (Windows):
type .\yourfile.cnv | select -First 20
If I understand correctly, you want to delete rows of this file based on a list of "chr:start-end" keys like those in its first column?
Yes. Thank you for replying, by the way.
This is exactly what you want: https://stackoverflow.com/questions/35728766/awk-to-filter-file-by-specific-field-in-another
I tried the code suggested by karakfa on that page:
awk 'NR==FNR{a[$1];next} FNR==1 || ($7 in a)' file1 file2
I set file1 to the file containing the desired list, and file2 to the file containing all the chromosome regions.
Do I need to change anything else? It doesn't seem to work.
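A likely reason it doesn't work: the command from the linked answer was written for that question's layout. It looks up `$7`, but in file2 the "chr:start-end" key is in column 1, so `$7 in a` should never match. The `FNR==1 ||` part was there to keep a header line; file2 has no header, so it just prints file2's first line (chr16:77201441-77224685) unconditionally. A minimal sketch of the adapted command, assuming both files are tab-delimited with the key in column 1 (the function name `keep_listed_regions` is hypothetical):

```shell
# keep_listed_regions FILE1 FILE2: hypothetical wrapper for illustration.
keep_listed_regions() {
  # -F'\t' splits on tabs only, so $1 is the whole "chr:start-end" key.
  awk -F'\t' '
    NR == FNR { a[$1]; next }   # file1: remember every key in column 1
    $1 in a                     # file2: print lines whose column-1 key was listed
  ' "$1" "$2"
}
```

On the posted files, `keep_listed_regions file1 file2` should give the same five lines as the first command. If you instead want to delete the listed regions from file2, negate the last condition: `!($1 in a)`.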