I have a file like this:
"
DGM97JN1_135:2:1101:1283:2110 16 chr13.fa ...
DGM97JN1_135:2:1101:1434:2186 16 chr08.fa ...
DGM97JN1_135:2:1101:1385:2244 0 chr16.fa ...
DGM97JN1_135:2:1101:1663:2038 0 chr13.fa ...
...........
"
and I would like a fast and easy way to delete the ".fa" after the chromosome name in the third column of each row. What I do now is:
cat <input file> | tr -d '.fa'
But this does not work well, because for some reason it also deletes every "." in the file. Why does this happen, and what is the right way to write it?
Also, since it is a large file, I wonder whether I can restrict the search and substitution to the third column only, and so speed up the process?
And to edit the file in place, use sed -i ''
This will probably work for his data, but only if ".fa" appears nowhere except the third column, or if he wants ".fa" removed everywhere.
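For reference, here is a minimal sketch of the kind of whole-line substitution being discussed. The exact command suggested earlier is not quoted here, so sed 's/\.fa//' is an assumption, and the function name strip_fa_anywhere is made up for illustration. It also shows why the tr attempt misbehaves: tr -d takes a set of characters, not a string.

```shell
# tr -d '.fa' deletes every '.', every 'f' and every 'a' in the input,
# because its argument is a SET of characters, not the string ".fa".
# sed substitutes a pattern instead; the dot is escaped so that it
# matches a literal '.' rather than "any character".

# Hypothetical helper: removes the first ".fa" on each line,
# wherever on the line it occurs (not only in the third column).
strip_fa_anywhere() {
    sed 's/\.fa//'
}

# Sample line taken from the question.
printf '%s\n' 'DGM97JN1_135:2:1101:1283:2110 16 chr13.fa ...' | strip_fa_anywhere
```

If an earlier column ever contained ".fa", that occurrence would be removed instead of the one in the third column, which is the caveat raised above.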
If it has to be specifically the third column, things get a bit more complicated, especially if the columns are separated by spaces instead of single tabs. Then I'd write a quick (perl|tcl|whatever) filter.
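A rough sketch of such a filter, staying with sed and awk rather than perl or tcl. The function names are hypothetical, and it assumes lines start with a non-blank field and that ".fa" is to be removed only at the end of the third field. The sed version anchors the match to the third whitespace-separated field and leaves the original spacing untouched; the awk version assumes strictly tab-separated input. Whether either is noticeably faster on a large file has not been measured here.

```shell
# Hypothetical helper: strip a trailing ".fa" from the third field only.
# Fields may be separated by any run of spaces or tabs; the separators
# are captured and written back unchanged.
strip_fa_col3() {
    sed -E 's/^([^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+[^[:space:]]+)\.fa([[:space:]]|$)/\1\2/'
}

# Hypothetical helper for input that is strictly tab-separated:
# awk splits on tabs, edits field 3, and prints the line with tabs again.
strip_fa_col3_tabs() {
    awk 'BEGIN { FS = OFS = "\t" } { sub(/\.fa$/, "", $3) } 1'
}

# ".fa" in the fourth field is left alone; only the third field changes.
printf 'read1  16  chr13.fa  notes.fa\n' | strip_fa_col3
printf 'read2\t0\tchr16.fa\tnotes.fa\n' | strip_fa_col3_tabs
```

Usage on a real file would be along the lines of strip_fa_col3 < input > output, with input and output standing in for the actual file names.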