7.8 years ago
adityabandla
Hi
After running blastx, I have the alignment results as a BLAST tabular (.m8) file, with lines that look like this:
HISEQ:329:HMKF3BCXX:1:1101:4293:5950/1 gi|753197404|ref|WP_041503856.1| 54.3 81 37 0 6 248 141 221 4.4e-18 99.0
I would like to simplify the NCBI identifiers in the second column, i.e. keep only the accession numbers in the BLAST output file, so that the line above becomes:
HISEQ:329:HMKF3BCXX:1:1101:4293:5950/1 WP_041503856.1 54.3 81 37 0 6 248 141 221 4.4e-18 99.0
Thanks
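One possible approach, sketched with awk: split the second column on "|" and keep the fourth piece, which is the accession in identifiers of the form gi|number|ref|accession|. This assumes the .m8 file is tab-delimited (the usual BLAST tabular layout) and that the accession is always the fourth piece. The file names hits.m8 and hits.simplified.m8 are hypothetical, and the sample line is the one from the question.

```shell
# Hypothetical input file holding the sample line from the question (tab-delimited).
printf 'HISEQ:329:HMKF3BCXX:1:1101:4293:5950/1\tgi|753197404|ref|WP_041503856.1|\t54.3\t81\t37\t0\t6\t248\t141\t221\t4.4e-18\t99.0\n' > hits.m8

# Split column 2 on "|" and keep the 4th piece (the accession).
# Lines whose second column has fewer than 4 pieces are printed unchanged.
awk 'BEGIN { FS = OFS = "\t" }
     { n = split($2, id, "[|]"); if (n >= 4) $2 = id[4]; print }' hits.m8 > hits.simplified.m8
```

The other eleven columns pass through untouched because both the input and output field separators are set to a tab.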
Thanks Pierre! Much appreciated
On another note, I am trying to simplify the FASTA headers, which contain similar text, e.g. gi|753197404|ref|WP_041503856.1|
I am currently removing the gi numbers using
sed 's/^[^ ]*[|]\([^|]*\)[|] .*$/>\1/'
Is there a faster alternative using awk?
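A rough awk equivalent might look like the sketch below. It assumes headers of the form >gi|number|ref|accession| followed by a space and a description, and, like the sed command above appears intended to do, it keeps only the accession on header lines. The file names and the sample record (including its description and sequence) are made up for illustration. Whether this is actually faster than sed depends on the awk implementation and the file, so it is worth timing both on real data.

```shell
# Hypothetical FASTA file; the description and sequence are invented sample data.
printf '>gi|753197404|ref|WP_041503856.1| example description\nMKTAYIAK\n' > proteins.faa

# On header lines, split the first whitespace-delimited word on "|" and keep
# the 4th piece (the accession). Headers with fewer than 4 pieces, and all
# sequence lines, are printed unchanged.
awk '/^>/ { n = split($1, id, "[|]"); if (n >= 4) { print ">" id[4]; next } }
     { print }' proteins.faa > proteins.simplified.faa
```

Only lines starting with ">" are split, so sequence lines incur just the pattern check.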