21 months ago
saadleeshehreen
Hi,
I have a weird .txt file with this line.
lcl|CU459141.1_prot_CAM87240.1_2248 - TniQ PF06527.14 0.018 13.6 0.0 0.024 13.2 0.0 1.1 1 0 0 1 1 1 0 [locus_tag=ABAYE2390] [db_xref=EnsemblGenomes-Gn:ABAYE2390
I need to process the line into two columns, like the following:
CU459141.1 CAM87240.1
Can anyone help me to write a bash command for this?
Thanks
First, this is not bioinformatics. Simple pattern matching and extraction.
Second, there isn't enough information in your message. Does each line start with lcl|? Are the words that need to be extracted always separated by _prot_?
Third, what have you tried? You are asking for help in writing a command. If you haven't tried anything, the translation of your request is that you want someone to solve this for you. You can't expect help without making some effort on your own.
Yes, each line starts with lcl| and the two words are always separated by _prot_. I am very new to pattern matching and extraction. I was trying to cut the field with the cut -f1 command, but then I realised the file is not tab-delimited. I did try the following
cut can use any delimiter. Change the delimiter to _ and you should be able to figure out the rest.
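For anyone landing here later, the hint above can be sketched roughly as follows. This is only one possible way to do it, not necessarily what the answerer had in mind. It assumes every line starts with lcl|, that the two IDs are separated by _prot_, and that neither ID contains an underscore of its own (RefSeq-style accessions with an underscore would break it). The file name input.txt in the last comment is hypothetical.

```shell
# Sample line from the question (trailing columns shortened for readability).
line='lcl|CU459141.1_prot_CAM87240.1_2248 - TniQ PF06527.14 0.018 13.6'

# 1) cut -d'|' -f2     keeps everything after "lcl|"
# 2) cut -d'_' -f1,3   keeps '_'-separated fields 1 and 3 (the two IDs)
# 3) tr '_' ' '        turns the '_' that cut leaves between them into a space
result=$(printf '%s\n' "$line" | cut -d'|' -f2 | cut -d'_' -f1,3 | tr '_' ' ')
echo "$result"

# For a whole file (input.txt is a placeholder name):
#   cut -d'|' -f2 input.txt | cut -d'_' -f1,3 | tr '_' ' '
```

On the sample line this should print CU459141.1 CAM87240.1. Note that cut -f1 on its own splits on tabs by default, which is why it returned the whole line when the file was not tab-delimited.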