10.1 years ago
tremblayemilie9
Hi!
I have something like this:
>barcodelabel=#ITS2_A_B1_VG6RM_00076_01732;size=12594;
TCGTTCTCGGACTTTGGGTACAAGGGGCAGGGCTGGCTGCTTCCGGCAGGCGGCCCCGCCGGCGGCGGGGGCCGCCAGTC
GCCGAGTCCTGGCCGCGGTTGCAAAGGGTGGGGTGGCGCCCGGGGGCGTGACCCATTAATGATCCTTCCGCAGGTTCACC
TACGGAAACCTTGTTACGACTTTTACTTCCTCTAAATGACCAAG
>barcodelabel=#ITS2_A_B2_VG6RM_00466_00157;size=9208;
TTAAGTTTTTTCAGACGCTGATTGCAACTGCAAATGGTTTAAATTGTCCAATCGGCGGGCGGACCCGCCGAGGAAACGTA
AGGTACTTAAAAGACATGGGTAAGAGATAGCAGGCAAAGCCTACAACTCTAGGTAATGATCCTTCCGCAGGTTCACCTAC
GGAAACCTTGTTACGACTTTTACTTCCTCTAAATGACCAAG
>barcodelabel=#ITS2_A_B1_VG6RM_00284_01321;size=8857;
TTAATTTGTTACTGACGCTGATTGCAATTACAAAAGGTTTATGTTTGTCCTAGTGGTGGGCGAACCCACCAAGGAAACAA
GAAGTACGCAAAAGACAAGGGTGAATAATTCAGCAAGGCTGTAACCCCGAGAGGTTCCAGCCCGCCTTCATATTTGTGTA
ATGATCCCTCCGCAGGTTCACCTACGGAGACCTTGTTACGACTTTTACTTCCTCTAAATGACCAAG
and I would like to remove the read ID part (VG6RM_00284_01321) with sed.
I do not know how to do so, since the part before it varies (B1, B2, etc.).
Thank you
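The accepted command from this thread is not preserved here, so the following is only a minimal sketch of the approach RamRS describes (keep everything except the part between B[0-9]_ and ;). It assumes headers always look like the ones above, that the underscore joining B1/B2 to the read ID should be removed too, and a sed that accepts -E for extended regexes (GNU and BSD sed both do). Because sed has no non-greedy quantifier, the negated class [^;]* stands in for one: it cannot run past the first ';'.

```shell
#!/bin/sh
# Sketch: strip the read ID (e.g. VG6RM_00466_00157) from FASTA header lines.
#   /^>/        -> only touch header lines, never sequence lines
#   (_B[0-9]+)  -> capture the barcode part (_B1, _B2, ...) so it is kept
#   _[^;]*      -> the joining underscore plus the read ID, stopping at the first ';'
printf '%s\n' '>barcodelabel=#ITS2_A_B2_VG6RM_00466_00157;size=9208;' 'TTAAGTTTTT' \
  | sed -E '/^>/ s/(_B[0-9]+)_[^;]*/\1/'
# prints:
# >barcodelabel=#ITS2_A_B2;size=9208;
# TTAAGTTTTT
```

On a real file this would be sed -E '/^>/ s/(_B[0-9]+)_[^;]*/\1/' test.fasta > out_test.fasta. To keep the trailing underscore (ITS2_A_B2_;size=...), move it inside the group: s/(_B[0-9]+_)[^;]*/\1/.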
That seems a bit off the mark to me. Shouldn't it be
as the aim is to retain everything except the part between B[0-9]_ and ;?

Oh, I understood the opposite. Maybe you're right, RamRS, but I'm not sure yet.
But it is a bit strange to keep everything except the read ID, isn't it?
Maybe, but that's what OP wants to do. No idea why.
Also, you might wanna edit your answer so OP can accept it.
RamRS, you won the battle ;)
Is that the correct way to do it? I mean, the correct answer is yours, not mine, so I guess you should post your comment as an answer?
It's OK. Virtual points are a petty thing to compete for. Plus, you misunderstood the question, so it's not like you did not know how to get there :-)
Also, OP has accepted your answer. High time for you to edit the content!
hahaha, OK, thank you :)
Also, to circumvent your "if there is only one B1/B2", you can use a minimal/non-greedy match expression (in Perl-compatible engines such as perl or grep -P; sed itself has no non-greedy quantifier). Over here, a greedy match of the \1 expression serves the purpose, but let's say you wanna match the shortest a.*b in "aabb": if you use the expression a.*?b, the match is aab (as opposed to the aabb match for the usual a.*b expression; the ? makes all the difference!)

Hi, I did your command line (sed -r 's/(>*).*_B[0-50]_([^;]+).*/\1\2/'g test.fasta > out_test_.fasta), but it actually does the opposite of what I want:

Sorry, it might not have been clear beforehand, but I am looking to get something like this:
Yep, sorry, I understood the opposite. See RamRS's comment; he has written the correct command for your goal.
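To see why the command quoted in the thread does the opposite, it helps to feed it single header lines. This is only an illustration: it uses -E, which GNU sed treats the same as the -r in the original command. The first header is copied from the question; the B7 header is hypothetical, added only to show what the bracket expression [0-50] actually matches.

```shell
#!/bin/sh
# The replacement \1\2 keeps only what the two groups captured:
#   \1 = (>*)     -> the leading '>'
#   \2 = ([^;]+)  -> the read ID itself
# Everything else (barcodelabel=...B1_ and ;size=...;) is dropped.
echo '>barcodelabel=#ITS2_A_B1_VG6RM_00284_01321;size=8857;' \
  | sed -E 's/(>*).*_B[0-50]_([^;]+).*/\1\2/g'
# prints >VG6RM_00284_01321

# [0-50] is one bracket expression equivalent to [0-5] (a range plus a
# repeated 0), so a hypothetical B7 header does not match and is left as-is.
echo '>barcodelabel=#ITS2_A_B7_VG6RM_00284_01321;size=8857;' \
  | sed -E 's/(>*).*_B[0-50]_([^;]+).*/\1\2/g'
```

So the original pattern extracts the read ID rather than removing it, and it only recognises barcodes B0 to B5; [0-9] (or [0-9]+ for multi-digit barcodes) is probably what was intended.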