Extract specific patter from string in MySQL:
3
0
Entering edit mode
10.2 years ago
Renesh ★ 2.2k

I would like to extract specific pattern from string in MySQL.

The column contains specific string like xxx-atg168d and xxx-atg444-6x. From these string, I want to extract atg168 and atg444 only. How can I perform this in MySQL?

Input column

xxx-atg168d
xxx-atg444-6x
xxx-atg1689d
xxx-atg16507d

Output column

atg168
atg444
atg1689
atg16507
mysql • 23k views
ADD COMMENT
0
Entering edit mode

It's really unclear how this is related to bioinformatics.

ADD REPLY
0
Entering edit mode

Yes this related to bioinformatics. This table mimics the miRNA data.

ADD REPLY
0
Entering edit mode

This is plain text processing and SQL. Not really related to Bioinformatics and like Devon says, can be found with the tiniest bit of googling.

ADD REPLY
3
Entering edit mode
10.2 years ago

If the pattern is always "-atg", then the following sql statement should work:

select SUBSTRING_INDEX(SUBSTRING(name FROM IF(locate("-atg",name)=0,1,1+LOCATE("-atg",name))),"-",1) from mydatabase order by name;
ADD COMMENT
0
Entering edit mode

Thanks for you reply. But, the all record does not have atg pattern. How can i generalize this for all different patterns?

ADD REPLY
2
Entering edit mode

It might be easiest to just write a special function to use substring() or substring_index() and also then run instr() to determine if this should be done a second time.

Alternatively, just use a different language that can interface with mysql (perl, python, etc.). mysql is great for storing and querying data and bad for processing it in a complex way.

ADD REPLY
2
Entering edit mode

FYI, when you're asking questions on sites like these, you should specify exactly what you want. You're getting free help from the best bioinformaticians in the world. Pierre is in the top .1% in his field. He answered your question as stated brilliantly, but you were lazy and didn't properly state your needs.

Otherwise you waste the time of yourself and the busy brilliant minds who are trying to answer your question. If you goof up and end up needing to change the parameters of the question, the decent thing is to edit your original question and apologize for wasting others' time.

We're nice. At the StackExchange network they'll downvote you to the Earth's core.

How To Ask Good Questions On Technical And Scientific Forums

ADD REPLY
1
Entering edit mode

If I could up-vote this more than once I would.

ADD REPLY
0
Entering edit mode

I still wonder why we don't have a downvote option. Maybe mods should start adding "not-bioinformatics" tags so experts need not waste their time on such questions.

ADD REPLY
0
Entering edit mode

Yeah, we tend to just close the ones that are too off-topic. It's always a difficult to find the right threshold to close things though.

ADD REPLY
0
Entering edit mode

Actually, Devon, I was thinking new folks like myself could take up these questions so experts need not waste their time on them. Do you think that makes sense?

ADD REPLY
0
Entering edit mode

The really off-topic ones should just get closed. But otherwise please feel encouraged to field any questions you come upon :) I can't speak for the others but part of my personal impetus for answering questions here is that I often run into a similar problem later and will then already know the solution (not to mention that everyone else's problems always seem so much more tractable than mine!).

BTW, since you're on twitter, do follow @BiostarQuestion. I find it convenient to just look at the twitter feed when something is running or compiling.

ADD REPLY
0
Entering edit mode

I already follow @BiostarQuestion :) I agree that the off topic questions should ideally get closed, but who decides where the line is? Case in point, OP insists this is relevant because the data happens to follow, roughly, the format of some random IDs. Pretty sure any bioinformatician that has any idea of RegEx (so, all of us) can solve said question with simpler tools. Or, if one is Excel savvy, one knows that FIND() and MID()/SUBSTR() can be used to achieve almost anything.

You know what, maybe we should ask for a RegEx captcha before anyone can post a question :-D

ADD REPLY
0
Entering edit mode

Yeah, there's no clear line and sometimes we disagree how off-topic things need to be before we close them.

ADD REPLY
1
Entering edit mode

yes but that will be tricky. If you the owner of the database, you should create a new field containing the new name (dump the database, transform with sed, update the records)

ADD REPLY
2
Entering edit mode
10.2 years ago

FYI, we shouldn't have to assume/figure out how questions are relevant (these could be any IDs and this is actually just a mysql question), that's your responsibility.

In mysql, you'd just use the substring command select substring(some_column, 4,) from some_table. If you want to remove suffixes too, then you'd need to add substring_index(). BTW, you could have found this with a small bit of googling.

ADD COMMENT
0
Entering edit mode

Devon, can you please, stop replying here while I'm looking for the correct sql statement ? :-)

ADD REPLY
0
Entering edit mode

I'll see what I can do :)

ADD REPLY
1
Entering edit mode
10.2 years ago
Renesh ★ 2.2k

Hi, I use following two steps to solve my problem and it works. Thanks for your help,

substring_index (substr(cn,instr(cn,'-')+1),'-',1);
CONCAT(LEFT(cn, CHAR_LENGTH(cn) - 1),IF(RIGHT(cn, 1) REGEXP '[a-z]' = 0, RIGHT(cn, 1), ''));

http://stackoverflow.com/questions/26045071/extract-specific-patter-from-string-in-mysql

http://stackoverflow.com/questions/15776957/mysql-remove-last-characters-is-character-if-its-number

ADD COMMENT
1
Entering edit mode

FYI, it's not unheard of to see miRNAs with names like mmu-miR123-3-3p, which that method will miss (perhaps you don't have any of those in this dataset, but keep that in mind for the future). In general it's probably better to just use a regex with perl/python/etc or standard string splitting in a normal (i.e., not SQL) language.

ADD REPLY
0
Entering edit mode

Yes you are right. I also missed data like cbn-miR-44. Therefore, the method I proposed is not generalized for all miRNAs

ADD REPLY

Login before adding your answer.

Traffic: 1672 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6