8.8 years ago
mbk0asis
Hi, all!
I've got a file containing CDS mutation information for cancer genes.
The data contain mutation positions and the normal/cancer sequences, for example:
c.863_864insTCTG
c.1799T>A
c.1849G>T
c.2504A>T
c.2509_2510AT>CC
c.2506_2508ATC>TTT
I want to extract the positions, but there is no separator between the numbers and the text.
How can I separate them?
Thank you!
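A minimal Python sketch of one way to do the split. It assumes every entry starts with 'c.', followed by the position (a single number or a 'start_end' range) and then the change. The function name split_cds_mutation is hypothetical, and the pattern only covers the formats listed above, not every HGVS form.

```python
import re

# Assumed layout: "c." + position (N or N_M) + everything else (the change).
CDS_PATTERN = re.compile(r"c\.(\d+(?:_\d+)?)(.*)")


def split_cds_mutation(value):
    """Return (position, change), or None if no numeric position is present."""
    match = CDS_PATTERN.match(value.strip())
    if match is None:
        return None  # e.g. "c.?" has no position to extract
    return match.group(1), match.group(2)


if __name__ == "__main__":
    examples = [
        "c.863_864insTCTG",
        "c.1799T>A",
        "c.1849G>T",
        "c.2504A>T",
        "c.2509_2510AT>CC",
        "c.2506_2508ATC>TTT",
    ]
    for entry in examples:
        print(entry, split_cds_mutation(entry))
```

For "c.863_864insTCTG" this gives ("863_864", "insTCTG"), keeping ranges together instead of splitting them into two separate numbers.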
Wow! It worked. Would you explain what '\d+' means in this code?
Thank you!
Never mind, I found an answer.
http://stackoverflow.com/questions/14017134/what-is-d-d-in-regex
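In short, '\d' matches a single digit character and '+' means "one or more of the preceding item", so '\d+' matches a whole run of consecutive digits. A small Python illustration on one of the example entries:

```python
import re

entry = "c.2509_2510AT>CC"

# '\d' matches one digit at a time.
single_digits = re.findall(r"\d", entry)

# '\d+' matches each maximal run of consecutive digits.
digit_runs = re.findall(r"\d+", entry)

print(single_digits)  # ['2', '5', '0', '9', '2', '5', '1', '0']
print(digit_runs)     # ['2509', '2510']
```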
Actually, the example above is one column from a data set with multiple columns (~30). I tried to apply your code but had no luck. I tested it on two-column data, but it didn't work. I was going to 'paste' the results back onto the original data, but rows without numbers (e.g. "c.?") disappeared from the output.
How can I do this when the data have multiple columns?
It should work even if the data have multiple columns. The command above extracts a pattern from each line: a number (represented by '\d'), anything in between (represented by '.*'), and again a number ('\d'). Can you show what the pattern looks like? Then maybe we can try something else!
Here is my test data and results. As you can see, the first column was replaced by some numbers.
I forgot to mention that I wanted to keep other columns in the output.
Thank you!
Oh, OK. I got it. In your first post, you gave me only one column. That's why it worked. Now, with multiple columns, it is printing the numbers found in the first column and the other column. That number 2 is the number in JAK2, followed by a space and c.1849.
OK. Try this if all your positions are in a single column.
'-f2' is column 2. Replace 2 with the column number in your file.
Ah. That's where the numbers in column 1 came from. I understand.
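The stray '2' comes from searching the whole line: on a line like "JAK2<TAB>c.1849G>T" the first digit is the '2' in the gene name, so the greedy match starts there. Restricting the search to one column, as 'cut -f2' does, avoids this. A Python sketch of the difference, assuming tab-separated input with the mutation in the second column (the sample line is illustrative):

```python
import re

SPAN = re.compile(r"\d.*\d")

# Illustrative tab-separated line: gene name, then the CDS change.
line = "JAK2\tc.1849G>T"

# Searching the whole line starts at the '2' in "JAK2".
whole_line = SPAN.search(line).group(0)

# Searching only column 2 (like `cut -f2`) sees just the mutation.
column_2 = line.split("\t")[1]
column_only = SPAN.search(column_2).group(0)

print(repr(whole_line))   # '2\tc.1849'
print(repr(column_only))  # '1849'
```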
Another question: if I cut the column, I will lose the other columns. If I want to keep the other columns, I don't think grep will do it. Do you have any thoughts on that?
Thank you!
Yeah. I have many tricks in my pocket!! :)
Try this (assuming all the patterns you want have 'c.' in front):
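A minimal Python sketch of the same goal (not the command from this reply): look for the 'c.' pattern in one chosen column and append the extracted position as a new last column, leaving every other column untouched. Rows with no numeric position, such as "c.?", are kept with an empty value, so the row count never changes. The helper add_position_column, the 0-based column index, and the sample rows are hypothetical.

```python
import re

# Assumed: the position follows "c." and is a single number (N) or a range (N_M).
POSITION = re.compile(r"c\.(\d+(?:_\d+)?)")


def add_position_column(rows, column):
    """Yield each row (a list of fields) with the extracted position appended.

    Rows whose chosen column has no numeric position (e.g. "c.?") are kept,
    with an empty string in the new column, so no rows are dropped.
    """
    for row in rows:
        value = row[column] if column < len(row) else ""
        match = POSITION.search(value)
        yield row + [match.group(1) if match else ""]


# Hypothetical rows; column index 1 is the second column (like `cut -f2`).
sample = [
    ["JAK2", "c.1849G>T"],
    ["GENE_X", "c.?"],
]
for row in add_position_column(sample, column=1):
    print("\t".join(row))
```

For a real file, the rows could come from csv.reader(handle, delimiter="\t") and be written back with csv.writer(out, delimiter="\t"). Since every input row yields exactly one output row, the result lines up with the original file and no separate paste step is needed.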
It worked!
I think I just overcame the biggest hurdle.
Thank you for your help! You rock!