Question

How to modify one column in a text file in Python - solved

0

8.8 years ago

ashkan ▴ 160

I have a text file in which there is a column containing IDs. Here is a small example of this column:

ENSG00000072803.13 
ENSG00000163002.8 
ENSG00000102221.9 
ENSG00000072121.11
ENSG00000149532.11
ENSG00000134419.11

I want to get rid of the period and any number after it. So, for this example, I need something like this:

ENSG00000072803
ENSG00000163002
ENSG00000102221
ENSG00000072121
ENSG00000149532
ENSG00000134419

Do you know how to do that in Python?

RNA-Seq • 7.8k views

ADD COMMENT • link updated 8.8 years ago by WouterDeCoster 48k • written 8.8 years ago by ashkan ▴ 160

0

It would be nice if you would follow up on your earlier questions before opening new threads. See for example a set of guidelines in this post: How To Ask Good Questions On Technical And Scientific Forums and https://www.ncbi.nlm.nih.gov/pubmed/21980280

ADD REPLY • link 8.8 years ago by WouterDeCoster 48k

0

Hi Ashkan,

A simple command-line alternative: cut -f1 -d '.' yourfile > yourfile.pointless

~ Best

ADD REPLY • link 8.8 years ago by Farbod ★ 3.4k

0

As usual, people don't ask clear questions, so if there is only one column then this solution may be fine.

It is not clear from the original question if there is only one column in the file or other things as well.
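
To illustrate why the column count matters, here is a minimal Python sketch. The two-column sample line is made up for illustration and is not from the original file. It compares splitting the whole line on '.', which is roughly what cut -f1 -d '.' does, with trimming only the first tab-separated field.

```python
# Hypothetical two-column line; the second column value is invented for illustration.
line = "ENSG00000072803.13\tgeneA.v2"

# Roughly what `cut -f1 -d '.'` does: keep everything before the first '.' on the line.
whole_line = line.split(".")[0]

# Split into tab-separated fields first, then trim only the first field.
fields = line.split("\t")
fields[0] = fields[0].split(".")[0]
first_only = "\t".join(fields)

print(repr(whole_line))   # the second column is lost
print(repr(first_only))   # the second column is kept unchanged
```

With a single-column file both approaches give the same result; they only differ once other columns are present.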

ADD REPLY • link 8.8 years ago by GenoMax 153k

0

Hi genomax2, yes you are right (as always).

By the way, does this one vs. several column situation have any impact on @Wouter's Python script?

And I guess the first thing most people try to solve is the "example" offered in the question.

ADD REPLY • link 8.8 years ago by Farbod ★ 3.4k

0

My piece of code should work fine for files with one or multiple columns, but if there is only one column a tab will be appended to each line. Essentially my script will modify the first column of the file and leave the rest of the file as it is, regardless of whether there is a rest.
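
If the trailing tab on single-column files is unwanted, one possible variation (a sketch, not the script from the answer; the function name strip_first_column_version is made up here) is to use str.partition, which only puts the tab back when the line actually contained one:

```python
def strip_first_column_version(line):
    """Remove everything from the first '.' onward in the first tab-separated field."""
    line = line.rstrip("\n")
    # partition returns (before, separator, after); separator is "" when no tab exists
    first, tab, rest = line.partition("\t")
    return first.split(".")[0] + tab + rest


print(repr(strip_first_column_version("ENSG00000163002.8\n")))
print(repr(strip_first_column_version("ENSG00000163002.8\tENSG00000072803.13\n")))
```

The first call returns the bare ID with no trailing tab; the second keeps the second column exactly as it was.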

ADD REPLY • link 8.8 years ago by WouterDeCoster 48k

0

Dear Wouter, I used your code on a two-column, tab-separated file, but it seems to have removed the period and number only from the first column and left the second column unchanged!

Am I making an error in running your code?

ENSG00000072803 ENSG00000072803.18

ENSG00000163002 ENSG00000072803.13

ENSG00000163088 ENSG00000072803.33

ENSG00000163054

ADD REPLY • link 8.8 years ago by Farbod ★ 3.4k

1

He mentioned that his script will modify only the first column. If you want to do it for all columns:

with open("inputfile.txt") as infile:
    for line in infile:
        # Trim everything from the first '.' onward in every tab-separated field
        linelist = [field.split(".")[0] for field in line.strip().split("\t")]
        print("\t".join(linelist))

ADD REPLY • link 8.8 years ago by GouthamAtla 12k

0

Thank you (and WouterDeCoster), that works!

ADD REPLY • link 8.8 years ago by Farbod ★ 3.4k

0

Yes, exactly. I assumed his IDs would be in the first column/field (but that can be adapted) and that removing '.' in the rest of the file wasn't desirable. So removing only from the first column is a feature, not a bug ;) But as Goutham Atla demonstrates, it can be done easily.
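
As a rough sketch of how the column could be adapted (the function strip_version and its column and sep parameters are hypothetical names, not part of the original script), the target field can be passed as a zero-based index:

```python
def strip_version(line, column=0, sep="\t"):
    """Trim everything from the first '.' onward in one chosen field of a line."""
    fields = line.rstrip("\n").split(sep)
    # Raises IndexError if the line has fewer fields than column + 1
    fields[column] = fields[column].split(".")[0]
    return sep.join(fields)


# Trim only the second column (index 1) and leave the others untouched
print(repr(strip_version("ENSG00000072803.13\tENSG00000163002.8\tx.y", column=1)))
```

Because the fields are rejoined with the separator, a single-column line comes back without an extra tab.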

ADD REPLY • link 8.8 years ago by WouterDeCoster 48k

0

It may.

That is why the mods are fighting this battle of making sure people realize that they need to ask clear questions (and/or do a minimal search for a solution before posting a new question).

ADD REPLY • link 8.8 years ago by GenoMax 153k

score 2 · Answer 1 · 2016-10-14

Ehm, something trivial like the following, but this is actually not a real bioinformatics question... If I understood correctly, there are also other columns in the file which you want to keep.

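with open("inputfile.txt") as infile:  # avoid shadowing the built-in input()
    for line in infile:
        linelist = line.strip().split('\t')  # Assuming tab-separated file
        # Trim the first column at the first '.', keep the remaining columns unchanged
        print("{}\t{}".format(linelist[0].split('.')[0], '\t'.join(linelist[1:])))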