Question

grep extraction command removes duplicate entries automatically

4.9 years ago

mxlsherry1992 ▴ 80

Dear all,

I have a file (file 1) containing many gene IDs, some of which appear more than once. I want to extract the corresponding lines from a GFF file. I have a script that does this, but it removes the duplicates automatically. Is there any way to keep the duplicates and have the output follow the same order as file 1?

Here is file 1:

gene14184
gene25736
gene14184
gene8906
gene25736
gene14775
gene4224
gene8906
gene14184
gene24702

Here is the script I am using:

grep -Fwf file1.txt 001660625_genomic.gff > output.txt

RNA-Seq • 962 views

I don't know if I really understand what you want to do, but try this:

(for ID in $(cat file1.txt); do printf '%s\t%s\n' "$ID" "$(grep -Fw -- "$ID" 001660625_genomic.gff)"; done) > output.txt

ADD REPLY • link 4.9 years ago by hugo.avila ▴ 530
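
A note on why the original command drops the repeats: grep -f reads the GFF once and prints each matching line a single time, in GFF order, no matter how often an ID occurs in the pattern file. Looping over the IDs keeps the order and the repeats, but it rescans the whole GFF for every ID. Below is a minimal sketch of a middle ground, assuming bash and a grep that supports -F, -w and -f. The function name extract_in_order and its temporary file are hypothetical, not something posted in this thread.

```shell
#!/usr/bin/env bash
# extract_in_order IDS_FILE GFF_FILE   (hypothetical helper, for illustration)
# Prints the GFF lines for each ID in IDS_FILE, in the order the IDs are
# listed, repeating the lines whenever an ID is repeated.
extract_in_order() {
    local ids=$1 gff=$2 hits id
    hits=$(mktemp)
    # Single pass over the (possibly large) GFF: keep only lines that
    # match at least one ID as a whole word.
    grep -Fwf "$ids" "$gff" > "$hits" || true
    # One lookup per listed ID against the much smaller pre-filtered file.
    while IFS= read -r id; do
        if [ -n "$id" ]; then
            grep -Fw -- "$id" "$hits" || true
        fi
    done < "$ids"
    rm -f "$hits"
}
```

Usage would look like: extract_in_order file1.txt 001660625_genomic.gff > output.txt. The -w flag matters here, since without it gene1 would also match gene14184.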

score 0 · Answer 1 · 2020-01-09

4.9 years ago

mxlsherry1992 ▴ 80

Solved! VLOOKUP in Excel can also do this. Thanks!

Excel could also do this:

Gene name errors are widespread in the scientific literature

ADD REPLY • link 4.9 years ago by zx8754 12k

Excel can do many things, but it tends to crash once a file gets large, and manual interaction with it is not reproducible the way a script is. If it helped you this time, that is fine, but I strongly recommend learning how to do these things outside of Excel. Imagine one of the files was 10 GB in size... what then?

ADD REPLY • link 4.9 years ago by ATpoint 85k