Question

How to replace the names in one column based on another file


8.5 years ago

wu.zhiqiang.1020 ▴ 50

Dear all, I have two files. The first is a GFF file containing the genome annotation, for example:

"

NW_004848299.1 RefSeq region 1 2133925 . + . ID=id0;Name=Unknown;Dbxref=taxon:

NW_004848299.1 Gnomon gene 255845 257824 . - . ID=gene0;Name=LOC101930845;Dbxref

NW_004848299.1 Gnomon mRNA 255845 257824 . - . ID=rna0;Name=XM_005278412.1 .... "

The second file holds the scaffold information, like this:

" Assembly Genome_name RefSeq_Accession GenBank_Accession NCBI_name

Chrysemys_picta_bellii-3.0.1 Group1 NW_004848299.1 JH584390.1 GPS_001879038.1

Chrysemys_picta_bellii-3.0.1 Group2 NW_004848300.1 JH584391.1 GPS_001879039.1

Chrysemys_picta_bellii-3.0.1 Group3 NW_004848301.1 JH584392.1 GPS_001879040.1

.. ..

"

The first column of the GFF file is the scaffold name. The second file describes the same scaffolds, but under different names.

The first column of the GFF file corresponds to the third column (RefSeq_Accession) of the second file.

I want to replace each name in the first column of the GFF file with the matching name from the second column (Genome_name) of the second file.

The result would look like this:

"

Group1 RefSeq region 1 2133925 . + . ID=id0;Name=Unknown;Dbxref=taxon:84

Group1 Gnomon gene 255845 257824 . - . ID=gene0;Name=LOC101930845;Dbxref

Group1 Gnomon mRNA 255845 257824 . - . ID=rna0;Name=XM_005278412.1

.... "

How can I do this using R, a Unix command, or a Perl script? Both files are tab-separated.

Thanks,

ZQ

R gene • 2.5k views


score 2 · Answer 1 · 2016-09-29

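Are the rows in matching order, one-to-one? If not (a GFF normally has many rows per scaffold), look each name up with match() rather than binding columns side by side.

In R:

a <- read.table("file1.tsv", header=FALSE, stringsAsFactors=FALSE, sep="\t", quote="")
b <- read.table("file2.tsv", header=TRUE, stringsAsFactors=FALSE, sep="\t")

# replace GFF column 1 with the Genome_name whose RefSeq_Accession matches it
# (IDs that are missing from b become NA)
a$V1 <- b$Genome_name[match(a$V1, b$RefSeq_Accession)] #or whatever columns you so desire

write.table(a, file="newtable.tsv", sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE)

You'll need to make sure that a and b are set up properly for what you want, possibly with string splitting.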

score 2 · Answer 2 · 2016-09-30

man join

Basically, for the two files in the question (GFF column 1 matches column 3 of the scaffold table):

# -1 / -2: the field holding the shared ID in each file
# -t: tab-delimited
# -o: output fields as FILE.FIELD; here the new name, then GFF columns 2-9
# both inputs must be sorted on the join field
join \
-1 1 \
-2 3 \
-t $'\t' \
-o 2.2,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9 \
<(sort -t $'\t' -k1,1 FILE1) \
<(sort -t $'\t' -k3,3 FILE2)

This needs bash (for <( ) and $'\t'). The output comes out sorted by the join field, and lines without a match are dropped.
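
If the original row order of the GFF matters, a lookup table in awk may be easier than join, since join needs sorted input. The following is only a sketch: the function name rename_scaffolds is made up for illustration, and the column positions (ID in column 3 of the scaffold table, new name in column 2, one header line) are taken from the example in the question.

```shell
# rename_scaffolds MAP GFF   (hypothetical helper, not an existing tool)
# MAP: tab-separated scaffold table with one header line;
#      column 3 = ID used in the GFF, column 2 = replacement name.
# GFF: tab-separated annotation file whose first column gets renamed.
rename_scaffolds() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR {                       # first file: build the lookup table
      if (FNR > 1) name[$3] = $2      # skip the header line
      next
    }
    /^#/ { print; next }              # pass comment/directive lines through unchanged
    ($1 in name) { $1 = name[$1] }    # rename when a mapping exists
    { print }                         # rows without a mapping are printed as they are
  ' "$1" "$2"
}

# Example call (file names are placeholders):
# rename_scaffolds file2.tsv file1.gff > renamed.gff
```

Rows keep their original order, and scaffolds that are missing from the table stay under their old name instead of being dropped. Directive lines such as ##sequence-region are passed through untouched, so any scaffold IDs inside them are not renamed.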

score 2 · Answer 3 · 2016-09-30


8.5 years ago

H.Hasani ▴ 990

If the two files have at least one column in common, use the merge function in R.
