How to replace the values in one column based on another file
8.2 years ago

Dear all, I have two files. The first is a GFF file containing the genome annotation, like this:

NW_004848299.1 RefSeq region 1 2133925 . + . ID=id0;Name=Unknown;Dbxref=taxon:
NW_004848299.1 Gnomon gene 255845 257824 . - . ID=gene0;Name=LOC101930845;Dbxref
NW_004848299.1 Gnomon mRNA 255845 257824 . - . ID=rna0;Name=XM_005278412.1 ....

The second file contains the scaffold information, like this:

Assembly Genome_name RefSeq_Accession GenBank_Accession NCBI_name
Chrysemys_picta_bellii-3.0.1 Group1 NW_004848299.1 JH584390.1 GPS_001879038.1
Chrysemys_picta_bellii-3.0.1 Group2 NW_004848300.1 JH584391.1 GPS_001879039.1
Chrysemys_picta_bellii-3.0.1 Group3 NW_004848301.1 JH584392.1 GPS_001879040.1
.. ..

The first column of the GFF file holds the scaffold name. The second file describes the same scaffolds under different names: the first column of the GFF file corresponds to the third column (RefSeq_Accession) of the second file.

I want to replace each value in the first column of the GFF file with the matching name from the second column (Genome_name) of the second file.

The result should look like this:

Group1 RefSeq region 1 2133925 . + . ID=id0;Name=Unknown;Dbxref=taxon:84
Group1 Gnomon gene 255845 257824 . - . ID=gene0;Name=LOC101930845;Dbxref
Group1 Gnomon mRNA 255845 257824 . - . ID=rna0;Name=XM_005278412.1
....

How can I do this with R, Unix commands, or a Perl script? All the files are tab-separated.

Thanks

ZQ

R gene • 2.4k views
8.2 years ago
Brice Sarver ★ 3.8k

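Are the two files in the same order, row for row?

In R:

a <- read.table("file1.tsv", header=FALSE, stringsAsFactors=FALSE, sep="\t", quote="") # GFF; "#" starts a comment, so the ## header lines are skipped
b <- read.table("file2.tsv", header=TRUE, stringsAsFactors=FALSE, sep="\t", quote="")  # scaffold table; its first line is the header

# Genome_name from file 2, then GFF columns 2 onwards; only valid if row i of b describes row i of a
d <- cbind(b$Genome_name, a[, 2:ncol(a)]) #or whatever columns you so desire

write.table(d, file="newtable.tsv", sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE)

You'll need to make sure that a and b are set up properly for what you want, possibly with string splitting. Because cbind() pairs rows purely by position, this only works if the rows line up one-to-one. A GFF usually has several features per scaffold (as in the question), so in that case look each name up instead, e.g. a$V1 <- b$Genome_name[match(a$V1, b$RefSeq_Accession)].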

8.2 years ago
5heikki 11k
See man join.

Basically:

# FIELD1 / FIELD2: number of the column holding the shared ID in FILE1 / FILE2
# -t $'\t' because the files are tab-delimited; -o lists the output fields as FILE.FIELD (1.1,1.2,2.3 is just an example)
# join needs both inputs sorted on the join field, with the same delimiter
join \
-1 $FIELD1 \
-2 $FIELD2 \
-t $'\t' \
-o 1.1,1.2,2.3 \
<(sort -t $'\t' -k$FIELD1,$FIELD1 FILE1) \
<(sort -t $'\t' -k$FIELD2,$FIELD2 FILE2)
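For the files in the question, the join template could be filled in roughly as follows. This is a sketch, not tested on the real data: it assumes bash (for process substitution), a scaffold table with exactly one header line, and a GFF with the usual nine columns. The function name rename_scaffolds_join is made up for illustration. The output comes out sorted by scaffold ID, and GFF comment lines, as well as features on scaffolds missing from the table, are dropped, because join only prints paired lines.

```shell
#!/usr/bin/env bash
# Sketch: swap GFF column 1 (RefSeq accession) for Genome_name using join.
# Assumes tab-separated files; the scaffold table has one header line and
# the columns Assembly, Genome_name, RefSeq_Accession, ... as in the question.
# rename_scaffolds_join is a hypothetical name.
rename_scaffolds_join() {
  local gff=$1 table=$2
  # sort and join must agree on collation, so force the C locale for both.
  # -1 1 -2 3: GFF column 1 matches table column 3.
  # -o: Genome_name (2.2), then GFF columns 2..9.
  LC_ALL=C join \
    -t $'\t' \
    -1 1 -2 3 \
    -o 2.2,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9 \
    <(grep -v '^#' "$gff" | LC_ALL=C sort -s -t $'\t' -k1,1) \
    <(tail -n +2 "$table" | LC_ALL=C sort -t $'\t' -k3,3)
}
```

Usage, with hypothetical file names: rename_scaffolds_join file1.gff file2.tsv > renamed.gff. The -s (stable) option on the GFF sort keeps features on the same scaffold in their original relative order.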
8.2 years ago
H.Hasani ▴ 990

If the two files have at least one column in common, use the merge function in R.
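Outside R, the same key-based matching can be done with an awk lookup table, which needs no sorting and keeps the GFF in its original line order. A minimal sketch, assuming the files look like the ones in the question (tab-separated, one header line in the scaffold table, RefSeq_Accession in column 3 and Genome_name in column 2); rename_scaffolds_awk is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: rename GFF scaffolds with an awk lookup table, keeping line order.
# Assumes tab-separated files; the scaffold table has one header line,
# Genome_name in column 2 and RefSeq_Accession in column 3.
# rename_scaffolds_awk is a hypothetical name.
rename_scaffolds_awk() {
  local gff=$1 table=$2
  awk -F'\t' -v OFS='\t' '
    # first file (the table): skip its header, remember RefSeq -> Genome_name
    NR == FNR { if (FNR > 1) name[$3] = $2; next }
    # second file (the GFF): replace column 1 when a mapping exists;
    # comment lines and unknown scaffolds pass through unchanged
    ($1 in name) { $1 = name[$1] }
    { print }
  ' "$table" "$gff"
}
```

Usage, with hypothetical file names: rename_scaffolds_awk file1.gff file2.tsv > renamed.gff. Unlike an inner join, features on scaffolds that are missing from the table keep their original name instead of being dropped.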
