How to merge two files, genotype and ped, in Linux?
8.0 years ago
mm ▴ 20

How can I merge two files, a genotype file and a pre-ped file, in Linux? My sample files are as follows.

pre-ped file:

S949C08 111071 900533 900409 Susceptible 2
S949G08 111064 900533 900469 Susceptible 2
S949E09 111051 910054 890231 Susceptible 2
S949209 111049 910054 910087 Susceptible 2
R949C06 111034 920283 920207 Susceptible 1

genotype file: One example of an animal's genotype

R949C06 TC TT CC TC CC TT GG CC AG TT AA GG AA TT CC TC -- CC TC GC TC AA TC AG AG TC AA AA AA AG TC CC AT AA TT AA TT GG AA TC AG TC TA TA AG -- TG TT -- AA -- TT TT CC AG GG TC GG CC AA -- CC AC AA GG -- AA CC CC AA TC AG AA TC CG TT GG CC TT GG AG GG TT AA CC AA CC TC AG GG TC AG AG AG GC CC AG GG AA TC GG AA AA GG TC AG CC AG CC TC AA CC CC CC GG CC AG CC CC AG AC CC GG TT CC AG CC AA TC TT GG AG GG CC TC TC AA GG CC TC AG AG TT GG TG AG AA TG
linux plink • 2.6k views

man join, noting that you need to sort the files before use.

I do not understand what you are saying.

Devon is asking you to read the manual for the Linux command 'join': https://linux.die.net/man/1/join

"join - join lines of two files on a common field "

noting that you need to sort the files before use.

"Important: FILE1 and FILE2 must be sorted on the join fields. "
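
As a rough sketch of what that looks like in practice (the file names preped.txt and geno.txt are made up here, and the genotype line is shortened to three markers), you could sort both files on the ID column and then join on it:

```shell
# Hypothetical file names; small stand-ins for the pre-ped and genotype files.
cat > preped.txt <<'EOF'
S949C08 111071 900533 900409 Susceptible 2
R949C06 111034 920283 920207 Susceptible 1
EOF
cat > geno.txt <<'EOF'
R949C06 TC TT CC
EOF

# join requires both inputs sorted on the join field (column 1 here).
# LC_ALL=C makes sort and join agree on the collation order.
LC_ALL=C sort -k1,1 preped.txt > preped.sorted.txt
LC_ALL=C sort -k1,1 geno.txt > geno.sorted.txt

# Join on the first field of each file; only IDs present in both are printed.
LC_ALL=C join -1 1 -2 1 preped.sorted.txt geno.sorted.txt > merged_join.txt
cat merged_join.txt
```

With this input the output line starts with the shared ID, followed by the remaining pre-ped columns and then the genotypes. Animals that appear in only one of the two files are dropped.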

The SNP chip ID is common to both files; that is the field I want to join on.

6.9 years ago
mittu1602 ▴ 200

You can try an awk one-liner:

cat Test1.txt

S949C08 111071 900533 900409 Susceptible 2
S949G08 111064 900533 900469 Susceptible 2
S949E09 111051 910054 890231 Susceptible 2
S949209 111049 910054 910087 Susceptible 2
R949C06 111034 920283 920207 Susceptible 1

cat Test2.txt

R949C06 TC TT CC TC CC TT GG CC AG TT AA GG AA TT CC TC -- CC TC GC TC AA TC AG AG TC AA AA AA AG TC CC AT AA TT AA TT GG AA TC AG TC TA TA AG -- TG TT -- AA -- TT TT CC AG GG TC GG CC AA -- CC AC AA GG -- AA CC CC AA TC AG AA TC CG TT GG CC TT GG AG GG TT AA CC AA CC TC AG GG TC AG AG AG GC CC AG GG AA TC GG AA AA GG TC AG CC AG CC TC AA CC CC CC GG CC AG CC CC AG AC CC GG TT CC AG CC AA TC TT GG AG GG CC TC TC AA GG CC TC AG AG TT GG TG AG AA TG

awk 'FNR==NR{a[$1]=$2 FS $3 FS $4 FS $5 FS $6;next}{ print $0, a[$1]}' Test1.txt Test2.txt > result.txt

cat result.txt

R949C06 TC TT CC TC CC TT GG CC AG TT AA GG AA TT CC TC -- CC TC GC TC AA TC AG AG TC AA AA AA AG TC CC AT AA TT AA TT GG AA TC AG TC TA TA AG -- TG TT -- AA -- TT TT CC AG GG TC GG CC AA -- CC AC AA GG -- AA CC CC AA TC AG AA TC CG TT GG CC TT GG AG GG TT AA CC AA CC TC AG GG TC AG AG AG GC CC AG GG AA TC GG AA AA GG TC AG CC AG CC TC AA CC CC CC GG CC AG CC CC AG AC CC GG TT CC AG CC AA TC TT GG AG GG CC TC TC AA GG CC TC AG AG TT GG TG AG AA TG 111034 920283 920207 Susceptible 1
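
Note that the result above puts the pre-ped columns after the genotypes, and a genotype line whose ID is missing from Test1.txt would still be printed, just without those columns. PLINK's PED format expects the sample columns before the genotypes, so if that is the goal, a variant along these lines might be closer. This is only a sketch: the genotype lines are shortened, the ID X000000 is invented to show an unmatched animal, result_ped_first.txt is a made-up output name, and it assumes the pre-ped columns are already in the order you need.

```shell
# Shortened stand-ins for the two files; X000000 is an invented unmatched ID.
cat > Test1.txt <<'EOF'
S949C08 111071 900533 900409 Susceptible 2
R949C06 111034 920283 920207 Susceptible 1
EOF
cat > Test2.txt <<'EOF'
R949C06 TC TT CC
X000000 AA GG TT
EOF

# First pass (Test1.txt): remember each full pre-ped line, keyed by ID.
# Second pass (Test2.txt): for IDs seen in Test1.txt, strip the ID from the
# genotype line and print "pre-ped line + genotypes"; other IDs are skipped.
awk 'FNR==NR { ped[$1] = $0; next }
     ($1 in ped) { id = $1; sub(/^[^ \t]+[ \t]+/, ""); print ped[id], $0 }' \
    Test1.txt Test2.txt > result_ped_first.txt
cat result_ped_first.txt
```

Unlike join, this does not need sorted input, and the output follows the order of the genotype file.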