Combining files side by side (column-wise) based on chromosome and position
Asked by Gene, 5.0 years ago

I have multiple files in the format: chr position value.

I want to combine them into the format "chr", "position", "samp1", "samp2", "samp3", "samp4", ...

For example:

Samp1:

chr position value
1   3774318 1
1   3774319 1
1   3775200 2
1   3775201 7
1   3775202 70
1   3775203 7
1   3775204 270
1   3775205 3
1   3775206 5

Samp2:

chr position value
1   3775200 1
1   3775201 1
1   3775202 10
1   3775203 1
1   3775204 12
1   3775205 1
1   3775206 13
1   3775207 1
1   3775208 1
1   3775209 18

and so on...

Desired output format (the values below are random placeholders):

chr, position, value-samp1, value-samp2, value-samp3, value-samp4
1 50204 2 17 5 2
1 50205 2 17 5 2
1 50206 2 18 5 2
1 50207 2 19 5 3
1 50208 3 19 5 3
1 50209 3 19 5 3

Or, for the two samples above (0 where a sample has no value at a position):

chr position samp1 samp2
1 3774318 1 0
1 3774319 1 0
1 3775200 2 1
1 3775201 7 1
1 3775202 70 10
1 3775203 7 1
1 3775204 270 12
1 3775205 3 1
1 3775206 5 13
1 3775207 0 1
1 3775208 0 1
1 3775209 0 18

I tried join, merge and cat, but none of them worked as I expected. I am a beginner. Do you have any ideas how this can be done?

Tags: sequencing, coverage, Assembly

It sounds like bedtools intersect could help: duplicate the position column so that each row has a start and an end column, probably use -loj (left outer join), then remove the unnecessary columns.
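The preparation step of this suggestion can be sketched as below. One adjustment to "duplicate the position column": BED intervals are 0-based and half-open, so a 1-based position p is written as start p-1 and end p rather than p and p. That the input positions are 1-based, that each file has one header line, and the file names all_positions.bed and samp1.bed are assumptions for illustration; the bedtools call is shown only as a comment.

```shell
#!/usr/bin/env bash
# Sketch of the preparation step only. Assumptions: whitespace-delimited input,
# one header line per file, 1-based positions; all_positions.bed and samp1.bed
# in the final comment are hypothetical file names.
set -euo pipefail

# to_bed FILE: drop the header and print chr, start, end, value (tab-separated).
# BED starts are 0-based and ends are exclusive, so position p becomes [p-1, p).
to_bed() {
  awk 'BEGIN { OFS = "\t" } FNR > 1 { print $1, $2 - 1, $2, $3 }' "$1"
}

# Demo on the first two rows of the question's Samp1 data
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf 'chr position value\n1 3774318 1\n1 3774319 1\n' > "$demo"
to_bed "$demo"

# Each converted file could then be left-outer-joined against a BED of all positions:
#   bedtools intersect -a all_positions.bed -b samp1.bed -loj
```

The demo prints `1	3774317	3774318	1` and `1	3774318	3774319	1`. With -loj, positions missing from the B file are still reported, with placeholder B columns, which can then be cut away or replaced.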


I don't think there is a one-click tool for this job. It will probably require some coding, e.g. in Python or R.

Answer, posted 5.0 years ago:

Assuming tab-delimited files, first create a file of unique keys (chr_position), then join each file against it:

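# Assumes GNU sed/coreutils, tab-delimited input.*.txt files without header lines,
# and chromosome names that contain no "_"
export LC_ALL=C   # sort and join must agree on the same collation order

# create the unique keys: every chr_position seen in any file
cat input.*.txt | cut -f 1,2 | tr "\t" "_" | sort -u > keys.txt

# for each file, attach its value to every key, filling missing positions with NA
for F in input.*.txt
do
    sed 's/\t/_/' "$F" | sort -t $'\t' -k1,1 > "${F}.2"
    join -t $'\t' -e NA -a 1 -1 1 -2 1 -o "1.1,2.2" keys.txt "${F}.2" > "${F}.3"
done

# join all files, adding one value column per input file
for F in input.*.txt
do
    join -t $'\t' -1 1 -2 1 keys.txt "${F}.3" > tmp
    mv tmp keys.txt
done

# dump result: split the key back into chr and position, ordered numerically by position
tr "_" "\t" < keys.txt | sort -t $'\t' -k1,1 -k2,2n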
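If the 0-filled layout from the question is preferred over NA, the merge can also be done in a single awk pass that keeps every position in memory. This is a different technique from the join pipeline above, sketched under assumptions: whitespace-delimited files with exactly one header line each, sample names taken from the file names, and enough memory for all positions. The function name merge_samples is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (not the answer's method): merge "chr position value" files in one awk
# pass, writing 0 where a sample has no value at a position.
# Assumptions: one header line per file; sample names come from file names.
set -euo pipefail

# merge_samples FILE...: print chr, position and one value column per file
merge_samples() {
  awk '
    FNR == 1 {                          # header of each file starts a new sample
      n++
      name[n] = FILENAME
      sub(/^.*\//, "", name[n])         # drop the directory part
      sub(/\.txt$/, "", name[n])        # samp1.txt -> samp1
      next
    }
    {
      key = $1 SUBSEP $2
      if (!(key in seen)) { seen[key] = 1; order[++m] = key }
      val[key, n] = $3
    }
    END {
      printf "chr\tposition"
      for (i = 1; i <= n; i++) printf "\t%s", name[i]
      printf "\n"
      for (j = 1; j <= m; j++) {
        split(order[j], kp, SUBSEP)
        printf "%s\t%s", kp[1], kp[2]
        for (i = 1; i <= n; i++) {
          if ((order[j], i) in val) v = val[order[j], i]; else v = 0
          printf "\t%s", v
        }
        printf "\n"
      }
    }' "$@" |
  { IFS= read -r header; printf '%s\n' "$header"; sort -t $'\t' -k1,1 -k2,2n; }
}

# Demo on a few rows of the question's two samples, in a scratch directory
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'chr position value\n1 3775205 3\n1 3775206 5\n' > "$dir/samp1.txt"
printf 'chr position value\n1 3775206 13\n1 3775207 1\n' > "$dir/samp2.txt"
merge_samples "$dir/samp1.txt" "$dir/samp2.txt"
```

The demo prints a header `chr position samp1 samp2` followed by `1 3775205 3 0`, `1 3775206 5 13` and `1 3775207 0 1` (tab-separated). Columns follow the order of the file arguments, so with a glob such as samp*.txt, samp10.txt would come before samp2.txt.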

Thanks a lot. This was really helpful, and it works.
