Question

How to merge two files into two columns using bash command lines

3.5 years ago

FadyNabil ▴ 20

I have these two files.

The first one looks like this:

################################
## Counts of transcripts, etc.
################################
Total trinity 'genes':  2478
Total trinity transcripts:  2615
Percent GC: 54.11

########################################
Stats based on ALL transcript contigs:
########################################

    Contig N10: 4830
    Contig N20: 2877
    Contig N30: 1729
    Contig N40: 1181
    Contig N50: 819

    Median contig length: 315
    Average contig: 576.10
    Total assembled bases: 1506492


#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

    Contig N10: 3232
    Contig N20: 1828
    Contig N30: 1181
    Contig N40: 835
    Contig N50: 607

    Median contig length: 306
    Average contig: 497.60
    Total assembled bases: 1233051

The second one looks like this:

################################
## Counts of transcripts, etc.
################################
Total trinity 'genes':  2467
Total trinity transcripts:  2516
Percent GC: 53.97

########################################
Stats based on ALL transcript contigs:
########################################

    Contig N10: 3700
    Contig N20: 2225
    Contig N30: 1448
    Contig N40: 981
    Contig N50: 681

    Median contig length: 309
    Average contig: 526.77
    Total assembled bases: 1325345


#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

    Contig N10: 3183
    Contig N20: 1912
    Contig N30: 1235
    Contig N40: 856
    Contig N50: 607

    Median contig length: 308
    Average contig: 500.83
    Total assembled bases: 1235540

My desired output file should contain two columns.

The first column should look like this:

################################
## Counts of transcripts, etc.
################################
Total trinity 'genes':  2478 2467
Total trinity transcripts:  2615 2516
Percent GC: 54.11 53.97

The second column should look like this:

########################################
Stats based on ALL transcript contigs:
########################################

    Contig N10: 4830 3700
    Contig N20: 2877 2225
    Contig N30: 1729 1448
    Contig N40: 1181 981
    Contig N50: 819 681

    Median contig length: 315 309
    Average contig: 576.10 526.77
    Total assembled bases: 1506492 1325345


#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

    Contig N10: 3232 3183
    Contig N20: 1828 1912
    Contig N30: 1181 1235
    Contig N40: 835 856
    Contig N50: 607 607

    Median contig length: 306 308
    Average contig: 497.60 500.83
    Total assembled bases: 1233051 1235540

I tried this command:

paste file_1 file_2 > outputfile.txt

but it does not give me what I want.

assembly fasta trinity • 1.2k views

updated 3.5 years ago by husensofteng ▴ 410 • written 3.5 years ago by FadyNabil ▴ 20

Built-in utilities cannot do this for you, as you're looking at a custom operation. You will need to write your own script. You may want to follow this approach:

1. Split the first file using : as the delimiter and print the 1st field only.
2. From both files, print the second field with : as the delimiter, then paste these output streams onto the output stream from step 1.

Try the above and fine-tune it to get your desired output.
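
The steps above could be sketched roughly as follows. This is only a minimal sketch, not a tested solution for every Trinity stats file: the function name merge_by_paste is hypothetical, it keeps the label and value of the first file together instead of splitting them, and it assumes that both files list the same "label: number" lines in the same order. Header and blank lines are dropped.

```bash
# Hypothetical helper: paste the values of the second file next to the
# "label: value" lines of the first file.
merge_by_paste() {
    # $1 and $2 are the two stats files.
    tmp=$(mktemp) || return 1

    # Keep only the lines of the first file that look like "label: number".
    grep -E ':[[:space:]]*[0-9]' "$1" > "$tmp"

    # From the second file, keep the same kind of lines, take the field
    # after ":", strip leading blanks, and paste it as an extra column.
    grep -E ':[[:space:]]*[0-9]' "$2" \
        | cut -d: -f2 \
        | sed 's/^[[:space:]]*//' \
        | paste -d' ' "$tmp" -

    rm -f "$tmp"
}
```

It would be called as: merge_by_paste file_1 file_2 > outputfile.txt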

EDIT: You mention two "columns" in your desired output, and separate blocks of lines into those two columns. Are you describing your requirement correctly? Do you need the stats in separate columns or just need separate columns for the two individual files' content?

3.5 years ago by Ram 44k

I need the stats in separate columns.

3.5 years ago by FadyNabil ▴ 20

You're going to have to write custom awk/python code to do this. Try some scripting on your own and ask for help if you run into difficulties.
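
As a starting point, a custom awk script along these lines might do it. This is a sketch under a strong assumption: both files have exactly the same layout, line for line, as in the question. The function name merge_stats is hypothetical. Header and blank lines are taken from the first file unchanged, and on every "label: number" line the value from the second file is appended.

```bash
# Hypothetical helper: merge two stats files that share the same layout.
merge_stats() {
    awk '
        # First file: remember every line by its line number.
        NR == FNR { first[FNR] = $0; next }

        # Second file: start from the matching line of the first file.
        {
            line = first[FNR]
            # On "label: number" lines, append the value of this file.
            if ($0 ~ /:[ \t]*[0-9]/) {
                value = $0
                sub(/^.*:[ \t]*/, "", value)
                line = line " " value
            }
            print line
        }
    ' "$1" "$2"
}
```

It would be called as: merge_stats file_1 file_2 > outputfile.txt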

3.5 years ago by Ram 44k

score 0 · Answer 1 · 2021-06-24

I sometimes use bedtools groupby for such tasks, but your file needs proper columns for that to work (no extra spaces, and tab-separated columns). Here is a possible script that would need some modifications on your side:

cat file1.txt file2.txt | awk '{if($0~/:[ \t]*[0-9]/) {gsub(" ",""); gsub(":", "\t"); print $0}}' | sort -s -k1,1 | bedtools groupby -i stdin -g 1 -c 2 -o collapse
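
One modification that is probably needed: labels such as "Contig N10" occur in both the ALL and the LONGEST ISOFORM sections, so grouping on the label alone mixes the two sections. A possible awk-only variant that tags each label with its section is sketched below. The function name stats_table and the section tags counts, all and longest are hypothetical, and the header patterns assume the exact header wording shown in the question. The output is tab-separated: section, label, then one value per input file.

```bash
# Hypothetical helper: build a tab-separated table keyed by section and label.
stats_table() {
    awk -F':' -v OFS='\t' '
        # Header lines set a short tag for the labels that follow.
        /Counts of transcripts/  { section = "counts";  next }
        /ALL transcript contigs/ { section = "all";     next }
        /LONGEST ISOFORM/        { section = "longest"; next }

        # "label: number" lines: trim blanks and collect values per key.
        /:[ \t]*[0-9]/ {
            label = $1
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", label)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            key = section OFS label
            if (key in row) {
                row[key] = row[key] OFS value
            } else {
                order[++n] = key      # remember first-seen order
                row[key] = value
            }
        }

        # Print the rows in the order the keys first appeared.
        END { for (i = 1; i <= n; i++) print order[i], row[order[i]] }
    ' "$@"
}
```

It would be called as: stats_table file1.txt file2.txt > outputfile.txt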