3.4 years ago
FadyNabil
I have these two files.
The first one looks like this:
################################
## Counts of transcripts, etc.
################################
Total trinity 'genes': 2478
Total trinity transcripts: 2615
Percent GC: 54.11
########################################
Stats based on ALL transcript contigs:
########################################
Contig N10: 4830
Contig N20: 2877
Contig N30: 1729
Contig N40: 1181
Contig N50: 819
Median contig length: 315
Average contig: 576.10
Total assembled bases: 1506492
#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################
Contig N10: 3232
Contig N20: 1828
Contig N30: 1181
Contig N40: 835
Contig N50: 607
Median contig length: 306
Average contig: 497.60
Total assembled bases: 1233051
The second one looks like this:
################################
## Counts of transcripts, etc.
################################
Total trinity 'genes': 2467
Total trinity transcripts: 2516
Percent GC: 53.97
########################################
Stats based on ALL transcript contigs:
########################################
Contig N10: 3700
Contig N20: 2225
Contig N30: 1448
Contig N40: 981
Contig N50: 681
Median contig length: 309
Average contig: 526.77
Total assembled bases: 1325345
#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################
Contig N10: 3183
Contig N20: 1912
Contig N30: 1235
Contig N40: 856
Contig N50: 607
Median contig length: 308
Average contig: 500.83
Total assembled bases: 1235540
My desired output file should contain two columns.
The first column should look like this:
################################
## Counts of transcripts, etc.
################################
Total trinity 'genes': 2478 2467
Total trinity transcripts: 2615 2516
Percent GC: 54.11 53.97
The second column should look like this:
########################################
Stats based on ALL transcript contigs:
########################################
Contig N10: 4830 3700
Contig N20: 2877 2225
Contig N30: 1729 1448
Contig N40: 1181 981
Contig N50: 819 681
Median contig length: 315 309
Average contig: 576.10 526.77
Total assembled bases: 1506492 1325345
#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################
Contig N10: 3232 3183
Contig N20: 1828 1912
Contig N30: 1181 1235
Contig N40: 835 856
Contig N50: 607 607
Median contig length: 306 308
Average contig: 497.60 500.83
Total assembled bases: 1233051 1235540
I tried this command:
paste file_1 file_2 > outputfile.txt
but it does not give me what I want.
Built-in utilities cannot do this for you, as you're looking at a custom operation. You will need to write your own script. You may want to follow this approach:
1. Use ":" as the delimiter and print the 1st field only.
2. For each file, use ":" as the delimiter and keep the values, then paste these output streams to the output stream from step 1.
Try the above and fine-tune it to get your desired output.
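The approach above (split each line on ":", keep the label from one file, and append the value from the other) can be sketched in Python roughly as follows. This is only a minimal sketch, not the answerer's own script: the function name `merge_stats` is made up for illustration, and it assumes both reports contain the same lines in the same order, as in the two example files.

```python
import sys
from pathlib import Path


def merge_stats(text_a: str, text_b: str) -> str:
    """Merge two stats reports line by line (hypothetical helper).

    Assumption: both reports have identical labels in identical order.
    """
    merged = []
    for line_a, line_b in zip(text_a.splitlines(), text_b.splitlines()):
        label_a, sep_a, _ = line_a.partition(":")
        label_b, _, value_b = line_b.partition(":")
        value_b = value_b.strip()
        if sep_a and value_b:
            # A "label: value" line: refuse to merge mismatched labels.
            if label_a.strip() != label_b.strip():
                raise ValueError(f"label mismatch: {label_a!r} vs {label_b!r}")
            merged.append(f"{line_a.rstrip()} {value_b}")
        else:
            # A '####' ruler or a section heading: keep the first file's copy.
            merged.append(line_a.rstrip())
    return "\n".join(merged)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python merge_stats.py file_1 file_2 > outputfile.txt
    first, second = (Path(p).read_text() for p in sys.argv[1:3])
    print(merge_stats(first, second))
```

Heading lines such as "Stats based on ALL transcript contigs:" end in a colon but carry no value, so they fall through to the second branch and are printed once.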
EDIT: You mention two "columns" in your desired output, and separate blocks of lines into those two columns. Are you describing your requirement correctly? Do you need the stats in separate columns or just need separate columns for the two individual files' content?
I need the stats in separate columns.
You're going to have to write custom awk/python code to do this. Try some scripting on your own and ask for help if you run into difficulties.
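As a possible starting point for such custom code, here is a hedged Python sketch that puts each file's stats in its own tab-separated column. It matches values by section heading and label instead of by line position. The names `parse_report` and `tabulate` are hypothetical, and the "NA" placeholder for a stat missing from one file is an assumption, not something requested in the thread.

```python
def parse_report(text):
    """Return {(section, label): value} for one report (hypothetical helper)."""
    stats = {}
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"#"}:
            continue  # skip blank lines and pure '#####' rulers
        label, sep, value = line.partition(":")
        value = value.strip()
        if sep and value:
            stats[(section, label.strip())] = value
        else:
            # A heading line, e.g. "## Counts of transcripts, etc."
            section = line.lstrip("# ").rstrip(":").strip()
    return stats


def tabulate(reports):
    """Build tab-separated rows: section, label, then one value per report."""
    parsed = [parse_report(text) for text in reports]
    keys = []
    for stats in parsed:
        for key in stats:
            if key not in keys:
                keys.append(key)  # preserve first-seen order
    rows = []
    for section, label in keys:
        # "NA" marks a stat that a report lacks (an assumed convention).
        values = [stats.get((section, label), "NA") for stats in parsed]
        rows.append("\t".join([section, label, *values]))
    return "\n".join(rows)
```

Calling `tabulate([open("file_1").read(), open("file_2").read()])` would give one row per stat, which can then be opened in a spreadsheet or reshaped further. Because rows are keyed on the section name, the two "Contig N50" entries (all contigs versus longest isoform) stay distinct.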