Picard CollectAlignmentMetrics Output
7.4 years ago

I am confused by the output that Picard CollectAlignmentMetrics is giving me. I think the problem is the formatting.

## METRICS CLASS    picard.analysis.AlignmentSummaryMetrics
CATEGORY    TOTAL_READS PF_READS    PCT_PF_READS    PF_NOISE_READS  PF_READS_ALIGNED    PCT_PF_READS_ALIGNED    PF_ALIGNED_BASES    PF_HQ_ALIGNED_READS PF_HQ_ALIGNED_BASES PF_HQ_ALIGNED_Q20_BASES PF_HQ_MEDIAN_MISMATCHES PF_MISMATCH_RATE    PF_HQ_ERROR_RATE    PF_INDEL_RATE   MEAN_READ_LENGTH    READS_ALIGNED_IN_PAIRS  PCT_READS_ALIGNED_IN_PAIRS  BAD_CYCLES  STRAND_BALANCE  PCT_CHIMERAS    PCT_ADAPTER SAMPLE  LIBRARY READ_GROUP
FIRST_OF_PAIR   1000000 1000000 1   0   1000000 1   99990119    999271  99917232    99487798    0   0.003925    0.003924    0.000091    100 1000000 1   0   0.50084 0   0
SECOND_OF_PAIR  1000000 1000000 1   0   1000000 1   99989751    999225  99912264    99481794    0   0.003919    0.003919    0.000091    100 1000000 1   0   0.49916 0   0
PAIR    2000000 2000000 1   0   2000000 1   199979870   1998496 199829496   198969592   0   0.003922    0.003922    0.000092    100 2000000 1   0   0.5 0   0

Is there a way to format this so that it is readable?

Thanks


You can add formatting by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar.

I'm not sure what the output should look like, so I haven't changed your post.

7.4 years ago

Using awk to print one metric per line:

 curl -Ls "https://raw.githubusercontent.com/ewels/MultiQC_TestData/master/data/modules/picard/CollectMultipleMetrics/A1.sorted.dup.recal.all.metrics.alignment_summary_metrics" |\
awk '/^CATEGORY/ {split($0,header);n=1;next; } {if(n!=1) next; for(i=2;i<=NF;++i) printf("%s\t%s\t%s\n",$1,header[i],$i);}' |\
column -t

Output:

FIRST_OF_PAIR   TOTAL_READS                 48300305
FIRST_OF_PAIR   PF_READS                    48300305
FIRST_OF_PAIR   PCT_PF_READS                1
FIRST_OF_PAIR   PF_NOISE_READS              0
FIRST_OF_PAIR   PF_READS_ALIGNED            48290537
FIRST_OF_PAIR   PCT_PF_READS_ALIGNED        0.999798
FIRST_OF_PAIR   PF_ALIGNED_BASES            6040566328
FIRST_OF_PAIR   PF_HQ_ALIGNED_READS         47217927
FIRST_OF_PAIR   PF_HQ_ALIGNED_BASES         5910135054
FIRST_OF_PAIR   PF_HQ_ALIGNED_Q20_BASES     5804787451
FIRST_OF_PAIR   PF_HQ_MEDIAN_MISMATCHES     0
FIRST_OF_PAIR   PF_MISMATCH_RATE            0.00187
FIRST_OF_PAIR   PF_HQ_ERROR_RATE            0.001761
FIRST_OF_PAIR   PF_INDEL_RATE               0.000093
FIRST_OF_PAIR   MEAN_READ_LENGTH            125.537288
FIRST_OF_PAIR   READS_ALIGNED_IN_PAIRS      48246679
FIRST_OF_PAIR   PCT_READS_ALIGNED_IN_PAIRS  0.999092
FIRST_OF_PAIR   BAD_CYCLES                  0
FIRST_OF_PAIR   STRAND_BALANCE              0.44083
FIRST_OF_PAIR   PCT_CHIMERAS                0.001303
FIRST_OF_PAIR   PCT_ADAPTER                 0.000001
SECOND_OF_PAIR  TOTAL_READS                 48300305
SECOND_OF_PAIR  PF_READS                    48300305
SECOND_OF_PAIR  PCT_PF_READS                1
SECOND_OF_PAIR  PF_NOISE_READS              0
SECOND_OF_PAIR  PF_READS_ALIGNED            48250591
SECOND_OF_PAIR  PCT_PF_READS_ALIGNED        0.998971
SECOND_OF_PAIR  PF_ALIGNED_BASES            6025327891
SECOND_OF_PAIR  PF_HQ_ALIGNED_READS         47170746
SECOND_OF_PAIR  PF_HQ_ALIGNED_BASES         5894822755
SECOND_OF_PAIR  PF_HQ_ALIGNED_Q20_BASES     5697735258
SECOND_OF_PAIR  PF_HQ_MEDIAN_MISMATCHES     0
SECOND_OF_PAIR  PF_MISMATCH_RATE            0.002515
SECOND_OF_PAIR  PF_HQ_ERROR_RATE            0.002401
SECOND_OF_PAIR  PF_INDEL_RATE               0.000097
SECOND_OF_PAIR  MEAN_READ_LENGTH            125.429941
SECOND_OF_PAIR  READS_ALIGNED_IN_PAIRS      48246679
SECOND_OF_PAIR  PCT_READS_ALIGNED_IN_PAIRS  0.999919
SECOND_OF_PAIR  BAD_CYCLES                  0
SECOND_OF_PAIR  STRAND_BALANCE              0.559123
SECOND_OF_PAIR  PCT_CHIMERAS                0.001303
SECOND_OF_PAIR  PCT_ADAPTER                 0.000001
PAIR            TOTAL_READS                 96600610
PAIR            PF_READS                    96600610
PAIR            PCT_PF_READS                1
PAIR            PF_NOISE_READS              0
PAIR            PF_READS_ALIGNED            96541128
PAIR            PCT_PF_READS_ALIGNED        0.999384
PAIR            PF_ALIGNED_BASES            12065894219
PAIR            PF_HQ_ALIGNED_READS         94388673
PAIR            PF_HQ_ALIGNED_BASES         11804957809
PAIR            PF_HQ_ALIGNED_Q20_BASES     11502522709
PAIR            PF_HQ_MEDIAN_MISMATCHES     0
PAIR            PF_MISMATCH_RATE            0.002192
PAIR            PF_HQ_ERROR_RATE            0.002081
PAIR            PF_INDEL_RATE               0.000095
PAIR            MEAN_READ_LENGTH            125.483615
PAIR            READS_ALIGNED_IN_PAIRS      96493358
PAIR            PCT_READS_ALIGNED_IN_PAIRS  0.999505
PAIR            BAD_CYCLES                  0
PAIR            STRAND_BALANCE              0.499952
PAIR            PCT_CHIMERAS                0.001303
PAIR            PCT_ADAPTER                 0.000001
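For readers new to awk, the one-liner above can be unpacked into a commented form. This is only a sketch: the function name `transpose_metrics` is made up for illustration, and the extra rule that skips blank lines and later `#` lines is an added assumption about what else the file may contain, not part of the original command.

```shell
# transpose_metrics is a hypothetical helper name, not part of Picard.
# Usage: transpose_metrics metrics.txt | column -t
transpose_metrics() {
  awk '
    # Header row: remember the column names and start processing.
    /^CATEGORY/ { split($0, header); seen = 1; next }
    # Ignore everything that comes before the header row.
    !seen { next }
    # Ignore blank lines and any later lines starting with "#".
    NF == 0 || /^#/ { next }
    # Data row: print category, metric name and value, one metric per line.
    { for (i = 2; i <= NF; ++i) printf("%s\t%s\t%s\n", $1, header[i], $i) }
  ' "$1"
}
```

Because awk splits on whitespace by default, the empty SAMPLE, LIBRARY and READ_GROUP columns at the end of each row are simply not printed.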

Thank you. What does the curl do at the beginning of your code?


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.


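curl just downloads an example summary file I found on the web (-L follows redirects, -s hides the progress output). With your own file you don't need it.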


Sorry to ask another question, but I am confused as to how to actually use the above command. I am new to Linux. My filename is metrics.txt, so would I put that at the end of the awk command?


"so would I put that at the end of the awk command?"

Yes, or just use 'cat':

cat your.file.metrics |\
awk '/^CATEGORY/ {split($0,header);n=1;next; } {if(n!=1) next; for(i=2;i<=NF;++i) printf("%s\t%s\t%s\n",$1,header[i],$i);}'  |\
column -t

Another question. It does not recognise the column -t section


Strange, column is quite common.

But you can just remove the 'column -t' part. It is only there to align the text into columns. Or install 'column': https://unix.stackexchange.com/questions/330506/
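If 'column' is not available, awk itself can pad the fields. The sketch below is one possible way to do it, not the only one: the function name `align_metrics` is hypothetical, and the widths 16 and 28 are assumptions chosen to fit the longest category and metric names shown in this thread.

```shell
# align_metrics is a hypothetical helper name; the widths 16 and 28 are assumptions.
# Usage: align_metrics metrics.txt
align_metrics() {
  awk '
    # Header row: remember the column names.
    /^CATEGORY/ { split($0, header); seen = 1; next }
    # Skip lines before the header, blank lines and "#" lines.
    !seen || NF == 0 || /^#/ { next }
    # Left-justify category and metric name in fixed-width columns.
    { for (i = 2; i <= NF; ++i) printf("%-16s%-28s%s\n", $1, header[i], $i) }
  ' "$1"
}
```

If a newer Picard version adds a longer metric name, the second width would need to be increased to keep the columns aligned.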


Thanks, it seems to be working now. Is there a way to change the format in the actual text file? Not the version printed to the terminal
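One common way to get the reformatted text into a file is to redirect the command's output with '>'. The sketch below assumes a hypothetical helper name, `write_transposed_metrics`, and placeholder file names. It writes to a second file instead of overwriting the original, because redirecting into the same file that awk is reading would empty it before it is read.

```shell
# write_transposed_metrics is a hypothetical helper; the file names are placeholders.
# Usage: write_transposed_metrics metrics.txt metrics.long.txt
write_transposed_metrics() {
  in_file=$1
  out_file=$2
  awk '
    # Header row: remember the column names.
    /^CATEGORY/ { split($0, header); seen = 1; next }
    # Skip lines before the header, blank lines and "#" lines.
    !seen || NF == 0 || /^#/ { next }
    # One line per metric: category, metric name, value.
    { for (i = 2; i <= NF; ++i) printf("%s\t%s\t%s\n", $1, header[i], $i) }
  ' "$in_file" > "$out_file"
}
```

The output file is tab-separated, so it can still be piped through 'column -t' later for viewing.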
