Question

454NewblerMetrics.txt format

14.5 years ago

Yannick Wurm ★ 2.5k

Assembly with Newbler produces a summary file, 454NewblerMetrics.txt, which according to the documentation is in "454 parser file" format. It looks like a simple nested hash structure. If I want to write a parser for it, do I need to start from scratch, or does this format already have a real name (and existing parsers)? An abridged example:

/***************************************************************************
**
**      454 Life Sciences Corporation
**         Newbler Metrics Results
**
**      Date of Assembly: 2010/10/20 14:07:53
**      Project Directory: /home/dee/keller/UHTS/ywurm/2010-09-25-littleB/results/2010-10-12-newblerAssemblies/withoutIllumina/P_2010_10_14_09_12_45_runAssembly
**      Software Release: 2.3  (091027_1459)
**
***************************************************************************/

/*
**  Input information.
*/

runData
{
    file
    {
        path = "/home/dee/keller/UHTS/454/littleB/shotgun/roche/C1172SID11098_Z8_reg2_sff/FW1LDDZ02.sff";

        numberOfReads = 537847, 537843;
        numberOfBases = 173640497, 172588857;
    }
[…]
}

pairedReadData
{
    file
    {
        path = "/home/dee/keller/UHTS/454/littleB/paired/20kb/roche/PairedEnd_RunID26803066_sff_halfplate/FX0RNLM01.sff";

        numberOfReads = 602130, 878875;
        numberOfBases = 163374476, 142729366;
        numWithPairedRead = 286117;
    }
[…]
}

/*
**  Operation metrics.
*/

runMetrics
{
    totalNumberOfReads = 16521360; 
    totalNumberOfBases = 4540313420; 

    numberSearches   = 8409112;
    seedHitsFound    = 1847363485, 219.69;
    overlapsFound    = 1834648575, 218.17, 99.31%;
    overlapsReported = 841507634, 100.07, 45.87%;
    overlapsUsed     = 18834953, 2.24, 2.24%;
}

readAlignmentResults
{
    file
    {
        path = "/home/dee/keller/UHTS/454/littleB/shotgun/roche/C1172SID11098_Z8_reg2_sff/FW1LDDZ02.sff";

        numAlignedReads     = 409425, 76.12%;
        numAlignedBases     = 142627063, 82.64%;
        inferredReadError  = 1.20%, 1707897;
    }
[…]
}

pairedReadResults
{
    file
    {
        path = "/home/dee/keller/UHTS/454/littleB/paired/20kb/roche/PairedEnd_RunID26803066_sff_halfplate/FX0RNLM01.sff";

        numAlignedReads     = 327088, 37.22%;
        numAlignedBases     = 57601622, 40.36%;
        inferredReadError  = 1.65%, 947986;

        numberWithBothMapped  = 78632;
        numWithOneUnmapped    = 38015;
        numWithMultiplyMapped = 167737;
        numWithBothUnmapped   = 1733;
    }
[…]
}

/*
** Consensus distribution information.
*/
consensusDistribution
{
    fullDistribution
    {
        signalBin =  0.0, 7517321;
[…]
}


/*
**  Alignment depths.
*/
alignmentDepths
{
          1 = 7175292;
[…]
    peakDepth           = 8.0;
    estimatedGenomeSize = "567.1 MB";
}

/*
**  Consensus results.
*/
consensusResults
{
    readStatus
    {
        numAlignedReads    = 11606683, 70.25%;
        numAlignedBases    = 3617704329, 79.68%;
        inferredReadError = 1.06%, 38389865;

        numberAssembled = 9954740;
        numberPartial   = 1651943;
        numberSingleton = 858542;
        numberRepeat    = 3751116;
        numberOutlier   = 305019;
        numberTooShort  = 0;
    }

    pairedReadStatus
    {
        numberWithBothMapped   = 1239514;
        numberWithOneUnmapped  = 324133;
        numberMultiplyMapped   = 855454;
        numberWithBothUnmapped = 14981;

        library
        {
            libraryName     = "FX0RNLM01.sff";
            pairDistanceAvg = 3078.3;
            pairDistanceDev = 769.6;
        }

[…]
    }

    scaffoldMetrics
    {
        numberOfScaffolds   = 14940;
        numberOfBases       = 344205862;

        avgScaffoldSize     = 23039;
        N50ScaffoldSize     = 241728;
        largestScaffoldSize = 2015989;
    }

    largeContigMetrics
    {
        numberOfContigs   = 108123;
        numberOfBases     = 336075598;

        avgContigSize     = 3108;
        N50ContigSize     = 5423;
        largestContigSize = 79674;

        Q40PlusBases      = 327642977, 97.49%;
        Q39MinusBases     = 8432621, 2.51%;
    }

    allContigMetrics
    {
        numberOfContigs = 145244;
        numberOfBases   = 346306838;
    }
}

assembly • 3.0k views

updated 14.5 years ago by Daniel Standage 4.1k • written 14.5 years ago by Yannick Wurm ★ 2.5k

score 2 · Answer 1 · 2010-10-22

I wrote a parser for these types of files about a year ago (I can share the code if you would like). The nice thing is that all of the metrics files are in the same format, so I only needed to write a single parser.

I wrote the parser as part of a 454 sample submission and tracking system. Our sysadmins, who now maintain the system, required that I build it using a PHP web framework called symfony (a great system, by the way), so the parser I wrote is also in PHP. If you're comfortable with PHP, I can send the code. If not, it shouldn't be too hard to recreate in Perl or another language. I basically loaded everything into an XML parser, using a stack data structure to keep track of the current element (popping whenever I met a }). I then used XPath to query the file for the data I was interested in.
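For anyone who would rather not use PHP, here is a minimal Python sketch of the same idea. It is not the PHP code mentioned above, just one way the stack-plus-XPath approach could look. The element names `metrics`, `section` and `value`, and the functions `parse_metrics` and `lookup`, are made up for this illustration. It assumes the layout seen in the question: block names on their own line, `{` and `}` on their own lines, `key = value;` assignments on a single line, and `/* ... */` comments that never occur inside quoted strings.

```python
import re
import xml.etree.ElementTree as ET

COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)      # /* ... */ blocks
ASSIGNMENT = re.compile(r"^(\S+)\s*=\s*(.*?);$")   # key = value;


def parse_metrics(text):
    """Convert metrics text into an ElementTree (hypothetical element names)."""
    root = ET.Element("metrics")
    stack = [root]      # stack[-1] is the block currently being filled
    pending = None      # block name seen, waiting for its "{"
    for raw in COMMENT.sub("", text).splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "{":
            if pending is None:
                raise ValueError("'{' without a preceding block name")
            stack.append(ET.SubElement(stack[-1], "section", name=pending))
            pending = None
        elif line == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
        else:
            match = ASSIGNMENT.match(line)
            if match:
                key, value = match.groups()
                entry = ET.SubElement(stack[-1], "value", name=key)
                # Keep the raw text; only surrounding quotes are removed.
                entry.text = value.strip().strip('"')
            elif re.fullmatch(r"\w+", line):
                pending = line
            else:
                raise ValueError(f"unrecognised line: {line!r}")
    if len(stack) != 1:
        raise ValueError("unclosed block")
    return root


def lookup(root, *path):
    """Return the raw text at section names followed by a key, or None."""
    *sections, key = path
    xpath = "." + "".join(f"/section[@name='{s}']" for s in sections)
    xpath += f"/value[@name='{key}']"
    node = root.find(xpath)
    return None if node is None else node.text


# Small excerpt in the style of the file shown in the question.
SAMPLE = """
/*
**  Consensus results.
*/
consensusResults
{
    scaffoldMetrics
    {
        numberOfScaffolds   = 14940;
        N50ScaffoldSize     = 241728;
    }
    largeContigMetrics
    {
        Q40PlusBases      = 327642977, 97.49%;
    }
}
alignmentDepths
{
    1 = 7175292;
    estimatedGenomeSize = "567.1 MB";
}
"""

if __name__ == "__main__":
    tree = parse_metrics(SAMPLE)
    print(lookup(tree, "consensusResults", "scaffoldMetrics", "N50ScaffoldSize"))
```

Block and key names are stored in a `name` attribute rather than used as tag names because keys such as `1` in `alignmentDepths` are not valid XML element names. Multi-part values like `327642977, 97.49%` are kept as raw text and left for the caller to split. Repeated blocks such as `file` all share one name, so `root.findall(...)` is the better fit for those than `lookup`, which returns only the first match.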

Is there something better for this? I don't know. I remember doing a little searching before I wrote this class, but I don't remember finding much. If you do find something, let us know.