Assembling with Newbler produces a summary file, 454NewblerMetrics.txt, which according to the documentation is in "454 parser file" format. It looks like a simple nested hash structure. If I want to write a parser for it, do I need to write it from scratch, or does this format already have a proper name? An excerpt of the file ([…] marks omitted sections):
/***************************************************************************
**
** 454 Life Sciences Corporation
** Newbler Metrics Results
**
** Date of Assembly: 2010/10/20 14:07:53
** Project Directory: /home/dee/keller/UHTS/ywurm/2010-09-25-littleB/results/2010-10-12-newblerAssemblies/withoutIllumina/P_2010_10_14_09_12_45_runAssembly
** Software Release: 2.3 (091027_1459)
**
***************************************************************************/
/*
** Input information.
*/
runData
{
file
{
path = "/home/dee/keller/UHTS/454/littleB/shotgun/roche/C1172SID11098_Z8_reg2_sff/FW1LDDZ02.sff";
numberOfReads = 537847, 537843;
numberOfBases = 173640497, 172588857;
}
[…]
}
pairedReadData
{
file
{
path = "/home/dee/keller/UHTS/454/littleB/paired/20kb/roche/PairedEnd_RunID26803066_sff_halfplate/FX0RNLM01.sff";
numberOfReads = 602130, 878875;
numberOfBases = 163374476, 142729366;
numWithPairedRead = 286117;
}
[…]
}
/*
** Operation metrics.
*/
runMetrics
{
totalNumberOfReads = 16521360;
totalNumberOfBases = 4540313420;
numberSearches = 8409112;
seedHitsFound = 1847363485, 219.69;
overlapsFound = 1834648575, 218.17, 99.31%;
overlapsReported = 841507634, 100.07, 45.87%;
overlapsUsed = 18834953, 2.24, 2.24%;
}
readAlignmentResults
{
file
{
path = "/home/dee/keller/UHTS/454/littleB/shotgun/roche/C1172SID11098_Z8_reg2_sff/FW1LDDZ02.sff";
numAlignedReads = 409425, 76.12%;
numAlignedBases = 142627063, 82.64%;
inferredReadError = 1.20%, 1707897;
}
[…]
}
pairedReadResults
{
file
{
path = "/home/dee/keller/UHTS/454/littleB/paired/20kb/roche/PairedEnd_RunID26803066_sff_halfplate/FX0RNLM01.sff";
numAlignedReads = 327088, 37.22%;
numAlignedBases = 57601622, 40.36%;
inferredReadError = 1.65%, 947986;
numberWithBothMapped = 78632;
numWithOneUnmapped = 38015;
numWithMultiplyMapped = 167737;
numWithBothUnmapped = 1733;
}
[…]
}
/*
** Consensus distribution information.
*/
consensusDistribution
{
fullDistribution
{
signalBin = 0.0, 7517321;
[…]
}
/*
** Alignment depths.
*/
alignmentDepths
{
1 = 7175292;
[…]
peakDepth = 8.0;
estimatedGenomeSize = "567.1 MB";
}
/*
** Consensus results.
*/
consensusResults
{
readStatus
{
numAlignedReads = 11606683, 70.25%;
numAlignedBases = 3617704329, 79.68%;
inferredReadError = 1.06%, 38389865;
numberAssembled = 9954740;
numberPartial = 1651943;
numberSingleton = 858542;
numberRepeat = 3751116;
numberOutlier = 305019;
numberTooShort = 0;
}
pairedReadStatus
{
numberWithBothMapped = 1239514;
numberWithOneUnmapped = 324133;
numberMultiplyMapped = 855454;
numberWithBothUnmapped = 14981;
library
{
libraryName = "FX0RNLM01.sff";
pairDistanceAvg = 3078.3;
pairDistanceDev = 769.6;
}
[…]
}
scaffoldMetrics
{
numberOfScaffolds = 14940;
numberOfBases = 344205862;
avgScaffoldSize = 23039;
N50ScaffoldSize = 241728;
largestScaffoldSize = 2015989;
}
largeContigMetrics
{
numberOfContigs = 108123;
numberOfBases = 336075598;
avgContigSize = 3108;
N50ContigSize = 5423;
largestContigSize = 79674;
Q40PlusBases = 327642977, 97.49%;
Q39MinusBases = 8432621, 2.51%;
}
allContigMetrics
{
numberOfContigs = 145244;
numberOfBases = 346306838;
}
}
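The file contains only `/* ... */` comments, named `{ ... }` blocks and `key = value, value;` assignments, so a small hand-written recursive-descent parser looks sufficient. The Python below is a minimal sketch, not a Roche/454 tool. It assumes the complete, unedited file (the excerpt above, with its `[…]` elisions, would not parse). It also assumes that quoted strings contain no escaped quotes and that repeated block names should be collected into a list; `file` blocks presumably repeat where `[…]` appears. `tokenize`, `parse` and `_Parser` are hypothetical names.

```python
import re

# Whitespace and /* ... */ comments have no named group, so m.lastgroup is
# None for them and tokenize() drops them.
TOKEN_RE = re.compile(r'''
    \s+                        # whitespace
  | /\*.*?\*/                  # C-style block comment, possibly multi-line
  | (?P<string>"[^"]*")        # double-quoted string (escapes assumed absent)
  | (?P<punct>[{}=;,])         # structural punctuation
  | (?P<atom>[^\s{}=;,"]+)     # bare word: key, number or percentage
''', re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Yield (kind, text) tuples, skipping whitespace and comments."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:  # e.g. an unterminated quoted string
            raise ValueError(f"cannot tokenize at offset {pos}: {text[pos:pos + 20]!r}")
        pos = m.end()
        if m.lastgroup is not None:
            yield m.lastgroup, m.group(m.lastgroup)


def _store(d, key, value):
    """A key seen more than once (e.g. several `file` blocks) becomes a list."""
    if key not in d:
        d[key] = value
    elif isinstance(d[key], list):
        d[key].append(value)
    else:
        d[key] = [d[key], value]


class _Parser:
    def __init__(self, text):
        self.tokens = list(tokenize(text))
        self.pos = 0

    def _next(self):
        if self.pos == len(self.tokens):
            raise ValueError("unexpected end of input (missing '}' or ';'?)")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def block(self, top_level=False):
        """Parse `key { ... }` and `key = v1, v2;` items up to '}' (or EOF at top level)."""
        result = {}
        while True:
            if top_level and self.pos == len(self.tokens):
                return result
            kind, key = self._next()
            if not top_level and (kind, key) == ("punct", "}"):
                return result
            if kind != "atom":
                raise ValueError(f"expected a key, got {key!r}")
            tok = self._next()
            if tok == ("punct", "{"):
                _store(result, key, self.block())
            elif tok == ("punct", "="):
                _store(result, key, self.values())
            else:
                raise ValueError(f"expected '{{' or '=' after {key!r}, got {tok[1]!r}")

    def values(self):
        """Parse `v1, v2, ...;` into a str (one value) or a tuple of str (several)."""
        items = []
        while True:
            kind, text = self._next()
            if kind == "punct":
                raise ValueError(f"expected a value, got {text!r}")
            items.append(text[1:-1] if kind == "string" else text)  # strip quotes
            sep = self._next()
            if sep == ("punct", ";"):
                return items[0] if len(items) == 1 else tuple(items)
            if sep != ("punct", ","):
                raise ValueError(f"expected ',' or ';', got {sep[1]!r}")


def parse(text):
    """Parse a whole metrics file into nested dicts of raw string values."""
    return _Parser(text).block(top_level=True)
```

With this sketch, `parse(text)["runMetrics"]["totalNumberOfReads"]` would give the raw string `'16521360'`. A repeated block such as `file` only becomes a list when it actually occurs more than once, so code that iterates over it may want to wrap a single entry in a one-element list first.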
Thanks, Daniel. I agree that it should be relatively straightforward to write a parser from scratch. Thanks for the offer; my weapon of choice is Ruby, so I'll keep you posted :)
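Once the blocks are parsed, most of the remaining work is turning the raw value strings (`537847`, `219.69`, `76.12%`, `"567.1 MB"`) into numbers. The Python below is a minimal sketch for illustration, assuming a parser has already produced nested dicts, lists and tuples whose leaves are strings with the quotes stripped. `convert_value` and `convert_tree` are hypothetical names. Storing percentages as fractions is only one possible choice, and a quoted value that happens to look numeric would also be converted.

```python
import re

INT_RE = re.compile(r"-?\d+")
FLOAT_RE = re.compile(r"-?\d+\.\d+")
PERCENT_RE = re.compile(r"(-?\d+(?:\.\d+)?)%")


def convert_value(raw):
    """'537847' -> int, '219.69' -> float, '76.12%' -> fraction 0.7612;
    anything else (paths, '567.1 MB', ...) is returned unchanged."""
    if INT_RE.fullmatch(raw):
        return int(raw)
    if FLOAT_RE.fullmatch(raw):
        return float(raw)
    m = PERCENT_RE.fullmatch(raw)
    if m:
        return float(m.group(1)) / 100.0
    return raw


def convert_tree(node):
    """Recursively apply convert_value to every string leaf of nested dicts/lists/tuples."""
    if isinstance(node, dict):
        return {key: convert_tree(value) for key, value in node.items()}
    if isinstance(node, list):
        return [convert_tree(item) for item in node]
    if isinstance(node, tuple):
        return tuple(convert_tree(item) for item in node)
    if isinstance(node, str):
        return convert_value(node)
    return node  # already converted or not a string: leave as-is
```

Multi-value entries such as `numAlignedReads = 409425, 76.12%;` then become tuples like `(409425, 0.7612)`. Which position means what (count, average, percentage) differs per key, so naming the fields is left to the caller.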