Calculating Time From Submission To Publication / Degree Of Burden In Submitting A Paper
12.1 years ago
Ryan D ★ 3.4k

I was wondering whether it would be possible to calculate some kind of speed-of-publication metric for each journal. I'm not sure submitted and accepted dates are available for all papers, but I noticed that the PubMed XML data has fields like the following:

<PubMedPubDate PubStatus="received">
    <Year>2011</Year>
    <Month>12</Month>
    <Day>13</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
    <Year>2012</Year>
    <Month>4</Month>
    <Day>2</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="aheadofprint">
    <Year>2012</Year>
    <Month>4</Month>
    <Day>2</Day>
</PubMedPubDate>

Is there any tool that calculates the average time from when a paper is submitted to when it is published? Or is there a way to extract this kind of information from the database to give an aggregate estimate of turnaround time? Has someone already done this? And, not to get too off topic, what other measures would be useful for evaluating the degree of burden in submitting a paper?

EDIT: Pierre really took this to the next level in answering this question. The table he produced is very interesting and informative and his complete results are posted at figshare. Check it out. Or try it out.

pubmed publication • 8.2k views

I would title this question as "Degree of burden in submitting a paper" :) !


It would be interesting to calculate results per journal and compare them with the turnaround time the publisher claims :)


That's a good point. There are a lot of claims about the speed of the review process made by journals but as far as I know there is no one who checks these facts. Our experience with some journals has certainly deviated a great deal from their claims.


I've played with my Java program and uploaded the results to figshare: http://dx.doi.org/10.6084/m9.figshare.96403


Wish I had this when I was trying to calculate the embargo-induced delays in publication of the ENCODE papers http://caseybergman.wordpress.com/2012/09/05/the-cost-to-science-of-the-encode-publication-embargo/


Very useful idea!


This is an issue in the wet-lab world for sure: http://www.nature.com/news/2011/110427/full/472391a.html

I wonder if there is a similar phenomenon among bioinformatics journals. "Please provide tests of extra use cases..." that sort of thing. Anyone had that experience?

12.1 years ago

The following Java program parses PubMed XML from stdin and prints the difference in days between the "received" and "accepted" dates:

import java.io.InputStream;
import java.util.GregorianCalendar;
import java.util.concurrent.TimeUnit;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;




public class Biostar54473
    {
    private static class PubMedPubDate
        {
        int year;
        int month=-1;
        int day=-1;
        @Override
        public String toString() {
            String s=String.format("%04d", year);
            if(month!=-1)
                {
                s+="-"+String.format("%02d", month);
                if(day!=-1)
                    {
                    s+="-"+String.format("%02d", day);
                    }
                }
            return s;
            }
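        long getTimeInMillis()
            {
            // UTC has no daylight-saving shifts, so the millisecond difference
            // between two dates is always a whole number of days.
            GregorianCalendar cal=new GregorianCalendar(java.util.TimeZone.getTimeZone("UTC"));
            cal.clear();
            // A missing month or day (-1) falls back to January / the 1st.
            cal.set(
                    year,
                    month==-1?0:month-1,
                    month==-1 || day==-1?
                    1:day);
            return cal.getTimeInMillis();
            }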
        }

    private void parse(InputStream in) throws Exception
        {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
        XMLEventReader r= factory.createXMLEventReader(in);
        String PubStatus=null;
        PubMedPubDate curr=null;
        PubMedPubDate accepted=null;
        PubMedPubDate received=null;
        String MedlineTA=null;
        String pmid=null;
        String ArticleTitle=null;
        QName attPubStatus=new QName("PubStatus");
        while(r.hasNext())
            {
            XMLEvent evt=r.nextEvent();
            if(evt.isStartElement())
                {
                String name=evt.asStartElement().getName().getLocalPart();
                if(name.equals("PubmedArticle"))
                    {
                    pmid=null;
                    accepted=null;
                    received=null;
                    MedlineTA=null;
                    pmid=null;
                    ArticleTitle=null;
                    }
                else if(name.equals("ArticleTitle") && ArticleTitle==null)
                    {
                    ArticleTitle=r.getElementText().trim();
                    }
                else if(name.equals("PMID") && pmid==null)
                    {
                    pmid=r.getElementText().trim();
                    }
                else if(name.equals("MedlineTA") && MedlineTA==null)
                    {
                    MedlineTA=r.getElementText().trim();
                    }
                else if(name.equals("PubMedPubDate"))
                    {
                    curr=null;
                    Attribute att=evt.asStartElement().getAttributeByName(attPubStatus);
                    if(att!=null) PubStatus=att.getValue();

                    if("received".equals(PubStatus))
                        {
                        curr=new PubMedPubDate();
                        received=curr;
                        }
                    else if("accepted".equals(PubStatus))
                        {
                        curr=new PubMedPubDate();
                        accepted=curr;
                        }
                    else
                        {
                        curr=null;
                        }
                    }


                else if(curr!=null && name.equals("Year"))
                    {
                    // An unparsable field clears both dates, so the whole article is skipped.
                    try { curr.year=Integer.parseInt(r.getElementText().trim()); } catch(Exception err) { curr=null;received=null;accepted=null;}
                    }
                else if(curr!=null && name.equals("Month"))
                    {
                    // The month may be numeric ("4") or a name ("Apr", "April").
                    String month=r.getElementText().trim().toLowerCase();
                    if(month.equals("jan") || month.equals("january")) month="1";
                    else if(month.equals("feb") || month.equals("february")) month="2";
                    else if(month.equals("mar") || month.equals("march")) month="3";
                    else if(month.equals("apr") || month.equals("april")) month="4";
                    else if(month.equals("may")) month="5";
                    else if(month.equals("jun") || month.equals("june")) month="6";
                    else if(month.equals("jul") || month.equals("july")) month="7";
                    else if(month.equals("aug") || month.equals("august")) month="8";
                    else if(month.equals("sep") || month.equals("september")) month="9";
                    else if(month.equals("oct") || month.equals("october")) month="10";
                    else if(month.equals("nov") || month.equals("november")) month="11";
                    else if(month.equals("dec") || month.equals("december")) month="12";
                    try { curr.month=Integer.parseInt(month); } catch(Exception err) { curr=null;received=null;accepted=null;}
                    }
                else if(curr!=null && name.equals("Day"))
                    {
                    try { curr.day=Integer.parseInt(r.getElementText().trim()); } catch(Exception err) { curr=null;received=null;accepted=null;}
                    }

                }
            else if(evt.isEndElement())
                {
                String name=evt.asEndElement().getName().getLocalPart();
                if(name.equals("PubmedArticle"))
                    {
                    if(received!=null && accepted!=null)
                        {
                        long n=accepted.getTimeInMillis()-received.getTimeInMillis();
                        System.out.println(
                                pmid+"\t"+
                                ArticleTitle+"\t"+
                                MedlineTA+"\t"+
                                received+"\t"+
                                accepted+"\t"+
                                TimeUnit.DAYS.convert(n, TimeUnit.MILLISECONDS)
                                );
                        }
                    ArticleTitle=null;
                    MedlineTA=null;
                    pmid=null;
                    curr=null;
                    received=null;
                    accepted=null;
                    }
                else if(name.equals("PubMedPubDate"))
                    {
                    curr=null;
                    }
                }
            }
        }
    public static void main(String[] args) throws Exception
        {
        System.out.println("#pmid\t"+
                "ArticleTitle\t"+
                "MedlineTA\t"+
                "Received\t"+
                "Accepted\t"+
                "DiffDays"
                );
        new Biostar54473().parse(System.in);
        }

}
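
A side note on the day count: a difference computed from milliseconds is only a whole number of days when both dates sit in a zone without daylight-saving shifts. Across a spring-forward change in a local zone the interval is an hour short and truncates to one day fewer. Below is a minimal sketch of a zone-independent count, assuming Java 8 or later so that `java.time` is available. The class `DayDiff` and its method are hypothetical names, not part of the program above.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper: whole calendar days between two dates, independent of time zone. */
class DayDiff {
    /** A missing month or day (-1) falls back to 1, mirroring the program above. */
    static long daysBetween(int y1, int m1, int d1, int y2, int m2, int d2) {
        LocalDate received = LocalDate.of(y1, m1 == -1 ? 1 : m1, (m1 == -1 || d1 == -1) ? 1 : d1);
        LocalDate accepted = LocalDate.of(y2, m2 == -1 ? 1 : m2, (m2 == -1 || d2 == -1) ? 1 : d2);
        // LocalDate carries no time of day, so no DST hour can be lost.
        return ChronoUnit.DAYS.between(received, accepted);
    }

    public static void main(String[] args) {
        System.out.println(daysBetween(2012, 1, 30, 2012, 9, 20));
        System.out.println(daysBetween(2008, 3, 10, 2008, 6, 2));
    }
}
```

This prints 234 and 84. The sample output below lists 233 and 83 for those same two intervals (the BMC Genomics and Nat Methods rows), which would be consistent with a local-time calculation crossing a daylight-saving change.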

A 'verticalized' example for a few papers containing the phrase "Next generation Sequencing" in the title. You can read this output into R or any other tool to get some stats about a journal, a subject, etc.

$ javac Biostar54473.java && cat pubmed_result.xml | java Biostar54473

>>>    2
$1    #pmid           23020966
$2    ArticleTitle    Transcriptome analysis using next-generation sequencing.
$3    MedlineTA       Curr Opin Biotechnol
$4    Received        2012-07-04
$5    Accepted        2012-09-04
$6    DiffDays        62
<<<    2

>>>    3
$1    #pmid           23000871
$2    ArticleTitle    Understanding pathogens in the era of next generation sequencing.
$3    MedlineTA       J Infect Dev Ctries
$4    Received        2012-09-13
$5    Accepted        2012-09-14
$6    DiffDays        1
<<<    3

>>>    4
$1    #pmid           22994565
$2    ArticleTitle    Accurate variant detection across non-amplified and whole genome amplified DNA using targeted next generation sequencing.
$3    MedlineTA       BMC Genomics
$4    Received        2012-01-30
$5    Accepted        2012-09-20
$6    DiffDays        233
<<<    4
(...)
>>>    253
$1    #pmid           18604217
$2    ArticleTitle    Alta-Cyclic: a self-optimizing base caller for next-generation sequencing.
$3    MedlineTA       Nat Methods
$4    Received        2008-03-10
$5    Accepted        2008-06-02
$6    DiffDays        83
<<<    253

>>>    254
$1    #pmid           18262675
$2    ArticleTitle    The impact of next-generation sequencing technology on genetics.
$3    MedlineTA       Trends Genet
$4    Received        2007-11-15
$5    Accepted        2007-12-17
$6    DiffDays        32
<<<    254
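
For readers who would rather stay in Java than load the table into R, here is a rough sketch of a per-journal summary. It assumes the plain tab-separated output of the program (not the verticalized view) arrives on stdin, with MedlineTA in column 3 and DiffDays in column 6. The class name `JournalDelayStats` and the choice of the median as the summary are my assumptions, not something from the original answer.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical companion: median received-to-accepted delay per journal. */
class JournalDelayStats {
    /** Groups DiffDays (column 6) by MedlineTA (column 3); skips the '#' header and malformed rows. */
    static Map<String, List<Long>> group(Reader in) throws Exception {
        Map<String, List<Long>> byJournal = new TreeMap<>();
        BufferedReader br = new BufferedReader(in);
        String line;
        while ((line = br.readLine()) != null) {
            if (line.startsWith("#")) continue;
            String[] f = line.split("\t");
            if (f.length < 6) continue;
            try {
                long days = Long.parseLong(f[5].trim());
                byJournal.computeIfAbsent(f[2], k -> new ArrayList<>()).add(days);
            } catch (NumberFormatException err) {
                // Column 6 is not a number (e.g. a stray tab in the title): ignore the row.
            }
        }
        return byJournal;
    }

    /** Median of a non-empty list; the input list is left unmodified. */
    static double median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1 ? sorted.get(n / 2) : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<Long>> byJournal = group(new InputStreamReader(System.in));
        System.out.println("#MedlineTA\tN\tMedianDays");
        for (Map.Entry<String, List<Long>> e : byJournal.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue().size() + "\t" + median(e.getValue()));
        }
    }
}
```

The median is used rather than the mean because a few very long reviews would otherwise dominate a journal's figure. With only a handful of papers per journal, neither number says much.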

The year/month/day values are not always valid integers. I've updated my code to catch the errors.


Fantastic. Thanks for such an awesome answer, Pierre.


This looks like it should work, though I'm not very familiar with Java. I got an error when running:

javac Biostar54473.java && cat pubmed_result.xml | java Biostar54473

pmid ArticleTitle MedlineTA Received Accepted DiffDays

Exception in thread "main" javax.xml.stream.XMLStreamException: ParseError at [row,col]:[132,2]
Message: The markup in the document following the root element must be well-formed.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:591)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83)
    at Biostar54473.parse(Biostar54473.java:63)
    at Biostar54473.main(Biostar54473.java:162)

Any ideas?


Please run "xmllint pubmed_result.xml" to check your XML file.
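
If xmllint is not at hand, roughly the same well-formedness check can be sketched with the StAX parser the program already uses. This is only an illustration: the class `WellFormedCheck` and its method are hypothetical names, and it reports just the first error the parser hits rather than everything xmllint would.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: reads an XML stream to the end and reports the first parse error. */
class WellFormedCheck {
    /** Returns null if the whole stream parses, otherwise the parser's error message. */
    static String firstError(InputStream in) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Same non-validating, namespace-unaware settings as the program above.
            factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.FALSE);
            factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
            XMLStreamReader r = factory.createXMLStreamReader(in);
            while (r.hasNext()) {
                r.next(); // throws XMLStreamException at the first malformed construct
            }
            r.close();
            return null;
        } catch (XMLStreamException err) {
            return err.getMessage();
        }
    }

    public static void main(String[] args) {
        String err = firstError(System.in);
        System.out.println(err == null ? "well-formed" : err);
    }
}
```

Run it as `cat pubmed_result.xml | java WellFormedCheck`. The message includes the row and column of the problem, as in the stack trace above.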


Perfect. That showed my XML file was malformed. The new file worked perfectly. One way I can think of to improve this would be to use an alternate date when one of those two is not available. For instance, of 2608 PubMed articles on "Next Generation Sequencing", I only get output for 1114. This is because only 1114 have entries for both <PubMedPubDate PubStatus="received"> and <PubMedPubDate PubStatus="accepted">. This is still really great. And doing as Pierre said and loading the results into R can give a good idea of the average "degree of burden" in submitting a paper, as Khadeer called it. :-) Masterful. Thanks again, Pierre.
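
The alternate-date idea could be sketched as a lookup over the PubStatus values collected for one article. This is a hypothetical illustration, not part of Pierre's program: the class `DateFallback`, the map of dates, and the preference order ("accepted", then "aheadofprint") are all assumptions made for the example.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: picks the first date present among several PubStatus values. */
class DateFallback {
    /** Returns the date of the first status in 'preferred' found in 'dates', or null if none is. */
    static LocalDate firstAvailable(Map<String, LocalDate> dates, List<String> preferred) {
        for (String status : preferred) {
            LocalDate d = dates.get(status);
            if (d != null) return d;
        }
        return null;
    }

    public static void main(String[] args) {
        // An article with a "received" date but no "accepted" date.
        Map<String, LocalDate> dates = new HashMap<>();
        dates.put("received", LocalDate.of(2011, 12, 13));
        dates.put("aheadofprint", LocalDate.of(2012, 4, 2));

        // Assumed preference order: the real acceptance date first, then a stand-in.
        LocalDate end = firstAvailable(dates, Arrays.asList("accepted", "aheadofprint"));
        System.out.println(end);
    }
}
```

A substituted date measures a different interval (submission to online publication rather than to acceptance), so it would be worth adding an output column recording which status was actually used.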


Will you prepare a manuscript presenting your results? Keep us up to date!


Hopefully the reviewers do not request that you apply your method to the current paper, and thus enter an infinite recursion loop.


Now seriously, I am sure this has been previously studied and reported in some of those bibliometrics journals. Who will be the first to find some of these papers? :)


That's hilarious. Really, I had just wondered out of my own curiosity. I think our rather large group would like to know.
