8.1 years ago
Charles Plessy
Hi Biostars, I am looking for an ontology that describes the mapping statistics typically produced by alignment pipelines, such as the number of reads extracted, mapped, or flagged as PCR duplicates. I have not found anything with search engines...
My plan is to output quality-control files in Turtle format, for instance:
@prefix qc: <http://example.com/SuperDuperQcOntology/> .
<HeLa_cells_repl_1> qc:extracted 2674435 .
<HeLa_cells_repl_1> qc:mapped 1566239 .
<HeLa_cells_repl_1> qc:pcrdup 634533 .
<HeLa_cells_repl_2> qc:extracted 1406337 .
<HeLa_cells_repl_2> qc:mapped 989553 .
<HeLa_cells_repl_2> qc:pcrdup 373958 .
etc...
My hope is that such files could be queried with SPARQL, while remaining easy to convert to tab-separated format for processing by simpler tools.
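To illustrate the tab-separated conversion, here is a minimal Python sketch. It is not a Turtle parser: it assumes every statistic is written as one triple per line, exactly as in the example above, with the `qc:` prefix and an integer value. The function name `turtle_to_tsv` and the `sample` column header are made up for this illustration. A real pipeline would more likely load the file with an RDF library and run a SPARQL query instead.

```python
import csv
import io
import re

# Matches only the simple one-triple-per-line form used in the example:
#   <subject> qc:predicate 12345 .
# Assumption: prefix is always "qc:" and the object is a plain integer.
TRIPLE = re.compile(r'^<([^>]+)>\s+qc:(\w+)\s+(\d+)\s*\.$')


def turtle_to_tsv(turtle_text):
    """Pivot simple QC triples into one TSV row per sample (hypothetical helper)."""
    rows = {}      # sample name -> {statistic: value}
    columns = []   # statistics, in order of first appearance
    for line in turtle_text.splitlines():
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # skips @prefix lines and anything not in the simple form
        sample, stat, value = match.groups()
        rows.setdefault(sample, {})[stat] = int(value)
        if stat not in columns:
            columns.append(stat)

    out = io.StringIO()
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    writer.writerow(['sample'] + columns)
    for sample, stats in rows.items():
        # Leave the cell empty when a sample lacks a statistic.
        writer.writerow([sample] + [stats.get(col, '') for col in columns])
    return out.getvalue()


EXAMPLE = """\
@prefix qc: <http://example.com/SuperDuperQcOntology/> .
<HeLa_cells_repl_1> qc:extracted 2674435 .
<HeLa_cells_repl_1> qc:mapped 1566239 .
<HeLa_cells_repl_1> qc:pcrdup 634533 .
<HeLa_cells_repl_2> qc:extracted 1406337 .
<HeLa_cells_repl_2> qc:mapped 989553 .
<HeLa_cells_repl_2> qc:pcrdup 373958 .
"""

if __name__ == '__main__':
    print(turtle_to_tsv(EXAMPLE), end='')
```

Each sample becomes one row and each `qc:` property becomes one column, so the result can be fed directly to tools such as `cut`, `awk` or R.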