Hi,
We are developing software for our Archer NGS kits that produces CNV data, and I was wondering what the best (or, more appropriately, most common) standard file format for CNV data is. I hate to invent YAF (Yet Another Format), since we are already drowning in data file formats.
I found a list, but it is mostly based on SNP arrays. What we want to produce is NGS coverage-based data, so I am not sure that most of these formats are relevant or appropriate.
- Affymetrix ChAS: Copy Number Segment Data (tsv)
- Affymetrix CNAG: Copy Number Data File (txt), Copy Number Segment Data (txt), LOH Segment Data (txt)
- Affymetrix CNAT: Affymetrix Copy number CNT file (cn.cnt)
- Affymetrix GTC 3: Copy Number or LOH Data File (cnchp|lohchp), Copy Number Segment Data (cn_segments|tsv)
- Agilent: Aberration and LOH Interval Report (tsv|xls), Agilent Interval-based aberration report (tsv|xls), Agilent Probe-based aberration report (tsv|xls)
- ArrayCGHBase: ArrayCGHBase aberration report (txt)
- BlueGnome: BlueFuse CGH Summary (xls)
- Illumina GenomeStudio: Copy Number Data File (txt), QuantiSNP GenomeStudio Plugin bookmark file (txt)
- Illumina KaryoStudio: KaryoStudio regions file (txt)
- Nexus Copy Number: Nexus regions file (txt)
- NimbleGen: NimbleGen data summary file (txt), NimbleGen Segtable file (txt)
- OGT CytoSure: OGT aberration report (txt)
- QuantiSNP: QuantiSNP result file (txt), QuantiSNP GenomeStudio Plugin bookmark file (txt)
Thanks
Thon
Enzymatics, Inc.
This format seems the most obvious and sensible. It's BED format, so it can be manipulated by many tools or simple scripts, and it conveys all the needed info.
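To illustrate the "simple scripting" point, here is a minimal Python sketch that reads BED-like CNV lines and keeps the segments that look altered. The column layout (chrom, start, end, name, then a per-segment log2 ratio in the fifth column), the function name `parse_cnv_bed`, the example records, and the 0.5 cutoff are all assumptions made up for illustration, not the exact layout of any particular tool.

```python
import io

def parse_cnv_bed(handle):
    """Yield (chrom, start, end, name, value) from tab-separated BED-like lines."""
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and the track/browser header lines BED allows.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        value = float(fields[4])  # assumed: per-segment log2 ratio
        yield chrom, start, end, name, value

# Made-up example records, not real data.
example = io.StringIO(
    "track name=cnv_example\n"
    "chr1\t1000\t5000\tGENE_A\t0.85\n"
    "chr2\t200\t900\tGENE_B\t-0.10\n"
    "chr7\t5500\t7500\tGENE_C\t-1.20\n"
)
segments = list(parse_cnv_bed(example))

# Keep segments whose absolute value passes an arbitrary illustrative cutoff.
altered = [s for s in segments if abs(s[4]) >= 0.5]
```

Because the first three columns are standard BED, the same file should also work with interval tools that only look at chrom/start/end.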
I have also used this simple, reduced format.
If needed, you can add fields such as "Probe_Values" and "Probe_p.values", with the per-probe values concatenated.
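As a rough sketch of what the extra concatenated fields could look like, the Python snippet below writes one segment per line with two optional trailing columns. The base columns, the comma separator, the number formatting, and the helper name `format_cnv_line` are my own assumptions for illustration; adjust them to whatever your downstream tools expect.

```python
def format_cnv_line(chrom, start, end, name, value, probes=()):
    """Return one tab-separated BED-like line for a CNV segment.

    probes is an optional sequence of (probe_value, probe_p_value) pairs.
    When given, two extra columns are appended: the comma-joined probe
    values (Probe_Values) and the comma-joined p-values (Probe_p.values).
    """
    fields = [chrom, str(start), str(end), name, f"{value:.3f}"]
    if probes:
        fields.append(",".join(f"{v:.3f}" for v, _ in probes))
        fields.append(",".join(f"{p:.3g}" for _, p in probes))
    return "\t".join(fields)

# Made-up example values, not real data.
line = format_cnv_line(
    "chr1", 1000, 5000, "GENE_A", 0.85,
    probes=[(0.8, 0.01), (0.9, 0.002), (0.85, 0.03)],
)
plain = format_cnv_line("chr2", 200, 900, "GENE_B", -0.1)
```

Keeping the per-probe values in trailing columns means tools that only read the leading BED columns are unaffected.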