Tool:csvtk - a cross-platform, efficient, practical and pretty CSV/TSV toolkit
8.3 years ago

Hi all,

I'd like to share another practical toolkit of mine, csvtk, after introducing SeqKit yesterday.

Introduction

Similar to the FASTA/Q formats in the field of Bioinformatics, CSV/TSV are basic and ubiquitous file formats in both Bioinformatics and data science.

People usually use spreadsheet software like MS Excel to process tabular data. However, it is all done by clicking and typing, which is not automated and is time-consuming to repeat, especially when we want to apply similar operations to different datasets or for different purposes.

You can also accomplish some CSV/TSV manipulations using shell commands, but more code is needed to handle the header line.
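To illustrate the header problem with plain shell tools, here is a minimal sketch using only standard utilities. The file names and the sample data are made up for the example.

```shell
# Hypothetical sample data: one header line followed by three records.
tmp=$(mktemp -d)
printf 'name,score\ncarol,7\nalice,12\nbob,3\n' > "$tmp/scores.csv"

# Naive approach: sort treats the header as just another record, so with a
# reverse numeric sort on column 2 the header sinks to the bottom.
sort -t, -k2,2nr "$tmp/scores.csv" > "$tmp/naive.csv"

# Header-aware approach: print the header first, then sort only the records.
{
    head -n 1 "$tmp/scores.csv"
    tail -n +2 "$tmp/scores.csv" | sort -t, -k2,2nr
} > "$tmp/sorted.csv"

cat "$tmp/sorted.csv"
```

The extra head/tail plumbing has to be repeated for every step of a pipeline, which is the overhead a header-aware tool removes.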

csvtk is convenient for rapid data investigation and is also easy to integrate into analysis pipelines. It could save you much of the time spent writing Python/R scripts.

Features

  • Cross-platform (Linux/Windows/Mac OS X/OpenBSD/FreeBSD)
  • Lightweight and out-of-the-box: no dependencies, no compilation, no configuration
  • Fast, with support for multiple CPUs
  • Practical functions provided by 25 subcommands
  • Supports STDIN and gzipped input/output files, so it is easy to use in pipes
  • Most of the subcommands support unselecting fields and fuzzy fields, e.g. -f "-id,-name" for all fields except "id" and "name", -F -f "a.*" for all fields with prefix "a.".
  • Support common plots (see usage)
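The field-selection and piping features above might be used as in the following sketch. It assumes csvtk is installed and that the cut subcommand accepts the -f and -F flags quoted in the list; the sample file and its column names are hypothetical, and reading from the pipe when no file is given is assumed to be the default.

```shell
# Hypothetical sample table; the column names are made up for illustration.
tmp=$(mktemp -d)
printf 'id,name,a.x,a.y,b\n1,foo,10,20,x\n2,bar,30,40,y\n' > "$tmp/demo.csv"

# Run the csvtk calls only when the tool is actually on PATH.
if command -v csvtk >/dev/null 2>&1; then
    # Unselect: keep every field except "id" and "name".
    csvtk cut -f "-id,-name" "$tmp/demo.csv"

    # Fuzzy fields: keep the fields whose names start with "a.".
    csvtk cut -F -f "a.*" "$tmp/demo.csv"

    # Gzipped data through a pipe: decompress on the fly, read from STDIN.
    gzip -c "$tmp/demo.csv" > "$tmp/demo.csv.gz"
    gzip -dc "$tmp/demo.csv.gz" | csvtk cut -f "-id,-name"
else
    echo "csvtk not found on PATH; skipping the examples" >&2
fi
```

Because the header line names the columns, the same selection keeps working if the column order in the input changes.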

Subcommands

25 subcommands in total.

Information

  • headers print headers
  • stat summary of CSV file
  • stat2 summary of selected number fields

Format conversion

  • pretty convert CSV to readable aligned table
  • csv2tab convert CSV to tabular format
  • tab2csv convert tabular format to CSV
  • space2tab convert space delimited format to TSV
  • transpose transpose CSV data
  • csv2md convert CSV to markdown format

Set operations

  • head print first N records
  • sample sampling by proportion
  • cut select parts of fields
  • uniq unique data without sorting
  • freq frequencies of selected fields
  • inter intersection of multiple files
  • grep grep data by selected fields with patterns/regular expressions
  • filter filter rows by values of selected fields with arithmetic expression
  • filter2 filter rows by awk-like arithmetic/string expressions
  • join join multiple CSV files by selected fields

Edit

  • rename rename column names
  • rename2 rename column names by regular expression
  • replace replace data of selected fields by regular expression
  • mutate create new columns from selected fields by regular expression

Ordering

  • sort sort by selected fields

Plotting

  • plot see usage
    • plot hist histogram
    • plot box boxplot
    • plot line line plot and scatter plot

Download and install

csvtk is implemented in the Go programming language, and executable binaries for most popular operating systems are freely available on the release page.

Just download the compressed executable for your operating system and uncompress it with the tar -zxvf *.tar.gz command.

Or install via conda:

conda install -c bioconda csvtk

Learn More

CSV Golang TSV • 5.8k views

csvtk has 25 subcommands now. Why not give it a try?

One can accomplish some CSV/TSV manipulations using shell commands, but more code is needed to handle the header line.

7.8 years ago
Ram 44k

This will be super useful. What would it take to push it to homebrew?


Someone else asked for this too; I may push it when I'm free. But I use Linux and I'm not familiar with Mac OS X. It would be great if someone else pushed it.


Let's discuss this on Twitter soon - I'm on vacation now. I think the author cannot recommend their own tool, so we can check out the procedure later.


FYI, versions 0.4.4 thru 0.7.0 are available for Linux & OSX via bioconda

17 months ago

After 7 years, the number of subcommands has doubled, with more functions and features added.

Today, I just released another new version. It's definitely worth a try.
