Hi all,
I'd like to share another practical toolkit of mine, csvtk, after introducing SeqKit yesterday.
- Documentation: http://bioinf.shenwei.me/csvtk/ (Usage and Tutorial)
- Source code: https://github.com/shenwei356/csvtk
- Latest version:
Introduction
Like the FASTA/Q formats in bioinformatics, CSV/TSV are basic and ubiquitous file formats in both bioinformatics and data science.
People usually use spreadsheet software like MS Excel to process tabular data. However, it is all done by clicking and typing, which is not automated and is time-consuming to repeat, especially when we want to apply similar operations to different datasets or for different purposes.
You can also accomplish some CSV/TSV manipulations using shell commands, but more code is needed to handle the header line.
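To make the header problem concrete, here is a minimal sketch using only standard Unix tools. The file people.csv and its contents are made up for illustration. A plain sort treats the header as just another record, so the header has to be split off and re-attached by hand:

```shell
# Build a small example table (hypothetical data).
cat > people.csv <<'EOF'
name,age
carol,35
alice,30
bob,25
EOF

# Naive sort by the first column: the header line is sorted with the records
# and ends up at the bottom ("name" sorts after "carol").
LC_ALL=C sort -t, -k1,1 people.csv > naive.csv

# Header-aware sort: print the header first, then sort only the records.
{
    head -n 1 people.csv
    tail -n +2 people.csv | LC_ALL=C sort -t, -k1,1
} > sorted.csv

cat sorted.csv
```

Every operation that reorders or filters rows needs the same head/tail wrapping, which is the extra code referred to above.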
csvtk is convenient for rapid data investigation and is also easy to integrate into analysis pipelines. It could save you much of the time spent writing Python/R scripts.
Features
- Cross-platform (Linux/Windows/Mac OS X/OpenBSD/FreeBSD)
- Lightweight and out-of-the-box: no dependencies, no compilation, no configuration
- Fast, with multi-CPU support
- Practical functions supported by 25 subcommands
- Supports STDIN and gzipped input/output files, so it is easy to use in pipes
- Most of the subcommands support unselecting fields and fuzzy fields, e.g. -f "-id,-name" for all fields except "id" and "name", and -F -f "a.*" for all fields with prefix "a."
- Supports common plots (see usage)
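As a rough sketch of the field-selection syntax above, the following applies both forms with the cut subcommand. It assumes csvtk is on your PATH; the file data.csv and its column names are hypothetical, chosen only to match the flags in the example:

```shell
# Hypothetical example table; column names chosen to match the flags above.
cat > data.csv <<'EOF'
id,name,a.x,a.y
1,foo,10,20
2,bar,30,40
EOF

# Only run the csvtk calls if csvtk is actually installed.
if command -v csvtk >/dev/null 2>&1; then
    # Unselect fields: keep everything except "id" and "name".
    csvtk cut -f "-id,-name" data.csv > without_id_name.csv

    # Fuzzy fields: keep every field whose name starts with "a.".
    csvtk cut -F -f "a.*" data.csv > prefix_a.csv

    cat without_id_name.csv prefix_a.csv
else
    echo "csvtk not found on PATH; skipping" >&2
fi
```

With this particular table, both commands should keep the same two columns, a.x and a.y.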
Subcommands
25 subcommands in total.
Information
- headers: print headers
- stat: summary of CSV file
- stat2: summary of selected numeric fields
Format conversion
- pretty: convert CSV to readable aligned table
- csv2tab: convert CSV to tabular format
- tab2csv: convert tabular format to CSV
- space2tab: convert space delimited format to CSV
- transpose: transpose CSV data
- csv2md: convert CSV to markdown format
Set operations
- head: print first N records
- sample: sampling by proportion
- cut: select parts of fields
- uniq: unique data without sorting
- freq: frequencies of selected fields
- inter: intersection of multiple files
- grep: grep data by selected fields with patterns/regular expressions
- filter: filter rows by values of selected fields with arithmetic expression
- filter2: filter rows by awk-like arithmetic/string expressions
- join: join multiple CSV files by selected fields
Edit
- rename: rename column names
- rename2: rename column names by regular expression
- replace: replace data of selected fields by regular expression
- mutate: create new columns from selected fields by regular expression
Ordering
- sort: sort by selected fields
Plotting
- plot: see usage
- plot hist: histogram
- plot box: boxplot
- plot line: line plot and scatter plot
Download and install
csvtk is implemented in the Go programming language, and executable binaries for most popular operating systems are freely available on the release page.
Just download the compressed executable for your operating system and decompress it with the tar -zxvf *.tar.gz command.
csvtk can also be installed with conda:
conda install -c bioconda csvtk
Learn More
- Detailed usage of subcommands
- Some questions on Biostars answered with csvtk
csvtk has 25 subcommands now. Why not give it a try?