Hi,
I have a raw counts table with barcodes in columns and genes in rows, and a list mapping each barcode to a sample number.
How can I map the barcodes to sample numbers?
bar (barcodes) and bartable (counts) are borrowed from shenwei356's answer:
bar=read.csv("bar.txt", sep="\t", header = TRUE, strip.white = TRUE, stringsAsFactors = FALSE)
bartable=read.csv("table.tsv", sep="\t", header = TRUE, strip.white = TRUE, stringsAsFactors = FALSE)
> bartable
gene ATAGTTCTCGT GAAGCAGTATG GAAGACTTGGT AAAAAAAAAA
1 gene1 0 0 3 0
2 gen1e2 0 0 0 0
> bar
Sample Barcode
1 sc1 CCTAGATTAAT
2 sc2 GAAGACTTGGT
3 sc3 GAAGCAGTATG
4 sc4 GGTAACCTGAC
5 sc5 ATAGTTCTCGT
for (i in colnames(bartable)){
  if ( i %in% bar$Barcode){
    # replace this barcode column name with its sample name
    colnames(bartable)[colnames(bartable) == i] = bar$Sample[bar$Barcode == i][1]
  }
}
> bartable
gene sc5 sc3 sc2 AAAAAAAAAA
1 gene1 0 0 3 0
2 gen1e2 0 0 0 0
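A loop isn't strictly needed: match() can look up all column names in one call. A minimal sketch, with the example tables above rebuilt inline instead of read from bar.txt and table.tsv:

```r
# Example data rebuilt inline from the tables shown above
bartable <- data.frame(
  gene        = c("gene1", "gen1e2"),
  ATAGTTCTCGT = c(0, 0),
  GAAGCAGTATG = c(0, 0),
  GAAGACTTGGT = c(3, 0),
  AAAAAAAAAA  = c(0, 0),
  stringsAsFactors = FALSE
)
bar <- data.frame(
  Sample  = c("sc1", "sc2", "sc3", "sc4", "sc5"),
  Barcode = c("CCTAGATTAAT", "GAAGACTTGGT", "GAAGCAGTATG",
              "GGTAACCTGAC", "ATAGTTCTCGT"),
  stringsAsFactors = FALSE
)

# For each column name, its row in bar (NA if it is not a known barcode)
idx <- match(colnames(bartable), bar$Barcode)
hit <- !is.na(idx)

# Rename only the matched columns; "gene" and unknown barcodes are left as-is
colnames(bartable)[hit] <- bar$Sample[idx[hit]]
print(colnames(bartable))
```

This gives the same header as the loop: gene, sc5, sc3, sc2, AAAAAAAAAA.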
Try csvtk, assuming both files are tab-separated.
Update: with csvtk v0.14.0 or later:
./csvtk rename2 -t -f -gene -p '(.+)' -r '{kv}' -k <(./csvtk cut -t -f 2,1 barcodes.tsv) -K counts.tsv > result.tsv
Example:
$ cat barcodes.tsv
Sample Barcode
sc1 CCTAGATTAAT
sc2 GAAGACTTGGT
sc3 GAAGCAGTATG
sc4 GGTAACCTGAC
sc5 ATAGTTCTCGT
$ cat table.tsv
gene ATAGTTCTCGT GAAGCAGTATG GAAGACTTGGT AAAAAAAAAA
gene1 0 0 3 0
gen1e2 0 0 0 0
# note: the columns of barcodes.tsv must be in KEY-VALUE order (barcode first, then sample)
$ csvtk cut -t -f 2,1 barcodes.tsv
Barcode Sample
CCTAGATTAAT sc1
GAAGACTTGGT sc2
GAAGCAGTATG sc3
GGTAACCTGAC sc4
ATAGTTCTCGT sc5
# here we go!!!!
$ csvtk rename2 -t -k <(csvtk cut -t -f 2,1 barcodes.tsv) -f -1 -p '(.+)' -r '{kv}' --key-miss-repl unknown table.tsv
gene sc5 sc3 sc2 unknown
gene1 0 0 3 0
gen1e2 0 0 0 0
Original answer:
$ csvtk transpose -t table.tsv \
| csvtk replace -t -f gene -p '^(.+)$' -r '{kv}' -k <(csvtk cut -t -f 2,1 barcodes.tsv) -K \
| csvtk transpose -t \
> result.tsv
It's a little verbose. I will make csvtk rename2 support {kv} soon, so we can avoid using transpose.
Thank you. I downloaded it, but there is only an executable file named csvtk, and I can't figure out what to do with it.
I set the working directory to the folder containing the executable, but I get:
dhcp179185:Downloads $ csvtk transpose -t counts.tsv \
> | csvtk replace -t -f gene -p '^(.+)$' -r '{kv}' -k <(csvtk cut -t -f 2,1 barcodes.tsv) -K \
>
-bash: csvtk: command not found
Based on this thread, try this in R:
counts <- read.table(file="/path/to/counts.csv", sep="\t", header=TRUE, row.names=1)
samples <- read.table(file="/path/to/samples.csv", sep="\t", header=TRUE)
counts$id <- row.names(counts)
mdfa <- reshape2::melt(counts, id.vars = "id", variable.name = "Barcode")
reshape2::dcast(merge(samples, mdfa, by = "Barcode"), id ~ Sample, fun.aggregate = sum)
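Here merge() keeps only barcodes that appear in samples, and fun.aggregate = sum adds up the counts of barcodes that map to the same sample. If you would rather avoid reshape2, the same idea can be sketched in base R with rowsum(). The small counts and samples objects below are made-up stand-ins for the files read above, and the result is a genes x samples matrix rather than a data frame with an id column:

```r
# Made-up stand-ins for the counts and samples tables read above
counts <- data.frame(
  ATAGTTCTCGT = c(0, 0),
  GAAGCAGTATG = c(0, 0),
  GAAGACTTGGT = c(3, 0),
  AAAAAAAAAA  = c(0, 0),
  row.names = c("gene1", "gen1e2")
)
samples <- data.frame(
  Sample  = c("sc1", "sc2", "sc3", "sc4", "sc5"),
  Barcode = c("CCTAGATTAAT", "GAAGACTTGGT", "GAAGCAGTATG",
              "GGTAACCTGAC", "ATAGTTCTCGT"),
  stringsAsFactors = FALSE
)

# Sample name for every barcode column (NA when the barcode is not listed)
sample_of <- samples$Sample[match(colnames(counts), samples$Barcode)]
keep <- !is.na(sample_of)

# rowsum() adds up rows that share a group, so transpose to put barcodes in rows,
# sum the barcodes of each sample, then transpose back to genes x samples
by_sample <- t(rowsum(t(as.matrix(counts[, keep, drop = FALSE])),
                      group = sample_of[keep]))
print(by_sample)
```

As with the dcast() version, unlisted barcodes are dropped and the sample columns come out sorted by name.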
What do you mean when you say:
Could you give an example of the expected result, please?
Are you using R, Python, Perl... ?
Are your raw counts in files, a data frame, or a matrix?
Thank you. I am using R on macOS, and both datasets are in separate matrices. I would like my raw counts table to have sample names (h16.sc1, h16.sc2, etc.) as column names instead of barcodes.
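With two matrices, the column names can be replaced directly using match(). A minimal sketch: the example matrices are rebuilt from the tables earlier in the thread, and the "h16." prefix is an assumption taken from the example names h16.sc1, h16.sc2 (adjust it if it differs). Columns whose barcode is not in the list keep their original name:

```r
# Example matrices rebuilt from the tables earlier in the thread
counts <- matrix(
  c(0, 0, 0, 0, 3, 0, 0, 0), nrow = 2,
  dimnames = list(c("gene1", "gen1e2"),
                  c("ATAGTTCTCGT", "GAAGCAGTATG", "GAAGACTTGGT", "AAAAAAAAAA"))
)
samples <- matrix(
  c("sc1", "sc2", "sc3", "sc4", "sc5",
    "CCTAGATTAAT", "GAAGACTTGGT", "GAAGCAGTATG", "GGTAACCTGAC", "ATAGTTCTCGT"),
  ncol = 2, dimnames = list(NULL, c("Sample", "Barcode"))
)

# Assumed prefix, taken from the example names h16.sc1, h16.sc2
prefix <- "h16."

# Row of samples matching each barcode column (NA when the barcode is unlisted)
idx <- match(colnames(counts), samples[, "Barcode"])
hit <- !is.na(idx)

# Rename matched columns to prefix + sample name
colnames(counts)[hit] <- paste0(prefix, samples[idx[hit], "Sample"])
print(colnames(counts))
```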