Mapping columns based on a list
6.5 years ago
Za ▴ 140

Hi,

I have raw count data with barcodes in columns and genes in rows, and a list of the correspondence between barcodes and sample numbers.

How can I map barcodes to sample numbers?

next-gen RNA-Seq • 1.8k views

What do you mean when you say:

How can I map barcodes to sample numbers?

Could you give an example of the expected results, please?

Are you using R, Python, Perl...?

Are your raw counts in files, a data frame, or a matrix?


Thank you. I am using R on macOS. Both datasets are in separate matrices. I expect my raw counts file to have sample names (h16.sc1, h16.sc2, etc.) as column names instead of barcodes.

6.5 years ago

bar and bartable are borrowed from shenwei356's answer in this thread:

bar <- read.csv("bar.txt", sep = "\t", header = TRUE, strip.white = TRUE, stringsAsFactors = FALSE)
bartable <- read.csv("table.tsv", sep = "\t", header = TRUE, strip.white = TRUE, stringsAsFactors = FALSE)
> bartable
    gene ATAGTTCTCGT GAAGCAGTATG GAAGACTTGGT AAAAAAAAAA
1  gene1           0           0           3          0
2 gen1e2           0           0           0          0
> bar
  Sample     Barcode
1    sc1 CCTAGATTAAT
2    sc2 GAAGACTTGGT
3    sc3 GAAGCAGTATG
4    sc4 GGTAACCTGAC
5    sc5 ATAGTTCTCGT

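# Rename every column whose name is a known barcode to the matching sample name
for (i in colnames(bartable)) {
    if (i %in% bar$Barcode) {
        # match() returns the position of the first hit in each vector
        colnames(bartable)[match(i, colnames(bartable))] <- bar$Sample[match(i, bar$Barcode)]
    }
}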
> bartable
    gene sc5 sc3 sc2 AAAAAAAAAA
1  gene1   0   0   3          0
2 gen1e2   0   0   0          0
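
The loop above can also be written without iterating, since match() is vectorized. The following is only a sketch under the assumption that the data look like the example in this answer; the two data frames are rebuilt inline (values copied from the thread) so the snippet runs on its own, and the helper names idx and known are made up for illustration.

```r
# Toy data mirroring the example above (values copied from the thread).
bar <- data.frame(
  Sample  = c("sc1", "sc2", "sc3", "sc4", "sc5"),
  Barcode = c("CCTAGATTAAT", "GAAGACTTGGT", "GAAGCAGTATG",
              "GGTAACCTGAC", "ATAGTTCTCGT"),
  stringsAsFactors = FALSE
)
bartable <- data.frame(
  gene        = c("gene1", "gen1e2"),
  ATAGTTCTCGT = c(0, 0),
  GAAGCAGTATG = c(0, 0),
  GAAGACTTGGT = c(3, 0),
  AAAAAAAAAA  = c(0, 0),
  stringsAsFactors = FALSE
)

# Position of each column name in the barcode list; NA means "not a known barcode".
idx <- match(colnames(bartable), bar$Barcode)
known <- !is.na(idx)

# Rename only the matched columns; the others (gene, AAAAAAAAAA) keep their names.
colnames(bartable)[known] <- bar$Sample[idx[known]]

print(colnames(bartable))
```

Columns without a matching barcode are left untouched here, which is the same behaviour as the loop. If they should be dropped or flagged instead, the known vector is the place to do it.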
6.5 years ago

Try csvtk, supposing the two files are tab-separated.

Updated: with v0.14.0 or a later version:

./csvtk rename2 -t -f -gene -p '(.+)' -r '{kv}' -k <(./csvtk cut -t -f 2,1 barcodes.tsv) -K counts.tsv > result.tsv

Example:

$ cat barcodes.tsv 
Sample  Barcode
sc1     CCTAGATTAAT
sc2     GAAGACTTGGT
sc3     GAAGCAGTATG
sc4     GGTAACCTGAC
sc5     ATAGTTCTCGT

$ cat table.tsv 
gene    ATAGTTCTCGT     GAAGCAGTATG     GAAGACTTGGT     AAAAAAAAAA
gene1   0       0       3       0
gen1e2  0       0       0       0

# note that the columns of barcodes.tsv must be arranged in KEY-VALUE order
$ csvtk cut -t -f 2,1 barcodes.tsv 
Barcode Sample
CCTAGATTAAT     sc1
GAAGACTTGGT     sc2
GAAGCAGTATG     sc3
GGTAACCTGAC     sc4
ATAGTTCTCGT     sc5

# here we go!!!!

$ csvtk rename2 -t -k <(csvtk cut -t -f 2,1 barcodes.tsv) -f -1 -p '(.+)' -r '{kv}' --key-miss-repl unknown table.tsv 
gene    sc5     sc3     sc2     unknown
gene1   0       0       3       0
gen1e2  0       0       0       0

Original answer:

$ csvtk transpose -t table.tsv \
    | csvtk replace -t -f gene -p '^(.+)$' -r '{kv}' -k <(csvtk cut -t -f 2,1 barcodes.tsv)  -K \
    | csvtk transpose -t \
    > result.tsv

It's a little verbose. I will make csvtk rename2 support {kv} soon so we can avoid using transpose.


Sorry, is there a version for Mac? I only noticed downloads for Windows and Linux.


Thank you, I downloaded it, but there is only an executable file named csvtk, and I can't figure out how to use it.

I set the working directory to the folder containing the executable, but it says:

dhcp179185:Downloads $ csvtk transpose -t counts.tsv \
>     | csvtk replace -t -f gene -p '^(.+)$' -r '{kv}' -k <(csvtk cut -t -f 2,1 barcodes.tsv)  -K \
> 
-bash: csvtk: command not found

Run:

./csvtk xxx

Answer updated. It's much easier.

dhcp179185:Downloads $ csvtk-0.14.0
-bash: csvtk-0.14.0: command not found
dhcp179185:Downloads$

Sorry, I don't know how to install that.


You have to include the ./ before the command, which means "run the executable located in this folder".

6.5 years ago

Based on this thread, try the following in R:

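# Genes in rows (first column becomes the row names), barcodes in columns
counts <- read.table(file="/path/to/counts.csv", sep="\t", header=TRUE, row.names=1)
# Sample sheet with the columns Sample and Barcode
samples <- read.table(file="/path/to/samples.csv", sep="\t", header=TRUE)
counts$id <- row.names(counts)
# Long format: one row per (gene, barcode) pair
mdfa <- reshape2::melt(counts, id.vars = "id", variable.name = "Barcode")
# Attach sample names, then go back to wide format with one column per sample
reshape2::dcast(merge(samples, mdfa, by = "Barcode"), id ~ Sample, value.var = "value", fun.aggregate = sum)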
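
For comparison, roughly the same result can be sketched in base R with rowsum(), without reshape2. This is a hedged illustration and not part of the original answer: the inputs are toy values copied from the example earlier in the thread, counts is assumed to be a numeric matrix, and the names idx, keep and by_sample are made up here. As with the merge above, barcodes that are missing from the sample sheet are dropped, and columns that share a sample name are summed.

```r
# Toy inputs; in practice these would come from read.table() as above.
counts <- matrix(
  c(0, 0,   # ATAGTTCTCGT
    0, 0,   # GAAGCAGTATG
    3, 0,   # GAAGACTTGGT
    0, 0),  # AAAAAAAAAA
  nrow = 2,
  dimnames = list(c("gene1", "gen1e2"),
                  c("ATAGTTCTCGT", "GAAGCAGTATG", "GAAGACTTGGT", "AAAAAAAAAA"))
)
samples <- data.frame(
  Sample  = c("sc1", "sc2", "sc3", "sc4", "sc5"),
  Barcode = c("CCTAGATTAAT", "GAAGACTTGGT", "GAAGCAGTATG",
              "GGTAACCTGAC", "ATAGTTCTCGT"),
  stringsAsFactors = FALSE
)

# Keep only the barcode columns that appear in the sample sheet.
idx <- match(colnames(counts), samples$Barcode)
keep <- !is.na(idx)

# rowsum() sums rows by group, so transpose, group by sample name, transpose back.
by_sample <- t(rowsum(t(counts[, keep, drop = FALSE]),
                      group = samples$Sample[idx[keep]]))

print(by_sample)
```

rowsum() orders the groups alphabetically by default, so the columns come out sorted by sample name rather than in the original barcode order.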