Command to count matching and non-matching entries
harry, 3.0 years ago

I have a file with two columns: one is read and the other is circBase.

      read              circBase
hsa_circ_0000160    hsa_circ_0000160
hsa_circ_0000160    hsa_circ_0000160
hsa_circ_0000175    hsa_circ_0000175
hsa_circ_0000175    hsa_circ_0000175
hsa_circ_0000211    hsa_circ_0017614
hsa_circ_0000211    hsa_circ_0000211
hsa_circ_0000211    hsa_circ_0000211
hsa_circ_0000219    hsa_circ_0000219
hsa_circ_0000219    hsa_circ_0000219
hsa_circ_0000219    hsa_circ_0000219
hsa_circ_0000236    hsa_circ_0000236

I want to make a file with 3 columns: the 1st column is each unique read name from the read column above, the 2nd column is the number of rows where circBase matches that read, and the 3rd column is the number of rows where it does not match. Below is an example of the output file I want.

read    match   not_match
hsa_circ_0000160    2   0
hsa_circ_0000175    2   0
hsa_circ_0000211    2   1
hsa_circ_0000219    3   0
hsa_circ_0000236    1   0

I searched for it but couldn't find out how to do this. Can anyone please help me with this issue? Thanks in advance.

3.0 years ago

A solution with csvtk (please use the binaries from this link):

$ cat t.tsv \
    | csvtk mutate2 -t -n match     -L 0 -e '$read == $circBase ? 1 : 0' \
    | csvtk mutate2 -t -n not_match -L 0 -e '$read == $circBase ? 0 : 1' \
    | csvtk summary -t -g read -n 0 -f match:sum -f not_match:sum \
    | csvtk rename2 -t -f -read -p ':sum'

read    match   not_match
hsa_circ_0000160        2       0
hsa_circ_0000175        2       0
hsa_circ_0000211        2       1
hsa_circ_0000219        3       0
hsa_circ_0000236        1       0
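For comparison, the same table can be produced without csvtk. The following is a minimal sketch in POSIX shell with awk, not part of the answer above; it assumes a tab-separated file whose first line is the header `read<TAB>circBase`, and `count_matches` is a hypothetical helper name. Reads are printed in the order they first appear in the file.

```shell
#!/bin/sh
# count_matches is a hypothetical helper name. It assumes tab-separated
# input whose first line is a header (read<TAB>circBase), and reads the
# files given as arguments, or standard input when none are given.
count_matches() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 { next }                  # skip the header line of each file
        !($1 in seen) {                    # record reads in first-seen order
            seen[$1] = 1
            order[++n] = $1
        }
        $1 == $2 { hit[$1]++; next }       # circBase equals read: a match
        { miss[$1]++ }                     # otherwise: not a match
        END {
            print "read", "match", "not_match"
            for (i = 1; i <= n; i++) {
                r = order[i]
                # adding 0 turns a never-incremented count into 0
                print r, hit[r] + 0, miss[r] + 0
            }
        }' "$@"
}

# Example run on the sample rows from the question.
printf '%s\t%s\n' \
    read circBase \
    hsa_circ_0000160 hsa_circ_0000160 \
    hsa_circ_0000160 hsa_circ_0000160 \
    hsa_circ_0000175 hsa_circ_0000175 \
    hsa_circ_0000175 hsa_circ_0000175 \
    hsa_circ_0000211 hsa_circ_0017614 \
    hsa_circ_0000211 hsa_circ_0000211 \
    hsa_circ_0000211 hsa_circ_0000211 \
    hsa_circ_0000219 hsa_circ_0000219 \
    hsa_circ_0000219 hsa_circ_0000219 \
    hsa_circ_0000219 hsa_circ_0000219 \
    hsa_circ_0000236 hsa_circ_0000236 \
    | count_matches
```

To run it on a file, call something like `count_matches t.tsv > counts.tsv`. For a comma-separated file, change both `-F'\t'` and `OFS='\t'` to use `','`; this simple split does not handle quoted CSV fields that contain commas.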

My t.tsv file looks like:

read,circBase
hsa_circ_0000160,hsa_circ_0000160
hsa_circ_0000175,hsa_circ_0000175
hsa_circ_0000175,hsa_circ_0000175
hsa_circ_0000211,hsa_circ_0017614
hsa_circ_0000211,hsa_circ_0000211
hsa_circ_0000211,hsa_circ_0000211
hsa_circ_0000219,hsa_circ_0000219
hsa_circ_0000219,hsa_circ_0000219
hsa_circ_0000219,hsa_circ_0000219
hsa_circ_0000236,hsa_circ_0000236

I used the command below:

cat t.tsv \
    | /home/aclab/csvtk mutate2 -t -n match -L 0 -e '$read == $circBase ? 1 : 0' \
    | /home/aclab/csvtk mutate2 -t -n not_match -L 0 -e '$read == $circBase ? 0 : 1' \
    | /home/aclab/csvtk summary -t -g read -n 0 -f match:sum -f not_match:sum \
    | /home/aclab/csvtk rename2 -t -f -read -p ':sum'

Running this command gives the following errors:

[ERRO] column "read" not existed in file: -
[ERRO] xopen: no content
[ERRO] xopen: no content
[ERRO] xopen: no content

Can you please sort out this issue? Thanks in advance.


Just remove all the -t flags, since your file is comma-separated (CSV), not tab-separated.

-t, --tabs specifies that the input CSV file is delimited with tabs.
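Whether `-t` is needed depends on the delimiter the file actually uses, not on its extension (here a file named `t.tsv` contained commas). A minimal sketch of a quick check in POSIX shell; `detect_delim` is a hypothetical helper, not a csvtk command, and it assumes the header line uses the same delimiter as the rest of the file.

```shell
#!/bin/sh
# detect_delim is a hypothetical helper, not part of csvtk. It inspects only
# the first line of FILE and prints "tab" if that line contains a tab
# character, otherwise "comma".
detect_delim() {
    tab_char=$(printf '\t')
    if head -n 1 "$1" | grep -q "$tab_char"; then
        echo tab
    else
        echo comma
    fi
}

# Example: a comma-separated header like the one in the reply above.
sample=$(mktemp)
printf 'read,circBase\nhsa_circ_0000160,hsa_circ_0000160\n' > "$sample"
detect_delim "$sample"    # prints: comma
rm -f "$sample"
```

If it prints `tab`, keep `-t` on every csvtk call in the pipeline; if it prints `comma`, drop it.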


Thanks, now it's working.


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.