Subset row-entries according to a list
2.2 years ago
bionix ▴ 10

Hello!

I want to extract a selected set of rows (a list of entries) from a large data file. I have a list named "contig.list" that looks like this:

Contig_339241_4
Contig_1004621_3
Contig_1666_1
Contig_836268_32
Contig_1479_10
Contig_640297_1
Contig_365838_1
..

I want to subset the entries of this list from a big table named "function.tax.ranks" that looks like this:

Contig_339241_4 Taxonomy
Contig_339241_41    Taxonomy
Contig_339241_47    Taxonomy
Contig_1004621_3    Taxonomy
Contig_1004621_30   Taxonomy
Contig_1004621_39   Taxonomy
Contig_1666_1   Taxonomy
Contig_836268_32    Taxonomy
Contig_1479_10  Taxonomy
Contig_1479_100 Taxonomy
Contig_1479_100 Taxonomy
Contig_1479_107 Taxonomy
Contig_640297_1 Taxonomy
Contig_365838_1 Taxonomy
Contig_365838_16    Taxonomy
Contig_365838_17    Taxonomy
..

The resulting output should be:

Contig_339241_4 Taxonomy
Contig_1004621_3    Taxonomy
Contig_1666_1   Taxonomy
Contig_836268_32    Taxonomy
Contig_1479_10  Taxonomy
Contig_640297_1 Taxonomy
Contig_365838_1 Taxonomy

I have tried

grep -f contig.list function.tax.ranks > contig_taxa.txt

The problem is that the match does not stop at the last digit: grep also returns every entry that merely begins with a listed ID. For example, my list contains only "Contig_339241_4", but the output also includes "Contig_339241_41" and "Contig_339241_47" (basically all entries matching Contig_339241_4[0-9]). How can I fix this?

Thank you very much in advance!

Regards, PSP

2.2 years ago
GenoMax 147k

Have you tried adding -w to your command so that it does a whole-word match?
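
A minimal sketch of that suggestion, run on a toy copy of the two files from the question. It assumes the table is tab-separated with the contig ID in the first column, and it works in a scratch directory so the real files are not overwritten. The -F flag and the awk alternative are additions not mentioned above: -F makes grep treat each list entry as a fixed string instead of a regular expression, and the awk line compares the first column exactly.

```shell
# Work in a temporary directory so the real files are left untouched.
cd "$(mktemp -d)"

# Toy versions of the two files from the question (tab-separated table assumed).
printf '%s\n' Contig_339241_4 Contig_1004621_3 Contig_1666_1 > contig.list
printf '%s\tTaxonomy\n' Contig_339241_4 Contig_339241_41 Contig_339241_47 \
    Contig_1004621_3 Contig_1004621_30 Contig_1666_1 > function.tax.ranks

# -w: match whole words only, so Contig_339241_4 does not match Contig_339241_41
# -F: treat each list entry as a fixed string, not a regular expression
grep -w -F -f contig.list function.tax.ranks > contig_taxa.txt

# Alternative: keep rows whose first column is exactly one of the listed IDs.
awk 'NR == FNR { keep[$1]; next } $1 in keep' contig.list function.tax.ranks > contig_taxa_awk.txt
```

Note that -w matches the ID as a whole word anywhere on the line, while the awk version only looks at the first column, which may be safer if other columns can also contain contig IDs.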

GenoMax, thanks a lot!
