Get all rows matching a list
4.0 years ago
Bioinfonext ▴ 470

Hi,

I have a large tab-delimited data file like this:

OTU     Sample  Abundance   Kingdom      Phylum         Class         Order           Family
OTU1    1D_M2   0.000233111 Bacteria    Bacteroidetes   Bacteroidia Bacteroidales   Marinilabiliaceae
OTU1    1D_M1   9.96E-05    Bacteria    Bacteroidetes   Bacteroidia Bacteroidales   Marinilabiliaceae
OTU1    1D_R1   8.82E-05    Bacteria    Bacteroidetes   Bacteroidia Bacteroidales   Marinilabiliaceae   
OTU2    2W_R2   8.41E-06    Bacteria    Proteobacteria  Deltaproteobacteria Desulfobacterales   Desulfobulbaceae
OTU2    2W_M1   8.37E-06    Bacteria    Proteobacteria  Deltaproteobacteria Desulfobacterales   Desulfobulbaceae
OTU2    1D_R1   5.21E-06    Bacteria    Proteobacteria  Deltaproteobacteria Desulfobacterales   Desulfobulbaceae

I want to get all rows whose OTU matches a list of OTUs stored in a separate file, like this:

OTU1
OTU2
OTU3..........

For example, if I ask for OTU1, it should give me output like this:

    OTU   Sample    Abundance   Kingdom      Phylum         Class         Order           Family
    OTU1    1D_M2   0.000233111 Bacteria    Bacteroidetes   Bacteroidia Bacteroidales   Marinilabiliaceae
    OTU1    1D_M1   9.96E-05    Bacteria    Bacteroidetes   Bacteroidia Bacteroidales   Marinilabiliaceae
    OTU1    1D_R1   8.82E-05    Bacteria    Bacteroidetes   Bacteroidia Bacteroidales   Marinilabiliaceae

Many thanks


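Try df[df$OTU %in% list, ] or subset(df, OTU %in% list). Use %in% rather than == here: with more than one ID in the list, == compares element by element (recycling the shorter vector) instead of testing membership.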
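
A small sketch of why `%in%` is the safer choice when matching against several IDs. The toy data frame and the name `wanted` are made up for illustration and are not the original data.

```r
# Toy table standing in for the real one (illustrative values only)
df <- data.frame(
  OTU = c("OTU1", "OTU1", "OTU2", "OTU3"),
  Sample = c("1D_M2", "1D_M1", "2W_R2", "1D_R1"),
  stringsAsFactors = FALSE
)
wanted <- c("OTU1", "OTU2")  # hypothetical IDs to keep

# `==` recycles `wanted` along the column, so row i is compared with
# wanted[1], wanted[2], wanted[1], wanted[2], ... and matches are missed
by_eq <- df[df$OTU == wanted, ]

# `%in%` tests each row for membership in the whole set of IDs
by_in <- df[df$OTU %in% wanted, ]

nrow(by_eq)
nrow(by_in)
```

With a single ID both forms give the same rows; the difference only shows up once the list has two or more entries.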


Thank you so much, this code works perfectly:

## read your data.frame in
df <- read.delim('soil.new.genus.relative.count.txt',sep='\t')

## create your list for matching
##your.list <- unique(df$OTU)
list <- list("ab3feea76dbe8d5688328e879961352c","16c306238059e7942361b356cb1fe8e0","4d4ee91e5bbea2b2ac79903df95ac919","418da7c18e8ca5e4fef06abbdbd2a55c","8e60d301122d7aa359eb6b0b00f37f62","fb991c397395f953464de888e90cfa9c","d71c327f54f6d2e4a6ecf5850a8d059f","d8e9b1d7677e9679bd818655c7fdd9cd","471688174511fedf079b2ce447e8fe6d","adc782d5c4173d56e0d5a74abadb73e3","29ba2f1a4ed84c982f33a8dd96cb2707","339ad5ba4905ffd4a692bd4430766cf4","a691c59fede09b1b48dbfa629ad8b0a1","bd2302ce018f3c596a4ccdbb47dbf2a0","df37cff8946279336357981f544316f5","42c64327f413a7e956921ead9e6cd50c","aed2016cb5b34512ababa75ee4f0f951","f3209372727b47240d46b5c23ca66059","1d4da1099aa9bf25526311cb611bb148","b657956cb832931182f6acf4e0f7e455","359e93139648e7b077316c58160a3a08","d8bc5de7908695c52ec4f75425f1ff5e","01add9ed426ff8519b767caef56d0942","b2137100989578bf9c643d4835076389","9f63daa5c7dbbf68c5f554a35059f2a4","f47757d1267b4d34e903a8c0ee82fe34","91bf6f1720f3875b5bc34550cbd12fa4","fd88fed64df1f69a1d179796193f727a","f1623ead3e6340f23f732988d40ff9f0","bfad6370d28182cc6304844e9bec7fb6","fc67529355587119c01c02a63f43fe9b","5ec8c16ff26835b40c4c5764369d51a2","13f195c7ed592a10db09c876e153b6e4","f7ff11244ec62b05834cb1c585dd3ecb","28628b5969f6f9e53e9684aa7411251b")

df1<-subset(df, OTU %in% list)

write.table(df1,"soil.top.genera.txt")
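
Since the question mentions that the OTU IDs live in a separate file, here is a hedged sketch of reading them from that file instead of pasting them into the script. The file layout (one ID per line) is an assumption, and the temp files and toy data frame are hypothetical stand-ins so the example runs on its own.

```r
# Hypothetical ID file, one OTU ID per line; written to a temp file here
# only so the sketch is self-contained
id_file <- tempfile(fileext = ".txt")
writeLines(c("OTU1", "OTU3 ", ""), id_file)

# Read the IDs, strip stray whitespace, and drop empty lines
ids <- trimws(readLines(id_file))
ids <- ids[nzchar(ids)]

# Toy table standing in for the real data (illustrative values only)
df <- data.frame(
  OTU = c("OTU1", "OTU1", "OTU2", "OTU3"),
  Abundance = c(0.4, 0.3, 0.2, 0.1),
  stringsAsFactors = FALSE
)

# Keep every row whose OTU appears in the ID file
df1 <- df[df$OTU %in% ids, ]

# Write a tab-delimited result without quotes or row names
out_file <- tempfile(fileext = ".tsv")
write.table(df1, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Trimming whitespace matters because an ID with a trailing space or newline will silently match nothing.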

Split your data into a list in which each item holds the rows for one OTU, something like:

d_split <- split(d, d$OTU)
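
A minimal sketch of how the split list could then be used to pull out the OTUs of interest by name. The toy data and the `wanted` vector are assumptions for illustration.

```r
# Toy table standing in for the real data (illustrative values only)
d <- data.frame(
  OTU = c("OTU1", "OTU1", "OTU2"),
  Sample = c("1D_M2", "1D_M1", "2W_R2"),
  stringsAsFactors = FALSE
)

# One data frame per OTU, named by the OTU ID
d_split <- split(d, d$OTU)

# Hypothetical IDs to extract; OTU3 is absent from the data on purpose
wanted <- c("OTU1", "OTU3")

# Only index with IDs that actually exist, to avoid NULL entries
found <- intersect(wanted, names(d_split))
picked <- d_split[found]

names(picked)
```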
4.0 years ago
## read your data.frame in
df <- read.delim('large_tab_delimited_data_file_like_this.tsv',sep='\t')

## create your list for matching
##your.list <- unique(df$OTU)
your.list <- list("OTU1","OTU2","OTU3")

## subset your df object by the OTU values in your list
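df.subsets.list <- lapply(your.list, function(x) {
  sub.df <- subset(df, OTU %in% x)
  ## uncomment the next line to write each subset to its own file, e.g. subset_OTU1.tsv
  ## write.table(sub.df, file=paste0('subset_', x, '.tsv'), sep='\t', quote=FALSE, row.names=FALSE)
  sub.df
})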

df.subsets.list

I'm not sure what you want to do with the output, but if you uncomment the write.table line, it will write each subset to its own file.


Thanks so much. I am using the code above but getting this error:

<0 rows> (or 0-length row.names)

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  arguments imply differing number of rows: 12, 0
Calls: write.table ... as.data.frame -> as.data.frame.list -> do.call -> <Anonymous>
Execution halted

The code I used:

## read your data.frame in
df <- read.delim('soil.new.genus.relative.count.txt',sep='\t')

## create your list for matching
##your.list <- unique(df$OTU)
your.list <- list("ab3feea76dbe8d5688328e879961352c","16c306238059e7942361b356cb1fe8e0","4d4ee91e5bbea2b2ac79903df95ac919","418da7c18e8ca5e4fef06abbdbd2a55c","8e60d301122d7aa359eb6b0b00f37f62","fb991c397395f953464de888e90cfa9c","d71c327f54f6d2e4a6ecf5850a8d059f","d8e9b1d7677e9679bd818655c7fdd9cd","471688174511fedf079b2ce447e8fe6d","adc782d5c4173d56e0d5a74abadb73e3","29ba2f1a4ed84c982f33a8dd96cb2707","339ad5ba4905ffd4a692bd4430766cf4","a691c59fede09b1b48dbfa629ad8b0a1","bd2302ce018f3c596a4ccdbb47dbf2a0","df37cff8946279336357981f544316f5","42c64327f413a7e956921ead9e6cd50c","aed2016cb5b34512ababa75ee4f0f951","f3209372727b47240d46b5c23ca66059","1d4da1099aa9bf25526311cb611bb148","b657956cb832931182f6acf4e0f7e455","359e93139648e7b077316c58160a3a08","d8bc5de7908695c52ec4f75425f1ff5e","01add9ed426ff8519b767caef56d0942","b2137100989578bf9c643d4835076389","9f63daa5c7dbbf68c5f554a35059f2a4","f47757d1267b4d34e903a8c0ee82fe34","91bf6f1720f3875b5bc34550cbd12fa4","fd88fed64df1f69a1d179796193f727a","f1623ead3e6340f23f732988d40ff9f0","bfad6370d28182cc6304844e9bec7fb6","fc67529355587119c01c02a63f43fe9b","5ec8c16ff26835b40c4c5764369d51a2","13f195c7ed592a10db09c876e153b6e4","f7ff11244ec62b05834cb1c585dd3ecb","28628b5969f6f9e53e9684aa7411251b
")

## subset your df object by the OTU values in your list
df.subsets.list <- lapply(your.list,function(x){
subset(df,OTU %in% x)
##write.table(x,file=paste0('subset_',x,'.tsv),sep='\t',quote=FALSE,row.names=FALSE)
})

df.subsets.list


write.table(df.subsets.list,"soil.top.genera.txt")

Can you run head(df) and post what it shows? It could be that the file isn't being imported correctly. Also, your write.table command won't work on the list of subsets. Instead, uncomment (i.e. remove the ## from) the write.table line in my code. It will then create files named:

subset_ab3feea76dbe8d5688328e879961352c.tsv

subset_16c306238059e7942361b356cb1fe8e0.tsv

etc...

Which will contain the subsets that match those OTUs.
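
The reported error is consistent with write.table trying to coerce a list of data frames with different row counts into a single data frame (the traceback goes through as.data.frame.list). The `<0 rows>` entry also suggests that at least one ID matched nothing, which a stray newline inside the last quoted ID would cause. A hedged sketch of one way around both, using made-up toy data and a temp output file:

```r
# Toy table standing in for the real data (illustrative values only)
df <- data.frame(
  OTU = c("OTU1", "OTU1", "OTU2"),
  Abundance = c(0.2, 0.1, 0.3),
  stringsAsFactors = FALSE
)

# The last ID carries a trailing newline on purpose, so it matches nothing
your.list <- list("OTU1", "OTU2", "OTU3\n")

df.subsets.list <- lapply(your.list, function(x) subset(df, OTU %in% x))

# Row counts differ per subset, so the list cannot be written as one table directly
row.counts <- sapply(df.subsets.list, nrow)

# Drop empty subsets and stack the rest into a single data frame
non.empty <- Filter(function(s) nrow(s) > 0, df.subsets.list)
combined <- do.call(rbind, non.empty)

out_file <- tempfile(fileext = ".tsv")
write.table(combined, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

If a single output file is all that is needed, `subset(df, OTU %in% your.list)` gives the same rows without the lapply step; the per-OTU list is mainly useful when each subset goes to its own file.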
