Question

Filtering count table

4.3 years ago

luca ▴ 70

Hi there, I have a filtering question in R. I am doing comparative transcriptomics across 5 different species (with 10 biological replicates per species). All species have been mapped to the same genome, so I have a single count table in this form:

GeneName, Species1_rep1, Species1_rep2, .... Species1_rep10, Species2_rep1, Species2_rep2, ... Species5_rep10

I need to filter the genes in this table. I would like to keep genes that have at least 5 counts in at least 5 (out of 10) biological replicates of the same species. How can I do that?

Thanks in advance for any suggestion, All the best Luca

RNA-Seq count-table filtering counts • 660 views

score 0 · Answer 1 · 2020-07-25

4.3 years ago

rpolicastro 13k

Here's a tidyverse option.

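library("tidyverse")

# df: the count table, with GeneName plus one column per sample (Species<k>_rep<j>)
df %>%
  pivot_longer(starts_with("Species"), names_to = "samples", values_to = "counts") %>%
  separate(samples, c("species", "rep"), sep = "_") %>%
  # keep gene/species groups with >= 5 counts in >= 5 replicates
  group_by(GeneName, species) %>%
  filter(sum(counts >= 5) >= 5) %>%
  ungroup() %>%
  unite(samples, species, rep, sep = "_") %>%
  # id_cols is named explicitly; newer tidyr versions do not accept it positionally
  pivot_wider(id_cols = GeneName, names_from = "samples", values_from = "counts") %>%
  # removes genes that failed the filter in any species; omit to keep them (with NA)
  drop_na()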
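
If the goal is to keep a gene whenever it passes the threshold in at least one species, with every sample column left intact (no NA values), a base R variant might look like the sketch below. The toy table, the two-species layout, and the names min_count and min_reps are assumptions made for illustration; they are not part of the original question.

```r
# Toy count table (hypothetical data): 2 species x 10 replicates, 3 genes.
# Column names follow the Species<k>_rep<j> pattern described in the question.
samples <- paste0(rep(c("Species1", "Species2"), each = 10), "_rep", 1:10)
counts <- rbind(
  geneA = c(rep(10, 5), rep(0, 5), rep(0, 10)),  # 5 replicates >= 5 in Species1
  geneB = rep(4, 20),                             # never reaches 5 counts
  geneC = c(rep(0, 10), rep(7, 6), rep(0, 4))     # 6 replicates >= 5 in Species2
)
colnames(counts) <- samples
df <- data.frame(GeneName = rownames(counts), counts, row.names = NULL)

min_count <- 5  # assumed threshold: minimum counts per replicate
min_reps  <- 5  # assumed threshold: minimum replicates per species

# Derive the species label of each sample column from its name
count_cols <- grep("^Species", names(df), value = TRUE)
species <- sub("_.*$", "", count_cols)

# One logical column per species: does the gene pass within that species?
passes <- do.call(cbind, lapply(split(count_cols, species), function(cols) {
  rowSums(df[, cols, drop = FALSE] >= min_count) >= min_reps
}))

# Keep whole rows for genes that pass in at least one species
keep <- rowSums(passes) > 0
filtered <- df[keep, ]
```

Because whole rows are kept or dropped, no sample column is lost and no NA is introduced. To require the threshold in every species instead, the last test could be changed to rowSums(passes) == ncol(passes).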

Thanks @rpolicastro! Your solution works like a charm. The only thing I had to do was remove the final drop_na to keep all the genes and all the samples; otherwise I was losing some of them from the filtered count table!

Thanks again!!!!! Luca

ADD REPLY • link 4.3 years ago by luca ▴ 70