Filtering count table
4.3 years ago
luca ▴ 70

Hi there, I have a filtering question about R. I am doing comparative transcriptomics across 5 different species (with 10 biological replicates per species). Reads from all species were mapped to the same genome, so I have a single count table in this form:

GeneName, Species1_rep1, Species1_rep2, .... Species1_rep10, Species2_rep1, Species2_rep2, ... Species5_rep10

I need to filter the genes in this table. I would like to keep genes that have at least 5 counts in at least 5 (out of 10) biological replicates of the same species. How can I do that?

Thanks in advance for any suggestions. All the best, Luca

RNA-Seq count-table filtering counts • 660 views
4.3 years ago

Here's a tidyverse option.

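library("tidyverse")

# df is the count table: a GeneName column plus one column per sample
df %>%
  # Long format: one row per gene/sample pair
  pivot_longer(starts_with("Species"), names_to = "samples", values_to = "counts") %>%
  separate(samples, c("species", "rep"), sep = "_") %>%
  # Keep a gene/species group only if >= 5 replicates have >= 5 counts
  group_by(GeneName, species) %>%
  filter(sum(counts >= 5) >= 5) %>%
  ungroup() %>%
  unite(samples, species, rep, sep = "_") %>%
  # Back to wide format; groups removed above become NA
  pivot_wider(id_cols = GeneName, names_from = "samples", values_from = "counts") %>%
  drop_na()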

Thanks @rpolicastro! Your solution works like a charm. The only change I had to make was removing the final drop_na to keep all the genes and all the samples; otherwise I was losing some of them from the filtered count table!

Thanks again!!!!! Luca
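
A note on why drop_na changes the result: the filter step runs per gene/species group, so replicates of a species that fails the threshold are removed for that gene and come back as NA after pivot_wider. With drop_na, only genes that pass in every species survive; without it, genes that pass in at least one species are kept, but the failing species' columns hold NA instead of the original counts. If the goal is to keep the original counts in all columns for any gene that passes in at least one species, a base R sketch along these lines may help. The toy table, `min_count`, `min_reps`, and the other variable names are made up for illustration, and it assumes sample columns are named `<species>_rep<N>` as in the question.

```r
# Toy count table (hypothetical data): 2 species x 10 replicates, 3 genes
species <- rep(c("Species1", "Species2"), each = 10)
sample_names <- paste0(species, "_rep", rep(1:10, times = 2))

counts <- rbind(
  geneA = c(rep(10, 10), rep(0, 10)),            # passes in Species1 only
  geneB = c(rep(0, 10), rep(10, 4), rep(0, 6)),  # only 4 replicates >= 5, fails
  geneC = rep(10, 20)                            # passes in both species
)
colnames(counts) <- sample_names
df <- data.frame(GeneName = rownames(counts), counts, row.names = NULL)

min_count <- 5  # minimum counts per replicate
min_reps  <- 5  # minimum number of replicates within one species

# Derive the species label of each sample column from its name
count_cols <- grep("^Species", names(df), value = TRUE)
species_of <- sub("_.*$", "", count_cols)

# One logical column per species: does the gene pass in that species?
passes <- do.call(cbind, lapply(split(count_cols, species_of), function(cols) {
  rowSums(df[, cols, drop = FALSE] >= min_count) >= min_reps
}))

# Keep genes passing in at least one species; all original columns are retained
keep <- rowSums(passes) >= 1
filtered <- df[keep, ]
filtered$GeneName
```

On this toy table, geneA and geneC are kept and geneB is dropped, with no NA values introduced. Changing `rowSums(passes) >= 1` to `rowSums(passes) == ncol(passes)` would instead require the threshold to be met in every species.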
