dereplication process in metagenomics
8.3 years ago
sacha ★ 2.4k

Hi,

Could you briefly explain the goal of the "dereplication" step in a metagenomics analysis?

metagenomics dereplication • 3.0k views
8.3 years ago
sacha ★ 2.4k

I think I understand the purpose now! Dereplication is not a filtering step: identical reads are collapsed into a single representative sequence, and the number of copies is kept as the abundance of that sequence.
For instance, given the following FASTA file:

> seq1 
ACGT
> seq2 
ACGT
> seq3 
TCGA
> seq4 
GGAC

After dereplication I get:

> seq1_2
ACGT
> seq3_1
TCGA
> seq4_1
GGAC

Note that the abundance values are written into the sequence names (seq1_2 means the sequence of seq1 was seen twice).
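To make the idea concrete, here is a minimal Python sketch of the same collapsing step. It is not how any particular tool implements it; the function name `dereplicate` and the `name_count` label format are just assumptions that mirror the toy example above.

```python
def dereplicate(records):
    """Collapse identical sequences.

    records: iterable of (name, sequence) pairs.
    Returns a list of (name, sequence, count) tuples, most abundant first.
    The name kept is the one of the first read seen with that sequence.
    """
    counts = {}  # sequence -> [first name seen, number of copies]
    for name, seq in records:
        key = seq.upper()  # assumption: treat lower/upper case as identical
        if key not in counts:
            counts[key] = [name, 0]
        counts[key][1] += 1
    unique = [(name, seq, n) for seq, (name, n) in counts.items()]
    # sorted() is stable, so sequences with equal counts keep input order
    return sorted(unique, key=lambda rec: -rec[2])


reads = [("seq1", "ACGT"), ("seq2", "ACGT"), ("seq3", "TCGA"), ("seq4", "GGAC")]
for name, seq, n in dereplicate(reads):
    print(f">{name}_{n}")
    print(seq)
```

No read is lost here: the counts still add up to the original number of reads, which is why this is an abundance step rather than a filter.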

For instance, using vsearch on this input (test.fa):

>A1
AGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCA
>A2
AGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCA
>A3
AGATACAGAGTAGAAAGTAGAGATCAGATGAGTAGATGATAGATACCCCCCCCCCCCTTTTTCCCCCCAAATAGTGTACCATATGGTGATTATGCG
>A4
AGATACAGAGTAGAGAGAGTAGAGATCAGATGAGTAGATGATAGATAGCACACCATTGGATTGTACGTATTAGTGTACCATATGGTGATTATGCG
>A5
AGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCA

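Running the following command:

vsearch --derep_fulllength test.fa --output derep.fasta --sizeout

writes this to derep.fasta. A1, A2 and A5 are identical, so they are merged into one record with size=3, and the sequences are wrapped at 80 characters per line: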

>A1;size=3;
AGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCAAGATACCA
AGATACCAAGATACCA
>A3;size=1;
AGATACAGAGTAGAAAGTAGAGATCAGATGAGTAGATGATAGATACCCCCCCCCCCCTTTTTCCCCCCAAATAGTGTACC
ATATGGTGATTATGCG
>A4;size=1;
AGATACAGAGTAGAGAGAGTAGAGATCAGATGAGTAGATGATAGATAGCACACCATTGGATTGTACGTATTAGTGTACCA
TATGGTGATTATGCG
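If you want to use these abundances downstream, a small parser is enough. The sketch below is only an illustration, assuming headers of the form `>label;size=N;` as in the output above; `parse_size` and `read_abundances` are hypothetical helper names, not part of vsearch.

```python
import re

# Matches a trailing ";size=N" annotation, with or without a final ";"
SIZE_RE = re.compile(r";size=(\d+);?$")


def parse_size(header):
    """Split a FASTA header into (label, abundance); abundance defaults to 1."""
    label = header.lstrip(">").strip()
    match = SIZE_RE.search(label)
    if match is None:
        return label, 1
    return label[:match.start()], int(match.group(1))


def read_abundances(fasta_text):
    """Return {label: (abundance, sequence)}, joining wrapped sequence lines."""
    table = {}
    label = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label, size = parse_size(line)
            table[label] = [size, ""]
        elif label is not None:
            table[label][1] += line
    return {name: (size, seq) for name, (size, seq) in table.items()}


example = ">A1;size=3;\nAGATACCA\nAGATACCA\n>A3;size=1;\nAGATACAG\n"
for name, (size, seq) in read_abundances(example).items():
    print(name, size, len(seq))
```

Summing the sizes gives back the number of reads in the original file, which is a quick sanity check after dereplication.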