Deconseq to remove human sequences

9.7 years ago

luigi.marongiu

Hello,

I am using Deconseq to remove human sequences from FASTQ files generated by an Illumina MiSeq. I created the database of human sequences with:

bwa64 index -p hs_ref_GRCh38_p2 -a bwtsw hs_ref_GRCh38_p2_split_PS.fa.fasta > bwa.log 2>&1

but now I don't know how to remove these sequences from my actual paired-end files; let's call them myfile_1.fq and myfile_2.fq.

Could you give me some hints?

Thank you.

Ram · Answer 1 · 2015-12-03

9.7 years ago

satshil.r

perl deconseq.pl -f myfile_1 -dbs hs_ref_GRCh38_p2 -i 90 -c 90 -out_dir <directory>
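
A minimal Python sketch of how the command above could be generated for each mate of a pair. The helper name `deconseq_command` and the default output directory `deconseq_out` are hypothetical; the only options used are the ones in the command above. It assumes that deconseq.pl takes one input file per run through `-f`, so the two mates are cleaned in separate runs, and that the `-dbs` value is the database key defined in the configuration file rather than a file path. Because a read can be removed from one mate but kept in the other, the two outputs may need to be re-paired by read ID afterwards.

```python
import shlex


def deconseq_command(fastq, db="hs_ref_GRCh38_p2", identity=90, coverage=90,
                     out_dir="deconseq_out", script="deconseq.pl"):
    """Build one deconseq.pl invocation (hypothetical helper) as an argument list."""
    return ["perl", script,
            "-f", fastq,            # one FASTQ file per run (assumption)
            "-dbs", db,             # database key from the config file, not a path
            "-i", str(identity),    # alignment identity threshold (%)
            "-c", str(coverage),    # alignment coverage threshold (%)
            "-out_dir", out_dir]


# One run per mate; each list could be passed to subprocess.run(...) to execute it.
for mate in ("myfile_1.fq", "myfile_2.fq"):
    print(shlex.join(deconseq_command(mate)))
```

The printed commands take the form `perl deconseq.pl -f myfile_1.fq -dbs hs_ref_GRCh38_p2 -i 90 -c 90 -out_dir deconseq_out`.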

The -i 90 option sets the identity threshold:

Alignment identity threshold in percentage. The identity is calculated for the part of the query sequence that is aligned to a reference sequence. For example, a query sequence of 100 bp that aligns to a reference sequence over the first 50 bp with 40 matching positions has an identity value of 80%.
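
The identity definition reduces to matching positions divided by aligned length. A short Python sketch reproducing the worked example (the function name is illustrative, not part of Deconseq):

```python
def identity_percent(matches, aligned_length):
    """Identity over the aligned part of the query only, as a percentage."""
    return 100.0 * matches / aligned_length


# 100 bp query, first 50 bp aligned, 40 matching positions
print(identity_percent(matches=40, aligned_length=50))  # 80.0
```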

The -c 90 refers to the coverage threshold:

Alignment coverage threshold in percentage. The coverage is calculated for the part of the query sequence that is aligned to a reference sequence. For example, a query sequence of 100 bp that aligns to a reference sequence over the first 50 bp with 40 matching positions has a coverage value of 50%.
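
Coverage is aligned length divided by query length. The Python sketch below reproduces the worked example and adds a hypothetical `passes_thresholds` check, assuming a read counts as a hit only when both thresholds are met; under that assumption, with `-i 90 -c 90` the example read (80% identity, 50% coverage) would not qualify.

```python
def coverage_percent(aligned_length, query_length):
    """Fraction of the query covered by the alignment, as a percentage."""
    return 100.0 * aligned_length / query_length


def passes_thresholds(identity, coverage, min_identity=90, min_coverage=90):
    # Assumption: both the -i and the -c thresholds must be met for a hit.
    return identity >= min_identity and coverage >= min_coverage


coverage = coverage_percent(aligned_length=50, query_length=100)
identity = 100.0 * 40 / 50  # 40 matches over the 50 aligned bases
print(coverage, passes_thresholds(identity, coverage))  # 50.0 False
```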

You also have to define your Deconseq databases in the configuration file (DeconSeqConfig.pm in the Deconseq directory):

hs_ref_GRCh38_p2 => {name => 'hs_ref_GRCh38_p2',
                        db => 'hs_ref_GRCh38_p2'},

and make sure you define the database location:

use constant DB_DIR => "<DIR_WITH_BWA_DB_OUTPUT>";
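
To illustrate why `-dbs` expects a configuration key, here is a hypothetical Python model of the two settings above; the real configuration file is Perl, and deconseq.pl's internal lookup may differ. `/path/to/bwa_db` stands in for `<DIR_WITH_BWA_DB_OUTPUT>`, and the key is assumed to select an entry whose `db` value is the BWA index prefix inside `DB_DIR`.

```python
import os

# Hypothetical stand-ins for the Perl settings shown above.
DB_DIR = "/path/to/bwa_db"  # placeholder for <DIR_WITH_BWA_DB_OUTPUT>
DBS = {
    "hs_ref_GRCh38_p2": {"name": "hs_ref_GRCh38_p2", "db": "hs_ref_GRCh38_p2"},
}


def resolve_db(key):
    """Map a -dbs value (a config key, not a path) to a BWA index prefix."""
    if key not in DBS:
        # A path such as ./refChr/hs_ref_GRCh38_p2 is not a key, so it fails here.
        raise KeyError(f'database "{key}" does not exist in config file')
    return os.path.join(DB_DIR, DBS[key]["db"])


print(resolve_db("hs_ref_GRCh38_p2"))
```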

Of course, you should adjust the settings, in particular the -c and -i thresholds, as you see fit.

Thank you very much, but it is still a bit beyond me. First, if I have two paired files, why is there only one in the command? Second, which configuration file should I modify? Third, should the database location go in the same configuration file? And should these modifications be copied verbatim? Cheers

(reply by luigi.marongiu, 9.7 years ago)

I created the database with the human sequences using:

bwa64 index -p hs_ref_GRCh38_p2 -a bwtsw hs_ref_GRCh38_p2_split_PS.fa.fasta > bwa.log 2>&1

This created a series of files, which I placed in a subfolder named refChr. The files are:

hs_ref_GRCh38_p2.amb hs_ref_GRCh38_p2.pac hs_ref_GRCh38_p2.sa
hs_ref_GRCh38_p2.ann hs_ref_GRCh38_p2.rbwt hs_ref_GRCh38_p2_split.fa
hs_ref_GRCh38_p2.bwt hs_ref_GRCh38_p2.rpac hs_ref_GRCh38_p2_split.fa.log
hs_ref_GRCh38_p2.fa hs_ref_GRCh38_p2.rsa hs_ref_GRCh38_p2_split_PS.fa.fasta
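
A quick way to confirm that an index prefix is complete before pointing `DB_DIR` at its folder is to look for every index suffix in the listing above. This Python sketch assumes those eight suffixes are what the bundled `bwa64` expects; `missing_index_files` is a hypothetical helper, and an empty result means every file was found.

```python
from pathlib import Path

# Index suffixes taken from the listing above (assumed to be what bwa64 needs).
INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa", ".rbwt", ".rpac", ".rsa")


def missing_index_files(db_dir, prefix):
    """Return the expected index file names that are absent from db_dir."""
    base = Path(db_dir)
    return [prefix + s for s in INDEX_SUFFIXES if not (base / (prefix + s)).exists()]


print(missing_index_files("refChr", "hs_ref_GRCh38_p2"))
```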

I then ran the following command to use Deconseq:

~$ perl /usr/bin/deconseq.pl -f fu_1.fq -dbs ./refChr/hs_ref_GRCh38_p2 -i 90 -c 90 -out_dir DECONSEQ

But I got the following error:

ERROR: database "./refChr/hs_ref_GRCh38_p2" does not exist in config file.

Try 'deconseq -h' for more information.
Exit program.

I also tried '/refChr/...', 'refChr/...', '...hs_ref_GRCh38_p2.fa' and '...hs_ref_GRCh38_p2.sa', but got the same error.

What is the correct way to use Deconseq with the human database to remove human contaminants?

Thank you

(reply by luigi.marongiu, 9.6 years ago)