perl deconseq.pl -f myfile_1 -dbs hs_ref_GRCh38_p2 -i 90 -c 90 -out_dir <directory>
The -i 90 option refers to the identity threshold:
Alignment identity threshold in percentage. The identity is calculated for the part of the query sequence that is aligned to a reference sequence. For example, a query sequence of 100 bp that aligns to a reference sequence over the first 50 bp with 40 matching positions has an identity value of 80%.
The -c 90 option refers to the coverage threshold:
Alignment coverage threshold in percentage. The coverage is calculated for the part of the query sequence that is aligned to a reference sequence. For example, a query sequence of 100 bp that aligns to a reference sequence over the first 50 bp with 40 matching positions has a coverage value of 50%.
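To make the two definitions concrete, here is a small shell sketch that redoes the arithmetic from the example above (100 bp query, 50 bp aligned, 40 matches). The variable names are made up for illustration, and the final check assumes a read only counts as a contaminant hit when both values reach their thresholds; this is not DeconSeq's own code.

```shell
#!/bin/sh
# Worked example of the two thresholds, using integer percentages.
# Numbers are taken from the example in the text above.

query_len=100    # total length of the query read (bp)
aligned_len=50   # part of the query covered by the alignment (bp)
matches=40       # matching positions within the aligned part

# Identity: matches relative to the aligned part of the query.
identity=$(( 100 * matches / aligned_len ))
# Coverage: aligned part relative to the whole query.
coverage=$(( 100 * aligned_len / query_len ))

echo "identity=${identity}% coverage=${coverage}%"

# Assumption (illustrative only): both thresholds must be reached
# for the read to be flagged, mirroring -i 90 -c 90.
min_identity=90
min_coverage=90
if [ "$identity" -ge "$min_identity" ] && [ "$coverage" -ge "$min_coverage" ]; then
    verdict="contaminant"
else
    verdict="retained"
fi
echo "$verdict"
```

With -i 90 -c 90 this example read would fall below both cutoffs, which is why lowering or raising the two values changes how aggressively reads are removed.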
You have to make sure you define your deconseq databases in the configuration file.
hs_ref_GRCh38_p2 => {name => 'hs_ref_GRCh38_p2',
db => 'hs_ref_GRCh38_p2'},
and make sure you define the database location:
use constant DB_DIR => "<DIR_WITH_BWA_DB_OUTPUT>";
Of course you have to adjust the settings, specifically the -c and -i thresholds, to whatever you see fit.
Thank you very much, but it is still a bit beyond me. First of all, if I have two paired files, why is there only one in the command? Secondly, which configuration file should I modify? Thirdly, should the database location go in the same config file? Should these modifications be made verbatim? Cheers
I created the database with the human sequences using:
This created a series of files that I placed in a subfolder named refChr. The list of files is:
I then ran the following command to use Deconseq:
I tried with '/refChr/...' and 'refChr/...', and also with '...hs_ref_GRCh38_p2.fa' and '...hs_ref_GRCh38_p2.sa', but I get the same error.
What would be the correct use of Deconseq with the human library to remove the human contaminants?
Thank you