Hi there,
I'm working on a project for determining how this two protozoan organisms have evolved and in which genes they differ the most by looking at the SPNs.
I have done some previous analysis calling for variants by using some programs
# construct the FM-index for the reference genome (in this case the Parasite genome)
bwa index ref.fa
# Find the SA coordinates of the input reads
bwa aln ref.fa short_read.fq > aln_sa.sai
# Generate alignments in the SAM format given paired-end reads
bwa sampe ref.fa aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln-pe.sam
# Transform .sam files into .bam files
samtools view -b -o output.bam input.sam
# Sort that bam file
samtools sort -@ 15 -o outfile infile
# Remove duplicates
java -Xmx2g -jar picard.jar MarkDuplicates MAX_FILE_HANDLES=500 REMOVE_DUPLICATES=true I= sorted.bam O=marked_duplicates.bam
M=marked_dup_metrics.txt
# Create .bai file for each .bam file
bamtools index -in file.bam
# Run Freebayes (looking for SNPs)
freebayes -f refernce.fasta file1.bam file2.bam file3.bam > combined.freebayes.vcf
# Compress and index the vcf files
bgzip -c combined.freebayes.vcf > combined.freebayes.vcf.gz
tabix -p vcf combined.freebayes.vcf.gz
The problem now is that I ouwld like to use this .vcf file for determining the genes where there are more SPNs, the function of these genes and (if it is posible) determine how these two related organisms have evolved.
I know some programs like ANNOVAR http://annovar.openbioinformatics.org/en/latest/ or VEP https://www.ensembl.org/info/docs/tools/vep/index.html can be used in order to determine the effects of the variants (SNPs). Since I'm working on a conda environment, the problems I’m having are:
1) ANNOVAR cannot be installed in conda (or, at least, I haven’t found any command for installing it)
2) VEP can be installed in conda https://anaconda.org/bioconda/ensembl-vep but when installing it it says that the library that you need in order to do the analysis are not included so you need to run another command (vep_install) that it gives me errors each time I try to run it in my environment of conda.
vep_install -a cf -s plasmodium_relictum -y PRELSG -c /PATH/miniconda3/pkgs/ensembl-vep-98.3-pl526hecc5488_0/share/ensembl-vep-98.3-0/vep_install
and then
- getting list of available cache files
ERROR: No matching species found at /PATH/miniconda3/envs/salmon/bin/vep_install line 1121, <> line 1.
What I am doing wrong? I just want to have the library in oder to use VEP in a normal way...
Thank you so much. Following the instructions I have upload the cache file to where ./vep is installed and then run the command
after that I have tried to run the command for VEP as usual but I get the error :
What I am doing wrong?
You're using version 95 of the VEP. Plasmodium relictum didn't come into Ensembl until release 96, which means the VEP can't find the species name in its list of aliases. Please update to the latest VEP, which is 98.
Thanks. I have been able to run the VEP 98 version but still I have the same problem:
needs to be
--genomes
at the end, not--genome
I doesn't work with
--genomes
it gives the same errorSorry, I forgot there is an extra option for some of the bacteria and protist databases. Please add
is_multispecies 1
to your command. Because the database contains lots of strains, you also need to give the more specific genome name of--species plasmodium_relictum_gca_900005765
. I know there is only one Plasmodium relictum genome in the database at the moment, but this allows us to add more in future without changing the aliases.