Question

mapping reads using kallisto - rna seq analysis

2.5 years ago

bioinformatics ▴ 40

Hi,

I'm trying to map reads with kallisto for RNA-seq analysis (in the Terminal on a Mac) and keep getting an error message:

Error: FASTA file not found Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz
kallisto 0.48.0
Builds a kallisto index

Usage: kallisto index [arguments] FASTA-files

Required argument:
-i, --index=STRING          Filename for the kallisto index to be constructed 

Optional argument:
-k, --kmer-size=INT         k-mer (odd) length (default: 31, max value: 31)
    --make-unique           Replace repeated target names with unique names

admins-Air:~ mesalie$ kallisto quant \
> -i Homo_sapiens.GRCh38.cdna.all.index \
> -o test \
> -t 8 \
> --single -l 250 -s 30 \
> SRR8668755_1M_subsample.fastq.gz

Error: kallisto index file not found Homo_sapiens.GRCh38.cdna.all.index
Warning: you asked for 8, but only 4 cores on the machine

Usage: kallisto quant [arguments] FASTQ-files

Required arguments:
-i, --index=STRING            Filename for the kallisto index to be used for
                              quantification
-o, --output-dir=STRING       Directory to write output to

Optional arguments:
    --bias                    Perform sequence based bias correction
-b, --bootstrap-samples=INT   Number of bootstrap samples (default: 0)
    --seed=INT                Seed for the bootstrap sampling (default: 42)
    --plaintext               Output plaintext instead of HDF5
    --fusion                  Search for fusions for Pizzly
    --single                  Quantify single-end reads
    --single-overhang         Include reads where unobserved rest of fragment is
                              predicted to lie outside a transcript
    --fr-stranded             Strand specific reads, first read forward
    --rf-stranded             Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE  Estimated average fragment length
-s, --sd=DOUBLE               Estimated standard deviation of fragment length
                              (default: -l, -s values are estimated from paired
                               end data, but are required when using --single)
-t, --threads=INT             Number of threads to use (default: 1)
    --pseudobam               Save pseudoalignments to transcriptome to BAM file
    --genomebam               Project pseudoalignments to genome sorted BAM file
-g, --gtf                     GTF file for transcriptome information
                              (required for --genomebam)
-c, --chromosomes             Tab separated file with chromosome names and lengths
                              (optional for --genomebam, but recommended)
    --verbose                 Print out progress information every 1M proccessed reads

The commands I used are listed below:

bash ~/Miniconda3-latest-MacOSX-x86_64.sh -b -p $HOME/miniconda
source $HOME/miniconda/bin/activate
conda init zsh
conda info
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set offline false
conda create --name rnaseq 
conda activate rnaseq 
conda install -c bioconda kallisto
kallisto
conda install -c bioconda fastqc
conda install -c bioconda multiqc
conda activate rnaseq
kallisto index -i Homo_sapiens.GRCh38.cdna.all.index Homo_sapiens.GRCh38.cdna.all.fa
kallisto quant \
-i Homo_sapiens.GRCh38.cdna.all.index \
-o test \
-t 8 \
--single -l 250 -s 30 \
SRR8668755_1M_subsample.fastq.gz

Does anyone know how I might be able to correct this?

Thank you

minconda seq analysis rna • 3.2k views

ADD COMMENT • link updated 2.5 years ago by mark.ziemann ★ 1.9k • written 2.5 years ago by bioinformatics ▴ 40

Give kallisto index the correct path to Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.

ADD REPLY • link 2.5 years ago by andres.firrincieli 3.8k

Thank you for your response. How exactly do I do this?

ADD REPLY • link 2.5 years ago by bioinformatics ▴ 40

Should the command be kallisto index -i Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.index Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa?

It gives an error message: Error: FASTA file not found Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa

kallisto 0.48.0
Builds a kallisto index

Usage: kallisto index [arguments] FASTA-files

Required argument:
-i, --index=STRING          Filename for the kallisto index to be constructed 

Optional argument:
-k, --kmer-size=INT         k-mer (odd) length (default: 31, max value: 31)
    --make-unique           Replace repeated target names with unique names

ADD REPLY • link 2.5 years ago by bioinformatics ▴ 40

Make sure Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa is in your working directory (or give the absolute path to be safe), and that the extension of the file is actually .fa.gz.fa as written in your code.

ADD REPLY • link 2.5 years ago by rpolicastro 13k
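
The working-directory check suggested above can be sketched in the shell before calling kallisto index. This is only a sketch: `resolve_input` is a hypothetical helper (not part of kallisto), and the file name in the usage comment is the one from the error message, which may differ from the name actually on disk.

```shell
# Hypothetical helper: print the absolute path of a file, or fail with a message
# that also reports the current working directory.
resolve_input() {
    local f="$1"
    if [ ! -f "$f" ]; then
        echo "not found: $f (current directory: $(pwd))" >&2
        return 1
    fi
    # cd into the file's directory in a subshell so pwd yields an absolute path
    echo "$(cd "$(dirname "$f")" && pwd)/$(basename "$f")"
}

# Usage sketch (assumed file name; run `ls` first to confirm the real one):
#   fasta=$(resolve_input Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz) &&
#       kallisto index -i Homo_sapiens.GRCh38.cdna.all.index "$fasta"
```

If the helper reports "not found", either `cd` into the download folder first or pass the full path to the file.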

Ok thanks, I have now done this.

I still get an error message:

[quant] fragment length distribution is truncated gaussian with mean = 250, sd = 30
Error: incompatible indices. Found version 3472328451435676990, expected version 10
Rerun with index to regenerate

Do you know how I might correct this?

ADD REPLY • link 2.5 years ago by bioinformatics ▴ 40

You will need to regenerate the index with the kallisto index command.

ADD REPLY • link 2.5 years ago by mark.ziemann ★ 1.9k
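
Regenerating the index might look like the sketch below. `rebuild_index` is a hypothetical wrapper, not a kallisto command; it assumes the incompatible index file can simply be deleted and rebuilt with whichever kallisto is on the PATH, using the same `kallisto index -i` form shown in the thread.

```shell
# Hypothetical helper: rebuild a kallisto index from scratch.
rebuild_index() {
    local fasta="$1" index="$2"
    if [ ! -f "$fasta" ]; then
        echo "FASTA not found: $fasta" >&2
        return 1
    fi
    rm -f "$index"                        # drop the stale or incompatible index
    kallisto index -i "$index" "$fasta"   # same invocation as in the thread
}

# Usage sketch (file names assumed from the thread):
#   rebuild_index Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa \
#                 Homo_sapiens.GRCh38.cdna.all.index
```

Running `kallisto quant` afterwards with `-i` pointing at the freshly built file, from the same conda environment, keeps the index and the quantification step on the same kallisto version.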

OK, thank you. What should I do if it takes a long time to load? I have tried on two Macs:

% kallisto index -i Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.index Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa

[build] loading fasta file Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 1554 target sequences
[build] warning: replaced 100005 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...

ADD REPLY • link 2.5 years ago by bioinformatics ▴ 40

It might take 10 to 20 minutes. Use a system monitor utility to check that you don't run out of memory. Good luck.

ADD REPLY • link 2.5 years ago by mark.ziemann ★ 1.9k
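
As an alternative to a graphical system monitor, memory use can be watched from the same terminal. The sketch below is an assumption-laden example rather than a recommended tool: `watch_rss` is a hypothetical helper that starts a command in the background and prints its resident memory (as reported by `ps -o rss=`, in kilobytes) at a fixed interval until the command finishes.

```shell
# Hypothetical helper: run a command and report its resident memory periodically.
# Usage: watch_rss INTERVAL_SECONDS COMMAND [ARGS...]
watch_rss() {
    local interval="$1" pid rss_kb
    shift
    "$@" &                                # start the command in the background
    pid=$!
    while kill -0 "$pid" 2>/dev/null; do  # loop while the process is still alive
        rss_kb=$(ps -o rss= -p "$pid" | tr -d ' ')
        if [ -n "$rss_kb" ]; then
            echo "pid $pid resident memory: ${rss_kb} KB" >&2
        fi
        sleep "$interval"
    done
    wait "$pid"                           # return the command's own exit status
}

# Usage sketch (file names assumed from the thread):
#   watch_rss 30 kallisto index -i Homo_sapiens.GRCh38.cdna.all.index \
#       Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa
```

If the reported figure climbs close to the machine's installed RAM, the build is likely to slow down heavily from swapping, which would be consistent with the step appearing to hang.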