Mapping reads using kallisto - RNA-seq analysis
2.4 years ago

Hi,

I'm trying to map reads with kallisto for RNA-seq analysis (in Terminal on a Mac) and keep getting an error message:

Error: FASTA file not found Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz
kallisto 0.48.0
Builds a kallisto index

Usage: kallisto index [arguments] FASTA-files

Required argument:
-i, --index=STRING          Filename for the kallisto index to be constructed 

Optional argument:
-k, --kmer-size=INT         k-mer (odd) length (default: 31, max value: 31)
    --make-unique           Replace repeated target names with unique names

admins-Air:~ mesalie$ kallisto quant \
> -i Homo_sapiens.GRCh38.cdna.all.index \
> -o test \
> -t 8 \
> --single -l 250 -s 30 \
> SRR8668755_1M_subsample.fastq.gz

Error: kallisto index file not found Homo_sapiens.GRCh38.cdna.all.index
Warning: you asked for 8, but only 4 cores on the machine

Usage: kallisto quant [arguments] FASTQ-files

Required arguments:
-i, --index=STRING            Filename for the kallisto index to be used for
                              quantification
-o, --output-dir=STRING       Directory to write output to

Optional arguments:
    --bias                    Perform sequence based bias correction
-b, --bootstrap-samples=INT   Number of bootstrap samples (default: 0)
    --seed=INT                Seed for the bootstrap sampling (default: 42)
    --plaintext               Output plaintext instead of HDF5
    --fusion                  Search for fusions for Pizzly
    --single                  Quantify single-end reads
    --single-overhang         Include reads where unobserved rest of fragment is
                              predicted to lie outside a transcript
    --fr-stranded             Strand specific reads, first read forward
    --rf-stranded             Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE  Estimated average fragment length
-s, --sd=DOUBLE               Estimated standard deviation of fragment length
                              (default: -l, -s values are estimated from paired
                               end data, but are required when using --single)
-t, --threads=INT             Number of threads to use (default: 1)
    --pseudobam               Save pseudoalignments to transcriptome to BAM file
    --genomebam               Project pseudoalignments to genome sorted BAM file
-g, --gtf                     GTF file for transcriptome information
                              (required for --genomebam)
-c, --chromosomes             Tab separated file with chromosome names and lengths
                              (optional for --genomebam, but recommended)
    --verbose                 Print out progress information every 1M proccessed reads

The commands I used are listed below:

bash ~/Miniconda3-latest-MacOSX-x86_64.sh -b -p $HOME/miniconda
source $HOME/miniconda/bin/activate
conda init zsh
conda info
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set offline false
conda create --name rnaseq 
conda activate rnaseq 
conda install -c bioconda kallisto
kallisto
conda install -c bioconda fastqc
conda install -c bioconda multiqc
conda activate rnaseq
kallisto index -i Homo_sapiens.GRCh38.cdna.all.index Homo_sapiens.GRCh38.cdna.all.fa
kallisto quant \
-i Homo_sapiens.GRCh38.cdna.all.index \
-o test \
-t 8 \
--single -l 250 -s 30 \
SRR8668755_1M_subsample.fastq.gz

Does anyone know how I might be able to correct this?

Thank you

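Give kallisto index the correct path to Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz. The "FASTA file not found" error means kallisto cannot find a file with that name in the directory you are running the command from.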
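
For example, one way to do this is to look up the file's absolute path first and pass that to kallisto index. This is only a sketch: abs_path is a made-up helper name, and the Downloads location is an assumption, so substitute wherever the FASTA was actually saved.

```shell
# Hypothetical helper: print the absolute path of a file, or fail loudly.
abs_path() {
    if [ ! -f "$1" ]; then
        echo "not found: $1 (current directory: $(pwd))" >&2
        return 1
    fi
    # cd into the file's directory and print it with the file name appended
    echo "$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
}

# Assumed location; replace with wherever the download actually went.
fasta="$HOME/Downloads/Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz"

# Only build the index if the file exists and kallisto is on the PATH.
if fasta_abs=$(abs_path "$fasta") && command -v kallisto >/dev/null 2>&1; then
    kallisto index -i Homo_sapiens.GRCh38.cdna.all.index "$fasta_abs"
fi
```

If abs_path prints "not found", the name or the folder is wrong, and kallisto would fail in the same way.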

Thank you for your response. How exactly do I do this?

Should the command be: kallisto index -i Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.index Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa ?

It gives an error message: Error: FASTA file not found Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa

kallisto 0.48.0
Builds a kallisto index

Usage: kallisto index [arguments] FASTA-files

Required argument:
-i, --index=STRING          Filename for the kallisto index to be constructed 

Optional argument:
-k, --kmer-size=INT         k-mer (odd) length (default: 31, max value: 31)
    --make-unique           Replace repeated target names with unique names

Make sure Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa is in your working directory (or give the absolute path to be safe), and that the file's extension really is .fa.gz.fa, as written in your command.
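
A quick way to check both points is to list what is actually on disk and test whether the file is really gzip-compressed, whatever its extension says. A rough sketch, where fasta_kind is a hypothetical helper name and the file name is the one from your command:

```shell
# Hypothetical helper: report whether a file is missing, gzip-compressed,
# or plain text, based on its content rather than its extension.
fasta_kind() {
    if [ ! -f "$1" ]; then
        echo "missing"
    elif gzip -t "$1" 2>/dev/null; then
        echo "gzip"
    else
        echo "plain"
    fi
}

# Show the exact file names present in the current directory.
ls -l | grep 'Homo_sapiens' || echo "no matching files in $(pwd)"

# Check the file named in the command.
fasta_kind Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa
```

If this prints "missing", copy the exact name from the ls output into the kallisto index command instead of typing it by hand.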

Ok thanks, I have now done this.

I still get an error message:

[quant] fragment length distribution is truncated gaussian with mean = 250, sd = 30
Error: incompatible indices. Found version 3472328451435676990, expected version 10
Rerun with index to regenerate

Do you know how I might correct this?

You will need to regenerate the index with the kallisto index command.
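
One possible reading of that error is that the file passed to -i was not written by the kallisto version you are running now (an index from a different version, or a file that is not an index at all). Under that assumption, a minimal sketch of the rebuild looks like this; rebuild_index is a hypothetical helper name, and the file names are the ones used earlier in the thread:

```shell
# Hypothetical helper: throw away an existing index and build a fresh one
# with whichever kallisto is currently on the PATH.
rebuild_index() {
    fasta="$1"
    index="$2"
    [ -f "$fasta" ] || { echo "FASTA not found: $fasta" >&2; return 1; }
    rm -f "$index"                       # remove the stale or invalid index
    kallisto index -i "$index" "$fasta"  # write a new index from the FASTA
}

# Example call, left commented out so nothing runs by accident:
# rebuild_index Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa \
#               Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.index
# kallisto quant -i Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.index \
#     -o test --single -l 250 -s 30 SRR8668755_1M_subsample.fastq.gz
```

The important part is that -i in kallisto quant points at the file that kallisto index just wrote, not at the FASTA itself.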

OK, thank you. What should I do if it takes a long time to load? I have tried on two Macs.

% kallisto index -i Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.index Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa

[build] loading fasta file Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 1554 target sequences
[build] warning: replaced 100005 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... 

It might take 10-20 minutes. Use a system monitor utility to check that you don't run out of memory. Good luck.
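
If you would rather stay in the terminal, something like the following can print the memory use of the build while it runs. A sketch only: watch_rss is a hypothetical helper, it relies on ps -o rss= (resident memory in kilobytes), and the 5-second default interval is arbitrary.

```shell
# Hypothetical helper: run a command in the background and print its
# resident memory (RSS, in KB) every few seconds until it finishes.
watch_rss() {
    interval="${WATCH_INTERVAL:-5}"
    "$@" &
    pid=$!
    while kill -0 "$pid" 2>/dev/null; do
        rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
        [ -n "$rss" ] && echo "pid $pid rss_kb=$rss"
        sleep "$interval"
    done
    wait "$pid"   # pass the command's own exit status back to the caller
}

# Example call, left commented out:
# watch_rss kallisto index -i Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.index \
#     Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa
```

If the rss_kb value climbs towards the amount of RAM in the machine, the build is likely to slow down badly or fail, and a machine with more memory may be needed.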
