Kallisto bustools index
3.5 years ago
Shawn ▴ 20

I am trying to build an index for a single-nucleus experiment using kallisto, and I was wondering if someone could help break down the following kb ref command.

I am a bit confused about what exactly t2g.txt, cdna_t2c.txt, and intron_t2c.txt are for.

I am also not 100% sure about the difference between the lamanno and nucleus options for --workflow.

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa \
-c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 8 \
Mus_musculus.GRCm38.dna.primary_assembly.fa.gz Mus_musculus.GRCm38.98.gtf.gz
single-nuc Kallisto • 2.7k views

Hi there,

First of all, thank you for opening this issue; it helped me better understand the command parameters. But here is my issue:

I am building a mouse index using kb ref with the following command line:

kb ref -i index_mm_98.idx -g t2g.txt -f1 /home/younsi/cdna.fa -f2 ./introns.fa -c1 cDNA_t2c.txt -c2 introns_t2c.txt --workflow=lamanno ./Mus_musculus.GRCm38.cdna.all.fa ./Mus_musculus.GRCm38.98.gtf --overwrite

In my case, after running kb ref, the t2g.txt, cDNA_t2c.txt, and introns_t2c.txt files are... EMPTY. Looking back at kb ref --help, I understand that all of these files are supposed to be generated:

required arguments:

  -i INDEX              Path to the kallisto index **to be constructed.**

  -g T2G                Path to transcript-to-gene mapping **to be generated**

  -f1 FASTA             [Optional with -d] Path to the cDNA FASTA (lamanno, nucleus) or mismatch FASTA (kite) **to be generated**


required arguments for `lamanno` and `nucleus` workflows:

  -f2 FASTA             Path to the intron FASTA **to be generated**

  -c1 T2C               Path **to generate** cDNA transcripts-to-capture

  -c2 T2C               Path **to generate** intron transcripts-to-capture

Something is very unclear to me: what could I be doing wrong? How did you solve your problem, GenoMax?

Thanks in advance for your help
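One thing that may be worth checking, offered as a hedged aside rather than a diagnosis: in the kb-python versions I am aware of, kb ref --help describes its positional fasta argument as a genomic FASTA, whereas the command above passes Mus_musculus.GRCm38.cdna.all.fa (a transcript FASTA) where the original question used the Mus_musculus.GRCm38.dna.primary_assembly.fa.gz genome. The minimal Python sketch below peeks at the first header of a FASTA file and flags Ensembl-style cDNA records. It assumes Ensembl header conventions (a type token such as cdna or dna:chromosome right after the ID); first_fasta_header and looks_like_ensembl_cdna are hypothetical helpers, not part of kb.

```python
import gzip


def first_fasta_header(path):
    """Return the first header line (without '>') of a FASTA file, or None.

    Files ending in .gz are opened with gzip in text mode.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                return line[1:].strip()
    return None


def looks_like_ensembl_cdna(header):
    """Heuristic check, assuming Ensembl-style FASTA headers.

    Ensembl cDNA records look like '<transcript_id> cdna chromosome:...',
    while genome records look like '<seq_name> dna:chromosome ...'.
    """
    if not header:
        return False
    fields = header.split()
    return len(fields) > 1 and fields[1].lower().startswith("cdna")
```

For example, looks_like_ensembl_cdna(first_fasta_header("Mus_musculus.GRCm38.cdna.all.fa")) would be expected to return True for an Ensembl cDNA file, and False for the dna.primary_assembly genome file.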


Please create a new question rather than posting this as an answer to an existing question.


Also, you cross-posted here: https://github.com/pachterlab/kallistobustools/issues/44

(and I answered there)

3.5 years ago
dsull ★ 6.9k

You want to feed those files into kb count.

Briefly, t2g.txt contains the transcript-to-gene mapping, cdna_t2c.txt lists the IDs of all the cDNA (spliced) transcripts, and intron_t2c.txt lists the IDs of all the "intronic" (i.e. unspliced) transcripts.
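To make the roles of these files concrete, here is a minimal Python sketch that loads them and asks which capture list a transcript falls into. It assumes t2g.txt is tab-separated with the transcript ID in the first column and the gene ID in the second (any further columns are ignored), and that each t2c file holds one transcript ID per line. The helper names are hypothetical, and this is an illustration of the file roles, not how kb count reads them internally.

```python
def read_t2g(path):
    """Map transcript ID -> gene ID from a tab-separated t2g file.

    Assumption: column 1 is the transcript ID and column 2 the gene ID;
    any extra columns (e.g. gene names) are ignored.
    """
    t2g = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                t2g[fields[0]] = fields[1]
    return t2g


def read_t2c(path):
    """Return the set of transcript IDs listed (one per line) in a t2c file."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def classify(transcript_id, cdna_ids, intron_ids):
    """Say which capture list (spliced or unspliced) a transcript ID is in."""
    if transcript_id in cdna_ids:
        return "spliced"
    if transcript_id in intron_ids:
        return "unspliced"
    return "unknown"


def genes_for(transcript_ids, t2g):
    """Collapse a set of transcript IDs to the gene IDs they map to."""
    return {t2g[t] for t in transcript_ids if t in t2g}
```

The point of the split is that t2g.txt lets counts be collapsed from transcripts to genes, while the two t2c lists decide whether a read is tallied as spliced or unspliced.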

The nucleus workflow is meant for single-nucleus data, while lamanno is meant for RNA velocity. There are subtle differences between the two. For example, with nucleus the spliced and unspliced matrices are added together, whereas with lamanno separate spliced and unspliced matrices are generated that can be fed directly into the velocyto workflow.
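The nucleus/lamanno difference can be illustrated with a toy example. The sketch below is not kb's implementation; it assumes two cells-by-genes count matrices (plain nested lists) whose rows and columns are already aligned to the same barcodes and genes, and the combine and add_matrices helpers are hypothetical.

```python
def add_matrices(spliced, unspliced):
    """Element-wise sum of two equally shaped cells-by-genes count matrices."""
    if len(spliced) != len(unspliced) or any(
        len(row_s) != len(row_u) for row_s, row_u in zip(spliced, unspliced)
    ):
        raise ValueError("spliced and unspliced matrices must have the same shape")
    return [
        [s + u for s, u in zip(row_s, row_u)]
        for row_s, row_u in zip(spliced, unspliced)
    ]


def combine(spliced, unspliced, workflow):
    """Toy illustration of the two output styles.

    'nucleus': one matrix, spliced + unspliced added together.
    'lamanno': the two matrices kept separate, as velocity tools expect.
    """
    if workflow == "nucleus":
        return {"total": add_matrices(spliced, unspliced)}
    if workflow == "lamanno":
        return {"spliced": spliced, "unspliced": unspliced}
    raise ValueError(f"unknown workflow: {workflow!r}")
```

With spliced = [[1, 0], [2, 3]] and unspliced = [[0, 4], [1, 1]], the nucleus style yields a single matrix [[1, 4], [3, 4]], while the lamanno style keeps both inputs so a velocity tool can compare unspliced against spliced counts per gene.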


Now I am a bit confused.

So when running kb ref:

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa \
-c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 8

do t2g.txt, cdna.fa, intron.fa, cdna_t2c.txt, and intron_t2c.txt get generated?

One of the reasons I am confused is that I was sent files built using the Comparative Annotation Toolkit, along with a few additional items, and I haven't fully made sense of everything.

However, among those files I see t2g files such as cDNA_introns_t2g.txt, introns_t2g.txt, and cDNA_t2g.txt, as well as cDNA.fa, introns.fa, etc.

So part of me thought these were needed when building the index.


Correct, all those files are generated via kb ref. They are not needed for building the index; rather, the index building step generates those files.
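Since one reply above reports these outputs coming out empty, a quick sanity check after kb ref can help. This minimal Python sketch reports any expected output that is missing or zero bytes. The helper name find_empty_or_missing is hypothetical, and the file list simply repeats the names from the kb ref command in this thread; adjust it to your own paths.

```python
import os

# Outputs named in the kb ref command from this thread (adjust as needed).
EXPECTED_OUTPUTS = ["index.idx", "t2g.txt", "cdna.fa", "intron.fa",
                    "cdna_t2c.txt", "intron_t2c.txt"]


def find_empty_or_missing(paths):
    """Return, in input order, the paths that do not exist or are zero bytes."""
    return [p for p in paths if not os.path.isfile(p) or os.path.getsize(p) == 0]
```

After a successful run, find_empty_or_missing(EXPECTED_OUTPUTS) should return an empty list; anything it returns points at the step worth re-examining.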
