Question

Reference Transcriptome for Drosophila Melanogaster[orgn] with cellranger mkref

0

4.8 years ago

el24 ▴ 40

Hi all, I am new to bioinformatics and was hoping someone could help me with some issues I have with cellranger. I'm trying to run cellranger count on Drosophila melanogaster data, but I need a transcriptome reference to run it. I followed this link to create the transcriptome reference from a genome sequence (FASTA) and gene annotations (GTF). According to it, the recommended genome file to download from Ensembl is annotated as "primary assembly"; in NCBI, it is "no alternative - analysis set." I couldn't find either of those titles on Ensembl or NCBI. I tried a couple of different GTF and FASTA files from FlyBase and NCBI, but I couldn't create a reference transcriptome from them because I got errors. Then I tried the files below to create the reference:

ftp://ftp.ensemblgenomes.org/pub/metazoa/release46/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.28.dna.toplevel.fa.gz

ftp://ftp.ensembl.org/pub/release-77/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.77.gtf.gz

I managed to create the reference file, but when I run cellranger count using this reference transcriptome, I get an error for different replicates. To be more specific, the error is "Low Fraction Reads Confidently Mapped To Transcriptome" that says I got "19.0%, but Ideal > 30%. This can indicate the use of the wrong reference transcriptome, a reference transcriptome with overlapping genes, poor library quality, poor sequencing quality, or reads shorter than the recommended minimum. Application performance may be affected."

Could you please tell me where I can find a reference transcriptome, or where I can find better GTF and FASTA files to create the reference myself? I appreciate your response, thanks!

gene software error • 3.5k views

updated 4.8 years ago by benformatics 4.0k • written 4.8 years ago by el24 ▴ 40

score 2 · Accepted Answer · 2020-02-02

2

4.8 years ago

benformatics 4.0k

You downloaded the dm3 GTF and used the dm6 genome (i.e. FASTA).

Please use the most current version (as of Feb 2020) and make sure you are using annotations that match your genome.

FASTA:

ftp://ftp.ensembl.org/pub/release-99/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.28.dna.toplevel.fa.gz

GTF:

ftp://ftp.ensembl.org/pub/release-99/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.28.99.chr.gtf.gz
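For completeness, here is a minimal sketch of how the reference could then be built from those two files. It only assembles the `cellranger mkref` command line (the `--genome`, `--fasta` and `--genes` options); the helper name `build_mkref_command`, the output folder name `BDGP6_28_r99`, and the use of decompressed local copies of the two downloads are assumptions for illustration.

```python
import shlex


def build_mkref_command(genome_name, fasta_path, gtf_path):
    """Return the argument list for a `cellranger mkref` call.

    genome_name is the name of the output reference folder;
    fasta_path and gtf_path must come from the same assembly.
    """
    return [
        "cellranger",
        "mkref",
        f"--genome={genome_name}",
        f"--fasta={fasta_path}",
        f"--genes={gtf_path}",
    ]


# Hypothetical local file names: decompressed copies of the two downloads above.
cmd = build_mkref_command(
    "BDGP6_28_r99",  # hypothetical output folder name
    "Drosophila_melanogaster.BDGP6.28.dna.toplevel.fa",
    "Drosophila_melanogaster.BDGP6.28.99.chr.gtf",
)

# Print the command so it can be inspected or pasted into a shell.
print(shlex.join(cmd))

# To run it directly (requires cellranger on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

The resulting folder is what you would then pass to cellranger count as the transcriptome.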

1

In case it wasn't clear from the post: your GTF file is in the old Drosophila genome (dm3) coordinate system, while your FASTA file is the sequence of the newest Drosophila genome (dm6). A huge number of gene coordinates will therefore be wrong for your reference, which is the most likely reason for your low fraction of mapped reads.
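A quick way to catch this kind of mismatch before building a reference is to compare the assembly label embedded in the two file names. The sketch below assumes Ensembl-style names of the form `<Species>.<assembly>.dna...` for FASTA and `<Species>.<assembly>.<release>[.chr].gtf` for GTF; the helper name `ensembl_assembly` is hypothetical, and files from other sources (FlyBase, NCBI) would need a different check.

```python
import re


def ensembl_assembly(filename):
    """Extract the assembly label from an Ensembl-style FASTA or GTF file name.

    Assumed naming patterns (not guaranteed for every release):
      FASTA: <Species>.<assembly>.dna[_rm|_sm].<...>
      GTF:   <Species>.<assembly>.<release>[.chr].gtf[.gz]
    """
    name = filename.rsplit("/", 1)[-1]  # drop any directory or URL prefix

    fasta = re.match(r"^[^.]+\.(.+?)\.dna(_rm|_sm)?\.", name)
    if fasta:
        return fasta.group(1)

    gtf = re.match(r"^[^.]+\.(.+)\.\d+(\.chr)?\.gtf(\.gz)?$", name)
    if gtf:
        return gtf.group(1)

    raise ValueError(f"Unrecognised file name pattern: {name}")


# The pair from the question: assemblies differ, so coordinates will not line up.
fasta_asm = ensembl_assembly("Drosophila_melanogaster.BDGP6.28.dna.toplevel.fa.gz")
old_gtf_asm = ensembl_assembly("Drosophila_melanogaster.BDGP5.77.gtf.gz")
print(fasta_asm, old_gtf_asm, fasta_asm == old_gtf_asm)

# The release-99 GTF: same assembly label as the FASTA.
new_gtf_asm = ensembl_assembly("Drosophila_melanogaster.BDGP6.28.99.chr.gtf.gz")
print(fasta_asm, new_gtf_asm, fasta_asm == new_gtf_asm)
```

For the original pair this reports BDGP6.28 against BDGP5 (dm6 against dm3), and for the release-99 pair both labels are BDGP6.28.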

4.8 years ago by benformatics 4.0k

0

It was very clear, thank you very much for explaining the solution!

4.8 years ago by el24 ▴ 40

0

Thank you for your help! I got a warning after running cellranger count on two replicates (the third one worked just fine) using the files that you mentioned. The warning is "Low Fraction Reads in Cells": I got 61.3%, but Ideal > 70%. The message continues: "Application performance may be affected. Many of the reads were not assigned to cell-associated barcodes. This could be caused by high levels of ambient RNA or by a significant population of cells with a low RNA content, which the algorithm did not call as cells. The latter case can be addressed by inspecting the data to determine the appropriate cell count and using --force-cells."

Do you think it's a good idea to use --force-cells? I would really appreciate it if you have any recommendations to fix this.

4.8 years ago by el24 ▴ 40

1

Ideal is 100% but I frankly don't have much experience with 10X sequencing specifically. For other scRNA-seq technologies we see a huge variation in alignment %s. I would say if you are working with patient samples, especially in the case of disease, that the cell quality is often much lower. I personally would move forward with alignment rates over 50-60%. However, it would be wise to go in and make sure that there are good correlations between all the replicates. On the other hand if you are using something like cell lines... then this does seem a bit low.

If I were in your position, I would compare the results using "--force-cells" to the results without it, to see whether I really believe in the added cells.
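One simple way to start that comparison is to look at which barcodes are called as cells in each run. This is only a sketch: it assumes each run wrote a one-barcode-per-line file (for example `outs/filtered_feature_bc_matrix/barcodes.tsv.gz`, which is my assumption about the output layout), and the helper names `read_barcodes` and `compare_cell_calls` are made up for illustration.

```python
import gzip


def read_barcodes(path):
    """Read one cell barcode per line from a plain or gzipped text file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return {line.strip() for line in handle if line.strip()}


def compare_cell_calls(default_path, forced_path):
    """Split barcodes into shared, default-only and forced-only sets."""
    default_cells = read_barcodes(default_path)
    forced_cells = read_barcodes(forced_path)
    return {
        "shared": default_cells & forced_cells,
        "only_default": default_cells - forced_cells,
        "only_forced": forced_cells - default_cells,
    }


# Example usage (hypothetical run folders):
# result = compare_cell_calls(
#     "rep1_default/outs/filtered_feature_bc_matrix/barcodes.tsv.gz",
#     "rep1_forced/outs/filtered_feature_bc_matrix/barcodes.tsv.gz",
# )
# print({name: len(cells) for name, cells in result.items()})
```

The barcodes in `only_forced` are the added cells; checking their UMI and gene counts against the shared cells would show whether they look like real cells or ambient background.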

4.8 years ago by benformatics 4.0k

2

"I would say if you are working with patient samples, especially in the case of disease"

Since the original question is about flies, we can safely eliminate that possibility :-)

4.8 years ago by GenoMax 147k