Dear Biostars community,
I have to build a reference for cellranger-arc-2.0.2. However, my organism's genome is too large for Cell Ranger, so I had to split the FASTA file into smaller chromosomes and modify the GTF file accordingly:
- I changed the sequence names in the 1st column to match the names in the FASTA,
- I used AGAT tool to remove any redundancies in 3rd column (gene, transcript, exon),
- I left 'gene_id', 'transcript_id', 'gene_name' in 9th column,
but I still run into errors from cellranger-arc mkref. The latest error:
['7H-0-328847192', 'IPK', 'exon', '92282224', '92282629', '.', '+', '.', 'gene_id HORVU.MOREX.r3.7HG0666460; transcript_id HORVU.MOREX.r3.7HG0666460.1; gene_name HvNIP2;']
on line 488171 specifies an 'exon' annotation for a transcript HORVU.MOREX.r3.7HG0666460.1, but there is no 'transcript' row in the GTF for HORVU.MOREX.r3.7HG0666460.1 that immediately precedes it.
Please fix your GTF and start again.
When I check this row in my GTF file:
7H-0-328847192 IPK exon 92282224 92282629 . + . gene_id HORVU.MOREX.r3.7HG0666460; transcript_id HORVU.MOREX.r3.7HG0666460.1; gene_name HvNIP2;
7H-0-328847192 IPK transcript 92282224 92282629 . + . gene_id HORVU.MOREX.r3.7HG0666460; transcript_id HORVU.MOREX.r3.7HG0666460.1; gene_name HvNIP2;
Any help or advice on how to make GTF suitable for cell ranger would be appreciated.
Thank you
The error mentions "... immediately precedes it".
In your GTF the transcript row does not precede the exon. It comes after the exon. Just bringing this to your notice.
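To find every row affected by this ordering problem before rerunning mkref, a small check along these lines may help. It is only a sketch: the function names `transcript_id` and `find_orphan_exons` are made up for illustration, it assumes a tab-separated GTF whose attribute values may or may not be quoted, and it only tests that a 'transcript' row appeared somewhere earlier in the file, which is looser than the "immediately precedes" rule in the error message.

```python
import re


def transcript_id(attributes):
    """Return the transcript_id value from a GTF attribute column, or None."""
    match = re.search(r'\btranscript_id "?([^";]+)"?', attributes)
    return match.group(1) if match else None


def find_orphan_exons(lines):
    """Yield (line_number, transcript_id) for each exon row that appears
    before any 'transcript' row with the same transcript_id."""
    seen = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        tid = transcript_id(fields[8])
        if fields[2] == "transcript":
            seen.add(tid)
        elif fields[2] == "exon" and tid not in seen:
            yield number, tid


# Possible usage (the file name is a placeholder):
# with open("annotation.gtf") as handle:
#     for number, tid in find_orphan_exons(handle):
#         print(number, tid)
```

If this reports only a handful of lines, fixing them by hand may be enough. If it reports many, re-sorting the whole file is probably the better route.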
Sorting the GFF/GTF with one of the available tools may fix this. I think the 'normal' sort order of these tools is position, then gene > transcript > exon > CDS
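If none of the existing tools gives the order you need, the reordering can also be sketched in a few lines of Python. This is an illustration under assumptions, not what any particular tool does: `sort_gtf`, `attribute` and `FEATURE_RANK` are hypothetical names, the input is assumed to be tab-separated with `gene_id` and `transcript_id` attributes, sequence names are sorted alphabetically, and features are sorted by ascending start on both strands.

```python
import re

# Parent features sort before their children; unknown types go last.
FEATURE_RANK = {"gene": 0, "transcript": 1, "exon": 2, "CDS": 3}


def attribute(attributes, key):
    """Return the value of `key` in a GTF attribute column, or ''."""
    match = re.search(r"\b" + re.escape(key) + r' "?([^";]+)"?', attributes)
    return match.group(1) if match else ""


def sort_gtf(lines):
    """Return GTF lines ordered by sequence, gene position, transcript
    position, then gene > transcript > exon > CDS within each transcript."""
    header, records = [], []
    gene_start, transcript_start = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        start = int(fields[3])
        gid = attribute(fields[8], "gene_id")
        tid = attribute(fields[8], "transcript_id")
        # Remember the leftmost coordinate seen for each gene and transcript.
        gene_key = (fields[0], gid)
        gene_start[gene_key] = min(start, gene_start.get(gene_key, start))
        if tid:
            transcript_start[tid] = min(start, transcript_start.get(tid, start))
        records.append((fields, gid, tid))

    def sort_key(record):
        fields, gid, tid = record
        return (
            fields[0],
            gene_start[(fields[0], gid)],
            gid,
            transcript_start.get(tid, -1),  # gene rows (no transcript_id) first
            tid,
            FEATURE_RANK.get(fields[2], len(FEATURE_RANK)),
            int(fields[3]),
        )

    return header + ["\t".join(r[0]) for r in sorted(records, key=sort_key)]
```

With this ordering each 'transcript' row should end up directly before the exons that share its `transcript_id`, which is what the mkref error asks for. It is still worth rerunning the check on the sorted output.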
Thank you so much for your responses. I had no idea there is a required order of 'gene > transcript > exon > CDS'. I will take this into consideration. @Juke34, I did modify the GTF file: I added a missing transcript row. I am not sure why AGAT missed it.
I will start from the beginning in more detail:
Populate for missing features: agat_convert_sp_gxf2gxf.pl --gtf ${INGTF} --out ${GFFOUT}.gff
Convert to GTF format: agat_convert_sp_gff2gtf.pl --gff ${GFFOUT}.gff --out ${GTFOUT}.gtf
sed -i
Basically, is a missing 'transcript' row normal behaviour for the AGAT tool?
Could you post the complete GTF file on bitbucket or pastebin?
https://bitbucket.org/irkost/gft_file/src/main/ 12_GTF_AGAT_formatted_quotes_rm.gtf (file size 196M)
This file is the output from AGAT (agat_convert_sp_gxf2gxf.pl and agat_convert_sp_gff2gtf.pl) with the inner quotes removed.
Could you provide more lines before and after the record? Did you modify your file after using AGAT? In your case the transcript line appears after the exon line instead of before it. AGAT is not supposed to produce something like that.