Question

Cell Ranger custom GTF file

23 months ago

Arora • 0

I want to make a custom GTF file from a multi-FASTA file that contains multiple transcripts, e.g.:

>NM_001282823.1 prolactin receptor (PRLR), mRNA
GCCAAGAGACTGGGAGTCAAAGAAAGTTTCTGAAATCAGTGGATTCTGCTTGAGAACAGAGCCTGGTTAT
>NM_001682822.1 SNAP25 (SNAP25), mRNA
GCCAAGAGACTGGGAGTCAAAGAAAGTTTCTGAAATCAGTGGATTCTGCTTGAGAACAGAGCCTGGTTAT
>NM_001287822.1 CACNA1F (CACNA1F), mRNA
GCCAAGAGACTGGGAGTCAAAGAAAGTTTCTGAAATCAGTGGATTCTGCTTGAGAACAGAGCCTGGTTAT

Is there a way to build the GTF file with the commands below, as described by 10x (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/tutorial_mr#marker), so that a single GTF file contains an entry for every FASTA record, rather than adding the records one by one?

cat NM_001282823.1 | grep -v "^>" | tr -d "\n" | wc -c
echo -e 'NM_001282823.1\tunknown\texon\t1\t922\t.\t+\t.\tgene_id "NM_001282823.1"; transcript_id "NM_001282823.1"; gene_name "NM_001282823.1"; gene_biotype "protein_coding";' > NM_001282823.1.gtf

gtf 10x single-cell • 565 views

ADD COMMENT • link updated 23 months ago by Ram 45k • written 23 months ago by Arora • 0

Look for a way to loop over the FASTA entries and write one GTF line per entry based on its header. Biopython might be useful here. 10x's method is not meant for putting together an entire GTF the way you are doing right now, so that part is going to be on you.
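
A minimal sketch of such a loop, using only the Python standard library rather than Biopython. It assumes each record should become its own contig with a single exon spanning the full sequence, on the + strand, with the same attribute layout as the echo command in the question. The function names (fasta_lengths, gtf_line, fasta_to_gtf) are made up for illustration, and the record ID is taken to be the first whitespace-delimited token of the header.

```python
def fasta_lengths(handle):
    """Yield (record_id, sequence_length) for each record in a FASTA stream."""
    record_id, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record_id is not None:
                yield record_id, length
            # Assumption: the ID is the first token after ">".
            record_id, length = line[1:].split()[0], 0
        else:
            # Sequence may be wrapped over several lines; sum them up.
            length += len(line)
    if record_id is not None:
        yield record_id, length


def gtf_line(record_id, length):
    """Build one exon line covering positions 1..length of the record."""
    attributes = (
        f'gene_id "{record_id}"; transcript_id "{record_id}"; '
        f'gene_name "{record_id}"; gene_biotype "protein_coding";'
    )
    fields = [record_id, "unknown", "exon", "1", str(length),
              ".", "+", ".", attributes]
    return "\t".join(fields)


def fasta_to_gtf(fasta_handle, gtf_handle):
    """Write one GTF line per FASTA record."""
    for record_id, length in fasta_lengths(fasta_handle):
        gtf_handle.write(gtf_line(record_id, length) + "\n")
```

To run it on real files, open the multi-FASTA for reading and the output GTF for writing, then pass both handles to fasta_to_gtf. Note that gene_name here is just the accession; if you want the gene symbol instead (e.g. PRLR), you would need to parse it out of the header description, which depends on how consistent your headers are.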

ADD REPLY • link 23 months ago by Ram 45k