Why do the count matrices generated by "prepDE.py3" show unidentified IDs?
23 months ago
Pegasus ▴ 120

Hi everyone,

I carried out a transcript assembly on my RNA-seq data using StringTie. The reads were mapped to a bacterial reference genome using bowtie2, and the count matrices were produced with the "prepDE.py3" script in Python 3. This generated two output files, the gene_count_matrix and the transcript_count_matrix, which contain count data for genes and transcripts, respectively.

However, I encountered two issues:

  1. All gene IDs in the gene_count_matrix are labeled as "MSTRG". I am unsure if this labeling will affect the downstream analysis, as this issue has been raised before.

  2. All transcript IDs in the transcript_count_matrix are labeled as "GOHBADNI". This ID name appears to be customized; however, I analyzed my own data using my Mac terminal and Linux, so I am not sure where this ID name came from.
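
One way to trace where these labels come from is to check what the GTF passed to the counting step records for each "MSTRG" gene. The following is a minimal sketch, not a definitive method. It assumes the GTF is tab-separated with the usual `key "value";` attribute column, and that transcript lines may carry a `ref_gene_id` or `gene_name` attribute when a reference annotation was supplied. The function names and the file name in the usage comment are hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(field):
    """Turn the ninth GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def map_gene_ids(gtf_lines):
    """Map each gene_id to the reference names found on its transcript lines.

    Assumption: a reference name, if present, is stored as ref_gene_id or
    gene_name. A gene_id with an empty set had no reference name at all.
    """
    mapping = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript records are inspected here
        attrs = parse_attributes(cols[8])
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        refs = mapping[gene_id]  # creates an empty set on first sight
        ref = attrs.get("ref_gene_id") or attrs.get("gene_name")
        if ref:
            refs.add(ref)
    return dict(mapping)


# Usage (hypothetical file name):
# with open("stringtie_merged.gtf") as handle:
#     for gene_id, refs in sorted(map_gene_ids(handle).items()):
#         print(gene_id, ",".join(sorted(refs)) or "no reference name")
```

If every "MSTRG" gene maps to an empty set, the GTF itself carries no reference gene names, so the labels in the matrix reflect the annotation rather than a problem in the counting step.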

The IDs in the gene_count_matrix and transcript_count_matrix files do not match, and I am unsure whether this could affect the downstream differential gene expression analysis (Ballgown, DESeq2, edgeR).
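
The mismatch between the two files can be summarized by counting the ID prefixes in each matrix. This is a small sketch under stated assumptions: each CSV has a header row and the ID sits in the first column, and a "prefix" is simply the leading run of letters in the ID. The helper names and the sample rows in the demo are made up for illustration.

```python
import csv
import io
import re
from collections import Counter


def id_prefix(identifier):
    """Return the leading alphabetic part of an ID, e.g. 'MSTRG' from 'MSTRG.12'."""
    match = re.match(r"[A-Za-z]+", identifier)
    return match.group(0) if match else identifier


def prefix_counts(handle):
    """Count ID prefixes in the first column of a count matrix CSV."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    return Counter(id_prefix(row[0]) for row in reader if row)


# In-memory demo with invented rows; real use would open the two CSV files.
demo_genes = io.StringIO("gene_id,sample1,sample2\nMSTRG.1,10,12\nMSTRG.2,0,3\n")
demo_transcripts = io.StringIO(
    "transcript_id,sample1,sample2\nGOHBADNI_00001,10,12\nGOHBADNI_00002,0,3\n"
)
print("gene prefixes:", dict(prefix_counts(demo_genes)))
print("transcript prefixes:", dict(prefix_counts(demo_transcripts)))
```

Seeing a single prefix per file, as in the demo, would suggest that the two matrices use different naming schemes consistently rather than containing a mixture of IDs.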

I would greatly appreciate any help in resolving these issues.

Stringtie RNA-seq • 527 views