What is the proper output for a psiclass_vote.gtf file, and how does one add gene names and strand to gtf files?
6 days ago
DdogBoss ▴ 20

I am trying to run PsiCLASS in a Docker container on files that have been aligned with STAR. After running PsiCLASS, I want to add gene names and strand to the resulting psiclass_vote.gtf file. I run PsiCLASS through the following bash script:

#!/bin/bash -l

# Define base directories
BASEDIR=/home/projects/Mouse1
ALNDIR=${BASEDIR}/align
WORKDIR=${BASEDIR}/psiclass
ANNOT=${BASEDIR}/Data/gencode.vM30.region.gtf

# Define software and tools
SWDIR=/home/software/psiclass
PSICLASS=psiclass  # Assuming psiclass is in your PATH
ADDGENENAME=${SWDIR}/add-genename
CMDDIR=${BASEDIR}
ADDSTRAND=${BASEDIR}/add-strand

# Create working directory if it doesn't exist
mkdir -p ${WORKDIR}

# Create bamlist file
BAMLIST=${WORKDIR}/bamlist.psiclass
ls ${ALNDIR}/*_Aligned.sortedByCoord.out.bam > ${BAMLIST}

# Run PsiCLASS assembly
${PSICLASS} --lb ${BAMLIST} \
    -o ${WORKDIR}/psiclass \
    -p 10 &> ${WORKDIR}/psiclass.log


# Add annotated gene names
mkdir -p ${WORKDIR}/WithGeneNames
${ADDGENENAME} ${ANNOT} ${WORKDIR}/psiclass_gtf.list -o ${WORKDIR}/WithGeneNames

# NOTE: this exit stops the script here, so the add-strand step below never runs
exit;

# import strand and remove no-strand entries from the reference annotation; optionally, remove unknown genes (grep -v "novel")
${ADDSTRAND} ${ANNOT} -r < ${WORKDIR}/WithGeneNames/psiclass_vote.gtf > ${WORKDIR}/WithGeneNames/psiclass_vote.withStrand.gtf

wait

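Because add-strand is a custom script that is not shown here, the "remove no-strand entries" part of that last step can only be approximated. A minimal sketch with awk, assuming the strand has already been written into column 7 of a tab-separated GTF; `keep_stranded` is a hypothetical helper name, not part of PsiCLASS:

```shell
#!/bin/bash
# keep_stranded: hypothetical helper, not part of PsiCLASS or add-strand.
# Reads a tab-separated GTF on stdin and writes to stdout.
# Keeps comment lines and records whose strand (column 7) is + or -.
keep_stranded() {
    awk -F'\t' '/^#/ || $7 == "+" || $7 == "-"'
}
```

It could be chained with the optional filter mentioned in the script comment, for example `keep_stranded < in.gtf | grep -v "novel" > out.gtf`, where `in.gtf` and `out.gtf` are placeholder file names.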

All the other gtf files seem to show the proper output, but the psiclass_vote.gtf looks like this:

"",/home/projects/Exam1/psiclass/psiclass_vote.gtf
"",/home/projects/Exam1/psiclass/psiclass_sample_0.gtf
"",/home/projects/Exam1/psiclass/psiclass_sample_1.gtf
"",/home/projects/Exam1/psiclass/psiclass_sample_2.gtf
"",/home/projects/Exam1/psiclass/psiclass_sample_3.gtf
"",/home/projects/Exam1/psiclass/psiclass_sample_4.gtf
"",/home/projects/Exam1/psiclass/psiclass_sample_5.gtf

As a result, the output file in the WithGeneNames folder is empty, and the program exits with an error saying that psiclass_vote.gtf has an improper format.
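One way to catch this before the gene-name step is to check that a file really has the GTF layout of nine tab-separated columns. A minimal sketch, where `is_gtf` is a hypothetical helper name and the only assumption is that GTF records are tab-separated with `#` comment lines:

```shell
#!/bin/bash
# is_gtf: hypothetical helper, not part of PsiCLASS.
# Succeeds only if the file contains at least one record and every
# non-comment, non-blank line has exactly 9 tab-separated fields.
is_gtf() {
    awk -F'\t' '
        /^#/ || /^[[:space:]]*$/ { next }   # skip comments and blank lines
        { n++ }                             # count records
        NF != 9 { bad++ }                   # count malformed records
        END { exit (n > 0 && bad == 0) ? 0 : 1 }
    ' "$1"
}
```

A call such as `is_gtf "${WORKDIR}/psiclass_vote.gtf" || echo "not a GTF" >&2` would flag a file like the one shown above, since its lines are comma-separated paths rather than nine-column records.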

Other pertinent information: the BAM file list looks like this:

/home/projects/Exam1/align/1mo_Rep1_Aligned.sortedByCoord.out.bam
/home/projects/Exam1/align/1mo_Rep2_Aligned.sortedByCoord.out.bam
/home/projects/Exam1/align/1mo_Rep3_Aligned.sortedByCoord.out.bam
/home/projects/Exam1/align/4mo_Rep1_Aligned.sortedByCoord.out.bam
/home/projects/Exam1/align/4mo_Rep2_Aligned.sortedByCoord.out.bam
/home/projects/Exam1/align/4mo_Rep3_Aligned.sortedByCoord.out.bam

The align directory is where all the BAM alignments are located. Is there a better way to create a vote.gtf file that doesn't use PsiCLASS, and what would be the proper format for the vote.gtf file? add-strand is a custom Perl script. As mentioned above, I have properly formatted GTF files for samples 0-5, but without gene names and strand.

5 days ago
DdogBoss ▴ 20

The correct format looks like this:

7   PsiCLASS    exon    1068049 1068120 1000    +   .   gene_id "7.208"; transcript_id "7.208.0"; exon_number "1"; FPKM "1097.710907"; TPM "1499.749806"; cov "3.870968"; gene_name "Zfp84";
7   PsiCLASS    exon    1068049 1068120 1000    +   .   gene_id "7.208"; transcript_id "7.208.0"; exon_number "1"; FPKM "1550.234977"; TPM "2148.598280"; cov "6.451613"; gene_name "Zfp84";
7   PsiCLASS    exon    1068049 1068120 1000    +   .   gene_id "7.208"; transcript_id "7.208.0"; exon_number "1"; FPKM "596.653371"; TPM "869.313122"; cov "1.935484"; gene_name "Zfp84";
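To confirm that every record in an annotated file carries a gene name like the lines above, the records without a `gene_name` attribute can be counted. A minimal sketch, assuming a tab-separated GTF with the attributes in column 9; `count_missing_gene_name` is a hypothetical helper name:

```shell
#!/bin/bash
# count_missing_gene_name: hypothetical helper, not part of PsiCLASS.
# Prints the number of 9-column records whose attribute field (column 9)
# has no gene_name attribute. Comment lines are ignored.
count_missing_gene_name() {
    awk -F'\t' '
        !/^#/ && NF == 9 && $9 !~ /gene_name "/ { n++ }
        END { print n + 0 }                 # print 0 when nothing is missing
    ' "$1"
}
```

A result of 0 for a file in the WithGeneNames folder would suggest that the gene-name step annotated every record.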

After merging all the sample GTF files, I was able to figure out how the consensus is generated by reading the Vote.cpp file in the PsiCLASS source code.
