Question

Express v_1.5.1 Error: Transcript IDs from MultiFASTA do not match with (SAM/BAM) file.


4.2 years ago

bioinfo89 ▴ 60

Hi All,

I am using eXpress 1.5.1 to calculate read counts for RNA-Seq data. I have removed rRNA contamination from the raw data and mapped the remaining reads to the reference genome GRCh37. When I run eXpress, it prints the following messages:

WARNING: Could not connect to update server to verify current version. Please check at the eXpress website (http://bio.math.berkeley.edu/eXpress).
WARNING: Target 'ENST00000335137.3|ENSG00000186092.4|OTTHUMG00000001094.1|OTTHUMT00000003223.1|OR4F5-001|OR4F5|918|CDS:1-918|' exists in MultiFASTA but not alignment (SAM/BAM) file.

The command I am using is as follows:

express gencode.v19.pc_transcripts.fa tophat_out/accepted_hits.rmdup.bam --output-dir . --calc-covar

I understand from the message that the transcript file I am using for quantification does not match the reference information in the BAM alignment file. I am using the same reference genome file here. Does this step require me to map the reads to the transcript FASTA before the quantification step? Any help would be highly appreciated.

Thank you!

rna-seq read counts • 1.2k views

updated 4.1 years ago by lakhujanivijay 5.9k • written 4.2 years ago by bioinfo89 ▴ 60

score 0 · Answer 1 · 2021-02-17


4.1 years ago

lakhujanivijay 5.9k

Hi bioinfo89

The error is invoked from this line in the source code (target.cpp).

Apparently, it expects the transcript FASTA headers to match the reference sequence names in the BAM header. Check whether the names printed by

grep "^>" transcripts.fa

and

samtools view -H your.bam | grep "@SQ"

match.
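For a file with tens of thousands of transcripts, comparing the two listings by eye is impractical, so here is a rough sketch that does the comparison with awk. It assumes you have first saved the BAM header as text with samtools view -H your.bam > header.sam, and that the target name is the FASTA header up to the first whitespace (how most aligners name references, but worth verifying for yours). The function names shared_count and missing_targets are made up for this example; they are not part of eXpress or samtools.

```shell
# Hypothetical helpers (not part of eXpress or samtools).
# Arguments: $1 = transcript FASTA, $2 = SAM header saved as text.

# Count FASTA target names that also appear as SN: values on @SQ lines.
shared_count() {
    awk -F'\t' '
        # First file: collect reference names from the @SQ header lines.
        mode == "sq" {
            if ($1 == "@SQ")
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) sq[substr($i, 4)] = 1
            next
        }
        # Second file: take each FASTA header up to the first whitespace.
        mode == "fa" && /^>/ {
            name = substr($0, 2)
            sub(/[ \t].*/, "", name)
            if (name in sq) n++
        }
        END { print n + 0 }
    ' mode=sq "$2" mode=fa "$1"
}

# Print FASTA target names that are absent from the @SQ lines.
missing_targets() {
    awk -F'\t' '
        mode == "sq" {
            if ($1 == "@SQ")
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) sq[substr($i, 4)] = 1
            next
        }
        mode == "fa" && /^>/ {
            name = substr($0, 2)
            sub(/[ \t].*/, "", name)
            if (!(name in sq)) print name
        }
    ' mode=sq "$2" mode=fa "$1"
}

# Example usage (file names taken from the question):
#   samtools view -H tophat_out/accepted_hits.rmdup.bam > header.sam
#   shared_count gencode.v19.pc_transcripts.fa header.sam
#   missing_targets gencode.v19.pc_transcripts.fa header.sam | head
```

If shared_count prints 0 and the @SQ lines list chromosomes, the BAM was most likely aligned against the genome, as a TopHat run against GRCh37 would be. eXpress expects alignments made against the same transcript sequences given in the MultiFASTA, so in that case the reads would need to be aligned to the transcript FASTA first.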
