eXpress error gene length different
0
0
Entering edit mode
9.7 years ago
bharata1803 ▴ 560

Hello,

I aligned my sequence file to transcriptome sequence for chromosome 2 I downloaded from UCSC. I aligned it with bowtie2 and it seemed the alignment process was good without error. I ran eXpress software after that with the same Fasta file for aligning and the SAM result from the alignment as input. But, the eXpress software then gave an error about the length difference in a gene between FASTA file and SAM file. Can anyone help me? Thank you in advance for your help.

eXpress rna-seq • 2.8k views
ADD COMMENT
0
Entering edit mode

Do you have the full error message?

ADD REPLY
0
Entering edit mode

Can you double check that no transcript names appear twice? I quick way to do this would be as follows:

grep "^@SQ" alignments.sam | wc -l
grep "^@SQ" alignments.sam | cut -f 2 | sort | uniq | wc -l

Both of those should yield the same value. If they don't then you have duplicate names, which could presumably cause this problem.

ADD REPLY
0
Entering edit mode

Thank you for your reply. The error is like this :

Target '<some gene name>' differs in length between MultiFASTA and alignment (SAM/BAM) files (<some number> vs. <another number>).

I tried to check the FASTA file in IGV and IGV also gives error about length for some genes. It's a bit strange because I download it directly from UCSC. Probably my option when generating the transcriptome sequence is wrong. The head of the file looks like this:

>hg38_knownGene_uc010yim.2 range=chr2:38814-46588 5'pad=0 3'pad=0 strand=- repeatMasking=none
GGTCGGGAGGGGCGGGAACTGGGGAGTTTCGCCAATCGGTGCAGACTCCG
GTTCTCCCCAACCCCGGGCCTGGGCAGCTGGGCACTGGGGCGCCGTTCCC
CACCCGGCGCGGACTTCGGAACCGGCCCTGGGACTGACTGGCTCTACCAC
TCGAGCGCGTCTCCGCTGGACCCGGAACCCCGGTCGGTCCATTCCCCGCG

Btw, I generate the file the same way I generate GTF file from table browser in UCSC website but I choose the output format option to sequence not GTF. Is it right?

I also tried to upload the GTF file via custom tracks and download the sequence. This method works. The IGV doesn't give any error.

ADD REPLY
0
Entering edit mode

You should definitely try out Devon's code. It is likely that the header information might be the root of the problem.

(unrelated question here, how do you tag another user?)

ADD REPLY
0
Entering edit mode

If by tagging you mean to have someone notified as well, there's no way to do that in the current Biostars codebase (that I recall seeing at least). Many of us just tweet at each other at this point. We should really recommend this sort of thing for Biostars 3.0 though, since I like being able to just @someone and have them notified.

ADD REPLY
0
Entering edit mode

Btw, I want to ask a basic question. I only interested for a specific gene and I already know the position of the gene (in chromosome 2). Can I do the alignment and express call just using the chromosome 2 transcriptome sequence? I have done that yesterday and the result is only containing 19 gene. Is this method reliable to create a read count matrix for DESeq2 input?

ADD REPLY
0
Entering edit mode

You can do that, but keep in mind that it will lead to increased false-positive alignments and, therefore, increased noise. If you're only doing this to quickly check something then that's probably fine, but I would generally avoid aligning only to a specific chromosome or small genomic region.

ADD REPLY
0
Entering edit mode

Thank you for your advice. It's better to align to the whole sequence then.

ADD REPLY
0
Entering edit mode

The result is the same. So I think there is no duplicate names.

The error when I want to display the fasta file in IGV:

Fasta file has uneven line lengths in contig hg38_knownGene_uc031upc.1_0
ADD REPLY
0
Entering edit mode

Ah, that error is completely different than what I had thought then. Fasta indexing needs files where each line has a fixed maximum length and where any line with length less than the maximum is the last line of an entry. This can likely be changed with a combination of awk (to remove line endings) and then piping that to fold.

ADD REPLY

Login before adding your answer.

Traffic: 1628 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6