Question

i meet an error when i run the cuffmerge

7.3 years ago

1165576001 • 0

I used TopHat and Cufflinks to analyse clean RNA-seq reads and obtain the transcriptome expression profiles of my samples. The annotation.gtf and genome.fa files I used with these programs all worked well. The commands I ran are below:

TopHat:

/home/share/bin/tophat -p 8 -G /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf \
    -o /home/jianglin/ljiang/XP/results/'d10803_L5_I371_thout1/' \
    /home/jianglin/ljiang/XP/goat_ref/goat /home/jianglin/ljiang/XP/data/'d10803_L5_I371.R1.clean.fastq' \
    /home/jianglin/ljiang/XP/data/'d10803_L5_I371.R2.clean.fastq'

Cufflinks:

/home/share/bin/cufflinks -p 8 -G /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf \
    -o /home/jianglin/ljiang/XP/results/'d10803_L5_I371_clout' \
    /home/jianglin/ljiang/XP/results/'d10803_L5_I371_thout1/accepted_hits.bam'

assemblies.txt:

/home/jianglin/ljiang/XP/results/d4502_L6_I367_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d4501_L4_I366_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d4503_L6_I368_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d10801_L4_I369_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d10802_L5_I370_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d10803_L5_I371_clout/transcripts.gtf

Cuffmerge:

/home/share/software/cufflinks-2.2.1.Linux_x86_64/cuffmerge -o /home/jianglin/ljiang/XP/results/merged_asm \
    -g /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf \
    -s /home/jianglin/ljiang/XP/goat_ref/goat.fa \
    -p 8 \
    /home/jianglin/ljiang/XP/results/assemblies.txt

But when I ran cuffmerge to create a single merged transcriptome annotation, the terminal showed this error:

 [Sun Jan 28 17:15:19 2018] Beginning transcriptome assembly merge
-------------------------------------------

[Sun Jan 28 17:15:19 2018] Preparing output location /home/jianglin/ljiang/XP/results/merged_asm/
[Sun Jan 28 17:15:29 2018] Converting GTF files to SAM
[17:15:29] Loading reference annotation.
[17:15:33] Loading reference annotation.
[17:15:37] Loading reference annotation.
[17:15:41] Loading reference annotation.
[17:15:45] Loading reference annotation.
[17:15:49] Loading reference annotation.
[Sun Jan 28 17:16:01 2018] Quantitating transcripts
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
Command line:
cufflinks -o /home/jianglin/ljiang/XP/results/merged_asm/ -F 0.05 -g /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 8 /home/jianglin/ljiang/XP/results/merged_asm/tmp/mergeSam_filezEJpqP 
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File /home/jianglin/ljiang/XP/results/merged_asm/tmp/mergeSam_filezEJpqP doesn't appear to be a valid BAM file, trying SAM...
[17:16:21] Loading reference annotation.
[17:16:23] Inspecting reads and determining fragment length distribution.
Processed 22612 loci.                       
> Map Properties:
>       Normalized Map Mass: 218274.00
>       Raw Map Mass: 218274.00
>       Fragment Length Distribution: Truncated Gaussian (default)
>                     Default Mean: 200
>                  Default Std Dev: 80
[17:16:36] Assembling transcripts and estimating abundances.
Processed 22612 loci.                       
[Sun Jan 28 17:21:09 2018] Comparing against reference file /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
No fasta index found for /home/jianglin/ljiang/XP/goat_ref/goat.fa. Rebuilding, please wait..
Error: sequence lines in a FASTA record must have the same length!
        [FAILED]

Does that mean I need to index genome.fa again? I tried that, but it did not solve the problem. I would appreciate it if someone could help me with this. Thank you!

RNA-Seq • 6.2k views

ADD COMMENT • link updated 4.5 years ago by pkmolbio • 0 • written 7.3 years ago by 1165576001 • 0

You should know that the old "Tuxedo" pipeline of TopHat(2) and Cufflinks is no longer the advisable tool set for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or bbmap, or pseudo-alignment with salmon.

Please stop using Tophat https://t.co/Es4ohxOEyx Cole and I developed the method in *2008*. It was greatly improved in TopHat2 then HISAT & HISAT2. There is no reason to use it anymore. I have been saying this for years yet it has more citations this year than last #methodsmatter
— Lior Pachter (@lpachter) December 2, 2017

ADD REPLY • link 7.3 years ago by WouterDeCoster 48k

Tip: When posting code, use the code sample button to make it easier to read.

"EOF marker is absent" means that your BAM file has been truncated. Did TopHat produce a BAM or a SAM file? How did you convert from SAM to BAM?

ADD REPLY • link 7.3 years ago by tiago211287 ★ 1.5k

@arup Sorry, I can't reply to you directly, so I am replying in a new answer. Do you mean that genome.fa and annotation.gtf do not match each other? If so, why do TopHat and Cufflinks run without problems using the same GTF and FASTA files?

ADD REPLY • link 7.3 years ago by 1165576001 • 0

You can. See C: How do I ask a question on Biostars?

Now you can move this to where it belongs using the following steps:

Copy the contents of your reply from this answer (you can edit this answer (link opens in a new tab) and do a Select All -> Copy)
Click on Add Comment on arup's post here: A: i meet an error when i run the cuffmerge
Paste the copied text
Click on the green Add Comment button
Click on moderate back in your answer here: A: i meet an error when i run the cuffmerge
Choose Delete Post
Click on the blue Submit button.

ADD REPLY • link 7.3 years ago by Ram 45k

Whatever browser people are using in China seems to have this odd behavior (not being able to use ADD COMMENT/ADD REPLY on BioStars). This could be due to users keeping scripting completely off in browsers or else who knows ...

ADD REPLY • link 7.3 years ago by GenoMax 151k

That's odd. China's Internet policies are strange.

ADD REPLY • link 7.3 years ago by Ram 45k

It's not their browser. A: i meet an error when i run the cuffmerge (Or someone mod-moved it to that spot)

ADD REPLY • link updated 7.3 years ago by WouterDeCoster 48k • written 7.3 years ago by Ram 45k

I moved it since it was posted as a new answer. That is the only option (not optimal, as we have discussed many times in the past).

ADD REPLY • link 7.3 years ago by GenoMax 151k

It is possible that your reference file is wrapped at n characters for some sequences, whereas others are one long string of [ACTG].

ADD REPLY • link 7.3 years ago by GenoMax 151k

Please check whether the FASTA file is formatted properly, with no extra blank lines between sequences or at the end.

ADD REPLY • link 7.3 years ago by Arup Ghosh 3.3k
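
A quick way to carry out this check, as a minimal sketch: the helper below prints the line numbers of empty or whitespace-only lines in a file. The function name `fasta_blank_lines` is made up for illustration, and it relies only on standard awk.

```shell
# Print the line number of every empty or whitespace-only line in the file
# given as $1. Prints nothing when the file contains no such lines.
# (Hypothetical helper name, not part of any toolkit.)
fasta_blank_lines() {
    # With the default field separator, NF is 0 exactly when a line is
    # empty or consists only of spaces and tabs.
    awk 'NF == 0 { print NR }' "$1"
}
```

For the reference used in this thread, the call would be `fasta_blank_lines /home/jianglin/ljiang/XP/goat_ref/goat.fa`. Empty output means there are no blank lines to remove.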

@arup You are right. Some people said that every line in a FASTA file should have the same length, with no unnecessary blank lines or line breaks; otherwise the program reports the error "sequence lines in a FASTA record must have the same length!". But how can I detect inconsistent line lengths in my FASTA file and correct them? Thank you!

ADD REPLY • link 7.3 years ago by 1165576001 • 0
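
One way to find the inconsistent records, sketched under the assumption that the indexer follows the usual FASTA index rule: within a record, every sequence line except the last must have the same length, and the last may only be shorter. The function name `fasta_width_problems` is hypothetical, and this is not Cufflinks' own check, only an approximation of it in plain awk.

```shell
# For each FASTA record that breaks the rule "all sequence lines except the
# last have the same length", print the record name and the line number at
# which the inconsistency is first detected.
# (Hypothetical helper name; assumes the rule stated in the text above.)
fasta_width_problems() {
    awk '
        BEGIN { width = -1 }
        /^>/ {                        # header line: start a new record
            name = substr($1, 2)      # record name without the leading ">"
            width = -1                # length of the first sequence line, unknown yet
            short_seen = 0            # becomes 1 after a line shorter than width
            reported = 0
            next
        }
        reported { next }             # report each record only once
        {
            len = length($0)
            if (width < 0) { width = len; next }
            # A line following a short line, or a line longer than the first
            # sequence line, means the record is not uniformly wrapped.
            if (short_seen || len > width) {
                printf "%s\tline %d\n", name, NR
                reported = 1
                next
            }
            if (len < width) short_seen = 1
        }
    ' "$1"
}
```

Running `fasta_width_problems goat.fa` lists the records to look at. A blank line in the middle of a record counts as a short line, so it is reported too.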

You can try the sed solution posted in this post to clean-up your fasta file: A: Useful Bash Commands To Handle Fasta Files

ADD REPLY • link 7.3 years ago by GenoMax 151k
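
The linked sed solution is not reproduced here. As an alternative sketch in plain awk, the hypothetical helper `rewrap_fasta` below rewrites every record with a fixed line width (60 by default, an arbitrary choice) and drops blank lines. Header lines are copied as they are and the sequence itself is not altered; only the line breaks move.

```shell
# Rewrap every FASTA record in $1 to a fixed line width ($2, default 60)
# and write the result to stdout. Blank lines are dropped.
# (Hypothetical helper name; the default width is an arbitrary choice.)
rewrap_fasta() {
    awk -v width="${2:-60}" '
        function flush_tail() {           # print the leftover of the current record
            if (buf != "") print buf
            buf = ""
        }
        BEGIN   { if (width < 1) width = 60 }
        /^>/    { flush_tail(); print; next }   # finish previous record, copy header
        NF == 0 { next }                        # skip empty or whitespace-only lines
        {
            buf = buf $0                  # leftover from earlier lines plus this line
            n = length(buf)
            for (i = 1; i + width - 1 <= n; i += width)
                print substr(buf, i, width)     # emit each full-width chunk
            buf = substr(buf, i)          # fewer than width characters remain
        }
        END     { flush_tail() }
    ' "$1"
}
```

A possible call is `rewrap_fasta goat.fa 60 > goat.rewrapped.fa`, where the output file name is only an example. Writing to a new file keeps the original reference untouched until the result has been checked.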

You can try the FASTX-Toolkit to give the file a uniform line width.

fasta_formatter -i input.fa -w 45 > output.fa

ADD REPLY • link 7.3 years ago by Arup Ghosh 3.3k

score 1 · Answer 1 · 2018-01-29

Most probably the FASTA file you are using is not formatted properly, or the GTF and FASTA files come from different versions, resulting in this error:

Error: sequence lines in a FASTA record must have the same length!

Ref: http://seqanswers.com/forums/archive/index.php/t-14419.html

To remove unnecessary blank lines, use

sed '/^$/d' input.fa > output.fa

To give the FASTA file a uniform line width, use the FASTX-Toolkit:

fasta_formatter -i input.fa -w 45 > output.fa

score 0 · Answer 2 · 2020-10-28

4.5 years ago

pkmolbio • 0

Open the FASTA file in BioEdit and save it again in FASTA format. The re-saved file can then be used for the cuffmerge analysis.

ADD COMMENT • link 4.5 years ago by pkmolbio • 0