i meet an error when i run the cuffmerge
6.8 years ago
1165576001 • 0

I used TopHat and Cufflinks to analyse clean RNA-seq reads and obtain the transcriptome expression profile of my samples. The annotation.gtf and genome.fa files I used worked fine with both programs. The commands that worked are below:

Tophat:

/home/share/bin/tophat -p 8 -G /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf \
    -o /home/jianglin/ljiang/XP/results/'d10803_L5_I371_thout1/' \
    /home/jianglin/ljiang/XP/goat_ref/goat /home/jianglin/ljiang/XP/data/'d10803_L5_I371.R1.clean.fastq' \
    /home/jianglin/ljiang/XP/data/'d10803_L5_I371.R2.clean.fastq'

Cufflinks:

/home/share/bin/cufflinks -p 8 -G /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf \
    -o /home/jianglin/ljiang/XP/results/'d10803_L5_I371_clout' \
    /home/jianglin/ljiang/XP/results/'d10803_L5_I371_thout1/accepted_hits.bam'

assemblies.txt:

/home/jianglin/ljiang/XP/results/d4502_L6_I367_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d4501_L4_I366_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d4503_L6_I368_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d10801_L4_I369_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d10802_L5_I370_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d10803_L5_I371_clout/transcripts.gtf

Cuffmerge:

/home/share/software/cufflinks-2.2.1.Linux_x86_64/cuffmerge -o /home/jianglin/ljiang/XP/results/merged_asm \
    -g /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf \
    -s /home/jianglin/ljiang/XP/goat_ref/goat.fa \
    -p 8 \
    /home/jianglin/ljiang/XP/results/assemblies.txt

But when I ran cuffmerge to create a single merged transcriptome annotation, the terminal showed this error:

 [Sun Jan 28 17:15:19 2018] Beginning transcriptome assembly merge
-------------------------------------------

[Sun Jan 28 17:15:19 2018] Preparing output location /home/jianglin/ljiang/XP/results/merged_asm/
[Sun Jan 28 17:15:29 2018] Converting GTF files to SAM
[17:15:29] Loading reference annotation.
[17:15:33] Loading reference annotation.
[17:15:37] Loading reference annotation.
[17:15:41] Loading reference annotation.
[17:15:45] Loading reference annotation.
[17:15:49] Loading reference annotation.
[Sun Jan 28 17:16:01 2018] Quantitating transcripts
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
Command line:
cufflinks -o /home/jianglin/ljiang/XP/results/merged_asm/ -F 0.05 -g /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 8 /home/jianglin/ljiang/XP/results/merged_asm/tmp/mergeSam_filezEJpqP 
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File /home/jianglin/ljiang/XP/results/merged_asm/tmp/mergeSam_filezEJpqP doesn't appear to be a valid BAM file, trying SAM...
[17:16:21] Loading reference annotation.
[17:16:23] Inspecting reads and determining fragment length distribution.
Processed 22612 loci.                       
> Map Properties:
>       Normalized Map Mass: 218274.00
>       Raw Map Mass: 218274.00
>       Fragment Length Distribution: Truncated Gaussian (default)
>                     Default Mean: 200
>                  Default Std Dev: 80
[17:16:36] Assembling transcripts and estimating abundances.
Processed 22612 loci.                       
[Sun Jan 28 17:21:09 2018] Comparing against reference file /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
No fasta index found for /home/jianglin/ljiang/XP/goat_ref/goat.fa. Rebuilding, please wait..
Error: sequence lines in a FASTA record must have the same length!
        [FAILED]

Does that mean I need to index genome.fa again? I tried that, but it did not fix the problem. I would appreciate it if someone could help me solve this. Thank you!

RNA-Seq • 5.7k views

You should know that the old "Tuxedo" pipeline of TopHat(2) and Cufflinks is no longer the recommended toolset for RNA-seq analysis. The software is deprecated or in low-maintenance mode and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment with Salmon.


Tip: When posting code, use the code sample button to make it easier to read.

"EOF marker is absent" means that your BAM file has been truncated. Did Tophat produce a BAM or a SAM file? How did you convert from sam to bam?


@arup Sorry, I can't reply to you directly, so I am replying in a new answer section. Do you mean that genome.fa and annotation.gtf do not match each other? But then why do TopHat and Cufflinks run without problems on the same GTF and FASTA files?


You can. See C: How do I ask a question on Biostars?

Now you can move this to where it belongs using the following steps:


Whatever browser people are using in China seems to have this odd behavior (not being able to use ADD COMMENT/ADD REPLY on BioStars). This could be due to users keeping scripting completely off in browsers or else who knows ...


That's odd. China's Internet policies are strange.


It's not their browser. A: i meet an error when i run the cuffmerge (Or someone mod-moved it to that spot)


I moved it since it was posted as a new answer. That is the only option (not optimal, as we have discussed many times in the past).


It is possible that your reference file is wrapped at n characters for some sequences, whereas others are one long string of [ACTG].


Please check that the FASTA file is formatted properly, with no extra blank lines between sequences or at the end.


@arup You are right. I have read that every sequence line in a FASTA file should have the same length, with no stray blank lines or extra newlines; otherwise the program reports the error: sequence lines in a FASTA record must have the same length! But how can I check whether the line lengths in my FASTA file are consistent, and how do I correct them? Thank you!
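One way to check this, as a minimal sketch: the awk script below flags blank lines and any record whose sequence lines are not all the same width (the last line of a record may be shorter). The function name check_fasta_wrap is made up for illustration, and the rule it applies is an assumption inferred from the error message, not taken from the Cufflinks source.

```shell
# Report FASTA records whose sequence lines are not uniformly wrapped.
# Assumed rule: within a record, every line except the last has the same
# length, and blank lines are not allowed.
check_fasta_wrap() {
    # $1 = FASTA file to inspect
    awk '
        /^>/ { name = $0; width = 0; short = 0; next }   # new record: reset state
        {
            len = length($0)
            if (len == 0)  { print "blank line at line " NR " in " name; next }
            if (short)     { print "sequence continues after a short line, at line " NR " in " name; next }
            if (width == 0)       width = len            # first sequence line sets the width
            else if (len > width) print "overlong line at line " NR " in " name
            else if (len < width) short = 1              # acceptable only if it is the last line
        }
    ' "$1"
}
```

Running `check_fasta_wrap goat.fa` should print nothing for a consistently wrapped file; any output points at the line numbers to look at.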


You can try the sed solution posted in this post to clean-up your fasta file: A: Useful Bash Commands To Handle Fasta Files
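If you would rather not depend on that post, here is a rough awk equivalent (a sketch, not the linked sed solution): it drops blank lines and stray whitespace and re-wraps every record to a fixed width. The function name rewrap_fasta and the default width of 60 are my own choices for illustration.

```shell
# Re-wrap every FASTA record to a fixed line width, dropping blank lines
# and stray whitespace (including Windows carriage returns).
rewrap_fasta() {
    # $1 = input FASTA, $2 = line width (defaults to 60)
    awk -v w="${2:-60}" '
        function finish() { if (buf != "") print buf; buf = "" }
        /^>/ { finish(); print; next }        # header: emit leftover bases, then the header
        {
            gsub(/[ \t\r]/, "")               # strip whitespace inside sequence lines
            buf = buf $0
            n = length(buf)
            # print as many full-width lines as the buffer currently holds
            for (i = 1; i + w - 1 <= n; i += w) print substr(buf, i, w)
            buf = substr(buf, i)              # keep the remainder for the next line
        }
        END { finish() }
    ' "$1"
}
```

Usage would be something like `rewrap_fasta goat.fa 60 > goat.wrapped.fa`. If an old index file sits next to the FASTA (I am assuming it would be named goat.fa.fai), it is probably worth deleting it so that it gets rebuilt from the cleaned file.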


You can try the FASTX-Toolkit to make the line width uniform. The -w option of fasta_formatter sets the output line width:

fasta_formatter -i input.fa -w 60 -o output.fa
6.8 years ago

Most probably the FASTA file you are using is not formatted properly, or the GTF and FASTA come from different versions, resulting in the error:

Error: sequence lines in a FASTA record must have the same length!

Ref: http://seqanswers.com/forums/archive/index.php/t-14419.html

To remove blank lines, use

sed '/^$/d' input.fa > output.fa

To wrap all sequence lines in the FASTA file to a uniform width, use fasta_formatter from the FASTX-Toolkit:

fasta_formatter -i input.fa -w 60 -o output.fa
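For the other possibility mentioned above, a GTF and FASTA that come from different versions, a quick sanity check is to compare the sequence names in the two files. This is only a sketch: the function name gtf_names_missing_from_fasta is hypothetical, and it assumes the FASTA sequence name is the header text up to the first whitespace.

```shell
# List sequence names that appear in column 1 of the GTF but have no
# matching header in the FASTA.
gtf_names_missing_from_fasta() {
    # $1 = annotation GTF, $2 = genome FASTA
    gtf_ids=$(mktemp) && fa_ids=$(mktemp) || return 1
    # column 1 of every non-comment, non-empty GTF line
    awk -F'\t' '!/^#/ && NF { print $1 }' "$1" | sort -u > "$gtf_ids"
    # assumption: sequence name = header text up to the first whitespace
    grep '^>' "$2" | sed 's/^>//; s/[[:space:]].*//' | sort -u > "$fa_ids"
    comm -23 "$gtf_ids" "$fa_ids"             # names only present in the GTF
    rm -f "$gtf_ids" "$fa_ids"
}
```

An empty result from `gtf_names_missing_from_fasta goat_refnew.gtf goat.fa` would suggest the names at least line up; it does not prove that the coordinates come from the same assembly.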
4.1 years ago
pkmolbio • 0

Open the FASTA file in BioEdit and save it again in FASTA format. The re-saved file can then be used for the cuffmerge analysis.
