I used TopHat and Cufflinks to analyse clean RNA-seq reads and obtain the transcriptome expression profile of my samples. The annotation.gtf and genome.fa I used with these programs all worked well; the working commands are below:
Tophat:
/home/share/bin/tophat -p 8 -G /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf \
-o /home/jianglin/ljiang/XP/results/'d10803_L5_I371_thout1/' \
/home/jianglin/ljiang/XP/goat_ref/goat /home/jianglin/ljiang/XP/data/'d10803_L5_I371.R1.clean.fastq' \
/home/jianglin/ljiang/XP/data/'d10803_L5_I371.R2.clean.fastq'
Cufflinks:
/home/share/bin/cufflinks -p 8 -G /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf \
-o /home/jianglin/ljiang/XP/results/'d10803_L5_I371_clout' \
/home/jianglin/ljiang/XP/results/'d10803_L5_I371_thout1/accepted_hits.bam'
assemblies.txt=
/home/jianglin/ljiang/XP/results/d4502_L6_I367_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d4501_L4_I366_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d4503_L6_I368_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d10801_L4_I369_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d10802_L5_I370_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d10803_L5_I371_clout/transcripts.gtf
Cuffmerge:
/home/share/software/cufflinks-2.2.1.Linux_x86_64/cuffmerge -o /home/jianglin/ljiang/XP/results/merged_asm \
-g /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf \
-s /home/jianglin/ljiang/XP/goat_ref/goat.fa \
-p 8 \
/home/jianglin/ljiang/XP/results/assemblies.txt
BUT when I ran cuffmerge to create a single merged transcriptome annotation, the terminal showed these warnings and errors:
[Sun Jan 28 17:15:19 2018] Beginning transcriptome assembly merge
-------------------------------------------
[Sun Jan 28 17:15:19 2018] Preparing output location /home/jianglin/ljiang/XP/results/merged_asm/
[Sun Jan 28 17:15:29 2018] Converting GTF files to SAM
[17:15:29] Loading reference annotation.
[17:15:33] Loading reference annotation.
[17:15:37] Loading reference annotation.
[17:15:41] Loading reference annotation.
[17:15:45] Loading reference annotation.
[17:15:49] Loading reference annotation.
[Sun Jan 28 17:16:01 2018] Quantitating transcripts
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
Command line:
cufflinks -o /home/jianglin/ljiang/XP/results/merged_asm/ -F 0.05 -g /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 8 /home/jianglin/ljiang/XP/results/merged_asm/tmp/mergeSam_filezEJpqP
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File /home/jianglin/ljiang/XP/results/merged_asm/tmp/mergeSam_filezEJpqP doesn't appear to be a valid BAM file, trying SAM...
[17:16:21] Loading reference annotation.
[17:16:23] Inspecting reads and determining fragment length distribution.
Processed 22612 loci.
> Map Properties:
> Normalized Map Mass: 218274.00
> Raw Map Mass: 218274.00
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
[17:16:36] Assembling transcripts and estimating abundances.
Processed 22612 loci.
[Sun Jan 28 17:21:09 2018] Comparing against reference file /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
No fasta index found for /home/jianglin/ljiang/XP/goat_ref/goat.fa. Rebuilding, please wait..
Error: sequence lines in a FASTA record must have the same length!
[FAILED]
Does that mean I need to index genome.fa again? I tried that, but it did not solve the problem. I'd appreciate it if someone could help me solve this, THANK YOU!! ^ ^~~
You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the "advisable" toolset for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR and bbmap, or pseudo-alignment using salmon.
Tip: When posting code, use the code sample button to make it easier to read.
"EOF marker is absent" means that your BAM file has been truncated. Did Tophat produce a BAM or a SAM file? How did you convert from sam to bam?
@arup sorry, I can't reply to you directly, so I am replying in a new answer section. Do you mean that genome.fa and annotation.gtf are mismatched? But then why do TopHat and Cufflinks run smoothly with the same GTF and FASTA files?
You can. See C: How do I ask a question on Biostars?
Now you can move this to where it belongs using the following steps:
1. Select All -> Copy the text of your answer.
2. Click Add Comment on arup's post here: A: i meet an error when i run the cuffmerge
3. Paste the text and click the Add Comment button.
4. Click moderate back in your answer here: A: i meet an error when i run the cuffmerge
5. Choose Delete Post and click the Submit button.
Whatever browser people are using in China seems to have this odd behavior (not being able to use ADD COMMENT/ADD REPLY on BioStars). This could be due to users keeping scripting completely off in their browsers, or else who knows ...
That's odd. China's Internet policies are strange.
It's not their browser. A: i meet an error when i run the cuffmerge (Or someone mod-moved it to that spot)
I moved it since it was posted as a new answer. That is the only option (not optimal as we have discussed many times in past).
It is possible that your reference file is wrapped at n characters for some sequences, whereas others are one long string of [ACTG]. Please check that the FASTA file is formatted properly, with no extra lines between separate sequences or at the end.
@arup you are right. Some people said that the lines in a FASTA file should all have the same length, with no unnecessary blank lines or newlines, or else the program reports the error: sequence lines in a FASTA record must have the same length! BUT how can I check whether the line lengths in my FASTA file are consistent, and how do I correct these errors? THANK YOU!!!
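One way to locate the offending records is a small awk scan over the FASTA file. The following is only a minimal sketch, not the check Cufflinks itself performs: the function name check_fasta_wrap is made up for illustration, and the rule it applies is an assumption about what FASTA indexers generally expect (within a record, every sequence line except the last has the same length, and there are no blank lines).

```shell
# check_fasta_wrap FILE
# Hypothetical helper: prints one message per problem found and returns 1
# if any record is wrapped inconsistently or the file contains blank lines.
check_fasta_wrap() {
    awk '
        BEGIN { width = -1 }
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        /^>/ {                                  # new record: reset the state
            name = substr($0, 2); width = -1; short_at = 0; reported = 0
            next
        }
        $0 == "" { printf "line %d: blank line\n", NR; status = 1; next }
        {
            len = length($0)
            if (width < 0) { width = len; next }  # first sequence line sets the width
            # A line is bad if it is longer than the first line, or if it
            # follows a shorter line (only the last line may be short).
            if ((short_at || len > width) && !reported) {
                printf "line %d: inconsistent line length in record %s\n", NR, name
                reported = 1; status = 1
            }
            if (len < width) short_at = NR
        }
        END { exit status + 0 }
    ' "$1"
}
```

Running check_fasta_wrap /home/jianglin/ljiang/XP/goat_ref/goat.fa would then list the first inconsistent line of each affected record; no output and an exit status of 0 would suggest the wrapping is uniform under the assumed rule.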
You can try the sed solution posted in this post to clean up your FASTA file: A: Useful Bash Commands To Handle Fasta Files
You can try fastx-toolkit to make the file uniform.
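If neither of those is at hand, re-wrapping can also be sketched with plain awk. This is not the sed solution from the linked post, just an illustration of the same idea: the function name rewrap_fasta and the default width of 60 are assumptions, and it simply joins each record's sequence lines, drops blank lines and stray whitespace, and writes the sequence back out at a fixed width.

```shell
# rewrap_fasta FILE [WIDTH]
# Hypothetical helper: writes FILE to stdout with every sequence line
# wrapped at WIDTH characters (default 60). Header lines are kept as-is.
rewrap_fasta() {
    awk -v width="${2:-60}" '
        function emit_rest() {                  # print what is left of a record
            if (buf != "") print buf
            buf = ""
        }
        BEGIN { width = width + 0 }
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        /^>/ { emit_rest(); print; next }
        {
            gsub(/[ \t]/, "")                   # drop stray spaces and tabs
            buf = buf $0
            n = length(buf)
            # Print every complete chunk, keep the remainder for the next line.
            for (i = 1; i + width - 1 <= n; i += width) print substr(buf, i, width)
            buf = substr(buf, i)
        }
        END { emit_rest() }
    ' "$1"
}
```

A call such as rewrap_fasta goat.fa 60 > goat.wrapped.fa (the output name is only an example) should give a uniformly wrapped copy. Re-wrapping does not change the sequences or their coordinates, but an old goat.fa.fai left next to the file would probably need to be deleted so that the index is rebuilt.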