8.9 years ago
Yuka Takemon
Hello,
The following is my script:
#!/bin/bash -l
#PBS -l nodes=1:ppn=20,walltime=24:00:00
module load cufflinks/2.2.1
##define directories and other variables
#directory to input cufflinks
dir_cufflinks=/pathto/Annotation/cufflinks_preqc
#dir to reference annotation
dir_ref_annotation=/pathto/genome_index/Mus_musculus/UCSC/mm10/Annotation/Genes
#dir to DNA seq for reference
dir_ref_genome_seq=/pathto/genome_index/Mus_musculus/UCSC/mm10/Sequence/Chromosomes
cuffmerge -o ${dir_cufflinks}/cuffmerge -g ${dir_ref_annotation}/genes.gtf -s ${dir_ref_genome_seq}/*.fa -p 20 ${dir_cufflinks}/all_transcripts.txt
I want to note that:
all_transcripts.txt lists the .gtf files that came out of cufflinks
-g genes.gtf was previously used with cufflinks as a reference annotation
-s contains individual Chr*.fa files
However, I am getting the following error:
[Mon Dec 14 17:33:45 2015] Beginning transcriptome assembly merge
-------------------------------------------
[Mon Dec 14 17:33:45 2015] Preparing output location /pathto/Annotation/cufflinks_preqc/cuffmerge/
Traceback (most recent call last):
  File "/opt/compsci/cufflinks/2.2.1/cuffmerge", line 580, in <module>
    sys.exit(main())
  File "/opt/compsci/cufflinks/2.2.1/cuffmerge", line 538, in main
    gtf_input_files = test_input_files(transfrag_list_file)
  File "/opt/compsci/cufflinks/2.2.1/cuffmerge", line 268, in test_input_files
    g = open(line,"r")
IOError: [Errno 2] No such file or directory: '>chr11'
Is there something obvious I am missing here? Any input or help is appreciated.
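One possible reading of the traceback, offered as a guess rather than a confirmed diagnosis: the shell expands ${dir_ref_genome_seq}/*.fa into one argument per FASTA file before cuffmerge starts, so -s may only take the first file, and one of the remaining .fa files could end up being read as the transcript list. That would explain why cuffmerge tried to open '>chr11' as a file name. The sketch below only demonstrates the argument expansion; it uses a throwaway directory and a hypothetical count_args helper, and it does not call cuffmerge.

```shell
#!/bin/bash
# Throwaway directory with two fake chromosome files (illustrative only).
demo_dir=$(mktemp -d)
printf '>chr1\nACGT\n' > "${demo_dir}/chr1.fa"
printf '>chr11\nACGT\n' > "${demo_dir}/chr11.fa"

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# The glob is expanded by the shell into one argument per matching file.
glob_count=$(count_args -s "${demo_dir}"/*.fa)
# A bare directory path stays a single argument.
dir_count=$(count_args -s "${demo_dir}")

echo "with glob: ${glob_count} arguments; with directory: ${dir_count} arguments"
rm -r "${demo_dir}"
```

The cufflinks manual describes -s as taking either a directory with one FASTA file per contig or a single multi-FASTA file, so passing -s ${dir_ref_genome_seq} without the /*.fa glob may be worth trying.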
Thanks, Dan! It looks like it ran and produced a new merged.gtf file. But I'm curious if you can help me understand the warning I got for each chromosome, each with a /chrX_GLXXXX_random{.fa,.fasta} suffix. Is this something I can ignore?
Do you have those contigs in the directory as FASTA files? My guess would be that they are in your transcriptome GTF file but that you don't have FASTA sequences for them in your directory. It's probably best to have them. In my experience there isn't much in the way of known genes or protein-coding transcripts on these contigs, so it probably won't have a huge impact on your analysis, but it is always better to be more complete.
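A quick way to test that guess is to list the contig names in the first column of the GTF and check whether each one has a matching FASTA file. This is a minimal sketch under stated assumptions: the directory layout, the file names, and the contig chrX_demo_random are all made up for illustration, and one .fa or .fasta file per contig is assumed.

```shell
#!/bin/bash
# Throwaway layout: one FASTA file and a two-record GTF (illustrative only).
demo_dir=$(mktemp -d)
mkdir "${demo_dir}/Chromosomes"
printf '>chr1\nACGT\n' > "${demo_dir}/Chromosomes/chr1.fa"
printf 'chr1\tCufflinks\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' > "${demo_dir}/transcripts.gtf"
printf 'chrX_demo_random\tCufflinks\texon\t1\t4\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n' >> "${demo_dir}/transcripts.gtf"

# Collect every contig named in column 1 of the GTF that has no FASTA file.
missing=""
for contig in $(cut -f1 "${demo_dir}/transcripts.gtf" | sort -u); do
    if [ ! -e "${demo_dir}/Chromosomes/${contig}.fa" ] && \
       [ ! -e "${demo_dir}/Chromosomes/${contig}.fasta" ]; then
        missing="${missing:+${missing} }${contig}"
    fi
done

echo "contigs without FASTA: ${missing}"
rm -r "${demo_dir}"
```

Running the same loop against the real GTF and chromosome directory should show whether the *_random contigs are the ones without sequence files.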
I dug around some more and noticed that chrX_GLXX_random appears in the .gtf that came out of cufflinks, but not in the reference .gtf. So this is most likely the issue I am having here.
Thanks for your help!