Question

RNA seq NGS data analysis

1

8.6 years ago

poonam.bi01 ▴ 20

Hello, I am running TopHat to align the query reads against the reference genome sequence with this command, and I am getting this error:

tophat2 -o SRR643880.sra_out --num-threads 5 --segment-length 18 GCF_000002495 SRR643880.fastq 

 Beginning TopHat run (v2.0.13)
-----------------------------------------------
 Checking for Bowtie
Bowtie version:2.2.4.0
Checking for Bowtie index files (genome)..   
Checking for reference FASTA file
Warning: Could not find FASTA file GCF_000002495.fa
Reconstituting reference FASTA file from Bowtie index
Executing: /usr/bin/bowtie2-inspect GCF_000002495 > SRR643880.sra_out/tmp/GCF_000002495.fa
Generating SAM header for GCF_000002495
Preparing reads
left reads: min. length=36, max. length=36, 12103256 kept reads (9047 discarded)
 Mapping left_kept_reads to genome GCF_000002495 with Bowtie2 
 Mapping left_kept_reads_seg1 to genome GCF_000002495 with Bowtie2 (1/2)
 Mapping left_kept_reads_seg2 to genome GCF_000002495 with Bowtie2 (2/2)
 Searching for junctions via segment mapping
Coverage-search algorithm is turned on, making this step very slow
Please try running TopHat again with the option (--no-coverage-search) if this step takes too much time or memory.
 Retrieving sequences for splices
 Indexing splices
Building a SMALL index
Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/2)
Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/2)
Joining segment hits
Reporting output tracks
[FAILED]
Error running /usr/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir SRR643880.sra_out/ --max-multihits 20 --max-seg-multihits 40 --segment-length 18 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p5 --no-closure-search --no-microexon-search --sam-header SRR643880.sra_out/tmp/GCF_000002495_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/usr/bin/samtools_0.1.18 --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 SRR643880.sra_out/tmp/GCF_000002495.fa SRR643880.sra_out/junctions.bed SRR643880.sra_out/insertions.bed SRR643880.sra_out/deletions.bed SRR643880.sra_out/fusions.out SRR643880.sra_out/tmp/accepted_hits SRR643880.sra_out/tmp/left_kept_reads.mapped.bam,SRR643880.sra_out/tmp/left_kept_reads.candidates.bam SRR643880.sra_out/tmp/left_kept_reads.bam
Warning: no input BAM records found.

RNA-Seq • 3.0k views

ADD COMMENT • link updated 8.6 years ago by GenoMax 147k • written 8.6 years ago by poonam.bi01 ▴ 20

2

@poonam.bi01: You are not running the latest versions of TopHat/Bowtie. With the Tuxedo programs it is generally worth using the latest versions, since issues like this may have been addressed in newer releases.
One more thing to check is that you are not running out of storage space or hitting a quota.
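
The storage check suggested above can be sketched roughly as follows. This is a minimal example, not part of TopHat: the helper names `free_kib` and `has_free_space` are made up for illustration, and the threshold is whatever the caller considers enough.

```shell
# Print the free space (in KiB) on the filesystem that holds a directory.
# `df -P` forces the portable one-line-per-filesystem layout; `-k` reports KiB.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if at least $2 KiB are free under directory $1.
# The threshold is a hypothetical value chosen by the caller,
# not a documented TopHat requirement.
has_free_space() {
    [ "$(free_kib "$1")" -ge "$2" ]
}

# Example, using the output directory from the question:
#   has_free_space SRR643880.sra_out 10485760 || echo "less than 10 GiB free"
#   du -sh SRR643880.sra_out/tmp    # size of TopHat's temporary files
```

Whether a quota applies depends on how the server is administered, so that part usually has to be checked with the site's own tools.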

ADD REPLY • link 8.6 years ago by GenoMax 147k

0

Where's the error? Did you get any outputs?

Are you referring to "Warning: Could not find FASTA file GCF_000002495.fa"?

ADD REPLY • link 8.6 years ago by jotan ★ 1.3k

1

I am getting the warning message "Warning: no input BAM records found."

ADD REPLY • link 8.6 years ago by poonam.bi01 ▴ 20

1

Sorry, didn't see that at the end.

Did you get any output files? Tophat writes out temporary files.

ADD REPLY • link 8.6 years ago by jotan ★ 1.3k

GenoMax · Answer 1 · 2016-04-13

1

8.6 years ago

kanika.151 ▴ 160

Can you please post the command you used to create the Bowtie index?

Is it single-end or paired-end data?

Also, make sure that the server has enough memory and disk space available when you run TopHat2. It may also be that the temporary files were created empty because there was no space left to store their contents.
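
One way to look for temporary files that were created but never filled is to list zero-byte files in the output directory. A minimal sketch, where `empty_files` is a hypothetical helper name:

```shell
# List zero-byte regular files under a directory, one path per line.
# `-size 0` is the POSIX way to match empty files.
empty_files() {
    find "$1" -type f -size 0
}

# Example, using the output directory from the question:
#   empty_files SRR643880.sra_out
```

Note that a BAM file with a header but no alignment records is not zero bytes long, so this only catches files that were never written to at all.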

ADD COMMENT • link 8.6 years ago by kanika.151 ▴ 160

0

bowtie2-build -f GCF_000005425.2_Build_4.0_genomic.fna GCF_000005425

I used this command to build the Bowtie index, and there is no memory problem.
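
Since `bowtie2-build` names its output files after the basename given as the last argument, and TopHat takes that same basename as its index argument, it can help to confirm that all six index files exist for the basename being passed. A rough sketch, where `missing_bt2_files` is a hypothetical helper name:

```shell
# Print the Bowtie2 index files that are missing for a given basename.
# Small indexes use the .bt2 extension; large indexes use .bt2l.
missing_bt2_files() {
    base=$1
    for part in 1 2 3 4 rev.1 rev.2; do
        if [ ! -e "$base.$part.bt2" ] && [ ! -e "$base.$part.bt2l" ]; then
            echo "$base.$part.bt2"
        fi
    done
}

# Example, using the basename from the TopHat command in the question:
#   missing_bt2_files GCF_000002495
```

No output means every expected index file was found under that basename.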

How can I tell whether the data is single-end or paired-end?

ADD REPLY • link 8.6 years ago by poonam.bi01 ▴ 20

1

Paired-end files usually follow the naming convention filename_1 and filename_2.

If these are your own data, ask the person who generated them.

If this is downloaded data, check the documentation.

If downloaded as an SRA file, use the --split-3 option of the SRA Toolkit's fastq-dump (fastq-dump --split-3 filename.sra)
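
After running `fastq-dump --split-3`, the file names it leaves behind indicate the layout: paired reads go to `_1.fastq` and `_2.fastq` files, while single-end reads end up in one `.fastq` file. A minimal sketch of that check, assuming the default output names and using a hypothetical helper called `layout_from_split3`:

```shell
# Classify the files written by `fastq-dump --split-3 ACC.sra` into a directory.
# Prints "paired" if ACC_1.fastq and ACC_2.fastq both exist,
# "single" if only ACC.fastq exists, and "unknown" otherwise.
layout_from_split3() {
    dir=$1
    acc=$2
    if [ -e "$dir/${acc}_1.fastq" ] && [ -e "$dir/${acc}_2.fastq" ]; then
        echo paired
    elif [ -e "$dir/$acc.fastq" ]; then
        echo single
    else
        echo unknown
    fi
}

# Example, using the accession from the question:
#   fastq-dump --split-3 SRR643880.sra
#   layout_from_split3 . SRR643880
```

The run's record on the SRA website also states the library layout, which is the more direct source when it is available.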

ADD REPLY • link 8.6 years ago by jotan ★ 1.3k

0

And for TopHat I used this command:

tophat2 -o SRR643880.sra_out --num-threads 5  --segment-length 18 GCF_000002495 SRR643880.fastq

[2016-04-12 18:53:56] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2016-04-12 18:53:56] Checking for Bowtie
          Bowtie version:    2.2.4.0
[2016-04-12 18:53:56] Checking for Bowtie index files (genome)..
[2016-04-12 18:53:56] Checking for reference FASTA file
    Warning: Could not find FASTA file GCF_000002495.fa
[2016-04-12 18:53:56] Reconstituting reference FASTA file from Bowtie index
  Executing: /usr/bin/bowtie2-inspect GCF_000002495 > SRR643880.sra_out/tmp/GCF_000002495.fa
[2016-04-12 18:53:57] Generating SAM header for GCF_000002495
[2016-04-12 18:53:57] Preparing reads
     left reads: min. length=36, max. length=36, 12103256 kept reads (9047 discarded)
[2016-04-12 18:54:55] Mapping left_kept_reads to genome GCF_000002495 with Bowtie2 
[2016-04-12 18:57:57] Mapping left_kept_reads_seg1 to genome GCF_000002495 with Bowtie2 (1/2)
[2016-04-12 19:00:18] Mapping left_kept_reads_seg2 to genome GCF_000002495 with Bowtie2 (2/2)
[2016-04-12 19:02:07] Searching for junctions via segment mapping
    Coverage-search algorithm is turned on, making this step very slow
    Please try running TopHat again with the option (--no-coverage-search) if this step takes too much time or memory.
[2016-04-12 19:05:13] Retrieving sequences for splices
[2016-04-12 19:05:14] Indexing splices
Building a SMALL index
[2016-04-12 19:05:15] Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2016-04-12 19:06:40] Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2016-04-12 19:11:19] Joining segment hits
[2016-04-12 19:12:52] Reporting output tracks
    [FAILED]
Error running /usr/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir SRR643880.sra_out/ --max-multihits 20 --max-seg-multihits 40 --segment-length 18 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p5 --no-closure-search --no-microexon-search --sam-header SRR643880.sra_out/tmp/GCF_000002495_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/usr/bin/samtools_0.1.18 --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 SRR643880.sra_out/tmp/GCF_000002495.fa SRR643880.sra_out/junctions.bed SRR643880.sra_out/insertions.bed SRR643880.sra_out/deletions.bed SRR643880.sra_out/fusions.out SRR643880.sra_out/tmp/accepted_hits SRR643880.sra_out/tmp/left_kept_reads.mapped.bam,SRR643880.sra_out/tmp/left_kept_reads.candidates.bam SRR643880.sra_out/tmp/left_kept_reads.bam
Warning: no input BAM records found.

ADD REPLY • link updated 8.6 years ago by GenoMax 147k • written 8.6 years ago by poonam.bi01 ▴ 20

score 1 · Answer 2 · 2016-04-13

1

8.6 years ago

kanika.151 ▴ 160

What do you have in your out directory?

Your data seem to have been downloaded from the SRA website. The description there states whether the data are PE or SE, where PE stands for paired-end and SE stands for single-end.

Your SRR643880.fastq could be either the left or the right FASTQ file of a PE dataset. If that is the case, you need to find the other half. I think it is a paired-end dataset and this is the right FASTQ file; you need to find the left one, since nothing aligned to the genome and those alignments should come from left.fastq.

Command for PE data:

/opt/tophat2.10/tophat2 -p 8 -o tophat_out <bowtie2_index_basename> <left.fastq> <right.fastq>

ADD COMMENT • link 8.6 years ago by kanika.151 ▴ 160

0

Try to keep discussion as comments (do not post a new answer) unless you are offering a new answer.

The dataset in question is not a PE dataset. SRR643880 is SE.

ADD REPLY • link 8.6 years ago by GenoMax 147k