Hi guys!
I'm new to RNA-seq data analysis and the related bioinformatics pipelines. I just aligned my paired-end (PE) reads to the genome using TopHat2:
tophat2 -p 4 -G ../Arabidopsis_thaliana.TAIR10.31.gtf ../Arabidopsis_thaliana.TAIR10.31.dna.genome PE.reads.1.fastq.gz PE.reads.2.fastq.gz
This produced the default output files (accepted_hits.bam, etc.).
The summary file (align_summary.txt) says:
Left reads:
  Input: 37579106
  Mapped: 35638921 (94.8% of input)
  of these: 703952 (2.0%) have multiple alignments (23 have >20)
Right reads:
  Input: 37579106
  Mapped: 32462213 (86.4% of input)
  of these: 614654 (1.9%) have multiple alignments (23 have >20)
90.6% overall read mapping rate.

Aligned pairs: 31803489
  of these: 601459 (1.9%) have multiple alignments
  5799 (0.0%) are discordant alignments
84.6% concordant pair alignment rate.
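The percentages in this summary follow directly from the raw counts. Here is a small sketch that recomputes them with awk. The formula used for the concordant rate, (aligned pairs minus discordant) divided by input, is an assumption on my part; it reproduces the reported 84.6% but is not taken from the TopHat2 source.

```shell
#!/usr/bin/env bash
# Recompute the rates from the align summary counts quoted above.
# pct NUM DEN -> prints NUM/DEN as a percentage with one decimal place.
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f%%\n", 100 * n / d }'
}

input=37579106          # reads per mate file (left and right are equal)
left_mapped=35638921
right_mapped=32462213
aligned_pairs=31803489
discordant=5799

pct "$left_mapped" "$input"                          # left-read mapping rate
pct "$right_mapped" "$input"                         # right-read mapping rate
pct $((left_mapped + right_mapped)) $((2 * input))   # overall read mapping rate
pct $((aligned_pairs - discordant)) "$input"         # concordant rate (assumed formula)
```

The four lines printed should match the 94.8%, 86.4%, 90.6% and 84.6% in the summary.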
For some downstream processing steps I need SAM instead of BAM, so I wanted to use samtools to convert BAM to SAM (as already described in some threads here). I know that TopHat2 can also write SAM directly with "--no-convert-bam", but I forgot to add that option to my run.
So what I did: samtools view -h -o accepted_hits.sam accepted_hits.bam
Error message:
samtools view: writing to "accepted_hits.sam" failed: File too large
samtools view: error closing "accepted_hits.sam": -1
I don't understand why the file would be too large. Has anyone run into this problem before? Thanks in advance!
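On Linux, "File too large" is the message for the EFBIG error: a write would push the file past either the process file-size limit (ulimit -f) or the maximum file size of the filesystem (for example, FAT32 caps files at 4 GiB minus 1 byte). A SAM file for about 68 million mapped reads can easily exceed that. Below is a minimal diagnostic sketch, assuming Linux with GNU coreutils (for stat -f -c %T) and samtools on PATH. The function names are hypothetical, and the size estimate ignores header lines.

```shell
#!/usr/bin/env bash
# Print the two limits that most often trigger EFBIG for a target directory.
check_write_limits() {
  local dir="${1:-.}"
  echo "ulimit -f: $(ulimit -f)"               # "unlimited", or a limit in blocks
  echo "filesystem: $(stat -f -c %T "$dir")"   # e.g. "msdos" for a FAT mount
}

# Rough size of the SAM body: average line length over the first N records
# multiplied by the total number of alignment records in the BAM.
estimate_sam_bytes() {
  local bam="$1" sample="${2:-100000}" n avg
  n=$(samtools view -c "$bam")                 # total alignment records
  avg=$(samtools view "$bam" | head -n "$sample" \
        | awk '{ b += length($0) + 1 } END { if (NR) printf "%d", b / NR; else print 0 }')
  echo $(( avg * n ))
}
```

Running check_write_limits . in the output directory and then estimate_sam_bytes accepted_hits.bam should show whether the expected SAM size exceeds what the filesystem or shell allows.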
How about this (if you are using an older version of samtools):
samtools view -h accepted_hits.bam > accepted_hits.sam
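If the limit comes from the filesystem, a shell redirect writes to the same place and may fail the same way. In that case, writing to a different filesystem or splitting the output into smaller pieces can help. Here is a sketch of the splitting approach that writes one SAM file per reference sequence. It assumes accepted_hits.bam is coordinate-sorted (which TopHat2 does by default), so it can be indexed. The function and directory names are hypothetical. Note that mates mapped to different chromosomes end up in different files, which may matter for some downstream tools.

```shell
#!/usr/bin/env bash
# Write one SAM file (with header) per reference sequence in the BAM index.
split_bam_to_sam() {
  local bam="$1" outdir="${2:-sam_by_chrom}"
  mkdir -p "$outdir"
  samtools index "$bam"                        # creates the .bai needed for region queries
  # idxstats columns: name, length, mapped, unmapped; "*" is the unplaced bin.
  samtools idxstats "$bam" | cut -f1 | grep -v '^\*$' | while read -r chrom; do
    samtools view -h "$bam" "$chrom" > "$outdir/$chrom.sam"
  done
}
```

If the downstream tool can read SAM from standard input, piping the output of samtools view -h accepted_hits.bam straight into it avoids writing a large SAM file at all.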