Hi everyone!
This is my first time using Cufflinks.
I'm trying to get a FASTA file corresponding to RNA-seq data from different tissues of D. suzukii. I first used HISAT2 to align my reads against a reference. Here is the command I used:
hisat2 -p 25 --max-intronlen 50000 -x /media/DATAPART1/Documents/suzukii_assembly/annotation/indexes/suz_canu3 -1 '/media/DATAPART1/Documents/suzukii_assembly/annotation/EST_evidences/suz_ant.R1.fastq.gz','/media/DATAPART1/Documents/suzukii_assembly/annotation/EST_evidences/suz_ovi.R1.fastq.gz','/media/DATAPART1/Documents/suzukii_assembly/annotation/EST_evidences/suz_pro.R1.fastq.gz','/media/DATAPART1/Documents/suzukii_assembly/annotation/EST_evidences/suz_tar.R1.fastq.gz' -2 '/media/DATAPART1/Documents/suzukii_assembly/annotation/EST_evidences/suz_ant.R2.fastq.gz','/media/DATAPART1/Documents/suzukii_assembly/annotation/EST_evidences/suz_ovi.R2.fastq.gz','/media/DATAPART1/Documents/suzukii_assembly/annotation/EST_evidences/suz_pro.R2.fastq.gz','/media/DATAPART1/Documents/suzukii_assembly/annotation/EST_evidences/suz_tar.R2.fastq.gz' | samtools view -Sbo suzukii_rnaseq_ibdm.bam -
Then I sorted the BAM file using this command:
samtools sort /media/DATAPART1/Documents/suzukii_assembly/annotation/suzukii_rnaseq_ibdm.bam /media/DATAPART1/Documents/suzukii_assembly/annotation/suzukii_rnaseq_ibdm_sorted.bam
Two things bothered me at this step: first, it took a long time, and second, the sorted file was smaller than the original file (37.2G -> 26.2G). I don't know whether this is normal, but I tried launching Cufflinks with the sorted file anyway.
At first, I used this command:
cufflinks --library-type fr-firststrand -o /media/DATAPART1/Documents/suzukii_assembly/annotation/cufflinks/hisat2 -p 25 /media/DATAPART1/Documents/suzukii_assembly/annotation/suzukii_rnaseq_ibdm_sorted.bam
But I got strange errors like this:
BAM record error: found spliced alignment without XS attribute
I read a post on Biostar (I don't remember which one) suggesting that the problem could be the --library-type, so I tried with this option:
--library-type fr-firststrand
Then it seemed to work for a moment, but it froze at 14% of the sequence (I tried twice):
You are using Cufflinks v2.2.1, which is the most recent release.
[17:08:27] Inspecting reads and determining fragment length distribution.
Processing Locus tig00000383:187849-193055 [*** ] 14%
Cufflinks didn't stop and didn't throw any error. It is apparently still doing something (I'm watching it with htop), but it ran all night, and I think something went wrong...
So, is there something wrong with this particular locus? Is the sorting step a problem? Or have I just used the wrong library type?
Thanks for your help!
Cheers,
Roxane
Small comment: It's indeed normal for a sorted BAM to be smaller than an unsorted BAM, because sorted records compress better. Perhaps someone else will have a more intuitive explanation for this, but it's nothing to worry about.
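For a rough hands-on intuition, here is a toy sketch. It is not BAM itself, only plain gzip on simulated text records (BAM's BGZF blocks are also DEFLATE-based), and the genome size, read count and read length are made-up values. After coordinate sorting, neighbouring reads overlap, so the compressor keeps finding long repeats close by:

```shell
#!/bin/sh
# Toy illustration: the same records compress better once sorted by position.
# All sizes below are arbitrary assumptions chosen for the demo.
workdir=$(mktemp -d)

# Simulate reads drawn at random positions from a random "genome".
# Each output line is "<position><TAB><sequence>".
awk -v glen=100000 -v nreads=20000 -v rlen=50 'BEGIN {
    srand(42)
    split("A C G T", base, " ")
    for (i = 1; i <= glen; i++) g[i] = base[int(rand() * 4) + 1]
    for (r = 0; r < nreads; r++) {
        pos = int(rand() * (glen - rlen + 1)) + 1
        seq = ""
        for (j = 0; j < rlen; j++) seq = seq g[pos + j]
        printf "%d\t%s\n", pos, seq
    }
}' > "$workdir/unsorted.txt"

# Coordinate-sort the very same records.
sort -n -k1,1 "$workdir/unsorted.txt" > "$workdir/sorted.txt"

# Compress both orderings and compare the sizes in bytes.
unsorted_size=$(gzip -c "$workdir/unsorted.txt" | wc -c | tr -d ' ')
sorted_size=$(gzip -c "$workdir/sorted.txt" | wc -c | tr -d ' ')

echo "unsorted, gzipped: $unsorted_size bytes"
echo "sorted, gzipped:   $sorted_size bytes"

rm -rf "$workdir"
```

Both files hold exactly the same lines, yet the sorted one should come out clearly smaller. The exact numbers depend on your awk and gzip versions.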