Hello,
I am trying my hand at RNA-Seq analysis using the pipeline TopHat -> HTSeq -> edgeR.
After TopHat, it is recommended to convert the BAM file to SAM and then sort it.
This is how I converted:
#convert bam file to sam file
samtools view -h -o out.sam in.bam
and this is how I sorted:
sort -s -k my_file.sam > my_file_sorted.sam
However, my sorted SAM file kept giving me errors when run through HTSeq, such as an error when reading the sam/bam file, raised in count.py:84. For another file, it said that 'seq' and 'qualstr' do not have the same length.
My HTSeq version is up to date, and I also have PySam installed.
However, when I ran unsorted BAM or SAM files through HTSeq, they were processed without errors.
I would like to know the significance of sorting a file, and also why HTSeq processes unsorted files.
My HTSeq command is as follows:
nohup time -p python -m HTSeq.scripts.count -f bam -s yes --idattr=ID hits.bam anno.gff3 &>mylog &
The only time I got an error for an unsorted file was when it contained both paired-end and single-end reads, which I separated using:
samtools view -bf 1 foo.bam > pair.bam
samtools view -bF 1 foo.bam > single.bam
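One possible cause of the errors with the sorted file is that plain `sort` treats the SAM header lines (those starting with `@`) like any other line, so they can end up mixed in with the alignment records. Note also that `-k` expects a field specification such as `1,1`. Below is a minimal sketch, not a definitive fix, that sorts by read name while keeping the header on top. The function name `sort_sam_by_name` and the toy file are hypothetical and only serve the illustration.

```shell
# Sort the alignment records of a SAM file by read name (column 1, QNAME),
# keeping all header lines (starting with '@') at the top in their original order.
# The SO tag in the @HD line, if any, is left unchanged.
sort_sam_by_name() {
    local input="$1" output="$2"
    # Header lines first; '|| true' tolerates a file without a header.
    grep '^@' "$input" > "$output" || true
    # Stable sort on column 1 only; LC_ALL=C gives a byte-wise, locale-independent order.
    grep -v '^@' "$input" | LC_ALL=C sort -s -k 1,1 >> "$output"
}

# Toy input (hypothetical data): two header lines and two unsorted records.
tmpdir=$(mktemp -d)
{
    printf '@HD\tVN:1.0\tSO:unsorted\n'
    printf '@SQ\tSN:chr1\tLN:1000\n'
    printf 'readB\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
    printf 'readA\t0\tchr1\t50\t60\t4M\t*\t0\t0\tTTGA\tIIII\n'
} > "$tmpdir/toy.sam"

sort_sam_by_name "$tmpdir/toy.sam" "$tmpdir/toy.name_sorted.sam"
cut -f 1 "$tmpdir/toy.name_sorted.sam"   # header tags first, then readA, readB
```

As far as I know, htseq-count only needs a particular order for paired-end data (by read name, or by position with `-r pos`), which may be why unsorted single-end files went through. `samtools sort -n` is an alternative that name-sorts and handles the header itself.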
Okay. I will try that too. Can it be used to sort bam files too?
Yes, it can. By the way, did you check if maybe your original files are already sorted? Some programmes sort files for you.
Already sorted? As in? We ran the reads through NGS QC and then through TopHat... I don't remember sorting them...
How do I check for it?
If you get something like "SO:coordinate" then your files are already sorted.
With bam files:
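A minimal sketch of such a check, assuming the sort order is recorded in the `SO` tag of the `@HD` header line. For a SAM file the header is plain text; for a BAM file it is read with `samtools view -H`. The function name `sort_order` and the toy file are hypothetical. A missing `SO` tag does not prove that a file is unsorted, only that the header does not say.

```shell
# Print the value of the SO (sort order) tag from the @HD header line of a
# SAM or BAM file; print "unknown" when the header carries no SO tag.
sort_order() {
    local file="$1" header so
    case "$file" in
        *.bam) header=$(samtools view -H "$file") || true ;;  # BAM: header decoded by samtools
        *)     header=$(grep '^@' "$file") || true ;;         # SAM: header lines are plain text
    esac
    # Take the first @HD line, split its tab-separated fields, keep the SO value.
    so=$(printf '%s\n' "$header" | grep -m1 '^@HD' | tr '\t' '\n' | sed -n 's/^SO://p') || true
    printf '%s\n' "${so:-unknown}"
}

# Toy SAM header (hypothetical data) declaring a coordinate-sorted file.
tmpdir=$(mktemp -d)
printf '@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' > "$tmpdir/toy.sam"
sort_order "$tmpdir/toy.sam"   # prints: coordinate
```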
How does samtools work?
the command I used was:
but it is giving me separate files in output:
Hi!
You just need to wait. samtools sort first creates several sorted temporary files, which are merged into one final file at the end of the process. It can take some time.
Yes it did! Thanks :)
Hi Carlo Yague, would you know how long samtools sort usually takes? I feel like it would depend on how big my bam file is (437 GB), but is >3 days normal? Should I not be worried as long as the number of temp output files keeps increasing? Thank you!
Hi, to be honest, I never tried to sort a file that large, but yeah, the larger the file, the longer it takes... In such a case, it becomes especially important to optimize the -@ and -m parameters of samtools (number of threads and available memory per thread). Hope this helps. Carlo.
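Because `-m` is a per-thread limit, the total memory used is roughly the `-m` value times the number of threads. A minimal sketch, assuming a memory budget you choose yourself: the hypothetical helper `build_sort_cmd` only prints a `samtools sort` command line with the budget split evenly across threads, it does not run anything. File names and numbers are placeholders.

```shell
# Build (but do not run) a samtools sort command that splits a total memory
# budget, given in MB, evenly across the requested number of threads.
build_sort_cmd() {
    local total_mb="$1" threads="$2" input="$3" output="$4"
    local per_thread_mb=$(( total_mb / threads ))   # integer division, rounds down
    printf 'samtools sort -@ %d -m %dM -o %s %s\n' \
        "$threads" "$per_thread_mb" "$output" "$input"
}

# Example: 16 GB budget over 8 threads gives 2048M per thread.
build_sort_cmd 16384 8 in.bam in.sorted.bam
# prints: samtools sort -@ 8 -m 2048M -o in.sorted.bam in.bam
```

A larger `-m` means fewer, larger temporary files to merge at the end, which is usually what helps most on very large inputs.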