Hey, I need to download BAM files of breast cancer cell lines from GEO/SRA. For example I will use SRR925780.
I tried to do it in 2 ways:
SRA Run Browser: http://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR925780. Here I have to download a separate file for each chromosome, but the download is very fast (4 GB in about 10 minutes) and the output is already a BAM file, so no other tool is needed.
SRA Toolkit: following the manual, I ran this command:
sam-dump SRR925780 | samtools view -bS - > SRR925780.bam
It takes about 3 hours to download and convert 100 MB! The time difference is too big, so I am wondering what I am doing wrong with the SRA Toolkit and samtools.
BTW, I work with the latest SRA Toolkit, but my samtools version is old; it's the only one I found that works on Windows: https://bow.codeplex.com/releases
So my questions are:
- Could it be that the fastest way to download BAM files is manually via the SRA Run Browser?
- Is there a way to run a newer version of samtools on Windows?
Thanks!
You may be better off downloading the fastq files and doing the alignments yourself. EBI-ENA has the fastq files available directly without having to use SRA toolkit (e.g. http://www.ebi.ac.uk/ena/data/view/SRR925780 ).
That said, if you are restricted to using Windows, then all bets are off.
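If you go the ENA route, the FASTQ location can usually be worked out from the run accession. Here is a minimal Python sketch (Python because it also runs on Windows). The function name `ena_fastq_dir` is made up for illustration, and the directory layout it encodes is an assumption about how ENA organizes its FTP site, so open the resulting URL in a browser before scripting against it. It returns only the directory, since the file names inside depend on whether the run is single-end or paired-end.

```python
def ena_fastq_dir(run_accession: str) -> str:
    """Build the assumed ENA FTP directory for a run's FASTQ files.

    Assumed layout (not taken from the original post):
      9-character accessions:  /vol1/fastq/<first 6 chars>/<accession>/
      longer accessions:       /vol1/fastq/<first 6 chars>/<extra digits,
                               zero-padded to 3>/<accession>/
    """
    acc = run_accession.strip().upper()
    # Run accessions look like SRR/ERR/DRR followed by at least six digits.
    if acc[:3] not in ("SRR", "ERR", "DRR") or len(acc) < 9 or not acc[3:].isdigit():
        raise ValueError(f"not a run accession: {run_accession!r}")

    base = "ftp://ftp.sra.ebi.ac.uk/vol1/fastq"
    prefix = acc[:6]
    extra = acc[9:]  # digits beyond the ninth character, if any
    if extra:
        return f"{base}/{prefix}/{extra.zfill(3)}/{acc}/"
    return f"{base}/{prefix}/{acc}/"


if __name__ == "__main__":
    print(ena_fastq_dir("SRR925780"))
```

The printed directory can then be fetched with any FTP-capable download tool, and the reads aligned locally.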
Some SRA runs are based on custom reference sequences. Is it possible to retrieve the reference FASTAs from SRA and align the reads to them to create BAMs? Otherwise you would need to retrieve the BAMs directly, right?