Downloading BAM files from GEO/SRA
8.4 years ago
ilobelo ▴ 10

Hey, I need to download BAM files of breast cancer cell lines from GEO/SRA. For example I will use SRR925780.

I tried to do it in 2 ways:

  1. SRA run browser: http://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR925780. Here I need to download a separate file for each chromosome, but the download is very fast (4 Gb in about 10 minutes) and the output is already a BAM file, so no other tool is needed.

  2. SRA toolkit, following their manual, I run this command:

    sam-dump SRR925780 | samtools view -bS - > SRR925780.bam

It takes about 3 hours to download and convert 100 Mb! The time difference is too big, so I am wondering what I am doing wrong with the SRA toolkit and samtools.

BTW, I use the latest SRA toolkit, but my samtools version is old; it's the only one I found that works on Windows: https://bow.codeplex.com/releases

So my questions are:

  1. Is the fastest way to download BAM files really to fetch them manually via the SRA run browser?
  2. Is there a way to run a newer version of samtools on Windows?

Thanks!

bam sra geo samtools sratoolkit • 12k views

You may be better off downloading the fastq files and doing the alignments yourself. EBI-ENA has the fastq files available directly without having to use SRA toolkit (e.g. http://www.ebi.ac.uk/ena/data/view/SRR925780 ).

That said, if you are restricted to using Windows then all bets are off.
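For the ENA route, here is a rough shell sketch of how the FASTQ links for a run could be looked up without the SRA toolkit. It assumes the ENA Portal API `filereport` endpoint and its `fastq_ftp` field (neither is mentioned above), and it assumes the FASTQ paths arrive in the last tab-separated column as a semicolon-separated list without a scheme. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: the endpoint, field name and column layout are assumptions,
# so check them against the current ENA documentation before relying on this.

# Build the (assumed) ENA Portal API filereport URL for a run accession.
ena_filereport_url() {
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=fastq_ftp&format=tsv\n' "$1"
}

# Read a TSV report on stdin, skip the header line, and print one
# ftp:// URL per FASTQ file listed in the last column.
fastq_urls_from_report() {
    awk -F'\t' 'NR > 1 && $NF != "" {
        n = split($NF, parts, ";")
        for (i = 1; i <= n; i++) print "ftp://" parts[i]
    }'
}

# Fetch the report for one run and list its FASTQ URLs (needs network access).
ena_fastq_urls() {
    curl -fsS "$(ena_filereport_url "$1")" | fastq_urls_from_report
}

# Example use (commented out so that sourcing this file downloads nothing):
# ena_fastq_urls SRR925780 | while read -r url; do curl -fsSO "$url"; done
```

The parsing step is kept separate from the download so you can inspect the report first and confirm which column actually holds the paths.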


Some SRA runs are based on custom reference sequences. Is it possible to retrieve the reference FASTAs from SRA and align the reads to them to create BAMs? Otherwise you would need to retrieve the BAMs directly, right?
