Using sam-dump in SRA toolkit to download chr20.bam from NA12878
7.6 years ago
Covux

Hey,

I am trying to get aligned data from a NA12878 sample.

https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR1976036

I am only interested in chromosome 20.

I tried downloading a file directly, but after a while the download just stops and I get a message that the download failed. No reason is given.

After many attempts, I tried the SRA Toolkit, but I can't get it to work.

I use the following command:

sam-dump --aligned-region 20:1-16444167 --output-file SRR1976040_chr20.sam SRR1976040

When I run the command, it does something for a couple of seconds.

When I look at the output file, it does contain data, but it is only 948 bytes and I can't work with it.
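A SAM file that small probably holds only a few header lines and little or no alignment data. One quick check is to count the records that are not header lines (SAM header lines start with "@"). A minimal sketch, assuming the output is plain-text SAM; count_alignments is a hypothetical helper name, not part of the SRA Toolkit:

```shell
#!/usr/bin/env bash
# count_alignments (hypothetical helper): print the number of alignment records
# in a SAM file, i.e. non-empty lines that do not start with '@'
# (header lines such as @HD, @SQ, @RG and @PG all start with '@').
count_alignments() {
  awk 'NF && !/^@/ { n++ } END { print n + 0 }' "$1"
}
```

For example, if count_alignments SRR1976040_chr20.sam prints 0, the requested region returned no reads, and head SRR1976040_chr20.sam shows what those 948 bytes actually contain.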

I am new to this kind of software. I tried googling but could not find an answer that I understood and could use.

Maybe someone here could help me out.

Regards, Covux

Edit 1

I am now downloading the whole BAM file from the download section (NA12878_WGS_possorted_bam.bam). This file is 120 GB.

So far it is downloading at around 2.5 Mbit/s.

I also started a separate download of just chr20. That one is downloading at around 50 kb/s, with an unknown time remaining.

Edit 2

Someone showed me the correct command to use:

sam-dump --aligned-region chr20 --output-file SRR1976036_chr20.sam SRR1976036

I had used "20" instead of "chr20".

However, while it runs I get the following message:
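The name given to --aligned-region has to match one of the run's reference names (the SN field of the @SQ header lines) exactly, so "20" and "chr20" are not interchangeable. A minimal sketch for listing those names and their lengths from any SAM output of the run, for example a small test region; list_refs is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# list_refs (hypothetical helper): print "name<TAB>length" for every @SQ
# header line of a SAM file, taken from its SN: and LN: fields.
list_refs() {
  awk -F'\t' '
    $1 == "@SQ" {
      sn = ""; len = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) sn = substr($i, 4)
        else if ($i ~ /^LN:/) len = substr($i, 4)
      }
      print sn "\t" len
    }' "$1"
}
```

For example, list_refs SRR1976036_chr20.sam | grep -w chr20 prints the exact spelling of the chromosome name together with its length.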

" sam-dump.2.8.2 sys: timeout exhausted while reading file within network system module - mbedtls_ssl_read returned -76 ( NET - Reading information from the socket failed "

but the command is still running.

Edit 3

Now, using the correct command, I run into the next problem:

2017-04-13T07:52:59 sam-dump.2.8.2 sys: error unknown while reading file within network system module - mbedtls_ssl_read returned -76 ( NET - Reading information from the socket failed )

The funny part is that I get the same error message when I use the fastq-dump command. The IT guy is now looking into it as well.

Edit 4

I have now requested just a fragment of chr20,

using the following command:

sam-dump --aligned-region chr20:2500000-2600000 --output-file SRR1976036_chr20.sam SRR1976036

and I don't encounter any problems. So it could very well be that the size of the file is the problem after all, as people mentioned below.
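Since a 100 kb fragment went through while the whole chromosome kept failing, one workaround is to request chr20 in smaller pieces and stitch the results together. A minimal sketch in bash, assuming sam-dump is on the PATH, writes SAM to standard output when no --output-file is given, and returns every alignment that overlaps the requested region; make_regions, body_from and download_in_chunks are hypothetical helper names, and the chunk size is an arbitrary choice. The chromosome length can be taken from the LN field of the chr20 @SQ header line.

```shell
#!/usr/bin/env bash
# make_regions (hypothetical helper): print NAME:START-END regions that tile
# positions 1..LENGTH in pieces of CHUNK bases.
make_regions() {
  local name=$1 length=$2 chunk=$3 start end
  for ((start = 1; start <= length; start += chunk)); do
    end=$((start + chunk - 1))
    if ((end > length)); then end=$length; fi
    echo "${name}:${start}-${end}"
  done
}

# body_from (hypothetical helper): read SAM on stdin and keep only alignment
# lines whose POS (column 4) is >= START. A read overlapping a chunk boundary
# starts in the earlier chunk, so it is kept there and dropped here.
body_from() {
  awk -F'\t' -v start="$1" '!/^@/ && $4 >= start'
}

# download_in_chunks (hypothetical helper): fetch ACCESSION's alignments on
# NAME (LENGTH bp) chunk by chunk and merge them into the SAM file OUT.
download_in_chunks() {
  local acc=$1 name=$2 length=$3 chunk=$4 out=$5 region start tmp
  tmp=$(mktemp) || return 1
  : > "$out"
  for region in $(make_regions "$name" "$length" "$chunk"); do
    start=${region#*:}   # "START-END"
    start=${start%-*}    # "START"
    echo "fetching $region" >&2
    if ! sam-dump --aligned-region "$region" "$acc" > "$tmp"; then
      rm -f "$tmp"
      return 1
    fi
    # take the header lines from the first chunk only
    if [ "$start" -eq 1 ]; then grep '^@' "$tmp" >> "$out"; fi
    body_from "$start" < "$tmp" >> "$out"
  done
  rm -f "$tmp"
}
```

For example: download_in_chunks SRR1976036 chr20 LENGTH 1000000 SRR1976036_chr20.sam, with LENGTH replaced by the LN value. Filtering on POS is what keeps reads that straddle a chunk boundary from appearing twice in the merged file.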

Edit 5

The IT guy looked into it.

He said there is a bandwidth issue because of heavy traffic on the NCBI network, which is why I get so many timeout errors.
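If the timeouts come from congestion on the NCBI side, re-running a failed request after a pause is often enough. A minimal generic retry sketch in bash; retry is a hypothetical helper, and it only helps if sam-dump actually exits with a non-zero status on these network errors, which is an assumption worth checking (in Edit 2 the error was printed while the command kept running).

```shell
#!/usr/bin/env bash
# retry (hypothetical helper): run a command up to MAX_TRIES times,
# sleeping DELAY seconds between failed attempts.
# Usage: retry MAX_TRIES DELAY command [args...]
retry() {
  local max=$1 delay=$2 attempt
  shift 2
  for ((attempt = 1; attempt <= max; attempt++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max failed: $*" >&2
    if ((attempt < max)); then sleep "$delay"; fi
  done
  return 1
}
```

For example, retry 5 60 sam-dump --aligned-region chr20:2500000-2600000 --output-file SRR1976036_chr20.sam SRR1976036 tries that region up to five times with a one-minute pause between attempts; the same wrapper can be put around each request when fetching chr20 in smaller pieces.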

sratoolkit sam-dump
sam-dump --aligned-region chr20 --output-file SRR1976036_chr20.sam SRR1976036

That command works for me (sratoolkit v2.8.0). Here is part of the output:

@SQ     SN:chrUn_gl000249       LN:38502
@RG     ID:None SM:4754
@PG     PN:bwa  ID:bwa  VN:0.7.10-r789  CL:bwa mem -p -t 4 -M -R @RG\tID:None\tSM:4754 /mnt/opt/refdata/fasta/hg19/hg19.fa /mnt/analysis/marsoc/pipestances/4754/PHASER_SVCALLER_PD/4754/1004.0.1-0/PHASER_SVCALLER_PD/PHASER_SVCALLER/_ALIGNER/TRIM_READS/fork0/chnk0/files/default.fastq
@PG     PN:10X longranger/attach_bcs    ID:attach_bcs   VN:1004.0.1
@PG     PN:10X longranger/mark_duplicates       ID:mark_duplicates      VN:1004.0.1
@PG     PN:10X longranger/attach_phasing        ID:attach_phasing       VN:1004.0.1
630013850       163     chr20   59993   60      88M     =       60001   96      GTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAACAG      B<B7BBFFFIFBFBF<B<FBB7BFFFFFBBF'B<BFFFBF<<BB<F<BFF7<BFBBFFB<<BBBB<7<B<<<<BBB<<<'77BB'<B<      RG:Z:None       BX:Z:TATGCGAGGCTGTG-1   NH:i:1   NM:i:8

genomax, thanks for your answer.

I think I have to look at the connection from my server. At least I know the command is OK now.

could the fact that I am from the EU be a problem?

I thought I read something like that once on the NCBI website.


The first problem (the download stopping) may be caused by your settings. Sometimes there is a download size limit; ask your IT people to remove it. I had that problem when I started my position here.


Which settings do I have to look at? In my browser or somewhere else?


Probably the admin settings of your server (I am not an IT guy). Like I said, I had to ask my IT department to remove the download size limit.


Are you sure that the data you want is aligned to these positions on chr20? If the reads come from another region, it's normal that you got such a small file: there is essentially no data. Slow downloads can be caused by problems on the NCBI servers, such as a large number of users. I'm not an IT guy, but I had the same problem (and so did my IT people). Also, the server can be located anywhere, so your download speed may be poor because of your location.


I work in the Netherlands; downloading is usually very fast here.

I am also downloading from the NCBI FTP server and I have no problems there.

I am not 100% sure that the data is aligned to chr20, but I assumed it is aligned and available because there is an option to download a SAM/BAM file for chr20.

Currently I am also downloading the whole BAM file (120 GB). When that one is complete, I can look at how the data is aligned to chromosome 20, right?


Quote: "Currently I am also downloading the whole BAM file (120 GB). When that one is complete, I can look at how the data is aligned to chromosome 20, right?"

Yes, you can export the chr20 data using samtools view.
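Once the whole-genome BAM is on disk, the chr20 slice can be cut out locally with samtools view, which needs a .bai index for region queries. A minimal sketch, assuming samtools is installed and the BAM is coordinate-sorted (the 10X "possorted" file name suggests it is); extract_chrom is a hypothetical helper, and "chr20" matches the reference name in the sam-dump output above. samtools quickcheck is run first because a download that stopped part-way leaves a truncated BAM.

```shell
#!/usr/bin/env bash
# extract_chrom (hypothetical helper): write the alignments on one reference
# sequence (e.g. chr20) from a coordinate-sorted BAM into a new, indexed BAM.
extract_chrom() {
  local in_bam=$1 chrom=$2 out_bam=$3
  # quickcheck checks the header and, for BAM, the end-of-file marker,
  # so an incomplete download fails here instead of halfway through
  if ! samtools quickcheck "$in_bam"; then
    echo "$in_bam looks incomplete or corrupt" >&2
    return 1
  fi
  # region queries need an index; build one only if in.bam.bai is missing
  if [ ! -e "${in_bam}.bai" ]; then
    samtools index "$in_bam" || return 1
  fi
  # -b: write BAM; the trailing region restricts output to reads on $chrom
  samtools view -b -o "$out_bam" "$in_bam" "$chrom" || return 1
  samtools index "$out_bam"
}
```

For example: extract_chrom NA12878_WGS_possorted_bam.bam chr20 NA12878_chr20.bam.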


Ah, thanks for the tip.

However, what if I want to download chr20 from NCBI directly?

It is a bit of a waste of space and time if I first have to download 120 GB when I only want a section of it.


Why not download just the BAM for chr20 from the NCBI record page you have linked above?

Click the Alignments tab --> select chr20 --> change the file format to BAM --> Save to File.


That doesn't work.

When I start downloading the file, the download stops.

Quote:

" I tried downloading a file directly but after a while the downloading just stops and I get an message that the downloading failed. I do not get a reason why."


It seems to be working for me (did not download the entire file). Perhaps there is some local firewall (or other) restriction on your end.


Yes, that's what I am trying to tell him. When I started downloading big files, I found out that there was a 20 GB restriction. I had to complain to the IT department to get the restriction removed, which they then did.


The chr20 file is not 20 GB but much smaller (the whole genome is 120 GB); the BAM file for chr20 is around 1 GB compressed. I tried downloading the same file at home last night and everything went fine.

I had also used an incorrect command, but solved that later with help from a colleague.

With the correct command I do get error messages that probably have something to do with the connection. The IT guy is now looking into it. :)

Edit:

I am now struggling with:

2017-04-13T07:52:59 sam-dump.2.8.2 sys: error unknown while reading file within network system module - mbedtls_ssl_read returned -76 ( NET - Reading information from the socket failed )
