Hey,
I am trying to get aligned data from an NA12878 sample.
https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR1976036
I am just interested in Chromosome 20.
I tried downloading a file directly, but after a while the download just stops and I get a message that the download failed. I do not get a reason why.
After many attempts I tried the SRA Toolkit,
but I cannot get it working.
I use the following command line:
sam-dump --aligned-region 20:1-16444167 --output-file SRR1976040_chr20.sam SRR1976040
When I run the command, it does something for a couple of seconds.
When I look at the output file, it does contain data, but it is only 948 bytes and I can't work with it.
I am new to this kind of software and tried googling, but could not find an answer that I understand and could use.
Maybe someone here could help me out.
Regards, Covux
Edit 1
I am now downloading the whole BAM file from the download section (NA12878_WGS_possorted_bam.bam). This file is 120 GB.
So far that file is downloading at around 2.5 Mbit/s.
I also tried another download for just chr20. That file is downloading at around 50 kb/s and has an unknown amount of time left.
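As a rough sanity check on the 120 GB download mentioned above, the expected transfer time can be estimated from file size and rate. This is only a sketch: `estimate_hours` is a made-up helper name, and it assumes decimal units (1 GB = 8000 Mbit), that the quoted rate really is in megabits per second, and that the rate stays constant.

```shell
#!/bin/sh
# Hypothetical helper: estimate download time in hours.
# $1 = file size in GB (decimal units assumed), $2 = rate in Mbit/s.
estimate_hours() {
    awk -v gb="$1" -v mbit="$2" 'BEGIN { printf "%.1f\n", gb * 8000 / mbit / 3600 }'
}

# The 120 GB BAM at 2.5 Mbit/s
estimate_hours 120 2.5   # prints 106.7
```

That is roughly four and a half days of uninterrupted transfer, which is why fetching only the region of interest is attractive.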
Edit 2
Someone showed me the correct command line to use:
sam-dump --aligned-region chr20 --output-file SRR1976036_chr20.sam SRR1976036
I used "20" instead of "chr20".
However, while running the task I get the following message:
"sam-dump.2.8.2 sys: timeout exhausted while reading file within network system module - mbedtls_ssl_read returned -76 ( NET - Reading information from the socket failed )"
but the command is still running.
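Whether a run expects "20" or "chr20" depends on the reference it was aligned to, and those names are listed in the @SQ lines of the SAM header. Here is a minimal sketch for listing them, assuming the header is piped in on stdin; `list_reference_names` is a hypothetical helper name, not part of the SRA Toolkit.

```shell
#!/bin/sh
# Hypothetical helper: print the reference sequence names (SN: fields)
# found in the @SQ lines of a SAM header read from stdin.
list_reference_names() {
    awk -F '\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Example with a local BAM (commented out; assumes samtools is installed):
# samtools view -H NA12878_WGS_possorted_bam.bam | list_reference_names
```

If the names come back as chr1, chr2, ..., then the region has to be written as chr20 rather than 20.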
Edit 3
Now, using the correct command line, I encounter the next problem:
2017-04-13T07:52:59 sam-dump.2.8.2 sys: error unknown while reading file within network system module - mbedtls_ssl_read returned -76 ( NET - Reading information from the socket failed )
The fun part is that I get the same error message when I use the fastq-dump command. The IT guy is now also looking into it.
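Transient network errors like the one above are sometimes worked around by simply rerunning the command. A minimal sketch of a retry wrapper follows. `retry`, `RETRY_DELAY`, and the attempt count are illustrative choices, not SRA Toolkit features, and the wrapper only helps if the failing command actually exits with a non-zero status.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, pausing between attempts.
# RETRY_DELAY (seconds, default 30) is an assumed knob, not a toolkit setting.
retry() {
    max_attempts=$1
    shift
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${RETRY_DELAY:-30}s" >&2
        sleep "${RETRY_DELAY:-30}"
        attempt=$((attempt + 1))
    done
}

# Example call (commented out so nothing is downloaded by accident):
# retry 5 sam-dump --aligned-region chr20 --output-file SRR1976036_chr20.sam SRR1976036
```

Each attempt starts the dump over, so this is most useful in combination with small regions.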
Edit 4
I now asked for just a fragment of chr20,
using the following command:
sam-dump --aligned-region chr20:2500000-2600000 --output-file SRR1976036_chr20.sam SRR1976036
I don't encounter any problems. So it could very well be that the size of the file is a problem after all, as people mentioned below.
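Since a 100 kb fragment downloads cleanly while the whole chromosome times out, one option is to request chr20 in consecutive windows. The sketch below only generates the region strings. `make_regions` is a hypothetical helper, and the chromosome length and window size are values you have to supply yourself (the length depends on the reference build the run was aligned to).

```shell
#!/bin/sh
# Hypothetical helper: print consecutive 1-based regions "name:start-end"
# covering positions 1..length in windows of at most `window` bases.
# $1 = reference name, $2 = reference length, $3 = window size.
make_regions() {
    name=$1
    length=$2
    window=$3
    start=1
    while [ "$start" -le "$length" ]; do
        end=$((start + window - 1))
        if [ "$end" -gt "$length" ]; then
            end=$length
        fi
        echo "${name}:${start}-${end}"
        start=$((end + 1))
    done
}

# Example loop (commented out; CHR20_LENGTH is a placeholder you must set):
# make_regions chr20 "$CHR20_LENGTH" 1000000 | while read -r region; do
#     out="SRR1976036_$(echo "$region" | tr ':' '_').sam"
#     sam-dump --aligned-region "$region" --output-file "$out" SRR1976036
# done
```

Reads that span a window boundary may show up in two neighbouring files, so the pieces would need de-duplicating when they are merged.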
Edit 5
The IT guy looked into it.
He said there is a bandwidth issue because there is a lot of traffic on the NCBI network. For that reason I get a lot of timeout errors.
That command line works for me (sratoolkit v.2.8.0).
genomax, thanks for your answer.
I think I have to look at my connection from the server. At least I know the command line is OK now.
Could the fact that I am in the EU be a problem?
I thought I read something like that once on the NCBI website.
For the first problem (downloading stops), this may be caused by your settings. Sometimes there is a download size limit (ask your IT guys to fix that, I had that problem when I started my position here).
Which settings do I have to look at? In my browser or somewhere else?
Probably the admin settings of your server (I am not an IT guy). Like I said, I had to ask my IT department to remove the download size limit.
Are you sure that the data you want is aligned to these positions on chr20? If the reads come from another region, it is normal to get such a small file, because there is almost no data. The slow download can be caused by problems on the NCBI servers, such as a lot of users at once. I'm not an IT guy, but I had the same problem (and so did my IT guys). Also, the server can be located anywhere, so your download speed may be this bad because of your location.
I work in the Netherlands. Downloading is usually very good.
I am also downloading from the NCBI FTP server and I encounter no problems there.
I am not 100% sure that the data is aligned to chr20, but I assumed it is aligned and available because there is an option for downloading a SAM/BAM file for chr20.
Currently I am also downloading the whole BAM file (120 GB). Maybe when that one is complete I can look at how the data is aligned to chromosome 20, right?
Yes, you can export the data for chr20 using samtools view.
Ahh, thanks for the tip.
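The samtools view tip could look roughly like this once the full BAM is on disk. The sketch assumes samtools is installed, the BAM is coordinate-sorted (the "possorted" in the file name suggests so), and the chromosome is named chr20 in its header. `extract_region` and the output file name are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: write the alignments overlapping one region of a
# coordinate-sorted BAM to a new BAM file.
# $1 = input BAM, $2 = region (e.g. chr20), $3 = output BAM.
extract_region() {
    in_bam=$1
    region=$2
    out_bam=$3
    # Region queries need an index; build it if it is not there yet.
    if [ ! -e "${in_bam}.bai" ]; then
        samtools index "$in_bam" || return 1
    fi
    # -b writes BAM output, -o names the output file.
    samtools view -b -o "$out_bam" "$in_bam" "$region"
}

# Example call (commented out so nothing runs by accident):
# extract_region NA12878_WGS_possorted_bam.bam chr20 NA12878_chr20.bam
```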
However, what if I want to download chr20 from NCBI directly?
It is a bit of a waste of space and time if I first have to download 120 GB when I only want a section of it.
Why not download just the BAM for chr20 from the NCBI record page you have linked above?
Click on the Alignments tab --> Select chr20 --> Change file format to BAM --> Save to File.
That doesn't work.
When I start downloading the file, the download stops.
quote:
" I tried downloading a file directly but after a while the downloading just stops and I get an message that the downloading failed. I do not get a reason why."
It seems to be working for me (did not download the entire file). Perhaps there is some local firewall (or other) restriction on your end.
Yes, that's what I am trying to tell him. When I started downloading big files, I found out that there was a restriction of 20 GB. I had to complain to the IT department to get the restriction removed, which they then did.
The file for chr20 is not 20 GB but smaller (the whole genome is 120 GB); the BAM file for chr20 is around 1 GB compressed. I tried downloading the same file at my place last night and everything went fine.
I also used an incorrect command line, but solved that later on with help from a colleague.
With the correct command line I do get error messages that probably have something to do with the connection. The IT guy is now looking into it. :)
Edit:
I am now struggling with: