For a study of VNTR copies I'm looking for high-coverage, publicly accessible WGS data such as 1KG, HGDP and SGDP. Since downloading a whole CRAM file takes a long time when only one chromosome is needed, I wonder whether it is possible to subset the remote CRAM file to a single-chromosome BAM file and download only that part of the data. If that is not possible, what is the fastest way to achieve this? The worst case would be to download the whole file and subset it locally, but I would like to avoid that.
Best, Bernd
Great. Fantastic that samtools supports streaming remote files!
When I try to run the example from their webpage, it fails with "Protocol not supported".
Make sure you are using a newer version of samtools. It works with v1.21, which is the latest. Your samtools is from 2021.
So it looks like it was just the samtools version. Thanks for the hint.
Subsetting a streamed CRAM file still seems to take a very long time on my end.
I do this:
samtools view ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR323/ERR3239482/NA12775.final.cram -r chrX --reference GRCh38_full_analysis_set_plus_decoy_hla.fa
This has already been running for roughly 20 minutes. I would assume it should be done in less than 10?
Wrong parameter: -r filters by read group, not by region.
-r, --read-group STR ...are in read group STR
you want
use 'https://' instead of 'ftp://'
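Putting the two hints from this thread together, here is a rough sketch of what the corrected call might look like. It assumes the same path is served over https://, that a .crai index sits next to the CRAM on the server, and that the reference FASTA is available locally; the output file name is made up for illustration. In samtools view the region is a positional argument after the input file, -T (--reference) names the FASTA, -b writes BAM and -o sets the output file.

```shell
# Same path as in the question, with https:// instead of ftp:// (assumed to be served there).
url="https://ftp.sra.ebi.ac.uk/vol1/run/ERR323/ERR3239482/NA12775.final.cram"
# Local reference FASTA, as in the original command.
ref="GRCh38_full_analysis_set_plus_decoy_hla.fa"
# Region goes last as a positional argument, not after -r (which means read group).
region="chrX"
# Hypothetical output name, chosen only for this example.
out="NA12775.chrX.bam"

# Preview the command; remove the leading "echo" to actually run it.
echo samtools view -b -T "$ref" -o "$out" "$url" "$region"
```

With an index available, samtools should only fetch the blocks overlapping chrX rather than streaming the whole file, which is where the speed-up would come from.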