Hey guys, I'm quite new to bioinformatics and I've been struggling to work with large BAM files (80 GB) for the last few weeks. Can anyone tell me if it is possible to split these BAM files into smaller ones by chromosomal location in Galaxy?
It might be possible in principle, but you would have to install Galaxy locally, because the public instance will never allow you to upload 80 GB. Given that this would be more complex than installing Samtools, and you might have to install Samtools in addition anyway, I'd go with Ketil's advice.
Edit: please see the comments below; it is possible to use 250 GB. Unfortunately, I didn't find a tool that would run samtools view with the parameters needed to filter by chromosome. It might be possible somehow (e.g. by installing a new tool, or by building a workflow bam2sam -> filter data -> sam2bam, which would be very inefficient and use up the 250 GB), but I didn't find out how that would work.
Thanks for your answer.
I'm getting my genome alignment via FTP, and when I use the command
samtools view -bh ftp://ftp-mouse.sanger.ac.uk/current_bams/PWK_PhJ.bam chr11
it gives the error: segmentation fault. But if I just try to print the same sequence, it works!
Do you know why I'm having this problem?
80GB files can be loaded into the public Galaxy. You will have to use FTP, but it can be done. The quota for registered users on usegalaxy.org is 250GB (see http://wiki.g2.bx.psu.edu/Main#User_data_and_job_quotas). Of course, you may run out of space pretty quickly, once you are into your analysis.
Thanks Dave, but even after loading the file into Galaxy, like Michael said, I don't know how to do the split there either. And it is true that I would run out of space. Michael, I'm using the latest one, 0.1.18, right? I'm sorry for all these questions, but no one in my lab has ever analyzed NGS data and I'm starting out with no specific bioinformatics/programming background.
Leandro, please open a new question where you describe your samtools problem with an 80GB BAM file, and/or send an email to samtools-help@lists.sourceforge.net including exact command, command version, link to the data, output, the output of 'uname -a', and your machine memory specs. Maybe the authors of samtools have more insight.
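For what it's worth, most of the details asked for here can be gathered in one go. This is only a rough sketch: the function name and the report file name are made up for illustration, it assumes samtools is on your PATH, and the memory line relies on the Linux-only free command. The exact command, its output, and the link to the data still have to be pasted in by hand.

```shell
# Collect system and samtools details into one text file for a bug report.
# collect_samtools_report and the default file name are hypothetical.
collect_samtools_report() {
    local report_file="${1:-samtools_report.txt}"
    {
        echo "== System (uname -a) =="
        uname -a
        echo "== Memory =="
        # free is Linux-specific, so only call it when it exists.
        if command -v free >/dev/null 2>&1; then
            free -m
        else
            echo "free not available on this system"
        fi
        echo "== samtools version =="
        # samtools without arguments prints its usage text, including a
        # "Version:" line, to stderr.
        samtools 2>&1 | grep -i 'version' || echo "version line not found"
    } > "$report_file"
}
```

Calling collect_samtools_report my_report.txt writes everything to my_report.txt, which can then be attached to the email.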
"samtools view" can easily extract the alignments for specific regions and convert them back to BAM, but I don't know how to do it in Galaxy.
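On the command line, the split could look roughly like the sketch below. It assumes samtools is on your PATH, that the BAM is coordinate-sorted and indexed (samtools index in.bam), and that the file is local rather than read over FTP. The function name and the output directory are made up for illustration, and reference names are assumed to be usable as file names.

```shell
# Split an indexed BAM into one BAM file per reference sequence.
# split_bam_by_chrom is a hypothetical helper name.
split_bam_by_chrom() {
    local bam="$1" outdir="$2" chrom
    mkdir -p "$outdir" || return 1
    # idxstats lists one reference per line (name, length, mapped, unmapped);
    # the final "*" line counts unplaced reads and is skipped here.
    samtools idxstats "$bam" | cut -f1 | grep -v '^\*$' |
    while read -r chrom; do
        # -b writes BAM output; the region argument needs the BAM index.
        samtools view -b "$bam" "$chrom" > "$outdir/$chrom.bam"
    done
}
```

For example, split_bam_by_chrom PWK_PhJ.bam by_chrom would write by_chrom/chr11.bam and so on, one file per reference listed in the index. For a single chromosome, the samtools view line on its own is enough.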