I mapped reads for SNP calling to a full 10 Gb reference. I've reduced this reference to only those contigs that contain SNPs of interest. I would now like to reduce the BAM file in the same way so I can easily visualise it in IGV.
From the samtools documentation, the way to do this seems to be to run samtools view on the BAM file with a list of locations. My question is how to generate this file. Could I adapt the samtools faidx index of my reduced contig reference file, which has output like that below, to BED format?
scaff1 3970 29 79 80
scaff2 8501 4079 79 80
The first column is the name of the scaffold, then the length of the contig(?), then the start position(?). If this is the case, could I not just move the third column to column position 2 and add columns 2 and 3 together to get the end-position column for a BED file?
About the .fai file: can you please tell me where I can find information about the .fai file format?
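On the .fai format: it is described in the samtools faidx manual page. For a FASTA file the five tab-separated columns are NAME, LENGTH, OFFSET (the byte offset of the contig's first base within the FASTA file), LINEBASES and LINEWIDTH, so the third column is a file offset rather than a start coordinate. A whole-contig BED line therefore only needs the name, 0 and the length. Here is a minimal sketch of that conversion, assuming a standard .fai; the function name fai_to_bed and the file names in the comment are hypothetical:

```shell
# Convert a FASTA index (.fai) into a BED file with one interval per contig.
# .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
# BED columns written here: contig name, start (0), end (contig length)
fai_to_bed() {
    # $1 is the path to the .fai file; output is tab-separated BED on stdout.
    awk -v OFS='\t' 'NF >= 2 { print $1, 0, $2 }' "$1"
}

# Example call (placeholder file names):
#   fai_to_bed reduced_contigs.fa.fai > reduced_contigs.bed
```

For the two lines shown in the question this should give scaff1 0 3970 and scaff2 0 8501, which can then be passed to samtools view -L.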
It's not clear what you want to do with this file. You can restrict mpileup calling to the regions in a BED file with the option
-l file.bed
You can also reduce the size of the BAM with
samtools view -L region.bed your.bam
Yes, sorry, it was samtools view; I realised this in a lecture. How do you create the BED file?
Looks like the -t setting can use the index.fai file, so I'll give that a try.
The -t option will replace the header in the output when you view the file, which is probably not what you want.
I tried this:
but the file is the same size as it was to start with, so I don't think it is working.
The only thing I can think of is that my BED file is wrong. Each line has this layout:
(name_of_scaffold) (0) (length)
How many scaffolds are in the bed file you're passing to samtools and how many are in the assembly? The bed file should have a much smaller number of them.
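One rough way to get those two numbers is to count the distinct contig names in the BED file and the lines in the reference's .fai index (one line per contig). The function names below are made up for illustration and the file names in the comments are placeholders:

```shell
# Count the distinct contig names (column 1) in a BED file.
count_bed_contigs() {
    awk 'NF && !seen[$1]++ { n++ } END { print n + 0 }' "$1"
}

# Count the contigs listed in a FASTA index (.fai): one non-empty line per contig.
count_fai_contigs() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Example calls (placeholder file names):
#   count_bed_contigs rk1.bed
#   count_fai_contigs full_reference.fa.fai
```

The number of contigs the BAM itself knows about can be read from its header with samtools view -H your.bam | grep -c '^@SQ'.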
Approx. 600 scaffolds in the BED file and 10 million in the reference that the reads in the BAM file were mapped to.
That's odd. Does the same thing happen if your BED file contains just 1 or 2 contigs? BTW, your two commands could be merged into
samtools view -L rk1.bed -b -o A6_genome.sort.rmdup_filtered.bam A6_genome.sort.rmdup.bam
which will probably also be a bit faster.
I got round it by converting the BAM file to SAM and using Tablet to visualise it with my reduced reference, which worked fine. I'll keep trying to figure it out because I can't see why it doesn't work.
Thanks, please do report back here in the comments if you find out why that's not producing the expected behaviour.
If you just want to be able to visualize the alignments in IGV, then just sort and index the BAM file. IGV won't need to load the whole thing into memory.
It doesn't load. I'm not sure if the BAM file is too big, but it's only 1.5 GB.
Did it give an error message (start IGV from the command line if you don't normally)?