So, I had my WGS done by Veritas Genetics and it was transferred to the PGP. I will not reveal the account, however, for privacy reasons. Having difficulty finding good software, I came across NCBI Genome Workbench, which seemed complete and accessible.
The PGP account provides a VCF file as well as several BAM files. Since the VCF file is smaller, I tried that first. Genome Workbench started loading it, but after about 20 minutes it crashed my computer and I had to reboot. I tried 3 more times with the same result. I cannot find any troubleshooting documentation for gbench, so it looks like VCF files are out. For whatever reason, gbench can't work with them.
So I downloaded all the BAM files and tried working with those. gbench said it needed samtools to index them, so I browsed to samtools and selected it. It then asked some more questions, and I just told it to save the graph files. It asked which indexes I wanted to load, so I selected all of them. This is where the first issue appeared: all the indexes seem to be repeated for every BAM file I add (there is 1 per chromosome). Perhaps gbench expects everything to be in 1 BAM file. Either way, I thought it was just a fluke and continued. After maybe 30 minutes, the run finished. I then tried to open the graph files in the Graphical Sequence View, as the tutorial said to do, but now it says, "Graphical view: failed to retrieve sequence for id lcl|chr1."
These kinds of files are apparently really difficult to work with. Something went wrong in gbench: either I did something wrong or it is buggy.
Can anyone help? Neither Help nor Tutorial provides any troubleshooting that I can make sense of.
You should use a genome browser like IGV to view the BAM files. BAM files need to be sorted and indexed with samtools. I am not sure what OS you are using, but you would need access to Unix (or Cygwin or a virtual machine running Unix on Windows) to run samtools.
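For reference, here is a minimal sketch of the sort/index step driven from Python. It assumes samtools 1.3 or later is on the PATH (for the `-o` form of `samtools sort`). The function names and the `.sorted.bam` suffix are made up for illustration and are not part of samtools or IGV.

```python
import shutil
import subprocess
from pathlib import Path


def samtools_commands(bam_path):
    """Build the sort and index commands for one BAM file (nothing is run here)."""
    bam = Path(bam_path)
    # Hypothetical naming convention: chr1.bam -> chr1.sorted.bam
    sorted_bam = bam.with_name(bam.stem + ".sorted.bam")
    return [
        ["samtools", "sort", "-o", str(sorted_bam), str(bam)],
        ["samtools", "index", str(sorted_bam)],
    ]


def sort_and_index(bam_path):
    """Run the commands, failing early if samtools cannot be found."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    for cmd in samtools_commands(bam_path):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands; call sort_and_index("chr1.bam") to execute them.
    for cmd in samtools_commands("chr1.bam"):
        print(" ".join(cmd))
```

If the BAM files from PGP are already coordinate-sorted, only the index step should be needed; the `.bai` file it writes is what a browser like IGV looks for next to the BAM.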
Edit: Were you following this tutorial when working with NCBI Genome Workbench?
Yes, I used that Tutorial.
I got to here when it said it couldn't load the sequence:
I have samtools-1.3.1 for Windows. I did not do anything with it, though. It was gbench that did; it just wanted me to tell it where the program was.
You need to make sure that the genome browser (IGV or any other) is using the exact reference genome build that the PGP project used to generate that BAM file. Unfortunately, different genome builds use different nomenclature for the reference sequences (e.g. "chr1" in UCSC builds versus "1" in Ensembl builds). The reference chromosome identifiers have to match exactly when you open a BAM file in a genome viewer.
Note: The chromosome id you posted in the original post seems to be yet another variant, so you need to figure out which reference build/sequence was used and try to get that from PGP for use in IGV.
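One way to check the naming is to compare the sequence names in the BAM header against the names used by the reference. The sketch below assumes you have saved the header text (for example from `samtools view -H chr1.bam`) and have a list of reference names. The helper names and the example header are illustrative only; the header is not taken from the PGP files.

```python
def contig_names(sam_header_text):
    """Extract sequence names (SN tags) from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def toggle_chr_prefix(name):
    """Turn 'chr1' into '1' and '1' into 'chr1'."""
    return name[3:] if name.startswith("chr") else "chr" + name


def naming_report(bam_names, reference_names):
    """List BAM contigs absent from the reference, and those that differ only by 'chr'."""
    ref = set(reference_names)
    missing = [n for n in bam_names if n not in ref]
    prefix_only = [n for n in missing if toggle_chr_prefix(n) in ref]
    return {"missing": missing, "prefix_mismatch": prefix_only}


if __name__ == "__main__":
    # Example header written for this sketch (Ensembl-style names).
    header = "@HD\tVN:1.5\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:2\tLN:243199373\n"
    print(naming_report(contig_names(header), ["chr1", "chr2"]))
```

Any name listed under "missing" will not line up with the loaded genome. If it also appears under "prefix_mismatch", the difference is only the "chr" prefix, and loading a reference with the matching naming style is likely the fix.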
Could you help me for a moment? I make sure to load the chr#.bam file and to select that same chr# in the combobox (its tooltip reads "Select a chromosome to view") when I look at a segment. I think this is correct.
Now, as for the reference genome: it defaults to Human hg19. When I look at the "RefSeq Genes", however, it's always all blank. What does this mean? Does this mean the correct reference genome is being used?
And another question: how do you view the alleles for both copies? If the allele is heterozygous, does that show in the field that is above the allele in "Sequence"? I see some of the bases in a field immediately above the main allele.
You can view the VCF file in IGV as well. Use this help page.
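Since the VCF already contains the variant calls, the genotype at a site can also be read straight from the file. Below is a rough sketch that assumes a single-sample VCF with a GT entry in the FORMAT column. The function name and the example record are invented for illustration and do not come from the PGP data.

```python
def genotype_of(vcf_line):
    """Return (chrom, pos, called_alleles, zygosity) for one single-sample VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    # FORMAT (column 9) names the per-sample values given in column 10.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    gt = sample.get("GT", ".")
    # Allele index 0 is REF; 1, 2, ... are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    separator = "|" if "|" in gt else "/"
    called = [alleles[int(i)] if i != "." else "." for i in gt.split(separator)]
    if "." in called:
        zygosity = "no call"
    elif len(set(called)) > 1:
        zygosity = "heterozygous"
    else:
        zygosity = "homozygous"
    return chrom, int(pos), called, zygosity


if __name__ == "__main__":
    # Made-up record: REF is A, ALT is C, genotype 0/1.
    line = "chr1\t12345\t.\tA\tC\t50\tPASS\t.\tGT:DP\t0/1:30"
    print(genotype_of(line))
```

A genotype of 0/1 means one copy carries the reference base and the other carries the alternate base, which is the heterozygous case.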
This is on chromosome 1 (both the chr1.bam is loaded and chr1 is selected as the chromosome).
Notice the 3rd allele from the left, which is A, and then above it is C. Does that mean 1 copy of the allele is A and the other is C?
Also, note how the RefSeq Genes are blank, even though Human hg19 is loaded (which is the default).
So is this right, or can I still not be sure I'm using the right reference?
The "blank" part you are referring to represents sequence that has perfectly matched the reference. You can access various display options in the right-click menu (and choose to show all bases if that is what you want).
If there is a base that differs from the reference, then it is shown (like the example of C above). Since only one read seems to have that C, it could be either a sequencing error in that particular read or a real allele. Most likely the former.
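To make the "one read versus many reads" reasoning concrete, here is a small sketch that tallies the bases covering one position and labels the site. The thresholds (at least 3 supporting reads, at least 20% of reads, above 80% for homozygous) are arbitrary assumptions chosen for illustration. They are not what IGV or any variant caller actually uses.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def classify_site(bases, ref_base, min_alt_reads=3, min_alt_fraction=0.2):
    """Label a position from the bases seen in the reads that cover it."""
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    total = sum(counts.values())
    if total == 0:
        return "no coverage"
    alt = {b: c for b, c in counts.items() if b != ref_base.upper()}
    if not alt:
        return "matches reference"
    # Consider only the most frequent non-reference base.
    _alt_base, alt_count = max(alt.items(), key=lambda item: item[1])
    fraction = alt_count / total
    if alt_count < min_alt_reads or fraction < min_alt_fraction:
        return "likely sequencing error"
    if fraction > 0.8:
        return "possible homozygous variant"
    return "possible heterozygous variant"


if __name__ == "__main__":
    print(classify_site("AAAAAAAAAC", "A"))  # one C among ten reads
    print(classify_site("AAAAACCCCC", "A"))  # reads split roughly in half
```

A true heterozygous site would be expected to show the second base in roughly half of the reads, not in a single one.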
And how does it show the base pairs, so you can know what they are when the allele is heterozygous?