I have several BAM files from different species covering the chromosome on which the gene I want to study is located. How do I select the region where this gene is supposed to be?
Thank you
Hello,
BEDTools: Intersect BAM alignments with intervals will perform this operation. For the second input, use a BED/Interval file that contains the gene location, based on the same reference genome as the first (BAM) input*. UCSC is one possible source for such a BED file, and there are others. See the tools under Get Data for built-in data-fetching options, or locate the data elsewhere and upload it.
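To show what that intersect step does, here is a minimal, stdlib-only Python sketch of the same idea applied to text SAM records (for example, the output of `samtools view`). It is not the BEDTools implementation, and the function names are hypothetical. It assumes the BED chromosome names match the BAM's reference names, keeps a read if its aligned span overlaps an interval at all, and converts SAM's 1-based POS to BED's 0-based, half-open coordinates before comparing.

```python
import re

# CIGAR operations and which of them consume reference bases (per the SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def reference_end(pos: int, cigar: str) -> int:
    """Return the 1-based, inclusive end of an alignment starting at 1-based POS."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def read_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples (0-based, half-open)."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def overlapping_alignments(sam_lines, intervals):
    """Yield SAM records whose aligned span overlaps any BED interval (hypothetical helper)."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        rname, pos, cigar = fields[2], int(fields[3]), fields[5]
        if rname == "*" or cigar == "*":
            continue  # unmapped read
        # 1-based inclusive [pos, end] equals 0-based half-open [pos - 1, end).
        start0 = pos - 1
        end0 = reference_end(pos, cigar)
        if any(c == rname and start0 < e and s < end0 for c, s, e in intervals):
            yield line
```

For example, with the interval `chr1  100  200`, a 20M read at POS 90 (covering 1-based 90..109) is kept, while a 30M read at POS 50 (ending at 79) is not.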
*If this is not known, the LiftOver tool may be able to transform coordinates from one genome build to another (even across species). The output is based on genome alignment concordance, so it is not necessarily the actual location of the gene in the target genome. That said, it can be a useful way to gene hunt. LiftOver can be used in Galaxy, or at UCSC through the web interface or on the command line using UCSC's data and tools. Only genomes hosted at UCSC have this specific kind of coordinate-mapping data.
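To illustrate why a lifted coordinate reflects alignment rather than annotation, here is a simplified Python sketch of mapping a position through a UCSC chain file (the alignment data LiftOver relies on). It is a toy under stated assumptions, not the liftOver program: it reads a single chain, assumes both sides are on the + strand, and maps one 0-based position at a time. A position that falls in an alignment gap has no answer, which is one reason real lifts can fail or split.

```python
def parse_chain(text):
    """Parse ONE chain (assumed + strand on both sides) into aligned blocks.

    Returns (t_name, q_name, blocks), where each block is
    (t_start, q_start, size) in 0-based coordinates.
    """
    rows = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = rows[0]
    if header[0] != "chain":
        raise ValueError("expected a 'chain' header line")
    # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    t_name, t_strand, t_pos = header[2], header[4], int(header[5])
    q_name, q_strand, q_pos = header[7], header[9], int(header[10])
    if t_strand != "+" or q_strand != "+":
        raise NotImplementedError("this sketch only handles + strand chains")
    blocks = []
    for fields in rows[1:]:
        size = int(fields[0])
        blocks.append((t_pos, q_pos, size))
        if len(fields) == 3:
            # A gap follows this block: dt bases on the source, dq on the target.
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return t_name, q_name, blocks


def lift_position(pos0, blocks):
    """Map a 0-based source position; return None if it lies in a gap."""
    for t_start, q_start, size in blocks:
        if t_start <= pos0 < t_start + size:
            return q_start + (pos0 - t_start)
    return None
```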
Another approach is to use multiple sequence alignment (MAF) data and gene hunt from there. This is most useful if the target genomes happen to be included in UCSC's suite of databases, but other MAF data can be used. By "gene hunt" I mean that the gene's location is not known in the other genomes, but the original gene of interest is known, and it is in a genome that either has a track named Conservation at UCSC or is covered by another MAF data source that includes both genomes.
How to access Conservation data: extract the gene (specifically its transcript(s)) in BED format from the UCSC Table Browser (a keyword search by gene name is possible). Use that as input to the tools in the group Fetch Alignments/Sequences. Some genomes have MAF data built in, but you can also upload MAFs to use with the tools (they accept input from the history, which makes each of them very flexible). The MAF does not have to come from UCSC, but it must meet the MAF file format specification.
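The gene-hunting idea can be sketched as: find the MAF blocks where the known gene's genome overlaps the gene interval, then report where the other species' rows in those blocks come from. This Python sketch is hypothetical and simplified. It assumes the reference row is on the + strand, works at whole-block resolution (it does not trim blocks to the interval), and ignores the optional `i`, `e` and `q` lines. Source names such as `hg19.chr1` are illustrative only.

```python
from typing import Iterator, List, NamedTuple


class MafRow(NamedTuple):
    src: str       # "assembly.chrom", e.g. "hg19.chr1" (illustrative)
    start: int     # 0-based; for "-" rows, relative to the reverse complement
    size: int      # number of non-gap bases in the row
    strand: str
    src_size: int
    text: str      # aligned sequence, "-" for gaps


def parse_maf(lines) -> Iterator[List[MafRow]]:
    """Yield alignment blocks as lists of 's' rows."""
    block: List[MafRow] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":          # an 'a' line starts a new block
            if block:
                yield block
            block = []
        elif fields[0] == "s":
            _, src, start, size, strand, src_size, text = fields
            block.append(MafRow(src, int(start), int(size), strand, int(src_size), text))
    if block:
        yield block


def hunt(maf_lines, ref_src, start0, end0):
    """Report (src, start, size, strand) of other rows in blocks overlapping [start0, end0)."""
    hits = []
    for block in parse_maf(maf_lines):
        ref = next((r for r in block if r.src == ref_src), None)
        if ref is None or ref.strand != "+":
            continue  # assumption: reference row on the + strand
        if ref.start < end0 and start0 < ref.start + ref.size:
            hits.extend((r.src, r.start, r.size, r.strand) for r in block if r is not ref)
    return hits
```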
With either approach, downstream data conversion (extracting sequences, etc.) is possible within Galaxy, and most of it on the public Main server at http://usegalaxy.org. The Galaxy Main Tool Shed has even more tools for use in a local or cloud Galaxy.
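As an example of one such downstream conversion, extracting the reference sequence for a BED interval can be sketched in Python as below. This is a hypothetical, in-memory sketch that assumes a small FASTA file and 0-based, half-open BED coordinates; it is not how Galaxy's tools are implemented, and large genomes would normally be accessed through an index instead.

```python
def read_fasta(lines):
    """Return {sequence_name: sequence}; the name is the first word after '>'."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract(seqs, chrom, start0, end0, strand="+"):
    """Slice a BED-style interval; reverse-complement it for '-' strand features."""
    sub = seqs[chrom][start0:end0].upper()
    if strand == "-":
        sub = sub.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]
    return sub
```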
For an example of using the Fetch Alignments/Sequences tools along with some downstream manipulations, see protocol #5 in this prior publication (includes a video): https://usegalaxy.org/u/galaxyproject/p/using-galaxy-2012
Jen, Galaxy team (who also formerly worked at UCSC :) )
Am I correct in guessing that you want the consensus sequence of the gene given the alignments? BTW, you might want to post this on the Galaxy site.
I have the sequence of the reference my samples are mapped to. I want to do variant calling on a certain gene, but I don't know how to trim my files to the region where it's supposed to be.
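For that goal (restricting the BAM files to one gene before variant calling), the BED file for the BEDTools intersect step can be written by hand once the gene coordinates are known. The usual pitfall is the coordinate convention: genome browser positions and `samtools view` region strings are 1-based and inclusive, while BED is 0-based and half-open. This hypothetical Python helper converts a 1-based gene span, with an optional flank (an illustrative choice, not a requirement), into both forms. The chromosome name must match the names in the BAM header.

```python
def gene_region(chrom, start1, end1, flank=0, chrom_length=None):
    """Convert a 1-based, inclusive gene span into (BED line, region string).

    Hypothetical helper: `flank` pads both sides; `chrom_length`, if given,
    clips the end so the interval stays on the chromosome.
    """
    start0 = max(0, start1 - 1 - flank)   # BED start: 0-based
    end0 = end1 + flank                    # BED end: exclusive, same number as 1-based inclusive end
    if chrom_length is not None:
        end0 = min(end0, chrom_length)
    bed_line = f"{chrom}\t{start0}\t{end0}"
    region = f"{chrom}:{start0 + 1}-{end0}"  # 1-based, inclusive
    return bed_line, region
```

For example, a gene at `chr7` 1,001 to 2,000 in browser coordinates becomes the BED line `chr7  1000  2000` and the region string `chr7:1001-2000`.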