Hi all, I'm just starting an MSc in computational biology after completing a BSc in computer science, so my questions might seem trivial or like utter nonsense. Anyhow, what I want to do is look at the entire human genome and ask some questions (hopefully good ones :) ). The data will be taken from the following sites:
I'm encountering difficulties from the get-go. As I learned, the data comes in various formats: sra, bam, bed, wig. I thought each of these files was a different encoding of the DNA extracted in my experiment of interest. Well... where is the DNA? To be more specific, I am having trouble grasping the different file formats:
- bam is said to hold sequence alignment information, but as I understand it, the information is extracted from a single source and not compared to another, so what gives?
- bed holds a list of features (am I correct to understand features as genes?) and their locations on some chromosome. First of all, I want all the genes that are active, not just those on a specific chromosome, so how do I obtain that? Secondly, can I assume that the listed features are the active genes for the cell type?
- I have no idea what the rest of the formats do, or how they suit my goals.
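To make my reading of the bed format concrete, here is a minimal Python sketch of how I understand such a file is laid out: tab-separated columns for chromosome, start, end and an optional name, with a 0-based start and an end that is not included in the interval. The records in `EXAMPLE_BED` are made up for illustration and the helper `parse_bed` is a hypothetical name of my own; real files may carry more columns, and a feature is not necessarily a gene.

```python
from collections import defaultdict

# Made-up records in BED layout: chrom, start, end, name (tab-separated).
# These are illustrative values, not real annotations.
EXAMPLE_BED = (
    "chr1\t100\t200\tfeatureA\n"
    "chr2\t50\t80\tfeatureB\n"
    "chr1\t300\t450\tfeatureC\n"
)


def parse_bed(text):
    """Group BED records by chromosome as (start, end, name) tuples."""
    by_chrom = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else None
        by_chrom[chrom].append((start, end, name))
    return dict(by_chrom)


features = parse_bed(EXAMPLE_BED)
for chrom, items in sorted(features.items()):
    for start, end, name in items:
        # Start is 0-based and end is exclusive, so the length is end - start.
        print(chrom, name, end - start)
```

If this is right, a single bed file can list features on every chromosome, so nothing in the format itself restricts it to one chromosome.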
So, how can I obtain a complete map of the active genes of a certain cell type (like a fetal brain cell)?
Thanks,
Thanks! That helped to clear things up :)