I am doing exome sequencing, and some of the reads do not map to the human genome. I have read papers in which such reads were mapped to viral genomes, and I wonder whether the same can be done with exome data. I know that much of the non-coding sequence will be missing, but I just want to see whether any viral sequences are present in the exome data.
updated 13.2 years ago by ALchEmiXt • written 13.2 years ago by Liyf
If you have the sequence of your virus, you can certainly map back to it and let us know about the results. Is there something that is stopping you from trying?
In fact, I am very busy with other research. I am not familiar with BWA; I have never used it. This idea came to me when I was reading some whole-genome sequencing papers. If you had all said it was a waste of time, I would not try. But since you think it is worth trying, I will do it and report the results when they are out. Thanks. It may take a long time, because I am working on other things right now and the data is not completely ready.
What fraction of reads are you talking about? How did you isolate the molecules you are sequencing? (i.e. was it exome capture with specific probes or was it an oligo-dT based method?). If you expect all of your reads from a human sample to map to the human genome you will never be happy because there are too many opportunities for DNA from other sources to be present in your samples. It can come from viruses, or other pathogens or non-human symbionts. Depending on how your cells were prepared it can come from other organisms that were in the media, or for instance if it was a tissue sample other organisms that may be associated with the tissue (care to guess how many critters are on the surface of your skin?). It can come from contaminant DNA that comes along with some of the enzymes used in library preparation. It can come from other molecules in the lab that your lab mates are studying (often people find sequence reads from genes studied in the lab popping up in their samples - contamination artifacts).
In most cases, however, these contaminants make up a small portion of the reads and can usually be ignored as par for the course. But if your goal is to detect which viruses are present in human samples, then why not add a virus mapping step to your alignment pipeline?
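One way to picture such a virus mapping step: take the reads the human alignment left unmapped and write them back out as FASTQ, so they can be aligned against viral genomes in a second pass. The sketch below is only an illustration, assuming the first alignment is available as uncompressed SAM text; the function name `unmapped_reads_to_fastq` is made up for this example. It relies on the SAM flag bit 0x4 ("segment unmapped").

```python
UNMAPPED_FLAG = 0x4  # SAM flag bit meaning "segment unmapped"


def unmapped_reads_to_fastq(sam_lines):
    """Yield one FASTQ record (as a string) per unmapped read in SAM text.

    Hypothetical helper for illustration: `sam_lines` is any iterable of
    SAM lines, e.g. an open file handle.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & UNMAPPED_FLAG:
            yield f"@{name}\n{seq}\n+\n{qual}\n"


# Tiny in-memory example: r1 is mapped, r2 is unmapped.
example_sam = [
    "@SQ\tSN:chr1\tLN:100\n",
    "r1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tFFFF\n",
]
unmapped_fastq = list(unmapped_reads_to_fastq(example_sam))
```

The resulting FASTQ records could then be given to whichever aligner you use, with the viral sequences as the reference. For paired-end data you would probably want to keep mates together, which this sketch does not attempt.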
I am doing whole human exome sequencing. Because the disease is related to viral infection, I also want to map the reads to viral genomes.
What is more, I am afraid that because exome data is not contiguous, any viral genomes present will be fragmented and the reads will not map back to them.
Thanks.
How about adding the contaminant candidates (e.g. your virus) as 'additional chromosomes' to your reference genome and then mapping? I would also add 'usual contaminants' and phiX.
If the contaminant genome is circular, you may need to append 200 bp from the beginning of the sequence to its end (so that reads spanning the end of the sequence can still be mapped).
If you are using bwa to map, make sure that the total length (of the reference) is not over 4GB.
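The suggestions above (extra 'chromosomes', padding circular genomes, watching the total reference length) could be sketched roughly as follows. This is a minimal illustration in plain Python, not a tested pipeline: the function names, the `circular_names` argument and the toy sequences are all made up for the example, and the 200 bp pad is simply the figure mentioned above.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else "unnamed"), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def pad_circular(seq, pad=200):
    """Append the first `pad` bases to the end of a circular sequence."""
    return seq + seq[:pad]


def build_combined_reference(host_records, contaminant_records,
                             circular_names=(), pad=200):
    """Return (records, total_length) for host plus contaminant sequences.

    Contaminants whose name is in `circular_names` (a hypothetical
    argument) are padded so reads spanning the origin can align.
    """
    combined = list(host_records)
    for name, seq in contaminant_records:
        if name in circular_names:
            seq = pad_circular(seq, pad)
        combined.append((name, seq))
    total_length = sum(len(seq) for _, seq in combined)
    return combined, total_length


# Toy example with made-up sequences and a 2 bp pad.
records = read_fasta(io.StringIO(">chr1 human\nACGT\nAC\n>virus\nGGGTTT\n"))
combined, total_length = build_combined_reference(
    records[:1], records[1:], circular_names={"virus"}, pad=2
)
```

The returned `total_length` can be compared against whatever size limit your aligner version has before you build the index. Note that positions reported inside the padded region lie past the real end of the circular genome and would need to be wrapped back to the start.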
I am not sure exactly what the experiment was or how you did the mapping, but quite a lot of data can contain artefacts or contaminants, as indicated above. With a representative sample you can easily do a bowtie mapping against potential contaminants or against some viral sequences you may have around.
I also strongly suggest mapping against a so-called contaminants database of sequences (commonly used adapters and such), since sequence data is often full of them (if you are unlucky).
To get the idea, you can have a look at these tools: fastq_screen (which basically uses bowtie) and FastQC.
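To give a feel for the kind of summary a screening step produces, here is a rough sketch that tallies primary alignments per reference sequence from SAM text. It is not how fastq_screen itself is implemented; the function name and the toy records are invented for illustration, and it assumes the reads were mapped against a reference that includes the contaminant candidates.

```python
from collections import Counter

UNMAPPED = 0x4       # SAM flag bit: segment unmapped
SECONDARY = 0x100    # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def count_hits_per_reference(sam_lines):
    """Count primary alignments per reference name; unmapped reads go under '*'."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # avoid counting the same read more than once
        if flag & UNMAPPED:
            counts["*"] += 1
        else:
            counts[fields[2]] += 1
    return counts


# Toy example: one human hit, two phiX hits, one unmapped read,
# and one secondary alignment that is skipped.
example_sam = [
    "@SQ\tSN:chr1\tLN:100\n",
    "r1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tphiX\t5\t60\t4M\t*\t0\t0\tGGCC\tIIII\n",
    "r3\t16\tphiX\t9\t60\t4M\t*\t0\t0\tTTAA\tIIII\n",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tCCCC\tIIII\n",
    "r2\t256\tchr1\t50\t0\t4M\t*\t0\t0\t*\t*\n",
]
hit_counts = count_hits_per_reference(example_sam)
```

A handful of reads on a viral or phiX contig is not proof of infection or contamination by itself, but the per-reference counts show quickly where the non-human reads are going.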