Hello.
My aim is to find correlated mutations within a single read pair. For example, I need to know whether sequence ID X, which has a mutation at position, let's say, 800, also has a mutation at position 1100. I managed to get BAM and SAM files containing only the reads that span the regions I am interested in. I have the FASTA sequences, and I used Translator X to translate them into protein FASTA.
I know what I expected to get back, but when I loaded these into Clustal Omega to get an alignment, it did not work well. There are gaps, and some sequences were simply badly translated. I looked for the badly translated sequences in the FASTA file I got from Translator X, and they are already wrong there. The nucleotide FASTA looks fine. Is there a way to feed my reference sequence into an alignment tool so that the protein sequences are translated and aligned correctly?
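One possible cause of the bad translations (an assumption, nothing in this thread confirms it) is that reads start at arbitrary reference positions, so translating each read from its first base puts many of them in the wrong reading frame. Below is a minimal sketch of translating a read in the reference's frame instead. The function name and the parameters `read_start` and `cds_start` are hypothetical, and the sketch assumes a forward-strand read aligned without indels.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_in_reference_frame(read_seq, read_start, cds_start):
    """Translate an ungapped read in the reading frame of the reference CDS.

    read_start and cds_start are 1-based reference coordinates (hypothetical
    inputs; take them from your alignment and annotation).
    Returns (0-based index of the first translated codon, protein string).
    """
    offset = read_start - cds_start  # distance from the first CDS base
    if offset < 0:
        # Drop read bases that lie upstream of the coding sequence.
        read_seq = read_seq[-offset:]
        offset = 0
    skip = (-offset) % 3  # bases to drop to reach the next codon boundary
    first_codon = (offset + skip) // 3
    seq = read_seq[skip:].upper()
    # Unknown codons (e.g. containing N) become 'X'; a trailing partial
    # codon is ignored.
    protein = "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )
    return first_codon, protein
```

Because every read is translated in the same frame and reports the codon index where it starts, the resulting peptides can be placed against the reference protein by position rather than re-aligned from scratch. Reads with insertions or deletions would need the alignment (CIGAR) taken into account first.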
Does anybody have any experience with this type of analysis?
I don't fully understand your question.
If you have a reference sequence and your reads completely cover the region you are interested in, why is there a need to look at protein translations?
Hi, I know there is a mutation present in some of the reads. I also know that there is a second mutation, again only in some reads, a bit further down the genome. I want to see whether that second mutation is present only when the first one is present; in other words, whether these mutations are hierarchical. I have SAM and BAM files that contain only the reads that span both regions.
Now I just want to somehow count either nucleotide (or protein) variants in those reads. Something like this:
etc.
I am just not sure how to go about it.
Use bam-readcount to get this information.
Hi,
This only gives me a count at each position. I need to see whether they are correlated. Like this:
etc.
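A joint count per read pair, as opposed to a count per position, could be sketched as follows. This is only an illustration working on plain-text SAM lines (for example the output of `samtools view`); the function names `base_at` and `joint_counts` are made up for this example, and the positions are 1-based reference coordinates. SAM stores SEQ in reference orientation, so no reverse-complementing is done.

```python
import re
from collections import Counter, defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def base_at(pos, start, cigar, seq):
    """Return the read base aligned to 1-based reference position pos,
    '-' if the read has a deletion/skip there, or None if not covered."""
    ref, qry = start, 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":  # consumes reference and query
            if ref <= pos < ref + length:
                return seq[qry + (pos - ref)]
            ref += length
            qry += length
        elif op in "DN":  # consumes reference only
            if ref <= pos < ref + length:
                return "-"
            ref += length
        elif op in "IS":  # consumes query only
            qry += length
        # H and P consume neither reference nor stored query bases
    return None


def joint_counts(sam_lines, pos_a, pos_b):
    """Tally (base at pos_a, base at pos_b) once per read pair (QNAME).

    Only pairs in which both positions are covered by a read are counted.
    If both mates cover the same position, the first record seen wins.
    """
    per_pair = defaultdict(dict)
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        f = line.rstrip("\n").split("\t")
        qname, flag, start, cigar, seq = f[0], int(f[1]), int(f[3]), f[5], f[9]
        # Skip unmapped, secondary and supplementary records.
        if flag & (0x4 | 0x100 | 0x800) or cigar == "*" or seq == "*":
            continue
        for pos in (pos_a, pos_b):
            base = base_at(pos, start, cigar, seq)
            if base is not None:
                per_pair[qname].setdefault(pos, base)
    return Counter(
        (calls[pos_a], calls[pos_b])
        for calls in per_pair.values()
        if pos_a in calls and pos_b in calls
    )
```

Calling `joint_counts(open("reads.sam"), 800, 1100)` (file name hypothetical) would return a counter keyed by base pairs such as `("T", "G")`, which is the kind of cross-tabulation that per-position counts cannot give. Base qualities are ignored here, so low-quality calls are counted like any other.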
Sorry to bother you, but do you have any other suggestion? This one won't work due to the reasons below.
You can probably do a linkage disequilibrium (LD)/correlation analysis using PLINK (not my area of strength). This is only a pointer for you to consider.
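For read-level data, the same kind of correlation measure can be computed by hand from a 2x2 table of read-pair counts. The sketch below is not PLINK and does not reproduce its input formats; it only applies the standard r² definition, D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)), to four counts that are assumed to have been obtained already. The function name and the returned keys are hypothetical.

```python
def linkage_stats(n_both, n_a_only, n_b_only, n_neither):
    """Summarise co-occurrence of mutations A and B across read pairs.

    Inputs are counts of read pairs carrying both mutations, only A,
    only B, or neither.
    """
    n = n_both + n_a_only + n_b_only + n_neither
    if n == 0:
        raise ValueError("no read pairs counted")
    p_a = (n_both + n_a_only) / n   # frequency of mutation A
    p_b = (n_both + n_b_only) / n   # frequency of mutation B
    p_ab = n_both / n               # frequency of A and B together
    d = p_ab - p_a * p_b            # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    # Fraction of B-carrying pairs that also carry A: 1.0 means B was
    # never observed without A.
    n_b = n_both + n_b_only
    frac_b_with_a = n_both / n_b if n_b > 0 else float("nan")
    return {"D": d, "r2": r2, "frac_b_with_a": frac_b_with_a}
```

For the question of whether the second mutation appears only when the first is present, `frac_b_with_a` is the more direct number; r² summarises the overall association and is undefined (NaN here) when either mutation is absent from, or present in, every pair.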
Do you specifically want to find reads which contain multiple mutations, or are you just interested in co-localised mutations?
Hi, I need to know that the mutations came from a single read pair. There are particular regions I have in mind.
If the pair of reads you are looking at flanks the regions of interest, then it represents a fragment that spans those regions. Unless you have reads that go through the region of interest, you have no way of confirming that a particular mutation is present in those fragments.
You will need to use Sanger sequencing on the original sample to confirm that the mutation exists.
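The distinction between a fragment that spans a position and a read that actually goes through it can be checked from the alignment coordinates. The following is a rough sketch with hypothetical function names; it takes each mate's 1-based start and CIGAR string as given in the SAM/BAM record.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(start, cigar):
    """Return the (first, last) 1-based reference positions a read covers.

    Only M, D, N, = and X operations advance along the reference.
    """
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")
    return start, start + length - 1


def position_status(pos, mate_spans):
    """Classify a reference position relative to one read pair.

    mate_spans is a list of (first, last) spans, one per mate.
    """
    if any(first <= pos <= last for first, last in mate_spans):
        return "sequenced"            # at least one read goes through it
    lo = min(first for first, _ in mate_spans)
    hi = max(last for _, last in mate_spans)
    if lo <= pos <= hi:
        return "unsequenced insert"   # inside the fragment, between mates
    return "outside fragment"
```

A position reported as "unsequenced insert" lies within the fragment but has no base call from either mate, so that pair cannot say anything about a mutation there.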