Question

Is there a user-friendly way to find rare mutations in raw whole-genome data?

1

5.6 years ago

alekkolomiec-pvl ▴ 10

Is there a user-friendly way to find rare mutations in raw whole-genome sequencing data from an individual human (from Dante, 30x coverage)?

To be more specific, I want to find mutations from this paper:

https://docs.google.com/document/d/1EkRMuD6J0-zyMY3MegKhz7Th0hzhzVXg6DWg2Hm6l2Y/edit?usp=sharing (it's very short, less than one page).

But I'm confused. The paper only talks about genes (RAD21, B3GAT2, SMC3, SCN11A, SCN5A, SCN9A, SCN10A, SCN11A, TRPA1), not about specific mutations. Can we find the disease-causing mutations they are referring to in the genes they list? Or is there not enough data in the paper/study for this?

And if there is not enough data, which data do I need to request from the authors?

Or do those genes have a kind of "gold standard sequence", so that if a sequence differs from the standard, the gene is "diseased"? (I have only very basic knowledge of genetics and bioinformatics.)

sequencing genome sequence next-gen snp • 2.8k views

ADD COMMENT • link updated 5.6 years ago by swbarnes2 15k • written 5.6 years ago by alekkolomiec-pvl ▴ 10

0

Please do not cross-post to BioStars, Bioinformatics SE and Reddit:
https://bioinformatics.stackexchange.com/questions/11100/any-user-friendly-way-to-find-rare-mutations-in-whole-genome-raw
https://www.reddit.com/r/bioinformatics/comments/elivcf/any_user_friendly_way_to_find_rare_mutations_in/

ADD REPLY • link 5.6 years ago by Emily 24k

0

I am currently preparing a manual on how to analyse your own WGS data (Call for clinical bioinformaticians who work with humans (discussion) - germline genome analysis). However, your task IS NOT standard, and it has to be done by a qualified pair: a bioinformatician and a data interpretation specialist. I am going to cover only standard tasks. Even though I have a PhD in this field myself, I went to a medical doctor when I found a "shitty" mutation, so I guess you see that when we talk about the health impact of mutations, it is not a game anymore.

Dante Labs provides raw VCF files with point mutations, so you do not need to "find" them; you just need to interpret them.

I really recommend that you read the first two parts of my manual (the links are inside the post). Let me know if something is not clear in the description and I will add clarifications.

ADD REPLY • link 5.6 years ago by German.M.Demidov ★ 3.0k

score 3 · Answer 1 · 2020-01-07

3

5.6 years ago

ATpoint 89k

What exactly do you want to do? Did you sequence your own genome and now try to self-diagnose yourself?

Whole-genome data need to be aligned to a reference genome, followed by a process called variant calling. Please browse Google and this forum for suggested workflows. This is typically done on a powerful workstation or server, as the size of 30x WGS data requires some computational power. There are platforms such as Galaxy (https://usegalaxy.org) for performing the analysis interactively that might be interesting for you, though I am not sure whether it handles data of that size.
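To give a rough sense of the data size mentioned above, here is a back-of-the-envelope sketch in Python. The 30x coverage comes from the question; the genome size is the usual approximate figure for a human genome, and the 150 bp read length is an assumption (it is not stated anywhere in this thread). The function names are made up for illustration.

```python
# Rough estimate of how much raw data a 30x human WGS run contains.
GENOME_SIZE_BP = 3.1e9  # approximate haploid human genome size
COVERAGE = 30           # from the question: Dante, 30x
READ_LENGTH_BP = 150    # assumption: common short-read length, not stated in the thread


def total_bases(genome_size_bp: float, coverage: float) -> float:
    """Number of sequenced bases needed to cover the genome `coverage` times."""
    return genome_size_bp * coverage


def read_count(bases: float, read_length_bp: float) -> float:
    """Number of reads of a given length that make up `bases` sequenced bases."""
    return bases / read_length_bp


if __name__ == "__main__":
    bases = total_bases(GENOME_SIZE_BP, COVERAGE)
    reads = read_count(bases, READ_LENGTH_BP)
    print(f"~{bases / 1e9:.0f} billion bases, ~{reads / 1e6:.0f} million reads")
```

Every one of those reads has to be aligned before variants can be called, which is where the computational cost comes from. Working from an already-called VCF skips that step entirely.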

Regarding this paper snippet, you will need to check the DOI for the actual journal link and then see if they provide any mutational data. This is probably just a summary, so try to get the full paper. If it is behind a paywall, portals like Sci-Hub might help.

These genes are not special by themselves, maybe interesting for the topic they work on. There is nothing like a "gold-standard gene" or "sequence". Whether a gene is disease-associated depends on the biological context and cannot be answered in general. One particular gene might be a potent disease driver in one cell type and fully irrelevant in a second one. You will need to read the entire paper to find out why they looked at these specific genes.

I strongly suggest getting expert help for whatever you are trying to do. Genetics and bioinformatics (or science per se) can have quite a few pitfalls if one naively dives into an analysis without the proper background. Please, before contacting the authors (given that this entire topic does not seem to be your core expertise), talk to an expert and make sure that the question you ask is sound, that its terminology and content meet scientific standards, and that it actually makes sense.

ADD COMMENT • link 5.6 years ago by ATpoint 89k

0

Thanks a lot for the answer.

I'm a medical student, and we have a rare case that I'm trying to help with in collaboration with physicians. The patient is already diagnosed. The problem is that this disease can be treated only if the aetiology is found. More than half of the causes are genetic. There is a series of genes (about 30, from different studies; I gave only one paper as an example) that can cause this disease. The problem is that in our location, geneticists are not directly connected to bioinformaticians. Bioinformaticians do not make diagnoses, and geneticists cannot solve this purely technical problem, so a vicious circle results. The problem is further complicated by the fact that rare mutations are not covered by insurance and the optimal solution in this case is WGS / WES (it also raises the issue of the relevance of personalized medicine nowadays).

I have the opportunity to turn to a bioinformatician for help, but I have absolutely no background in bioinformatics, and part of the purpose of the initial post was to try to formulate a task for the bioinformatician.

Based on your answers, I will try to formulate the questions more specifically:

How long would it take a person without a background in bioinformatics to solve this problem (searching about 30 genes for rare mutations already described in earlier studies), including the time spent acquiring the knowledge? A week? A month? Six months?
How long would it take a (junior) bioinformatician?
You wrote about a server. Roughly how much server time is required to solve a problem like this (medium server: i7, 64 GB DDR4, SSD, GTX 1080)? Will there be a big time difference for WGS compared to WES?

I am asking these questions to find out whether this problem can be solved under our current conditions and limitations.

ADD REPLY • link 5.6 years ago by alekkolomiec-pvl ▴ 10

1

I cannot say that I fully understand the situation, but here are a couple of thoughts:

Why not let the experienced physicians and bioinformaticians do their expert work? Why do they need a student as a mediator here? Whenever a patient is involved you should consider the legal consequences of what you are doing. Things need to be legally approved, and procedures and involved individuals must be tested/licensed and verified. You are probably trying to help a patient, which is honorable, but please be sure to always bring together experts so they can interact instead of trying to work in a field you have little/no knowledge of.

This seems to be about finding some mutations in a WGS sample. So give the data to a clinically experienced bioinformatician, who will extract the variant information for you. I am not familiar with how things are handled in a clinical context in detail, but variants that are treatment-relevant (if they indeed are) probably need to be confirmed by a second, clinically approved method, such as Sanger sequencing in a certified facility. I am not sure whether WGS from a company is clinically approved; probably not.

Hardware: What you describe is a PC, not a server. WGS takes many hours on multicore servers, so don't even try with this setup. Actually, this all reads as if you are dealing with topics outside your competence. Again, I say this in your best interest and with zero intent to offend, but please let expert bioinformaticians and physicians handle that work. If they really need to collaborate with each other to help this patient, then arrange a meeting with the relevant PIs and physicians. Discuss the issues and find a solution together. Be aware that things may have legal implications. Please talk to your superiors. An online community is not the place for this. This must be handled by the involved experts of each discipline.

ADD REPLY • link 5.6 years ago by ATpoint 89k

0

Regarding this paper snippet, you will need to check the DOI for the actual journal link and then see if they provide any mutational data. This is probably just a summary, so try to get the full paper. If it is behind a paywall, portals like Sci-Hub might help.

Can you explain this point in more detail? Are there any services that offer a single database of related/connected genes with DOIs? Or do you mean doing it manually? (In fact, the link I gave in the original post is the full version of the paper, not an abstract. As far as I understand, they did not point to specific mutations, only to genes, which is what prompted my question.)

One particular gene might be a potent disease driver in one cell type and fully irrelevant in a second one

They used WES, without specifying which biomaterial was used. Isn't saliva the default biomaterial for WES?

These genes are not special by themselves, maybe interesting for the topic they work on. There is nothing like a "gold-standard gene" or "sequence".

I will try to rephrase the question:

Suppose these genes have a certain sequence in 99% of the population, and suppose there is a single database that records this. If a sequence differs from the "standard", then we should at least pay attention to that fact, and it may turn out that the mutation is disease-related. Or is this idea fundamentally incorrect?

You will need to read the entire paper to find out why they looked at these specific genes.

In the original post there was a link to the full version of the paper. They did not specify this in the full version.

Here is a link to the same publication in its original form (a clipping from the journal; it also contains other articles, which is why I posted only the relevant one in the original post): https://drive.google.com/file/d/1fCGlwZJMdy6N8hVotm7obwjIFoL25_or/view?usp=sharing (I hope this link does not violate this website's rules. If it does, moderators can delete it.)

I looked into the issue more deeply and realized that the confusion probably arose because this paper is a conference summary (I googled the name of the paper and found a note on ResearchGate stating that it is a conference paper from the conference Digestive Disease Week).

And at the conference they talked about specific variants: https://i.fiery.me/zgZij.png https://i.fiery.me/bH4ES.png

I hope this clarifies the situation.

Thanks again for the answers.

ADD REPLY • link 5.6 years ago by alekkolomiec-pvl ▴ 10

0

Please don't post Google Drive links. Visitors to this site will have no idea what that link may contain. Please find the original paper (you can search PubMed or Europe PMC) and post that link to replace the links above.

ADD REPLY • link 5.6 years ago by GenoMax 153k

0

I subscribe to GenoMax's point of view. Please read this manual https://link.medium.com/gaRFlUQ562 and the next part too. I tried to explain the concepts there. This is not a manual on "how to make a diagnosis yourself"; it is a manual on "how not to lose your mind when your genetic counsellor appointment is in 6 months and you have a "high risk" variant". However, it contains the theoretical minimum for understanding how it works.

ADD REPLY • link 5.6 years ago by German.M.Demidov ★ 3.0k

0

Hardware: What you describe is a PC, not a server. WGS takes many hours on multicore servers, so don't even try with this setup.

Hours is not a lot, actually. I was trying to find out whether it would take many days / weeks / months. We have a free machine (configuration above), which can be allocated for up to a month.

And are we really talking about the same thing? The task is: there is a VCF file, we need to find these variants: https://i.fiery.me/bH4ES.png https://i.fiery.me/zgZij.png (or confirm their absence)

ADD REPLY • link 5.6 years ago by alekkolomiec-pvl ▴ 10

0

Why not let the experienced physicians and bioinformaticians do their expert work? Why do they need a student as a mediator here?

This is part of my "in field" student work.

bring together experts so they can interact instead of trying to work in a field you have little/no knowledge of.

I do it in collaboration with physicians.

This seems to be about finding some mutations in a WGS sample. So give the data to a clinically experienced bioinformatician, who will extract the variant information for you.

As I said before, at our university (and the clinics based at the university) geneticists and bioinformaticians are not directly connected. Also, the bioinformaticians who work for us are not clinically experienced, so right now I am a kind of connecting link between physicians and bioinformaticians, trying to write a specification in technical language (because I have somewhat more technical background than the physicians). Part of the purpose of the initial post was to try to formulate a task for a bioinformatician, or to do it myself if it is not too complicated.

In fact, this is a new area for us, in which we are now writing a small scientific paper.

ADD REPLY • link 5.6 years ago by alekkolomiec-pvl ▴ 10

1

The problem is that in our location, geneticists are not directly connected to bioinformaticians. Bioinformaticians do not make diagnoses, and geneticists cannot solve this purely technical problem, so a vicious circle results.

This is precisely the reason a group of experts participates in this type of genomic/diagnostic case management. This group generally includes physicians, geneticists, genetic counselors and bioinformaticians. The group will collectively discuss the results to consider the possibilities and then come up with a strategy. No one person is really in a position to solve this problem. Legally, only the physician has the right to authorize the diagnosis and discuss it with the patient. In many cases genetic counselors will do this in practice, since one needs to be cognizant of the emotional/social consequences that can result from the diagnosis.

The problem is further complicated by the fact that rare mutations are not covered by insurance and the optimal solution in this case is WGS / WES (it also raises the issue of the relevance of personalized medicine nowadays).

It is unclear what you mean by this. So does the insurance cover sequencing if a standard protocol/kit is being followed? Has this already been done which is why you now suspect that a "rare mutation" may be involved?

There is no guarantee that the person you are referring to is going to have a mutation (rare or otherwise) in the list of genes you are referring to. If this is a multi-genic condition then it may be the cumulative effect of more than one mutation that could be resulting in the consequence.

Assuming proper consent from the patient is obtained, the technical part of prepping the samples and sequencing can be completed in a matter of a couple of weeks. Even the analysis of the data should not take long. Interpretation of those results is where things can take a long time, and it may generate results that pose more questions than they answer. As of now, the list of SNPs that are "actionable/targetable" remains relatively small.

ADD REPLY • link 5.6 years ago by GenoMax 153k

0

It is unclear what you mean by this. So does the insurance cover sequencing if a standard protocol / kit is being followed? Has this already been done which is why you now suspect that a "rare mutation" may be involved?

Insurance does not cover WGS / WES either. But it would be prohibitively expensive for the patient to do standard tests for each individual gene, so the possibility of WGS / WES is being considered; after specific disease-related mutations (if any) are identified, these genes will be analyzed separately in our clinic. WES is already used in clinical practice in Europe, for example in Italy.

Has this already been done which is why you now suspect that a "rare mutation" may be involved?

This is done in other clinics. The problem is that the patient has already been checked for almost all possible causes. Most of the genes that cause this disease are not covered by insurance, and there are a lot of these genes.

There is no guarantee that the person you are referring to is going to have a mutation (rare or otherwise) in the list of genes you are referring to

Of course everyone understands this. This is just an attempt.

If this is a multi-genic condition then it may be the cumulative effect of more than one mutation that could be resulting in the consequence.

Chronic intestinal pseudo-obstruction is not a multi-genic condition. But mutations in several different genes can cause it.

Assuming proper consent from the patient is obtained, the technical part of prepping the samples and sequencing can be completed in a matter of a couple of weeks

Is that for bioinformaticians or for a person who has never done this before (assuming we have only a raw VCF and zero knowledge of bioinformatics)? The only thing we need is to find these particular variants (not interpret them, just find whether the patient has them): https://i.fiery.me/bH4ES.png https://i.fiery.me/zgZij.png

Thanks for the answers.

ADD REPLY • link 5.6 years ago by alekkolomiec-pvl ▴ 10

0

The only thing we need is to find these particular variants (not interpret them, just find whether the patient has them)

If all you want is to find out whether the SNPs you are interested in are in your VCF file, then you should be able to use one of the tools mentioned in this thread: Extracting specific SNPs from vcf file

You will want to make certain that you know which genome build your VCF was made from and that the list of SNPs you are interested in is from the same build.
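For illustration only, here is a minimal Python sketch of that kind of lookup; it is not one of the tools from the linked thread and not a validated clinical procedure. It assumes a single-sample VCF (plain or gzip-compressed) and a hand-made set of (chromosome, position) targets that have already been converted to the same genome build as the VCF. The function names and the coordinates in the demo are hypothetical.

```python
import gzip
from typing import Dict, Iterable, List, Set, Tuple


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def normalize_chrom(chrom: str) -> str:
    """Treat 'chr3' and '3' as the same chromosome name."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def find_variants(
    lines: Iterable[str], targets: Set[Tuple[str, int]]
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return (##reference header lines, records found at the target positions)."""
    reference_hints: List[str] = []
    hits: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##reference"):
            # This header usually names the genome build the VCF was called against.
            reference_hints.append(line)
            continue
        if not line or line.startswith("#"):
            continue  # skip other header lines and blank lines
        fields = line.split("\t")
        chrom, pos = normalize_chrom(fields[0]), int(fields[1])
        if (chrom, pos) not in targets:
            continue
        genotype = ""
        if len(fields) > 9:  # FORMAT column plus the first sample column
            keys = fields[8].split(":")
            values = fields[9].split(":")
            if "GT" in keys and keys.index("GT") < len(values):
                genotype = values[keys.index("GT")]
        hits.append({"chrom": chrom, "pos": str(pos), "ref": fields[3],
                     "alt": fields[4], "genotype": genotype})
    return reference_hints, hits


if __name__ == "__main__":
    # Made-up records; for a real file, pass open_vcf("your_file.vcf.gz") instead.
    demo = [
        "##fileformat=VCFv4.2",
        "##reference=file:///ref/GRCh37.fa",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
        "chr3\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30",
        "chr3\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:28",
    ]
    hints, found = find_variants(demo, {("3", 100), ("3", 999)})
    print(hints)
    print(found)
```

Keep in mind that a position missing from the output is not proof that the variant is absent: a standard VCF usually lists only sites that differ from the reference, so a missing position can mean either a reference genotype or no usable coverage there. Matching by position alone also ignores which alternate allele is present, so the REF and ALT columns of each hit still need to be compared with the published variant.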

ADD REPLY • link 5.6 years ago by GenoMax 153k

score 1 · Answer 2 · 2020-01-10

If Dante returns a VCF, then yes, there's your "user-friendly" answer. That's the file that will contain the variants.

However, if you are asking whether there's a user-friendly way to interpret what a particular variant means, or a user-friendly way to intelligently change treatment based on that knowledge, then no, there isn't.

I see that the company presumes to advise people about their diets and fitness; those claims are, I'm afraid, garbage. Some clinical conclusions can be soundly drawn from some variants, but I wouldn't trust anything this company told me about them at all.

If you do this, maybe you will see a variant which other researchers have studied extensively, and have good evidence showing clinical consequences for having it, but there's a good chance you will see variants that no one has studied, whose clinical relevance no one can easily predict.

(Of course, there are also a lot of regulatory hurdles to be crossed. This company might not do the lab work to a diagnostic level of accuracy, their data handling policies may or may not meet the standard for medical records, and you would certainly have to be trained in how to safely and securely handle someone's medical records.)