Variant Discovery Paper To Repeat?
11.1 years ago
Alice ▴ 320

Hello, Biostars! I would like to reproduce a published variant-calling study, because I'm new to NGS data analysis and need some practice. I've already worked through tutorials with fake data, but that's not enough.
I spent last Sunday searching and googling but could not find anything suitable. Many SNP-calling papers are about transcriptome analysis, but I need WGS. Other papers are interesting but come with >20 Gb of reads, which is too much for me to analyse. And for some papers the data are not available online, or the authors used their own scripts. So I hope you understand my problem. Can you suggest something?

snp samtools bowtie2 vcftools variant-calling • 2.8k views

It might be best to browse the projects on the SRA, since you can easily find out whether a project was published and whether it involves variant calling. That way you are guaranteed to have the reads available to you!


Thank you! I've already browsed http://trace.ddbj.nig.ac.jp/. All the projects there are either too big or are transcriptome analyses.

11.1 years ago
JacobS ▴ 990

Let me elaborate a bit more on my comment...

I would suggest going directly to NCBI's SRA database, where there are hundreds of studies to choose from. Following this link, you can search for keywords. I searched for "genomic SNP" and found a bunch of whole-genome DNA-Seq studies with reads and publications available. For example, one of the first results is a cow genome resequencing study for variant discovery, found here. When it comes to SNP analysis of whole-genome sequencing, the projects are going to be big, and if they are not, they probably aren't great projects. The exception is bacteria, and maybe that would be a good starting place for you. See this study for genome sequencing and SNP calling for many strains of Streptococcus, and the published results of the work here.
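If you would rather script the keyword search than click through the website, here is a minimal sketch using NCBI's E-utilities `esearch` endpoint against the `sra` database. The helper names `build_sra_search_url` and `search_sra` are made up for this illustration, and the `retmax` value is arbitrary. The returned IDs are internal SRA record UIDs, so you would still need to look each one up to see the study, its size, and whether it was published.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(term, retmax=20):
    """Build an esearch URL that queries the SRA database for `term`."""
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def search_sra(term, retmax=20):
    """Return the list of SRA record UIDs matching `term` (needs network access)."""
    with urlopen(build_sra_search_url(term, retmax)) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Only prints the URL; call search_sra("genomic SNP") to actually run the query.
    print(build_sra_search_url("genomic SNP"))
```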

In the bacterial example above, the materials and methods section of the publication wasn't very in-depth, and I fear that may be true of most publications these days. Many times the bioinformatics methods are left at, "...followed the recommended protocols of the manufacturer/developer/software." This is a real problem for reproducible research today! It may be worth your time to look at some of the Nature protocol articles about easy SNP calling pipelines. If you are really lucky, maybe you can find a SNP study in which the workflow was saved as a Galaxy pipeline, which gives you the best chance of reproducing the materials and methods.

Hope this helps!

EDIT:

Here's one of the Nature protocol articles I was talking about: "Genotype and SNP calling from next-generation sequencing data". Besides that, I would look into the two most widely used methods, namely SAMtools pileup and GATK, if you need more help with methods.
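To give a rough idea of what a SAMtools-style calling workflow looks like end to end, here is a small sketch that only builds the command lines (it does not run anything). It assumes paired-end reads and a current install of `bwa`, `samtools` and `bcftools`, where `bcftools mpileup` plus `bcftools call` is the present-day equivalent of the pileup approach. The function name `build_pipeline`, the file names and the thread count are placeholders for illustration, not taken from any of the papers discussed here.

```python
import shlex


def build_pipeline(ref, reads1, reads2, sample, threads=4):
    """Return shell command strings for: index -> align+sort -> index BAM -> call."""
    q = shlex.quote
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    return [
        # Index the reference once so bwa can align against it.
        f"bwa index {q(ref)}",
        # Align paired-end reads and sort the alignments by coordinate.
        f"bwa mem -t {threads} {q(ref)} {q(reads1)} {q(reads2)}"
        f" | samtools sort -o {q(bam)} -",
        # Index the sorted BAM for random access.
        f"samtools index {q(bam)}",
        # Pile up reads against the reference and call variant sites only.
        f"bcftools mpileup -f {q(ref)} {q(bam)}"
        f" | bcftools call -mv -Oz -o {q(vcf)}",
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("ref.fa", "reads_1.fastq", "reads_2.fastq", "sample"):
        print(cmd)
```

Writing the commands down like this, with every option visible, is also a cheap way to keep your own methods reproducible.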


Thank you for your suggestions! I've already found what I need on sra.dnanexus.com with the keyword "snp". As you mentioned above, and I noticed it too, a huge number of papers describe variant calling as "we used samtools and our own scripts for filtering". That's all, and it is not reproducible at all.
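For contrast, a filtering step becomes reproducible once the script and its thresholds are written down. Below is a minimal sketch of such a filter for uncompressed VCF text; the names `passes` and `filter_vcf` and the cutoffs (QUAL >= 30, DP >= 10) are arbitrary values chosen for illustration, not the ones used in any of the papers mentioned in this thread.

```python
def passes(line, min_qual=30.0, min_depth=10):
    """Return True if a VCF data line meets the QUAL and INFO/DP thresholds."""
    fields = line.rstrip("\n").split("\t")
    qual = fields[5]  # column 6 of a VCF record is QUAL
    if qual == ".":   # missing quality: treat as failing
        return False
    # Column 8 is INFO: semicolon-separated key=value pairs (flags have no "=").
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    depth = int(info.get("DP", 0))
    return float(qual) >= min_qual and depth >= min_depth


def filter_vcf(lines, min_qual=30.0, min_depth=10):
    """Yield header lines unchanged and only those records that pass."""
    for line in lines:
        if line.startswith("#") or passes(line, min_qual, min_depth):
            yield line


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t100\t.\tA\tG\t50\t.\tDP=20;MQ=60\n",
        "chr1\t200\t.\tC\tT\t12\t.\tDP=25;MQ=60\n",
    ]
    for kept in filter_vcf(example):
        print(kept, end="")
```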

11.1 years ago

This would be a nice paper for learning about NGS analysis:

Sequencing and characterization of the FVB/NJ mouse genome

http://genomebiology.com/2012/13/8/R72

They used BWA, SAMtools and GATK and explained the steps they performed in detail. The data can be downloaded from the links available at MGP (http://www.sanger.ac.uk/resources/mouse/genomes/).

PS: Use the ascp transfer protocol to download the data. It will download the data within 10 minutes.


Thank you for your comment! I've already found an interesting paper, but I will now consider your suggestion as well. The paper I found is http://genome.cshlp.org/content/22/3/508.long#sec-14


Just checked your paper. They used MAQ, which is outdated and very slow. I would suggest using BWA instead.


Oh, but I will not use MAQ! It is more interesting to repeat the analysis with other tools: I used samtools and will annotate with snpEff. I decided to try this paper because of its clear idea and manageable data size.
