Question

Single Gene Sequencing And Mutation Analysis

2

13.7 years ago

Travis ★ 2.9k

I may have to do some work on a study involving sequencing of a single human gene (22 kb) and subsequent SNP/indel/SV analysis in approximately 60 individuals. I just want to check whether the approach seems sensible (some questions included). The study itself may be nonsensical (single gene/small group), but unfortunately I have no control over that.

- Sequence using primer walking/traditional Sanger sequencing (is the output format from AB 3730s FASTQ nowadays?)

- Align each sample separately to genome (BWA-SW okay for these long reads?)

- Call SNPs/Indels separately for each sample (SAMtools or GATK - is the GATK workflow overkill for long, accurate reads like these?)

- Look for SVs (SVDetect or Breakdancer?)

- Annotate known/novel/consequence etc (Annovar or something similar)

Thanks in advance.

sanger snp structural next-gen sequencing alignment • 8.4k views

ADD COMMENT • link updated 13.7 years ago by Leonor Palmeira 3.9k • written 13.7 years ago by Travis ★ 2.9k

I hope the 22kb is the genomic size, not the size of exons you want to screen.

ADD REPLY • link 13.7 years ago by Darked89 4.7k

22 kb is the full size of the gene - introns and exons. The sequence provider seems to think Sanger is the best option (the project is tied to one gene - it can't go beyond that for reasons of consent) but if there's a better option I'd be glad to hear it.

ADD REPLY • link 13.7 years ago by Travis ★ 2.9k

score 2 · Answer 1 · 2011-08-11

With Sanger, the conventional mutation screen was often restricted to coding exons and splice sites. This means PCR with primers placed ca. 50 bp inside the introns, followed by sequencing of the PCR products. The whole coding sequence should be covered on both strands, with failed reads resequenced. It is fairly labor-intensive.

As for software, the good old option was Staden with pregap4/gap4, which allows automatic tagging of SNPs and viewing of all traces.

Also, I do not think that switching from ABI/SCF format to FASTQ is a good idea for heterozygous PCR fragments with potential indels. From the traces you can recover information about a 1 bp indel at, say, position 100, but in FASTQ all you will get is very low scores/Ns for bases 100 onwards.
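To illustrate why a heterozygous indel wrecks the base calls downstream of it, here is a minimal Python sketch. It is a toy model, not how any real base caller works: it assumes the two alleles contribute equal signal and simply reports 'N' wherever they disagree. The function name `mixed_calls` and the example sequence are made up for illustration.

```python
def mixed_calls(reference: str, deletion_pos: int) -> str:
    """Overlay the reference allele and an allele carrying a 1 bp deletion.

    Toy model (an assumption, not a real base caller): positions where both
    alleles show the same base keep that base; positions where the two
    superimposed signals differ are reported as 'N'.
    """
    deleted = reference[:deletion_pos] + reference[deletion_pos + 1:]
    calls = []
    # zip() stops at the shorter allele, i.e. the one with the deletion
    for ref_base, del_base in zip(reference, deleted):
        calls.append(ref_base if ref_base == del_base else "N")
    return "".join(calls)


# Hypothetical 20 bp amplicon with a heterozygous 1 bp deletion at index 5
reference = "ACGTTGCAGGCTAACGTCAT"
result = mixed_calls(reference, deletion_pos=5)
print(result)
```

Everything before the deletion is called cleanly, while most positions after it turn ambiguous because the two alleles are shifted by one base relative to each other. In the trace both peaks are still visible, which is what lets a person (or trace-aware software) work out the indel.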

score 2 · Answer 2 · 2011-08-11

It seems to me that doing NGS on this project would be like buying a bazooka to kill a fly. You will probably end up paying a large amount of money and getting far more coverage than you need. You might think that huge coverage is not a problem, but it is for many assemblers, so you would probably end up randomly picking 10% of your data for the mapping... it's your call.

Having said that, I'll try to answer some of your questions:

Sequence using primer walking/traditional Sanger sequencing (is the output format from AB 3730s FASTQ nowadays?)

The ABI Sanger 3730 outputs .ab1 files (chromatograms) which contain the trace values. You can then base-call and quality-call these, say with 'phred' for instance. You could then remove low quality bases at the beginning and end of sequences with 'ttuner' if needed.
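As a rough illustration of what the quality values mean and what end-trimming does, here is a small Python sketch. It is not the algorithm used by 'phred' or 'ttuner'; the simple threshold rule, the cutoff of 20 and the function names are assumptions made for the example. The only standard part is the Phred definition Q = -10 * log10(P_error).

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to an error probability (Q = -10*log10(P))."""
    return 10 ** (-q / 10)


def trim_low_quality_ends(seq: str, quals: list[int], min_q: int = 20):
    """Strip bases from both ends until a base with quality >= min_q is reached.

    Simplified threshold rule chosen for illustration; real trimmers use
    more elaborate (windowed or scoring) schemes.
    """
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


# Hypothetical read: poor quality at both ends, good in the middle
seq = "NNACGTACGTNN"
quals = [4, 8, 30, 35, 40, 40, 38, 36, 33, 25, 9, 5]
trimmed_seq, trimmed_quals = trim_low_quality_ends(seq, quals)
print(trimmed_seq, trimmed_quals)
print(phred_to_error_prob(20))  # Q20 corresponds to a 1 in 100 error rate
```

Note that this only trims the ends; a low-quality stretch in the middle of a read (for example after a heterozygous indel) is left alone and still needs inspection in the trace.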

I have no experience with primer walking, but since I understand you have a reference genome, wouldn't it be better to design primers along your 22 kb region, so that you can sequence everything (with redundancy and without worrying about the length of your fragments) and then assemble the reads together?
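The tiling idea can be sketched in a few lines of Python. This only lays out overlapping amplicon windows along the region; it does not pick actual primers (melting temperature, repeats and so on are ignored). The amplicon length of 700 bp and the overlap of 100 bp are placeholder values I chose for the example, not recommendations from this thread.

```python
def tile_region(region_len: int, amplicon_len: int = 700, overlap: int = 100):
    """Return 0-based, half-open (start, end) windows that cover the region.

    Consecutive windows share `overlap` bases; the last window is shortened
    so that it ends exactly at the end of the region.
    """
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be non-negative and smaller than amplicon_len")
    step = amplicon_len - overlap
    windows = []
    start = 0
    while True:
        end = min(start + amplicon_len, region_len)
        windows.append((start, end))
        if end == region_len:
            break
        start += step
    return windows


windows = tile_region(22_000)  # 22 kb gene from the question
print(len(windows), "amplicons; first:", windows[0], "last:", windows[-1])
```

Multiply the number of windows by two strands and by roughly 60 individuals to get a feel for how many Sanger reactions the project involves.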

Align each sample separately to genome (BWA-SW okay for these long reads?)

Call SNPs/Indels separately for each sample (SAMtools or GATK - is the GATK workflow overkill for long, accurate reads like these?)

Look for SVs (SVDetect or Breakdancer?)

Annotate known/novel/consequence etc (Annovar or something similar)

All of these are tools aimed at NGS data. Although 'bwa-sw' (and others) should be fine for these long reads, I would suggest, like @Darked89, using tools that are more Sanger-focused, such as Staden/gap4 or MIRA/gap4.

score 0 · Answer 3 · 2011-08-10

13.7 years ago

Swbarnes2 ★ 1.6k

22 kb is a lot of Sanger sequencing; any chance you could next-gen sequence it instead?

At some point, you are going to want to look at the trace files themselves, at least at the sites of potential non-SNP variations, so you will also need some kind of software for that.

ADD COMMENT • link 13.7 years ago by Swbarnes2 ★ 1.6k

RE Next gen - see my comment above. Why would inspecting trace files be necessary with consensus sequences generated from such an accurate technology?

ADD REPLY • link 13.7 years ago by Travis ★ 2.9k

You are looking at 22 kb of sequence, so if your sequencing data is only 99.99% accurate, that's about two false positives per sample. It would be very tricky for a computer to correctly diagnose a heterozygous indel, but a person can do it more easily by looking at the traces. (It would be easy to spot with next-gen sequencing.)
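The arithmetic behind that estimate, as a quick Python check. The 22 kb region and the cohort of about 60 come from the question; the 99.99% per-base accuracy is just the illustrative figure used here, not a measured value for any instrument.

```python
region_bp = 22_000   # gene size from the question
accuracy = 0.9999    # illustrative per-base accuracy of the consensus
samples = 60         # approximate number of individuals in the study

# Expected number of wrong base calls = bases sequenced * per-base error rate
errors_per_sample = region_bp * (1 - accuracy)
errors_in_cohort = errors_per_sample * samples

print(f"{errors_per_sample:.1f} expected miscalls per sample")
print(f"{errors_in_cohort:.0f} expected miscalls across {samples} samples")
```

That works out to about 2.2 miscalls per sample and about 132 across the cohort, each of which would look like a candidate variant until someone checks the trace.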

ADD REPLY • link 13.7 years ago by Swbarnes2 ★ 1.6k

I'm waiting on further proposal detail from the sequencing provider but we would be sequencing both strands, and hopefully with some degree of redundancy.

ADD REPLY • link 13.7 years ago by Travis ★ 2.9k