Single Gene Sequencing And Mutation Analysis
13.3 years ago
Travis ★ 2.8k

I may have to do some work on a study involving sequencing of a single human gene (22 kb) and subsequent SNP/indel/SV analysis in approximately 60 individuals. I just want to check whether the approach seems sensible (some questions included). The study itself may be nonsensical (single gene/small group), but unfortunately I have no control over that.

- Sequence using primer walking/traditional Sanger sequencing (is the output format from AB 3730s FASTQ nowadays?)

- Align each sample separately to genome (BWA-SW okay for these long reads?)

- Call SNPs/Indels separately for each sample (SAMtools or GATK - is the GATK workflow overkill for long, accurate reads like these?)

- Look for SVs (SVDetect or Breakdancer?)

- Annotate known/novel/consequence etc (Annovar or something similar)

Thanks in advance.

sanger snp structural next-gen sequencing alignment • 8.0k views

I hope the 22kb is the genomic size, not the size of exons you want to screen.


22 kb is the full size of the gene - introns and exons. The sequence provider seems to think Sanger is the best option (the project is tied to one gene - it can't go beyond that for reasons of consent) but if there's a better option I'd be glad to hear it.

13.3 years ago
Darked89 4.7k

With Sanger, the conventional mutation screen was often restricted to coding exons and splice sites. This means PCR with primers placed ca. 50 bp inside the introns, followed by sequencing of the PCR products. The whole coding sequence should be covered on both strands, with resequencing of any failed sequences. Fairly labor intensive.
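To make the exon-based design concrete, here is a minimal sketch of how the target windows could be derived from exon coordinates. The exon positions and the function name `amplicon_targets` are made up for illustration; the 50 bp flank is the figure mentioned above, and real primer design would also need to check Tm, repeats and known SNPs under the primers.

```python
from typing import List, Tuple


def amplicon_targets(exons: List[Tuple[int, int]], flank: int = 50) -> List[Tuple[int, int]]:
    """Extend each exon (1-based, inclusive coordinates) by `flank` bp
    into the neighbouring introns, clamping at position 1."""
    return [(max(1, start - flank), end + flank) for start, end in exons]


# Hypothetical exon coordinates within the gene, for illustration only.
example_exons = [(1000, 1150), (4200, 4390), (9800, 10020)]

for start, end in amplicon_targets(example_exons):
    # Each window is the minimum region a PCR product has to span.
    print(f"target {start}-{end} ({end - start + 1} bp)")
```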

As for the software, the good old option was Staden with pregap4/gap4. This allows for automatic tagging of SNPs and viewing all traces.

Also, I do not think that switching from ABI/SCF format to FASTQ is a good idea for heterozygous PCR fragments with potential indels. From the traces you can recover information about a 1 bp indel, say at position 100, but in FASTQ all you will get is very low scores/Ns for bases 100+.
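A toy illustration of why everything downstream of a heterozygous indel degrades: the two alleles are sequenced together from the same primer, so after a 1 bp deletion they are out of phase and most positions show two different bases at once. The sequence and the helper `mixed_positions` below are invented for the example; this is only a sketch of the effect, not how any base caller works.

```python
def mixed_positions(allele_a: str, allele_b: str) -> list:
    """Return 0-based positions where the two alleles, read in phase from
    the same primer, carry different bases (mixed peaks in the trace)."""
    return [i for i, (a, b) in enumerate(zip(allele_a, allele_b)) if a != b]


# Hypothetical reference allele and a second allele with a 1 bp deletion.
reference = "ACGTTGCAAGCTTAGGCTAACGTTAGC"
deleted = reference[:10] + reference[11:]  # base at 0-based position 10 removed

mixed = mixed_positions(reference, deleted)
print("first mixed position:", mixed[0])
print("mixed positions downstream of the indel:", len(mixed))
```

In the trace file both overlapping signals are still there to inspect; a FASTQ record keeps only one base call and a quality value per position.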


Hmm the Sanger route sounds problematic (despite the fact the sequence provider recommended it). I'm trying to go down the target enrichment/NGS route now. The x coverage will be huge though!

13.3 years ago

It seems to me that doing NGS on this project would be like buying a bazooka to kill a fly. You will probably end up paying a large amount of money and getting far more coverage than you need. You might think that huge coverage is not a problem, but it is for many assemblers. So you would probably end up randomly picking 10% of your data to do your mapping... it's your call.
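For what it's worth, picking a random 10% is simple to do. A minimal sketch, assuming the reads are already in memory as a list; the function name `subsample` and the fixed seed are my own choices for reproducibility, and for paired-end data you would have to keep or drop both mates of a pair together.

```python
import random


def subsample(reads, fraction=0.10, seed=42):
    """Keep each read independently with probability `fraction`.
    A fixed seed makes the selection reproducible."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


# Stand-in for real reads: 10,000 dummy read identifiers.
all_reads = [f"read_{i}" for i in range(10_000)]
kept = subsample(all_reads)
print(f"kept {len(kept)} of {len(all_reads)} reads")
```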

Having said that, I'll try to answer some of your questions:

  • Sequence using primer walking/traditional Sanger sequencing (is the output format from AB 3730s FASTQ nowadays?)

The ABI Sanger 3730 outputs .ab1 files (chromatograms) which contain the trace values. You can then base-call and quality-call these, say with 'phred' for instance. You could then remove low quality bases at the beginning and end of sequences with 'ttuner' if needed.
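As a rough idea of what end trimming does once you have base calls and phred scores, here is a minimal sketch. It simply strips bases below a threshold from both ends; this is not the algorithm used by 'phred' or 'ttuner', and the Q20 cutoff (1% error probability) and the function name `trim_low_quality` are assumptions for the example.

```python
def trim_low_quality(seq, quals, min_q=20):
    """Strip leading and trailing bases whose phred score is below `min_q`.
    Returns the trimmed sequence and its matching quality values."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


# Hypothetical read with poor quality at both ends.
read = "NNACGTACGTNN"
scores = [5, 8, 30, 35, 40, 40, 38, 36, 33, 30, 9, 4]
trimmed_seq, trimmed_scores = trim_low_quality(read, scores)
print(trimmed_seq, trimmed_scores)
```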

I have no experience with primer walking, but since, as I understand it, you have a reference genome, wouldn't it be better to design primers along your 22 kb region, so that you can sequence everything (with redundancy and without worrying about the length of your fragments) and then assemble these reads together?
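To get a feel for the workload, a tiling of overlapping amplicons can be sketched like this. The 700 bp amplicon length and 100 bp overlap are assumptions of mine, not recommendations, and `tile_amplicons` is a hypothetical helper; only the 22 kb region and the 60 individuals come from the question.

```python
def tile_amplicons(region_length, amplicon_length=700, overlap=100):
    """Return 1-based inclusive (start, end) windows that cover the region,
    with consecutive windows sharing `overlap` bp."""
    if overlap >= amplicon_length:
        raise ValueError("overlap must be smaller than amplicon_length")
    step = amplicon_length - overlap
    tiles = []
    start = 1
    while True:
        end = min(start + amplicon_length - 1, region_length)
        tiles.append((start, end))
        if end == region_length:
            break
        start += step
    return tiles


tiles = tile_amplicons(22_000)
individuals = 60
reads_needed = len(tiles) * 2 * individuals  # forward and reverse read per amplicon
print(len(tiles), "amplicons,", reads_needed, "reads before any repeats")
```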

  • Align each sample separately to genome (BWA-SW okay for these long reads?)

  • Call SNPs/Indels separately for each sample (SAMtools or GATK - is the GATK workflow overkill for long, accurate reads like these?)

  • Look for SVs (SVDetect or Breakdancer?)

  • Annotate known/novel/consequence etc (Annovar or something similar)

All these tools are aimed at NGS data. Although 'bwa-sw' (and others) should be fine for these long reads, I would suggest, like @Darked89, using tools that are more Sanger-focused, such as Staden/gap4 or MIRA/gap4.


@Travis: "sequencing providers [...] are clueless." Unfortunately, this is so true... Maybe your best bet would be to browse the 'Materials and Methods' sections of papers doing the same kind of study you are looking to carry out. This is usually a good starting point.


Good input thanks. I must say the major problem here is getting good advice from the sequencing providers. They are clueless and their advice/suggestions/quotes are disparate. Single gene studies seem to be a forgotten art. I'm leaning towards Sanger and some traditional tools at the moment.

13.3 years ago
Swbarnes2 ★ 1.6k

22 kb is a lot of Sanger sequencing; any chance you could next-gen sequence it instead?

At some point, you are going to want to look at the trace files themselves, at least at the sites of potential non-SNP variations, so you will also need some kind of software for that.


Re next-gen: see my comment above. Why would inspecting trace files be necessary with consensus sequences generated from such an accurate technology?


You are looking at 22 kb of sequence. So if your sequencing data is only 99.99% accurate, that's about two false positives per sample. It would be very tricky for a computer to correctly diagnose a heterozygous indel, but a person can do it more easily by looking at the traces. (It would be easy to spot with next-gen sequencing.)
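The arithmetic behind that estimate, as a small sketch. It assumes errors are independent and uniformly spread along the sequence, which is a simplification (Sanger errors cluster at read ends and around indels); `expected_errors` is just a helper name for the example.

```python
def expected_errors(length_bp, accuracy):
    """Expected number of wrong base calls in a sequence of `length_bp`
    bases, given a per-base accuracy between 0 and 1."""
    return length_bp * (1.0 - accuracy)


per_sample = expected_errors(22_000, 0.9999)
print(f"per sample: about {per_sample:.1f}")
print(f"across 60 samples: about {per_sample * 60:.0f}")
```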


I'm waiting on further proposal detail from the sequencing provider but we would be sequencing both strands, and hopefully with some degree of redundancy.
