Finding Repeat Genes In Unused Reads
13.0 years ago
Lee Katz ★ 3.2k

Hi, I would like to recover the sequence of a repeat gene from my WGS reads. I have raw reads from both 454 and Illumina, plus a FASTA file of several alleles/variants of the gene. I know the gene is present in the genome because it shows up in a PCR reaction/gel.

Is there a standard strategy for recovering the repeat sequence as a pseudocontig, i.e. a consensus of the repeat copies? Has this already been implemented in a software suite?

Thank you for any and all help!

repeats assembly • 2.1k views

I put my tentative strategy as an answer but I am still wondering what others have done.

13.0 years ago
Lee Katz ★ 3.2k

I have thought through it a little, and I think that my best method right now is to look at the Newbler output, especially the reads labeled as repeat, and try assembling them by themselves. Also, I might include those labeled as singletons. However, I am not sure how I would use the Illumina reads yet if at all.
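As a rough illustration of picking reads by their Newbler label, here is a minimal Python sketch. It assumes a tab-delimited read status table (such as Newbler's 454ReadStatus.txt) with a header line, the read accession in the first column, and the status in the second; that layout and the status names "Repeat" and "Singleton" are assumptions to check against your own output. The function name `reads_with_status` and the example accessions are made up for illustration.

```python
import csv
import io


def reads_with_status(status_file, wanted=("Repeat", "Singleton")):
    """Return accessions whose status (column 2) is one of `wanted`.

    Assumes a tab-delimited table with one header line, accession in
    column 1 and status in column 2. Verify this against your own file.
    """
    reader = csv.reader(status_file, delimiter="\t")
    next(reader, None)  # skip the header line
    accessions = []
    for row in reader:
        if len(row) >= 2 and row[1] in wanted:
            accessions.append(row[0])
    return accessions


# Made-up example table standing in for a real read status file.
example = io.StringIO(
    "Accno\tRead Status\n"
    "READ001\tAssembled\n"
    "READ002\tRepeat\n"
    "READ003\tSingleton\n"
    "READ004\tOutlier\n"
)
repeat_like = reads_with_status(example)
print("\n".join(repeat_like))
```

With a real file, you would pass an open file handle instead of the `io.StringIO` stand-in and write the resulting accessions one per line.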

So, my tentative strategy:

  1. extract repeat reads (sfffile/sffinfo)
  2. BLAST my alleles against the reads to pick out relevant ones, using a small word size and a liberal E-value
  3. extract those reads from the SFF file (sfffile)
  4. assemble the relevant reads (using Minimo? Or Newbler again?)
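
The read-selection part of steps 2 and 3 could be sketched as below. This assumes BLAST was run with the alleles as queries against a database built from the reads, with standard tabular output (`-outfmt 6`), so the read ID is in column 2 and the E-value in column 11. The function name `hit_read_ids`, the threshold, and the example rows are hypothetical; it is a sketch, not a tested pipeline.

```python
import io


def hit_read_ids(blast_tab, max_evalue=10.0, candidates=None):
    """Collect read IDs (subject column) from BLAST tabular output.

    blast_tab:  iterable of lines in the standard 12-column tabular format
    max_evalue: keep hits with an E-value at or below this (liberal default)
    candidates: optional set of read IDs to restrict to, e.g. reads that
                the assembler labeled as repeats or singletons
    """
    keep = set()
    for line in blast_tab:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        read_id, evalue = fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        if candidates is not None and read_id not in candidates:
            continue
        keep.add(read_id)
    return sorted(keep)


# Made-up hits: two alleles hitting READ002, one weak hit, one very weak hit.
example_hits = io.StringIO(
    "alleleA\tREAD002\t92.5\t120\t9\t0\t1\t120\t30\t149\t1e-40\t150\n"
    "alleleB\tREAD002\t90.0\t110\t11\t0\t5\t114\t35\t144\t3e-30\t120\n"
    "alleleA\tREAD007\t85.0\t40\t6\t0\t10\t49\t1\t40\t2.5\t25\n"
    "alleleA\tREAD009\t80.0\t30\t6\t0\t10\t39\t1\t30\t50\t18\n"
)
selected = hit_read_ids(example_hits, max_evalue=10.0)
print("\n".join(selected))
```

The printed list, one accession per line, is the kind of file that could be handed to sfffile as an include list for step 3; the exact option for that should be checked against the documentation of your installed 454 tools.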

Seems sound to me. You could also try a variant of the above in which you add your alleles to the assembly in step 4.
