Phred/Phrap pipeline starting with FASTA file of paired-end reads and using a reference sequence
9.2 years ago
DaniCee ▴ 20

Hello everyone,

This is my first question here, and I am still quite new to this topic. I need to assemble short reads with Phrap, both guided by a reference sequence and without one.

I have a FASTA file with 50bp paired-end reads (I also have it in SAM, BAM, and FASTQ formats) mapping to a full reference sequence I have in FASTA format as well. I obtained my read maps with Bowtie2 and samtools.

I explicitly want to use Phrap to obtain a full-length assembly of the reads and compare it to the reference via a pairwise alignment with Needle. I want to do this twice: once using the reference as a guide, and once without it.

I have been sent the Phred and Phrap programs, but I am quite lost. I have tried Phrap alone with no quality file, but I get many short contigs instead of one long one.

I understand I should follow the whole Phred -> phd2fasta -> cross_match -> Phrap protocol, but I cannot find my way around it. Phred seems to take a chromatogram file as input, but I do not know how to obtain one.

So my question is: how should I follow the Phred/Phrap protocol when my inputs are a FASTA file (or SAM, BAM, or FASTQ) of 50bp reads that map to a reference, plus the reference FASTA file itself? I want to obtain a contig that spans the full length of the reference, both with and without the reference as input.

Thanks a lot!

phred reference assembly phrap fasta • 2.8k views
9.2 years ago

50bp reads will not give you a good assembly no matter what you do, unless you are trying to assemble a tiny virus.

You might get a better assembly with SPAdes, which is very easy to use. There is no point in using OLC/string-graph assemblers on such short reads. But unless you are working on a virus (and often even then, as viruses can be hard to assemble), you will not get a single-contig assembly from 50bp reads. Certainly never for a bacterium. You would be lucky to get a 1000-contig assembly of a bacterial genome from 50bp reads.
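To put numbers on how fragmented an assembly is, a quick summary of contig lengths can help when comparing assemblers. The sketch below is a generic Python illustration, not part of any of the tools mentioned: it reads a contigs FASTA file and reports the contig count, the longest contig, and N50. The function names and the demo data are placeholders.

```python
import io


def contig_lengths(handle):
    """Return {contig_name: length} for a FASTA file-like object."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the contig name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return 0  # no contigs at all


def summarize(handle):
    """Collect contig count, longest contig and N50 in one dictionary."""
    lengths = list(contig_lengths(handle).values())
    return {
        "contigs": len(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Tiny in-memory example; a real run would open the assembler's contigs file.
    demo = io.StringIO(">c1\nACGTACGT\n>c2\nACG\n>c3\nA\n")
    print(summarize(demo))
```

Running the same summary on the output of each assembler gives a like-for-like comparison; a single contig whose length is close to the reference length is what a complete assembly of the target would look like.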

What kind of organism are you trying to assemble? And why are you using 50bp reads?


I am just trying to assemble VDJ combinations, not a whole genome; I run Bowtie2 with all the cell reads against a certain combination. I was not getting bad results with Velvet, but I was getting a perfect assembly with CodonCode, which uses Phred and Phrap. That is why I want to use Phred and Phrap to automate the process. How should I use them? I will look into SPAdes too.


How can I run Phred/Phrap with a FASTQ/FASTA/BAM/SAM file as input and with a reference FASTA sequence?


I am trying to convert my input FASTA/FASTQ file into an SCF or ABI chromatogram file using BioPerl, as suggested in the thread "Converting A Dna Sequence To Abi Or Scf Format", but this approach does not work. Any clue?
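One route that may sidestep chromatograms is to skip Phred and hand Phrap the base qualities already stored in the FASTQ file. As far as I know, Phrap looks for a quality file named after the input FASTA with `.qual` appended (for example `reads.fasta` and `reads.fasta.qual`), holding space-separated integer scores under the same read names. Below is a minimal Python sketch of that conversion. It assumes four-line FASTQ records with Phred+33 encoding; the function name `fastq_to_fasta_qual` and the file names are placeholders, so treat it as a starting point rather than a tested Phrap recipe.

```python
import io


def fastq_to_fasta_qual(fastq, fasta_out, qual_out, offset=33):
    """Write FASTA and QUAL records for each 4-line FASTQ record.

    `offset` is the ASCII offset of the quality encoding (33 is assumed here).
    Returns the number of reads converted.
    """
    count = 0
    while True:
        header = fastq.readline()
        if header == "":
            break  # end of file
        header = header.rstrip("\r\n")
        if header == "":
            continue  # tolerate blank lines between records
        seq = fastq.readline().rstrip("\r\n")
        plus = fastq.readline().rstrip("\r\n")
        qual = fastq.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Keep only the first token of the header as the read name.
        name = header[1:].split()[0]
        scores = " ".join(str(ord(ch) - offset) for ch in qual)
        fasta_out.write(f">{name}\n{seq}\n")
        qual_out.write(f">{name}\n{scores}\n")
        count += 1
    return count


if __name__ == "__main__":
    # In-memory demonstration; real use would open reads.fastq for reading
    # and reads.fasta / reads.fasta.qual for writing.
    demo = io.StringIO("@read1\nACGT\n+\nIIII\n")
    fasta, qual = io.StringIO(), io.StringIO()
    fastq_to_fasta_qual(demo, fasta, qual)
    print(fasta.getvalue(), qual.getvalue(), sep="")
```

With both files in the same directory, running `phrap reads.fasta` should then pick up the qualities. If both mates of a pair carry the same read name, it is probably safer to give them distinct suffixes before converting.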


Can anyone help with this? Thanks!
