Hi all, long-time lurker and dumb grad student here. I created an account to ask for guidance.
I received 454 sequencing results and I am at a loss as to where to start the analysis. I have worked through several tutorials, such as those for Mothur and Qiime2, in preparation, but the files I received are not similar to the ones those tutorials begin with.
My results were returned in .ab1, .scf, .phd.1, and .seq formats. I was able to convert the .scf files to .fasta to work with a more familiar format and do some preliminary BLAST searches. But that is where I'm stuck.
When looking at the Mothur tutorial for 454, it starts with sff files that I don't have. Qiime2 begins with .fna and .qual files that I didn't receive.
Am I missing something? I used a very popular commercial DNA sequencing business so I know this is just a byproduct of my ignorance. Unfortunately my advisor won't spend the money to have a proper bioinformatician do this so I have to learn.
Thank you for taking the time to read this and any advice you toss this way.
Are you certain this is 454 data? These appear to be Sanger sequencing file formats. How many files/reads do you have?
Gosh, I am such an idiot! Yes, these are Sanger sequencing files, not 454. I have nothing to blame that on except lack of sleep and pandemic brain.
Are you sure you have 454 data? All the file formats you mention are commonly used for Sanger sequencing, not for 454. I have never dealt with 454 data, but I think the default output was .sff (standard flowgram format), which was commonly processed directly by some tools or converted to fastq. I am not aware of a 454-to-ab1 converter.
It would help if you provided more details. What did you sequence: amplicons from isolates, or metagenomic amplicons? Maybe you should contact the sequencing provider and ask for the technical details.
My primary tool to work with Sanger sequences is Staden. It is a bit cumbersome and has a somewhat steep learning curve, but it is the most complete open source tool. If you have access (e.g., institutional license), there are some wonderful tools like Sequencher and others.
That definitely looks like Sanger data. You will need to convert the *.ab1 files to fasta; you can use BioPerl or Biopython for that.
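In case it helps, here is a minimal sketch of the Biopython route, assuming Biopython is installed and all the .ab1 traces sit in one folder. The function names (`fasta_path_for`, `convert_traces`) and the folder names in the usage note are made up for illustration. It relies on the "abi" and "abi-trim" formats of `Bio.SeqIO`, the latter applying Biopython's built-in quality trimming to the base calls.

```python
from pathlib import Path


def fasta_path_for(ab1_path, out_dir):
    """Return the output FASTA path for one trace, e.g. run1/a.ab1 -> out_dir/a.fasta."""
    return Path(out_dir) / (Path(ab1_path).stem + ".fasta")


def convert_traces(in_dir, out_dir, trim=False):
    """Convert every *.ab1 file in in_dir to its own FASTA file in out_dir.

    With trim=True the "abi-trim" format is used, which trims low-quality
    ends before writing. Returns the list of FASTA paths written.
    """
    # Imported here so the path helper above can be used without Biopython.
    from Bio import SeqIO

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fmt = "abi-trim" if trim else "abi"

    written = []
    for ab1 in sorted(Path(in_dir).glob("*.ab1")):
        target = fasta_path_for(ab1, out_dir)
        # Each .ab1 trace holds a single read, so one record is written per file.
        SeqIO.convert(str(ab1), fmt, str(target), "fasta")
        written.append(target)
    return written
```

You would call it as something like `convert_traces("traces", "fasta_out", trim=True)`, where both folder names are placeholders. It is worth comparing a few trimmed and untrimmed reads against the chromatograms in a viewer before trusting the trimming for everything.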