Question

Starting the assembly with .fastq file

0

8.5 years ago

Raghul ▴ 200

Hi Members, We submitted our samples for sequencing and have received a .fastq file and a zipped .fastq file. I want to assemble and annotate the sequences. It is a low/moderate coverage sample. What should I do next? I have a Windows computer; is that enough? Please do not be irritated by the simplicity of the question! I have to learn and complete the project. There was also an Excel sheet containing the following information (what can I infer from it?):

Experiment Name MISEQ RUN 75
Workflow GenerateFASTQ
Application FASTQ Only
Assay TruSeq LT
Description P178_P128_P271_P150
Chemistry Default

Thanks Raghul

Assembly NGS genome sequence windows • 6.7k views

0

Mammalian genome de novo assembly with 301 reads of MiSeq data in fa format (no qualities?)? This sounds like Mission Impossible. Could you please tell us the species name and print the first few lines of your fa and zipped fa files, as well as the sizes of these files or their number of lines? Thank you.

ADD REPLY • link 8.5 years ago by Petr Ponomarenko ★ 2.8k

0

Sorry, it is a .fastq file.

ADD REPLY • link 8.5 years ago by Raghul ▴ 200

score 1 · Answer 1 · 2017-03-24

1

8.5 years ago

Philipp Bayer 8.9k

Have you looked inside the .fa file? Normally genome sequencing reads are delivered as fq/fastq files; a .fa (.fasta) file is usually an assembly, not sequencing results. If the file starts with '>' then I guess it's a fasta and you got a finished assembly. If it starts with '@', it's a fastq and you got sequencing reads.
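
A quick way to run that check without opening the file in an editor is a few lines of Python, which works the same on Windows. This is only a sketch: the function names `open_maybe_gzip` and `guess_format` are made up for illustration, and it assumes the compressed file is gzip (.gz). A real .zip archive would need Python's `zipfile` module instead.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip compression.

    Gzip files start with the two magic bytes 0x1f 0x8b, so we peek at
    those instead of trusting the file extension.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def guess_format(path):
    """Guess 'fasta' or 'fastq' from the first non-empty line."""
    with open_maybe_gzip(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return "unknown"
    return "empty"
```

Calling `guess_format("reads.fastq.gz")` (a placeholder file name) returns "fastq" for sequencing reads and "fasta" for an assembly-style file.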

Are you sure you got 301 reads? That's nothing!
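
To find out how many reads you really have, you can count them directly. The sketch below assumes the usual four-lines-per-read FASTQ layout (which is what Illumina instruments write) and a gzip-compressed or plain file; `fastq_stats` is a hypothetical helper name, not part of any package.

```python
import gzip


def fastq_stats(path):
    """Return (number of reads, total bases) for a 4-line-per-record FASTQ.

    Assumes each record is exactly four lines: header, sequence, '+',
    qualities. Multi-line FASTQ records are not handled.
    """
    opener = gzip.open if path.endswith(".gz") else open
    n_reads = 0
    n_bases = 0
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # second line of each record is the sequence
                n_reads += 1
                n_bases += len(line.strip())
    return n_reads, n_bases
```

If the read count comes back in the millions and the reads are around 300 bases long, then "301" was most likely the read length rather than the number of reads.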

If you have fastq files and want to assemble and annotate them, there are myriad tutorials out there on genome assembly. Which one fits depends on the complexity of your organism, your sequencing chemistry, etc., so you may need to adapt the steps. Here is one: https://www.ebi.ac.uk/training/online/course/ebi-next-generation-sequencing-practical-course/genome-assembly-velvet

For annotation there are different pipelines depending on, again, the complexity of your organism. For prokaryotes look at prokka, for fungal genomes look at funannotate, and for everything else have a look at the MAKER pipeline.

0

Hi Philipp, sorry for the late reply. I had some network issues, as I am working in a small town in Ethiopia. Still looking into the files. Thanks! Raghul

I have fastq files (mammalian genome). So I want an assembler for Windows. Can anybody suggest an appropriate tool? Thanks

ADD REPLY • link 8.5 years ago by Raghul ▴ 200

0

I have fastq files (mammalian genome). So I want an assembler for Windows.

I don't think there are any software packages that run on Windows and can handle a human genome. You will also need a lot of RAM (hundreds of GB) to assemble a human genome. Since you appear to have MiSeq data, it is unlikely that you have enough data to assemble a human genome.

Your best bet to start analyzing your data is to align it to an existing reference. Even for that you are going to need anywhere from 6 to more than 30 GB of free RAM. You are likely to have very low coverage (if this is a whole genome sequencing experiment).
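
To put a number on "very low coverage": expected coverage is roughly (number of reads × read length) / genome size. A minimal sketch follows. The read count, read length, and genome size in the usage note are illustrative assumptions, not values from this run, and `expected_coverage` is a made-up helper name.

```python
def expected_coverage(n_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size
```

For example, with an assumed 20 million reads of 300 bases against an assumed 3 Gb mammalian genome, `expected_coverage(20_000_000, 300, 3_000_000_000)` gives 2.0, i.e. about 2x. If the run was split across several samples, each sample gets only its share of that.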

ADD REPLY • link 8.5 years ago by GenoMax 154k