Question

fastq to fasta conversion

1

6.8 years ago

monicasteffi ▴ 10

This is a very basic question. I converted a fastq file to fasta using the following command:

 seqtk seq -a combined_RP.fastq.gz > RP.fasta

When I open the fasta file, I get the following lines:

>E00502:101:ZA170417199:8:1101:27904:1731 1:N:0:ATGGGC
GTCNGTGAACTAGAAAATTTCTTGAAGTTGGAACCGCAAGTATTTGTTACCAATCCTCCTCAAAGTAGTATATGGCAAGAACTT....
>E00502:101:ZA170417199:8:1101:28168:1731 1:N:0:ATGAGC
TTGNAGTTTCAGTCAAAATCTAACTATTAAAATAAGGAATTTAAAACCTTACTCGCGCAGCATCCCGATCGCGGTGAGGTCAC...
>E00502:101:ZA170417199:8:1101:28716:1731 1:N:0:ATGAGC
AATNGGTTTTACTTTAATTTCTCTACTTCTATACTCTGTACATAATGTAATTAAGGGTGAATGAAGGGGTCACTAACAC....

My next step will be ab initio gene identification. Can I proceed with this fasta file, or how do I get a fasta file with one continuous stretch of sequence? Thank you in advance.

fasta fastq GO • 5.1k views

ADD COMMENT • link updated 5.9 years ago by swbarnes2 14k • written 6.8 years ago by monicasteffi ▴ 10

0

You now have reads, which are not the full genome fasta. You first need to perform a de novo assembly to combine the reads into a genome fasta; then you can do gene identification.
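
To see why the converted file is still just reads: a fastq-to-fasta conversion only renames each header and drops the quality lines, so every read stays its own record. A minimal awk sketch of that idea, assuming plain 4-line fastq records with no wrapped sequence lines (this is an illustration, not how seqtk is implemented, and `fastq_to_fasta` is a made-up helper name):

```shell
# Sketch: convert 4-line-per-record FASTQ on stdin to FASTA on stdout.
# Assumes each record is exactly 4 lines (header, sequence, "+", quality).
# fastq_to_fasta is a hypothetical helper name, not part of any tool.
fastq_to_fasta() {
    awk '
        NR % 4 == 1 { print ">" substr($0, 2) }   # header: swap the leading "@" for ">"
        NR % 4 == 2 { print }                     # sequence line, copied unchanged
    '
}

# Usage with the file names from the question:
#   gzip -dc combined_RP.fastq.gz | fastq_to_fasta > RP.fasta
```

One read in, one fasta record out; nothing is joined together along the way.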

ADD REPLY • link 6.8 years ago by WouterDeCoster 47k

0

I have a similar issue. I used BWA to generate a mapped genome assembly and called a consensus sequence (.fq). I wanted to run QUAST on my mapped assembly, but couldn't get quast to work with my bam files so I tried to convert my fq file into a fasta file and got a similar result. Although it looks like multiple reads or sequences, I believe it may be my data mapped to scaffolds? The names (associated with '>') match the reference scaffold names. I too had hoped to get a continuous stretch of sequences to input into quast, but am not sure how...

ADD REPLY • link 5.9 years ago by jcolella.jc • 0

0

and called a consensus sequence (.fq)

How?

ADD REPLY • link 5.9 years ago by WouterDeCoster 47k

score 2 · Answer 1 · 2018-03-15

2

6.8 years ago

lakhujanivijay 5.9k

Reads are short sequenced chunks of the actual stretch of DNA whose sequence you want. Consider this example for paired-end reads.

The middle line is the fragment of DNA you want to sequence; R1 and R2 are the reads.

R1
-------------->
--------------------------------------------------
                                    <------------- R2

Hence, the first step is to assemble the reads into contigs/scaffolds, which represent your assembly. Only after that can you predict genes. Reads are too short to be used for gene prediction.
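
If you are unsure whether a fasta file holds reads or assembled sequence, counting the records and checking their lengths is a quick sanity check. A rough sketch in awk (`fasta_stats` is a made-up helper name; it assumes a standard fasta stream where sequences may be wrapped over several lines):

```shell
# Sketch: summarise a FASTA stream on stdin as record count, mean and max length.
# fasta_stats is a hypothetical helper name.
fasta_stats() {
    awk '
        /^>/ {                          # a header line starts a new record
            if (n > 0) { total += len; if (len > max) max = len }
            n++; len = 0; next
        }
        { len += length($0) }           # sequence may span several lines
        END {
            if (n > 0) {
                total += len; if (len > max) max = len
                printf "records=%d mean_len=%.1f max_len=%d\n", n, total / n, max
            } else {
                print "records=0"
            }
        }'
}

# Usage:
#   fasta_stats < RP.fasta
```

A file of raw reads will typically show a very large record count with short maximum length, whereas an assembly usually has far fewer, much longer records.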

ADD COMMENT • link 6.8 years ago by lakhujanivijay 5.9k

0

Thank you for the reply. I've run bowtie against a reference and got a SAM file. Can I proceed further with the SAM file?

ADD REPLY • link 6.8 years ago by monicasteffi ▴ 10

1

It appears to me that you are new to NGS. Please confirm; that will help us direct you.

I recommend that you go through this paper

ADD REPLY • link 6.8 years ago by lakhujanivijay 5.9k

score 0 · Answer 2 · 2019-02-08

You have successfully converted your fastq of short reads into a fasta of short reads. Conversion is not your problem.

But that's not what you want. You want to map your fastqs, which I guess you've done, and then use software that takes all the reads and their mapping coordinates and makes a consensus sequence from them. Look for software that will make a consensus sequence from your .bam.
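
One possible route, as a sketch only: recent samtools releases (1.16 or later, as far as I know) include a `consensus` subcommand that writes one consensus record per reference sequence from a coordinate-sorted alignment file. The helper name `bam_to_consensus` and the file names below are assumptions for illustration, and the default consensus settings may not suit every data set.

```shell
# Sketch: build a consensus FASTA from a SAM/BAM file with samtools.
# Assumes a samtools version that provides the "consensus" subcommand.
# bam_to_consensus is a hypothetical helper name.
bam_to_consensus() {
    in_aln=$1      # SAM or BAM produced by the mapper (e.g. bowtie or BWA)
    out_fa=$2      # consensus FASTA to write

    [ -f "$in_aln" ] || { echo "input not found: $in_aln" >&2; return 1; }
    command -v samtools >/dev/null 2>&1 || { echo "samtools not on PATH" >&2; return 1; }

    sorted="${out_fa%.fa}.sorted.bam"
    # Coordinate-sort first, then call the consensus in FASTA format.
    samtools sort -o "$sorted" "$in_aln" &&
        samtools consensus -f fasta -o "$out_fa" "$sorted"
}

# Usage (hypothetical file names):
#   bam_to_consensus mapped.sam consensus.fa
```

With a reference made of many scaffolds, the output will still contain one record per scaffold, which is expected rather than a sign that the conversion failed.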