Question

Ways to concatenate individual DNA sequences to form the complete sequence

score 0


5.2 years ago

abdul.karim • 0

Hi,

I am very new to the field of genomics, so I apologize for the very basic question I am about to ask.

I have raw DNA sequences for many samples. For a single sample, the DNA sequence is chopped into fixed-size fragments and stored in FASTQ format.

For instance, the DNA sequence of sample A is chopped into 101,562,193 fragments, each 151 bases long.
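Each of these fragments is stored as one FASTQ record: a header line starting with `@`, the bases, a `+` separator line, and a quality string with one character per base. The sketch below reads such records; it assumes an uncompressed file with exactly four lines per record (no line wrapping), and the names `FastqRecord` and `read_fastq` are hypothetical, not from any library.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str  # header line without the leading '@'
    seq: str   # the read's bases
    qual: str  # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an uncompressed, unwrapped (4-line-per-record) FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (a blank line is also treated as the end)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


# Tiny in-memory example with two made-up reads.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nTTGCA\n+\nIIIII\n"
)
records = list(read_fastq(example))
print(len(records), [len(r.seq) for r in records])  # 2 [8, 5]
```

Sequencer output is usually gzip-compressed; in that case the same function can be fed a text-mode handle from `gzip.open(path, "rt")`. Note that nothing in a record says where in the genome the read came from.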

Is there any way I can concatenate the fragments in the right order to reconstruct the whole DNA string?

Or is that not possible?

RNA-Seq rna-seq sequencing gene Assembly • 1.6k views

updated 5.2 years ago by swbarnes2 14k • written 5.2 years ago by abdul.karim • 0

score 2 · Accepted Answer · 2019-09-06

Hi @abdul.karim

It's not as easy as concatenating them to reconstruct the original DNA sequence. What you have is the output of sequencing a sample on an NGS sequencer (most probably an Illumina one), and each of your fragments is called a read. You should start by mapping the reads to the genome, that is, finding the most probable location in the genome that each sequenced molecule came from. To do that you need a read mapper; take a look at BWA, a widely used one.
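To make "mapping" concrete, here is a deliberately naive seed-and-check mapper in Python. It is not how BWA works internally (BWA uses a Burrows-Wheeler/FM index and a far more elaborate alignment step); it is a toy that assumes a hash index of k-mers, the forward strand only, and mismatches but no insertions or deletions. All function names and the reference string are made up for illustration.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the 0-based positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, index: dict[str, list[int]],
             k: int, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Seed with the read's first k-mer, then check the full read at each hit.

    Returns (position, mismatches) pairs, best (fewest mismatches) first.
    Forward strand only, mismatches only (no insertions/deletions).
    """
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run past the end of the reference
        mismatches = hamming(read, window)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return sorted(hits, key=lambda hit: hit[1])


reference = "ACGTTGCAAGGCTTACGATCGGATCCA"  # made-up toy "genome"
index = build_kmer_index(reference, k=4)
print(map_read("GGCTTACC", reference, index, k=4))  # [(9, 1)]
```

The read lands at position 9 with one mismatch. A read whose first k bases contain an error would be missed entirely here, which is one of the many reasons real mappers use multiple seeds and proper alignment scoring.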

However, I would recommend reading a few tutorials and seeking help from colleagues before starting. This will help you get up to speed much faster and avoid many of the common mistakes we all made at the beginning.


score 2


5.2 years ago

swbarnes2 14k

Is there any way I can concatenate the fragments in right order to reconstruct the whole DNA string?

Sure, you could concatenate them, but as bernatgel said, this is almost certainly output from an Illumina sequencer, and the reads carry no information about where in the genome they came from. Simply concatenating them would produce nonsense.
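A quick way to see why: reads are sampled from random, overlapping positions, and the same region is typically covered many times. In the question, 101,562,193 reads × 151 bases is about 15.3 billion bases; if the sample were human (the question doesn't say), that would be roughly 5 times the ~3.1 Gb genome. The simplified simulation below (forward strand only, no sequencing errors, single-end reads, made-up genome) mimics ~5x coverage and shows that gluing reads together in file order does not recover the genome.

```python
import random


def simulate_reads(genome: str, n_reads: int, read_len: int, seed: int = 0) -> list[str]:
    """Sample reads from uniformly random start positions (a simplified shotgun
    model: forward strand only, no sequencing errors)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(1000))  # made-up 1 kb genome
reads = simulate_reads(genome, n_reads=50, read_len=100)    # ~5x coverage

naive = "".join(reads)  # reads glued together in file order
print(len(genome), len(naive))  # 1000 5000
print(naive == genome)          # False
```

Every individual read is a true piece of the genome, but their order in the file is arbitrary and they overlap, so the concatenation is five times too long and scrambled.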

If you have a reference genome that is a close match, you could align the reads to it and build a consensus sequence.
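Once each read has a position on the reference, a consensus can be called column by column. The sketch below is a simple majority vote that assumes gapless alignments, ignores base qualities, and writes `N` where coverage is too low; the `(start, read)` pairs are made up and stand in for what an aligner would report. Real pipelines typically call variants and apply them to the reference with tools such as `bcftools consensus` rather than voting like this.

```python
from collections import Counter


def consensus(aligned_reads: list[tuple[int, str]], ref_len: int, min_depth: int = 1) -> str:
    """Majority-vote consensus from reads placed at 0-based reference positions.

    Positions covered by fewer than `min_depth` reads are written as 'N'.
    Assumes gapless alignments (no insertions/deletions) and ignores base qualities.
    """
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    calls = []
    for col in columns:
        if not col or sum(col.values()) < min_depth:
            calls.append("N")  # no (or too little) coverage at this position
        else:
            calls.append(col.most_common(1)[0][0])  # most frequent base wins
    return "".join(calls)


# (start, read) pairs as an aligner might report them; values are made up.
aligned = [(0, "ACGTA"), (2, "GTTCG"), (3, "TTCGA"), (2, "GTTCG")]
print(consensus(aligned, ref_len=10))  # ACGTTCGANN
```

At position 4 one read says `A` and three say `T`, so the consensus takes `T`; positions 8 and 9 have no reads and stay `N`.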

If you have no reference at all, you can try to assemble the reads de novo, which will almost certainly not give you a single resulting sequence, but many, many contigs.
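To illustrate why assembly yields contigs rather than one sequence, here is a textbook greedy overlap assembler: it repeatedly merges the two sequences with the longest suffix/prefix overlap until no pair overlaps enough. This is a swapped-in teaching technique, not what production assemblers do (tools such as SPAdes or Velvet build de Bruijn graphs and handle sequencing errors, reverse complements and repeats). It is also quadratic per merge, so it is hopeless for 100 million reads. Function names and reads are made up.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if shorter than min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Greedy overlap assembly: repeatedly merge the pair with the longest overlap.

    Stops when no pair overlaps by at least `min_overlap`; the survivors are contigs.
    Ignores sequencing errors, reverse complements and repeats.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that lie entirely inside another read.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Made-up reads from two regions of a genome, with no reads covering the gap between them.
reads = ["ATGGCGT", "GCGTACC", "TACCTTA", "GGAAATC", "AATCCCA"]
print(greedy_assemble(reads))  # ['GGAAATCCCA', 'ATGGCGTACCTTA']
```

Because nothing links the two regions, the result is two contigs, not one sequence; with real data, coverage gaps and repeats break the assembly into many more pieces.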

5.2 years ago by swbarnes2 14k

score 0


Thank you for the help. Would you please explain what a reference genome is? By aligning, do you mean that I compare each read with the reference genome?

5.2 years ago by abdul.karim • 0

score 2


What tutorials have you looked at? I bolded key words so you would know what to Google.

5.2 years ago by swbarnes2 14k