Question

First thing to do when you receive your sequenced genome of interest.

6

10.9 years ago

Medhat 9.8k

As the title suggests, I am asking for advice.

What is the first thing I should do when I receive the sequencing data for the genome I am interested in?

For example, is there a tool to check that the whole genome was sequenced properly, i.e. that no parts were left unsequenced? Is the sequence good enough that I can proceed to the next steps, or do I need to repeat something, for example if some regions are not covered? And what else should I take into consideration, from your expertise and point of view?

Thanks in advance,

Assembly genome sequence next-gen • 4.0k views

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Medhat 9.8k

Ram · Answer 1 · 2014-05-29

4

10.9 years ago

Istvan Albert 102k

Map it (or a subset of it) against the genome; that will tell you right away what the data looks like.
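To make the "subset" part concrete, here is a minimal sketch of drawing a random subset of reads from a FASTQ file before mapping. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function names `read_fastq` and `reservoir_sample` are made up for this illustration and do not come from any tool mentioned in this thread.

```python
import itertools
import random


def read_fastq(lines):
    """Yield FASTQ records as 4-tuples (header, sequence, separator, quality).

    Assumes each record is exactly four lines long.
    """
    it = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(itertools.islice(it, 4))
        if len(record) < 4:
            if any(record):
                raise ValueError("truncated FASTQ record")
            return  # clean end of input (possibly a trailing blank line)
        yield record


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random in a single pass.

    Uses reservoir sampling, so the total number of reads does not
    need to be known in advance. If fewer than k records exist,
    all of them are returned.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir
```

Typical use would be something like `reservoir_sample(read_fastq(open("reads.fastq")), 100000)`, where the file name is a placeholder. For paired-end data the two files have to be sampled consistently so that mates stay together.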

ADD COMMENT • link 10.9 years ago by Istvan Albert 102k

1

What if I do not have a reference genome?

ADD REPLY • link 10.9 years ago by Medhat 9.8k

1

Well, that's trickier. You can always try a closely related species.

Also look for and remove contaminants; we just had two situations where not dealing with contamination right away led to setbacks.

ADD REPLY • link 10.9 years ago by Istvan Albert 102k

0

About removing contamination: did you mean using BLAST, for example, to check whether there are sequences from species other than the expected one, and then removing them? Or did I misunderstand?

If I understood correctly, how do I remove the contamination once I find it?

ADD REPLY • link updated 5.3 years ago by Ram 45k • written 10.9 years ago by Medhat 9.8k

1

Yes. Align the reads to the contaminant with a short-read aligner (not BLAST, because you want to remove only reads that match very closely), then export the unaligned reads from the alignment into a new FASTQ file.
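As a rough sketch of the export step, assuming the aligner wrote an uncompressed SAM file: reads that did not align to the contaminant have bit 0x4 set in the FLAG column, so they can be filtered on that bit and written back out as FASTQ. The function name `unmapped_as_fastq` is hypothetical, and this handles single-end reads only.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_as_fastq(sam_lines):
    """Yield FASTQ-formatted text for every unmapped read in a SAM stream."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & UNMAPPED_FLAG:
            yield f"@{name}\n{seq}\n+\n{qual}\n"
```

In practice the same filter is available as `samtools view -f 4`. For paired-end data you would normally keep a pair only when both mates are unmapped, which needs the mate-unmapped bit (0x8) as well; that is left out here to keep the sketch short.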

ADD REPLY • link 10.9 years ago by Istvan Albert 102k

score 4 · Answer 2 · 2014-05-29

I would do a quality check first with FastQC or simple stats in R; sometimes the data is bad from the beginning. After that, as Istvan said, you can map your sequences to the genome (BWA, STAR or your preferred mapper), compute the average coverage, and check for uncovered or suspicious regions, such as high coverage in repetitive regions.
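For the coverage part, a minimal sketch of what "average coverage and uncovered regions" can look like in code. It assumes a three-column, tab-separated per-base depth table (chromosome, 1-based position, depth) that lists every position including zeros, which is what `samtools depth -a` produces. The function name `summarize_depth` and the `min_depth` parameter are invented for this example.

```python
def summarize_depth(lines, min_depth=1):
    """Return (mean depth, list of low-coverage intervals).

    Intervals are (chrom, start, end) tuples, 1-based and inclusive,
    covering consecutive positions whose depth is below min_depth.
    """
    total = 0
    n_positions = 0
    gaps = []
    current = None  # interval being extended: [chrom, start, end]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, depth = line.split("\t")
        pos, depth = int(pos), int(depth)
        total += depth
        n_positions += 1
        if depth < min_depth:
            if current and current[0] == chrom and current[2] == pos - 1:
                current[2] = pos  # extend the open interval
            else:
                if current:
                    gaps.append(tuple(current))
                current = [chrom, pos, pos]
        elif current:
            gaps.append(tuple(current))
            current = None
    if current:
        gaps.append(tuple(current))
    mean = total / n_positions if n_positions else 0.0
    return mean, gaps
```

The same loop could be turned around to flag unusually high coverage (for example in repeats) by comparing each depth against a multiple of the mean in a second pass.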

Ram · Answer 3 · 2014-05-29

3

10.9 years ago

xb ▴ 420

Check the quality of the sequencing reads before mapping (?), for instance using the FASTX-Toolkit, and trim (adapters/primers, if any) accordingly. Then map!
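To give an idea of what a read quality check looks at, here is a small sketch that computes the mean Phred quality at each read position, which is roughly what the per-base quality plot in QC tools summarizes. It assumes Phred+33 encoded quality strings; the function name `mean_quality_by_position` is made up for the example.

```python
def mean_quality_by_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    quality_strings: iterable of FASTQ quality lines (Phred+33 assumed).
    Reads may differ in length; each position is averaged over the
    reads that are long enough to reach it.
    """
    sums = []
    counts = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(sums):
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]
```

A steady drop in the mean toward the 3' end is the usual cue for quality trimming before mapping.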

ADD COMMENT • link updated 5.3 years ago by Ram 45k • written 10.9 years ago by xb ▴ 420

score 3 · Answer 4 · 2014-05-29

3

10.9 years ago

mikhail.shugay 3.5k

FastQC first. Then, if it is genome re-sequencing, map and check coverage metrics; for a de novo genome, assemble contigs and check metrics like N50 contig size.
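For readers who have not met N50 before: it is the contig length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal sketch, taking a plain list of contig lengths (the function name `n50` is just for illustration):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0
```

For example, contigs of lengths 2 to 10 sum to 54 bases, and the three longest (10, 9, 8) already reach 27, so the N50 is 8. A larger N50 is not automatically a better assembly, since misjoined contigs also inflate it.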

ADD COMMENT • link 10.9 years ago by mikhail.shugay 3.5k

Ram · Answer 5 · 2014-06-03

2

10.9 years ago

lexnederbragt ★ 1.3k

If you do not have a reference genome, it is hard to find regions not covered. A few tips:

Run SGA's preqc; this will tell you quite a bit about your genome and dataset.
Run assemblies and use tools like blobology to assess species content.
If your species is a eukaryote, run CEGMA to check whether the gene space of your assembly seems complete (this also helps to choose between assemblies).
If it is a bacterium, on the other hand, run iMetAMOS; it does much of the above in an automated fashion.
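One of the signals used when assessing species content is the GC content of each contig (blobology-style plots put GC against coverage). The sketch below only computes that GC fraction from an assembly FASTA; it is not a reimplementation of any of the tools listed above, and the function name `gc_by_contig` is made up for the example.

```python
def gc_by_contig(fasta_lines):
    """Return {contig_name: GC fraction} for a FASTA stream.

    Only A, C, G and T are counted, so N and other ambiguity
    codes do not affect the fraction.
    """
    result = {}
    name = None
    gc = total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                result[name] = gc / total if total else 0.0
            parts = line[1:].split()
            name = parts[0] if parts else ""
            gc = total = 0
        else:
            seq = line.upper()
            gc += seq.count("G") + seq.count("C")
            total += sum(seq.count(base) for base in "ACGT")
    if name is not None:
        result[name] = gc / total if total else 0.0
    return result
```

Contigs whose GC fraction sits far from the bulk of the assembly are worth a closer look as possible contaminants.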

ADD COMMENT • link updated 5.3 years ago by Ram 45k • written 10.9 years ago by lexnederbragt ★ 1.3k

0

+1, very informative. But do you have any advice for when I'm dealing with a plant genome?

ADD REPLY • link 10.9 years ago by Medhat 9.8k

Ram · Answer 6 · 2014-05-31

1

10.9 years ago

Prakki Rama ★ 2.7k

Adapters can make an assembly a real mess. Sometimes partial adapters are also present in the reads, so it is better to trim them, at least as far as we can. If the adapter sequences are not known, requesting them from the sequencing center and experimenting with them is worthwhile.
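To illustrate the partial-adapter case, here is a minimal sketch of 3' adapter trimming with exact matching only. It removes the adapter wherever it occurs in full, and otherwise checks whether the end of the read matches the start of the adapter. The function name `trim_3prime_adapter` and the `min_overlap` parameter are invented for this example; real trimmers also tolerate mismatches, which this does not.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Return the read with a full or partial 3' adapter removed.

    Exact matching only. A partial adapter is a prefix of the adapter,
    at least min_overlap bases long, found at the very end of the read.
    """
    # Case 1: the whole adapter occurs somewhere in the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends part-way through the adapter.
    longest = min(len(read), len(adapter) - 1)
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read
```

When applying this to FASTQ records, the quality string has to be cut to the same length as the trimmed sequence. A small `min_overlap` removes more partial adapters but also clips some genuine sequence by chance.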

ADD COMMENT • link 10.9 years ago by Prakki Rama ★ 2.7k

0

If you are working with Illumina, you can check against this: http://supportres.illumina.com/documents/documentation/chemistry_documentation/experiment-design/illumina-customer-sequence-letter.pdf

ADD REPLY • link updated 5.3 years ago by Ram 45k • written 10.9 years ago by Biomonika (Noolean) 3.2k