Question

How to get a plasmid or construct sequence from a FASTQ file in C. elegans whole-genome sequencing?

0

6.5 years ago

GiV17 ▴ 50

Hi,

I have a FASTQ file generated by whole-genome sequencing of C. elegans samples, and I want to find the site (or sites) at which a construct/plasmid has integrated into the genome, by comparing my sequence with the published genome in Ensembl (WB235).

I have a C. elegans strain with a construct integrated in the genome. From the construct design, I expect it to have integrated into chromosome III of C. elegans. Using a typical SNP and indel workflow with VarScan, I managed to identify variants in all samples. I also analyzed CNVs with the BreakDancer tool, on chr III only.

But my question is: how can I find where the construct is integrated? I would like its position (start and end) relative to the reference genome. Is this possible?

Thanks to all.

sequencing sequence R • 3.5k views

ADD COMMENT • link updated 6.5 years ago by Joe 21k • written 6.5 years ago by GiV17 ▴ 50

1

If I were you, I think I would do something like the following.

Extract the sequences flanking the plasmid from the whole-genome sequencing FASTQ files.
BLAT the flanking sequences against the genome.

ADD REPLY • link 6.5 years ago by mbk0asis ▴ 700

0

Sorry, but I didn't understand... I have the construct sequence in FASTA format, but I would like to find out how this sequence is integrated in each of several sample FASTQ files. I don't have any flanking sequences, so what can I do?

ADD REPLY • link 6.5 years ago by GiV17 ▴ 50

1

If the plasmids were integrated into the genome and you ran whole genome sequencing, certain reads will have hybrid sequences (part of plasmid + part of genomic sequence).

Then, you go through the reads that contain plasmid sequence, and some of them will have the hybrid sequences, which contain the "flanking sequence" I mentioned.

Extract those reads. Remove the plasmid sequence. Then align the flanking sequences to the genome to obtain the genomic coordinates.
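As a rough illustration of the "remove plasmid sequences" step, here is a minimal Python sketch. It assumes exact matches, forward-strand reads only, and a junction exactly at the first or last base of the construct. The function name `split_hybrid_read` and the `anchor` parameter are hypothetical; a real analysis would more likely rely on an aligner that tolerates mismatches.

```python
def split_hybrid_read(read, construct, anchor=12):
    """Return (side, genomic_part) if the read crosses a construct end, else None.

    Assumption: exact matching on the forward strand only; 'anchor' is the
    number of construct bases used to locate each end (hypothetical default).
    """
    head = construct[:anchor]    # first bases of the construct
    tail = construct[-anchor:]   # last bases of the construct

    # Junction at the construct start: genomic bases come before the head,
    # and everything from the head onward must be a prefix of the construct.
    i = read.find(head)
    if i > 0 and construct.startswith(read[i:]):
        return ("upstream", read[:i])

    # Junction at the construct end: genomic bases come after the tail,
    # and everything up to the tail's end must be a suffix of the construct.
    j = read.rfind(tail)
    if j != -1:
        end = j + anchor
        if end < len(read) and construct.endswith(read[:end]):
            return ("downstream", read[end:])

    return None
```

The returned genomic part is what would then be aligned to the reference; very short parts (a few bases) will not map uniquely and are best discarded.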

ADD REPLY • link 6.5 years ago by mbk0asis ▴ 700

0

Hi mbk0asis, so, to sum up, you advise me to: 1- map the FASTQ files to the construct and extract the unmapped paired-end reads; 2- take the unmapped reads and remap them to the C. elegans genome, which would give me the genomic coordinates of the unmapped reads? Thanks

ADD REPLY • link 6.5 years ago by GiV17 ▴ 50

0

Map the FASTQ reads to the construct.

Extract the mapped reads.

Trim off the plasmid sequence.

Map the remaining genomic part of each read to the reference genome.

That's my rough idea.
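One way to sketch the trimming step: if the reads are mapped to the construct with a local aligner that soft-clips unaligned ends (for example `bwa mem`, or `bowtie2 --local`), the soft-clipped part of a mapped read is the candidate genomic flank. The snippet below is only an illustration on plain SAM text lines; the function name `soft_clipped_parts` and the `min_len` cutoff are assumptions, not part of the suggestion above.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_parts(sam_line, min_len=20):
    """Return [(side, sequence)] for soft-clipped ends of one SAM record.

    Assumption: the read was aligned to the construct, so clipped bases are
    the part that did NOT match the construct. 'min_len' is a hypothetical
    cutoff for flanks long enough to remap.
    """
    if sam_line.startswith("@"):          # header line, no alignment
        return []
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]     # SAM columns 6 (CIGAR) and 10 (SEQ)
    if cigar == "*" or seq == "*":        # unmapped or sequence not stored
        return []

    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases (H) are absent from SEQ, so ignore them here.
    core = [(n, op) for n, op in ops if op != "H"]

    parts = []
    if core and core[0][1] == "S" and core[0][0] >= min_len:
        parts.append(("left", seq[:core[0][0]]))
    if len(core) > 1 and core[-1][1] == "S" and core[-1][0] >= min_len:
        parts.append(("right", seq[-core[-1][0]:]))
    return parts
```

The clipped sequences can be written out as FASTA and mapped to the reference genome; where they pile up marks a candidate junction.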

ADD REPLY • link 6.5 years ago by mbk0asis ▴ 700

score 1 · Answer 1 · 2018-06-19

Here's an approach which might work:

Align all the reads in your FASTQ to your known plasmid sequence (you might need to experiment with the stringency).
De novo assemble the reads retained by that alignment to see whether you regenerate the complete plasmid sequence.
Hopefully, the reads which span the very edges of the plasmid in the genome will be retained in your new assembly (assuming they weren't thrown out by the mapping step because it was too stringent).
Take the flanks of your new assembly, which with any luck will be a nice single contig containing a small amount of joining sequence.
BLAST (or similar) your flanking sequences back against the reference/target genome, which will give you the positions of insertion.

The only problem I can foresee (other than not enough reads being retained after alignment, as I mentioned) is if the flanking sequences are quite repetitive, in which case you might end up identifying multiple places within the genome.
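A quick way to see whether a flank is repetitive is to count its exact occurrences in the reference on both strands. The sketch below assumes the reference has already been loaded into a dict of chromosome name to sequence (the `genome` argument and the function names are hypothetical). It only finds exact matches, so it is a sanity check rather than a substitute for BLAST.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def exact_hits(flank, genome):
    """Return (chromosome, 0-based start, strand) for every exact match.

    Assumption: 'genome' is a dict {chromosome_name: sequence} that fits
    in memory; mismatches and indels are not tolerated.
    """
    queries = [("+", flank.upper())]
    rc = revcomp(flank).upper()
    if rc != flank.upper():               # do not report palindromes twice
        queries.append(("-", rc))

    hits = []
    for chrom, seq in genome.items():
        seq = seq.upper()
        for strand, query in queries:
            start = seq.find(query)
            while start != -1:
                hits.append((chrom, start, strand))
                start = seq.find(query, start + 1)   # allow overlapping hits
    return hits
```

More than one hit suggests the flank alone cannot place the insertion, and a longer flank or the flank from the other side of the construct may be needed.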

Now here's what I would actually have done:

Design some primers internal to your plasmid pointing out along the genome (if you know how the plasmid integrates), then just send the DNA+Primer for Sanger sequencing for about $3. Basically end-sequence your joins, and you'd get more than enough sequence back that way to be certain of where the plasmid is.
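To illustrate what "internal primers pointing out" means in sequence terms, here is a minimal sketch that picks one primer near each end of the construct. The function name, the `length` and `inset` defaults are assumptions for illustration only; it does no melting-temperature, GC-content or specificity checks, which a primer-design tool would handle.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def outward_primers(construct, length=20, inset=30):
    """Return (left_primer, right_primer), both written 5'->3'.

    Assumption: 'inset' bases are left between each primer and the construct
    end so the Sanger read has settled before it crosses the junction.
    """
    if len(construct) < 2 * (inset + length):
        raise ValueError("construct too short for the requested inset and length")

    # Near the construct start the primer must extend leftwards, out of the
    # construct, so it is the reverse complement of the top strand.
    left = revcomp(construct[inset:inset + length])

    # Near the construct end the top strand already extends rightwards.
    right_start = len(construct) - inset - length
    right = construct[right_start:right_start + length]
    return left, right
```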

score 0 · Answer 2 · 2018-06-19

0

6.5 years ago

Lisa Ha ▴ 120

You can get the flanking sequences by extracting the reads that contain the beginning or end of the construct. Then you can search for the flanking sequences in the genome. To verify the position, you can do a PCR on the strain containing the construct.

ADD COMMENT • link 6.5 years ago by Lisa Ha ▴ 120

0

Hi Lisa Ha, thank you for your answer, but I would like a bioinformatics tool to find the position of the construct relative to the genome... I do not know if this is possible...

ADD REPLY • link 6.5 years ago by GiV17 ▴ 50

0

You are unlikely to find a tool 'ready made' for this. You're going to have to get your hands dirty.

ADD REPLY • link 6.5 years ago by Joe 21k

0

Of course... it is clear! In fact I do not want a single tool but at least an idea for a strategy.

ADD REPLY • link 6.5 years ago by GiV17 ▴ 50

0

I doubt there is a single tool that does exactly what you want. You're going to have to put in a bit of effort. You can use grep on the command line to extract the reads that contain parts of your construct. Then map these to the genome and look at where the reads align (and stop aligning) with a genome viewer, something like IGV or the online Ensembl browser.
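A plain grep on one strand misses reads sequenced from the opposite strand, so a small script may be easier to get right. The following Python sketch does the same kind of filtering, keeping reads that contain the first or last `k` bases of the construct in either orientation. The function names, the value of `k`, and the tiny in-memory FASTQ are all made up for illustration; it assumes an uncompressed FASTQ with four lines per record.

```python
import io

def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:                 # end of file
            return
        if not header.strip():         # skip stray blank lines
            continue
        seq = handle.readline().strip()
        handle.readline()              # '+' separator line
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual

def reads_with_construct_end(handle, construct, k=20):
    """Yield reads containing either construct end (k bases), on either strand."""
    baits = {construct[:k], construct[-k:]}
    baits |= {revcomp(b) for b in baits}
    for name, seq, qual in read_fastq(handle):
        if any(bait in seq for bait in baits):
            yield name, seq, qual

# Toy demonstration with a made-up construct and three made-up reads.
DEMO_CONSTRUCT = "AACCGTTTGGCATCGA"
DEMO_FASTQ = (
    "@r1\nGGGGAACCGTT\n+\nIIIIIIIIIII\n"
    "@r2\nTTTTTTTTTT\n+\nIIIIIIIIII\n"
    "@r3\nCCTCGATCC\n+\nIIIIIIIII\n"
)
kept = [name for name, _, _ in
        reads_with_construct_end(io.StringIO(DEMO_FASTQ), DEMO_CONSTRUCT, k=5)]
```

For a real file, the handle would come from `open(...)` on the FASTQ path instead of `io.StringIO`. Reads that overlap the construct by fewer than `k` bases are missed, so `k` trades sensitivity against spurious matches.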

ADD REPLY • link 6.5 years ago by Lisa Ha ▴ 120