Hi,
I have a FASTQ file generated by whole-genome sequencing of C. elegans samples, and I want to find at which site (or sites) a construct/plasmid has been integrated in the genome, comparing my sequence with a published genome in Ensembl (WB235).
I have a C. elegans strain with a construct integrated in the genome. From the construct design, I expect it to integrate into chromosome III of C. elegans. Using a typical SNP and indel workflow with VarScan, I managed to identify variants in all samples. I also analyzed CNVs with the BreakDancer tool, restricted to chr III.
But my question is: how can I find where the construct is integrated? I would like the position (start and end) relative to the reference genome. Is that possible?
Thanks for your help.
If I were you, I think I would do something like below.
Sorry, but I didn't understand... I have the construct sequence in FASTA format, but I would like to find out how this sequence is integrated in several sample FASTQ files. I don't have any flanking sequences... so what can I do?
If the plasmid was integrated into the genome and you ran whole-genome sequencing, some reads will contain hybrid sequences (part plasmid, part genomic sequence).
So go through the reads that contain plasmid sequence; some of them will be hybrid reads, and their genomic part is the "flanking sequence" I mentioned.
Extract those reads. Remove the plasmid sequence. Then align the remaining flanking sequences to the genome to obtain the genomic coordinates.
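To make the "remove plasmid sequences" step concrete, here is a minimal Python sketch. It assumes (this is not stated above) that the reads were aligned to the construct with a local aligner that reports the non-matching end of a hybrid read as a soft clip (`S`) in the SAM CIGAR string. The function name `soft_clipped_flanks` and the `min_len` cutoff of 20 bases are hypothetical choices for illustration only.

```python
import re

# One CIGAR element: a length followed by an operation code (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_flanks(seq, cigar, min_len=20):
    """Return (left_flank, right_flank) for a read aligned to the construct.

    A flank is the soft-clipped end of the read, i.e. the part that did NOT
    align to the construct and may therefore be genomic sequence.
    A flank is None when there is no clip or it is shorter than min_len.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard-clipped bases (H) are not stored in SEQ, so ignore them here.
    ops = [(n, op) for n, op in ops if op != "H"]

    left = right = None
    if ops and ops[0][1] == "S" and ops[0][0] >= min_len:
        left = seq[:ops[0][0]]
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_len:
        right = seq[-ops[-1][0]:]
    return left, right
```

The returned flanks could then be written to a FASTA file and aligned to the C. elegans reference. Very short clips are skipped because they rarely map uniquely; the right cutoff depends on your read length and is a judgment call.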
Hi mbk0asis, so, to sum up, you advise me to: 1- map the FASTQ files to the construct and extract the unmapped paired-end reads; 2- take the unmapped reads and remap them to the C. elegans genome, which will give me the genomic coordinates of the unmapped reads? Thanks
Map the FASTQ reads to the construct.
Extract the mapped reads.
Trim off the plasmid sequences.
Map the remaining genomic part of each read to the reference genome.
That's my rough idea.
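For what it's worth, the four steps above could be strung together roughly like this. This is only a sketch: the thread does not name any aligner, so using `bwa mem` and `samtools` is my assumption, and the function name, file names and `prefix` argument are hypothetical. The function only builds the shell commands as strings so you can inspect them before running anything.

```python
import shlex


def integration_site_commands(construct_fa, genome_fa, reads_1, reads_2, prefix):
    """Build shell commands for the map / extract / trim / remap idea.

    Assumes bwa and samtools are installed. The trimming step is not a shell
    command: the flanks FASTA file must be produced from the construct BAM
    (e.g. from soft-clipped read ends) before the last three commands run.
    """
    q = shlex.quote
    construct_bam = f"{prefix}.construct.bam"
    flanks_fa = f"{prefix}.flanks.fa"    # hypothetical file holding the trimmed flanks
    flanks_bam = f"{prefix}.flanks.bam"
    return [
        # Step 1: map the reads to the construct.
        f"bwa index {q(construct_fa)}",
        # Step 2: keep only mapped reads (-F 4 drops reads flagged as unmapped).
        f"bwa mem {q(construct_fa)} {q(reads_1)} {q(reads_2)}"
        f" | samtools view -b -F 4 -o {q(construct_bam)} -",
        # Step 3 (trimming) happens outside the shell and writes flanks_fa.
        # Step 4: map the trimmed flanks to the reference genome.
        f"bwa index {q(genome_fa)}",
        f"bwa mem {q(genome_fa)} {q(flanks_fa)}"
        f" | samtools sort -o {q(flanks_bam)} -",
        f"samtools index {q(flanks_bam)}",
    ]
```

Calling `integration_site_commands("construct.fa", "genome.fa", "sample_1.fq", "sample_2.fq", "sample")` and printing the result shows the commands for one sample. After the last step, columns 3 and 4 of `samtools view` on the flanks BAM give the chromosome and leftmost position of each mapped flank, which is where the junction with the construct should be.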