Question

How to perform sequence validation of NGS data?

5.9 years ago

abdul.suboor123 • 0

I have analysed circDNA NGS data and am now validating the circDNA sequence reads with Sanger sequencing. This is my fifth attempt: in Notepad++, I put the junction site ("five bases from the start and five bases from the end" of my circDNA sequence read) into the "Find" option, but it doesn't match. Please tell me a better way to validate the sequence. Thanks.
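
A minimal Python sketch of that search, assuming the circDNA sequence and the Sanger read are available as plain strings. The junction probe is built as the last five bases followed by the first five bases (the convention described later in this thread), and it is searched on both strands, because a Sanger read can come from either primer. `revcomp`, `junction_kmer` and `find_junction` are hypothetical helper names, not part of any library.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def junction_kmer(circ_seq: str, k: int = 5) -> str:
    """Last k bases followed by the first k bases: the sequence across the circle's junction."""
    seq = circ_seq.upper()
    return seq[-k:] + seq[:k]


def find_junction(circ_seq: str, sanger_seq: str, k: int = 5) -> list[tuple[str, int]]:
    """Return (strand, 0-based start) for every exact junction hit in the Sanger read.

    '+' means the junction was found as written; '-' means its reverse complement
    was found, i.e. the read comes from the opposite strand.
    """
    probe = junction_kmer(circ_seq, k)
    read = sanger_seq.upper()
    hits = []
    for strand, motif in (("+", probe), ("-", revcomp(probe))):
        pos = read.find(motif)
        while pos != -1:  # keep searching so repeated or overlapping hits are all reported
            hits.append((strand, pos))
            pos = read.find(motif, pos + 1)
    return hits
```

For example, `find_junction("AAAAACGCGCGCGCCCCC", "TTCCCCCAAAAAGG")` returns `[('+', 2)]`. An exact search like this still fails on a single miscalled base, so an empty result is not by itself proof that the junction is absent.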

circDNA-seq

5.9 years ago by abdul.suboor123 • 0

I think the best way would be to use Python or Perl to process your text files as needed.

Could you please provide an example of what you want to do?

5.9 years ago by Bastien Hervé 5.9k

My objective is to validate my circDNA data. If there is any other way, please tell me. Can you tell me about a Python or Perl script for validation?

5.9 years ago by abdul.suboor123 • 0

What do you mean by "validate your circDNA"?

Could you post an example of what you are currently trying:

I put the junction site "five bases from start and five bases from end" of my circDNA sequence read

5.9 years ago by Bastien Hervé 5.9k

For example: I sequenced circDNA samples from maize and analysed the data with two tools, CIRI and CIRCexplorer2, which reported 150 circDNAs (CIRI) and 188 circDNAs (CIRCexplorer2). Now I want to confirm whether each circDNA sequence is really a maize line sequence or not. For this, I visualized the reads with bedtools, took some junction reads from the circDNA results, designed primers with SnapGene and Primer3Plus, ran PCR with the primers and control DNA, and performed Sanger sequencing. After this, I searched in Notepad++ for the junction site in the Sanger sequence, but I haven't found the junction site sequence.

5.9 years ago by abdul.suboor123 • 0

I want to confirm that if the sequence of circDNA is really a maize line sequence or not

Did you try using BLAST?

This part needs a fuller explanation (note that Notepad++, WordPad, etc. should be avoided for reading large files such as FASTQ):

Now after this I search with notepad++ about the junction site presence in the sanger sequence, but I haven't found junction site sequence.

Which junction site? What size? What do you want to do with these junctions?
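
Instead of opening a FASTQ file in a text editor, a short script can stream it record by record without loading it into memory. The sketch below assumes standard four-line FASTQ records (optionally gzip-compressed) and reports only exact, forward-strand matches; `read_fastq` and `reads_with_motif` are illustrative names. For anything beyond a quick check, an established parser such as Biopython's `SeqIO.parse(handle, "fastq")` is the more robust choice.

```python
import gzip
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a FASTQ stream with 4-line records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        seq = handle.readline().strip()
        handle.readline()  # the '+' separator line
        qual = handle.readline().strip()
        # Keep only the read ID (first whitespace-separated token of the header).
        yield header[1:].split()[0], seq, qual


def reads_with_motif(path: str, motif: str) -> list[str]:
    """IDs of reads containing `motif` exactly (forward strand only)."""
    opener = gzip.open if path.endswith(".gz") else open
    motif = motif.upper()
    hits = []
    with opener(path, "rt") as handle:  # text mode for both plain and gzipped files
        for read_id, seq, _qual in read_fastq(handle):
            if motif in seq.upper():
                hits.append(read_id)
    return hits
```

A call such as `reads_with_motif("sample_R1.fastq.gz", "CCCCCAAAAA")` (hypothetical file name) lists the reads that contain a given junction sequence.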

5.9 years ago by Bastien Hervé 5.9k

I have tried BLAST, and it shows a more than 90% match to the maize genome, but I want to validate circularity: is the molecule circular or not? I know that the sequence is right, but I actually need to confirm the circularity of the DNA sequences reported by the two tools above. I am confused because my professor and my lab mate, who applied this method to circRNA, told me to use it for validation. The junction site is the site where the two ends of the sequence meet and create a circle.

5.9 years ago by abdul.suboor123 • 0

As an example, let's say you know that:

Start of the expected circular DNA : AAAAA

End of the expected circular DNA : CCCCC

Read1 : GCTATATAAAAACCCCCGCTAGCGT

Read2 : AAAAAGCATGCTAGCTATTACCCCC

Read3 : GCATGCAAAAACGTATGCTACCCCC

Read4 : GTCAGTCGATCGATGCGTGTCCCCC

Read5 : AAAAAGTCAGTCGATCGATGCGTGT

Is Read1 circular and the others not?

5.9 years ago by Bastien Hervé 5.9k

Yes, you are right. If AAAAACGCGCGCGCCCCC is the circDNA sequence, selecting the circular option in SnapGene creates a junction site, i.e., it connects the end of the circDNA sequence to its start, so the junction site of this sequence is CCCCCAAAAA. These ten bases are what I want to confirm in the Sanger sequence, but how? That's what I want to know.
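
Because the molecule is circular, a read that covers the junction is not found in the linear sequence, but it is found once the sequence is allowed to wrap from its last base back to its first. A minimal sketch of that check, assuming exact matches on the forward strand only (`circular_hits` and `crosses_junction` are hypothetical names), using the toy sequence from this reply:

```python
def circular_hits(circ_seq: str, read: str) -> list[int]:
    """0-based start positions where `read` matches the circular sequence exactly (forward strand)."""
    seq, read = circ_seq.upper(), read.upper()
    if not read or len(read) > len(seq):
        return []
    # Append the first len(read) - 1 bases so a match can wrap from the last base to the first.
    # Every match then starts inside the original sequence, so each position is reported once.
    extended = seq + seq[: len(read) - 1]
    hits = []
    pos = extended.find(read)
    while pos != -1:
        hits.append(pos)
        pos = extended.find(read, pos + 1)
    return hits


def crosses_junction(circ_seq: str, read: str) -> bool:
    """True if at least one circular match of `read` runs past the last base into the first."""
    return any(start + len(read) > len(circ_seq) for start in circular_hits(circ_seq, read))
```

With `"AAAAACGCGCGCGCCCCC"`, the read `"GCCCCCAAAAACG"` is not in the linear string, but `crosses_junction` returns `True` because it only exists across the CCCCC|AAAAA junction, whereas `"CGCGCG"` matches only inside the linear sequence.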

5.9 years ago by abdul.suboor123 • 0

Could you share example data (the Sanger sequence and the output from SnapGene)?

5.9 years ago by Bastien Hervé 5.9k

For example, the SnapGene sequence is:

CTTGAAGTTATTGATAACATACTCTTAAAAATGACTGAGGAAGAATCTGCTGTGGCCGCTGCTAGCACAGGCACTGAAAAGGGGAAAAAACAAGCTGAAGACATTTTGGAGGGTGAAGATTTCGAATTTCAAGATCTACTTGGGCAAGAGCTGACAGACGCTGAAAAAGCAGAGCTTAAAAGATGTGCCATAGCCTGCGGATATAAGCCAGGGGCTACACTATTTGGTGGGGTTAACGAAGGAAAGCTGAGGTGCCTTCGAAACCGCAGCGAAGCTAAAATTGTTAGAACTCTCTGCAAAAACATAGGCTTGCCAAAGCTGGAAGTGGACCTCTGTCGTTACCAATGGCACCATATCGCCGGAAGTTTGCTTTATGCTAACTTCAAGGTAACAAATATTTTTGCTATTTTATTATTATCTTTTAGGTCGTTTTCTAACGAAGGTCTTTTCGACAGAGCATACTGTTAAGTAAAGTTCTTAAAATGCAACAAGATCTCGAAGAAGAGAAGAACAAAGCCATAATCCAAAATTTGGCTGAAAAGGTTGAAAATTACGAAGCTGATCTGAAAAAGAAGGATTTCACCATCCAAAGCTTCTCGGGGCACAGCCACCGCTTTTCTGAAGGCTGGCTGCACGCATGGAAATATTGTGAACAGACCAAACTTCAGCTTGTCAGCATCAGATCTGATAAATATCCCAAGCCTAGCCCGAAGCATCGGGAATAGATTCATGACCCAAATCTGGGTAAGTGGCGGGCGAAAAATGGCGGGTGACGAAGCTCGAAGTCACCTTAAGCTGGTAAGAAAC

The junction site is GAAACCTTGA: its first 5 bases are the last 5 bases of the SnapGene sequence, and its last 5 bases are the first 5 bases of the SnapGene sequence.

Sanger sequence:

GGGGACTTTACTATGCTCTGAGTCATTGATATTGAGCTTCGAAATGACAAGGCCTTGTGCCTATGCAGCTGGGCTCCCATGGCCCGTGCCCAAAGAGAATTCAAAAGGGCCCAACCCGAACTCCAAAACAGATCTAAGAGCCATGCTCTTGAAATAAGCATTTTCCACCTCTAGGGTAA.

Now I want to find the junction site, i.e., the ten bases mentioned above, in this Sanger sequence. This is what I want to do.
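
Sanger base calls are often unreliable near the ends of a trace, so an exact 10-base search can miss a junction because of one miscalled base, and the read may also come from the reverse strand. A hedged sketch that tolerates a small number of substitutions and checks both orientations, assuming the Sanger read is available as plain sequence text without a FASTA header (`clean_sequence` drops anything that is not A/C/G/T/N, such as spaces, line breaks or a trailing period). All function names are illustrative.

```python
from typing import Iterator


def clean_sequence(text: str) -> str:
    """Keep only A/C/G/T/N from pasted sequence text (no FASTA header expected)."""
    return "".join(ch for ch in text.upper() if ch in "ACGTN")


def revcomp(seq: str) -> str:
    """Reverse complement; any symbol other than A/C/G/T becomes N (a simplification)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement.get(base, "N") for base in reversed(seq.upper()))


def approx_hits(probe: str, read: str, max_mismatches: int = 1) -> Iterator[tuple[str, int, int]]:
    """Yield (strand, 0-based start, mismatches) for read windows close to the probe.

    Uses Hamming distance (substitutions only); insertions or deletions in the
    Sanger read are not tolerated by this sketch.
    """
    read = read.upper()
    for strand, motif in (("+", probe.upper()), ("-", revcomp(probe))):
        for start in range(len(read) - len(motif) + 1):
            window = read[start : start + len(motif)]
            # An 'N' (uncalled base) in the read counts as a mismatch here.
            mismatches = sum(1 for a, b in zip(motif, window) if a != b)
            if mismatches <= max_mismatches:
                yield strand, start, mismatches
```

For the data in this post, one would call `list(approx_hits("GAAACCTTGA", clean_sequence(sanger_text), max_mismatches=1))`, where `sanger_text` holds the pasted Sanger sequence; an empty list means no window within one mismatch on either strand. A brute-force scan is fine at Sanger read lengths.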

updated 5.9 years ago by Bastien Hervé 5.9k • written 5.9 years ago by abdul.suboor123 • 0

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

Do you have a multi-FASTA file of SnapGene sequences and a corresponding multi-FASTA file of Sanger sequences?

Do you have some knowledge of Python or Perl? Even Unix command-line tools should do the trick, I guess.

For each sequence, do you want an answer of junction found / not found in the Sanger sequence?
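
If both sets are available as FASTA files whose record IDs correspond (SnapGene sequences in one file, Sanger reads in the other; files in other formats would first need converting to FASTA), a found/not-found answer per sequence can be produced with a short script. This is a sketch under those assumptions: matching is exact, the junction is taken as the last k bases plus the first k bases, both strands are checked, and `read_fasta` / `junction_report` are hypothetical names.

```python
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (record_id, sequence) from a multi-FASTA stream; sequences may span lines."""
    rec_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if rec_id is not None:
                yield rec_id, "".join(chunks).upper()
            rec_id, chunks = line[1:].split()[0], []  # ID = first word of the header
        elif rec_id is not None:
            chunks.append(line)
    if rec_id is not None:
        yield rec_id, "".join(chunks).upper()


def revcomp(seq: str) -> str:
    """Reverse complement; symbols other than A/C/G/T become N."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement.get(base, "N") for base in reversed(seq))


def junction_report(circ_fasta: TextIO, sanger_fasta: TextIO, k: int = 5) -> dict[str, str]:
    """Map each circDNA ID to 'found', 'not found' or 'no Sanger read'."""
    sanger = dict(read_fasta(sanger_fasta))
    report = {}
    for rec_id, circ_seq in read_fasta(circ_fasta):
        junction = circ_seq[-k:] + circ_seq[:k]  # last k bases + first k bases
        read = sanger.get(rec_id)
        if read is None:
            report[rec_id] = "no Sanger read"
        elif junction in read or revcomp(junction) in read:  # either strand
            report[rec_id] = "found"
        else:
            report[rec_id] = "not found"
    return report
```

To use it, open the two FASTA files as text and pass the handles to `junction_report`, then print each ID with its status.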

5.9 years ago by Bastien Hervé 5.9k

Notepad++ is probably not the right tool for the job. Can you elaborate on which format your NGS and Sanger data are in?

5.9 years ago by WouterDeCoster 47k

My NGS data are in FASTQ format, whereas the Sanger sequence data are in EditSeq file format. I selected some junction reads with bedtools, designed primers for each read, and performed Sanger sequencing. I learned this procedure from my lab mate, who performed it on circRNA data; my data are circDNA data. My objective is to validate my circDNA data, so if there is any other way, please tell me. I have been doing the above process for the last month but have no results yet.

5.9 years ago by abdul.suboor123 • 0

Answer 1 · 2019-01-16

5.9 years ago

abdul.suboor123 • 0

@Bastien Hervé, I have a little knowledge of Python, but I am not good at it. Yes, I have the SnapGene sequences and the Sanger sequencing files. Please tell me how I can do that. And yes, for each sequence I need to find the junction in the Sanger sequence.

5.9 years ago by abdul.suboor123 • 0

Hi abdul.suboor123,

This reply is better suited as a reply on my answer's comment. Could you make the appropriate change please? That would involve the following steps:

Copy the contents of your reply from this answer (you can edit this answer (Ctrl/Cmd + click the link to open it in a new tab) and do a Select All -> Copy there).
Click on Add Reply on my post here: C: How to perform sequence validation of NGS data?
Paste the copied text
Click on the green Add Comment button
Click on moderate back in your answer here: A: How to perform sequence validation of NGS data?
Choose Delete Post
Click on the blue Submit button.

Thank you!

5.9 years ago by Bastien Hervé 5.9k