I have analysed circDNA NGS data and am now validating the circDNA sequence reads with Sanger sequencing. This is my fifth attempt with Notepad++: in the "Find" box I enter the junction site (five bases from the start and five bases from the end of my circDNA sequence read), but it doesn't match. Please suggest a better way to validate the sequence. Thanks.
I think the best way would be to use Python or Perl to process your text file as needed.
Could you please provide an example of what you want to do?
My objective is to validate my circDNA data. If there is any other way, please tell me. Can you tell me about a script used for validation with Python or Perl?
What do you mean by "validate" your circDNA? Could you post an example of what you are currently trying?
For example: I have sequenced circDNA samples from maize. I analysed the data with two programs, CIRI and CIRCexplorer2, which reported 150 circDNAs (CIRI) and 188 circDNAs (CIRCexplorer2). Now I want to confirm whether each circDNA sequence is really a maize sequence. For this purpose I visualized the reads with bedtools, took some junction reads from the circDNA results, and designed primers with SnapGene and Primer3Plus. I ran PCR with the primers and control DNA and performed Sanger sequencing. After this I searched with Notepad++ for the junction site in the Sanger sequence, but I haven't found the junction site sequence.
Did you try using BLAST?
This part needs a full explanation (note that Notepad++, WordPad, etc. should be avoided for reading large files such as FASTQ):
Which junction site? What size? What do you want to do with these junctions?
I have tried BLAST, and it shows that more than 90% of the sequence belongs to the maize genome. However, I want to validate its circularity: is it circular or not? I know that the sequence is right, but I actually need to confirm the circularity of the DNA sequences reported by the two programs above. I am confused because my professor and my lab mate, who applied this validation method to circRNA, told me to use it. The junction site is the site where the two ends of the sequence meet and create a circle.
As an example, let's say you know that:
Start of the expected circular DNA :
AAAAA
End of the expected circular DNA :
CCCCC
Read1 :
GCTATATAAAAACCCCCGCTAGCGT
Read2 :
AAAAAGCATGCTAGCTATTACCCCC
Read3 :
GCATGCAAAAACGTATGCTACCCCC
Read4 :
GTCAGTCGATCGATGCGTGTCCCCC
Read5 :
AAAAAGTCAGTCGATCGATGCGTGT
Read1 is circular, the others aren't?
Yes, you are right. If AAAAACGCGCGCGCCCCC is the circDNA sequence, then selecting the circular option in SnapGene creates a junction site, meaning it connects the end and the start of the circDNA sequence; i.e. the junction site for this sequence is CCCCCAAAAA. These ten bases are what I want to confirm in the Sanger sequence, but how? That is what I want to know.
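The junction described above (last five bases joined to the first five) can be built and searched for with a few lines of Python. This is a minimal sketch, not an established pipeline: the function names `junction_site`, `find_junction` and `reverse_complement` are made up for illustration, and the read in the usage line is an invented example. It strips line breaks and ignores case, and it also checks the reverse complement, on the assumption that the Sanger read may come from the opposite strand.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def junction_site(circ_seq, flank=5):
    """Join the last `flank` bases to the first `flank` bases of the sequence."""
    seq = "".join(circ_seq.split()).upper()  # drop line breaks and spaces
    if len(seq) < 2 * flank:
        raise ValueError("sequence is shorter than two flanks")
    return seq[-flank:] + seq[:flank]


def find_junction(sanger_seq, junction):
    """Return (strand, 0-based position) of the junction, or None if absent."""
    read = "".join(sanger_seq.split()).upper()
    junction = junction.upper()
    pos = read.find(junction)
    if pos != -1:
        return ("forward", pos)
    pos = read.find(reverse_complement(junction))
    if pos != -1:
        return ("reverse", pos)
    return None


# Sequence taken from the post above
print(junction_site("AAAAACGCGCGCGCCCCC"))  # CCCCCAAAAA
```

With an invented read, `find_junction("GCGCCCCCAAAAACGCG", "CCCCCAAAAA")` reports a forward-strand hit. A plain text search in an editor can miss the same junction if the sequence is wrapped over several lines or was sequenced from the other strand.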
Could you share example data (Sanger sequence and output from SnapGene)?
For example, the SnapGene sequence is:
The Junction Site :
GAAACCTTGA
In this junction site, the first five bases come from the end and the last five bases from the start of the SnapGene sequence.
Sanger sequence:
Now I want to find the junction site in this Sanger sequence, i.e. the ten bases I mentioned above. This is what I want to do.
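An exact ten-base search fails as soon as the Sanger read has a single miscalled base inside the junction. The sketch below, under that assumption, slides the junction along the read and reports the window with the fewest mismatches. The names `hamming` and `best_junction_hit` are hypothetical, only the orientation passed in is searched, and the read in the usage note is invented.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_junction_hit(sanger_seq, junction):
    """Return (0-based position, mismatches) of the closest window, or None.

    None is returned when the read is shorter than the junction.
    Ties are resolved in favour of the leftmost window.
    """
    read = "".join(sanger_seq.split()).upper()  # drop line breaks and spaces
    junction = junction.upper()
    k = len(junction)
    best = None
    for i in range(len(read) - k + 1):
        d = hamming(read[i:i + k], junction)
        if best is None or d < best[1]:
            best = (i, d)
    return best
```

For the junction `GAAACCTTGA` and an invented read `TTGAAACGTTGACC`, the closest window starts at position 2 with one mismatch. Ten bases is a short pattern, so a hit with mismatches may occur by chance; taking a longer flank on each side of the junction is probably safer.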
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Thank you!
Do you have a multi-FASTA file of SnapGene sequences and a corresponding Sanger multi-FASTA file?
Do you have some knowledge of a scripting language such as Python or Perl? Even Unix tools should do the trick, I guess.
For each sequence, do you want an answer of junction found/not found in the Sanger sequence?
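If the data were available as two multi-FASTA files, a found/not found answer per sequence could be produced roughly as follows. This is only a sketch under several assumptions: both files have been exported to FASTA, each Sanger read has the same record name as its circular sequence, and the junction is five bases from the end joined to five bases from the start. The helpers `parse_fasta` and `check_junctions` are written here for illustration and are not part of any library.

```python
def parse_fasta(lines):
    """Parse FASTA text lines into {record name: uppercase sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""  # name = first word of header
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def check_junctions(circ_records, sanger_records, flank=5):
    """Report, per circular sequence, whether its junction is in the Sanger read."""
    results = {}
    for name, circ in circ_records.items():
        sanger = sanger_records.get(name)
        if sanger is None:
            results[name] = "no Sanger read"
            continue
        junction = circ[-flank:] + circ[:flank]  # end joined to start
        found = junction in sanger or reverse_complement(junction) in sanger
        results[name] = "junction found" if found else "junction not found"
    return results


# Tiny in-memory example; with real files, pass open file handles instead,
# e.g. parse_fasta(open("circ.fasta")) (file name is a placeholder).
circ = parse_fasta([">c1", "AAAAACGCGCGCGCCCCC"])
sanger = parse_fasta([">c1", "GCGCCCCCAAAAACGCG"])
print(check_junctions(circ, sanger))
```

Matching records by name keeps the sketch short; if the two files use different names, a lookup table between them would be needed.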
Notepad++ is probably not the right tool for the job. Can you elaborate on which format your NGS and Sanger data are in?
My NGS data is in FASTQ format, whereas the Sanger sequence data is in EditSeq file format. I selected some junction reads with bedtools, designed primers for each read, and performed Sanger sequencing. I learned this procedure from my lab mate, who performed it on circRNA data; my data is circDNA data. My objective is to validate my circDNA data. If there is any other way, please tell me. I have been working on the above process for the last month, but have no result yet.