Hello Biostars! I downloaded FASTA files from http://www.ncbi.nlm.nih.gov/Traces/trace.cgi (mouse genome traces). Some of the files have a 'clip' prefix. I'm not sure, but do they describe primers/adapters? I can't find any documentation about these files. I want to clip (trim) my traces according to the coordinates in the 'clip' files. After googling, I didn't find any tool for that; all the tools I found are for trimming NGS data. My question: is there a tool for this kind of clipping, or do I need to write my own script? I'm a newbie in programming (a beginner in Python) and have absolutely no idea how to write such a script.
Summary: I have a 'clip' file, which looks like this:
TI CLIP_LEFT CLIP_RIGHT
1101188317 0 576
1101188318 19 734
1101188319 6 742
1101188320 16 809
And a 'trace' file, which looks like plain FASTA:
>1101188317
ATGCAT... (all reads are ~1660 bp long)
>1101188318
...
>1101188319
...
>1101188320
...
The problems are the following:
- I don't understand the numbers in the clip file (e.g. "clip right" is the right coordinate of what?).
- It's not clear to me what 'clip' means.
- If the numbers are something like adapter coordinates, I need to trim the sequences in the FASTA file accordingly.
- All the tools are for NGS data, but these datasets are from Sanger sequencing, so I don't know the adapter sequence; I only know the coordinates (if these numbers are coordinates).
I am not sure about your first question. For the second question, there are many tools available for clipping adapters. Use the search box in Biostar and search for 'clip adapters' or 'remove adapters'. You will find a lot of informative posts.
Is NGS clipping different from clipping Sanger sequencing traces? I think you can use the same tools, such as cutadapt or fastx, to clip your traces.
I edited my question. As I understand NGS clipping, the tools need adapter sequences. But in the case of traces I have only their coordinates (if these numbers are coordinates).
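You can use string slicing (substring extraction) in Python.
1) Read each FASTA sequence into a string.
2) Extract the desired substring. If x stores the sequence, then x[y:z] gives the clipped sequence, where y and z are the left and right coordinates that you already have in the other file. Note that Python slices are 0-based and exclude position z, so check whether the clip coordinates are 0- or 1-based and adjust y accordingly.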
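The two steps above could be sketched roughly as follows. This is only a minimal sketch under several assumptions that are not confirmed anywhere in this thread: that CLIP_LEFT and CLIP_RIGHT are 1-based, inclusive positions of the part to keep (pass one_based=False if they turn out to be 0-based), that a CLIP_RIGHT of 0 means "no right clip", and that the first word of each FASTA header is the TI number. The function names and the file names in the comments are made up for illustration.

```python
import io


def read_clip_table(handle):
    """Return {read_id: (clip_left, clip_right)} from a whitespace-separated table."""
    clips = {}
    for line in handle:
        fields = line.split()
        # Skip blank lines and the header row (TI CLIP_LEFT CLIP_RIGHT).
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        clips[fields[0]] = (int(fields[1]), int(fields[2]))
    return clips


def read_fasta(handle):
    """Yield (read_id, sequence) pairs; a sequence may span several lines."""
    read_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            # Assumption: the first word of the header is the read ID.
            header = line[1:].split()
            read_id = header[0] if header else ""
            chunks = []
        elif line and read_id is not None:
            chunks.append(line)
    if read_id is not None:
        yield read_id, "".join(chunks)


def clip_sequence(seq, left, right, one_based=True):
    """Keep the part of seq between the left and right clip coordinates."""
    # Assumption: 1-based inclusive coordinates unless one_based=False.
    start = max(left - 1, 0) if one_based else left
    # Assumption: a right coordinate of 0 means "do not clip on the right".
    end = right if right > 0 else len(seq)
    return seq[start:end]


def clip_fasta(clip_handle, fasta_handle, out_handle, one_based=True):
    """Write a clipped copy of the FASTA; reads without clip data are unchanged."""
    clips = read_clip_table(clip_handle)
    for read_id, seq in read_fasta(fasta_handle):
        if read_id in clips:
            left, right = clips[read_id]
            seq = clip_sequence(seq, left, right, one_based)
        out_handle.write(f">{read_id}\n{seq}\n")


# Toy demonstration with made-up reads (not real trace data).
demo_out = io.StringIO()
clip_fasta(
    io.StringIO("TI CLIP_LEFT CLIP_RIGHT\nread1 3 6\n"),
    io.StringIO(">read1\nACGTACGTAC\n"),
    demo_out,
)
print(demo_out.getvalue())

# With real files (hypothetical names), the call would look like:
# with open("clip.txt") as c, open("traces.fasta") as f, open("clipped.fasta", "w") as o:
#     clip_fasta(c, f, o)
```

Before running it on the whole dataset, it is worth checking a few reads by eye to confirm which coordinate convention the clip file actually uses.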
Thank you! I will try it.