Hello,
I am currently working with SOLiD RNA-seq NGS data from 2013. My PI provided 48 .csfasta and .qual files (24 F3 files and 24 F5 files). I used Galaxy's solid2fastq tool (called "Convert SOLiD output to fastq": https://usegalaxy.org/?tool_id=solid2fastq&version=1.0.0&__identifer=3y01dzhi3ya) to create 24 .fastqcssanger files (12 F3 and 12 F5). I ran FastQC on these files, and almost all of them showed traces of ABI Solid3 Adapter B among the overrepresented sequences. Naturally, I want to remove these adapters before moving on to alignment.
I found a sequence for this adapter at https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt, but FastQC on Galaxy also reported a sequence for it. Which of the two sequences should I use in an adapter removal program?
My next question is which tool I should use to trim this SOLiD adapter. I have been trying cutadapt for quite a while now, and I keep getting the same error (SyntaxError: invalid syntax). I have been running the following without any success:
1) "/usr/bin/python" to start Python (the system loads Python properly)
And then one of the following (note that the adapter sequence I have been entering is the one from the FastQC website, not the one from the FastQC overrepresented sequences, hence the question above):
2a) "cutadapt -c -a CCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT '/home/WMurphy/Documents/Chondrocyte NGS/I3_2013_02_01_F5-RNA.csfasta' '/home/WMurphy/Documents/Chondrocyte NGS/I3_2013_02_01_F5-RNA.QV.qual' > output.fastq"
2b) "cutadapt -c -a CCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT '/home/WMurphy/Documents/Chondrocyte NGS/I3_2013_02_01_F5-RNA.csfasta' '/home/WMurphy/Documents/Chondrocyte NGS/I3_2013_02_01_F5-RNA.QV.qual'"
2c) "cutadapt -c -a CCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT '/home/WMurphy/Desktop/I3_F5.fastqcssanger'"
2d) "cutadapt -c -a CCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT '/home/WMurphy/Desktop/I3_F5.fastqcssanger' > output.fastq"
Nothing has worked. I have been rereading http://cutadapt.readthedocs.io/en/stable/colorspace.html over and over, and I still don't know what I am doing wrong. Any and all help would be immensely appreciated!
Have a great day!
Sincerely,
bioinformaticsfilesdrive
Are you actually typing the quotes around your commands? Or did you add them just for this forum post?
Thanks for the reply. The commands are only quoted here on the forum.
And the single quotes around paths? You don't need those.
Those single quotes come from dragging the file into the terminal instead of typing the path out. I'll try it without them. Which command should I enter: 2a, 2b, 2c, or 2d?
EDIT: I tried all of them, with and without the single quotes, and still received the same syntax error every time.
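One thing that may explain the error: "SyntaxError: invalid syntax" is a Python error message, not a cutadapt one. If step 1 ("/usr/bin/python") opens the interactive interpreter and the cutadapt line is then typed at the >>> prompt, Python tries to parse it as Python code and fails, whatever the quoting or the input files. The sketch below only illustrates that, assuming this is what is happening. `is_valid_python` is a hypothetical helper written for this post, and the command string is 2c from above. The second half uses Python's `shlex` module, which splits a line roughly the way a POSIX shell would, to show what the single quotes do for the "Chondrocyte NGS" paths once the command is run from the shell: the folder name contains a space.

```python
import shlex

ADAPTER = "CCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT"

# Command 2c from the post, as it would be typed at a prompt.
command = "cutadapt -c -a " + ADAPTER + " '/home/WMurphy/Desktop/I3_F5.fastqcssanger'"


def is_valid_python(source):
    """Hypothetical helper: True if `source` parses as an interactive Python statement."""
    try:
        compile(source, "<stdin>", "single")
    except SyntaxError:
        return False
    return True


# A shell command line is not Python code, so the interpreter rejects it.
print(is_valid_python(command))  # False

# Quoting: a path with a space stays one argument only when it is quoted.
path = "/home/WMurphy/Documents/Chondrocyte NGS/I3_2013_02_01_F5-RNA.csfasta"
quoted = shlex.split("cutadapt -c -a " + ADAPTER + " '" + path + "'")
unquoted = shlex.split("cutadapt -c -a " + ADAPTER + " " + path)

print(len(quoted), quoted[-1])       # path arrives as a single argument
print(len(unquoted), unquoted[-2:])  # path is split in two at the space
```

If this is the cause, leaving the interpreter (exit() or Ctrl-D) and running the same line at the shell prompt should at least get past the syntax error. The quotes around the Documents paths would then still be needed because of the space in the folder name, while the Desktop path works with or without them.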