Hello community,
I have just started to analyse my data. As it seems it's not that easy to start bioinfomatics with this kind of data. I have been analysing methylation reads but for some reason samtools is giving me this error:
[samopen] SAM header is present: 5724 sequences.
[sam_read1] reference '256' is recognized as '*'.
Parse error at line 5732: unmatched CIGAR operation
Aborted (core dumped)
From the beginning: The fastq file has all the reads from the run and I extract only the reads that start with ''CCGG'' because of the MspI restriction enzyme. Please correct me if any of the commands I am using is wrong.
cat bsq_S1_R1_001.fastq | paste - - - - | awk -F '\t' '(substr($2,1,4)=="CCGG")' | tr "\t" "\n" > bsq_tagged.fastq
After this step, every read in the new fastq file starts with CCGG and looks like this:
@NB501461:9:HGFFTBGX2:1:11101:22748:1061 1:N:0:CGATGT
CCGGGATCCACCTTTACACACCCTCATCCAAGCATGTCAAACTCGGCGCCCGTCTCTGCCGCGTCCACCAGGGA
+
AAAAA#EEEEEEEEEEEAE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
So I trim this fastq file with trim_galore --rrbs and -a '3' adapter_sequence' to bsq_tagged_trimmed.fq
I align the bsq_tagged_trimmed.fq to the reference genome using bowtie2-2.3.1 and I get the bsq_tagged_trimmed.sam file. The .sam file has 5724 headers and then all the aligned scaffolds with the scaffold positions in a format like this:
@HD VN:1.0 SO:unsorted
@SQ SN:Scaffold LN:13623339
.
.
and so on
Then I use samtools view -b -S -o out.bam in.sam
and I get the error.
I have been using for some weeks now these parameters for alignment and trimming and I never had any problem. Thing changed: Since I update the bowtie2 to version 2.3.1 everything is going bad. I also use samtools 0.1.19 but when I ran the example files from bowtie2 directory, samtools works fine. Could this be compatibility issue of trim_galore and cutadapt with the latest bowtie2 version? How can I see the line 5732 that has unmatched CIGAR operation?
That error is indicative of wrong genome reference file. Check that you are using right reference in bowtie - especially '256' looks like a weird name for reference.
try to see the 5732th line of sam by
head
(also do another check by taking the number of header lines into account)I will try to run one more time my pipeline with a different genome as a reference. I am working with a non-model organism and the genome is just in scaffolds. :(
You should also upgrade to latest
samtools
. At this time 0.1.19 is really outdated.I am not the only one using the server and several people probably need this version of samtools to analyse their data. Since the example file from bowtie2 is running on samtools then i guess the version is not the problem. I will ask if it is possible to update but if I can not really prove that the samtools version causes errors then I will probably get a NO. :/
There's no reason that they can't provide multiple versions, I think I give my people 5 versions of samtools to choose from...
You can always install it just for yourself as user. Conda is great for that, too.