I have a plamid DNA sequenced using MiSeq PE250 with read count at 1.3 million reads. I am using different softwares to align these data. My final aim is to analyse different resequencing methodologies and validate them for future use.
I first trimmed the raw data with Trim-Galore (One level trimming) and then with Sickle (Two level trimming). When I align one level trimmed data with the reference and generate a coverage file using samtools mpileup, the coverage at position 1 is 0.
When I use two level trimmed data to align and generate coverage the coverage for position 1 in this case is 32X.
I don't understand how can the coverage for position 1 be zero with less trimming (only trim galore) and 32X with more trimmming (Trim Galore + Sickle). Following are the command which I used.
Trim Galore
trim_galore -q 30 --length 50 --no_report_file --paired read1 read2
Sickle
sickle pe --gzip-output -q 30 -l 50 -t sanger -f read1.trimmed -r read2.trimmed -o read1.sickle -p read2.sickle -s single.sickle
Alignment using Bowtie2:
bowtie2 --local -x SampleName -1 read1.sickle -2 read2.sickle -I 0 -X 1000 --fr -p 16 -S SampleName.sam
Then i sorted and indexed SAM file using Samtools sort
and index
command.
Then I performed BAQ recalibration using calmd
. Indexed the resultant BAM file again. Used mpileup
command to generate coverage for the entire sequence.
samtools mpileup -d 10000000 -aa -f ref.fasta SampleName.sorted.bam > SampleName.mpileup
cut -f1-4 SampleName.mpileup > SampleName.depth
Can anyone please explain why I received 0 coverage at position 1 with less trimming?
Since your plasmid must be circular you perhaps need to account for that when aligning: Aligning Circular Sequences
Hi Genomax, Thank you for suggestion!!! Although, why was the first position mapped in the other case (two level trim) but not in one level trim?
I will definitely map using concatenated Plasmid sequences but was curious as to why I got the previous results.