Hi, I am new to the sequencing field and am trying to align targeted sequencing data (most reads are 90 to 150 bp long) to a reference genome.
I did some preprocessing, such as trimming the adapter and trimming a barcode sequence from R1. Then I aligned R1 and R2 to the reference using bowtie2; the summary is shown in the figure:
The overall alignment rate looks OK, but I am really concerned about the number of pairs that aligned concordantly 0 times; I think a lot of reads were not aligned.
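For anyone puzzled by the same mismatch: in paired-end mode, the overall alignment rate in bowtie2's summary is computed over individual mates, so a pair whose mates align discordantly or only one at a time still counts toward it, even though the pair did not align concordantly. Below is a minimal arithmetic sketch with hypothetical counts (not the numbers from this post); the function and argument names are illustrative only.

```python
def pe_rates(pairs: int, concordant_0: int, mates_unaligned: int):
    """Return (concordant %, overall %) for a paired-end bowtie2 run.

    pairs: read pairs in the input
    concordant_0: pairs reported as "aligned concordantly 0 times"
    mates_unaligned: mates reported as "aligned 0 times" in the last block
    """
    # Pairs that aligned concordantly at least once.
    concordant = 100.0 * (pairs - concordant_0) / pairs
    # The overall rate counts individual mates, i.e. two per pair.
    overall = 100.0 * (2 * pairs - mates_unaligned) / (2 * pairs)
    return concordant, overall


# Hypothetical counts: 40% of pairs are not concordant,
# but most of their mates still align on their own.
conc, overall = pe_rates(pairs=100_000, concordant_0=40_000, mates_unaligned=10_000)
print(f"concordant: {conc:.1f}%  overall: {overall:.1f}%")
# concordant: 60.0%  overall: 95.0%
```

With these made-up counts, 40% of pairs fail to align concordantly while the overall rate still reads 95%, so a good overall rate on its own says little about how many pairs aligned properly.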
Also, I got a lot of warnings such as:
which I think is because trimming produces a lot of short reads. However, I filtered out reads shorter than 20 bp with cutadapt, and the warnings are still there.
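On the filtering step: for paired-end data, a length filter has to keep or drop each pair as a unit so that R1 and R2 stay in the same order. cutadapt does this when both files are processed in one command (its default pair filter discards a pair if either read fails). The snippet below is a minimal pure-Python illustration of that rule, not cutadapt's code; it assumes uncompressed, non-wrapped 4-line FASTQ records, and `read_fastq`, `filter_pairs` and the 20 bp default are hypothetical names and values chosen to match this post.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, "+" line, qualities)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def filter_pairs(r1: Iterable[Record], r2: Iterable[Record],
                 min_len: int = 20) -> Iterator[Tuple[Record, Record]]:
    """Yield (mate1, mate2) only if BOTH mates are at least min_len bp.

    Dropping a pair as a unit keeps R1 and R2 in the same order,
    which paired-end aligners such as bowtie2 rely on.
    """
    for mate1, mate2 in zip(r1, r2):
        if len(mate1[1]) >= min_len and len(mate2[1]) >= min_len:
            yield mate1, mate2


# Toy input: in pair p1, mate 1 was trimmed down to 4 bp.
r1 = io.StringIO("@p1/1\nACGT\n+\nIIII\n@p2/1\n" + "A" * 25 + "\n+\n" + "I" * 25 + "\n")
r2 = io.StringIO(
    "@p1/2\n" + "C" * 30 + "\n+\n" + "I" * 30 + "\n"
    + "@p2/2\n" + "G" * 25 + "\n+\n" + "I" * 25 + "\n"
)
kept = list(filter_pairs(read_fastq(r1), read_fastq(r2)))
print([mate1[0] for mate1, _ in kept])  # ['@p2/1']
```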
Can anyone familiar with targeted sequencing tell me whether this is normal? Any suggestions to improve the alignment would be really appreciated. Thanks!
Thanks for the reply. Yes, I used the whole hg19 genome as the reference. I ran FastQC, and the Adapter Content module shows there might be Illumina Universal Adapter:
I checked the adapter, which is AGATCGGAAGAG, and then used cutadapt to trim it from the 3' end of both R1 and R2:
cutadapt -a AGATCGGAAGAG -o R1.trimmed.fastq.gz -p R2.trimmed.fastq.gz R1.fastq.gz R2.fastq.gz
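As a rough picture of what a 3' adapter trim does: the adapter and everything after it are removed, and an adapter cut off by the end of the read is also removed if enough bases overlap (cutadapt's default minimum overlap is 3). This sketch uses exact matching only, whereas cutadapt also tolerates mismatches and indels (10% error rate by default), so treat it as an illustration rather than a replacement; `trim_3prime` is a hypothetical helper. Also worth knowing when reading the command above: in paired-end mode cutadapt applies `-a` to R1 only, and a 3' adapter for R2 is given separately with `-A`.

```python
def trim_3prime(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter and everything after it (exact matching only).

    A full copy anywhere in the read is cut at its first occurrence; otherwise
    the longest adapter prefix (>= min_overlap bases) that ends the read is cut.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Read ended partway through the adapter: try the longest overlap first.
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


ADAPTER = "AGATCGGAAGAG"  # adapter sequence identified from the FastQC report
print(trim_3prime("TTGCAACC" + ADAPTER + "CACGTC", ADAPTER))  # TTGCAACC
print(trim_3prime("TTGCAACCAGATC", ADAPTER))                  # TTGCAACC
print(trim_3prime("TTGCAACCAG", ADAPTER))                     # TTGCAACCAG (only 2 bases overlap)
print(repr(trim_3prime(ADAPTER, ADAPTER)))                    # '' (read was all adapter)
```

The last line shows that a read made up entirely of adapter ends up empty, which is the kind of read a minimum-length filter is meant to remove.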
Then I found a repeated sequence near the 5' end of R1, so I trimmed it from R1 only:
cutadapt -g SPECIFIC_SEQ -o R1.trimmed1.fastq.gz R1.trimmed.fastq.gz
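For the 5' step, cutadapt's `-g` treats the given sequence as a 5' adapter: the match and everything before it are removed, and a partial copy at the very start of the read is also trimmed. Below is a minimal exact-match sketch of that behaviour; `trim_5prime` is a hypothetical helper and `GACTCCTGAG` is a made-up stand-in for the placeholder sequence in the command above.

```python
def trim_5prime(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 5' adapter and everything before it (exact matching only).

    A full copy anywhere in the read is cut at its first occurrence; otherwise
    the longest adapter suffix (>= min_overlap bases) that starts the read is cut.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[pos + len(adapter):]
    # Read starts partway into the adapter: try the longest overlap first.
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.startswith(adapter[-k:]):
            return seq[k:]
    return seq


SEQ = "GACTCCTGAG"  # hypothetical stand-in for the repeated 5' sequence
print(trim_5prime("CA" + SEQ + "TTACG", SEQ))  # TTACG
print(trim_5prime("CTGAGTTACG", SEQ))          # TTACG (partial copy at the start)
print(trim_5prime("TTACGTTACG", SEQ))          # TTACGTTACG (no match)
print(repr(trim_5prime(SEQ, SEQ)))             # '' (read was only the repeated sequence)
```

As the last line shows, a read consisting only of the repeated sequence becomes empty; a trimming step without a minimum-length option leaves such reads in the output.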
I did not find any such pattern in R2, so I did not do anything further to R2.trimmed.fastq.gz.
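One simple way to look for such a pattern is to count the most common prefixes (or suffixes) across reads: a short sequence shared by a large fraction of reads is worth inspecting as a possible primer, barcode or linker. This is similar in spirit to FastQC's Overrepresented Sequences module. The sketch takes plain sequence strings (for example the second line of each FASTQ record); `top_ends`, `k`, `n` and the toy reads are illustrative assumptions, and `from_3prime=True` applies the same count to read ends.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_ends(seqs: Iterable[str], k: int = 12, n: int = 5,
             from_3prime: bool = False) -> List[Tuple[str, int, float]]:
    """Return the n most common k-base read starts (or ends) with counts and fractions.

    Reads shorter than k are skipped.
    """
    counts = Counter(
        (s[-k:] if from_3prime else s[:k]) for s in seqs if len(s) >= k
    )
    total = sum(counts.values())
    return [(kmer, c, c / total) for kmer, c in counts.most_common(n)]


# Toy reads: three of four start with the same 10-base sequence.
reads = ["GACTCCTGAGTTACGA", "GACTCCTGAGCCATGG", "GACTCCTGAGAATTCC", "ACGTTGCAAGGCTTAA"]
for kmer, count, frac in top_ends(reads, k=10, n=2):
    print(kmer, count, f"{frac:.0%}")
# GACTCCTGAG 3 75%
# ACGTTGCAAG 1 25%
```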
I also aligned R1 and R2 separately to the reference genome. From the summaries, R1 with the adapter and barcode trimmed looks normal, but R2 is still bad (a high percentage aligned 0 times):
Your "R1 trimmed" and "R2 trimmed" have the same percentages. Whatever you trimmed from read 1 worked; there must be something you read through on read 2 that you need to trim.
Yes, that is what I thought: R1 with the adapter and barcode trimmed aligned to the reference well (at least I think so). At this stage, I only have enough information to trim the adapter from R2; I have nothing else telling me what more to trim from R2. I feel there must be something else that needs to be trimmed from R2 for most of its reads to align to the reference, but I do not know how to identify it. Do you have any suggestions? Many thanks!
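One hypothesis that fits the reply above (not something confirmed in this thread): when a fragment is shorter than the read length, R2 reads through the whole insert and continues into the reverse complement of whatever R1 starts with, then into the adapter. If the repeated sequence trimmed from the 5' end of R1 is part of the sequenced fragment, its reverse complement would sit near the 3' end of R2, just before the adapter. Below is a minimal sketch for checking this. `revcomp`, `fraction_containing`, the motif (a made-up stand-in for the placeholder sequence in the R1 command) and the toy reads are illustrative assumptions. Exact substring matching will miss copies with sequencing errors or copies cut short by the end of the read.

```python
from typing import Iterable

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def fraction_containing(seqs: Iterable[str], motif: str) -> float:
    """Fraction of sequences that contain motif as an exact substring."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    return sum(motif in s for s in seqs) / len(seqs)


r1_5prime_seq = "GACTCCTGAG"      # hypothetical stand-in for the R1 5' sequence
target = revcomp(r1_5prime_seq)  # what R2 would show after reading through
r2_reads = ["TTGACCAGTACTCAGGAGTC", "AACCGGTTAACCGGTTAACC", "GGTACCTCAGGAGTC"]  # toy reads
print(target)                                           # CTCAGGAGTC
print(f"{fraction_containing(r2_reads, target):.2f}")  # 0.67
```

If the reverse-complemented sequence turns up in a sizeable fraction of R2 reads, it would be a candidate to trim from the 3' end of R2.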