How to get the extended, unaligned bases at the ends of reads in a reference-based genome assembly?
5.2 years ago
Kumar

Hi all, I mapped Illumina reads against a particular gene sequence (the reference sequence), which is highly repetitive, using the following command:

bowtie2-build -f ref_gene.fasta ref_index && \
bowtie2 -p 4 -N 1 -t -x ref_index -1 genome_R1.fastq -2 genome_R2.fastq -S tal.sam && \
samtools view -bS tal.sam | samtools sort - -o tal.bam && \
samtools mpileup -uf ref_gene.fasta tal.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fastq && \
seqtk seq -aQ64 -q20 -n N cns.fastq > contig.fasta

It works perfectly fine and yields a contig of exactly the same length as ref_gene. However, I also need the reads that overlap both ends of the reference (the forward and reverse ends) and extend beyond them, so that the assembled contig is extended at both ends (please see the image for a better understanding). Is there any way to do this? Thanks in advance.

Tags: Assembly, next-gen, alignment, genome, sequencing
5.2 years ago

If I understood your question correctly, you are looking for the unaligned bases at the ends of reads. That means you have to extract the soft-clipped bases. Googling for that brings up this thread: extracting the soft clipped seq only from a sam file
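As a minimal sketch (not necessarily the approach used in the linked thread), soft-clipped bases can be read directly from the CIGAR string of each SAM record: a leading `S` operation gives the bases before POS, a trailing `S` gives the bases after the aligned part. The script and function names (`softclips.py`, `soft_clips`, `iter_soft_clips`) are hypothetical; it assumes uncompressed SAM text such as `tal.sam` and uses only the Python standard library. Because SAM stores SEQ in reference orientation (reverse-complemented for reverse-strand reads), "left" and "right" here refer to the reference, not the original read.

```python
import re
import sys

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar, seq):
    """Return (left_clip, right_clip): the soft-clipped bases of one read.

    Hard clips (H) are not stored in SEQ, so they are skipped.
    """
    core = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return seq[:left], seq[len(seq) - right:]


def iter_soft_clips(lines):
    """Yield (read_name, ref_name, pos, left_clip, right_clip) for clipped reads."""
    for line in lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar, seq = int(fields[3]), fields[5], fields[9]
        if flag & 4 or cigar == "*" or seq == "*":
            continue  # unmapped read or no sequence stored
        left, right = soft_clips(cigar, seq)
        if left or right:
            yield qname, rname, pos, left, right


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python softclips.py tal.sam > softclips.tsv
    with open(sys.argv[1]) as sam:
        for rec in iter_soft_clips(sam):
            print("\t".join(map(str, rec)))
```

The output is one tab-separated line per clipped read, which can then be filtered by position or fed to an assembler.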


Thank you @WouterDeCoster for your suggestion and for giving me the proper technical term.


Dear @WouterDeCoster, the tool you suggested only gives the mapped/aligned reads (exactly the same length as the reference). It does not give the unaligned, overlapping parts of the reads.


I suspect you're looking at the wrong file.
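One way to check is to look at the alignment file itself (e.g. `tal.sam`), since the mpileup consensus (`cns.fastq`, `contig.fasta`) only reports reference positions and so can never be longer than `ref_gene`. The sketch below is a hypothetical helper (`end_overhangs` is not part of any tool in this thread) that collects soft-clipped bases hanging off position 1 or off the last base of each reference, using the `LN:` lengths from the `@SQ` header lines. It assumes uncompressed SAM text and uses a simple, strict criterion: the alignment must start exactly at position 1 or end exactly at the reference length. bowtie2 runs in end-to-end mode by default, and its `--local` mode is the one documented to clip read ends, so an empty result may simply mean no soft clips were produced. Building an extended contig from the overhangs (e.g. by a per-position majority vote) is not shown.

```python
import re
import sys

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def end_overhangs(lines):
    """Collect soft-clipped bases hanging off either end of each reference.

    Returns {ref_name: {"start": [...], "end": [...]}} for references
    that have at least one mapped read.
    """
    ref_len = {}
    found = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
            ref_len[tags["SN"]] = int(tags["LN"])
            continue
        if line.startswith("@") or len(fields) < 11:
            continue  # other header lines or incomplete records
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        cigar, seq = fields[5], fields[9]
        if flag & 4 or cigar == "*" or seq == "*" or rname not in ref_len:
            continue  # unmapped, no sequence, or unknown reference
        ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
        span = sum(n for n, op in ops if op in REF_CONSUMING)
        hits = found.setdefault(rname, {"start": [], "end": []})
        # Leading clip on a read whose alignment starts at reference position 1.
        if pos == 1 and ops[0][1] == "S":
            hits["start"].append(seq[:ops[0][0]])
        # Trailing clip on a read whose alignment reaches the last reference base.
        if pos + span - 1 == ref_len[rname] and ops[-1][1] == "S":
            hits["end"].append(seq[len(seq) - ops[-1][0]:])
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python end_overhangs.py tal.sam
    with open(sys.argv[1]) as sam:
        for ref, hits in end_overhangs(sam).items():
            print(ref, "overhanging start:", len(hits["start"]),
                  "overhanging end:", len(hits["end"]))
```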
