Hi!
I have nanopore sequencing results in fastq format. I need to know the coverage of specific genes. To do this, I aligned the sequence using minimap2 and converted the result to bam format. I want to use the Samtools view to extract the gene by coordinate. Then I plan to use the Samtools depth.
The command to run after alignment looks like this:
samtools view -h myFile.sam -b -o fastq_pass/myFile.bam
samtools sort myFile.bam -o myFile_sort.bam
samtools index myFile_sort.bam
samtools view myFile_sort.bam chr22:41865129-41924993 > myGene.bam
samtools depth myGene.bam > myGene.bam_bepth
I want to extract the sequence from region chr22:41865129-41924993 . But when opening a file with a gene coverage depth, the start and end coordinates are shifted:
chr22 41863407 1
chr22 41863408 1
chr22 41863409 1
...
...
...
chr22 41932508 1
chr22 41932509 1
chr22 41932510 1
chr22 41932511 1
Could someone please tell me what I'm doing wrong.
I appreciate your help! Can I use the Samtools view to get only the necessary region, if now I need to extrect the sequence?
samtools is already returning the reads with alignments overlapping your region of interest.
If a read is 100bp long and starts at position 1 with a perfect alignment, then it aligns to bases: [1-100] inclusive. If you ask for reads that overlap positions 100-200, samtools is going to return the read that starts at position 1 because it aligns to position 100 which is inside the [100-200] interval you specified.