I have downloaded a specified region as Reference Genome from UCSC (because only this region is important for me) and aligned my reads against it using Bowtie2.
But I do not know why in my .bam file there are other regions with start position of 150864, 539750 and ... .
The Ref Genome (specified coordinate) that downloaded from UCSC.
hg19_dna range=chrX:146949598-147161084 5'pad=0 3'pad=0 strand=+ repeatMasking=none
2 first line of bam file (output of bowtie2)
D00823:89:HMHHMBCX2:1:2111:4641:60710 99 hg19_dna 150864 40 48M = 150946 130 CCTGCCTCAGCCTCCTGAGTAGCTGGGACTATAGGTGCCCGCCACCAT DDHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-12 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:31C15C0 YS:i:-5 YT:Z:CP
D00823:89:HMHHMBCX2:2:1116:19382:32083 99 hg19_dna 539750 42 48M = 540015 313 CCTATGGTAGCTCTGGCCAGCGGAACTTCCTGCATCCATGGAGCAATG DDIIHIHIIHIIIIFEHIIIIIIIHIIIIIIHHHIIIHIIIIGIIII? AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:48 YS:i:0 YT:Z:CP
You've downloaded the chrX reference from position
146949598
to147161084
. This is your reference sequence. Bowtie2 does not know that you cropped the reference sequence, the reference start at 1 for the software. If you want the position on the chrX you have to take the full chrX sequence or to add146949598
to every hitThanks @ Bastien for your reply.
you mean the first position in my data is 146949598 + 150864 = 147100462
but why my last start position when I add 146949598 to it is bigger than my Ref Genome range?
my last start position is 2390138 + 146949598 = 149339736
Could you please give me the length of your reference sequence ?
it is 4231 lines. (if I did right: cat ref.fa | wc -l)
I was expecting the number of nucleotide in your reference sequence... Anyway, let's assume you have a fasta format with 60 character per line, minus the header
4230 * 60 = 253 800 bases
You've downloaded a sequence that is 211 486 bases long
How did you end up with a hit around 2 000 000 ??
Could you please try to download the full chrX and restart your alignment, this will avoid the misunderstanding
Thanks. I have done it and the head and tail of mt output are as below:
Head:
tail:
Now, would you please let me know how can I crop my specific regions?
Thanks.
i run it but I got errore :
Then I run it: samtools index uniqu_fmr.bam
but I got this error: