Hello everyone,
I have paired-end data. After alignment I noticed that R1 and R2 have different read names in the BAM file, for example:
SRR1032070.122660125.1
SRR1032070.122660125.2
So read R1 has the extension .1 and read R2 has the extension .2.
This causes a problem when I try to convert the BAM file to BED with bedtools bamtobed -bedpe -i.
So the only solution I can think of is to remove these extensions. Could anyone please advise on a tool? I would rather not convert the data back to SAM, as it is extremely large.
Thank you

I assume you got this data from SRA? You should have used the
-F|--origfmt Defline contains only original sequence name
option to avoid getting this kind of read name. As for adding
/1 /2
to read names, you could use reformat.sh
from the BBMap suite with the addslash=t
or addcolon=t
options.

Ohh OK, so you mean I should use the -F option when I convert SRA to FASTQ, as in
fastq-dump -F
Thank you
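
If re-running fastq-dump and the alignment is not practical, one possible workaround for the original question is to strip the .1/.2 suffix from the read names as the records stream past. Below is a minimal sketch of such a filter, not a tested pipeline. The function names (strip_mate_suffix, fix_sam_line, fix_stream) are made up for illustration, and it assumes every read name ends in .1 or .2 as in the example above; a name that legitimately ends in .1 or .2 for another reason would be shortened too.

```python
import re

# Trailing ".1" or ".2" mate suffix, as in SRR1032070.122660125.1
# Assumption: every read name in the file carries such a suffix.
MATE_SUFFIX = re.compile(r"\.[12]$")


def strip_mate_suffix(read_name):
    """Return the read name without a trailing .1 or .2."""
    return MATE_SUFFIX.sub("", read_name)


def fix_sam_line(line):
    """Strip the suffix from QNAME, the first tab-separated SAM field.

    Header lines start with '@' and are returned unchanged.
    """
    if line.startswith("@"):
        return line
    qname, sep, rest = line.partition("\t")
    return strip_mate_suffix(qname) + sep + rest


def fix_stream(src, dst):
    """Copy SAM text from src to dst, fixing read names line by line."""
    for line in src:
        dst.write(fix_sam_line(line))


# Small demonstration with the read names from the question.
for name in ("SRR1032070.122660125.1", "SRR1032070.122660125.2"):
    print(strip_mate_suffix(name))
```

To use it as a filter, the script would end with a call to fix_stream(sys.stdin, sys.stdout) (after import sys) and sit between two samtools commands, for example samtools view -h in.bam | python strip_suffix.py | samtools view -b -o out.bam - (the file names in.bam, out.bam and strip_suffix.py are placeholders). The records pass through as SAM text in the pipe, but no SAM file is written to disk.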