I have the sequence of some primers and I need to obtain the Start and End coordinates of these primers.
I have been given two files
p1.fa
>TP53_02
TTGGAAGTGTCTCATGCTG
>TP53_03
ATGGGACTGACTTTCTGCTC
>TP53_04a
tctgactgctcttttcaccc >TP53_04b
ACCAGCAGCTCCTACACC
>TP53_05
TGCCCTGACTTTCAACTCTG
>TP53_06
CCCAGGCCTCTGATTCC
p2.f
>TP53_02
GGCCTGCCCTTCCAATG
>TP53_03
CCAGCCCAACCCTTGTC
>TP53_04a
AGGGACAGAAGATGACAGGG
>TP53_04b
GGCCAGGCATTGAAGTC
>TP53_05
AGCCCTGTCGTCTCTCCAG
>TP53_06
AGGGCCACTGACAACCAC
I also have the code as shown below
#!/bin/bash
set -euo pipefail
module load bwa
module load bedtools
module load samtools
# map primers to reference sequence
echo "RUNNING BWA MEM"
bwa mem \
-k 8 \
./Homo_sapiens_assembly38.fasta \
p1.fa \
p2.fa | \
samtools view \
-b > out.bam
samtools view -h -o out.sam out.bam
# convert BAM to bedpe format
echo "Converting BAM to bedpe format"
bedtools \
bamtobed \
-i out.bam \
-bedpe > primers.txt
echo "END"
The output of BWA is
[M::bwa_idx_load_from_disk] read 3171 ALT contigs
[M::process] read 36 sequences (732 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 2, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 36 reads in 0.081 CPU sec, 0.080 real sec
[main] Version: 0.7.17-r1188
and the BAM/SAM obtained doesn't show any mapping
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.16.1 CL:samtools view -h -o out.sam out.bam
TP53_02 77 * 0 0 * * 0 0 TTGGAAGTGTCTCATGCTG * AS:i:0 XS:i:0
TP53_02 141 * 0 0 * * 0 0 GGCCTGCCCTTCCAATG * AS:i:0 XS:i:0
TP53_03 77 * 0 0 * * 0 0 ATGGGACTGACTTTCTGCTC * AS:i:0 XS:i:0
TP53_03 141 * 0 0 * * 0 0 CCAGCCCAACCCTTGTC * AS:i:0 XS:i:0
TP53_04a 77 * 0 0 * * 0 0 TCTGACTGCTCTTTTCACCC * AS:i:0 XS:i:0
TP53_04a 141 * 0 0 * * 0 0 AGGGACAGAAGATGACAGGG * AS:i:0 XS:i:0
TP53_04b 77 * 0 0 * * 0 0 ACCAGCAGCTCCTACACC * AS:i:0 XS:i:0
TP53_04b 141 * 0 0 * * 0 0 GGCCAGGCATTGAAGTC * AS:i:0 XS:i:0
TP53_05 77 * 0 0 * * 0 0 TGCCCTGACTTTCAACTCTG * AS:i:0 XS:i:0
TP53_05 141 * 0 0 * * 0 0 AGCCCTGTCGTCTCTCCAG * AS:i:0 XS:i:0
TP53_06 77 * 0 0 * * 0 0 CCCAGGCCTCTGATTCC * AS:i:0 XS:i:0
TP53_06 141 * 0 0 * * 0 0 AGGGCCACTGACAACCAC * AS:i:0 XS:i:0
TP53_07 77 * 0 0 * * 0 0 GCCACAGGTCTCCCCAAG * AS:i:0 XS:i:0
TP53_07 141 * 0 0 * * 0 0 GTGTGCAGGGTGGCAAGTG * AS:i:0 XS:i:0
TP53_08 77 * 0 0 * * 0 0 AGGACCTGATTTCCTTACTG * AS:i:0 XS:i:0
TP53_08 141 * 0 0 * * 0 0 ATCTGAGGCATAACTGCAC * AS:i:0 XS:i:0
TP53_09 77 * 0 0 * * 0 0 CAGTTATGCCTCAGATTCAC * AS:i:0 XS:i:0
TP53_09 141 * 0 0 * * 0 0 CCACTTGATAAGAGGTCCC * AS:i:0 XS:i:0
TP53_10 77 * 0 0 * * 0 0 GTACTGTGTATATACTTACTTCTCC * AS:i:0 XS:i:0
TP53_10 141 * 0 0 * * 0 0 TGGCCAGGAAGGGGCTGAG * AS:i:0 XS:i:0
TP53_11 77 * 0 0 * * 0 0 TCTCACTCATGTGATGTCATC * AS:i:0 XS:i:0
TP53_11 141 * 0 0 * * 0 0 CTGTCAGTGGGGAACAAGAA * AS:i:0 XS:i:0
FLT3_TKD1N676 77 * 0 0 * * 0 0 CTGGGAAGCCACGAGAATATTG * AS:i:0 XS:i:0
FLT3_TKD1N676 141 * 0 0 * * 0 0 GTTACCTGACAGTGTGCACG * AS:i:0 XS:i:0
FLT3_TKD1F691 77 * 0 0 * * 0 0 GAAACTGTAACTATTTCAGGACCA * AS:i:0 XS:i:0
FLT3_TKD1F691 141 * 0 0 * * 0 0 TTGAGAAGATCACCATAGCAAC * AS:i:0 XS:i:0
FLT3_TKD2 77 * 0 0 * * 0 0 AAAGTGGTGAAGATATGTGACT * AS:i:0 XS:i:0
FLT3_TKD2 141 * 0 0 * * 0 0 TCACATTGCCCCTGACAAC * AS:i:0 XS:i:0
NPM1_860 77 * 0 0 * * 0 0 ATGTCTATGAAGTGTTGTGGTTCC * AS:i:0 XS:i:0
NPM1_860 141 * 0 0 * * 0 0 TGTTACAGAAATGAAATAAGACGG * AS:i:0 XS:i:0
IDH2_R140 77 * 0 0 * * 0 0 ATGTGGAAAAGTCCCAATGGAAC * AS:i:0 XS:i:0
IDH2_R140 141 * 0 0 * * 0 0 TGTTTTTGCAGATGATGGGCTC * AS:i:0 XS:i:0
IDH2_R172 77 * 0 0 * * 0 0 GACCAAGCCCATCACCATTG * AS:i:0 XS:i:0
IDH2_R172 141 * 0 0 * * 0 0 GCCTACCTGGTCGCCATGG * AS:i:0 XS:i:0
IDH1_R132 77 * 0 0 * * 0 0 GTCTTCAGAGAAGCCATTATCTGC * AS:i:0 XS:i:0
IDH1_R132 141 * 0 0 * * 0 0 CAAAATCACATTATTGCCAACATGA * AS:i:0 XS:i:0
and consequently, the bed file doesn't show anything.
. -1 -1 . -1 -1 TP53_02 0 . .
. -1 -1 . -1 -1 TP53_03 0 . .
. -1 -1 . -1 -1 TP53_04a 0 . .
. -1 -1 . -1 -1 TP53_04b 0 . .
. -1 -1 . -1 -1 TP53_05 0 . .
. -1 -1 . -1 -1 TP53_06 0 . .
. -1 -1 . -1 -1 TP53_07 0 . .
. -1 -1 . -1 -1 TP53_08 0 . .
. -1 -1 . -1 -1 TP53_09 0 . .
. -1 -1 . -1 -1 TP53_10 0 . .
. -1 -1 . -1 -1 TP53_11 0 . .
. -1 -1 . -1 -1 FLT3_TKD1N676 0 . .
. -1 -1 . -1 -1 FLT3_TKD1F691 0 . .
. -1 -1 . -1 -1 FLT3_TKD2 0 . .
. -1 -1 . -1 -1 NPM1_860 0 . .
. -1 -1 . -1 -1 IDH2_R140 0 . .
. -1 -1 . -1 -1 IDH2_R172 0 . .
. -1 -1 . -1 -1 IDH1_R132 0 . .
What I am doing wrong?? It should work because I used to do this task before with the same code and input (I believe) and that worked.
This is the perfect use case for
UCSC blat
: https://genome.ucsc.edu/cgi-bin/hgBlat?command=start