Hi,
I am very new to NGS analysis.
I have used BWA for mapping. My reference is a set of simulated E. coli long reads (generated with the Badread tool), and I have mapped short reads to it. Now I want to know, for each long read, which short reads have been mapped to it. My short reads are in a FASTA file that looks like this:
>suff_extn_AAGTGGCGGTGATTGGCGCTGGGCCTGCAGG_0
GTTAGGGTGTG
>suff_extn_AAGTGGCGGGTATTCCCGTGGTGGAACTGAT_0
GGACAGCAAGT
>suff_extn_AAGTGGTAACGATGGTCTGGAAGGTGTCAGC_0
TACATAC
I want something like:
suff_extn_AAGT........._0 mapped to LongRead1
suff_extn_AAGC........._0 mapped to LongRead1
.
.
.
My ultimate goal is to see which set of short reads maps to a given long read. Is there any way to do this?
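The target result described here, one set of short-read names per long read, fits naturally into a dictionary of sets. A minimal sketch, assuming you already have (short read, long read) name pairs from the alignment; the function name `group_by_long_read` and the `LongRead2` entry are hypothetical, for illustration only:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def group_by_long_read(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Collect, for each long read, the set of short-read names aligned to it."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for short_read, long_read in pairs:
        groups[long_read].add(short_read)  # a set silently drops duplicate hits
    return dict(groups)


# Hypothetical (short read, long read) pairs; LongRead2 is made up.
pairs = [
    ("suff_extn_AAGTGGCGGTGATTGGCGCTGGGCCTGCAGG_0", "LongRead1"),
    ("suff_extn_AAGTGGCGGGTATTCCCGTGGTGGAACTGAT_0", "LongRead1"),
    ("suff_extn_AAGTGGTAACGATGGTCTGGAAGGTGTCAGC_0", "LongRead2"),
]
groups = group_by_long_read(pairs)
for long_read, short_reads in sorted(groups.items()):
    print(long_read, len(short_reads), sorted(short_reads))
```

Using a set per long read means a short read reported twice against the same long read is only counted once.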
Thank you.
If you parse BWA's output file (SAM by default, or BAM if you converted it), you should get there without too much hassle.
(Some basic awk will already get you there, I think.)
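A minimal Python sketch of that parsing step, assuming an uncompressed SAM file (for a BAM, `samtools view aln.bam` prints the alignments as SAM). It skips header lines and records whose FLAG has bit 0x4 (unmapped) set, and yields the short-read name (column 1, QNAME) together with the long read it hit (column 3, RNAME). Optionally it also reports the alternative hits BWA lists in the `XA:Z` tag; because simulated long reads overlap, one short read can plausibly match several of them. The function name `mapped_pairs` is made up for illustration.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED = 0x4  # SAM FLAG bit 0x4: the read did not map


def mapped_pairs(sam_lines: Iterable[str], include_xa: bool = False) -> Iterator[Tuple[str, str]]:
    """Yield (short_read_name, long_read_name) for every mapped SAM record.

    If include_xa is True, also yield the alternative hits that BWA stores
    in the XA:Z tag, formatted as "name,+-pos,CIGAR,NM;" repeated.
    """
    for line in sam_lines:
        if line.startswith("@"):  # header lines (@HD, @SQ, @PG, ...)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment line
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & UNMAPPED or rname == "*":
            continue
        yield qname, rname
        if include_xa:
            for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields
                if tag.startswith("XA:Z:"):
                    for hit in tag[5:].split(";"):
                        if hit:  # the list ends with ';', leaving an empty piece
                            yield qname, hit.split(",")[0]
```

For example, `with open("aln.sam") as fh:` followed by `for q, r in mapped_pairs(fh): print(f"{q} mapped to {r}")` prints lines in the format asked for above (`aln.sam` is a placeholder file name). Secondary or supplementary records, if present, are kept, so a short read may be listed against more than one long read.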