Question

How to see the set of short reads mapped to a particular long read

4.4 years ago

Ashi ▴ 20

Hi,

I am very new to NGS analysis.

I have used BWA for mapping. My reference is a set of simulated E. coli long reads (generated with the Badread tool), and I have mapped short reads to it. Now I want to know, for each long read, which short reads were mapped to it. My short reads are in a FASTA file that looks like this:

>suff_extn_AAGTGGCGGTGATTGGCGCTGGGCCTGCAGG_0
GTTAGGGTGTG

>suff_extn_AAGTGGCGGGTATTCCCGTGGTGGAACTGAT_0
GGACAGCAAGT

>suff_extn_AAGTGGTAACGATGGTCTGGAAGGTGTCAGC_0
TACATAC

I want something like:

suff_extn_AAGT........._0 mapped to LongRead1
suff_extn_AAGC........._0 mapped to LongRead1
.
.
.

My ultimate goal is to see the set of short reads mapped to a given long read. Is there any way to do this?

Thank you.

Mapping BWA • 875 views

updated 17 months ago by Ram 44k • written 4.4 years ago by Ashi ▴ 20

If you parse the BWA output file (BAM or SAM), you can get there without much hassle.

(Some basic awk should already get you there, I think.)

4.4 years ago by lieven.sterck 15k

score 3 · Accepted Answer · 2020-07-07

4.4 years ago

GenoMax 147k

I assume you have BAM files from these alignments. You can parse them for the read name (column 1) and the name of the long read it is aligned to (column 3). If the read is not aligned, column 3 contains a *.

Something like this:

samtools view your.bam | awk -F'\t' -v OFS='\t' '{print $1, $3}'

You can then sort the result on column 2 (for example with sort -k2,2) so that all short reads mapped to the same long read end up next to each other.

Note: If you have a SAM file instead, use just the awk part on your file, and skip the header lines (they start with @).
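To go from the sorted two-column table to one line per long read, something like the following might work. It is only a sketch: the file names (toy.sam, pairs.tsv, by_long_read.txt) and the read names are made up for illustration, and the toy SAM file stands in for real BWA output. It skips header lines and unaligned reads, sorts by long read name, then collapses the result.

```shell
# Work in a scratch directory; all file names below are hypothetical.
tmpdir=$(mktemp -d)

# Toy SAM file standing in for real BWA output (made-up reads and positions).
printf '@SQ\tSN:LongRead1\tLN:5000\n' > "$tmpdir/toy.sam"
printf '@SQ\tSN:LongRead2\tLN:7000\n' >> "$tmpdir/toy.sam"
printf 'readA\t0\tLongRead2\t10\t60\t11M\t*\t0\t0\tGTTAGGGTGTG\t*\n' >> "$tmpdir/toy.sam"
printf 'readB\t0\tLongRead1\t42\t60\t11M\t*\t0\t0\tGGACAGCAAGT\t*\n' >> "$tmpdir/toy.sam"
printf 'readC\t4\t*\t0\t0\t*\t*\t0\t0\tTACATAC\t*\n' >> "$tmpdir/toy.sam"
printf 'readD\t16\tLongRead1\t900\t60\t7M\t*\t0\t0\tGTATGTA\t*\n' >> "$tmpdir/toy.sam"

# Step 1: print "long read <TAB> short read", skipping header lines (@...)
# and unaligned reads (column 3 is "*"), then sort by long read name.
awk -F'\t' -v OFS='\t' '
  /^@/      { next }
  $3 == "*" { next }
            { print $3, $1 }
' "$tmpdir/toy.sam" | sort -k1,1 -k2,2 > "$tmpdir/pairs.tsv"

# Step 2: collapse the sorted pairs into one line per long read.
awk -F'\t' '
  $1 != prev { if (NR > 1) printf "\n"; printf "%s:", $1; prev = $1 }
             { printf " %s", $2 }
  END        { if (NR > 0) printf "\n" }
' "$tmpdir/pairs.tsv" > "$tmpdir/by_long_read.txt"

cat "$tmpdir/by_long_read.txt"
```

With a real BAM file, the first awk command would read from samtools view your.bam instead of the toy SAM file. Note that a short read can appear under more than one long read if the BAM contains secondary or supplementary alignments.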


Thank you so much, @genomax. I also added $10 to see the sequence of the read that was mapped.
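One thing to keep in mind when printing $10: in SAM, the sequence of a read aligned to the reverse strand is stored reverse-complemented, so it will not match the original FASTA entry character for character. A rough sketch that also prints the strand (taken from bit 16 of the FLAG in column 2) could look like this. The file names and reads are again made up for illustration.

```shell
# Scratch directory and toy SAM file; names and reads are hypothetical.
tmpdir=$(mktemp -d)
printf '@SQ\tSN:LongRead1\tLN:5000\n' > "$tmpdir/toy.sam"
printf 'readB\t0\tLongRead1\t42\t60\t11M\t*\t0\t0\tGGACAGCAAGT\t*\n' >> "$tmpdir/toy.sam"
printf 'readD\t16\tLongRead1\t900\t60\t7M\t*\t0\t0\tGTATGTA\t*\n' >> "$tmpdir/toy.sam"

# Print: read name, long read, strand, sequence as stored in the SAM record.
# int($2 / 16) % 2 tests FLAG bit 16 (reverse strand) with plain arithmetic,
# so it does not rely on awk extensions for bitwise operations.
awk -F'\t' -v OFS='\t' '
  /^@/      { next }   # skip SAM header lines
  $3 == "*" { next }   # skip unaligned reads
  {
    strand = (int($2 / 16) % 2) ? "-" : "+"
    print $1, $3, strand, $10
  }
' "$tmpdir/toy.sam" > "$tmpdir/with_seq.tsv"

cat "$tmpdir/with_seq.tsv"
```

Here readD is on the "-" strand, so the GTATGTA shown in column 4 is the reverse complement of a read that was TACATAC in the input.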