How to see the set of short reads mapped to a particular long read
4.4 years ago
Ashi ▴ 20

Hi,

I am very new to NGS analysis.

I have used BWA for mapping. My reference is a set of simulated E. coli long reads (generated with the Badread tool), and I have mapped short reads to it. Now I want to know, for each long read, which short reads mapped to it. My short reads are in a FASTA file that looks like this:

>suff_extn_AAGTGGCGGTGATTGGCGCTGGGCCTGCAGG_0
GTTAGGGTGTG

>suff_extn_AAGTGGCGGGTATTCCCGTGGTGGAACTGAT_0
GGACAGCAAGT

>suff_extn_AAGTGGTAACGATGGTCTGGAAGGTGTCAGC_0
TACATAC

I want something like:

suff_extn_AAGT........._0 mapped to LongRead1
suff_extn_AAGC........._0 mapped to LongRead1
.
.
.

My ultimate goal is to see which short reads mapped to a given long read. Is there any way to do this?

Thank you.


If you parse the BWA output file (SAM or BAM), you should get there without too much hassle.

Some basic awk will probably be enough.

4.4 years ago
GenoMax 147k

I assume you have BAM files from these alignments. You can parse them using the read name (column 1) and the name of the long read it aligned to (column 3; if the read is not aligned, this field will contain a *).

Something like this:

samtools view your.bam | awk -F'\t' 'BEGIN{OFS="\t"} {print $1,$3}'

You can then sort the result file on column 2 to get the info you need.

Note: If you have a SAM file, run just the awk part directly on the file, and skip the header lines (those starting with @) so they do not end up in the output.
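
To make the "parse, then sort" idea concrete, here is a minimal sketch that prints one line per long read / short read pair, grouped by long read. The function name reads_per_long_read and the output file pairs.tsv are made up for illustration; the function only assumes standard SAM text on stdin, so it should work with either a BAM (via samtools view) or a plain SAM file.

```shell
# Hypothetical helper: list which short reads mapped to each long read.
# Input : SAM records on stdin (header lines are tolerated).
# Output: "<long read><TAB><short read>", sorted by long read name.
reads_per_long_read() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        /^@/      { next }   # skip SAM header lines
        $3 == "*" { next }   # skip reads with no reference name (unaligned)
        { print $3, $1 }' |
    LC_ALL=C sort -u          # group by long read; drop duplicate pairs
}

# Usage (your.bam is the placeholder file name from the answer):
#   samtools view your.bam | reads_per_long_read > pairs.tsv
#   reads_per_long_read < your.sam > pairs.tsv
```

The sort -u step also collapses cases where the same short read is reported more than once against the same long read. To pull out the reads for a single long read afterwards, something like awk -F'\t' '$1 == "LongRead1"' pairs.tsv should do.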


Thank you so much @genomax. I also added $10 to see the sequence of each mapped read.
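
For anyone following along, a rough sketch of that variant is below. The function name reads_with_sequence is hypothetical, and the strand column is an extra that was not part of the original command. It is included because SAM stores the sequence in column 10 relative to the reference strand, so a read that aligned to the reverse strand (flag bit 16 set) appears reverse-complemented compared with the FASTA input.

```shell
# Hypothetical helper: short read name, long read name, strand, and sequence.
# Input : SAM records on stdin (header lines are tolerated).
# Output: "<short read><TAB><long read><TAB><strand><TAB><sequence>"
reads_with_sequence() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        /^@/      { next }   # skip SAM header lines
        $3 == "*" { next }   # skip unaligned reads
        {
            # Flag bit 16 (0x10) marks a reverse-strand alignment.
            strand = (int($2 / 16) % 2) ? "-" : "+"
            print $1, $3, strand, $10
        }'
}

# Usage (your.bam is the placeholder file name from the answer):
#   samtools view your.bam | reads_with_sequence
```

Sequences on "-" lines are the reverse complement of what is in the original FASTA file.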


You can accept the answer to provide closure to this thread (green checkmark).
