Hi,
I'm trying to combine the FASTA headers from my reads file with the matched headers from my BAM file and the numbers of mapped reads. I have a FASTA file with 15k reads that I aligned against an other FASTA file containing 1M markers.
I extracted the mapped reads using:
samtools view -F4 markers.sorted.bam > markers.mapped.bam
But running:
samtools view -H markers.mapped.bam
returns a lot more seqs than expected, I probably don't fully grasp the BAM format.
@HD VN:1.0 SO:coordinate
@SQ SN:gi|345004010|ref|NC_015954.1|:c1336247-1335993 LN:255
@SQ SN:gi|345004010|ref|NC_015954.1|:2775135-2775383 LN:249
@SQ SN:gi|345004010|ref|NC_015954.1|:c1686708-1686454 LN:255
@SQ SN:gi|345004010|ref|NC_015954.1|:c419000-417852 LN:1149
@SQ SN:gi|345004010|ref|NC_015954.1|:c488066-486981 LN:1086
@SQ SN:gi|345004010|ref|NC_015954.1|:2347131-2347793 LN:663
.. etc ..
The mapping was done with GraphMap and output as SAM I converted it to BAM via:
samtools view -bS markers.sam > markers.bam
Finally I'm looking for a way to count the number of mapped reads per marker.
Thanks.
I find it very upsetting when people ask questions about data but do not share the data. It is like being told about a party i'm not invited too.
Please post the header, or a snippit of it :)
EDIT: Also would be nice to know what tool did the mapping, although i hope it will be obvious from the header.
Oh dear, he knows about the party :-/
Hahah - i'll bring a data salad and a 6-pack of regex beer.
Could you give me time.sleep(300), I'm reading the man pages.
Hahaha, i can't believe such a publication exists! And copyright no less. You can't tell other bioinformaticians these secrets without attribution!
Let's organize a biostars new years party. Can you make && make install the invitations?
I updated the post with some sample data
Nice one - so yeah as abascalfederico says, you should have around 1 million 'contigs' in there, which isn't giong to be a bag of fun to deal with in anything other than SQL. I wrote a tool that might help, but I have also never asked it to deal with a million anythings before, so we'll see. The good news is that it's easy to use.
python /path/to/SeQC/SeQC.js.py --analysis RNAME --input /path/to/your.bam
htop
if you have it installed.Checking index on 633a3b0d8ab37f7cac048fa31d8ca554_RNAME
sqlite3 /path/to/myProject.SeQC
.headers on
, then.mode columns
, and finally:SELECT * from '633a3b0d8ab37f7cac048fa31d8ca554_RNAME' order by counts DESC limit 10;
Of course you can then write it out to CSV or whatever you want, but that should get you started.
Oh, if your BAM isn't already filtered, used
--analysis RNAME FLAG
, and then we can filter it in SQLAnd here I was hoping for an easy fix. I will try your tool tomorrow and report back. Thanks.
Did you ever get it to work? :)