7.3 years ago
db
I have run BLAST on assembled contigs (output of MEGAHIT) and obtained results for 5 different samples. I am not sure how to normalize the output so that the five samples can be compared. Should I normalize by the number of reads in the original raw sequences?
To provide more information:
- I am looking to compare counts of antibiotic resistance genes between samples
- The raw reads are from HiSeq (2x125bp PE) sequencing of DNA from environmental samples
- I first used MEGAHIT to assemble the raw reads and then used BLAST with assembled contigs as the query
EDIT: I can't add more replies anymore so I will edit the question itself to reply.
What do you mean by "normalize" BLAST results? Are you looking to get a non-redundant result set?
I am looking to compare gene counts between samples.
Is it NGS sequencing?
Yes, the raw reads are from HiSeq (2x125bp PE)
Where does BLAST fit in? Did you use that to align the data?
I agree you should use bwa :)
I first used MEGAHIT to assemble the raw reads and then used BLAST with assembled contigs as the query.
If you are interested in counts, it may be best to align the data using an NGS aligner and then use featureCounts along with a GFF file to do the counting. Otherwise I am not sure how you are going to get counts from BLAST results (which are likely in the form of HSPs). Did you collect the results in tabular format?
Edit: This appears to be a metagenomics experiment (since MEGAHIT was used). Do you want to count at the level of organisms, species, or genes, and what exactly are you counting?
I want to count antibiotic resistance genes.
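To illustrate the HSP concern: a single contig can produce several HSPs against the same gene, so raw row counts overstate hits. Below is a minimal sketch, assuming the file uses the default 12-column layout of -outfmt 10 with no custom fields. The function name and the identity and length thresholds are hypothetical choices for illustration, not values recommended in this thread.

```python
import csv
import io
from collections import Counter

# Default column order for BLAST -outfmt 6/7/10 when no custom fields are requested.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def count_contigs_per_gene(handle, min_pident=90.0, min_length=100):
    """Count distinct contigs with at least one passing HSP against each subject.

    min_pident and min_length are illustrative thresholds (assumptions).
    """
    pairs = set()
    for row in csv.DictReader(handle, fieldnames=COLUMNS):
        if float(row["pident"]) >= min_pident and int(row["length"]) >= min_length:
            # Several HSPs for the same contig/gene pair collapse to one entry.
            pairs.add((row["qseqid"], row["sseqid"]))
    return Counter(sseqid for _, sseqid in pairs)


# Made-up rows: contig_1 has two HSPs against geneA, contig_3 fails the identity cut.
example = io.StringIO(
    "contig_1,geneA,98.5,300,4,0,1,300,1,300,1e-150,540\n"
    "contig_1,geneA,97.0,150,4,0,400,549,301,450,1e-70,260\n"
    "contig_2,geneA,99.0,250,2,0,1,250,1,250,1e-120,450\n"
    "contig_3,geneB,80.0,200,40,0,1,200,1,200,1e-30,120\n"
)
counts = count_contigs_per_gene(example)
print(dict(counts))
```

Note that this counts contigs, not reads, so it says whether a gene was assembled rather than how abundant it is.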
What format did you collect your BLAST results in and what database did you blast against?
BLAST results are in .csv format (-outfmt 10). I blasted against a database I created with makeblastdb using one of the FASTA files from CARD (https://card.mcmaster.ca/download).
A bit late to the party, but out of curiosity, how did you normalize the results? I also created a BLAST database from CARD and I am now busy comparing the results.
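The thread does not settle on a normalization method. One widely used convention for read counts, offered here only as a sketch, is reads per kilobase per million (RPKM), which scales by both gene length and sequencing depth. It assumes you already have the number of reads mapped to each gene. The function name and all numbers below are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million sequenced reads.

    read_count     : reads assigned to the gene in one sample
    gene_length_bp : length of the gene in base pairs
    total_reads    : total reads sequenced (or mapped) for that sample
    """
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_reads)


# Made-up example: the same raw count in a sample sequenced twice as deeply
# gives half the normalized value.
shallow = rpkm(read_count=500, gene_length_bp=1000, total_reads=10_000_000)
deep = rpkm(read_count=500, gene_length_bp=1000, total_reads=20_000_000)
print(shallow, deep)
```

For paired-end data, count either reads or fragments, but use the same unit in the numerator and in the denominator.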
Is it an RNAseq experiment or just genome sequencing?
It is metagenomic sequencing (DNA from environmental samples).
Yes, then you can align your reads against your contigs if you want to describe your population.
What tool would you recommend for good speed?
You can use bwa, but you have to be careful: if you have closely related species or genes, the same read can align to multiple contigs.
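One way to handle the multi-mapping caveat is to count only confidently placed reads. Below is a minimal sketch that reads plain SAM text and keeps primary alignments at or above a mapping-quality cutoff. It assumes the aligner gives low MAPQ to ambiguous placements, as bwa typically does. The function name and the cutoff of 20 are hypothetical, and the records are made up.

```python
from collections import Counter


def count_unique_reads(sam_lines, min_mapq=20):
    """Count primary, mapped alignments per reference with MAPQ >= min_mapq."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        if mapq >= min_mapq:
            counts[rname] += 1
    return counts


sam = [
    "@SQ\tSN:contig_1\tLN:1000",
    "read1\t0\tcontig_1\t10\t60\t125M\t*\t0\t0\t*\t*",
    "read2\t0\tcontig_1\t50\t0\t125M\t*\t0\t0\t*\t*",    # ambiguous, MAPQ 0
    "read3\t16\tcontig_2\t5\t42\t125M\t*\t0\t0\t*\t*",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",               # unmapped
    "read1\t256\tcontig_2\t10\t0\t125M\t*\t0\t0\t*\t*",  # secondary alignment
]
read_counts = count_unique_reads(sam)
print(dict(read_counts))
```

Discarding ambiguous reads will undercount genes that have close homologs, so the cutoff is a trade-off rather than a fix.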