Normalizing BLAST results

7.3 years ago

db • 0

I have run BLAST on assembled contigs (the output of MEGAHIT) and obtained results for 5 different samples, but I am not sure how to normalize the output so that the five samples can be compared. Should I normalize by the number of reads in the original raw sequence data?

To provide more information:

I am looking to compare counts of antibiotic resistance genes between samples
The raw reads are from HiSeq (2x125bp PE) sequencing of DNA from environmental samples
I first used MEGAHIT to assemble the raw reads and then used BLAST with assembled contigs as the query

EDIT: I can't add more replies anymore so I will edit the question itself to reply.

blast • 2.6k views

ADD COMMENT • link updated 7.2 years ago by Biostar 20 • written 7.3 years ago by db • 0

What do you mean by "normalize" BLAST results? Are you looking to get a non-redundant result set?

ADD REPLY • link 7.3 years ago by GenoMax 147k

I am looking to compare gene counts between samples.

EDIT: I can't add more replies anymore so I will edit the question itself to reply.

ADD REPLY • link 7.3 years ago by db • 0

Is it NGS sequencing?

ADD REPLY • link 7.3 years ago by Titus ▴ 910

Yes, the raw reads are from HiSeq (2x125bp PE)

ADD REPLY • link 7.3 years ago by db • 0

Where does BLAST fit in? Did you use that to align the data?

ADD REPLY • link 7.3 years ago by GenoMax 147k

I agree you should use bwa :)

ADD REPLY • link 7.3 years ago by Titus ▴ 910

I first used MEGAHIT to assemble the raw reads and then used BLAST with assembled contigs as the query.

ADD REPLY • link 7.3 years ago by db • 0

~~If you are interested in counts, it may be best to align the data using an NGS aligner and then use featureCounts along with a GFF file to do the counting.~~

Otherwise I am not sure how you are going to get counts from BLAST results (which are likely in the form of HSPs). Did you collect the results in tabular format?

Edit: This appears to be a metagenomics experiment (since MEGAHIT was used). Do you want to count at the level of organism, species, or gene, and counts of what exactly?

ADD REPLY • link 7.3 years ago by GenoMax 147k

I want to count antibiotic resistance genes.

ADD REPLY • link 7.3 years ago by db • 0

What format did you collect your BLAST results in and what database did you blast against?

ADD REPLY • link 7.3 years ago by GenoMax 147k

BLAST results are in .csv format (-outfmt 10). I blasted against a database I created with makeblastdb using one of the FASTA files from CARD (https://card.mcmaster.ca/download).
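For anyone landing here with the same setup, here is a minimal sketch of how such a CSV could be turned into per-gene hit counts scaled by sequencing depth. It assumes the default 12-column layout of `-outfmt 10` (no custom fields were requested). The identity and length thresholds, the function names, and the `ARO_*` subject IDs are made up for illustration. It also keeps only the top-bitscore hit per contig, which is a simplification: a long contig may carry more than one resistance gene. Counting hits on contigs measures presence rather than abundance, because assembly collapses read coverage, so treat this as a first look only.

```python
import csv
import io
from collections import Counter

# Default column order for -outfmt 10 (same fields as -outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def count_best_hits(handle, min_pident=90.0, min_length=100):
    """Count subject IDs, keeping the top-bitscore hit per query contig.

    The thresholds are illustrative assumptions, not recommended values.
    """
    best = {}  # qseqid -> (bitscore, sseqid)
    for row in csv.DictReader(handle, fieldnames=COLUMNS):
        if float(row["pident"]) < min_pident or int(row["length"]) < min_length:
            continue
        score = float(row["bitscore"])
        if row["qseqid"] not in best or score > best[row["qseqid"]][0]:
            best[row["qseqid"]] = (score, row["sseqid"])
    return Counter(sseqid for _, sseqid in best.values())


def per_million_reads(counts, total_reads):
    """Scale raw hit counts to hits per million raw reads in the sample."""
    return {gene: n * 1e6 / total_reads for gene, n in counts.items()}


# Tiny made-up example: contig1 has two hits (the better one is kept),
# contig3 falls below the identity threshold.
example = io.StringIO(
    "contig1,ARO_1,99.0,500,5,0,1,500,1,500,1e-100,900\n"
    "contig1,ARO_2,95.0,300,15,0,1,300,1,300,1e-50,400\n"
    "contig2,ARO_1,98.0,450,9,0,1,450,1,450,1e-90,800\n"
    "contig3,ARO_3,80.0,200,40,0,1,200,1,200,1e-10,100\n"
)
counts = count_best_hits(example)
scaled = per_million_reads(counts, total_reads=4_000_000)
```

With one file per sample, running the same two functions on each file and dividing by that sample's own raw read count puts the five samples on a common scale.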

ADD REPLY • link 7.3 years ago by db • 0

A bit late to the party. But out of curiosity, how did you normalize the results? I also created a blastdb from CARD and I am now busy with comparing the results.

ADD REPLY • link 4.8 years ago by remon_dulos • 0

Is it an RNAseq experiment or just genome sequencing?

ADD REPLY • link 7.3 years ago by Joe 21k

It is metagenomic sequencing (DNA from environmental samples).

ADD REPLY • link 7.3 years ago by db • 0

Yes, then you can align your reads against your contigs if you want to describe your population.

ADD REPLY • link 7.3 years ago by Titus ▴ 910

What tool would you recommend for good speed?

ADD REPLY • link 7.3 years ago by db • 0

You can use bwa, but you have to be careful: if you have closely related species or genes, the same read can align to multiple contigs.
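Once reads are mapped, the per-contig read counts still need to be made comparable between samples. A rough sketch of one common approach, an RPKM-style value (reads per kilobase of contig per million reads), is below. It assumes the counts come from `samtools idxstats` on a sorted, indexed BAM, which prints tab-separated reference name, length, mapped count, and unmapped count. The function name and example numbers are hypothetical, and whether `total_reads` should be the raw read count or the mapped read count is a choice left to the analyst. The multi-mapping issue is not handled here; filtering on mapping quality before counting is one option.

```python
def rpkm_from_idxstats(lines, total_reads):
    """Compute reads per kilobase per million reads for each contig.

    `lines` are text lines in `samtools idxstats` layout:
    name <TAB> length <TAB> mapped <TAB> unmapped
    """
    values = {}
    for line in lines:
        name, length, mapped, _unmapped = line.rstrip("\n").split("\t")
        # The trailing "*" row holds reads with no reference; skip it.
        if name == "*" or int(length) == 0:
            continue
        values[name] = int(mapped) / (int(length) / 1e3) / (total_reads / 1e6)
    return values


# Made-up idxstats output for two contigs in one sample.
idxstats_lines = [
    "contig1\t2000\t400\t0\n",
    "contig2\t500\t50\t0\n",
    "*\t0\t0\t100\n",
]
rpkm = rpkm_from_idxstats(idxstats_lines, total_reads=1_000_000)
```

Restricting the result to contigs that had a CARD hit would give a depth- and length-normalized value per resistance gene candidate.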

ADD REPLY • link 7.3 years ago by Titus ▴ 910
