Hi all, I have been trying to extract the reads that map to the influenza virus genome (a negative-sense RNA virus) from RNA-seq data from a chicken infection experiment I recently carried out. I am using FastQ Screen (https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/). It is a nice tool: you edit its configuration file to add the database(s) to screen against and the aligner to use. I have tried two aligners, BWA and Bowtie2, and indexed the genomes for three databases: mouse (as a negative control), chicken, and influenza virus (the same strain used for the infection plus another reference strain). The indexing ran correctly:
bwa index -p Guangdong_HA Guangdong_HA.fasta
bowtie2-build Guangdong_HA.fasta Guangdong_HA
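One thing worth double-checking is that each DATABASE line in the FastQ Screen config points at the index basename, and that all of the aligner's index files sit next to it. For BWA, `bwa index Guangdong_HA.fasta` names its output after the full FASTA file name (e.g. `Guangdong_HA.fasta.bwt`) unless `-p` sets a prefix, so the basename in the config has to match. Below is a minimal Python sketch, assuming whitespace-separated config fields, paths without spaces, and the usual BWA (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) and Bowtie2 (`.1.bt2` through `.rev.2.bt2`) file names; the function names and demo paths are hypothetical.

```python
import os
import tempfile

# Files each aligner writes next to an index basename. Assumption: small genomes,
# so Bowtie2 writes .bt2 files (very large indexes use .bt2l instead).
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")
BOWTIE2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def database_entries(conf_text):
    """Return (name, index_basename) pairs from DATABASE lines of a config file.

    Assumes whitespace-separated fields and paths without spaces; commented-out
    lines (starting with '#') are skipped because their first field is '#'.
    """
    entries = []
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "DATABASE":
            entries.append((fields[1], fields[2]))
    return entries


def missing_index_files(basename, aligner):
    """List the index files expected for `aligner` that do not exist for `basename`."""
    if aligner == "bwa":
        suffixes = BWA_SUFFIXES
    elif aligner == "bowtie2":
        suffixes = BOWTIE2_SUFFIXES
    else:
        raise ValueError(f"unknown aligner: {aligner}")
    return [basename + s for s in suffixes if not os.path.exists(basename + s)]


if __name__ == "__main__":
    # Demo in a throwaway directory: empty placeholder files stand in for a real
    # bwa index. Point this at your own fastq_screen.conf instead.
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "Guangdong_HA")
        for suffix in BWA_SUFFIXES:
            open(base + suffix, "w").close()
        conf = f"BWA /usr/local/bin/bwa\nDATABASE\tInfluenza\t{base}\n"
        for name, basename in database_entries(conf):
            print(name, "missing bwa files:", missing_index_files(basename, "bwa"))
            print(name, "missing bowtie2 files:", missing_index_files(basename, "bowtie2"))
```

An empty list for the aligner you pass to `--aligner` means every expected index file is present for that database.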
Then I ran FastQ Screen, telling it to align my FASTQ file (infected, 6 hpi) against the three databases. The command below uses BWA, but I did the same with Bowtie2:
FastQ-Screen-0.15.3/fastq_screen --aligner bwa /mnt/lustre/RDS-live/samir/ephemeral/Infection_expe/fastq/Ross-6h-A_R1_trimmed.fastq
I did the same for the FASTQ files of reads left unmapped by STAR, i.e. after removing the reads that aligned to the chicken genome:
FastQ-Screen-0.15.3/fastq_screen --aligner bwa /mnt/lustre/RDS-live/samir/ephemeral/Infection_expe/fastq/unmapped_fastq/Ross_6h_A1Unmapped.out.mate1.fastq
I also ran this for the 12 hpi infection. I obtained good mapping to chicken, no unique mapping to mouse and no mapping to any of the flu RNA databases (see the plots below).
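To cross-check FastQ Screen independently, one could align the STAR-unmapped reads straight to the influenza index, for example `bwa mem Guangdong_HA Ross_6h_A1Unmapped.out.mate1.fastq > flu_hits.sam` (assuming the index prefix is `Guangdong_HA`; the output name is hypothetical), and then count mapped reads per viral reference. The sketch below does that counting from plain SAM text using the standard FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary); it is a minimal illustration, not a replacement for `samtools`.

```python
import sys
from collections import Counter


def count_mapped_per_reference(sam_lines):
    """Count primary, mapped alignments per reference name in SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and the SAM header
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # 0x4: read is unmapped
        if flag & 0x900:
            continue  # 0x100 secondary or 0x800 supplementary: count each read once
        counts[fields[2]] += 1  # column 3 (RNAME) is the reference sequence name
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_hits.py flu_hits.sam
    with open(sys.argv[1]) as handle:
        for ref, n in count_mapped_per_reference(handle).most_common():
            print(ref, n)
```

With paired-end input each mate is counted separately; an empty result would confirm what FastQ Screen reports.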
I suspect the problem lies with the aligner: does BWA work when aligning RNA reads (my samples) against an RNA reference (the flu database)? Does Bowtie2?
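On the RNA-versus-RNA question: sequencing reads are reported in the DNA alphabet, because what is actually sequenced is cDNA, so both aligners end up comparing DNA letters against DNA letters. Reads derived from the negative-sense genomic RNA will be the reverse complement of a reference stored in positive (mRNA) sense, but BWA and Bowtie2 both search the forward and reverse-complement strands by default, so strand alone should not produce zero hits. The toy sketch below illustrates the strand point with exact substring matching only (an aligner also tolerates mismatches and gaps). It rewrites U as T as a precaution, assuming a reference could have been saved in RNA letters; the original post does not say this happened, and the sequences are hypothetical.

```python
# Complement of each DNA letter (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def to_dna(seq):
    """Upper-case a sequence and write U as T (RNA alphabet -> DNA alphabet)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Reverse complement of a nucleotide sequence, in the DNA alphabet."""
    return to_dna(seq).translate(_COMPLEMENT)[::-1]


def matches_either_strand(read, reference):
    """True if `read` occurs exactly in `reference` on either strand.

    Exact substring search only: a toy stand-in for an aligner.
    """
    read, reference = to_dna(read), to_dna(reference)
    return read in reference or reverse_complement(read) in reference


if __name__ == "__main__":
    # Hypothetical 12-nt stretch of a positive-sense reference, in RNA letters.
    reference = "AUGAAGGCAAAC"
    # A read derived from the complementary (negative-sense) strand.
    antisense_read = reverse_complement("ATGAAG")
    print(antisense_read, matches_either_strand(antisense_read, reference))
    # prints: CTTCAT True
```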
Could any of you explain why I do not get any reads mapping to the virus? The people who performed the infection told me the samples were well infected.
Thanks
Thanks for your comments. (1) "There was no viral RNA in the sample you collected": we cannot validate this unless we run RT-qPCR on the sample, which is not feasible at the moment. It could be an underpowered infection, because the infection was actually done in chicken eggs.
(2) "A lack of sequencing depth precluded you from identifying any viral RNA that might have been in your sample": we sequenced at a depth of 40 million reads, which is high. Do you think a BLAST search could recover any remnants if the sequencing depth was not high enough to capture the virus?
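Whether 40 million reads is "enough" depends on what fraction of the library is viral, which is unknown here. A back-of-the-envelope sketch: the expected number of viral reads is depth times viral fraction, and the chance of seeing none is roughly exp(-depth × fraction) (a Poisson approximation to the binomial). One related detail, per the FastQ Screen documentation: by default it screens a subset of about 100,000 reads (`--subset`, with `--subset 0` to process the whole file), so it is worth confirming with `fastq_screen --help` what 0.15.3 did in these runs. The viral fractions below are purely hypothetical.

```python
import math


def expected_viral_reads(total_reads, viral_fraction):
    """Expected number of viral reads when a given fraction of the library is viral."""
    return total_reads * viral_fraction


def prob_no_viral_reads(total_reads, viral_fraction):
    """Chance of sequencing zero viral reads (Poisson approximation to the binomial)."""
    return math.exp(-total_reads * viral_fraction)


if __name__ == "__main__":
    # Depths: the full 40M-read library vs. a ~100,000-read screening subset.
    for total in (40_000_000, 100_000):
        # Hypothetical viral fractions of the library; the true value is unknown.
        for fraction in (1e-7, 1e-6, 1e-5):
            print(
                f"depth {total:>10,}  fraction {fraction:.0e}: "
                f"expected {expected_viral_reads(total, fraction):.2f} reads, "
                f"P(zero reads) = {prob_no_viral_reads(total, fraction):.3g}"
            )
```

The same arithmetic explains why BLAST cannot rescue reads that were never sequenced: it can only search what is in the FASTQ.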
or (3) the viral reference segments do not match closely enough to the viral RNA in your sample: I do not think so, because I run the alignment against the actual virus that was used in the infection. there might be changes during the infection itself, that is why I align it also against a reference influenza strain, which also did not produce any hits as well.
A last, general question: do you think the polyA enrichment step during library preparation could prevent detection of the viral reads, since they are not polyadenylated?
(1) Propagating an influenza virus in embryonated chicken eggs is a common practice for enrichment purposes, but judging from the graphs you are showing, and in the absence of confirmation by qPCR or a hemagglutination assay, it does not look like it was effective.
(2) If the sequencing depth was not high enough to capture the virus, BLAST will not be able to find it either. My point was to take the reads that did not map to anything and BLAST them to see what hits you get. You may get hits to a viral segment not currently represented in your reference database, e.g., the polymerase segments PB2, PB1 or PA (segments 1 to 3), NA (segment 6) or NS1/NS2 (segment 8), or to human/bacterial background DNA.
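BLAST takes FASTA rather than FASTQ, and BLASTing millions of reads is slow, so a practical first step is to convert a few thousand of the unmapped reads (for example the STAR unmapped reads) to FASTA. A minimal sketch, assuming unwrapped four-line FASTQ records; note it keeps the first `max_reads` records rather than a random sample, and the script and file names in the usage comment are hypothetical.

```python
import sys


def fastq_to_fasta(fastq_lines, max_reads=None):
    """Yield FASTA lines (header, sequence) from 4-line FASTQ records.

    Takes the first `max_reads` records (not a random sample) and assumes
    unwrapped FASTQ, i.e. exactly four lines per record.
    """
    lines = iter(fastq_lines)
    n_reads = 0
    for header in lines:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if max_reads is not None and n_reads >= max_reads:
            return
        if not header.startswith("@"):
            raise ValueError(f"expected a FASTQ header, got {header!r}")
        try:
            sequence = next(lines).rstrip("\n")
            next(lines)  # '+' separator line
            next(lines)  # quality line, not needed for BLAST
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        yield ">" + header[1:]
        yield sequence
        n_reads += 1


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python fq2fa.py Ross_6h_A1Unmapped.out.mate1.fastq 2000 > subset.fasta
    with open(sys.argv[1]) as handle:
        for line in fastq_to_fasta(handle, int(sys.argv[2])):
            print(line)
```

The resulting `subset.fasta` can then be submitted to BLAST to see which organisms or viral segments the unmapped reads resemble.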
(3) This sounds fine. Try adding the other segments if they are available, i.e., those encoding the polymerase, neuraminidase and non-structural proteins.
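The index name `Guangdong_HA` suggests the influenza database may contain only the HA segment, so a quick way to see which of the eight influenza A segments a reference FASTA covers is to scan its headers. This sketch assumes each header mentions "segment <n>", as many GenBank influenza records do; headers without that wording will simply be reported as missing, so treat the output as a prompt to inspect the file, not a verdict. The script name in the usage comment is hypothetical.

```python
import re
import sys

# Influenza A genome segments, numbered by convention, with the main gene on each.
SEGMENTS = {1: "PB2", 2: "PB1", 3: "PA", 4: "HA", 5: "NP", 6: "NA", 7: "M", 8: "NS"}

# Assumption: headers mention "segment <n>", as many GenBank influenza records do.
_SEGMENT_RE = re.compile(r"\bsegment\s+(\d)\b", re.IGNORECASE)


def missing_segments(fasta_headers):
    """Return 'n (gene)' labels for segments not mentioned in any FASTA header."""
    found = set()
    for header in fasta_headers:
        match = _SEGMENT_RE.search(header)
        if match:
            found.add(int(match.group(1)))
    return [f"{n} ({gene})" for n, gene in sorted(SEGMENTS.items()) if n not in found]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python check_segments.py Guangdong_HA.fasta
    with open(sys.argv[1]) as handle:
        headers = [line for line in handle if line.startswith(">")]
    print("segments not found:", missing_segments(headers) or "none")
```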
I do not have an educated opinion about your last question.