Question

Extract FUngal mRNA from RNAseq Data

0

Entering edit mode

11 weeks ago

SomeOne ▴ 170

Hello reader,

we just got some data regarding Macrophage infection analysis with some fungal strains. then rna was extracted from these analysis and sequenced for mRNA 150bpx2.

Human macrophages where used for this analysis. and my colleague extracted rna using the whole well of the assay plate. which means that it has both macrophage and fungus mRNA, no idea in what proportions.

now i want to extract the reads which are fungal specific to use in another genome annotation process.

The idea which i have works like this

Map QC-cleaned reads to human reference genome using Hisat2/STAR
Extract all the reads which do not map to human Ref. Using samtools -f 4 flag.

These reads will be theoratically fungal mRNA reads.

Questions:

is this really the best approach to follow ?
Which human genome to use for this analysis. Hg38 or T2T genome.
samtools will only extract reads which show no alignment at all or there should be a threshold to extract reads ?

If you can suggest anyother way which is better that the idea that i have, it will be rally helpful. This data will also be used for RNAseq study but on later stages.

thank you

rnaseq rna fungus humant2t transcriptomics • 579 views

ADD COMMENT • link 10 weeks ago by SomeOne ▴ 170

score 3 · Accepted Answer · 2025-02-18

3

Entering edit mode

11 weeks ago

GenoMax 151k

If the genome for the fungus is available then you can use binning approach with bbsplit.sh from BBMap suite. This will allow you to logically address reads that may multi-map to both human/fungal genomes.

ADD COMMENT • link 11 weeks ago by GenoMax 151k

0

Entering edit mode

unfortunatly there is no chromosoem level reference genome available. But there are some genome assemblies available at scaffold or contig level with annotations, tho the wigth have some gaps too. can those be used ?

ADD REPLY • link 11 weeks ago by SomeOne ▴ 170

1

Entering edit mode

It is worth trying. You will know reads that map to the fungal reference (even though fragmented) definitely belong there.

You could also use a transcriptime/cDNA collection instead (if available) since you are working with RNAseq data.

ADD REPLY • link 11 weeks ago by GenoMax 151k

0

Entering edit mode

i used the following command. Kindly let me know if i missed something or if it is now correct

bbsplit.sh ref=human_GRCh38.p14_genomic.fna, Fungus_genomic_all_scf.fna \
path=./00.ref_index \
in=./D1_32931/D1_32931_trim_1.fq.gz \
in2=./D1_32931/D1_32931_trim_2.fq.gz \
basename=./01.ref_specific_reads/D1_32931_2_ref_%.fq.gz \
scafstats=./01.ref_specific_reads/D1_32931_2_ref_scafstats.txt \
refstats=./01.ref_specific_reads/D1_32931_2_ref_stats.txt \
ambiguous2=toss -Xmx200g threads=90

and these were the results

   ------------------   Results   ------------------   

Genome:                 1
Key Length:             13
Max Indel:              20
Minimum Score Ratio:    0.56
Mapping Mode:           normal
Reads Used:             105979402       (15849208519 bases)

Mapping:                1964.954 seconds.
Reads/sec:              53934.79
kBases/sec:             8065.94


Pairing data:           pct pairs       num pairs       pct bases          num bases

mated pairs:             78.2038%        41439934        78.1535%        12386713356
bad pairs:                7.9484%         4211853         7.9708%         1263314616
insert size avg:          983.99


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                  92.2507%        48883350        92.2429%         7309656727
unambiguous:             83.7040%        44354493        83.7020%         6632841526
ambiguous:                8.5467%         4528857         8.5410%          676815201
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:       36.4915%        19336720        36.4624%         2889407313
semiperfect site:        36.5918%        19389884        36.5628%         2897361904
rescued:                  0.0109%            5776

Match Rate:                   NA               NA        92.5373%         6836944703
Error Rate:              60.3932%        29524867         7.4605%          551204372
Sub Rate:                58.6064%        28651324         3.8982%          288007548
Del Rate:                 9.8427%         4811883         1.0646%           78653486
Ins Rate:                24.5991%        12025964         2.4978%          184543338
N Rate:                   0.1215%           59402         0.0022%             161138


Read 2 data:            pct reads       num reads       pct bases          num bases

mapped:                  92.2198%        48867004        92.2110%         7307590567
unambiguous:             83.6751%        44339205        83.6717%         6630864447
ambiguous:                8.5447%         4527799         8.5393%          676726120
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:       36.2372%        19202003        36.2015%         2868915139
semiperfect site:        36.3385%        19255647        36.3028%         2876942183
rescued:                  0.0114%            6044

Match Rate:                   NA               NA        92.5300%         6834312552
Error Rate:              60.6241%        29627809         7.4675%          551555518
Sub Rate:                58.8687%        28769911         3.9199%          289523453
Del Rate:                 9.8388%         4808370         1.0623%           78459430
Ins Rate:                24.5342%        11990196         2.4854%          183572635
N Rate:                   0.1961%           95825         0.0025%             181927

and stats

#name   %unambiguousReads   unambiguousMB   %ambiguousReads ambiguousMB unambiguousReads    ambiguousReads  assignedReads   assignedBases
human_GRCh38.p14_genomic    96.64158    15318.39947 0.09862 15.676933   102420164   104518  102420164   15318399471
fungi_genomic_all_scf   1.57809 250.129487  0.09862 15.676933   1672452 104518  1776970 265806420

Thank you

ADD REPLY • link 11 weeks ago by SomeOne ▴ 170

1

Entering edit mode

Looks good to me. You are being strict with multi-mappers across genomes (i.e. throwing them away). Something to make a note of.

ADD REPLY • link 11 weeks ago by GenoMax 151k

0

Entering edit mode

Hey, GenoMax I have another query regarding bbaplit.sh.

As i have mentioned that the raw-reads are of RNASeq, is bbsplit.sh performing splice-awear alignments ?

or it is just considering the data as DNA-sequenceand performing alinments like BWA-mem or other DNA-alignment tools ?

ADD REPLY • link 10 weeks ago by SomeOne ▴ 170