Question

Issues with RAT in CAT_pack and Optimizing Relative Abundance Calculation

1

Entering edit mode

11 weeks ago

LuciaNhu ▴ 10

Hello everyone,

My question relates to shotgun metagenomics.

I would like to calculate relative abundance based on reads mapped to contigs. Typically, I deduplicate all contigs using cd-hit and treat them as a reference, then map my FASTQ files to these references to create feature counts. However, with nearly 100 samples, my deduplicated references are becoming too large (almost 50GB), which is causing performance issues.

I noticed that the CAT-BAT-RAT suite provides an intelligent set of tools within CAT_pack. However, I’m encountering some issues and would appreciate your assistance.

When running RAT, I get an error related to the reads, stating that the FASTQ headers must match those in the contigs. My contigs were generated by MEGAHIT and are named in the format >k127_10000, without any reference to FASTQ headers. How can I resolve this mismatch?

 /CAT_pack reads --mode cr -c contigs.fa -1 R_1.fastq -2 R_2.fastq -d /path/to/20240422_CAT_nr/db/ -t /path/to/200422_CAT_nr/tax/ -o lala -p out.CAT.predicted_proteins.faa -a out.CAT.alignment.diamond --c2c out.CAT.contig2classification.txt

 <frozen genericpath>:39: RuntimeWarning: bool is used as a file descriptor
 # CAT_pack v6.0.

 [2024-10-11 09:41:47] samtools found: samtools 1.21. [2024-10-11 09:41:47] bwa found: Version: 0.7.18-r1243-dirty.

 RAT is running. Mapping reads against assembly with bwa mem.

 [2024-10-11 09:41:47] Running bwa mem for read mapping. File fastq.bwamem.sorted will be generated.Do not forget to cite bwa mem and samtools when using RAT in your publication! [2024-10-11 09:41:47] Contigs fasta is already indexed. [2024-10-11 09:41:47] Running bwa mem... ... [main] Version: 0.7.18-r1243-dirty [main] CMD: bwa mem -t 24  [main] Real time: 118.429 sec; CPU: 2701.163 sec [2024-10-11 09:43:52] Sorting bam file... [bam_sort_core] merging from 0 files and 24 in-memory blocks... [2024-10-11 09:44:00] Read mapping done!

 [2024-10-11 09:44:00] contig2classification file supplied. Processing contig classifications. [2024-10-11 09:44:00] Loading file 20240422_CAT_nr/tax/nodes.dmp. [2024-10-11 09:44:03] Processing mapping file(s).

 [2024-10-11 09:44:32] Chosen mode: cr. Classifying unclassified contigs and unmapped reads with diamond if no classification file is supplied. [2024-10-11 09:44:32] No unmapped2classification file supplied .Grabbing unmapped and unclassified sequences... [2024-10-11 09:44:33] Contigs written! Appending forward reads... Traceback (most recent call last):   File "CAT_pack/CAT_pack/./CAT_pack", line 101, in <module>
     main()
     ~~~~^^   File "CAT_pack/CAT_pack/./CAT_pack", line 85, in main
     reads.run()
     ~~~~~~~~~^^   File "CAT_pack/CAT_pack/reads.py", line 435, in run
     make_unclassified_seq_fasta(reads_files[0], unmapped_reads['fw'],
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                 uncl_unm_fasta, 'fastq', 'a','_1')
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   File "CAT_pack/CAT_pack/reads.py", line 1260, in make_unclassified_seq_fasta
     fasta_dict[seq]))
     ~~~~~~~~~~^^^^^ KeyError: 'FT100012261L1C013R00202177739'

It seems like RAT requires CAT.contig2classification.txt to be run first, although the tutorial doesn’t mention this. Is running CAT mandatory before RAT?

Finally, does the RAT method (mapping reads to contigs) require a re-run of the deduplication step to accurately calculate relative abundance, or can I use the original deduplicated contigs?

Thank you for your help!

Best regards, Nhu

metagenomics taxonomy mag annotation • 472 views

ADD COMMENT • link updated 17 days ago by mks002 ▴ 220 • written 11 weeks ago by LuciaNhu ▴ 10

score 0 · Answer 1 · 2024-12-12

0

Entering edit mode

21 days ago

mks002 ▴ 220

Hi Nhu,

Since you are running the CAT_pack in "--mode cr", it would ask for the classification file.

So Below error is what you are getting.

"No unmapped2classification file supplied"

I think first you should running CAT_pack in contigs mode.

Let me know if you are able to fix it.

ADD COMMENT • link 17 days ago by mks002 ▴ 220