Issues with RAT in CAT_pack and Optimizing Relative Abundance Calculation
0
0
Entering edit mode
7 weeks ago
LuciaNhu • 0

Hello everyone,

My question relates to shotgun metagenomics.

I would like to calculate relative abundance based on reads mapped to contigs. Typically, I deduplicate all contigs using cd-hit and treat them as a reference, then map my FASTQ files to these references to create feature counts. However, with nearly 100 samples, my deduplicated references are becoming too large (almost 50GB), which is causing performance issues.

I noticed that the CAT-BAT-RAT suite provides an intelligent set of tools within CAT_pack. However, I’m encountering some issues and would appreciate your assistance.

  • When running RAT, I get an error related to the reads, stating that the FASTQ headers must match those in the contigs. My contigs were generated by MEGAHIT and are named in the format >k127_10000, without any reference to FASTQ headers. How can I resolve this mismatch?

     /CAT_pack reads --mode cr -c contigs.fa -1 R_1.fastq -2 R_2.fastq -d /path/to/20240422_CAT_nr/db/ -t /path/to/200422_CAT_nr/tax/ -o lala -p out.CAT.predicted_proteins.faa -a out.CAT.alignment.diamond --c2c out.CAT.contig2classification.txt
    
     <frozen genericpath>:39: RuntimeWarning: bool is used as a file descriptor
     # CAT_pack v6.0.
    
     [2024-10-11 09:41:47] samtools found: samtools 1.21. [2024-10-11 09:41:47] bwa found: Version: 0.7.18-r1243-dirty.
    
     RAT is running. Mapping reads against assembly with bwa mem.
    
     [2024-10-11 09:41:47] Running bwa mem for read mapping. File fastq.bwamem.sorted will be generated.Do not forget to cite bwa mem and samtools when using RAT in your publication! [2024-10-11 09:41:47] Contigs fasta is already indexed. [2024-10-11 09:41:47] Running bwa mem... ... [main] Version: 0.7.18-r1243-dirty [main] CMD: bwa mem -t 24  [main] Real time: 118.429 sec; CPU: 2701.163 sec [2024-10-11 09:43:52] Sorting bam file... [bam_sort_core] merging from 0 files and 24 in-memory blocks... [2024-10-11 09:44:00] Read mapping done!
    
     [2024-10-11 09:44:00] contig2classification file supplied. Processing contig classifications. [2024-10-11 09:44:00] Loading file 20240422_CAT_nr/tax/nodes.dmp. [2024-10-11 09:44:03] Processing mapping file(s).
    
     [2024-10-11 09:44:32] Chosen mode: cr. Classifying unclassified contigs and unmapped reads with diamond if no classification file is supplied. [2024-10-11 09:44:32] No unmapped2classification file supplied .Grabbing unmapped and unclassified sequences... [2024-10-11 09:44:33] Contigs written! Appending forward reads... Traceback (most recent call last):   File "CAT_pack/CAT_pack/./CAT_pack", line 101, in <module>
         main()
         ~~~~^^   File "CAT_pack/CAT_pack/./CAT_pack", line 85, in main
         reads.run()
         ~~~~~~~~~^^   File "CAT_pack/CAT_pack/reads.py", line 435, in run
         make_unclassified_seq_fasta(reads_files[0], unmapped_reads['fw'],
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                     uncl_unm_fasta, 'fastq', 'a','_1')
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   File "CAT_pack/CAT_pack/reads.py", line 1260, in make_unclassified_seq_fasta
         fasta_dict[seq]))
         ~~~~~~~~~~^^^^^ KeyError: 'FT100012261L1C013R00202177739'
    
  • It seems like RAT requires CAT.contig2classification.txt to be run first, although the tutorial doesn’t mention this. Is running CAT mandatory before RAT?

  • Finally, does the RAT method (mapping reads to contigs) require a re-run of the deduplication step to accurately calculate relative abundance, or can I use the original deduplicated contigs?

Thank you for your help!

Best regards, Nhu

metagenomics taxonomy mag annotation • 267 views
ADD COMMENT

Login before adding your answer.

Traffic: 1632 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6