Demultiplexing RNAseq data with tag file and barcodes (single end)
2.4 years ago
zbidav ▴ 30

Hi all,

I would like some advice on demultiplexing FASTQ data. The data consist of a tag file and a file of 75 bp reads. Each FASTQ file contains multiple samples. I ran into a strange problem using zUMIs (both the newer versions and version 0.0.6).

zUMIs 0.0.6 failed after the alignment step, while computing the counts, with the following message (the GTF file seems fine, however):

In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
...
Error in `dplyr::filter()`:
! Problem while computing `..1 = (XC %in% bc$V1) & (!is.na(GE))`.
Caused by error:
! object 'GE' not found

The newer versions of zUMIs failed while trying to install specific libraries. I also tried "je demultiplex", but it failed as well. I assume there is a recommended way to do this analysis, and I would greatly appreciate any help!

Thanks in advance.

The tag file:

@NB551168:465:H3WHLBGXG:1:11101:17139:1074 1:N:0:GCTCATNA
AGTTCCTGGATGTCCG
+
AAAAAEEEEEEEEEEE
@NB551168:465:H3WHLBGXG:1:11101:15910:1074 1:N:0:GCTCATNA
GTGAAAGACGGCGGAT
+
AAAAAEEEEEEEEEEE

The reads file is in regular FASTQ format:

@NB551168:465:H3WHLBGXG:1:11101:25059:1145 2:N:0:GCTCATGA
CCGTGGCGGCGACGACCCATTCGAACGTCTGCCCTATCAACTTTCGATGGTAGTCGCCGTGCCTAC
+
AAAAAEEEEEE/EE/EEEEEEEE6EEEEEEEAEEEEEEE6EEEEEEEEEEEEEEEAEEEEEEEEAE
@NB551168:465:H3WHLBGXG:1:11101:5778:1145 2:N:0:GCTCATGA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAAA
+
AAAAAEEEEEEAEEEEE/EE/EEAAEEE/EA/EA/////AEEE/////<//AE/////////////

Barcodes file (I have a separate file that maps each sample to its barcode):

ATCACG
GTGAAA
GTAGAG
CACCGG
CGATGT
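
One quick way to check that these barcodes really sit at the start of the tag reads is to tally the first six bases of the tag file and see whether the expected barcodes dominate. A minimal Python sketch, assuming a gzipped FASTQ file with four lines per record; `tally_prefixes` and its parameters are hypothetical names for illustration, not part of zUMIs or je:

```python
import gzip
from collections import Counter
from itertools import islice


def tally_prefixes(fastq_gz, bc_len=6, max_reads=100_000):
    """Count the first bc_len bases of up to max_reads reads in a gzipped FASTQ."""
    counts = Counter()
    with gzip.open(fastq_gz, "rt") as handle:
        # Sequence lines are the 2nd line of every 4-line record.
        for line in islice(handle, 1, 4 * max_reads, 4):
            counts[line.strip()[:bc_len]] += 1
    return counts
```

For example, `tally_prefixes("R1_Pool1_S1_R1_001.fastq.gz").most_common(10)` would list the ten most frequent 6-base prefixes, which can then be compared against the barcodes file.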

zUMIs command:

project="test"
file_Barcode="R1_Pool1_S1_R1_001.fastq.gz";
file_reads="R1_Pool1_S1_R2_001.fastq.gz";
out_DIR="outDir_temp4/";
genome="Human/hg38.STAR.7.ReadsLn75.gencode28/";
GTF_file="Human/hg38.STAR.ReadsLn75.gencode28/gencode.v28.annotation.gtf";
input_zUMI="zUMI_olderVersions/zUMIs-zUMIs.0.0.6/";
Barcodes="scrb_32_A_D_1_8.txt";
pigZ="anaconda3/bin/pigz";
STAR_command="STAR-2.7.3a";
samtools_command="samtools-1.9";
read_length=76 
mkdir -p $out_DIR
nohup bash zUMI_olderVersions/zUMIs-zUMIs.0.0.6/zUMIs-master.sh \
  -f $file_Barcode -r $file_reads -g $genome -a $GTF_file -o $out_DIR \
  -c 1-6 -m 7-16 -l $read_length -s 1 -p 16 -B 1 -b $Barcodes \
  -i $input_zUMI -n $project -e $STAR_command -t $samtools_command -P $pigZ \
  1>${out_DIR}/out.log 2>${out_DIR}/out.error &
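
For reference, the `-c 1-6 -m 7-16` options in the command above are meant to describe a 6-base barcode followed by a 10-base UMI in the tag read (assuming `-c` is the barcode range and `-m` the UMI range, which is how I understand them). A minimal Python sketch of that split, applied to the two tag reads quoted earlier; `split_tag` is a hypothetical helper, not a zUMIs function:

```python
def split_tag(tag_seq, bc_range=(1, 6), umi_range=(7, 16)):
    """Split a tag read into (barcode, UMI) using 1-based inclusive ranges."""
    barcode = tag_seq[bc_range[0] - 1:bc_range[1]]
    umi = tag_seq[umi_range[0] - 1:umi_range[1]]
    return barcode, umi


for seq in ("AGTTCCTGGATGTCCG", "GTGAAAGACGGCGGAT"):
    print(seq, "->", split_tag(seq))
```

Under these assumptions the second tag read starts with GTGAAA, which appears in the barcodes file, so the layout looks plausible for that read at least.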

je command:

file_Barcode="R1_Pool1_S1_R1_001.fastq.gz";
file_reads="R1_Pool1_S1_R2_001.fastq.gz";
Barcodes="barcodes_samples.txt";
# file_reads already ends in .gz, so no extra suffix is appended here
je demultiplex F1=$file_Barcode F2=$file_reads BF=$Barcodes BPOS="READ_1" BM=READ_1

Here I changed the barcodes to:

R1_Pool1_S1_R1_001.fastq.gz     GTGAAA
R1_Pool1_S1_R1_001.fastq.gz     AGTCAA
R1_Pool1_S1_R1_001.fastq.gz     AGTTCC
RNAseq bash Demultiplexing zUMI • 1.6k views

Can you clarify how the two sequence files are related to barcodes? What do you mean by "tag" read? Where are the barcodes supposed to be in the fastq data (beginning of read)?


If I understand correctly, the first file (the tag file) should contain the barcode and UMI for each read in the reads file. The barcodes file only lists which barcodes should be used to distinguish between the patients.
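
Under that reading, demultiplexing amounts to walking both files in lockstep and routing each read by the first six bases of its tag. A minimal Python sketch, assuming the barcode occupies bases 1-6 of the tag read, exact matching only, and both files listing reads in the same order; the function names and the output file naming are hypothetical:

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (header, sequence, quality) records from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))
            if len(record) < 4:
                break
            header, seq, _plus, qual = (line.rstrip("\n") for line in record)
            yield header, seq, qual


def demultiplex(tag_path, reads_path, barcodes, out_prefix, bc_len=6):
    """Write each read to <out_prefix><barcode>.fastq by exact barcode match.

    Returns read counts per barcode; unmatched reads are only counted
    (under "unassigned"), not written.
    """
    counts = {bc: 0 for bc in barcodes}
    counts["unassigned"] = 0
    outputs = {bc: open(f"{out_prefix}{bc}.fastq", "w") for bc in barcodes}
    try:
        for (tag_head, tag_seq, _), (head, seq, qual) in zip(
            read_fastq(tag_path), read_fastq(reads_path)
        ):
            # Paired records must share the read ID (text before the first space).
            if tag_head.split()[0] != head.split()[0]:
                raise ValueError(f"read order mismatch: {tag_head} vs {head}")
            bc = tag_seq[:bc_len]
            if bc in outputs:
                outputs[bc].write(f"{head}\n{seq}\n+\n{qual}\n")
                counts[bc] += 1
            else:
                counts["unassigned"] += 1
    finally:
        for handle in outputs.values():
            handle.close()
    return counts
```

This sketch allows no barcode mismatches and ignores the UMI entirely, so it is only a sanity check on the file layout, not a replacement for a proper pipeline.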


This looks almost like 10x Genomics data. Is it 10x? It may be simpler for the people who did the sequencing to demultiplex the data for you.


Thank you kindly, and sorry for the delay. It uses the same principle (UMI + barcodes), but (we hope) it is whole transcriptome. Unfortunately, having the sequencing facility demultiplex the data is not really possible. To be exact, they actually recommended that we use the zUMIs protocol to extract the data, which sadly did not work as expected. It is a bit strange in general, though: while tools for demultiplexing exist (zUMIs, je, fastX), I did not find a well-documented example of how to run them.


Re. your first lot of code: what was the path to GE? The error message is giving you good information: the GE object is not being found, so either your path is wrong or you are renaming or accidentally removing it in a previous step of your script.

You said that the newer versions of zUMI failed while trying to install specific libraries. What was the command and output that specifically led to the failed library installs? It seems to me the easiest way to fix this entire problem is to work out why those libraries couldn't be installed and then follow the recommended process, because there is a wealth of tutorial content and already-solved questions online for Linux installation problems.

If the libraries are failing to install it is often because you don't have some other piece of software installed that the library you want to install relies on. So check the error messages and hopefully you'll figure it out!


I apologize for the delay, and thanks for your answer. First, if that's all right, I will clear up some of the confusion: zUMIs is a published software package with multiple versions. It should, at some level, be usable as a "black box".

  • GE is not a parameter that zUMIs accepts. If I understand correctly, it should be created from the GTF file. Since the GTF file was downloaded directly from Entrez and was also used to build the STAR index, it should be correct, so there may be a bug in the software.
  • GE is generated while the script runs. However, I have no indication of why it is missing or what it represents.
  • The command for the newer version was:
    ~/zUMIs/zUMIs.sh -y "runzUMIs.run.yaml"

    Sadly, it is a bit more complicated. We can only install packages locally in the home directory, not at the root (for version compatibility). zUMIs attempts to install the packages at the root (without asking :/ ). But I'll look into it again, thanks!