Hi all,
I would like to ask for advice about demultiplexing fastq data. The data consist of a tag file and a file of 75 bp reads. Each fastq file contains multiple samples. I encountered a strange problem using zUMIs (both the newer versions and 0.0.6).
zUMIs 0.0.6 failed after the alignment step, while computing the counts, with the following message (the GTF file seems fine, however):
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type
...
Error in `dplyr::filter()`:
! Problem while computing `..1 = (XC %in% bc$V1) & (!is.na(GE))`.
Caused by error:
! object 'GE' not found
The newer versions of zUMIs failed while trying to install specific libraries. I also tried "je demultiplex", but it failed as well. I assume there is a recommended way to do this analysis, and I would greatly appreciate any help!
Thanks in advance.
The tag file:
@NB551168:465:H3WHLBGXG:1:11101:17139:1074 1:N:0:GCTCATNA
AGTTCCTGGATGTCCG
+
AAAAAEEEEEEEEEEE
@NB551168:465:H3WHLBGXG:1:11101:15910:1074 1:N:0:GCTCATNA
GTGAAAGACGGCGGAT
+
AAAAAEEEEEEEEEEE
The reads file is in regular fastq format:
@NB551168:465:H3WHLBGXG:1:11101:25059:1145 2:N:0:GCTCATGA
CCGTGGCGGCGACGACCCATTCGAACGTCTGCCCTATCAACTTTCGATGGTAGTCGCCGTGCCTAC
+
AAAAAEEEEEE/EE/EEEEEEEE6EEEEEEEAEEEEEEE6EEEEEEEEEEEEEEEAEEEEEEEEAE
@NB551168:465:H3WHLBGXG:1:11101:5778:1145 2:N:0:GCTCATGA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAAA
+
AAAAAEEEEEEAEEEEE/EE/EEAAEEE/EA/EA/////AEEE/////<//AE/////////////
Barcodes file (I have a separate file that links each sample to its barcode):
ATCACG
GTGAAA
GTAGAG
CACCGG
CGATGT
zUMIs command:
project="test"
file_Barcode="R1_Pool1_S1_R1_001.fastq.gz";
file_reads="R1_Pool1_S1_R2_001.fastq.gz";
out_DIR="outDir_temp4/";
genome="Human/hg38.STAR.7.ReadsLn75.gencode28/";
GTF_file="Human/hg38.STAR.ReadsLn75.gencode28/gencode.v28.annotation.gtf";
input_zUMI="zUMI_olderVersions/zUMIs-zUMIs.0.0.6/";
Barcodes="scrb_32_A_D_1_8.txt";
pigZ="anaconda3/bin/pigz";
STAR_command="STAR-2.7.3a";
samtools_command="samtools-1.9";
read_length=76
mkdir -p $out_DIR
nohup bash zUMI_olderVersions/zUMIs-zUMIs.0.0.6/zUMIs-master.sh -f $file_Barcode -r $file_reads -g $genome -a $GTF_file -o $out_DIR -c 1-6 -m 7-16 -l $read_length -s 1 -p 16 -B 1 -b $Barcodes -i $input_zUMI -n $project -e $STAR_command -t $samtools_command -P $pigZ 1>${out_DIR}/out.log 2>${out_DIR}/out.error &
je command:
file_Barcode="R1_Pool1_S1_R1_001.fastq.gz";
file_reads="R1_Pool1_S1_R2_001.fastq.gz";
Barcodes="barcodes_samples.txt";
je demultiplex F1=$file_Barcode F2=$file_reads BF=$Barcodes BPOS=READ_1 BM=READ_1
Here I changed the barcodes to:
R1_Pool1_S1_R1_001.fastq.gz GTGAAA
R1_Pool1_S1_R1_001.fastq.gz AGTCAA
R1_Pool1_S1_R1_001.fastq.gz AGTTCC
Can you clarify how the two sequence files are related to barcodes? What do you mean by "tag" read? Where are the barcodes supposed to be in the fastq data (beginning of read)?
If I understand correctly, the first file (the tag file) should contain the UMI + barcode for each read in the fastq file. The barcodes file only lists which barcodes should be used to distinguish between the patients.
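To make that layout concrete, here is a minimal sketch of how each tag read would be split. It assumes the sample barcode is bases 1-6 and the UMI is bases 7-16 of the 16-nt tag read, which is what the `-c 1-6 -m 7-16` options in my zUMIs command imply; `split_tag` is just a hypothetical helper name, not part of zUMIs or je.

```shell
# Assumption: 16-nt tag reads, sample barcode = bases 1-6, UMI = bases 7-16.
# Reads FASTQ on stdin and prints: read name <TAB> barcode <TAB> UMI.
split_tag() {
  awk 'NR % 4 == 1 { name = substr($1, 2) }   # header line: drop "@", keep the read ID
       NR % 4 == 2 { print name "\t" substr($0, 1, 6) "\t" substr($0, 7, 10) }'
}

# Demo on the first tag record shown in the question.
printf '%s\n' \
  '@NB551168:465:H3WHLBGXG:1:11101:17139:1074 1:N:0:GCTCATNA' \
  'AGTTCCTGGATGTCCG' '+' 'AAAAAEEEEEEEEEEE' | split_tag
```

For that record the barcode comes out as AGTTCC and the UMI as TGGATGTCCG. On the real file, something like `gunzip -c R1_Pool1_S1_R1_001.fastq.gz | split_tag | cut -f2 | sort | uniq -c | sort -rn | head` should show which barcodes dominate, which can then be compared against the barcodes file.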
This looks almost like 10x genomics data. Is it 10x? It may be simpler for the people who did the sequencing to demultiplex the data for you.
Thank you kindly, and sorry for the delay. It uses the same principle (UMI + barcodes), but (we hope) it's whole transcriptome. Unfortunately, that is not really possible. In fact, they recommended that we use the zUMIs protocol to extract the data, which sadly did not work as expected. It is a bit strange in general, though: while tools for demultiplexing exist (zUMIs, je, fastX), I did not find a well-documented example of how to run them.
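For reference, the demultiplexing step itself can be roughly sketched in plain shell. This is only an illustration under several assumptions, not a replacement for zUMIs or je: the two fastq files are uncompressed and list the reads in the same order, the barcodes file has one barcode per line, the barcode is bases 1-6 and the UMI bases 7-16 of the tag read, and only exact barcode matches are accepted. `demux_by_tag` is a hypothetical helper name.

```shell
# Usage: demux_by_tag TAG.fastq READS.fastq BARCODES.txt OUTDIR
# Writes OUTDIR/<barcode>.fastq per known barcode, plus OUTDIR/undetermined.fastq.
demux_by_tag() {
  tag_fq=$1; reads_fq=$2; barcodes=$3; outdir=$4
  tmpdir=$(mktemp -d)
  mkdir -p "$outdir"
  # Flatten every 4-line FASTQ record onto one tab-separated line.
  paste - - - - < "$tag_fq"   > "$tmpdir/tag.tsv"
  paste - - - - < "$reads_fq" > "$tmpdir/reads.tsv"
  # Columns 1-4: tag record, columns 5-8: matching read record.
  paste "$tmpdir/tag.tsv" "$tmpdir/reads.tsv" |
  awk -F '\t' -v bcfile="$barcodes" -v outdir="$outdir" '
    BEGIN { while ((getline bc < bcfile) > 0) known[bc] = 1 }
    {
      split($1, t, " "); split($5, r, " ")
      if (t[1] != r[1]) { print "read names out of sync at record " NR > "/dev/stderr"; exit 1 }
      bc  = substr($2, 1, 6)     # sample barcode (assumed bases 1-6)
      umi = substr($2, 7, 10)    # UMI (assumed bases 7-16)
      out = outdir "/" ((bc in known) ? bc : "undetermined") ".fastq"
      # Keep the UMI in the read name so it is still available after splitting.
      print r[1] "_" umi "\n" $6 "\n" $7 "\n" $8 > out
    }'
  status=$?
  rm -rf "$tmpdir"
  return "$status"
}
```

With gzipped input, the files would first need to be decompressed (for example with `gunzip -c`). Dedicated tools additionally handle barcode mismatches and quality filtering, which this sketch deliberately ignores.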
Re. your first lot of code: what was the path to GE? The error message is giving you good information: the GE object is not being found, so either your path is wrong or you are possibly renaming it or accidentally removing it in a previous step of your script.
You said, "zUMI in the newer versions failed while trying to install specific libraries." What was the command, and what output specifically led to the failed library installs? It seems to me the easiest way to rectify this entire problem is to work out why those libraries couldn't be installed and then follow the recommended process, because there is a wealth of tutorial content and already-solved questions online for Linux install problems.
If the libraries are failing to install it is often because you don't have some other piece of software installed that the library you want to install relies on. So check the error messages and hopefully you'll figure it out!
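As a first check, something like the sketch below reports which external programs are actually visible on your PATH. The tool names are assumptions based on your command (STAR, samtools, pigz) plus Rscript, since the error you posted comes from R; adjust them to the executable names on your system. `check_tools` is a hypothetical helper, not something shipped with zUMIs.

```shell
# Report, for each named program, whether it can be found on PATH.
# Returns non-zero if at least one program is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "MISSING: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed tool names; replace with the ones your pipeline version needs.
check_tools STAR samtools pigz Rscript || echo "Install the missing tools before retrying."
```

This only covers command-line programs. Missing R packages would show up separately when R tries to load them, so the exact install error message is still the most useful thing to post.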
I apologize for the delay, and thanks for your answer. First, if it's all right, I will clear up the confusion a bit: zUMIs is published software with multiple versions. The software should, at some level, be used as a "black box".