Hi all,
I'm very new to bioinformatics and am currently using UMI-tools to look at my recently sequenced libraries. I have a few steps I'm struggling with:
1st question- Using the UMI-tools guide, I extracted the UMIs from my R1 file, then mapped, sorted and indexed the result. Now, when I deduplicate, I only get a resulting bam file and no tsv files. When I did this using the UMI-tools example files, I got a bam file + 3 tsv files. How do I go about generating these? The command line I am using is:
$ umi_tools dedup -I Sorted.bam --output-stats=deduplicated -S deduplicated.bam
2nd question- I have seven libraries in total. I have an R1 and an R2 fastq file generated from each library, so 14 fastq files in total. Can I process all 14 at once, or at least the R1 and R2 of each library together? I used the command from the UMI-tools guide for paired reads but was unsuccessful. Their website provides the command below:
$ umi_tools extract -I pair.1.fastq.gz --bc-pattern=NNNXXXXNN \
--read2-in=pair.2.fastq.gz --stdout=processed.1.fastq.gz \
--read2-out=processed.2.fastq.gz
Here is how I used it:
$ umi_tools extract -I FilenameR1.fastq.gz --bc-pattern=NNNXXXXNN \
--read2-in=FilenameR2.fastq.gz --stdout=processed.FilenameR1.fastq.gz \
--read2-out=processed.FilenameR2.fastq.gz
Have I made a mistake here?
Thanks in advance!!
When you say it was unsuccessful, what do you mean? Was there an error message?
You are also going to want to use a --bc-pattern that matches your read geometry. And unless you are analysing an iCLIP experiment from circa 2012, I doubt it's NNNXXXXNN.
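On the question of handling all seven libraries: the paired command shown above takes one R1/R2 pair per call, so one common approach is to loop over the libraries. The sketch below is only a dry run that prints the commands. The library names (lib1 to lib7), the `<lib>_R1.fastq.gz` / `<lib>_R2.fastq.gz` naming scheme, the helper `extract_cmd` and the log file names are all assumptions for illustration, and `BC_PATTERN` is a placeholder that has to be replaced with the pattern matching your own reads.

```shell
#!/bin/sh
# Sketch: one umi_tools extract call per library, R1 and R2 together.
# Assumptions: files are named <lib>_R1.fastq.gz and <lib>_R2.fastq.gz,
# and BC_PATTERN is a placeholder, not a recommended pattern.

BC_PATTERN="NNNXXXXNN"

# Hypothetical helper: print the extract command for one library.
extract_cmd() {
    lib="$1"
    printf 'umi_tools extract -I %s_R1.fastq.gz --bc-pattern=%s --read2-in=%s_R2.fastq.gz --stdout=processed.%s_R1.fastq.gz --read2-out=processed.%s_R2.fastq.gz -L %s.extract.log\n' \
        "$lib" "$BC_PATTERN" "$lib" "$lib" "$lib" "$lib"
}

# Dry run: print the seven commands so they can be inspected first.
for lib in lib1 lib2 lib3 lib4 lib5 lib6 lib7; do
    extract_cmd "$lib"
done
```

Once the printed commands look right, piping the script's output to `sh` would run them one after another. Each call should leave two processed fastq files and one log per library.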
Also, for the dedup, check that the job is finishing. The logging output should end with "# job finished". If it doesn't, that suggests that umi_tools isn't finishing before it produces the stats files. This could be due to it being killed by an OOM killer or by a cluster resource manager.
In regards to the dedup step for my unpaired file: yes, it appears that umi-tools isn't finishing, as I get the following error at the end of a message with mostly the file parameters:
My input file is in bam format and is not empty. I'm not sure why it wouldn't have a header if I followed the previous commands from the guideline correctly.
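For repeated runs, the check on the end of the log can be scripted. This is a minimal sketch, assuming the log was written to a file (for example with -L dedup.log) and that a completed run ends with a line starting "# job finished". The function name `job_finished` and the file name are hypothetical.

```shell
# Sketch: succeed only if the last non-empty line of a umi_tools log
# starts with the completion marker "# job finished".
# Assumption: the log was saved to a file, e.g. with -L dedup.log.
job_finished() {
    log="$1"
    # Drop blank lines, then keep only the final line of the log.
    last=$(grep -v '^[[:space:]]*$' "$log" | tail -n 1)
    case "$last" in
        "# job finished"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

It could then be used as `job_finished dedup.log && echo "dedup completed"`. If it fails, the last lines of the log are the place to look for the error.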
Hmmm... that is strange. Have you run samtools quickcheck on the BAM file? And if that passes, how about samtools view -H? This should output the header if there is one.
Sorry, by unsuccessful, I meant that the first fastq file text appeared in my terminal as a response. The start of the message was:
And the end was:
Also, yes, sorry, I hadn't put the correct pattern into the command above as I was using the example command. What I used was:
Today, I tried the following command which only generated a log file and a fastq.gz file for the R1 file:
I was expecting two output files but that might be incorrect as when I get to paired de-duplication, the command only includes one input file. Sorry if I've confused you!
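On why paired de-duplication has only one input: after extraction, both processed fastq files normally go to the aligner together, which writes a single BAM containing both mates, and umi_tools dedup is then run on that one file with --paired. The sketch below only prints such a command. The helper `dedup_cmd`, the BAM name mapped.sorted.bam and the output prefix are assumptions for illustration, and the BAM is assumed to be coordinate-sorted and indexed.

```shell
# Sketch: paired-end dedup reads the single BAM produced by mapping R1 and R2 together.
# Assumptions: mapped.sorted.bam is a hypothetical sorted, indexed BAM;
# "deduplicated" is a hypothetical prefix for the BAM, stats and log outputs.
dedup_cmd() {
    bam="$1"
    prefix="$2"
    printf 'umi_tools dedup -I %s --paired --output-stats=%s -S %s.bam -L %s.dedup.log\n' \
        "$bam" "$prefix" "$prefix" "$prefix"
}

# Dry run: print the command for inspection.
dedup_cmd mapped.sorted.bam deduplicated
```

So two processed fastq files after paired extraction would be the expected result, even though dedup later takes a single BAM.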
Sarah, why did you delete this post?