I am working on my first 10x Genomics de novo assembly. For those who may not be familiar with it, 10x adds a shared barcode to the Illumina short reads that come from the same set of long fragments, and those barcodes are then used to reconstruct long "linked" reads. Since these are non-standard "linked" reads and the technology is fairly new, I don't think there are any tools available for processing the data other than the official Supernova software. Has anyone had luck with alternative assemblers for 10x data?
I am asking because Supernova is essentially failing for me. I used one HiSeq lane of input, which is a fairly reasonable amount. Based on my discussions with 10x, that amount of data should take about a week to process. I let it run for 4 weeks and it did not finish; then the server had to be restarted, so the job was killed. Even half a lane did not finish after several weeks. If I use a small subset of the data (like 10M or 100M reads), the process finishes. The results are terrible, but at least that shows there aren't any problems with the dependencies or the environment. Has anyone here run into similar problems, and did you find a solution? Normally, if a certain tool has issues, you can try a different one, but that does not seem to be an option here.
Have you tried other long-read assemblers, like Celera, or tools intended for PacBio?
I think the issue is getting hold of GEM-specific pools of reads. I am still familiarizing myself with 10x data.
Exactly. If I could get to long reads, that would be fantastic. There are basically two steps: converting short reads to "long" reads and then assembling those reads. Unfortunately, those two steps are combined in their software and there is no way to just do one or the other.
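The first of those two steps starts with building the GEM-specific pools mentioned above, which amounts to grouping reads by barcode. A minimal Python sketch, assuming the reads have already been reduced to hypothetical (barcode, sequence) pairs; turning each pool into a "long" read (the actual hard part) is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pool_by_barcode(reads: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (barcode, sequence) pairs into per-barcode pools.

    Each pool approximates the short reads derived from the long fragments
    in one GEM partition. The input format is a hypothetical simplification.
    """
    pools: Dict[str, List[str]] = defaultdict(list)
    for barcode, seq in reads:
        if barcode:  # skip reads with no assigned barcode
            pools[barcode].append(seq)
    return dict(pools)


if __name__ == "__main__":
    example = [
        ("AAACACCAGACAATAC-1", "ACGTACGTAC"),
        ("AAACACCAGACAATAC-1", "GTACGTTTGA"),
        ("TTTGTCAAGCTAGCTA-1", "CCCGGGAAAT"),
        ("", "NNNNNNNNNN"),  # unbarcoded read, dropped
    ]
    for barcode, seqs in pool_by_barcode(example).items():
        print(barcode, len(seqs))
```

A dictionary of lists is fine for a toy example; for a full lane, the pools would more realistically be written to disk (or the input sorted by barcode) rather than held in memory.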
Oh, I see. Well, good luck :) If I had access to some 10x data, I might be able to write something that converted it to long reads via Tadpole, but I've never seen any 10x data.
They post some examples here if you want to check: http://support.10xgenomics.com/de-novo-assembly/datasets
I think the biggest problem is that the whole process is not described in detail (for example, the barcode sequences aren't published as far as I know), so it would probably require some reverse engineering.
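If one did try to reverse engineer this from the raw FASTQ files, the first step would be cutting the barcode off Read 1. The sketch below assumes a layout that should be verified against 10x documentation for the chemistry in use: a 16 bp barcode at the start of R1, followed by 7 bases that are trimmed. BARCODE_LEN, SPACER_LEN and the optional whitelist (a hypothetical set of valid barcodes) are all assumptions, and no barcode error correction is attempted.

```python
from typing import Optional, Set, Tuple

# Assumed Read 1 layout; check these against the 10x docs for your chemistry.
BARCODE_LEN = 16  # assumed barcode length at the start of R1
SPACER_LEN = 7    # assumed number of bases trimmed after the barcode


def split_r1(
    seq: str, qual: str, whitelist: Optional[Set[str]] = None
) -> Tuple[Optional[str], str, str]:
    """Split a raw R1 read into (barcode, insert sequence, insert quality).

    The barcode is None when the read is too short to contain it, or when a
    whitelist is given and the barcode is not in it (no error correction).
    """
    offset = BARCODE_LEN + SPACER_LEN
    if len(seq) < offset:
        return None, "", ""
    barcode: Optional[str] = seq[:BARCODE_LEN]
    if whitelist is not None and barcode not in whitelist:
        barcode = None
    return barcode, seq[offset:], qual[offset:]


if __name__ == "__main__":
    read = "AAACACCAGACAATAC" + "GGGGGGG" + "ACGTACGT"
    qual = "F" * 16 + "#" * 7 + "IIIIIIII"
    print(split_r1(read, qual))
```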
Have you tried longranger mkfastq followed by longranger basic? That yields a single file of interleaved FASTQ records labeled with barcodes. The order of R1/R2 is not guaranteed, and R1/R2 markers are not present in the headers. It may be better to do something with the files from longranger mkfastq.
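A minimal Python sketch of reading that interleaved output. It assumes (not confirmed in this thread) that each header carries the barcode as a BX:Z:&lt;barcode&gt; tag and that mates are adjacent records sharing a read name. Because the headers lack R1/R2 markers and mate order is not guaranteed, the sketch pairs records purely by position and does not try to decide which mate is R1.

```python
import io
import re
from typing import Iterator, Optional, TextIO, Tuple

# Assumption: the barcode appears in the header as e.g. "BX:Z:AAACACCAGACAATAC-1".
BX_TAG = re.compile(r"\bBX:Z:(\S+)")

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, qual


def barcode_of(header: str) -> Optional[str]:
    """Return the BX:Z barcode from a header, or None if the read is unbarcoded."""
    match = BX_TAG.search(header)
    return match.group(1) if match else None


def read_pairs(handle: TextIO) -> Iterator[Tuple[Optional[str], FastqRecord, FastqRecord]]:
    """Pair consecutive records of an interleaved FASTQ.

    Mates are paired by position and checked only for a shared read name.
    """
    records = read_fastq(handle)
    for mate_a in records:
        mate_b = next(records, None)
        if mate_b is None:
            raise ValueError("odd number of records in interleaved FASTQ")
        if mate_a[0].split()[0] != mate_b[0].split()[0]:
            raise ValueError(f"mates out of sync: {mate_a[0]!r} vs {mate_b[0]!r}")
        yield barcode_of(mate_a[0]), mate_a, mate_b


if __name__ == "__main__":
    demo = io.StringIO(
        "@read1 BX:Z:AAACACCAGACAATAC-1\nACGT\n+\nIIII\n"
        "@read1 BX:Z:AAACACCAGACAATAC-1\nTTGC\n+\nIIII\n"
    )
    for barcode, mate_a, mate_b in read_pairs(demo):
        print(barcode, mate_a[1], mate_b[1])
```

The read-name check is a cheap guard against the file drifting out of sync. Anything more (for example, restoring R1/R2 identity) would need information the headers reportedly do not carry.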
That's an interesting idea. I never checked the longranger documentation since it's a different workflow. longranger mkfastq is basically a bcl2fastq wrapper and just complicates things if you are already familiar with bcl2fastq. longranger basic sounds promising, though. Even if it works as expected, trying to assemble the linked reads yourself is not a trivial undertaking.
Did the server you were using meet their required specifications? Performing an assembly on a full lane will take much, much more RAM and CPU than on a fifth of one (100M reads). I've had good luck with their assembly software, but it needs ample resources to do its job properly.
I ran it with 16 threads and 512GB of RAM, which is more than they require. Also, if memory were the problem, there would probably be a memory-related error.
Nice to hear that someone else got it to work, though.
That must be 512GB :)
You're right. It's been a long week.
Hi Igor and genomax,
I hope you can still see this message. I am just starting to use Supernova and am having a hard time with it. Can you please suggest articles, websites, or anything else that could help me use it? Or, since you seem to know how to work with it, could you tell me the steps I should follow? Thank you so much in advance. Looking forward to your reply.
Official resource is here: https://support.10xgenomics.com/de-novo-assembly/software/pipelines/latest/using/running
samnioue: supernova can be tricky to get going on a cluster. Depending on the type of job scheduler your cluster uses, you will need to adjust a settings file. You may want to get help from your systems administrator with all of this, since you may not have the rights to install software or change settings. Once the software is installed and configured, do the test run suggested on the page I linked.
Take a look at the hardware requirements as well. You would need a node with a large amount of RAM; 512G would be preferable.
Thank you, genomax. Do you think it is better to run it without a cluster? Is there any other way to do it?
I don't know. If you have a high-memory server available, you could try running on it. You can also try contacting 10x tech support; they are pretty responsive.