I have two fastq files: the first contains the footprints, while the second contains data from a standard RNA-seq experiment performed in parallel. I want to trim the adapter sequences using the software Cutadapt.
The adapter sequences are:
GTTCAGAGTTCTACAGTCCGACGATC # 5 prime adapter sequence
ATCTCGTATGCCGTCTTCTGCTTG # 3 prime adapter sequence
Could you please tell me whether, and how, I can do this on 64-bit Windows 7? I am not familiar enough with Linux.
Do you mean that I should first extract BBMap_34.92.tar.gz? What is the next step after that?
Please help me more if possible.
When I typed your command in cmd, it said it could not find or load the main class jgi.BBDukF.
ADD REPLY
• link
updated 22 months ago by
Ram
44k
•
written 9.6 years ago by
zizigolu
★
4.3k
3
Entering edit mode
Hi Sarah,
First you need to extract BBMap. You can do that with 7-zip (a great program); just right-click and choose 7-zip then "extract here". You need to do this once for the .gz file and then once for the tar file. That will create a directory called bbmap, which you can put in C:\.
Then (for simplicity) move your sequence files to a new directory called C:\sequence\.
Then open a command prompt (typically by typing cmd at the bottom of the start menu). Type cd C:\sequence.
Then you type the java command, which should work.
I mean, why can't I get a result like this one, which I copied from your thread:
Total output: 966786 reads 113303242 bases
Perfectly Correct (% of output): 901541 reads (93.251%) 103689866 bases (91.515%)
Incorrect (% of output): 65245 reads (6.749%) 9613376 bases (8.485%)
Adapters Remaining (% of adapters): 65243 reads (13.973%) 1229480 bases (1.085%)
Non-Adapter Removed (% of valid): 2 reads (0.000%) 27 bases (0.000%)
or anything else to present to my supervisor?
For example, he asked me: "A quick check you might do would be to control the initial number of entries (reads) included in the "raw data" fastq file and compare it with the number of entries (reads) which are in the final fastq file. The number of entries in the "final" file should be less."
But I don't know where to look for the read counts.
In cmd there is only some output about Java and the time the run took.
Excuse me...
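For the read-count check the supervisor describes, here is a minimal sketch in Python. It assumes standard four-line FASTQ records (no wrapped sequence lines and no trailing blank lines). The function names are made up for illustration and are not part of BBMap or Cutadapt.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    # Plain and gzip-compressed files are both handled, based on the extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return line_count // 4


def compare_read_counts(raw_path, trimmed_path):
    """Return (raw reads, trimmed reads, reads removed by trimming)."""
    raw = count_fastq_reads(raw_path)
    trimmed = count_fastq_reads(trimmed_path)
    return raw, trimmed, raw - trimmed
```

Calling compare_read_counts("raw.fq", "clean.fq") with your own file names (these two are placeholders) gives the three numbers to report; the last one is how many reads the trimming step discarded.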
Those specific results are from analyzing the output of trimming operations on synthetic data, in which the answer is known. For real data, you don't know the correct answer, so the output will look more like this:
The input has to be a fastq or fasta file, not a qual file. And ref=gruseq.fa was just used as an example for the synthetic test; in real life you should specify the actual adapter sequences (in your case, with literal=).
Where can I download BBMerge? Or is BBMerge included in BBMap, so that it does not need to be downloaded separately?
I used this command: in=reads.fq out1=clean1.fq out2=clean2.fq minlen=25 qtrim=r trimq=10
I now have two clean.fq files. If I want to align the trimmed reads against a reference genome, which of those files (clean1.fq or clean2.fq) should I use?
BBMerge, BBDuk, and BBMap are all together in the BBMap download.
When aligning paired reads, you should use both files at the same time. Please note that if you run BBDuk with one input and 2 outputs, it will assume the input is interleaved and split it into two files. Alternatively, you can specify just one output file to keep it interleaved.
To map these to a reference, you can do so like this with BBMap:
That will map the files to the specified reference. However, the amount of memory required depends on the size of the reference file. For the human genome, for example, you would need to change -Xmx1g to -Xmx22g. BBMap generally requires approximately 7 bytes per base pair of reference sequence.
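As a rough illustration of that rule of thumb, the sketch below turns a reference size into a -Xmx value. It assumes the 7 bytes per base quoted above and a human genome of about 3.1 billion bases. The function name is hypothetical, and the result is only a starting estimate, not a figure BBMap itself reports.

```python
import math

BYTES_PER_BASE = 7  # rule of thumb quoted in the post, not a measured value


def suggested_xmx(reference_bases, bytes_per_base=BYTES_PER_BASE):
    """Return a -Xmx flag sized to the reference, rounded up to whole gigabytes."""
    gigabytes = math.ceil(reference_bases * bytes_per_base / 1e9)
    # Never suggest less than 1g, the value used for small references.
    return f"-Xmx{max(gigabytes, 1)}g"


# Assumption: the human genome is roughly 3.1 billion bases long.
print(suggested_xmx(3_100_000_000))
```

With those assumptions the estimate comes out at the -Xmx22g mentioned above; a small bacterial genome would stay at -Xmx1g.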
For the rRNA filtering part (i.e. mapping the fastq reads on the rRNA sequences and keeping only the reads which do not map): I should type
bowtie2 -x [name of the bowtie2-build indicized file containing the rRNA sequence] --un [name of the fastq file which will contain the UNMAPPED reads] -U [name of the fastq file containing the reads] -S [name of the .sam file that will contain the MAPPED and UNMAPPED reads]
I could not understand what he meant by the --un option, because I don't know what I should type in place of [name of the fastq file which will contain the UNMAPPED reads].
Could you please help me, even though this is unrelated to sequence trimming?
Excuse me
I have just one read file, so I assumed my reads are not paired. I trimmed my reads based on your previous tips. On the other hand, my supervisor has now asked me to align my clean.fq file against the reference genome, but I don't know what I should type in place of [name of the fastq file which will contain the UNMAPPED reads]. In the bowtie2 manual I read that
--un <path> write unpaired reads that fail to align to file at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit set and neither the 0x40 nor 0x80 bits set. If --un-gz is specified, output will be gzip compressed. If --un-bz2 or --un-lz4 is specified, output will be bzip2 or lz4 compressed.
but I still don't understand what I should type after --un.
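Going by the manual text quoted above, the <path> after --un is simply the name of an output file that you choose, and bowtie2 writes the unaligned reads into it. Which SAM records that covers can be sketched in Python as follows. The constants are the FLAG bits named in the manual; the function name is made up for illustration.

```python
UNMAPPED = 0x4         # read did not align
FIRST_IN_PAIR = 0x40   # read is the first mate of a pair
SECOND_IN_PAIR = 0x80  # read is the second mate of a pair


def goes_to_un_file(flag):
    """True if a SAM FLAG describes an unpaired read that failed to align."""
    is_unmapped = bool(flag & UNMAPPED)
    is_mate = bool(flag & (FIRST_IN_PAIR | SECOND_IN_PAIR))
    return is_unmapped and not is_mate
```

For single-end data like yours, every read with FLAG 4 in the SAM output would therefore also end up in the file named after --un.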
Brian,
I have a sam file containing unmapped reads. I want to convert it to fasta or fastq so that I can use it as input for bowtie2 and align the reads to the genome.
Could you please tell me how I can convert sam to fasta or fastq using reformat in BBDuk?
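To show what such a conversion involves, here is a minimal Python sketch. It is an illustration only, not how Reformat works internally. It assumes a plain-text SAM file with the standard 11 mandatory tab-separated columns, writes a placeholder quality when the QUAL column is "*", and does not filter secondary or supplementary alignments. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
REVERSE_STRAND = 0x10  # SAM FLAG bit: SEQ is stored reverse-complemented


def sam_line_to_fastq(line):
    """Convert one SAM alignment line to a FASTQ record; None for headers."""
    if line.startswith("@") or not line.strip():
        return None  # header or blank line
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if qual == "*":
        qual = "I" * len(seq)  # assumption: placeholder when no quality is stored
    if flag & REVERSE_STRAND:
        # Restore the read to its original orientation.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


def sam_to_fastq(sam_path, fastq_path):
    """Write every alignment line of a SAM file as FASTQ; return the count."""
    written = 0
    with open(sam_path) as sam, open(fastq_path, "w") as fastq:
        for line in sam:
            record = sam_line_to_fastq(line)
            if record is not None:
                fastq.write(record)
                written += 1
    return written
```

The returned count can be compared against the number of reads you expect, which is a quick way to notice a truncated or malformed file.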
You clever boy, you deserve to be appreciated; your code worked well.
Thank you
Brian,
Can I use BBDuk to see genome coverage? I tried bedtools to produce a bedgraph and a histogram, but it doesn't work on Windows. I also tried samtools, but I only found some confusing commands. I have a bam file whose genome coverage I want to see.
basecov will give you the exact coverage at every base location, so that file will be huge if you have a large genome; so you may want to skip that one. bincov gives the coverage binned by every 1000bp (by default) so is a lot smaller and easier to plot in, say, Excel.
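To make the difference between the two outputs concrete, the sketch below bins a list of per-base coverage values by averaging them. It is a simplified illustration, not Pileup's own code, and the exact columns Pileup writes may differ. The function name is hypothetical.

```python
def bin_coverage(per_base_coverage, bin_size=1000):
    """Average per-base coverage over consecutive bins of bin_size positions."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    binned = []
    for start in range(0, len(per_base_coverage), bin_size):
        window = per_base_coverage[start:start + bin_size]
        # The last bin may be shorter than bin_size.
        binned.append((start, sum(window) / len(window)))
    return binned
```

A genome of 5 million bases gives 5 million values per base but only 5000 bins at the default size, which is why the binned file is so much easier to plot.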
Set USE_COVERAGE_ARRAYS to true
Set USE_BITSETS to false
Exception in thread "main" java.lang.AssertionError: -128 = ?
Can you please copy and paste the full error, including all the information above it?
Oh - I should mention, that command will not work unless samtools is installed and in your path. If samtools is not installed, Pileup can only read the file if you first convert it to sam format.
I have already installed samtools. My bam file is in the samtools folder.
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Users\yang>cd C:\Users\yang\Documents\Downloads\samtools
C:\Users\yang\Documents\Downloads\samtools>java -ea -Xmx1g -cp C:\Users\yang\Documents\Downloads\BBMap\current jgi.CoveragePileup in=eg1.bam covstats=stats.txt hist=histogram.txt basecov=basecov.txt bincov=bincov.txt binsize=1000
Executing jgi.CoveragePileup [in=eg1.bam, covstats=stats.txt, hist=histogram.txt, basecov=basecov.txt, bincov=bincov.txt, binsize=1000]
Set USE_COVERAGE_ARRAYS to true
Set USE_BITSETS to false
Exception in thread "main" java.lang.AssertionError: Missing field 1: ▼♦ ÿ♠BC☻ ? }?»
at stream.SamLine.<init>(SamLine.java:472)
at jgi.CoveragePileup.processSamLine(CoveragePileup.java:675)
at jgi.CoveragePileup.process(CoveragePileup.java:351)
at jgi.CoveragePileup.main(CoveragePileup.java:49)
C:\Users\yang\Documents\Downloads\samtools>
And for my other bam file:
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Users\yang>cd C:\Users\yang\Documents\Downloads\samtools
C:\Users\yang\Documents\Downloads\samtools>java -ea -Xmx1g -cp C:\Users\yang\Documents\Downloads\BBMap\current jgi.CoveragePileup in=sim_reads_aligned.bam covstats=stats.txt hist=histogram.txt basecov=basecov.txt bincov=bincov.txt binsize=1000
Executing jgi.CoveragePileup [in=sim_reads_aligned.bam, covstats=stats.txt, hist=histogram.txt, basecov=basecov.txt, bincov=bincov.txt, binsize=1000]
Set USE_COVERAGE_ARRAYS to true
Set USE_BITSETS to false
Exception in thread "main" java.lang.AssertionError: -128 = ?array=▼♦ ÿ♠ BC☻ ? srôe?b``p?pá♀ó?2?3à♀ö·*?+?/*IMá♫ä♀ö?J?¬144031024?)JM«ñs?70°025Ö3¬áôñ?2±4?°42àpçôt±J?//ÉL5â♀?ƒ3??←é☺?çb♦?&Äx?dX◄?? B?Bú? ♦ ÿ♠ BC☻ g%?}[?-?uÖ?ó?☻$▲Ü?Fô♥²èN+n♀??Kß?xè?è♫æw@?♫??Ü?9g&?C$?↔#¶ƒ??▬4►@ ?☻?♦?P$Ä♂?Ä♥◄y@AÜ?¶ °►, start=247, stop=249
at align2.Tools.parseInt(Tools.java:1508)
at stream.SamLine.<init>(SamLine.java:473)
at jgi.CoveragePileup.processSamLine(CoveragePileup.java:675)
at jgi.CoveragePileup.process(CoveragePileup.java:351)
at jgi.CoveragePileup.main(CoveragePileup.java:49)
C:\Users\yang\Documents\Downloads\samtools>
Well... it looks like, for whatever reason, Pileup can't find Samtools. I suggest you first convert the bam to sam and then run Pileup on the sam file.
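The unreadable characters in the two error messages appear to be compressed BAM bytes being parsed as if they were SAM text. To check whether a file is still binary BAM before running Pileup on it, one option is to look at its first bytes. The sketch below relies on two facts from the SAM/BAM specification: BAM is stored in BGZF, which is gzip-compatible and so starts with the bytes 1f 8b, and the decompressed data starts with "BAM" followed by the byte 01. The function name is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip/BGZF file
BAM_MAGIC = b"BAM\x01"    # first four bytes of decompressed BAM data


def guess_alignment_format(path):
    """Return 'bam', 'gzip' (compressed but not BAM) or 'text'."""
    with open(path, "rb") as handle:
        if handle.read(2) != GZIP_MAGIC:
            return "text"  # not compressed, so possibly a plain SAM file
    with gzip.open(path, "rb") as handle:
        return "bam" if handle.read(4) == BAM_MAGIC else "gzip"
```

A file that comes back as "bam" still needs converting to sam first; one that comes back as "text" can be opened in a text editor to confirm it really is SAM.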
Can I use BBDuk to convert sra, wig or txt to fasta or fastq? I ask even though I could not find them among reformat's accepted formats in your thread on SEQanswers.
Nope, the only thing I know of that will convert sra files is the sra toolkit.
Sorry Brian,
Can I use reformat to convert txt.gz or wig.gz files to fasta or fastq?
Reformat will only convert reads; the only formats it will accept are fasta, fastq, fasta+qual, scarf, sam, and bam (if samtools is in the path). It will accept gzipped or not gzipped. But there's nothing that can convert a wig file to reads, because the data is no longer there. As for .txt, that's a generic extension that could mean anything, so it depends on the contents, but probably not.
Read this documentation properly. https://cutadapt.readthedocs.org/en/stable/