3.4 years ago
selplat21
I received a contamination file after submitting a genome assembly to GenBank. Does anyone have experience converting the list of intervals it contains into input for bedtools maskfasta?
The following are the contents of the contamination.txt file:
[] We ran your sequences through our Contamination Screen. The screen found
contigs that need to be trimmed and/or excluded. The results are in the
Contamination.txt file posted in your submission on the WGS submission portal
https://submit.ncbi.nlm.nih.gov/subs/genome/.
GenBank staff will automatically remove contaminants that are found to be
the entire sequence or at the end of a sequence, and will post the reports
and edited fasta file to the submission portal. Note that internal contamination
will not be automatically removed since the sequence may be misassembled and
therefore should be split at the contamination and resubmitted as separate sequences.
In addition, we do not automatically remove mitochondrial sequences in
eukaryotic submissions.
If you selected the submission portal option "Do not automatically trim or
remove sequences identified as contamination" then you will need
to adjust the sequences appropriately and then resubmit your sequences.
After you remove the contamination, trim any Ns at the ends of the sequence
and remove any sequences that are shorter than 200 nt and not part of a
multi-component scaffold.
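The clean-up step described in the paragraph above could be sketched roughly as follows. This is a minimal illustration, not NCBI's procedure: the function names and the `(name, sequence)` record layout are hypothetical, and the sketch ignores the exception for short sequences that belong to a multi-component scaffold.

```python
def trim_terminal_ns(seq):
    """Strip runs of N (either case) from both ends of a sequence."""
    return seq.strip("Nn")


def clean_records(records, min_len=200):
    """Trim terminal Ns, then drop sequences shorter than min_len.

    records: iterable of (name, sequence) pairs (hypothetical layout).
    Assumption: no record is part of a multi-component scaffold, so the
    200 nt cutoff is applied to every sequence.
    """
    kept = []
    for name, seq in records:
        trimmed = trim_terminal_ns(seq)
        if len(trimmed) >= min_len:
            kept.append((name, trimmed))
    return kept
```

Internal Ns are left untouched; only the sequence ends are trimmed.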
Note that mismatches between the name of the adaptor/primer identified in the screen
and the sequencing technology used to generate the sequencing data should not be used
to discount the validity of the screen results as the adaptors/primers of many
different sequencing platforms share sequence similarity.
Mitochondrion:
[] Some sequences are similar to a mitochondrial sequence. There are three options:
(A) If you are not sequencing mitochondrial DNA, remove these sequences
(B) If you want to include the mitochondrial sequences in the genome submission,
label them as mitochondrial.
- To label the sequences in a BATCH submission, add a source qualifier in the fasta
definition line [location=mitochondrion]. See "IMPORTANT: Additional requirements
for batch submissions" at https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#batch_assignment
- If this is not a BATCH submission, indicate that these sequences are mitochondrial in the
Assignment tab of the submission portal.
- In addition, it is recommended but not required that you move these sequences to the end
of your submission or put them in a separate file so that they are clustered together.
(C) If these are nuclear mitochondrial pseudogene regions (numt's) that should remain
in the submission, please notify us by including a comment in the submission portal
or by emailing us. Note that numts should be integrated into the genome and therefore
should only be marked as "Trim". Any sequence marked as "Exclude" because of a hit
to mitochondria must either be removed (option A) or labeled as mitochondrial (option B).
The numt sequences will still appear in the contamination report, but once you have
removed all those marked to exclude and told us that the rest are numts, you can ignore
the errors. We will manually override the error when we review your submission.
Screened 1,340 sequences, 450,786,658 bp.
Note: 119 sequences with runs of Ns 10 bp or longer (or those longer that 20 MB) were split before screening.
82 sequences with locations to mask/trim
(85 split spans with locations to mask/trim)
Trim:
Sequence name, length, span(s), apparent source
000000F 10718699 1845347..1845478,3309508..3310062,3686734..3687014,4566627..4566818,4582486..4582837 mitochondrion
000001F 10374927 1243815..1244105,2211823..2212157,9224026..9225141 mitochondrion
000002F 9682142 6370591..6371853,8801874..8802128 mitochondrion
000003F 7674730 4397340..4397673 mitochondrion
000005F 2418719 1516348..1516987 mitochondrion
000006F 6130004 175695..176200,1368648..1368883,3578718..3579667 mitochondrion
000008F 6665304 4761589..4762119 mitochondrion
000009F 6358232 1807815..1807973,4641072..4641792,5428340..5428953,6047984..6048418 mitochondrion
000010F 6389901 125850..126094,734615..734785,3342761..3343000,4223963..4224288,4574033..4574247 mitochondrion
000011F 6114160 3569219..3569742,3831575..3832275,4024782..4024931 mitochondrion
000012F 6036981 362201..362901,4404978..4405102 mitochondrion
000014F 5682764 1281919..1282272,1542973..1546176 mitochondrion
000016F 5634279 3981592..3981804 mitochondrion
000017F 5750491 2812567..2812725,4850337..4850469 mitochondrion
000019F 5112736 2348398..2348536,3050337..3050496 mitochondrion
000020F 4868406 1067760..1068057 mitochondrion
000021F 5059621 2129428..2129566,2129772..2129915,2290329..2290617,2799695..2800583,3482351..3483948,4411681..4412559 mitochondrion
000022F 4776965 2818919..2820038,4671479..4671847 mitochondrion
000024F 4579434 2831427..2831869,3567969..3568394,3568477..3569059 mitochondrion
000027F 4339752 789609..789795,1859523..1859682,3773127..3773442 mitochondrion
000028F 3985823 2808230..2808648 mitochondrion
000029F 3964867 1158046..1158174 mitochondrion
000031F 3187727 1256145..1256401,2263958..2264177 mitochondrion
000032F 3811840 1599793..1600201,1670679..1670849,1814206..1814465,3447311..3449636,3450583..3451644 mitochondrion
000034F 1089070 1048969..1049389 mitochondrion
000037F 3207319 128503..128625 mitochondrion
000039F 3240232 2640248..2640458 mitochondrion
000041F 2450599 6165..6364 mitochondrion
000044F 3118497 2040129..2040647,2050634..2050780,2052035..2053538,2057614..2058735,2061235..2061465,2061745..2062442,2062838..2063368,2073753..2074168,2199171..2199291 mitochondrion
000049F 2859690 1387483..1387688 mitochondrion
000051F 2879007 1067486..1067707 mitochondrion
000054F 973803 802110..802413 mitochondrion
000056F 413187 123424..123612 mitochondrion
000059F 2699107 915725..915869,2191943..2192110 vector/etc
000065F 2564273 243386..243550,1925197..1925339 mitochondrion
000070F 2229282 242194..242402 mitochondrion
000072F 2391111 927082..936209,2067053..2067291 mitochondrion
000073F 2206652 810626..810938 mitochondrion
000075F 2264779 798452..798601 mitochondrion
000078F 2277580 311353..311502 mitochondrion
000079F 1897683 433075..433197 mitochondrion
000081F 1182999 217336..217595 mitochondrion
000085F 2139560 1007648..1008344,1508098..1508358 mitochondrion
000088F 1057830 539702..539889 mitochondrion
000094F 1916475 315..2828,5901..6138,9087..9847,11159..11280,1192361..1192573 vector/etc
000101F 1913017 1836912..1837265 mitochondrion
000106F 544451 465130..465426 mitochondrion
000107F 369373 207136..207549 mitochondrion
000111F 1862678 914190..914401 mitochondrion
000124F 1008822 538706..538873 mitochondrion
000126F 1697619 718111..718304 mitochondrion
000130F 996569 909966..910091 mitochondrion
000134F 1401600 225246..225418 mitochondrion
000152F 1393241 496944..497062,497544..497694,497912..498297 mitochondrion
000153F 145345 16164..17437,19532..20818,45089..45248,45468..45936 mitochondrion
000155F 1120018 1079313..1081130,1083130..1083260,1100918..1102204,1105792..1106264 mitochondrion
000176F 1108250 925549..926112 mitochondrion
000193F 544079 441736..441872,442932..443182 mitochondrion
000214F 671671 395139..396598,397750..398633,403849..403989,407040..407170,408501..408855,410142..410288,426065..427367,434035..434181,437765..440529,441249..441646,442958..443079,449066..449825,467011..468364,488540..490517,491663..491815,494263..496240,501342..501800,503141..503287,505112..507095,545573..545888,555106..557404,561338..562703,574191..575182,579479..581456,583281..583427,593852..594321,596071..597560,598713..599999,626530..627729,627877..628743,639847..640606,641962..642637,644460..644606,645618..645767 mitochondrion
000228F 681544 623035..623552 mitochondrion
000232F 703900 293603..293727 mitochondrion
000250F 601353 130722..130911 mitochondrion
000279F 455376 169724..170056 mitochondrion
000300F 371277 368532..369191 mitochondrion
000310F 274701 195298..195560,195835..196583,197476..197622,199798..200210,206114..206731,209044..209281,212557..213260,213354..214430,217838..217968,220149..221462 mitochondrion
000350F 293903 68656..69013,85101..85458,154665..155022 mitochondrion
000377F 261818 213534..213777 mitochondrion
000418F 226830 4793..5150 mitochondrion
000428F 135233 94503..94727 mitochondrion
000451F 209099 186575..186697 mitochondrion
000473F 166641 54212..55527,56679..57971 mitochondrion
000559F 76849 5249..5486,8518..9278,15994..16198 mitochondrion
000583F 99957 41053..41875 mitochondrion
000814F 104678 94246..94603,98318..98675 mitochondrion
000843F 93854 4727..5031,5705..6012 mitochondrion
000849F 45017 349..676,4818..6530,9459..9589,13997..14460 mitochondrion
000973F 77717 1..144 mitochondrion
001063F 69923 69635..69857 mitochondrion
001229F 63775 53560..55537,57362..57508,58849..58979,61904..63624 mitochondrion
001413F 45160 1284..1415 mitochondrion
001459F 47867 24759..24889,28935..31448,34518..34755,37747..38507,39794..39933 mitochondrion
001533F 39796 27600..27876 mitochondrion
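One possible way to turn the Trim table above into a BED file for bedtools maskfasta is sketched below. It assumes the spans in the report are 1-based and inclusive (the row `1..144` suggests this), whereas BED is 0-based and half-open, so only the start coordinate is shifted by one. The function names and file names are hypothetical.

```python
import re
import sys

# A span looks like "1845347..1845478"
SPAN_RE = re.compile(r"^(\d+)\.\.(\d+)$")


def report_line_to_bed(line):
    """Convert one row of the Trim table into (name, start, end) BED tuples.

    Returns an empty list for lines that are not data rows (headers,
    summary lines, blank lines).
    """
    fields = line.split()
    if len(fields) < 3:
        return []
    name, spans = fields[0], fields[2]
    records = []
    for span in spans.split(","):
        match = SPAN_RE.match(span)
        if match is None:
            return []  # third column is not a span list, so skip the line
        start, end = int(match.group(1)), int(match.group(2))
        # Assumption: report coordinates are 1-based inclusive.
        # BED is 0-based half-open, so start shifts by one and end stays.
        records.append((name, start - 1, end))
    return records


def report_to_bed(lines):
    """Collect BED tuples from every data row in the report."""
    bed = []
    for line in lines:
        bed.extend(report_line_to_bed(line))
    return bed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python report_to_bed.py Contamination.txt contamination.bed
    with open(sys.argv[1]) as report, open(sys.argv[2], "w") as out:
        for name, start, end in report_to_bed(report):
            out.write(f"{name}\t{start}\t{end}\n")
```

The resulting file could then be passed to `bedtools maskfasta -fi assembly.fasta -bed contamination.bed -fo masked.fasta`, where the FASTA file names are placeholders. The sequence names in the BED file have to match the FASTA headers exactly. Note that masking replaces the intervals with Ns and does not remove or split the sequences.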