RepeatModeler Version 2.0.7
===========================
Using output directory = /dev/shm/rModeler.HR174x/RM_404605.TueSep90321462025
Search Engine = rmblast 2.14.1+
Threads = 32
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.7, RepeatMasker 4.2.1, RepeatAfterMe 0.0.7
LTR Structural Analysis: Disabled [use -LTRStruct to enable]
Random Number Seed: 1757413305
Database = /dev/shm/rModeler.HR174x/GCA_003724095.1_ASM372409v1
  - Sequences = 147937
  - Bases = 1204132922
  - N50 = 60621
  - Contig Histogram:
  Size(bp)                                                          Count
  -----------------------------------------------------------------------
  735089-787528 |                                                   [ 2 ]
  682650-735088 |                                                   [ 1 ]
  630211-682649 |                                                   [ 2 ]
  577773-630211 |                                                   [ 2 ]
  525334-577772 |                                                   [ 11 ]
  472895-525333 |                                                   [ 3 ]
  420456-472894 |                                                   [ 14 ]
  368018-420456 |                                                   [ 32 ]
  315579-368017 |                                                   [ 59 ]
  263140-315578 |                                                   [ 127 ]
  210701-263139 |                                                   [ 276 ]
  158263-210701 |                                                   [ 576 ]
  105824-158262 |                                                   [ 1196 ]
  53385-105823  |*                                                  [ 3061 ]
  947-53385     |************************************************** [ 142575 ]
Storage Throughput = excellent ( 1647.46 MB/s )

RepeatModeler Round # 1
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 40000000 bp
   - Final Sample Size = 44061298 bp ( 40004365 non ambiguous )
   - Num Contigs Represented = 5734
 - Sequence extraction : 00:00:02 (hh:mm:ss) Elapsed Time
 -- Running RepeatScout on the sequences...
   - RepeatScout: Running build_lmer_table ( l = 14, min = 10 )..
   - RepeatScout: Running RepeatScout.. : 231 raw families identified
   - RepeatScout: Running filtering stage.. 202 families remaining
   - RepeatScout: 00:03:42 (hh:mm:ss) Elapsed Time
   - Collecting repeat instances...
   - Refining 202 families... 00:04:10 (hh:mm:ss) Elapsed Time
   - Redundant Families and Large Satellite Filtering.. : 1 satellite(s), 67 contained, found in 00:00:02 (hh:mm:ss) Elapsed Time
Family Refinement: 00:00:03 (hh:mm:ss) Elapsed Time
Round Time: 00:08:00 (hh:mm:ss) Elapsed Time : 134 families discovered.

RepeatModeler Round # 2
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 10000000 bp
 - Sequence extraction : 00:00:00 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:00:17 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
   7281 repeats masked totaling 7431717 bp(s).
   - TE Masking time 00:00:19 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
   Sample Size 11109407 bp
   Num Contigs Represented = 1408
   Non ambiguous bp:
     Initial: 10003165 bp
     After Masking: 2521396 bp
     Masked: 74.79 %
 -- Input Database Coverage: 11109407 bp out of 1204132922 bp ( 0.92 % )
Sampling Time: 00:00:37 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
   - Total Comparisons = 1006071
Comparison Time: 00:07:24 (hh:mm:ss) Elapsed Time, 4883 HSPs Collected
Number of families returned by RECON: 446
Round Time: 00:09:23 (hh:mm:ss) Elapsed Time : 6 families discovered.

RepeatModeler Round # 3
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 30000000 bp
 - Sequence extraction : 00:00:01 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:00:50 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
   22200 repeats masked totaling 23102202 bp(s).
   - TE Masking time 00:00:51 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
   Sample Size 32951888 bp
   Num Contigs Represented = 4344
   Non ambiguous bp:
     Initial: 30001197 bp
     After Masking: 6756186 bp
     Masked: 77.48 %
 -- Input Database Coverage: 44061295 bp out of 1204132922 bp ( 3.66 % )
Sampling Time: 00:01:44 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
   - Total Comparisons = 9541896
Comparison Time: 00:24:14 (hh:mm:ss) Elapsed Time, 18239 HSPs Collected
Number of families returned by RECON: 1310
Round Time: 00:26:28 (hh:mm:ss) Elapsed Time : 27 families discovered.

RepeatModeler Round # 4
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 90000000 bp
 - Sequence extraction : 00:00:05 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:02:29 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
   70095 repeats masked totaling 71867713 bp(s).
   - TE Masking time 00:02:54 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
   Sample Size 99940268 bp
   Num Contigs Represented = 12975
   Non ambiguous bp:
     Initial: 90001111 bp
     After Masking: 17696288 bp
     Masked: 80.34 %
 -- Input Database Coverage: 144001563 bp out of 1204132922 bp ( 11.96 % )
Sampling Time: 00:05:33 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
   - Total Comparisons = 86678361
Comparison Time: 01:30:41 (hh:mm:ss) Elapsed Time, 97731 HSPs Collected
Number of families returned by RECON: 3534
Round Time: 01:41:05 (hh:mm:ss) Elapsed Time : 132 families discovered.

RepeatModeler Round # 5
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 270000000 bp
 - Sequence extraction : 00:00:13 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:07:30 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
   231502 repeats masked totaling 225533203 bp(s).
   - TE Masking time 00:16:09 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
   Sample Size 298036345 bp
   Num Contigs Represented = 38543
   Non ambiguous bp:
     Initial: 270022495 bp
     After Masking: 43254248 bp
     Masked: 83.98 %
 -- Input Database Coverage: 442037908 bp out of 1204132922 bp ( 36.71 % )
Sampling Time: 00:24:06 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
   - Total Comparisons = 802742346
Comparison Time: 06:17:50 (hh:mm:ss) Elapsed Time, 315516 HSPs Collected
Number of families returned by RECON: 9506
Round Time: 06:52:54 (hh:mm:ss) Elapsed Time : 369 families discovered.

RepeatScout/RECON discovery complete: 668 families found

#
# RepeatClassifier
#
# Version 2.0.7
# Threads: 32
# Current Working Directory: /dev/shm/rModeler.HR174x/RM_404605.TueSep90321462025
# Protein Library: /hive/data/outside/RepeatMasker/RepeatMasker-4.2.1/Libraries/RepeatPeps.lib
#   - 18011 proteins
# Consensi Library: /hive/data/outside/RepeatMasker/RepeatMasker-4.2.1/Libraries/RepeatMasker.lib
#   - 26292 consensus sequences
  - Looking for simple/tandem and low complexity sequences..
  - Looking for similarity to known repeat proteins..
  - Looking for similarity to known repeat consensi..
Classification Time: 00:03:14 (hh:mm:ss) Elapsed Time
Program Time: 09:21:04 (hh:mm:ss) Elapsed Time