llama_load_model_from_file: using device Metal (Apple M3) - 15997 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 389 tensors from data/bge-large-en-v1.5-q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = bert
llama_model_loader: - kv 1: general.name str = bge-large-en-v1.5
llama_model_loader: - kv 2: bert.block_count u32 = 24
llama_model_loader: - kv 3: bert.context_length u32 = 512
llama_model_loader: - kv 4: bert.embedding_length u32 = 1024
llama_model_loader: - kv 5: bert.feed_forward_length u32 = 4096
llama_model_loader: - kv 6: bert.attention.head_count u32 = 16
llama_model_loader: - kv 7: bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 8: general.file_type u32 = 7
llama_model_loader: - kv 9: bert.attention.causal bool = false
llama_model_loader: - kv 10: bert.pooling_type u32 = 2
llama_model_loader: - kv 11: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 12: tokenizer.ggml.bos_token_id u32 = 101
llama_model_loader: - kv 13: tokenizer.ggml.eos_token_id u32 = 102
llama_model_loader: - kv 14: tokenizer.ggml.model str = bert
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 19: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 21: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 22: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 244 tensors
llama_model_loader: - type q8_0: 145 tensors
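The 24 key-value pairs dumped above are the model's GGUF metadata, and the two `type` lines account for all 389 tensors (244 f32 norm/bias tensors plus 145 q8_0 weight matrices). A minimal sketch of listing the same keys offline, assuming the `gguf` Python package that ships with llama.cpp:

```python
# Hypothetical offline inspection of the GGUF header; assumes `pip install gguf`.
from gguf import GGUFReader

reader = GGUFReader("data/bge-large-en-v1.5-q8_0.gguf")
for field in reader.fields.values():
    print(field.name)                  # general.architecture, bert.block_count, ...
print(len(reader.tensors), "tensors")  # 389, matching the loader output above
```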
llm_load_vocab: control token: 100 '[UNK]' is not marked as EOG
llm_load_vocab: control token: 101 '[CLS]' is not marked as EOG
llm_load_vocab: control token: 0 '[PAD]' is not marked as EOG
llm_load_vocab: control token: 102 '[SEP]' is not marked as EOG
llm_load_vocab: control token: 103 '[MASK]' is not marked as EOG
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 5
llm_load_vocab: token to piece cache size = 0.2032 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = bert
llm_load_print_meta: vocab type = WPM
llm_load_print_meta: n_vocab = 30522
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 512
llm_load_print_meta: n_embd = 1024
llm_load_print_meta: n_layer = 24
llm_load_print_meta: n_head = 16
llm_load_print_meta: n_head_kv = 16
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 1.0e-12
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 4096
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 0
llm_load_print_meta: pooling type = 2
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 512
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 335M
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 334.09 M
llm_load_print_meta: model size = 340.90 MiB (8.56 BPW)
llm_load_print_meta: general.name = bge-large-en-v1.5
llm_load_print_meta: BOS token = 101 '[CLS]'
llm_load_print_meta: EOS token = 102 '[SEP]'
llm_load_print_meta: UNK token = 100 '[UNK]'
llm_load_print_meta: SEP token = 102 '[SEP]'
llm_load_print_meta: PAD token = 0 '[PAD]'
llm_load_print_meta: CLS token = 101 '[CLS]'
llm_load_print_meta: MASK token = 103 '[MASK]'
llm_load_print_meta: LF token = 0 '[PAD]'
llm_load_print_meta: EOG token = 102 '[SEP]'
llm_load_print_meta: max token length = 21
llm_load_tensors: tensor 'token_embd.weight' (q8_0) (and 4 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
ggml_backend_metal_log_allocated_size: allocated buffer, size = 307.23 MiB, ( 693.44 / 16384.02)
llm_load_tensors: offloading 24 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 25/25 layers to GPU
llm_load_tensors: Metal_Mapped model buffer size = 307.23 MiB
llm_load_tensors: CPU_Mapped model buffer size = 33.69 MiB
................................................................................
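At this point all 25 layers (24 transformer blocks plus the output layer) are resident on Metal, with the token embedding and a few related tensors CPU-mapped (the 33.69 MiB buffer). A hedged sketch of a load call that would produce a log like this, assuming llama-cpp-python; the parameters simply mirror what the log itself reports:

```python
# Assumed llama-cpp-python invocation; verbose=True emits the llama.cpp log above.
from llama_cpp import Llama

llm = Llama(
    model_path="data/bge-large-en-v1.5-q8_0.gguf",
    embedding=True,    # BERT encoder used for embeddings, not generation
    n_gpu_layers=-1,   # offload all 25/25 layers to Metal, as logged
    n_ctx=512,         # matches bert.context_length / n_ctx_train
    verbose=True,
)
```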
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_ctx_per_seq = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3
ggml_metal_init: picking default device: Apple M3
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name: Apple M3
ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has bfloat = true
ggml_metal_init: use bfloat = false
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 17179.89 MB
ggml_metal_init: loaded kernel_add 0x104541960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_add_row 0x1218415b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub 0x121841870 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub_row 0x121841b30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul 0x1107e7c70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_row 0x1218408e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div 0x121735910 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div_row 0x1107e74c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f32 0x1107e7780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f16 0x12210d300 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i32 0x1107e8e30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i16 0x12187e120 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale 0x104ff3320 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale_4 0x104e4d4a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_clamp 0x1107e9920 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_tanh 0x122122e50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_relu 0x1107ea300 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sigmoid 0x122123110 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu 0x1221233d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_4 0x12187e660 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick 0x1107ea950 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick_4 0x12187ebb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu 0x12187f0f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu_4 0x12187f630 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_elu 0x12187fb70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f16 0x1218800b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f16_4 0x104ff35e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f32 0x121880370 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f32_4 0x122123690 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf 0x121880630 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf_8 0x1107ebd20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f32 0x1107ead60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f16 0x1218808f0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_get_rows_bf16 (not supported)
ggml_metal_init: loaded kernel_get_rows_q4_0 0x121880bb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_1 0x121880e70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_0 0x1107ecac0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_1 0x121881130 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q8_0 0x104ff38a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q2_K 0x1218813f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q3_K 0x104ff3b60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_K 0x1107ed960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_K 0x1218816b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q6_K 0x121881970 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xxs 0x121881c30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xs 0x1107edc80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_xxs 0x1107edf40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_s 0x1107ee2f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_s 0x121881ef0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_s 0x1218821b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_m 0x121882470 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_nl 0x122123950 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_xs 0x121882730 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_i32 0x1218829f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rms_norm 0x121882cb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_group_norm 0x1107ef9d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_norm 0x1107ee810 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_ssm_conv_f32 0x121882f70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_ssm_scan_f32 0x104ff4590 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f32_f32 0x1107efdc0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_mul_mv_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16 (not supported)
ggml_metal_init: loaded kernel_mul_mv_f16_f32 0x1107f00d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_1row 0x121883670 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_l4 0x104ff4850 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f16 0x121883c30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_0_f32 0x104ff5920 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_1_f32 0x122123c10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_0_f32 0x121883ef0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_1_f32 0x1107f04d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q8_0_f32 0x1107f1c10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_2 0x1107f1690 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_3 0x1218841b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_4 0x122123ed0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_5 0x122124190 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_2 0x121884620 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_3 0x1107f2060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_4 0x121884d30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_5 0x122124450 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_2 0x1107f28e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_3 0x1107f3a20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_4 0x12170d600 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_5 0x1107f3220 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_2 0x1107f34e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_3 0x121885310 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_4 0x1107f4b40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_5 0x121886150 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_2 0x1107f58a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_3 0x12170d8c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_4 0x1107f6690 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_5 0x122124710 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_2 0x1221249d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_3 0x121886890 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_4 0x121887770 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_5 0x12170db80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_2 0x122124c90 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_3 0x1218859a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_4 0x122124f50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_5 0x122125210 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_2 0x1221254d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_3 0x122125790 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_4 0x12170e010 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_5 0x121888bc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_2 0x122125a50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_3 0x1218880c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_4 0x122126090 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_5 0x1221269b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_2 0x121888540 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_3 0x1221272c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_4 0x1107f6b30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_5 0x1107f7440 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q2_K_f32 0x122127b70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q3_K_f32 0x121889ca0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_K_f32 0x12188a680 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_K_f32 0x122127f40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q6_K_f32 0x122f0cdd0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_xxs_f32 0x1240140c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_xs_f32 0x122f0c910 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq3_xxs_f32 0x122128990 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq3_s_f32 0x12188aa70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_s_f32 0x12188bd00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq1_s_f32 0x122128310 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq1_m_f32 0x12170ed50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq4_nl_f32 0x1107f7b60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq4_xs_f32 0x12188b5b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_f32_f32 0x12170e760 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_f16_f32 0x12170f640 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32 (not supported)
ggml_metal_init: loaded kernel_mul_mv_id_q4_0_f32 0x12188b870 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_1_f32 0x12188d260 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_0_f32 0x12188dbc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_1_f32 0x12188c6c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q8_0_f32 0x12188cbb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q2_K_f32 0x122129110 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q3_K_f32 0x1107f89b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_K_f32 0x12170ff20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_K_f32 0x12188ee30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q6_K_f32 0x1107f7f30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_xxs_f32 0x1107f9140 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_xs_f32 0x122129770 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq3_xxs_f32 0x12188fc70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq3_s_f32 0x1107f9f60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_s_f32 0x1218904c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq1_s_f32 0x121712640 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq1_m_f32 0x121710970 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq4_nl_f32 0x12212b310 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq4_xs_f32 0x121890c50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_f32_f32 0x121891370 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_f16_f32 0x121891790 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_mul_mm_bf16_f32 (not supported)
ggml_metal_init: loaded kernel_mul_mm_q4_0_f32 0x12212a6d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_1_f32 0x121892420 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_0_f32 0x12212bd10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_1_f32 0x1107fae50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q8_0_f32 0x12212cb30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q2_K_f32 0x121711b70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q3_K_f32 0x12212d350 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_K_f32 0x12212dba0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_K_f32 0x1107fb6e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32 0x1107f96c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_xxs_f32 0x12212c0c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_xs_f32 0x1107fc3b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq3_xxs_f32 0x12212e430 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq3_s_f32 0x12212f060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_s_f32 0x1107fcb40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq1_s_f32 0x121741bb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq1_m_f32 0x121711580 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq4_nl_f32 0x1107fc670 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq4_xs_f32 0x121892720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_f32_f32 0x122130920 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_f16_f32 0x121893770 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f32 (not supported)
ggml_metal_init: loaded kernel_mul_mm_id_q4_0_f32 0x12212eb10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_1_f32 0x1217423e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_0_f32 0x122131950 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_1_f32 0x121894230 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q8_0_f32 0x1107fe260 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q2_K_f32 0x1107fd550 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q3_K_f32 0x122132240 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_K_f32 0x122132a10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_K_f32 0x121711170 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q6_K_f32 0x121743ca0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_xxs_f32 0x1221333e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_xs_f32 0x122133d00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq3_xxs_f32 0x1107fef00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq3_s_f32 0x1218949f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_s_f32 0x121744430 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq1_s_f32 0x1107fe940 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq1_m_f32 0x122132cd0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq4_nl_f32 0x121894f60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq4_xs_f32 0x121895b40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_norm_f32 0x1107fdac0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_norm_f16 0x122134410 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_neox_f32 0x121745310 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_neox_f16 0x12170b210 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_f16 0x1107ffbc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_f32 0x1221346d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_ext_f16 0x124404150 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_ext_f32 0x12170b640 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_conv_transpose_1d_f32_f32 0x122135100 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_conv_transpose_1d_f16_f32 0x122f0d2b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_upscale_f32 0x1240146e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pad_f32 0x124404850 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pad_reflect_1d_f32 0x121895640 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_timestep_embedding_f32 0x1221357b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_arange_f32 0x1244051a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argsort_f32_i32_asc 0x1218965d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argsort_f32_i32_desc 0x124405890 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_leaky_relu_f32 0x121896c30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h64 0x1221370b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h80 0x1217459d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h96 0x121897db0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h112 0x124407ec0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h128 0x124408180 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h256 0x124408950 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256 (not supported)
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h64 0x124409820 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h80 0x124409220 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h96 0x122137860 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h112 0x12440a140 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h128 0x122137e80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h256 0x122136a60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h64 0x122138730 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h80 0x122139c70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h96 0x121898570 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h112 0x12440aa70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h128 0x12440be00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h256 0x122139310 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h64 0x12213ada0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h80 0x1218994f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h96 0x12440b440 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h112 0x121899e00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h128 0x12440b800 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h256 0x12440cb90 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h64 0x12170cb60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h80 0x12170bb60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h96 0x12189adb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h112 0x121747070 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h128 0x1217479b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h256 0x12213ba30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h64 0x12213a900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h80 0x12440db70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h96 0x12213c970 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h112 0x12440e480 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h128 0x12213d270 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h256 0x12440cf20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_f16_h128 0x12213db70 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128 (not supported)
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_0_h128 0x121747fd0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_1_h128 0x12189b9d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_0_h128 0x12189b580 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_1_h128 0x12213e4d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q8_0_h128 0x12189a960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_f16_h256 0x12440e7c0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256 (not supported)
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_0_h256 0x12213eda0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_1_h256 0x12189d4d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_0_h256 0x12440f6d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_1_h256 0x12440eca0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q8_0_h256 0x12170c550 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_set_f32 0x12440ff80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_set_i32 0x12189ced0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_f32 0x124410240 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_f16 0x1217483e0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_cpy_f32_bf16 (not supported)
ggml_metal_init: loaded kernel_cpy_f16_f32 0x12213f1c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f16_f16 0x12213fae0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_cpy_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16 (not supported)
ggml_metal_init: loaded kernel_cpy_f32_q8_0 0x12189ddd0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q4_0 0x121749900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q4_1 0x124410500 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q5_0 0x1217493a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q5_1 0x124411b70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_iq4_nl 0x1221404f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_concat 0x12189e8c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sqr 0x12189c5f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sqrt 0x121748e80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sin 0x1244113b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cos 0x12174acf0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sum_rows 0x124412e60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argmax 0x12174bbb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pool_2d_avg_f32 0x124412130 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pool_2d_max_f32 0x1244145f0 | th_max = 1024 | th_width = 32
llama_kv_cache_init: Metal KV buffer size = 48.00 MiB
llama_new_context_with_model: KV self size = 48.00 MiB, K (f16): 24.00 MiB, V (f16): 24.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.00 MiB
llama_new_context_with_model: Metal compute buffer size = 25.00 MiB
llama_new_context_with_model: CPU compute buffer size = 5.01 MiB
llama_new_context_with_model: graph nodes = 849
llama_new_context_with_model: graph splits = 2
Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | MATMUL_INT8 = 1 | ACCELERATE = 1 | AARCH64_REPACK = 1 |
Model metadata: {'tokenizer.ggml.cls_token_id': '101', 'tokenizer.ggml.padding_token_id': '0', 'tokenizer.ggml.seperator_token_id': '102', 'tokenizer.ggml.unknown_token_id': '100', 'general.quantization_version': '2', 'tokenizer.ggml.token_type_count': '2', 'general.file_type': '7', 'tokenizer.ggml.eos_token_id': '102', 'bert.context_length': '512', 'bert.pooling_type': '2', 'tokenizer.ggml.bos_token_id': '101', 'bert.attention.head_count': '16', 'bert.feed_forward_length': '4096', 'tokenizer.ggml.mask_token_id': '103', 'tokenizer.ggml.model': 'bert', 'bert.attention.causal': 'false', 'general.name': 'bge-large-en-v1.5', 'bert.block_count': '24', 'bert.attention.layer_norm_epsilon': '0.000000', 'bert.embedding_length': '1024', 'general.architecture': 'bert'}
Using fallback chat format: llama-2
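These last two lines come from the Python wrapper rather than llama.cpp itself: the metadata dict is the parsed GGUF header echoed back, and the llama-2 chat-format fallback is harmless for an embedding-only model since no chat completions are requested. Assuming the `llm` handle from the load sketch above, the same dict stays accessible after loading:

```python
# llama-cpp-python keeps the parsed GGUF metadata as a plain dict of strings.
n_ctx_train = int(llm.metadata["bert.context_length"])  # 512
pooling = int(llm.metadata["bert.pooling_type"])        # 2 = CLS pooling in llama.cpp
print(n_ctx_train, pooling)
```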
llama_perf_context_print: load time = 59.73 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 8 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 59.76 ms / 9 tokens
llama_perf_context_print: load time = 59.73 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 62 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 29.66 ms / 63 tokens
llama_perf_context_print: load time = 59.73 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 47 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 28.45 ms / 48 tokens
llama_perf_context_print: load time = 59.73 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 40 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 31.09 ms / 41 tokens
llama_perf_context_print: load time = 59.73 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 40 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 30.95 ms / 41 tokens
llama_perf_context_print: load time = 59.73 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 54 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 28.12 ms / 55 tokens
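Each four-line `llama_perf_context_print` block above corresponds to one embedding request, so six texts of 8, 62, 47, 40, 40, and 54 tokens were embedded. The zero prompt-eval timings (and the resulting `inf` tokens-per-second) suggest the per-token timers are not populated on this embedding path, so only the total times (~28-60 ms per call) are meaningful. A sketch of the driving loop, again assuming llama-cpp-python; the texts are placeholders, since the originals are not shown in the log:

```python
# Placeholder inputs; each llm.embed() call prints one perf block when verbose=True.
texts = ["first passage", "second passage"]  # ...six in total in the log above
for text in texts:
    vec = llm.embed(text)
    assert len(vec) == 1024  # n_embd = 1024 for bge-large-en-v1.5
```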