Add vision capability to the workflow. — add_vision

add_vision_capability() lets you declare that the model in this workflow can describe or extract information from images. Only a few models support this capability.

Usage

add_vision_capability(workflow_obj, max_image_dimension = NA)

Arguments

workflow_obj: A workflow object containing all the parameters that describe the workflow.
max_image_dimension: A numeric value that defines the largest dimension (width or height), in pixels, of the images sent to the model. Defaults to 672 if not provided.

Details

Lets you declare that this workflow can use vision capabilities, i.e. that you can optionally send images along with the prompt. Make sure that the model in your workflow actually supports vision in the first place. This is usually limited to models such as llava, moondream, and llama3.2-11b, although others likely exist.
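
To make the meaning of max_image_dimension concrete, here is a minimal sketch of how a "largest dimension" limit can be applied while keeping the aspect ratio. The helper scale_to_max_dimension() is hypothetical and not part of the package, and the package's actual resizing code may differ. In particular, leaving smaller images unchanged is an assumption made here for illustration.

```r
# Hypothetical helper (not part of the package): computes the target size
# of an image so that its largest side does not exceed max_dim pixels.
scale_to_max_dimension <- function(width, height, max_dim = 672) {
  stopifnot(width > 0, height > 0, max_dim > 0)
  largest <- max(width, height)
  # Assumption: images already within the limit are left unchanged.
  if (largest <= max_dim) {
    return(c(width = width, height = height))
  }
  # Scale both sides by the same ratio to preserve the aspect ratio.
  ratio <- max_dim / largest
  c(width = round(width * ratio), height = round(height * ratio))
}

scale_to_max_dimension(1344, 1008)  # landscape: width is the limiting side
scale_to_max_dimension(600, 2400)   # portrait: height is the limiting side
```

Under these assumptions, a 1344 x 1008 image would be sent as 672 x 504, and a 600 x 2400 image as 168 x 672. A smaller limit reduces the amount of image data sent to the model, at the cost of fine detail.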

Examples

# Note: for real image inputs, use a model that supports vision (see Details).
myflow_test <- ai_workflow() |>
  set_connector("ollama") |>
  set_model(model_name = "llama3.1:8b-instruct-q5_K_M") |>
  set_n_predict(1000) |>
  set_temperature(0.8) |>
  set_default_missing_parameters_in_workflow() |>
  add_vision_capability()
#> → Default IP address has been set to 127.0.0.1.
#> → Default port has been set to 11434.
#> → Frequency Penalty was not specified and given a default value of 1.
#> → Presence Penalty was not specified and given a default value of 1.5.
#> → Repeat Penalty was not specified and given a default value of 1.2.
#> → Mode was not specified and 'chat' was selected by default.
#> → System Prompt was not specified and given a default value of 'You are a helpful AI assistant.'.
#> → No numerical max_image_dimension provided, will default to resizing all images to 672 pixels as max dimension.