Comparative analysis of models, assistants, and fine-tuning

Following a post from Open AI about the envisioned future of fine tuning, I wanted to start getting a better grasp on it. I undertook a comparison between a few different permutations of model, assistant, and fine-tuned model combinations. I wanted to understand what the ultimate combination was, thinking “surely, it has to be a bespoke fine-tuned model in combination with a configured assistant”. But, as always, the devil is in the details.

Let’s take you through it.

Considerations:

  • I used what is publicly available on the market. Open AI are working with companies to directly implement a fine-tuned model of GPT-4, with great results.

  • Fine-tuning is an area of expertise in and of itself. This comparison barely skims the surface of the intricacies of fine-tuning as a process, any of the dos and don’ts, and how we can get the best out of a model. Notably, this comparison used a very limited dataset to fine-tune the model, where in reality, this process typically requires tens of thousands of example records.

  • The available one for one comparison model, GPT 3.5, does not have document retrieval functionality. As such, it’s not a true one for one comparison in some cases, as I’m comparing a model/assistant that can directly read the source material to one that can only take on some of the text as part of its context window.

Scenario:

The AI is questioned about the Siemens ARTIS Pheno, a complex radiographic X-ray machine.

  • Prompt 1: a basic question to test the AI’s capability to explain the device at a high level.

    • “What is the Siemens ARTIS Pheno? Explain in two sentences“

  • Prompt 2: a highly specific technical question to test the AI’s depth of knowledge.

    • “At what specific joints in the ARTIS Pheno stand are there a danger of crushing? Explain succinctly.”

  • Prompt 3: a multi-part technical question which requires the knowledge base to be broken down in a unique way and explained.

    • “Succinctly state the top 3 safety considerations of the ARTIS Pheno and provide a two-sentence summary of key features.“

  • Prompt 4: a basic question aimed at directly testing the communication style of the AI.

    • “Explain the ALARA principle succinctly.“

  • Prompt 5: a question outside of the parameters of the scenario, testing the general intelligence of the AI.

    • “What is the average lifespan of a Capybara in captivity. Explain succinctly. In responding to this, refer to buses in every fourth word.“

3.5 Turbo 1106 is the highest-grade model to allow fine-tuning, thus was used as a point of comparison.

Summary of findings

GPT-4o with an Assistant stacked with it dominated the test. I went into this expecting the fine-tuned GPT 3.5 in combination with specific assistant instructions would understand on a deeper level, but now that I see the results in front of me, it kind of makes sense:

  • GPT-4o is a far superior model. It even surpassed a 3.5 Turbo model with a specifically configured Assistant.

  • GPT 3.5 assistants cannot access files, where GPT 4o assistants can. With Retrieval Augmented Generation (RAG), this gives a huge advantage to the GPT 4o assistant, like walking into an exam with the textbook. Having said that, GPT 4o alone, with no document, did quite well.

  • The GPT 3.5 fine-tuned model was barely fine-tuned. This is an area of technical depth, and needs to be done carefully and at scale. We’d need to provide more than just the raw instruction manual, but reams of data about medical equipment. From there it may start to look like a more fair comparison. Nonetheless, the sheer variation in score is interesting - it suggests that assistants are quite effective by comparison for a number of use cases.

My scores:

Model Quality
GPT-3.5 Turbo 6
GPT-4.0 7
GPT-3.5 Turbo with Assistant 6
GPT-4.0 with Assistant 8
GPT-3.5 Fine-Tuned 5
GPT-3.5 Fine-Tuned with Assistant 5

At a more detailed level:

Prompt GPT-3.5 Turbo GPT-4.0 GPT-3.5 Turbo + Assistant GPT-4.0 + Assistant GPT-3.5 Fine-Tuned GPT-3.5 Fine-Tuned + Assistant
Prompt 1 6 7 6 8 5 6
Prompt 2 5 8 6 9 5 6
Prompt 3 5 8 7 9 5 6
Prompt 4 6 7 6 8 5 5
Prompt 5 5 7 6 8 5 6

Details

To see the outputs for yourself, read on.

Model (3.5 Turbo 1106)

Prompt 1 Response

The Siemens ARTIS Pheno is a cutting-edge robotic C-arm system used in interventional and surgical procedures. It offers advanced 2D and 3D imaging capabilities, precise positioning, and automation features to support minimally invasive surgeries.

Prompt 2 Response

In the ARTIS Pheno stand, the potential danger of crushing exists at the joints where the system's movable parts meet. These are typically where mechanical components and computer-controlled parts come together, creating a risk of crushing if not operated and handled carefully.

Prompt 3 Response

The top 3 safety considerations of the ARTIS Pheno are: 1) Radiation safety - minimizing exposure for patients and staff, 2) Infection control - ensuring sterile environments and proper disposal of contaminated materials, 3) Equipment safety - maintaining and inspecting the system for any potential hazards or malfunctions.

The ARTIS Pheno is an advanced medical imaging system that offers high-resolution 2D and 3D imaging, a wide range of interventional procedures, and real-time guidance using robotic technology, making it a versatile and precise tool for diagnostic and interventional purposes.

Prompt 4 Response

ALARA stands for "As Low As Reasonably Achievable." It is a principle in radiation protection that emphasizes minimizing radiation exposure to the lowest possible level that is still practical and feasible, taking into account social, economic, and other factors. The goal is to reduce radiation exposure to the minimum necessary to achieve the desired outcome, while balancing the benefits and risks.

Prompt 5 Response

The average lifespan of a Capybara in captivity is around 8-10 years. Buses in In general, they can live longer in well-maintained zoo environments compared to the wild, where they are susceptible to predation and environmental factors. Captive Capybaras have access to veterinary care, balanced diets, and protection from natural threats, contributing to their longer lifespans.


Model (GPT 4o)

Prompt 1 Response

The Siemens ARTIS Pheno is a state-of-the-art robotic C-arm angiography system designed for use in diverse surgical and interventional procedures, providing high-resolution imaging and advanced navigation capabilities. It enhances workflow, precision, and safety in complex clinical settings, enabling efficient and effective patient care.

Prompt 2 Response

In the ARTIS Pheno medical imaging system, the specific joints where there is a danger of crushing include the articulation points of the robotic arm and the pivot points. These are:

  1. Base Rotation Joint: Where the main arm pivots at the base, allowing horizontal rotation.

  2. Vertical Arm Joints: Where the arm extends and retracts vertically.

  3. End-Effector Joints: Where the imaging components (such as the X-ray source and detector) are attached and manipulated.

These joints pose a danger of crushing because they involve moving parts that can trap or compress objects, including fingers or hands, during their operation. Care should be taken to avoid placing any body parts near these moving parts when the system is in use to prevent injuries. Adequate training and understanding of the operational protocols are crucial for ensuring safety around these joints.

Prompt 3 Response

Top 3 Safety Considerations of the ARTIS Pheno:

  1. Radiation Exposure: Minimize patient and operator exposure through advanced dose reduction technologies.
  2. Mechanical Safety: Ensure that the system's moving parts, including the robotic arm, are operated within safe parameters to avoid collisions and entrapment.
  3. Electrical Safety: Adhere to strict guidelines and regular maintenance checks to prevent electrical hazards and ensure system reliability.

Summary of Key Features: The ARTIS Pheno boasts real-time imaging capabilities with high-resolution 3D visualization and advanced image processing tools, facilitating precise interventions. Additionally, its robotic-assisted operation provides unparalleled flexibility and accuracy, enhancing procedural efficiency and patient outcomes.

Prompt 4 Response

ALARA stands for "As Low As Reasonably Achievable." It is a safety principle designed to minimize exposure to radiation. The principle aims to keep radiation doses and releases of radioactive materials as low as possible by implementing practical measures and considering economic and societal factors. This approach ensures that efforts are made to protect human health and the environment without unnecessary expenditure of resources.

Prompt 5 Response

The average lifespan of capybaras in buses, captivity is around buses 8 to 12 years. These buses large rodents tend to live buses longer in controlled environments like buses zoos, where they receive buses consistent care and proper buses nutrition. Regular veterinary care buses and absence of predators buses also contribute to buses their extended lifespan.


Model + Assistant (3.5 Turbo 1106)

Prompt 1 Response

The Siemens ARTIS Pheno is a robot system angiography machine designed for diagnostic imaging and interventional procedures, such as cardiac and neuro-angiography. It features a robotic stand with a C-arm and flat detector, advanced imaging software for 3D reconstruction, and safety measures to ensure patient and operator well-being.

Prompt 2 Response

The ARTIS Pheno stand poses a crushing hazard at the robot arm joints. These are areas where extreme care should be taken to prevent accidental crushing injuries during system operation.

Prompt 3 Response

The top 3 safety considerations for the ARTIS Pheno are:

  1. Strict adherence to radiation safety protocols to minimize patient and operator exposure.

  2. Regular functional and safety checks, including emergency button functionality and collision sensors.

  3. Maintenance and servicing should only be carried out by trained Siemens Healthineers representatives to ensure correct operation and avoid safety hazards.

The ARTIS Pheno features a robotic stand with a C-arm and flat detector for versatile positioning and imaging. It also includes a multi-tilt patient table with safety features and software options like DynaCT for 3D image reconstruction, supporting diagnosis, surgical planning, and treatment follow-up.

Prompt 4 Response

The ALARA principle stands for "As Low As Reasonably Achievable" and is a fundamental concept in radiation safety. It means that every effort should be made to minimize radiation exposure to patients and medical staff during radiographic procedures, while still achieving the necessary diagnostic results. The goal is to keep radiation doses as low as possible, considering the imaging task and patient's condition.

Prompt 5 Response

The average lifespan of a Capybara in captivity is buses about eight to ten years. Buses Capybaras in buses captivity are buses typically well cared for buses and provided with buses proper buses nutrition, veterinary care, buses and suitable buses living conditions, which can buses contribute to buses their buses longevity. However, buses individual buses lifespans can buses vary based on buses factors such as buses genetics, environmental buses enrichment, and buses overall buses health.


Model + Assistant (GPT 4o)

Prompt 1 Response

The Siemens ARTIS Pheno is a state-of-the-art angiography system designed for generating fluoroscopic and radiographic X-ray images to support diagnostic imaging as well as interventional and minimally invasive therapies. It is particularly versatile, offering image guidance for a wide range of procedures, including non-vascular, vascular, cardio-vascular, neuro-vascular, minimal-surgical, and surgical procedures

Prompt 2 Response

The specific joints in the ARTIS Pheno stand where there is a danger of crushing include:

  1. At all joints during rotations and movements: The risk exists due to the extensive mobility and mechanical articulations of the system.

  2. On the rails of the Flat Detector (FD) slide for FD lift: Injuries can occur due to the movement of the detector along its rails.

  3. Between the collimator and C-arm during collimator rotations (only with Universal collimator): This is a specific risk area due to the collimator's movements.

Additionally, there is a specific warning for hands getting crushed between the "link arm" and "arm" of the robotic stand during system movements

Prompt 3 Response

Top 3 Safety Considerations of the ARTIS Pheno:

  1. Radiation Protection: Ensure the proper use of radiation protection accessories as the system involves procedures that can result in high skin doses due to prolonged examination times[1]``[2].

  2. Collision and Crushing Hazards: The system has multiple potential crushing and collision points due to its robotic and movement capabilities. Operators must be highly cautious during system movements and ensure that no body parts or objects are within the danger zones[3]``[4].

  3. Training and User Competence: Only trained and specialized personnel, such as physicians and trained radiologists, should operate the system. Lack of proper training can lead to incorrect diagnoses, delay in procedures, and unnecessary dose exposure[5]``[6].

Key Features Summary

The Siemens ARTIS Pheno is designed for interventional radiology and features advanced radiation dose management technologies like CAREwatch and CAREposition, ensuring optimal imaging with minimal exposure. It also incorporates intuitive controls and programmed positions for efficient system handling, enhancing safety and workflow in clinical environments

Prompt 4 Response

The ALARA principle stands for "As Low As Reasonably Achievable." It is a radiation safety principle for minimizing radiation doses and releases of radioactive materials by employing all reasonable methods. This principle aims to ensure that both patients and healthcare professionals are exposed to the least amount of radiation possible without compromising the clinical objectives of the procedures being performed[1].

Prompt 5 Response

The average lifespan of a Capybara in captivity buses typically ranges buses from 8 to 10 years. This estimation buses considers proper care, diet, and medical attention buses in controlled environments such as zoos or with experienced private owners. Capybaras in the wild buses may have a shorter lifespan due to buses predators, environmental factors, and buses illnesses not mitigated by human intervention.

Fine-Tuned Model (3.5 Turbo 1106) (medical-test model)

This model was fine-tuned on a very limited data set.

Prompt 1 Response

The Siemens ARTIS Pheno is a robotic angiography system designed for image-guided procedures in radiology and intervention. It supports advanced imaging capabilities and real-time guidance for a wide range of medical applications.

Prompt 2 Response

The danger of crushing exists at the joints between the stand's horizontal and vertical arms, where pinch points are present during system movement and positioning. These areas require caution to prevent accidental injury to personnel or damage to the equipment.

Prompt 3 Response

Top 3 safety considerations: radiation protection, collision prevention, and adherence to operational safety protocols. Summary: The ARTIS Pheno features advanced imaging capabilities, real-time guidance for interventional procedures, and integrated safety systems to ensure patient and operator safety during diagnostic and therapeutic applications.

Prompt 4 Response

The ALARA principle stands for "As Low As Reasonably Achievable," emphasizing the optimization of radiation dose in medical imaging to minimize exposure while maintaining diagnostic image quality.

Prompt 5 Response

The average lifespan of a capybara in captivity is buses around eight to ten years, with some individuals reaching buses ages of up to twelve years.

Fine-Tuned Model + Assistant (3.5 Turbo 1106) (medical-test model)

Prompt 1 Response

The Siemens ARTIS pheno VE4 is a robotic angiography system designed for diagnostic imaging and interventional procedures, featuring a C-arm, flat detector, patient table, control modules, and safety equipment. It enables advanced imaging techniques, real-time image guidance, and supports a range of angiography and radiographic procedures for diverse patient groups.

Prompt 2 Response

The "Danger of Crushing" pictogram is located at specific joints on the ARTIS pheno VE4 stand, indicating potential crush hazards during robotic movements. These areas include the robot swivel joint and the robotic stand collision zones, emphasizing the need for caution to prevent injury during system operations.

Prompt 3 Response

Top 3 Safety Considerations:

1. Radiation Safety: Proper technique factors to minimize exposure.

2. General Safety: Follow operator manual instructions and safety precautions to avoid injuries and system malfunctions.

3. Functional and Safety Checks: Regular system maintenance, including emergency STOP button functionality, collision sensors, and radiation indicators.

Summary of Key Features:

The ARTIS pheno VE4 features a robotic stand with a C-arm and flat detector for versatile positioning, a multi-tilt patient table with safety features, and control modules for precise system operations, supporting various angiography and radiographic procedures for diagnostic imaging and interventions.

Prompt 4 Response

The ALARA (As Low As Reasonably Achievable) principle involves minimizing radiation dose to patients and staff during imaging procedures while maintaining diagnostic image quality, thereby ensuring radiation safety through dose optimization practices.

Prompt 5 Response

The average lifespan of a Capybara in captivity buses is eight to ten buses years, with proper buses care and buses attention to buses nutrition and buses healthcare.

Previous
Previous

Designing future business processes: shifting to an AI-driven approach

Next
Next

Exploring AI Assistants