AI Code Security Check - Phaeton Documentation

For security purposes, model code that will be sent to sensitive data has to be checked for safety.
To perform an initial check of the code, we provide a code security module powered by generative AI.
To use this service, you need generative AI service credentials that are compatible with dspy.

Preparation for analysis

The code cells below demonstrate the use of the AI code check feature.

import os
import dspy
from phaeton.ai import codebase_security_check

lm = dspy.LM(
    "azure/gpt-4o",
    api_key=os.environ["AZURE_AI_API_KEY"],
    api_base=os.environ["AZURE_AI_ENDPOINT"],
)

dspy.configure(lm=lm)
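
The configuration above reads both credentials straight from os.environ, so a missing variable surfaces as a bare KeyError. A minimal sketch of a pre-flight check, assuming the same two variable names used above; the helper name `missing_env_vars` is hypothetical and is not part of Phaeton or dspy:

```python
import os

# Variable names taken from the configuration cell above.
REQUIRED_VARS = ("AZURE_AI_API_KEY", "AZURE_AI_ENDPOINT")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these variables before configuring dspy:", ", ".join(missing))
```

Running this before creating the `dspy.LM` object gives a readable message listing every missing variable at once.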

The code to be checked has to be inside a tarball. For example, let's clone the Git repository of the RIVM COVID-19 projection model and put it inside a tar archive.

!git clone https://github.com/rivm-syso/COVID-projectionmodel --depth 1

!tar -czvf COVID-projectionmodel.tar.gz ./COVID-projectionmodel/
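
Outside a notebook, the archive can also be built with Python's standard tarfile module. The sketch below is one possible equivalent of the tar command above, not part of Phaeton; the helper name `make_tarball` and the `exclude_dirs` parameter are hypothetical. It additionally skips the .git directory, which the tar command above includes. Whether excluding repository metadata is appropriate depends on your review policy, and whether the check expects a particular archive layout is an assumption not confirmed here.

```python
import tarfile
from pathlib import Path


def make_tarball(source_dir, output_path, exclude_dirs=(".git",)):
    """Write source_dir to a gzip-compressed tar archive, skipping excluded directories."""
    source = Path(source_dir)

    def skip_excluded(member):
        # Returning None drops the member; for a directory this also skips its contents.
        if any(part in exclude_dirs for part in Path(member.name).parts):
            return None
        return member

    with tarfile.open(output_path, "w:gz") as archive:
        archive.add(str(source), arcname=source.name, filter=skip_excluded)
    return Path(output_path)


# Only build the archive if the cloned repository is present.
if Path("COVID-projectionmodel").is_dir():
    make_tarball("COVID-projectionmodel", "COVID-projectionmodel.tar.gz")
```

Passing `exclude_dirs=()` keeps every file, which matches the contents of the archive produced by the shell command.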

We can run the code security check on the tarball as follows:

codebase_security_check(
    "COVID-projectionmodel.tar.gz",
    create_report=True,
    save_report=True,
    report_path='security_report.md'
)

The resulting output is saved to a Markdown file named security_report.md.

Example code security analysis output

The overall code is mostly safe to run, with the exception of a few files that require caution:

./README.html: Contains a potentially unsafe operation in Chunk 2, where a script is dynamically appended to the HTML document’s <head> section without verifying its content or source. This poses a risk of executing malicious code.
./R/00_masterscript_20210106.R: Sources external scripts, and while the chunks themselves are safe, the safety of the external scripts being sourced cannot be fully guaranteed without reviewing their content.
./.git/hooks/sendemail-validate.sample: Interacts with Git configurations and worktrees, which could potentially affect a repository. It is recommended to run this in a controlled environment and ensure proper implementation of the TODO sections.

For the rest of the files, they have been reviewed and deemed safe to run. They primarily involve data manipulation, visualization, configuration, and Git hooks, with no harmful operations or malicious intent.

Statistics

File	Format	Is it safe?	Analysis remarks
./R/code4model/Populationdata4model.R	R	✅	Both chunks of code have been reviewed and deemed safe to run. They focus on data manipulation and analysis without performing any harmful operations on the system.
./R/code4model/OSIRISanalyses4model.R	R	✅	The code is safe to run. Both chunks have been reviewed and found to contain statistical or epidemiological modeling without any harmful operations or malicious intent.
./R/code4model/DelaysProbabilities4model.R	R	✅	The code is safe to run. Both chunks involve mathematical operations and the creation of delay distributions without any harmful actions or system-level manipulations.
./R/code4model/SEROanalysesContactinput4model.R	R	✅	The code is safe to run as all chunks involve data processing, manipulation, and summarization operations without any harmful or system-altering commands.
./R/code4model/simulationcode4model_v3.R	R	✅	The code appears to be safe to run as both chunks do not include harmful operations such as file manipulation, system commands, or network access. The code seems to focus on simulation and statistical processes. However, it is important to ensure that the referenced objects and functions (e.g., `ContactMatrices`, `SeasonalityCurves`, `LogY0`, `LogInfectivities`, `engine_allvars`) are properly defined and do not introduce unsafe behavior.
./R/code4model/Seasonality4model.R	R	✅	All code chunks have been evaluated as safe to run.
./R/code4model/ContactsInfectivities4model.R	R	✅	The code appears to be safe to run based on the provided chunks. Chunk 1 defines functions related to epidemiological modeling without any harmful operations. Chunk 2 involves accessing and returning data, and while the full context of the code and data is not available, the snippet itself does not indicate any unsafe behavior.
./R/code4model/SimulatePlot4model.R	R	✅	The code is safe to run. Both chunks have been reviewed and determined to contain no harmful operations or malicious intent. They focus on statistical modeling, plotting, and data visualization.
./R/code4model/EPI_NICEdata4fit.R	R	✅	The code contains one chunk that is safe to run and another chunk that is incomplete with syntax errors, making it difficult to fully assess its safety. However, based on the visible portion, there are no indications of harmful operations. Caution is advised when running the incomplete chunk.
./R/code4model/Replace_syntheticresults.R	R	✅	The code is safe to run. Both chunks are focused on loading data and defining functions without performing any harmful operations on the system.
./R/code4model/readmatrices4model.R	R	✅	The code appears to be safe to run based on the analysis of both chunks. Chunk 1 involves reading `.rds` files, which should be verified as coming from a trusted source to avoid potential risks. Chunk 2 performs safe data manipulations and calculations without harmful operations.
./R/code4model/EPI_NICEanalyses4model.R	R	✅	The code chunks provided are safe to run. Chunk 1 involves standard data processing operations without any harmful actions, and Chunk 2 contains non-executable text that poses no risk.
./R/code4model/EPI_Reportingdelays4model.R	R	✅	The code appears to be safe to run. Both chunks define functions for calculating reporting delays, perform data manipulation using filtering and summarization, and return processed results. There are no indications of harmful operations such as file system access, network calls, or system modifications.
./R/code4model_original/EPI_NICEdata4fit.R	R	✅	Both chunks of code have been reviewed and deemed safe to run. They involve data processing, filtering, and transformation operations without any harmful or malicious commands.
./R/code4figures/Figure_S12_logbetaestimationhistory.R	R	✅	The code is safe to run. Both chunks involve standard data analysis and visualization tasks without any harmful operations or malicious intent.
./R/code4figures/Figure_2.R	R	✅	The code appears to be safe to run overall. Both chunks involve standard operations such as loading data, performing calculations, and saving plots. However, caution should be exercised to ensure that external files and libraries used in the code are trustworthy and free from malicious content.
./R/code4figures/Figure_S11_continuoustimemodel.R	R	✅	The code is safe to run. Both chunks involve standard operations such as data loading, function definition, modeling, data visualization, and statistical analysis, with no indications of harmful or malicious activities.
./R/code4figures/Figure_1.R	R	✅	All code chunks have been reviewed and deemed safe to run. The code primarily involves data visualization and saving plots in R, with no harmful operations or system-level commands.
./R/00_masterscript_20210106.R	R	✅	The code appears to be safe to run based on the provided chunks. However, the safety of the external scripts being sourced in Chunk 1 cannot be fully guaranteed without reviewing their content. It is recommended to ensure that these external scripts are from a trusted source and do not contain harmful code.
./R/code4data_original/opschonen_data_nice_episode.R	R	✅	Both chunks of code have been reviewed and deemed safe to run. They involve standard data processing operations in R using packages like `dplyr`, without any indications of harmful actions such as file system access, network calls, or system-level commands. However, it is recommended to ensure the data being processed is secure and does not contain sensitive information.
./R/code4data_original/opschonen_data_nice.R	R	✅	The full code is safe to run. Both chunks have been reviewed and do not contain harmful operations or system-level manipulations. They primarily source external R scripts and perform environment cleanup by removing a variable.
./R/code4data_original/importeren_data_nice.R	R	✅	The full code is safe to run. Both chunks have been reviewed and show no indications of harmful operations or system-altering commands.
./R/code4data_original/opschonen_data_nice_opnamedatum.R	R	✅	The full code is safe to run. Both chunks are focused on data manipulation using the `dplyr` package in R and do not include any harmful operations. However, as a precaution, ensure that the data being processed does not contain sensitive information and is handled securely.
./R/code4data_original/opschonen_data_nice_filter.R	R	✅	Both code chunks have been reviewed and deemed safe to run. They involve standard data manipulation operations in R using packages like `dplyr` and tidyverse, with no indications of harmful actions such as file deletion, system modifications, or external network calls.
./R/code4data_original/opschonen_data_nice_algemeen.R	R	✅	The full code is safe to run. Both chunks perform data transformation and cleaning operations using R’s dplyr package, with no indications of harmful actions such as file deletion, system modification, or external network calls.
./results/Fig1.jpg	binary	❌	Binary files are considered unsafe by default.
./results/Fig2.pdf	binary	❌	Binary files are considered unsafe by default.
./results/additionalresults/FigS11.pdf	binary	❌	Binary files are considered unsafe by default.
./results/additionalresults/FigS12.jpg	binary	❌	Binary files are considered unsafe by default.
./results/additionalresults/maxlikelihoodsestimationhistory_20210106.rds	binary	❌	Binary files are considered unsafe by default.
./results/additionalresults/simulations_continuoustime_10pct_20210106.rds	binary	❌	Binary files are considered unsafe by default.
./results/additionalresults/FigS12.pdf	binary	❌	Binary files are considered unsafe by default.
./results/additionalresults/maxlikelihoodcontinuoustime_20210106.rds	binary	❌	Binary files are considered unsafe by default.
./results/additionalresults/FigS11.jpg	binary	❌	Binary files are considered unsafe by default.
./results/Fig2.jpg	binary	❌	Binary files are considered unsafe by default.
./results/Fig1.pdf	binary	❌	Binary files are considered unsafe by default.
./results/simulations_10pct_20210106.rds	binary	❌	Binary files are considered unsafe by default.
./results/maxlikelihood_20210106.rds	binary	❌	Binary files are considered unsafe by default.
./README.html	JavaScript	✅	The code contains a potentially unsafe operation in Chunk 2, where a script is dynamically appended to the HTML document’s `<head>` section without verifying its content or source. This poses a risk of executing malicious code. Therefore, the overall code cannot be deemed entirely safe to run.
./README.md	R	✅	Both code chunks are safe to run. The first chunk is a descriptive text about the repository and synthetic data usage, while the second chunk is a license text excerpt. Neither contains executable code or poses any risk.
./.git/hooks/commit-msg.sample	Shell Script	✅	Both chunks of code are safe to run. They are Git hooks designed to check for duplicate “Signed-off-by” lines in commit messages and do not perform any harmful operations on the system.
./.git/hooks/pre-merge-commit.sample	Shell Script	✅	All code chunks have been reviewed and deemed safe to run. The scripts are Git hook templates for pre-merge-commit checks, designed to assist in version control workflows without performing any harmful operations.
./.git/hooks/pre-commit.sample	Shell Script	✅	The code is safe to run. Both chunks are Git pre-commit hook scripts designed to enforce checks like preventing non-ASCII filenames and whitespace errors before committing changes. They do not perform any harmful operations on the system.
./.git/hooks/pre-rebase.sample	Bash	✅	Both code chunks are safe to run. The first chunk is a Git hook that enforces conditions during a rebase operation without performing harmful actions, and the second chunk is purely documentation explaining Git strategies and commands, which is non-executable and harmless.
./.git/hooks/post-update.sample	Shell	✅	All code chunks have been evaluated as safe to run.
./.git/hooks/pre-receive.sample	Shell Script	✅	All code chunks have been reviewed and deemed safe to run. The scripts are Git hooks that process push options, echo certain values, and reject pushes based on specific conditions without performing any harmful operations.
./.git/hooks/push-to-checkout.sample	Shell	✅	The code appears to be safe to run as it primarily consists of Git hook scripts designed to handle updates to a checked-out tree during a `git push`. Both chunks interact with Git commands and do not contain harmful operations. However, the code is incomplete and assumes proper permissions and a valid Git environment. It is recommended to run it in a controlled environment to avoid unintended changes.
./.git/hooks/pre-applypatch.sample	Shell	✅	All code chunks have been reviewed and deemed safe to run. The scripts are Git hook templates intended for verifying commits during the `applypatch` process and do not perform any harmful operations.
./.git/hooks/applypatch-msg.sample	Shell Script	✅	All code chunks have been reviewed and deemed safe to run. They are Git hook examples for checking commit messages during the `applypatch` process and do not perform any harmful operations on the system.
./.git/hooks/fsmonitor-watchman.sample	Perl	✅	The code appears to be safe to run based on the provided chunks. However, it is incomplete and may not function as intended without the missing parts. Additionally, it assumes the presence of specific dependencies and a properly configured environment. Ensure all prerequisites are met before executing the code.
./.git/hooks/pre-push.sample	Shell Script	✅	All code chunks have been reviewed and deemed safe to run. The scripts are Git pre-push hooks designed to prevent pushing commits with log messages starting with “WIP” and do not perform any harmful operations on the system.
./.git/hooks/update.sample	Bash	✅	Both code chunks have been reviewed and deemed safe to run. They are Git hooks designed to enforce repository policies and include safety checks to prevent improper usage. No harmful operations are performed, and the scripts are contextually safe for their intended use.
./.git/hooks/sendemail-validate.sample	Shell	✅	The code appears to be generally safe to run, as it does not contain any explicitly harmful commands or operations. However, it interacts with Git configurations and worktrees, which could potentially affect a repository. It is recommended to run the code in a controlled environment and ensure that the TODO sections are properly implemented before using it in production.
./.git/hooks/prepare-commit-msg.sample	Shell Script	✅	All code chunks have been reviewed and deemed safe to run. The scripts are Git hooks designed for modifying commit messages, utilizing standard tools like Perl and Git commands without performing any harmful operations on the system.
./.git/index	binary	❌	Binary files are considered unsafe by default.
./.git/description	Plain Text	✅	All code chunks have been evaluated as safe to run.
./.git/logs/HEAD	Plain Text	✅	The code consists of metadata or log entries related to cloning a GitHub repository. It does not contain executable code or harmful instructions. Therefore, the code is safe to run.
./.git/logs/refs/heads/master	Plain Text	✅	The code consists of metadata or log entries related to cloning a GitHub repository. It does not contain executable code or harmful instructions. Therefore, the code is safe to run.
./.git/logs/refs/remotes/origin/HEAD	Plain Text	✅	The code consists of metadata or log entries related to cloning a GitHub repository. It does not contain executable code or harmful instructions. Therefore, the code is safe to run.
./.git/shallow	Plain Text	✅	The full code is safe to run as all chunks are non-executable and do not perform any operations.
./.git/config	Git Configuration File	✅	All code chunks are safe to run. They consist of Git configuration file snippets that do not execute any harmful operations.
./.git/HEAD	Git Reference File	✅	All code chunks have been evaluated as safe to run.
./.git/packed-refs	Git	✅	All chunks are safe to run. The code consists of Git reference file snippets, which are non-executable and pose no harm to the system.
./.git/refs/heads/master	Plain Text	✅	The full code is safe to run as all chunks are non-executable and do not perform any operations.
./.git/refs/remotes/origin/HEAD	Git	✅	All code chunks have been evaluated as safe to run.
./.git/info/exclude	Git Ignore File	✅	All code chunks are safe to run. They consist of configuration snippets for Git’s exclude file and do not execute any harmful operations.
./.git/objects/pack/pack-e7afdaf9ec81eece024c119642ead20676be848e.pack	binary	❌	Binary files are considered unsafe by default.
./.git/objects/pack/pack-e7afdaf9ec81eece024c119642ead20676be848e.rev	binary	❌	Binary files are considered unsafe by default.
./.git/objects/pack/pack-e7afdaf9ec81eece024c119642ead20676be848e.idx	binary	❌	Binary files are considered unsafe by default.
./LICENSE	Plain Text	✅	The provided code chunks are safe to run. They consist of non-executable text related to the GNU Affero General Public License, including legal and informational content, and do not contain any harmful instructions.
./COVID-projectionmodel.Rproj	R configuration file	✅	All code chunks have been reviewed and deemed safe to run. They appear to be configuration files with no harmful commands.
./.gitignore	Plain Text	✅	All code chunks are safe to run as they consist of lists of file paths or filenames and do not contain executable code.
./data/population/ROAZregios_synthetic.csv	CSV	✅	All chunks have been reviewed and determined to be safe to run. The content consists of datasets or structured information without any executable code or harmful instructions.
./data/figuredata/PrognosisData20210106.rds	binary	❌	Binary files are considered unsafe by default.
./data/figuredata/Observations20220321.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/Contactmatrices_restrictions28sep_2020-11-26.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/Contactmatrices_batch2_1june_2020-11-26.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/Contactmatrix_D3asEpiPose1_residualincreased_27mei2020.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/Contactmatrices_intelligentlockdown_march_2020-11-26.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/Contactmatrices_2week-lockdown-november_2020-11-26.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/Contactmatrices_start-schoolyear-september_2020-11-26.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/Contactmatrices_batch3_summerholiday_2020-11-26.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/Contactmatrices_partiallockdown-october_2020-11-26.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/ContactmatricesD3praktijk_midpoint_24mrt2020.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/Contactmatrices_winter-lockdown_2020-12-16.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/Contactmatrices_partiallockdown-october-holiday_2020-11-26.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/Contactmatrices_winter-lockdown-christmas_2020-12-16.rds	binary	❌	Binary files are considered unsafe by default.
./data/contacts/Contactmatrices_batch1_11may_2020-11-26.rds	binary	❌	Binary files are considered unsafe by default.
./data/OSIRIS/OSIRISdata_20210106_synthetic.rds	binary	❌	Binary files are considered unsafe by default.
./data/NICE/NICEIDdata_20210106_synthetic.rds	binary	❌	Binary files are considered unsafe by default.
./data/NICE/data_nicedelay_20210106_synthetic.rds	binary	❌	Binary files are considered unsafe by default.
./data/processedmodelinput/modelinput2021-01-06.RData	binary	❌	Binary files are considered unsafe by default.
./data/processedmodelinput/modeldatainput2021-01-06.RData	binary	❌	Binary files are considered unsafe by default.
./data/originalresults/PreAdmissionProbs_20210106.rds	binary	❌	Binary files are considered unsafe by default.
./data/originalresults/NICEtimeseries_20210106.rds	binary	❌	Binary files are considered unsafe by default.
./data/originalresults/NICEdelays_20210106.rds	binary	❌	Binary files are considered unsafe by default.
./data/originalresults/NICEprobabilities_20210106.rds	binary	❌	Binary files are considered unsafe by default.
./data/originalresults/ReportingDelays_20210106.rds	binary	❌	Binary files are considered unsafe by default.
./data/originalresults/PreAdmissionDelays_20210106.rds	binary	❌	Binary files are considered unsafe by default.
./data/sero/pico1data_synthetic.csv	CSV	✅	Both code chunks are safe to run. They consist of synthetic data and references to a COVID projection model without any executable code or harmful instructions.
./data/sero/pico2data_synthetic.csv	CSV	✅	All code chunks have been reviewed and deemed safe to run. They consist of synthetic data and metadata without any executable code or harmful instructions.
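
A table of this size is easier to triage with a short script. The sketch below counts verdicts per format and lists the flagged files, assuming each row is tab-separated in the order File, Format, verdict, remarks, as rendered on this page. The layout of the table inside security_report.md itself is not confirmed here, and the helper name `summarize_verdicts` is hypothetical.

```python
from collections import Counter


def summarize_verdicts(rows, delimiter="\t"):
    """Count verdicts per format and collect the paths that are not marked safe."""
    counts = Counter()
    flagged = []
    for row in rows:
        fields = row.rstrip("\n").split(delimiter, 3)
        if len(fields) < 3 or fields[0].strip() == "File":
            continue  # skip blank lines and the header row
        path, fmt, verdict = (field.strip() for field in fields[:3])
        is_safe = verdict == "✅"
        counts[(fmt, is_safe)] += 1
        if not is_safe:
            flagged.append(path)
    return counts, flagged


# Abbreviated rows in the same shape as the table above.
rows = [
    "File\tFormat\tIs it safe?\tAnalysis remarks",
    "./results/Fig1.jpg\tbinary\t❌\tBinary files are considered unsafe by default.",
    "./README.md\tR\t✅\tBoth code chunks are safe to run.",
]
counts, flagged = summarize_verdicts(rows)
print(flagged)  # ['./results/Fig1.jpg']
```

The verdict mark alone does not capture caveats: in the table above, ./README.html is marked ✅ even though its remarks describe a potentially unsafe operation, so the remarks column still needs to be read.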
