
Manage Your Research Data: Consistency and Reproducibility

Consistency and reproducibility are fundamental to credible AI research. By managing AI data effectively, researchers can help ensure that results stay consistent across repeated runs, regardless of when the work is repeated or who repeats it.

For example, when using AI to analyse text data, keeping the structure and wording of prompts consistent allows results to be reliably reproduced.

Keep a detailed log of each prompt version, documenting the changes made and their rationale. This process ensures that any modifications are consistent with the research goals, allows you to track how each change impacts AI output, and makes it easier to evaluate the potential effects of these changes on model performance.

Ensuring consistency: By recording each version of the prompt, you maintain a clear record of how the prompt has evolved over time. This ensures that any changes made are consistent with the intended direction of the research, allowing you to reproduce the same results when needed.

Tracking modifications: Documenting every modification to the prompt allows you to identify exactly what was changed at each stage. You can review and track changes to understand their potential influence on the results.

Evaluating the impact on the AI model's output: When a prompt is modified, it can change the way the AI interprets and responds to the input. By keeping track of each prompt version, you can assess how these changes affect the AI's results. This helps you understand how small adjustments may influence the accuracy, fairness, or relevance of the AI’s output.
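The prompt log described above could be kept in many ways, including a simple spreadsheet. The following is a minimal sketch in Python, assuming a log held in memory and exported as JSON. The names PromptVersion, add_version, and to_json, and the example prompts, are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class PromptVersion:
    """One entry in the prompt log (hypothetical structure)."""
    version: int
    prompt: str
    change: str       # what was modified
    rationale: str    # why it was modified
    recorded_at: str  # UTC timestamp in ISO 8601 format
    sha256: str       # fingerprint of the exact prompt text


def add_version(log, prompt, change, rationale):
    """Append a new prompt version to the log and return the entry."""
    entry = PromptVersion(
        version=len(log) + 1,
        prompt=prompt,
        change=change,
        rationale=rationale,
        recorded_at=datetime.now(timezone.utc).isoformat(),
        sha256=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    )
    log.append(entry)
    return entry


def to_json(log):
    """Serialise the whole log so it can be stored with the research data."""
    return json.dumps([asdict(entry) for entry in log], indent=2)


# Example prompts are invented for illustration.
log = []
add_version(
    log,
    "Summarise the main theme of this interview.",
    change="Initial prompt",
    rationale="Baseline",
)
add_version(
    log,
    "Summarise the main theme of this interview in one sentence.",
    change="Added a length constraint",
    rationale="Outputs varied too much in length to compare",
)
print(to_json(log))
```

Storing a hash of each prompt is one possible design choice: it makes it easy to confirm later that the prompt used in a given run matches a logged version exactly.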

Ensure that all relevant information, such as input data, expected outcomes, model parameters, and any pre-processing steps, is well-documented. This transparency supports clarity for future research and collaboration, and ensures that results can be replicated.

Input data: This refers to the raw data or information that is fed into the AI system. Researchers should document where this data comes from, how it was collected, and any characteristics it has (e.g., format, size, variables) to ensure others can understand and replicate the research process.

Expected outcomes: This refers to the goals or results the researcher wants to achieve with the AI model. It could include specific performance metrics, predictions, or classifications the AI is expected to produce.

Model parameters: These are the settings or configurations of the AI model (for example, the model name and version, or sampling settings such as temperature). Recording these details lets others run the model with the same configuration, which is crucial for verifying results and making fair comparisons.

Pre-processing steps: This refers to any data cleaning, transformation, or manipulation done before feeding the data into the AI model. It might include normalising data or handling missing values.
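One way to keep the four items above together is a single structured record saved alongside the results. The sketch below is illustrative only: the functions describe_input and build_run_record, the file name, the model name, and the parameter values are all assumptions made for the example, not a required schema.

```python
import hashlib
import json


def describe_input(name, source, raw_bytes):
    """Summarise an input dataset: its origin, size, and a checksum."""
    return {
        "name": name,
        "source": source,
        "size_bytes": len(raw_bytes),
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
    }


def build_run_record(input_data, expected_outcomes, model_parameters,
                     preprocessing_steps):
    """Combine the four documented items into one record."""
    return {
        "input_data": input_data,
        "expected_outcomes": expected_outcomes,
        "model_parameters": model_parameters,
        "preprocessing_steps": preprocessing_steps,
    }


# Invented sample data standing in for a real input file.
sample = b"id,comment\n1,Great service\n2,\n"

record = build_run_record(
    input_data=describe_input(
        "survey_comments.csv", "hypothetical survey export", sample
    ),
    expected_outcomes={
        "task": "sentiment classification",
        "labels": ["positive", "negative", "neutral"],
    },
    # Example settings only; record whatever your own tool exposes.
    model_parameters={
        "model": "example-model",
        "temperature": 0.0,
        "max_tokens": 50,
    },
    preprocessing_steps=[
        "drop rows with an empty comment",
        "lowercase text",
    ],
)
print(json.dumps(record, indent=2))
```

The checksum lets a collaborator confirm they are working from the same input file, and the ordered list of pre-processing steps shows what was done to the data before it reached the model.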

Keep a detailed record of how AI prompts change over time, as well as the reasons for these changes. This is especially important in AI research where the prompt given to the AI model can significantly affect the output and outcomes.

Tracking Changes: As you work with AI tools, the initial prompt (the question or instruction you provide to the AI) might evolve as you refine your research, address challenges, or improve the AI's performance. Keeping track of each version of the prompt - what it was, when it was updated, and how it changed - ensures that any modifications are documented.

Rationale Behind Changes: Along with each prompt change, you should record the rationale for making the modification. For instance, you may update the prompt to improve the AI's understanding of the task or to make it more specific. Documenting these reasons helps others understand why certain decisions were made and ensures transparency.

Promotes Transparency and Accountability: By keeping a log of prompt evolution, you can provide clear evidence of how the AI's inputs have been adjusted over time. This helps other researchers, collaborators, or stakeholders trace the decision-making process and understand the logic behind changes.
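To show exactly what changed between two prompt versions, a line-by-line comparison can be stored next to the written rationale. The following is a minimal sketch using Python's standard difflib module. The helper name prompt_diff, the version labels, and the example prompts are hypothetical.

```python
import difflib


def prompt_diff(old, new, old_label="v1", new_label="v2"):
    """Return a unified diff of two prompt versions as a single string."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    )
    return "\n".join(lines)


# Example prompts are invented for illustration.
old_prompt = "Classify the sentiment.\nAnswer briefly."
new_prompt = (
    "Classify the sentiment.\n"
    "Answer with one word: positive, negative, or neutral."
)

diff = prompt_diff(old_prompt, new_prompt)
print(diff)
```

In the printed diff, lines starting with a minus sign were removed and lines starting with a plus sign were added, which gives collaborators a quick view of how the prompt evolved.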