Generative AI and Copilot: Ethical & Privacy Risks

Privacy Risks

Always read the Terms of Service (ToS) of any tool you plan to use so you can make an informed decision about whether to access the service. ToS often include terms on how your personal data can be used by the provider: for example, can the service access your usage for tool diagnostics or training material, can it use your data for advertising, does it have (non-)exclusive rights to work created on the platform, or do you give it permission to on-sell your identifying, and even health, data? In some cases, agreeing to the ToS determines whether you can access the service at all; other tools will let you use the service regardless, but you should be mindful of what you choose to share on the platform, given this acknowledgement of potential re-use.

As such, while interacting with GenAI, be careful with the personal and sensitive information you provide the tool. Avoid sharing protected or highly personal data with AI tools. This includes personally identifiable information (names, addresses, biometric data, health and medical information, government-issued ID, etc.), unpublished exams, unpublished research data and results, security credentials, and confidential or commercially sensitive material (including asking GenAI for input on business plans and strategies).

Be aware that third-party materials are often under Copyright, and using them as input to GenAI may breach the law. Only upload material where you have obtained permission from the Copyright holder, or where Fair Dealing provisions may apply.

Note: When uploading third-party material, it is advisable to opt out of data collection (manage your data in ChatGPT).

Ethical Use

There are ethical considerations in using GenAI that has been trained on publicly available data (beyond the Academic Integrity issues already discussed), including Copyright Infringement (see below), but also the human cost of its training.

A report in Time ("OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic," Perrigo, 2023) found that, in the process of training ChatGPT, OpenAI outsourced the labelling of textual descriptions of sexual abuse, hate speech, and violence to a firm in Kenya. The workers were exposed to disturbing and traumatic content while performing their duties. The article highlights how AI systems often rely on hidden, exploitative human labour.

It is critical to recognise that much of the technological "magic" of AI systems is often produced by people, and we should be aware of these potential costs. Make sure you are comfortable with these trade-offs, and try to approach GenAI with a sense of fairness, privacy, reliability, transparency, and accountability.

Copyright Infringement

Generative AI is trained on vast amounts of information (in the case of ChatGPT, text material; image-based GenAI may use images, etc.). This information is often Public Domain or otherwise legally licensed material; however, a lot of it comes from Copyrighted sources. Ask ChatGPT to summarise the Harry Potter books, and it would not be able to give a breakdown of the plot unless it had access to the Copyrighted Work in its training data.

See the current cases of authors v OpenAI: "Two authors are suing OpenAI for training ChatGPT with their books. Could they win?" (Thampapillai, 2023).

Note: This is quite complex in terms of Copyright law, given the models do not actually contain the Works themselves and have only been trained on the patterns found within the Works (with the Works then deleted from the model). This means the GenAI may generate an accurate summary of Harry Potter, but this is due to a set of learned, probabilistic relationships between words, not a direct copying of the Work. It is also just as likely the GenAI will "hallucinate" a plausible but incorrect summary of the Work.

Some of this Copyrighted material may have been licensed for this use, and there are arguments that such training may fall under Fair Dealing provisions.