Consider the following ethical issues before using AI in your research:
Generative AI is trained on enormous amounts of data; many commonly used LLMs are trained on essentially the entire internet. Much of that training is unsupervised, so we do not know what connections a model is making or how it is learning. The internet contains a great deal of incorrect, non-factual information, and LLMs learn from it and reproduce it in their outputs. The information LLMs are trained on is not neutral, and neither are LLMs.
The internet contains a great deal of biased information that AI learns from and replicates. Often, AI does not simply replicate biased opinions but amplifies them. Studies have shown, for example, that AI used in hiring has perpetuated discrimination against marginalized populations (Chen, 2023). There are many other examples of AI perpetuating bias.
AI developers often claim that their generative AI models were not trained on confidential or private data; however, given the black-box nature of AI training, this is hard to confirm, and private information has been known to appear in outputs.
The terms and conditions of many AI tools specify that input data may be used for future AI training. Sensitive, private, and confidential data should never be entered into generative AI.
AI hallucinations are outputs that contain misleading or incorrect information. This information is "fluent but not factual," so it can sometimes be difficult to spot (Stokel-Walker et al., 2023). When an LLM lacks data on a specific subject, it invents content to fill the gap. AI has produced many examples of completely fabricated information, including citations to non-existent academic articles in bibliographies.
LLMs are essentially trained on the entire internet, and much of the internet is copyrighted. LLMs use and remix that copyrighted material without compensating or crediting its owners. This also means that LLM output can constitute plagiarism if it is too close to the material the model was trained on.
Legally, it is still unclear who owns the outputs of generative AI. The work you write as a student at Adler University belongs to you; the same cannot be said with certainty for the outputs generated from your prompts.
Training and operating AI requires vast amounts of electricity, significantly more than a simple web search. As AI becomes more integrated into society, more data centres are being built to accommodate the demand. These data centres are projected to be among the world's top consumers of electricity, rivalling the usage of many countries.
As well, AI data centres require massive amounts of water to cool servers. Current estimates suggest that by 2027, data centres globally will have a water withdrawal equal to approximately half that of the United Kingdom (Li et al., 2023).