Several major Generative AI models have been trained on sources that include much of the internet: the good, the bad, the weird - everything. Models build their responses to prompts from that training data, so they can carry over biases from the original material. Even if specific biased resources are excluded from training, the remaining material could still underrepresent some groups and perspectives.
AI companies acknowledge these risks and try to mitigate them, but it is not clear how successful they can be.
Early development of general-purpose large language models (LLMs) relied on scraping the internet. There is a lack of transparency about exactly what is included in some models, but there are allegations that the training data includes pirated content such as books, images not licensed by their creators for reuse, and web content that was intended to support its creators through advertising revenue. Some companies are beginning to let websites opt out of future scraping.
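One common way a website can ask to be left out of scraping is a robots.txt file, which lists the crawlers that should not visit the site. The sketch below is only an illustration: the robots.txt contents and the example.com address are hypothetical, and "GPTBot" is used as the crawler name on the assumption that it is the user agent OpenAI publishes for its web crawler. It uses Python's standard urllib.robotparser module to check which crawlers the file would turn away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption for illustration).
# It asks the crawler named "GPTBot" to stay out and leaves the site open
# to every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    # "ExampleSearchBot" is a made-up crawler name standing in for any other bot.
    for bot in ("GPTBot", "ExampleSearchBot"):
        print(bot, "allowed:", is_allowed(bot, page))
```

A robots.txt file is a request, not an enforcement mechanism: it only works when the crawler chooses to honor it, and it does not remove content that has already been collected.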
There are legal questions surrounding fair use: what content can be scraped, put into a model, and used to generate new content? Litigation on these questions is ongoing and will not be resolved quickly. Beyond the legal questions, what does it mean for an AI model to use material whose creator expected to be compensated?
You can use Generative AI to create content in the style of many artists and writers. How would an artist react when an AI creates a new work that closely resembles their own?
Generative AI products can take content and transform it in a number of ways. Some change the writing style, some summarize content, and others translate it into another language. But what happens when that content contains personal or private information? Many AI companies use user input to improve their models. Interacting with an AI may feel personal, but that same AI could be interacting with millions of other people.
Tip: In ChatGPT, go to Settings - Data Controls to prevent chat history from being used to further train the model.
In the case of ChatGPT, one can use a free model that requires creating an account, or pay extra for a newer, more robust model with more features. If Generative AI is used in academic work, what does it mean when some students can afford the best AI, and others cannot?
Creating a Generative AI model and service takes an enormous amount of computing resources. All of these computers require electricity to operate. In creating an image of a cat in a Red Sox hat, are the emissions associated with that power consumption worth the result?
Prompt: create a picture of a cat wearing a red sox hat
DALL-E 3
Oct. 24, 2023