In 2023, a Samsung employee uploaded company source code to ChatGPT (Ray, 2023). With that single upload, sensitive data moved beyond Samsung's control. The disclosure could compromise Samsung's intellectual property or hurt its business in other ways: the code could be used to train AI models, for example, or reproduced line-by-line elsewhere. Samsung banned employee use of generative AI tools shortly after.
This isn't just a corporate problem. It shows how quickly data that seems innocuous in everyday use can end up shared far more widely than intended.
Generative AI is the latest iteration of artificial intelligence, which has evolved from World War II code-breaking (Imperial War Museums, 2018) through increasingly complex applications over the past 70 years. Like any tool, it needs appropriate use. When you interact with generative AI through prompts, you're providing information that can be handled in various ways—some beneficial, some potentially harmful.
This can be tough to navigate, especially given how new this technology is. We’re all figuring things out at the same time. In this post, we’ll discuss some ways AI can help or hinder (Yao et al., 2024), as well as some tools to help you navigate using generative AI responsibly.
Understanding AI Tools
To start off, let’s go over some key terms:
- Generative AI: Systems that create content (text, images, code, audio, video) based on patterns learned from training data
- Data: Any information that is input by the user and/or used to train the system, including prompts, documents, and personal details
- Model/LLM: The AI system processing your input (Large Language Model, or LLM, specifically refers to text-trained generative AI)
- Terms of Service: Rules governing how your data is used, stored, and shared
You’ve likely interacted with various AI models in your everyday life; you may have come across tools like ChatGPT, Claude, and Copilot. Different tools carry different levels of data risk depending on factors such as their terms of service and how the models were trained. It can be tough to understand the differences between them, so let’s look at some common ones below.
| Tool Type | Examples | Data Risk |
|---|---|---|
| Free Services | ChatGPT, Claude, Copilot, Gemini | Higher |
| Campus-Licensed | Institution-specific tools | Lower |
| Specialized Academic | Scite.ai, Consensus, Elicit | Varies |
Overall, free services often offer the weakest protections in their terms of service. They may use your inputs for training or share data with third parties unless you opt out. Campus-licensed versions of LLMs, such as ChatGPT and Copilot, typically come with institutional agreements that limit data retention and prevent training use.
For example, a version of ChatGPT with an education license is available for faculty and staff at the University of Illinois Urbana-Champaign “with plans to release licenses for personal purchase for faculty, staff and students at a later date” (Kwak, 2025). This license "provides substantially stronger safeguards—offering enhanced data protection, encryption and administrative controls—than the public consumer version” (Kwak, 2025).
Most University of Illinois System students and employees with a Microsoft Outlook A5 license also have access to Copilot with Data Protection service (Microsoft 365, Copilot with Data Protection - AI Chat for the Web, 2024), which offers similar levels of data protection; “chat data is not saved, and chat data will not be available in any capacity to Microsoft or other large language models to train their AI tools against” (Kwak, 2023).
Where Your Data Goes
What happens to the data you input into an LLM? We can think of it flowing through the following steps (a short sketch after the list shows what the first hop looks like in practice):
- Your input: You hit "send" and your data starts its journey
- Company Servers: It travels over the internet to company infrastructure
- Processing & Storage: The AI generates a response; your conversation may be saved depending on terms of service
- Training/Improvement: Your data may train future model versions
- Long-term Retention: Data may be retained long-term or permanently, even if you delete it from your view
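To make the first two steps concrete, here's a minimal sketch in Python of what "hitting send" amounts to under the hood. The endpoint URL, payload fields, and model name below are illustrative placeholders, not any particular vendor's real API:

```python
# A rough sketch: your prompt leaves your machine as an HTTP request to the
# provider's servers. The URL, payload shape, and model name are hypothetical.
import requests

prompt = "Explain confidence intervals in plain language."

response = requests.post(
    "https://api.example-ai-provider.com/v1/chat",      # hypothetical endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},    # ties the request to your account
    json={
        "model": "example-model",
        "messages": [{"role": "user", "content": prompt}],
    },
    timeout=30,
)

# From this point on, how long `prompt` is logged, stored, or used for training
# is governed by the provider's terms of service, not by anything on your machine.
print(response.json())
```

Even if you later delete the conversation in the interface, the copy created by this request has already reached the provider's infrastructure.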
This process varies significantly by tool, account settings, institutional agreements, and opt-out policies. For example, some enterprise or educational plans may offer deletion options or skip the training stage entirely. Checking the specific privacy policies for the tools you use is the best way to find out exactly what happens to the data you share. These can often be found on company websites (for non-licensed models) and university websites (for licensed models). For example, the privacy policy for ChatGPT can be found on its webpage (OpenAI, 2023), and further details on the University of Illinois education license can be found on the U of I webstore (ChatGPT EDU Named User License Access (Expiring 02/28/2026) | University of Illinois WebStore, 2025).
When AI Helps
AI can support learning and work when used thoughtfully. It can break down complex concepts, provide alternative explanations, summarize research, help brainstorm ideas, and refine writing. For technical work, it can identify security vulnerabilities, debug code, assist with data analysis, and automate repetitive tasks.
That being said, context is important. There's no universal answer for when AI use is acceptable. Check with professors, supervisors, or institutional guidelines before using AI for academic or professional work. When uncertain, disclose your AI use.
Let’s look at some example scenarios:
Classroom:
Using AI to help explain a difficult concept in calculus is likely fine. Having AI solve your homework step-by-step probably isn't. The best way to navigate this is to check your syllabus and ask your instructor.
Personal Use:
Getting recommendations on what to cook for dinner is low risk. Uploading credit card statements for spending analysis is very high risk. After all, that's sensitive financial data that could be stored indefinitely.
Workplace:
Getting help with routine email phrasing may be acceptable depending on your workplace’s policy. Uploading proprietary data or client information is high risk and may violate confidentiality agreements. Make sure to check your employer's policy first.
Each context is different, so make sure to weigh the trade-offs. Overall, it can be best to err on the side of caution.
Common Risks of Using AI
Persistence
First, data persistence. Most AI companies retain conversations for extended periods, sometimes indefinitely. It’s always best to assume that your data will be stored for the long term, and even content deleted within seconds has already been transmitted and recorded.
For example, sensitive conversations could be exposed in future data breaches, even years later. When you delete conversations, you're usually just removing them from your view. Companies may also retain data for legal reasons, training, or policy provisions. For example, your conversations may train future AI versions, meaning your words and ideas could influence responses to other users.
Aggregation
Secondly, data aggregation. Details accumulate across conversations and models can store information across chats (Using Claude’s Chat Search and Memory to Build on Previous Context | Claude Help Center, 2020). For example, you might mention in separate conversations that you attend a college in Illinois surrounded by corn, that you're a junior studying biology, that you're researching a specific plant species, and that you're applying to a particular program. Separately, this information doesn’t identify you. But, put together, it might identify you as belonging to a small group of people.
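As a rough back-of-the-envelope illustration (the numbers below are invented for the example, not real enrollment figures), each detail on its own matches thousands of people, but the combination can shrink the pool to a handful:

```python
# Back-of-the-envelope re-identification sketch with made-up numbers.
# Each detail alone is harmless; stacking them narrows the pool dramatically.
population = 50_000                              # undergraduates at a large campus (invented)

juniors           = population * 0.25            # say a quarter are juniors
biology_juniors   = juniors * 0.05               # say 5% of those study biology
plant_researchers = biology_juniors * 0.02       # say 2% of those research that plant species

print(f"Juniors: ~{juniors:,.0f}")                          # ~12,500
print(f"Biology juniors: ~{biology_juniors:,.0f}")          # ~625
print(f"Matching every detail: ~{plant_researchers:,.0f}")  # ~12
```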
Linking Public & Private Data
Also, bear in mind that seemingly anonymous conversations become identifiable when combined with publicly available information, such as university department websites or your public LinkedIn or Instagram profiles. AI companies could link conversations with publicly available data to build a comprehensive profile of you, and if your data is exposed in a breach, anyone could piece this information together. This adds complexity to an already complicated data brokerage system, where personal data is collected and sold for profit. LLMs collect a great deal of data, and linking it to what is already public can reveal far more about a person than either source alone.
Possible Solution: Abstraction
It's easy to overshare when you need help. People inadvertently share full names, student IDs, grades, health information, financial details, and personal situations. One solution is abstraction: rephrasing questions so you get useful guidance without exposing sensitive details.
Let’s look at some examples:
- Instead of: "Here are my medical lab results [upload]. Can you interpret them and tell me if I should reduce my course load?"
  Ask: "How should students manage coursework during health challenges? What accommodations are available?"
- Instead of: "Can you review this email to my professor about my depression diagnosis?"
  Ask: "How should I communicate with a professor about needing health accommodations?"
Overall, remember to strip out identifying details while still getting helpful answers.
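If you tend to paste existing text rather than retype it, one extra habit that can help is scrubbing obvious identifiers first. Below is a minimal sketch using a few simple regular expressions; the patterns are my own illustration, will miss plenty, and are no substitute for rewording the question in general terms as shown above:

```python
# A rough sketch of stripping obvious identifiers from text before pasting it
# into a chat. Illustrative only -- it will not catch names, context clues, or
# anything that doesn't match these simple patterns.
import re

REDACTIONS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),         # email addresses
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),                 # SSN-shaped numbers
    (re.compile(r"\b\d{9}\b"), "[ID NUMBER]"),                       # 9-digit, student-ID-shaped numbers
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),   # US phone-shaped numbers
]

def scrub(text: str) -> str:
    """Replace obvious identifiers with generic placeholders."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

print(scrub("My student ID is 123456789 and my email is jdoe@university.edu, can you help?"))
# -> "My student ID is [ID NUMBER] and my email is [EMAIL], can you help?"
```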
AI System Vulnerabilities
It’s also important to keep in mind that models, and the technical and policy infrastructure they rely on, aren’t foolproof; they have weaknesses. Let’s delve into a few of them.
Memory Leaks
First, memory leaks. AI systems occasionally output others' private information. Research shows carefully crafted prompts can extract training data snippets, potentially including private conversations or confidential information (Chen et al., 2025).
Prompt Injection
Second, prompt injection. Specific prompts can bypass AI safety rules, potentially exposing information. You can't determine if others are exploiting these vulnerabilities.
Opaque Data Privacy Practices
Third, opaque data practices. Privacy policies are often lengthy and complex. Key questions about storage locations, access permissions, sharing circumstances, and deletion timelines may lack clear answers. This opaqueness can make it hard to understand how your data is stored and used.
Limited Transparency on Cybersecurity
Fourth, limited transparency. You can't independently verify a provider's security claims without auditing capabilities. Actual security implementation often remains unknown until incidents occur.
Changing Terms
Fifth, changing terms. Terms of service update regularly, sometimes without notice. Policies may change through updates, acquisitions, or jurisdictional expansions. This pace of change means that staying informed requires active monitoring.
Account-Dependent Rules
Sixth, account-dependent rules. The same service operates under different privacy rules depending on access method. Institutional accounts typically provide different protections than personal accounts, with variations in data retention, training use, security requirements, and privacy protections. Educational and enterprise agreements often include protections unavailable in consumer versions, but interfaces can look identical across account types. This means you need to stay aware of which service you're using under which account.
Why Students Can Face a Higher Risk
Students represent a unique user group when it comes to AI tool adoption and privacy risks. While they're often technologically fluent and early adopters of new platforms, several interconnected factors can increase their (your!) vulnerability to privacy concerns and data misuse. Understanding these risk factors can help you make more informed decisions about which tools to use and how to protect your information.
- Students adopt emerging AI tools early, often integrating them into high-stakes academic work before long-term implications are understood. This can expose users to unidentified risks.
- Many students, and users generally, haven't evaluated terms of service and data privacy policies extensively, making it harder to identify concerning clauses in these documents.
- Growing up with technology can create assumptions that popular tools are inherently secure. But technological fluency doesn't equal understanding privacy risks.
- Competitive environments create pressure to adopt tools peers are using, which may lead students to prioritize immediate needs over privacy, especially when facing deadlines.
- Research indicates age moderates AI's impact on learning (Han et al., 2025). Younger students may be more susceptible to over-reliance or still developing critical evaluation skills.
- Budget limitations often lead students toward free AI tools with weaker privacy protections. Students may not know more secure campus-licensed options exist or lack access to them.
So, What to Do?
With all the uncertainty that comes with AI use, as we’ve seen, it can be hard to know how and when to use these tools. The following guidelines can help you make quick decisions about what information to share with AI tools and what to keep private. These aren't about avoiding AI entirely; they're about using it strategically as a tool while protecting your data.
- The Publicity Test: Would I post this on a public bulletin board? If not, don't input it into AI.
- The Stranger Test: Would I tell this to a random stranger? AI tools aren't friends or therapists; they're helpful, but they record everything.
- The Future Test: Could this affect me in five years? Consider future applications for graduate school, jobs, or security clearances.
It can also be helpful to keep some guidelines in mind.
Never Share
- Student ID, Social Security Number, passwords
- Medical records, financial account information
- Others' personal information
- Unpublished research data (unless discussed with PI and part of approved methodology)
Requires Careful Consideration
- Full name + school name together (individually fine; combined makes you identifiable)
- Specific personal struggles
- Complete assignments before submission
- Identifying life details
Generally Safer
- General academic questions
- Concept explanations
- Brainstorming generic ideas
- Study strategies
The goal is thoughtful sharing and informed decision-making, not perfection.
Key Takeaways:
- Assume permanence: Treat every conversation as recorded and stored indefinitely
- Protect sensitive information: Keep student IDs, passwords, health information, and financial details out of AI chats
- Understand data scope: Data is used for training, analysis, and potentially unauthorized purposes
- Balance benefits and risks: AI offers useful capabilities but requires careful use
- Ask when uncertain: Campus resources, professors, and IT departments can help
- Exercise control: You control what you share and how you use these tools
This is new territory. Rules are still developing, research is ongoing, and uncertainty is normal. Critical thinking and continuous learning matter most.
AI will remain relevant. The goal isn't avoidance but thoughtful use with appropriate protection. Think before sharing, ask questions when uncertain, and consider broader implications.
Samsung Revisited
Remember the story from the beginning of this post? Wondering what happened next?
Samsung didn't permanently ban AI (Farooqui, 2025). They implemented security protocols, created guidelines, and lifted the restrictions once appropriate training and awareness of sharing boundaries were in place.
The goal is to learn how to use AI strategically and safely. We can draw from others’ experiences, and our own, to do just that.
References
Chen, K., Zhou, X., Lin, Y., Feng, S., Shen, L., & Wu, P. (2025). A survey on privacy risks and protection in large language models. Journal of King Saud University - Computer and Information Sciences, 37(7). https://doi.org/10.1007/s44443-025-00177-1
Han, X., Peng, H., & Liu, M. (2025). The impact of GenAI on learning outcomes: A systematic review and meta-analysis of experimental studies. Educational Research Review, 100714. https://doi.org/10.1016/j.edurev.2025.100714
Yao, Y., Duan, J., Xu, K., Cai, Y., Sun, Z., & Zhang, Y. (2024). A survey on Large Language Model (LLM) security and privacy: The Good, The Bad, and The Ugly. High-Confidence Computing, 4(2), 100211. https://doi.org/10.1016/j.hcc.2024.100211