
Learning from the DeepSeek data breach

Chinese AI startup DeepSeek inadvertently exposed sensitive user data through unsecured databases. 

What happened?

DeepSeek, known for its DeepSeek-R1 large language model (LLM), left two ClickHouse database instances publicly accessible without authentication. Security researchers at Wiz discovered that these databases contained over a million log entries, including user chat histories, API keys, backend system details, and operational metadata. The exposure meant that anyone with knowledge of the database URLs could query and access sensitive information.

The unsecured databases, hosted at oauth2callback.deepseek.com:9000 and dev.deepseek.com:9000, accepted arbitrary SQL queries through ClickHouse’s web-based HTTP interface; a sketch of how such an instance can be probed appears after the list below. The ‘log_stream’ table stored internal logs dating back to January 6, 2025, containing:

  • User queries in plaintext,
  • API keys for backend authentication,
  • Internal infrastructure details,
  • Various operational metadata.
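
To make the failure mode concrete, here is a minimal sketch of how anyone could have probed such an instance. It assumes a hypothetical exposed host and ClickHouse's default HTTP port (8123); the host name, queries, and reuse of the 'log_stream' table name are illustrative, not a reproduction of the actual incident.

    # Minimal sketch: querying an unauthenticated ClickHouse HTTP interface.
    # "exposed-host" is a hypothetical placeholder; 8123 is ClickHouse's
    # default HTTP port. With no credentials configured, any SQL statement
    # passed in the `query` parameter is executed.
    import requests

    BASE_URL = "http://exposed-host:8123/"

    # Enumerate the tables visible without logging in.
    resp = requests.get(BASE_URL, params={"query": "SHOW TABLES"}, timeout=10)
    print(resp.text)

    # Read rows from a log table -- the same class of query that exposed
    # DeepSeek's 'log_stream' entries, according to Wiz Research.
    resp = requests.get(
        BASE_URL,
        params={"query": "SELECT * FROM log_stream LIMIT 10 FORMAT JSONEachRow"},
        timeout=10,
    )
    print(resp.text)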

Wiz Research warned that the exposure posed a severe security risk, allowing potential attackers to retrieve plaintext passwords and proprietary data. Although it remains unclear whether malicious actors exploited this vulnerability, Wiz promptly reported the issue, and DeepSeek secured the databases soon after.

Cybersecurity concerns

This incident underscores the need for strong cybersecurity practices, particularly at AI companies that handle vast amounts of user data. Exposed chat logs raise serious privacy concerns, especially for businesses that use AI tools for confidential operations.

Moreover, the exposure of backend details and API keys could have led to privilege escalation attacks, granting unauthorized access to DeepSeek’s internal network and potentially causing larger-scale breaches. This security lapse, coupled with DeepSeek’s recent struggles against persistent cyberattacks, raises concerns about the company’s preparedness against future threats.

Lessons and recommendations

To prevent similar incidents, AI companies must adopt proactive security measures:

  • Secure database configurations: Ensure every database requires authentication and is not reachable from the public internet unless strictly necessary.
  • Encrypt sensitive data: Store user chat histories and API keys in encrypted form so that exposed records are unreadable (see the first sketch after this list).
  • Implement regular security audits: Conduct routine security assessments to identify vulnerabilities before attackers do.
  • Use role-based access control (RBAC): Limit database access based on user roles to minimize the risk of unauthorized access (see the second sketch after this list).
  • Monitor for unauthorized access: Deploy real-time monitoring tools to detect and respond to suspicious activity promptly.
  • Train employees on cybersecurity best practices: Ensure teams understand security protocols and recognize the risks of improper data handling.
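
On the encryption recommendation, here is a minimal sketch of keeping stored API keys unreadable at rest. It assumes Python with the third-party cryptography package; the key names and values are illustrative, and in practice the encryption key would live in a secrets manager or KMS, never alongside the data it protects.

    # Minimal sketch: encrypting an API key at rest with Fernet
    # (symmetric, authenticated encryption from the `cryptography` package).
    from cryptography.fernet import Fernet

    # Generate once and store in a secrets manager or KMS -- never in the
    # same database as the data it protects.
    encryption_key = Fernet.generate_key()
    fernet = Fernet(encryption_key)

    api_key = b"sk-hypothetical-backend-key"  # illustrative value only
    ciphertext = fernet.encrypt(api_key)      # persist this, not the plaintext

    # Decrypt only at the point of use.
    assert fernet.decrypt(ciphertext) == api_key

With storage like this, a leaked log table yields only ciphertext, which is useless to an attacker who does not also hold the key.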
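
On the RBAC recommendation, ClickHouse itself supports SQL-driven access control, so a read-only role can be scoped to exactly the tables a given user needs. The sketch below assumes the clickhouse-driver Python client; the role, user, and table names are hypothetical.

    # Minimal sketch: SQL-driven RBAC in ClickHouse, issued through the
    # `clickhouse-driver` package. Role, user, and table names are hypothetical.
    from clickhouse_driver import Client

    client = Client("localhost")  # administrative connection

    # A role that may only read one logging table, nothing else.
    client.execute("CREATE ROLE IF NOT EXISTS log_reader")
    client.execute("GRANT SELECT ON logs.log_stream TO log_reader")

    # An analyst account that must authenticate and inherits only that role.
    client.execute(
        "CREATE USER IF NOT EXISTS analyst "
        "IDENTIFIED WITH sha256_password BY 'use-a-strong-secret'"
    )
    client.execute("GRANT log_reader TO analyst")

Had DeepSeek’s instances enforced even this much, the exposed endpoints would have demanded credentials instead of answering anonymous queries.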

See also: HIPAA Compliant Email: The Definitive Guide

FAQs

What are the risks of such an exposure?

Risks include privacy breaches, unauthorized access to internal systems, and potential privilege escalation attacks.

How does this incident affect trust in AI companies?

It raises concerns about data security and highlights the need for stricter cybersecurity policies in AI-driven organizations.
