FixVibe

// 프로브 / 스포트라이트

LLM 프롬프트 인젝션

AI 기능이 사용자 입력을 명령으로 신뢰하면, 사용자가 시스템 프롬프트를 다시 쓸 수 있어요.

핵심

Prompt injection is the new SQL injection, with one large complication: the parser is a probabilistic neural network whose decision boundaries are not specified anywhere. There is no equivalent of parameterized queries because the LLM does not have a structural separation between code and data — both arrive as text in a single context window. Every developer who builds a chatbot, summarization tool, or RAG-backed search starts with the same naive pattern (concatenate the system prompt with the user's input, send it to the LLM), and every one of them is vulnerable until they layer on defenses. Worse: the attacker doesn't need to be the user. Indirect prompt injection — instructions hidden inside documents, web pages, or emails the LLM consumes — turns content fetching into command execution.

어떻게 동작하나요

Prompt injection appears when LLM-facing inputs can override instructions, leak context, or trigger unsafe tool behavior. The risk depends on what the AI feature can read or do inside the product.

피해 범위

Data exfiltration: system prompts (often containing internal context, business logic, or credentials), conversation history from other users in shared deployments, document contents from RAG systems. Reputation damage when chatbots produce offensive output that screenshots cleanly. Phishing assistance via injected URLs that the LLM presents as legitimate. Financial loss when LLMs with tool access execute unintended operations — sending email, posting messages, hitting paid APIs, calling code-execution sandboxes. In agentic systems with broader tool access, the impact grades up to remote code execution.

// what fixvibe checks

What FixVibe checks

FixVibe checks this class with verified-domain active testing that is bounded, non-destructive, and evidence-driven. Public reports describe the affected surface and remediation. For check-specific questions about exact detection heuristics, active payload details, or source-code rule patterns, contact support@fixvibe.app.

확실한 방어

Treat user input as untrusted by the LLM. The most reliable structural defense is to constrain output to structured formats (JSON schema with strict validation, function-calling with tool-side permission gates) so the LLM's output is parsed by your code rather than executed. Layer the system prompt to restate boundaries after user content — instructions placed at the end of the context have measurably better adherence than those at the beginning. For tool-using agents, gate every dangerous operation behind a human approval step (don't let an LLM send email, transfer money, or execute code without a confirmation prompt). For indirect injection, sanitize fetched content before adding it to the context — strip HTML attributes, comment markers, and instruction-shaped text patterns. Run a second model as a 'jailbreak detector' over user inputs and outputs in high-stakes deployments; OpenAI's moderation endpoint and Anthropic's classifier APIs cover the common cases. Most importantly, design with assumed compromise: what's the worst the LLM can do if it follows attacker instructions? If the answer is 'leak the system prompt,' you might accept that risk. If the answer is 'send mass email or call paid APIs,' you need the gates.

핵심 정리

There is no parameterized-query equivalent for LLMs yet. Defense is currently architectural — constrain what the model can do with its output, not what it can be told to do.

// 내 앱에서 직접 실행해보세요

FixVibe가 지켜보는 동안 계속 배포하세요.

FixVibe는 공격자가 보는 것처럼 앱의 공개 영역을 압박 테스트합니다 — 에이전트도, 설치도, 카드도 필요 없어요. 새로운 취약점 패턴을 계속 연구해 실용적인 체크와 Cursor, Claude, Copilot에 바로 붙여넣을 수 있는 수정안으로 바꿉니다.

능동 프로브
103
이 카테고리에서 실행되는 테스트
모듈
27
전용 능동 프로브 검사
매 스캔
384+
모든 카테고리 합계 테스트
  • 무료 — 카드 없이, 설치 없이, Slack 알림 없이
  • URL만 붙여넣으세요 — 크롤, 탐지, 보고는 저희가
  • 심각도별 분류, 중복 제거된 신호만
  • 최신 AI 수정 프롬프트를 Cursor, Claude, Copilot에 바로 붙여넣기
무료 스캔 실행

// 최신 체크 · 실용적인 수정 · 자신 있게 배포

LLM 프롬프트 인젝션 — 취약점 스포트라이트 | FixVibe · FixVibe