[Paper Review] A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning

Tags
LLM Reasoning
Test-time Scaling
Efficient Inference
AI
Published
March 9, 2026
Author

Problem

In sampling-based test-time scaling, both Self-Consistency (SC) and Perplexity (PPL) have theoretical limitations, and no framework has analyzed them systematically.

Approach

The paper builds a theoretical framework that decomposes reasoning error into Estimation Error + Model Error, and proposes RPC: Perplexity Consistency integrates the LLM's internal probabilities into the SC framework, and Reasoning Pruning then removes low-probability reasoning paths.

Main Contributions

  1. Presents the first theoretical framework that analyzes sampling-based test-time scaling from the perspective of confidence estimation
  2. Quantifies the limitations of SC and PPL in terms of Estimation Error / Model Error
  3. Proposes RPC, which combines the strengths of both methods: it accelerates estimation error convergence from linear to exponential while keeping model error low

Background and Motivation

Existing Approaches

Consistency-based methods (e.g., Self-Consistency)
  • Sample n reasoning paths for the same problem
  • Select the most frequent answer by majority vote
  • Compute confidence by Monte Carlo estimation
Probability-based methods (e.g., Perplexity)
  • Directly use the internal probability $p(\hat{t} \mid x)$ of each reasoning path generated by the LLM
  • Treat higher-probability paths as more trustworthy
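The two families of confidence signals described above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names `sc_confidence` and `path_probability` and the toy answers are hypothetical, and the path probability is assumed to be the product of per-token probabilities reported by the model.

```python
import math
from collections import Counter


def sc_confidence(answers):
    """Monte Carlo (frequency) estimate of each answer's confidence."""
    n = len(answers)
    counts = Counter(answers)
    return {answer: count / n for answer, count in counts.items()}


def path_probability(token_logprobs):
    """Internal probability of one reasoning path, assuming it is the
    product of the token probabilities (sum of log-probabilities)."""
    return math.exp(sum(token_logprobs))


# Toy data (hypothetical): final answers extracted from four sampled paths.
answers = ["42", "42", "17", "42"]
conf = sc_confidence(answers)
best = max(conf, key=conf.get)  # majority vote
```

Here `conf` maps each distinct answer to its sample frequency, and `best` is the answer a majority vote would return.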

Problems and Bottlenecks

1. Slow convergence of Self-Consistency
  • Because it relies on Monte Carlo estimation, the estimation error decreases only at a rate of $O(1/n)$
  • Performance is unstable when the number of samples is small
  • Example: reaching adequate performance takes 64 to 128 samples, which is costly
2. High model error of Perplexity
  • It uses the LLM's internal probability directly, but that probability itself diverges from the probability of being correct
  • Convergence degrades sharply for reasoning paths with very low probability
3. Both methods leave room for improvement
  • SC: convergence is slow, but model error is low
  • PPL: convergence is fast, but model error is high
  • There are theoretical grounds to expect that a single method can achieve both advantages

Root Cause

The limitations of the two methods stem from a fundamental difference in their confidence estimation strategies. SC is purely frequency-based, so it does not make full use of the available information; PPL uses probabilities directly but has no way to handle paths that were not observed. An ideal method must achieve fast estimation error convergence and low model error at the same time.

Core Method

Theoretical Framework: Reasoning Error Decomposition

The paper's theoretical foundation is the decomposition of reasoning error into two independent components.

Problem Definition

In a reasoning problem $(x, y)$, $x$ is the input query and $y$ is the correct answer. The LLM generates a reasoning path $\hat{t} = (t_1, \dots, t_m)$ sequentially, and an extraction function $g(\cdot)$ produces the final answer $\hat{y} = g(\hat{t})$.
Confidence: the generation probability $p(\hat{t} \mid x)$ of a reasoning path $\hat{t}$, or the probability $p(\hat{y} \mid x)$ of an answer $\hat{y}$.
In practice not every possible path can be enumerated, so confidence is estimated from $n$ samples $\tilde{t}_1, \dots, \tilde{t}_n$.

Proposition 1: Error Decomposition (key result)

For any input $x$, correct answer $y$, and candidate answer $\hat{y}$, the reasoning error $\mathcal{E}_{\hat{p}}(\hat{y})$ decomposes as follows:
$$\mathcal{E}_{\hat{p}}(\hat{y}) = \underbrace{\mathbb{E}\big[(\hat{p}(\hat{y} \mid x) - p(\hat{y} \mid x))^2\big]}_{\text{Estimation Error}} + \underbrace{\big(p(\hat{y} \mid x) - \mathbb{I}[\hat{y} = y]\big)^2}_{\text{Model Error}}$$
Meaning of each term:
  • Estimation Error: the gap between the estimated confidence $\hat{p}(\hat{y} \mid x)$ and the true probability $p(\hat{y} \mid x)$. It depends on the number of samples $n$ and on the estimation strategy.
  • Model Error: the gap between the true probability $p(\hat{y} \mid x)$ assigned by the LLM and the correctness indicator $\mathbb{I}[\hat{y} = y]$. It depends on the LLM's reasoning ability itself and is independent of sampling.
The decomposition matters because it shows which error component is responsible for each existing method's limitations.
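The following short derivation may help show where the two terms come from. It is a sketch under one stated assumption, namely that the estimator is unbiased ($\mathbb{E}[\hat{p}] = p$); the shorthand $\hat{p} = \hat{p}(\hat{y} \mid x)$, $p = p(\hat{y} \mid x)$, and $\mathbb{I} = \mathbb{I}[\hat{y} = y]$ is introduced here only for brevity.

```latex
\begin{aligned}
\mathbb{E}\big[(\hat{p} - \mathbb{I})^2\big]
  &= \mathbb{E}\big[(\hat{p} - p + p - \mathbb{I})^2\big] \\
  &= \mathbb{E}\big[(\hat{p} - p)^2\big]
     + 2\,(p - \mathbb{I})\,\mathbb{E}[\hat{p} - p]
     + (p - \mathbb{I})^2 \\
  &= \underbrace{\mathbb{E}\big[(\hat{p} - p)^2\big]}_{\text{Estimation Error}}
     + \underbrace{(p - \mathbb{I})^2}_{\text{Model Error}}
  \qquad \text{when } \mathbb{E}[\hat{p}] = p .
\end{aligned}
```

The cross term vanishes because $p - \mathbb{I}$ is a constant and $\mathbb{E}[\hat{p} - p] = 0$. For an estimator that is biased at finite $n$, the cross term does not vanish, and the sample-dependent part of the error is whatever remains after the Model Error is subtracted.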

Theoretical Analysis of SC

Proposition 2: SC Reasoning Error Decomposition

SC's confidence estimate:
$$\hat{p}^{(\mathrm{SC})}(\hat{y} \mid x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}[\tilde{y}_i = \hat{y}], \qquad \tilde{y}_i = g(\tilde{t}_i)$$
Key observations:
  • The Estimation Error is $O(1/n)$, i.e. it decreases only linearly
  • The number of samples must double to halve the error, which is inefficient
  • However, the Model Error is relatively low owing to the nature of SC's consistency function
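The linear rate can be checked numerically. The sketch below assumes the $n$ sampled answers are i.i.d., so the number of samples matching $\hat{y}$ is Binomial($n$, $p$); under that assumption the estimation error of the frequency estimate is $p(1-p)/n$. The function names are hypothetical.

```python
from math import comb


def sc_reasoning_error(p, n, correct):
    """Exact E[(k/n - I)^2] with k ~ Binomial(n, p), I = 1 if the answer is correct."""
    indicator = 1.0 if correct else 0.0
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * (k / n - indicator) ** 2
        for k in range(n + 1)
    )


def sc_estimation_error(p, n):
    """Variance of the frequency estimate: shrinks as 1/n."""
    return p * (1 - p) / n


def model_error(p, correct):
    """Squared gap between the true answer probability and the correctness indicator."""
    indicator = 1.0 if correct else 0.0
    return (p - indicator) ** 2


# The total error equals estimation error + model error (up to rounding).
total = sc_reasoning_error(0.6, 8, True)
parts = sc_estimation_error(0.6, 8) + model_error(0.6, True)
```

Going from 8 to 16 samples halves `sc_estimation_error` while `model_error` stays fixed, which is the inefficiency the observations above describe.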

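Theoretical Analysis of PPL

Proposition 3: PPL Reasoning Error Decomposition

PPL's confidence estimate:
$$\hat{p}^{(\mathrm{PPL})}(\hat{t} \mid x) = \begin{cases} p(\hat{t} \mid x) & \text{if } \hat{t} \in \{\tilde{t}_1, \dots, \tilde{t}_n\} \\ 0 & \text{otherwise} \end{cases}$$
Key observations:
  • The Estimation Error contains the factor $(1 - p(\hat{t} \mid x))^n$, so it decreases exponentially
  • However, as $p(\hat{t} \mid x) \to 0$, $(1 - p(\hat{t} \mid x))^n \to 1$ and the convergence degenerates
  • The Model Error is generally larger than SC's, because path-level probability diverges from answer correctness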
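The exponential factor can be derived directly. This is a reconstruction under stated assumptions, not a quotation of the paper's proposition: the $n$ paths are drawn i.i.d., and the estimate equals $p(\hat{t} \mid x)$ when $\hat{t}$ appears among the samples and 0 otherwise. The shorthand $p = p(\hat{t} \mid x)$ and $\mathbb{I} = \mathbb{I}[g(\hat{t}) = y]$ is used for brevity.

```latex
\begin{aligned}
\Pr[\hat{t} \text{ not sampled}] &= (1 - p)^n \\
\mathbb{E}\big[(\hat{p} - \mathbb{I})^2\big]
  &= \big(1 - (1 - p)^n\big)\,(p - \mathbb{I})^2 + (1 - p)^n\,\mathbb{I}^2 \\
  &= (p - \mathbb{I})^2 + (1 - p)^n\,\big(\mathbb{I} - (p - \mathbb{I})^2\big) \\
  &= \underbrace{(1 - p)^n\, p\,(2\,\mathbb{I} - p)}_{\text{decays exponentially in } n}
     + \underbrace{(p - \mathbb{I})^2}_{\text{Model Error}}
\end{aligned}
```

The last step uses $\mathbb{I}^2 = \mathbb{I}$. The first term shrinks geometrically with ratio $1 - p$, so when $p$ is close to 0 the ratio is close to 1 and the decay is slow, which is the degeneration noted above.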

Overall Architecture: the RPC Method

Input problem x → LLM Sampling: generate n reasoning paths → Reasoning Pruning (RP): remove low-probability paths → Perplexity Consistency (PC): estimate confidence from the remaining paths → select the final answer
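The pipeline can be sketched end to end as follows. This is an illustration under several assumptions rather than the authors' implementation: `rpc_select` and its tuple layout are hypothetical, pruning is applied per path, and the threshold is a fixed value passed by the caller instead of being fitted automatically.

```python
import math
from collections import defaultdict


def rpc_select(samples, threshold):
    """Pick an answer from sampled reasoning paths.

    samples:   list of (path_text, path_logprob, answer) tuples (hypothetical layout)
    threshold: fixed probability cutoff (assumption; not fitted from data here)
    """
    # Keep one entry per unique reasoning path.
    unique = {}
    for path, logprob, answer in samples:
        unique[path] = (math.exp(logprob), answer)

    # Reasoning Pruning: drop paths whose probability is at or below the cutoff.
    kept = [(prob, answer) for prob, answer in unique.values() if prob > threshold]
    if not kept:  # if everything was pruned, fall back to all unique paths
        kept = list(unique.values())

    # Perplexity Consistency: sum path probabilities per answer.
    confidence = defaultdict(float)
    for prob, answer in kept:
        confidence[answer] += prob

    best = max(confidence, key=confidence.get)
    return best, dict(confidence)


# Toy samples (hypothetical values): path "a1" was drawn twice.
samples = [
    ("a1", math.log(0.30), "A"),
    ("a1", math.log(0.30), "A"),
    ("a2", math.log(0.20), "A"),
    ("b1", math.log(0.10), "B"),
    ("c1", math.log(0.01), "C"),
]
best, confidence = rpc_select(samples, threshold=0.05)
```

In this toy run the path leading to answer "C" is pruned, and the duplicate draw of "a1" is counted once because PC sums over unique paths.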

Module 1: Perplexity Consistency (PC)

PC integrates the LLM's internal probabilities into the SC framework to achieve PPL's fast convergence and SC's low model error at the same time.

Key Formula

For the set $\mathcal{R}$ of unique sampled reasoning paths, the estimated probability of any answer $\hat{y}$ is
$$\hat{p}^{(\mathrm{PC})}(\hat{y} \mid x) = \sum_{\tilde{t} \in \mathcal{R}} \mathbb{I}[g(\tilde{t}) = \hat{y}]\; p(\tilde{t} \mid x)$$
The difference from SC:
  • SC: counts the paths whose answer is $\hat{y}$ and divides by $n$ (frequency-based)
  • PC: sums the probabilities of the paths whose answer is $\hat{y}$ (probability-weighted)
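A small worked example may make the contrast concrete. All numbers are hypothetical: $n = 4$ samples yield three unique paths, where $\tilde{t}_1$ (probability 0.30, answer A) is drawn twice, $\tilde{t}_2$ (probability 0.20, answer A) once, and $\tilde{t}_3$ (probability 0.10, answer B) once.

```latex
\begin{aligned}
\text{SC:}\quad & \hat{p}(A \mid x) = \tfrac{3}{4} = 0.75, & \hat{p}(B \mid x) &= \tfrac{1}{4} = 0.25 \\
\text{PC:}\quad & \hat{p}(A \mid x) = 0.30 + 0.20 = 0.50, & \hat{p}(B \mid x) &= 0.10
\end{aligned}
```

Both estimators select A here, but SC's value depends on how often each path happened to be drawn, whereas PC's value depends only on which unique paths were observed and on their probabilities.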

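Theorem 4: PC Reasoning Error Decomposition

Let $k$ be the number of unique paths whose answer is $\hat{y}$, and define $\alpha = 1 - \frac{1}{k}\, p(\hat{y} \mid x)$.
Advantages of PC:
  • Estimation Error convergence rate: exponential, governed by $\alpha^n$, on par with PPL
  • Model Error: the same form as SC, on par with SC
In short, PC achieves both PPL's fast convergence and SC's low model error.

However, convergence can still degenerate: when $p(\hat{y} \mid x)$ is very small, $\alpha$ is close to 1 and $\alpha^n$ decays slowly.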

Module 2: Reasoning Pruning (RP)

RP removes low-probability reasoning paths in advance to prevent PC's convergence degeneration.

Basic Idea

An answer $\hat{y}$ whose probability $p(\hat{y} \mid x)$ is very low is unlikely to be correct. Because such paths slow PC's convergence, answers whose cumulative probability is at or below a threshold $\tau$ are removed.

Theorem 7: Guarantee on the Effect of Pruning

When the threshold is set to its optimal value $\tau = p(y \mid x)$ (the true probability of the correct answer), RP achieves the optimal error reduction with a probability whose lower bound depends on $\hat{k}$, the number of samples whose answer is $\hat{y}$.
Pruning also reduces the model error itself, because it removes the probability mass assigned to wrong answers.

Automatic Threshold Selection: Weibull Mixture Model

Summary of the Theoretical Comparison of the Three Methods

| Aspect | Self-Consistency (SC) | Perplexity (PPL) | RPC (PC + RP) |
|---|---|---|---|
| Estimation Error convergence | Linear | Exponential | Exponential |
| Model Error | Low | High | Low (SC level) |
| Low-probability paths | Handled naturally | Convergence degenerates | Removed by RP |
| Confidence interpretation | Frequency-based (intuitive) | Probability-based (can be biased) | Probability-weighted frequency (balanced) |
| Samples required | Many (64-128) | Few | At most 50% of SC |

Experimental Analysis

1. Efficiency (RQ1): sampling cost reduced by more than 50%

On MathOdyssey, the sampling budget was reduced by 71.4%. This shows that the exponential convergence rate the theory predicts for PC holds in practice.

2. Performance (RQ2): highest accuracy at the same number of samples

RPC improved on existing methods by 1.29% on average and achieved the best performance on every dataset.

3. Reliability (RQ3): lower ECE (Expected Calibration Error)

| Method | MATH ECE | MathOdyssey ECE | OlympiadBench ECE | AIME ECE | Average ECE |
|---|---|---|---|---|---|
| PPL | 48.99 | 67.70 | 86.90 | 88.98 | 73.14 |
| VERB | 47.46 | 69.92 | 84.68 | 86.29 | 72.09 |
| SC | 6.71 | 12.23 | 20.20 | 14.35 | 13.37 |
| RPC | 6.41 | 9.87 | 18.86 | 14.32 | 12.37 |

RPC has the lowest ECE on all four benchmarks (12.37 on average versus 13.37 for SC), so its confidence estimates are better calibrated against actual accuracy.
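For readers unfamiliar with the metric, ECE can be computed roughly as below. This sketch uses the common equal-width binning definition; the number of bins and the function name are assumptions, and the paper's exact evaluation protocol may differ.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 is placed in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Toy predictions (hypothetical): four confident answers, one unconfident.
ece = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9, 0.1],
    [True, True, True, False, False],
)
```

The function returns a fraction between 0 and 1; the table values above appear to be percentages, which would correspond to this value multiplied by 100.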

4. Also effective on code generation tasks

With the Deepseek-Coder 33B model, RPC also achieved the best performance on code generation benchmarks, confirming that its applicability is not limited to mathematical reasoning.

Limitations

  1. Reliance on the Bernoulli assumption: The theoretical analysis assumes that LLM sampling follows a Bernoulli distribution. Actual LLM generation may follow a more complex distribution.
  2. Unstable Weibull mixture fitting: With very few samples (n < 16), fitting the mixture model can become unstable; the Truncated Mean mitigates this but is not a fundamental fix.
  3. Probability estimation on long reasoning chains: As a path grows very long, its internal probability $p(\hat{t} \mid x)$ becomes extremely small, which can cause numerical stability problems.
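One common way to sidestep the underflow mentioned in the last item is to keep everything in log space. The sketch below is a generic log-sum-exp aggregation, not something the paper prescribes; `pc_log_confidence` and its input layout are hypothetical.

```python
import math


def logsumexp(log_values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of log-values."""
    peak = max(log_values)
    if peak == float("-inf"):
        return peak
    return peak + math.log(sum(math.exp(v - peak) for v in log_values))


def pc_log_confidence(paths):
    """paths: list of (path_logprob, answer) pairs for unique paths.

    Returns a dict mapping each answer to the log of its summed path probability.
    """
    grouped = {}
    for logprob, answer in paths:
        grouped.setdefault(answer, []).append(logprob)
    return {answer: logsumexp(values) for answer, values in grouped.items()}


# Log-probabilities this small underflow to 0.0 if exponentiated directly.
paths = [(-1000.0, "A"), (-1000.0, "A"), (-1001.0, "B")]
log_conf = pc_log_confidence(paths)
best = max(log_conf, key=log_conf.get)
```

Because the logarithm is monotonic, comparing log-confidences selects the same answer as comparing the probabilities themselves.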