LLM Knowledge Base
Large Language Model knowledge: papers, interview questions, and core topics.
Interview Questions
- 2026-04-10 Interview: Which scenarios suit INT4 vs. FP8 in quantization, and why can't some layers be quantized?
- 2026-04-09 Interview: A deep dive into the evaluation blind spots of MT-Bench and AlpacaEval
- 2026-04-08 Interview: Knowledge distillation when the teacher-student capability gap is too large: problems and remedies
- 2026-04-07 Interview: How model merging works, and the core ideas behind TIES-Merging and DARE
- 2026-04-06 Interview: Over-alignment: how it manifests and how to detect it
- 2026-04-05 Interview: Rejection sampling vs. Best-of-N for alignment: strengths and weaknesses
- 2026-04-04 Interview: A deep dive into loss computation strategies for multi-turn dialogue training
- 2026-04-03 Interview: Why chat template design matters, and the problems caused by incompatible templates
- 2026-04-02 Interview: The alignment tax: its nature, how to quantify it, and mitigation strategies
- 2026-04-01 Interview: How Constitutional AI's self-critique mechanism works, and its limitations
- 2026-03-31 Interview: The real trade-offs between DPO and PPO, and why DeepSeek-R1 returned to PPO
- 2026-03-31 Interview: Why PPO is hard to train for LLM alignment, and the central role of the KL-divergence penalty
- 2026-03-28 Interview: Theoretical guidance for choosing LoRA rank, and rank sensitivity across different tasks
- 2026-03-27 Interview: The diversity-quality trade-off in instruction data, and how to quantify data quality
- 2026-03-26 Interview: Where LIMA's conclusion that "1,000 examples suffice for SFT" does and does not apply
- 2026-03-25 Interview: Why the SFT learning rate is far lower than in pre-training, plus common SFT pitfalls
- 2026-03-24 Interview: Tensor parallelism vs. pipeline parallelism: an in-depth comparison of when to use each
- 2026-03-23 Interview: Gradient checkpointing's time-memory trade-off, and strategies for choosing which layers to checkpoint
- 2026-03-22 Interview: What ZeRO Stages 1/2/3 each shard, and a communication-volume analysis of Stage 3
- 2026-03-21 Interview: Computing AllReduce communication volume in distributed training, and Ring vs. Tree AllReduce
- 2026-03-20 Interview: A complete workflow and caveats for extending a tokenizer vocabulary for a specific domain
- 2026-03-19 Interview: The root causes of catastrophic forgetting, and the limits of classic remedies in the LLM setting
- 2026-03-18 Interview: Mixing strategies for domain vs. general data during continual pre-training (CPT)
- 2026-03-17 Interview: Why is data deduplication so important? What goes wrong with no deduplication vs. over-deduplication?
- 2026-03-16 Interview: How does the share of code in pre-training data affect reasoning ability, and what experimental evidence is there?
- 2026-03-15 Interview: Why is the Chinchilla law so often violated in industry, and when is over-training justified?
- 2026-03-14 Interview: Scaling laws say loss falls as a power law in compute, but does that rule ever break down?
- 2026-03-13 Interview: A sudden loss spike appears during pre-training: what is your debugging approach and response strategy?
- 2026-03-12 Interview: What is the fundamental reason BF16 suits LLM training better than FP16 in mixed-precision training?
- 2026-03-11 Interview: Adam vs. AdamW is more than a naming difference: explain the mathematical difference in how each applies weight decay.
- 2026-03-10 Interview: How should batch size and learning rate be tuned together in pre-training, and what are the limits of the linear scaling rule?
- 2026-03-09 Interview: What does label smoothing of the cross-entropy loss do for LLMs, and when should it (not) be used?
- 2026-03-08 Interview: How does the BPE merge strategy affect model performance, and what special considerations apply to Chinese tokenization?
- 2026-03-07 Interview: If you designed a 7B-parameter LLM architecture from scratch, how would you allocate layer count, hidden dimension, and head count?
- 2026-03-06 Interview: Is the Transformer's compute bottleneck in attention or the FFN, and how does it differ between training and inference?
- 2026-03-05 Interview: What does RMSNorm remove relative to LayerNorm, and why does dropping mean-centering actually work better?
- 2026-03-04 Interview: Where is SwiGLU better than ReLU/GELU, and why have nearly all modern LLMs switched to it?
- 2026-03-03 Interview: Why is router load balancing a hard problem in MoE architectures, and how does DeepSeek-V2 solve it?
- 2026-03-02 Interview: How does the KV cache work at inference time, what is its memory-footprint formula, and which factor matters most? (A memory-estimate sketch follows this list.)
- 2026-03-01 Interview: What do GQA and MQA sacrifice, and what do they gain, relative to standard MHA? Why did LLaMA-2 70B choose GQA?
- 2026-02-28 Interview: Flash Attention does not change the mathematical result, so why is it 2-4x faster? Where is the real bottleneck?
- 2026-02-27 Interview: Why did decoder-only architectures win at large-scale pre-training? Is encoder-decoder really a dead end?
- 2026-02-26 Interview: What does the FFN in a Transformer actually do? Some research holds that the FFN is the main carrier of stored knowledge: what is your view?
- 2026-02-25 Interview: What are RoPE's pros and cons versus absolute position encoding and ALiBi, and why does RoPE support length extrapolation?
- 2026-02-24 Interview: Why do modern LLMs use Pre-Norm rather than Post-Norm, and does Post-Norm have any advantages?
- 2026-02-23 Interview: Multi-Head Attention has exactly the same parameter count as Single-Head, so where does the multi-head advantage really come from?
- 2026-02-23 Interview: Why does Self-Attention need separate Q, K, and V matrices? Would a single shared matrix work?
- 2026-02-23 Interview: Why does the Transformer use scaled dot-product attention rather than additive attention, and what is the mathematical intuition behind the 1/√d_k scaling factor? (A minimal sketch follows this list.)
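The last three questions above all reduce to the same core computation. As a reference point, here is a minimal NumPy sketch of single-head scaled dot-product attention; the toy shapes and random inputs are illustrative assumptions, not taken from any particular model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    The 1/sqrt(d_k) factor keeps the dot-product scores at roughly
    unit variance (for i.i.d. unit-variance inputs), so softmax does
    not saturate into near-one-hot outputs with vanishing gradients.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_q, seq_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (seq_q, d_v)

# Toy example: 4 query and 4 key/value positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```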
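And for the KV cache question (2026-03-02), a back-of-the-envelope estimator for the memory formula it asks about. The formula itself (2 tensors x layers x KV heads x head dim x sequence length x batch x bytes per element) is standard; the LLaMA-2-70B-like numbers plugged in below are an assumption for illustration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_elem=2):
    """KV cache memory: one K and one V tensor cached per layer.

    2 (K and V) * layers * kv_heads * head_dim * seq_len * batch * bytes.
    Sequence length and batch size scale it linearly, which is why long
    contexts dominate; GQA shrinks it by reducing num_kv_heads.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# Illustrative LLaMA-2-70B-like config: 80 layers, 8 KV heads (GQA),
# head_dim 128, 4k context, batch 1, FP16 (2 bytes per element).
gb = kv_cache_bytes(80, 8, 128, 4096, 1) / 2**30
print(f"{gb:.2f} GiB")  # 1.25 GiB
```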
Topics
- 2026-04-10 The trade-off between data quality and data quantity
- 2026-04-09 Constitutional AI: AI constraining itself
- 2026-04-08 ORPO/SimPO/KTO: newer alignment algorithms
- 2026-04-07 DPO: Direct Preference Optimization
- 2026-04-06 The PPO algorithm as applied to LLMs
- 2026-04-05 Reward model training in detail
- 2026-04-04 RLHF overview: learning from human feedback
- 2026-04-03 Full fine-tuning vs. LoRA
- 2026-04-02 QLoRA: 4-bit quantized fine-tuning
- 2026-04-01 LoRA: the principle of low-rank adaptation
- 2026-03-31 A methodology for constructing instruction data
- 2026-03-31 SFT: supervised fine-tuning in detail
- 2026-03-28 The SwiGLU activation function
- 2026-03-28 RMSNorm: more efficient normalization
- 2026-03-28 The KV cache mechanism
- 2026-03-27 Flash Attention: principles and implementation
- 2026-03-26 GQA/MQA attention optimizations
- 2026-03-25 RoPE: rotary position embedding
- 2026-03-24 A deep dive into the LLaMA architecture
- 2026-03-23 Architectural evolution of the GPT series (GPT-1 to GPT-4)
- 2026-03-22 The Chinchilla law: compute-optimal training configurations
- 2026-03-21 Scaling laws: the science of model scale
- 2026-03-20 Tokenizer training: building your vocabulary
- 2026-03-19 Data deduplication and quality-filtering techniques
- 2026-03-18 Pre-training data cleaning and quality control
- 2026-03-17 Continual pre-training
- 2026-03-16 Masked language modeling (Masked LM)
- 2026-03-15 Autoregressive language modeling (Causal LM)
- 2026-03-14 An overview of pre-training
- 2026-03-13 Estimating model parameter counts and compute (FLOPs)
- 2026-03-12 Analyzing and calculating GPU memory usage
- 2026-03-11 Distributed training basics (DP/DDP)
- 2026-03-10 Mixed-precision training (FP16/BF16)
- 2026-03-09 Overfitting and regularization strategies
- 2026-03-08 Gradient descent and optimizers (Adam/AdamW)
- 2026-03-07 The relationship between batch size and learning rate
- 2026-03-06 Perplexity: measuring language model quality (a worked sketch follows this list)
- 2026-03-05 The cross-entropy loss as used in LLMs
- 2026-03-04 BPE/WordPiece/SentencePiece tokenization algorithms
- 2026-03-03 The softmax function and the temperature parameter (a sampling sketch follows this list)
- 2026-03-02 Layer Normalization and residual connections
- 2026-03-01 Encoders vs. decoders: differences and connections
- 2026-02-28 The Transformer architecture, end to end
- 2026-02-27 Positional encoding
- 2026-02-26 Multi-Head Attention
- 2026-02-25 Self-Attention in detail
- 2026-02-24 The essence of the attention mechanism
- 2026-02-23 Embeddings: from discrete tokens to continuous vectors
- 2026-02-23 Tokens and tokenization: how LLMs read text
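The perplexity topic (2026-03-06) is defined directly in terms of the cross-entropy topic next to it, so a minimal sketch of that relationship may help, assuming per-token log-probabilities are already in hand:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token).

    A model that assigns every token probability 1/k uniformly has
    perplexity exactly k, hence the "effective branching factor"
    reading of the metric.
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Toy example: a 4-token sequence where each token received
# probability 0.25 comes out at perplexity 4.0.
print(perplexity([math.log(0.25)] * 4))  # 4.0
```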
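Similarly, for the softmax-and-temperature topic (2026-03-03), a small sketch of how the temperature parameter reshapes a sampling distribution; the logits below are made up purely for illustration.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """softmax(z / T): T < 1 sharpens the distribution (greedier),
    T > 1 flattens it (more diverse sampling)."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]              # made-up logits for three tokens
for T in (0.5, 1.0, 2.0):
    print(T, softmax_with_temperature(logits, T).round(3))
```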
Papers
- 2026-04-10 Paper: Model Soups: Averaging Weights of Multiple Fine-tuned Models
- 2026-04-09 Paper: NEFTune: Noisy Embeddings Improve Instruction Finetuning
- 2026-04-08 Paper: DoRA: Weight-Decomposed Low-Rank Adaptation
- 2026-04-07 Paper: QLoRA: Efficient Finetuning of Quantized LLMs
- 2026-04-06 Paper: LoRA: Low-Rank Adaptation of Large Language Models
- 2026-04-05 Paper: Scaling Data-Constrained Language Models
- 2026-04-04 Paper: Curriculum Learning for LLMs
- 2026-04-03 Paper: Deduplication and Data Quality
- 2026-04-02 Paper: Textbooks Are All You Need II: phi-1.5
- 2026-04-01 Paper: Code Llama: Open Foundation Models for Code
- 2026-03-31 Paper: DeepSeek-Coder: When the Large Language Model Meets Programming
- 2026-03-31 Paper: StarCoder: May the Source Be with You
- 2026-03-28 Paper: Rejection Sampling and Best-of-N in Alignment
- 2026-03-27 Paper: UltraFeedback: Boosting Language Models with High-quality Feedback
- 2026-03-26 Paper: Zephyr: Direct Distillation of LM Alignment
- 2026-03-25 Paper: Orca: Progressive Learning from Complex Explanation Traces
- 2026-03-24 Paper: WizardLM: Empowering LLMs to Follow Complex Instructions (Evol-Instruct)
- 2026-03-23 Paper: SPIN: Self-Play Fine-Tuning
- 2026-03-22 Paper: Proximal Policy Optimization Algorithms (PPO)
- 2026-03-21 Paper: KTO: Model Alignment as Prospect Theoretic Optimization
- 2026-03-20 Paper: ORPO: Monolithic Preference Optimization without Reference Model
- 2026-03-19 Paper: Direct Preference Optimization (DPO)
- 2026-03-18 Paper: Constitutional AI: Harmlessness from AI Feedback
- 2026-03-17 Paper: LIMA: Less Is More for Alignment
- 2026-03-16 Paper: Stanford Alpaca: An Instruction-following LLaMA Model
- 2026-03-15 Paper: Self-Instruct: Aligning Language Models with Self-Generated Instructions
- 2026-03-14 Paper: Training language models to follow instructions with human feedback
- 2026-03-13 Paper: Scaling Laws for Neural Language Models
- 2026-03-12 Paper: RoFormer: Enhanced Transformer with Rotary Position Embedding
- 2026-03-11 Paper: GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
- 2026-03-10 Paper: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- 2026-03-09 Paper: PaLM: Scaling Language Modeling with Pathways
- 2026-03-08 Paper: DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
- 2026-03-07 Paper: Textbooks Are All You Need
- 2026-03-06 Paper: RWKV: Reinventing RNNs for the Transformer Era
- 2026-03-05 Paper: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- 2026-03-04 Paper: Mistral 7B
- 2026-03-03 Paper: LLaMA: Open and Efficient Foundation Language Models
- 2026-03-02 Paper: Training Compute-Optimal Large Language Models
- 2026-03-01 Paper: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- 2026-02-28 Paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- 2026-02-27 Paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- 2026-02-26 Paper: Language Models are Few-Shot Learners
- 2026-02-25 Paper: Language Models are Unsupervised Multitask Learners
- 2026-02-24 Paper: Improving Language Understanding by Generative Pre-Training
- 2026-02-23 Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- 2026-02-23 Paper: Attention Is All You Need