BASICS Workshop 2025: Talk Details

Morning

李建, Tsinghua University @ 8:30

Scaling Laws and Knowledge-Learning Mechanisms of Large Models from a Compression Perspective

Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, yet their internal mechanisms and related behaviors, such as scaling laws and hallucination, still lack a systematic theoretical explanation. Building on Kolmogorov complexity and Shannon information theory, this work revisits the classical relationship between compression and prediction, providing a new theoretical lens for understanding LLM behavior. By applying the Kolmogorov structure function and interpreting LLM compression as a two-part coding process, we give a theoretical account of how LLMs acquire and store information as model and data scale grow: LLMs first learn ubiquitous syntactic patterns, and then acquire progressively sparser, lower-frequency knowledge. Motivated by this perspective and by Heaps' law and Zipf's law, we propose a simplified yet representative Bayesian data-generation framework, the Syntax-Knowledge model. Within the Bayesian framework, we show that this model gives rise to the diverse learning behaviors and scaling laws of LLMs. In particular, our theoretical analysis offers intuitive explanations for data and model scaling laws, the dynamics of knowledge acquisition during training and fine-tuning, and factual hallucination in LLMs. Experimental results validate the theoretical predictions.
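As a quick illustration of the Zipf/Heaps intuition that the abstract appeals to (a generic toy, not the Syntax-Knowledge model itself), the following Python sketch samples a corpus whose "knowledge items" follow a Zipf distribution and tracks how many distinct items have been seen: the frequent head appears almost immediately, while rare tail knowledge accumulates slowly.

```python
import random
from collections import Counter

def sample_zipf_corpus(n_tokens, vocab=10_000, s=1.1, seed=0):
    """Draw tokens from a Zipf(s) distribution over `vocab` ranked items."""
    rng = random.Random(seed)
    weights = [1.0 / (r ** s) for r in range(1, vocab + 1)]
    return rng.choices(range(vocab), weights=weights, k=n_tokens)

def distinct_items_seen(corpus):
    """Heaps'-law-style curve: number of distinct items after each prefix."""
    seen, growth = set(), []
    for tok in corpus:
        seen.add(tok)
        growth.append(len(seen))
    return growth

corpus = sample_zipf_corpus(50_000)
growth = distinct_items_seen(corpus)
counts = Counter(corpus)
# The head of the Zipf distribution (common, "syntax-like" patterns) is seen
# almost immediately; the long tail (rare knowledge) is acquired slowly.
```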

熊英飞, Peking University @ 9:05

Deep Code Models Enhanced with Programming-Language Knowledge

Large language models show strong capabilities in tasks such as code writing, but conventional LLMs treat code mostly as plain text during training, ignoring programming-language knowledge such as syntax, types, and semantics. This talk presents how to design new code representations that guide neural networks to learn programming-language knowledge, so as to achieve better performance on code tasks. These techniques have already been adopted in several industrial models, including those of DeepSeek and Kuaishou.
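To make the contrast between "code as plain text" and a syntax-aware view concrete, here is a minimal, hypothetical illustration using Python's standard ast module; it is not the representation proposed in the talk.

```python
import ast

def syntax_view(src):
    """A syntax-aware view of source code: parse it into an AST and list the
    node types, exposing structure that a plain token stream does not."""
    return [type(node).__name__ for node in ast.walk(ast.parse(src))]

nodes = syntax_view("def f(x):\n    return x + 1")
# The AST makes explicit that `x + 1` is a BinOp inside a Return inside a
# FunctionDef, whereas a text view sees only a sequence of characters.
```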

孙晓明, Institute of Computing Technology, Chinese Academy of Sciences @ 9:40

Quantum Circuit Synthesis and Optimization

尹一通, Nanjing University @ 10:40

Theoretical Foundations of Computational Sampling

A central aim of theoretical computer science is to systematically explain and characterize what makes computational problems easy or hard, and thereby to understand the nature and limits of computation. Generating random samples from a prescribed probability distribution, the sampling problem, is a classical computational problem. As early as the dawn of the computer era, von Neumann and others proposed the Monte Carlo method, which uses random sampling to efficiently estimate physical quantities that are otherwise hard to determine. As computer science enters the new era of data and intelligence, efficient sampling from high-dimensional probability distributions has become a foundational computational need, playing a crucial role in all kinds of intelligent computing tasks. Sampling from high-dimensional Gibbs distributions has drawn particular attention in theoretical computer science because its computational complexity exhibits criticality reminiscent of physical phase transitions. This talk systematically explores the "computational phase transition" of sampling from Gibbs distributions and the "local lemma" for sampling solutions of constraint satisfaction problems; these characterize, respectively, the computational criticality of sampling with pairwise and with high-order interactions among variables. The talk also discusses parallelization and localization of Markov chain Monte Carlo (MCMC) sampling, which lays theoretical foundations for MCMC methods in the big-data era.
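A minimal example of sampling with pairwise interactions is Glauber dynamics for the Ising model, the kind of local MCMC chain whose mixing behavior exhibits the computational phase transitions discussed in the talk. The sketch below is a generic textbook chain, not an algorithm from the talk.

```python
import math
import random

def glauber_ising_cycle(n, beta, steps, seed=0):
    """Glauber dynamics for the Ising model on an n-cycle: repeatedly pick a
    uniformly random spin and resample it from its conditional Gibbs
    distribution given its two neighbors."""
    rng = random.Random(seed)
    spins = [rng.choice([-1, 1]) for _ in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)
        field = spins[(i - 1) % n] + spins[(i + 1) % n]
        # P(spin_i = +1 | neighbors) = e^{beta*field} / (e^{beta*field} + e^{-beta*field})
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * field))
        spins[i] = 1 if rng.random() < p_plus else -1
    return spins

sample = glauber_ising_cycle(n=100, beta=0.5, steps=20_000)
```

On the cycle the chain mixes rapidly at every temperature; on denser graphs, whether such local dynamics mix in polynomial time is exactly the kind of question governed by the computational phase transition.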

张驰豪, Shanghai Jiao Tong University @ 11:15

On the complexity of sampling from non-log-concave distributions

Sampling from a d-dimensional distribution \(\mu\) with density \(p_{\mu}(x) \propto e^{-V(x)}\) is a central problem in many areas, including theoretical computer science, statistical physics, and machine learning. It is well-known that when the potential function \(V(x)\) is convex (or equivalently, when \(p_{\mu}\) is log-concave), or more generally, when \(\mu\) satisfies good isoperimetric inequalities, efficient sampling algorithms exist in various computational models. A common belief is that the sampling task becomes more difficult when \(V(x)\) is non-convex. On the other hand, data-based algorithms (e.g., denoising diffusion probabilistic models) developed in the machine learning community are very successful in practice when dealing with highly non-log-concave distributions (such as in image generation), and provide new insights into designing efficient sampling algorithms.

In this talk, we will start with a general tight (exponential) sampling complexity bound for any non-log-concave distribution \(\mu\) satisfying mild regularity conditions. Then, we will show how a common strengthening of these regularity conditions leads to an efficient (polynomial) sampling algorithm. Finally, we will discuss future directions for understanding the complexity of sampling from general distributions.
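For intuition, the unadjusted Langevin algorithm is one standard scheme for sampling from densities \(p_{\mu}(x) \propto e^{-V(x)}\); run on a non-convex double-well potential, it illustrates the non-log-concave regime the talk studies. This is a generic sketch, not an algorithm from the talk.

```python
import math
import random

def langevin_chain(grad_V, x0, step=1e-2, n_steps=5_000, seed=0):
    """Unadjusted Langevin algorithm in 1D, targeting p(x) proportional to
    exp(-V(x)): iterate x <- x - step * V'(x) + sqrt(2 * step) * N(0, 1)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = x - step * grad_V(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return x

# Non-log-concave target: double-well V(x) = (x^2 - 1)^2, V'(x) = 4x(x^2 - 1).
grad_V = lambda x: 4.0 * x * (x * x - 1.0)
xs = [langevin_chain(grad_V, x0=0.0, seed=s) for s in range(200)]
# Samples concentrate near the two modes x = -1 and x = +1; with a higher
# barrier between the wells, crossing times (and hence mixing) blow up.
```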

Afternoon

陈林, Zhejiang University @ 13:00

Long Arithmetic Progressions in Sumsets and Subset Sums: Constructive Proofs and Efficient Witnesses

The existence of long arithmetic progressions in sumsets and subset sums has been studied extensively in additive combinatorics. These results play a central role in recent progress on fundamental problems in theoretical computer science, including Knapsack and Subset Sum. The non-constructiveness of the relevant additive-combinatorics results, however, limits their application in algorithms. In this paper, we provide constructive proofs for the finite addition theorems [Sárközy'89 '94], which are fundamental results in additive combinatorics concerning the existence of long arithmetic progressions in sumsets and subset sums. Our constructive proofs yield a near-linear-time algorithm that returns an arithmetic progression explicitly and, moreover, for each term of the progression, also returns its representation as a sum of elements of the base set.

As an application, we obtain an \(\tilde{O}(n)\)-time algorithm for the search version of dense Subset Sum. Another application of our result is Unbounded Subset Sum, where each input integer can be used an unbounded number of times. A classic result on the Frobenius problem [Erdős and Graham '72] implies that for all \(t \geq 2a^2_{\max}/n\), the decision version can be solved trivially in linear time; it remains unknown whether the search version can be solved in the same time. Our result implies that for all \(t \geq ca^2_{\max}/n\) for some constant \(c\), a solution for Unbounded Subset Sum can be obtained in \(O(n \log a_{\max})\) time.
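For contrast with the near-linear-time result, a textbook dynamic program solves the search version of Unbounded Subset Sum in pseudo-polynomial O(nt) time while also producing a witness. The instance {6, 9, 20} below uses the classical fact that 43 is the Frobenius number of that set, i.e., the largest non-representable target.

```python
def unbounded_subset_sum(a, t):
    """Search version of Unbounded Subset Sum via dynamic programming in
    O(n * t) time: parent[v] records one element used to reach value v, so a
    witness multiset can be reconstructed. A pseudo-polynomial baseline, not
    the O(n log a_max) algorithm from the talk."""
    reachable = [False] * (t + 1)
    parent = [0] * (t + 1)
    reachable[0] = True
    for v in range(1, t + 1):
        for x in a:
            if x <= v and reachable[v - x]:
                reachable[v], parent[v] = True, x
                break
    if not reachable[t]:
        return None
    witness, v = [], t
    while v > 0:
        witness.append(parent[v])
        v -= parent[v]
    return witness

# 43 is the Frobenius number of {6, 9, 20}: no multiset of these sums to 43.
assert unbounded_subset_sum([6, 9, 20], 43) is None
witness = unbounded_subset_sum([6, 9, 20], 44)  # a valid witness exists
```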

李绿周, Sun Yat-sen University @ 13:35

Algorithm Design Based on Quantum Walks

The usefulness of quantum computing rests on quantum algorithms. Compared with the classical setting, however, quantum algorithm design lacks a methodology: there are few mature recipes to follow. Fortunately, over the past twenty-odd years quantum walks have developed into an important model for quantum algorithm design, and many quantum algorithms with speedup advantages are based on quantum walks. After reviewing the relevant historical development, this talk reports our group's recent preliminary results in this direction, focusing on quantum-walk-based algorithm design for various search problems on graphs, as well as on derandomization.
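For a concrete feel of the model, the following sketch simulates a discrete-time coined (Hadamard) quantum walk on a cycle, the basic primitive behind many quantum-walk search algorithms; it is a generic simulation, not an algorithm from the talk.

```python
def coined_quantum_walk(n_sites, steps):
    """Discrete-time coined quantum walk on an n-site cycle (pure Python).
    State: psi[c][v] = amplitude of coin state c at vertex v. Each step
    applies a Hadamard coin, then shifts coin-0 amplitude one vertex left
    and coin-1 amplitude one vertex right."""
    s = 1.0 / 2 ** 0.5
    psi = [[0j] * n_sites for _ in range(2)]
    psi[0][0], psi[1][0] = s, 1j * s  # balanced (symmetric) initial coin
    for _ in range(steps):
        up, down = psi
        coin0 = [s * (u + d) for u, d in zip(up, down)]  # Hadamard coin
        coin1 = [s * (u - d) for u, d in zip(up, down)]
        psi = [coin0[1:] + coin0[:1],    # coin 0 shifts left on the cycle
               coin1[-1:] + coin1[:-1]]  # coin 1 shifts right
    # Position distribution: sum |amplitude|^2 over the coin register.
    return [abs(a) ** 2 + abs(b) ** 2 for a, b in zip(*psi)]

dist = coined_quantum_walk(n_sites=101, steps=40)
```

Unlike a classical random walk, whose position spreads diffusively (standard deviation on the order of the square root of the number of steps), the coined walk spreads ballistically (linearly in the number of steps); this faster spreading underlies the quadratic speedups of quantum-walk search.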

张民, East China Normal University @ 14:10

Safeguarding Neural Network-Controlled Systems via Formal Methods: From Safety-by-Design to Runtime Assurance

Neural networks (NNs) exhibit remarkable potential in decision-making and control systems. While NNs can be trained by sophisticated Deep Reinforcement Learning (DRL) techniques to achieve optimal system performance under various constraints, a significant concern persists: the lack of provable safety guarantees for the trained decision-making models. The intrinsic complexity and opacity of these models make it profoundly challenging to rigorously guarantee their safety in the various environments that host them, including the systems they control. Drawing on our recent work (e.g., CAV'22, CAV'24, and NeurIPS'23), we contend that formal methods are crucial for developing neural network controllers that are not only robust but also certifiable, thereby ensuring system safety from training through deployment. We demonstrate that integrating formal methods into the learning and decision-making processes is essential for providing a comprehensive safety guarantee for the controlled systems across their entire design, training, and execution lifecycle.
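One simple runtime-assurance pattern is a "shield" that monitors a learned controller and falls back to a certified controller whenever the proposed action would leave a verified safe set. The toy below is a hypothetical 1D sketch of this general idea, not the methods of the cited papers.

```python
def shield(controller, dynamics, safe, fallback):
    """Runtime-assurance 'shield': accept the learned controller's action
    only if the predicted next state stays inside a verified safe set;
    otherwise apply a certified fallback controller."""
    def controlled(state):
        action = controller(state)
        if safe(dynamics(state, action)):
            return action
        return fallback(state)
    return controlled

# Hypothetical 1D toy: keep x in [-1, 1] under dynamics x' = x + a.
dynamics = lambda x, a: x + a
safe = lambda x: -1.0 <= x <= 1.0
learned = lambda x: 0.8          # an aggressive learned action
fallback = lambda x: -0.5 * x    # a certified contracting controller
ctrl = shield(learned, dynamics, safe, fallback)
```

From state 0.0 the learned action 0.8 is accepted (next state 0.8 is safe); from state 0.5 it would push the system to 1.3, so the shield overrides it with the fallback action.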

黄增峰, Fudan University @ 15:10

Simple and Optimal Algorithms for Heavy Hitters and Frequency Moments in Distributed Models

We consider the problems of distributed heavy hitters and frequency moments in both the coordinator model and the distributed tracking model, and we present simple and optimal algorithms for heavy hitters and frequency moment estimation in these models. For \(\ell_p\) heavy hitters in the coordinator model, our algorithm requires only one round and uses \(\tilde{O}(k^{p-1}/\epsilon^p)\) bits of communication. For \(p > 2\), this is the first near-optimal result. By combining our algorithm with the standard recursive sketching technique, we obtain a near-optimal two-round algorithm for \(F_p\) in the coordinator model, matching a significant result of recent work by Esfandiari et al. (STOC 2024); our algorithm and analysis are much simpler and save logarithmic factors in the cost. Thanks to the simplicity of our heavy-hitter algorithms, we can adapt them to the distributed tracking model with only a \(\texttt{polylog}(n)\) increase in communication, giving the first near-optimal heavy-hitter algorithm in that model. Applying the recursive sketching technique then yields the first near-optimal algorithm for \(F_p\) in the distributed tracking model for all \(p \geq 2\). Even for \(F_2\), our result improves upon the bounds established by Cormode, Muthukrishnan, and Yi (SODA 2008) and Woodruff and Zhang (STOC 2012), nearly matching the existing lower bound for the first time.
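A classical single-machine building block for this problem is the Misra-Gries summary, which finds every item whose frequency exceeds m/k in a stream of length m using only k-1 counters. The distributed protocols in the talk are far more involved, but this sketch conveys the basic object being estimated.

```python
def misra_gries(stream, k):
    """Misra-Gries summary with k-1 counters: every item occurring more than
    len(stream)/k times is guaranteed to survive in the returned dict
    (possibly with an undercounted value)."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement all counters; drop those that reach zero.
            for y in list(counters):
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    return counters

stream = ['a'] * 50 + ['b'] * 30 + ['c'] * 5 + ['d'] * 5  # m = 90
summary = misra_gries(stream, k=3)
# 'a' occurs 50 > 90/3 = 30 times, so it must appear in the summary.
```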

宋富, Institute of Software, Chinese Academy of Sciences @ 15:45

A Formally Verified Procedure for Width Inference in FIRRTL

FIRRTL is an intermediate representation language for Register Transfer Level (RTL) hardware designs. In FIRRTL programs, the bit widths of some components may not be given explicitly, so they must be inferred during compilation. In mainstream FIRRTL compilers such as firtool, width inference is conducted by a compilation pass called InferWidths, which may fail even for simple FIRRTL programs. In this talk, we investigate the width inference problem for FIRRTL programs. We show that if the constraint system obtained from a FIRRTL program is satisfiable, there must exist a unique least solution. Based on this result, we propose a complete procedure for solving the width inference problem, which can handle programs on which firtool fails. We implement the procedure in Rocq and prove its correctness. From the Rocq implementation, we extract an OCaml implementation, which is the first formally verified InferWidths pass. Extensive experiments demonstrate that it solves more instances than the official InferWidths pass in firtool, in less time.
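Width inference can be viewed as computing the least solution of a system of lower-bound constraints on widths; when a least solution exists, Kleene-style fixed-point iteration from all-zero widths finds it. The toy below (with hypothetical constraint forms and an arbitrary divergence cap) sketches that idea; it is not the verified Rocq procedure from the talk.

```python
def least_solution(n_vars, constraints, max_width=64):
    """Kleene fixed-point iteration for width constraints of two forms:
      (i, None, c)  meaning  w[i] >= c          (constant lower bound)
      (i, j, d)     meaning  w[i] >= w[j] + d   (dependency on another width)
    Starting from zero and raising widths until stable yields the unique
    least solution when one exists; the max_width cap flags constraint
    systems whose least fixed point diverges (e.g. w >= w + 1)."""
    w = [0] * n_vars
    changed = True
    while changed:
        changed = False
        for (i, j, d) in constraints:
            lo = d if j is None else w[j] + d
            if w[i] < lo:
                if lo > max_width:
                    return None  # no bounded solution under the cap
                w[i] = lo
                changed = True
    return w

# w0 >= 4,  w1 >= w0 + 1,  w2 >= w1,  w2 >= w0 + 2
cons = [(0, None, 4), (1, 0, 1), (2, 1, 0), (2, 0, 2)]
widths = least_solution(3, cons)  # least solution: [4, 5, 6]
```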