Daily Papers Arch&EAI

2026-02-26 11:21
Snapshot: 20260226_1121
TimeBlind: A Spatio-Temporal Compositionality Benchmark for Video LLMs
Authors: Baiqi Li, Kangyi Zhao, Ce Zhang, Chancharik Mitra, Jean de Dieu Nyandwi, Gedas Bertasius
First: 2026-01-30T20:21:46+00:00 · Latest: 2026-02-25T18:57:52+00:00
Comments: For code and data, see https://baiqi-li.github.io/timeblind_project/
Abstract
Fine-grained spatio-temporal understanding is essential for video reasoning and embodied AI. Yet, while Multimodal Large Language Models (MLLMs) master static semantics, their grasp of temporal dynamics remains brittle. We present TimeBlind, a diagnostic benchmark for compositional spatio-temporal understanding. Inspired by cognitive science, TimeBlind categorizes fine-grained temporal understanding into three levels: recognizing atomic events, characterizing event properties, and reasoning about event interdependencies. Unlike benchmarks that conflate recognition with temporal reasoning, TimeBlind leverages a minimal-pairs paradigm: video pairs share identical static visual content but differ solely in temporal structure, utilizing complementary questions to neutralize language priors. Evaluating over 20 state-of-the-art MLLMs (e.g., GPT-5, Gemini 3 Pro) on 600 curated instances (2400 video-question pairs) reveals that the Instance Accuracy (correctly distinguishing both videos in a pair) of the best-performing MLLM is only 48.2%, far below human performance (98.2%). These results demonstrate that even frontier models rely heavily on static visual shortcuts rather than genuine temporal logic, positioning TimeBlind as a vital diagnostic tool for next-generation video understanding. Dataset and code are available at https://baiqi-li.github.io/timeblind_project/ .
Summary / 总结
Fine-grained spatio-temporal understanding is essential for video reasoning and embodied AI.
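Below is a minimal sketch of how the pair-level Instance Accuracy described in the abstract could be computed: a minimal pair counts as correct only if the model answers correctly for both of its videos. The record layout (pair_id, pred, gold fields) is a hypothetical assumption for illustration, not the benchmark's actual data format.

# Minimal sketch (assumed data layout) of pair-level Instance Accuracy.
from collections import defaultdict

def instance_accuracy(records):
    """records: iterable of dicts with hypothetical keys
    'pair_id', 'pred', and 'gold' (one record per video-question)."""
    per_pair = defaultdict(list)
    for r in records:
        per_pair[r["pair_id"]].append(r["pred"] == r["gold"])
    correct_pairs = sum(all(v) for v in per_pair.values())
    return correct_pairs / max(len(per_pair), 1)

# Example: one pair fully correct, one pair only half correct -> 0.5
demo = [
    {"pair_id": 0, "pred": "A", "gold": "A"},
    {"pair_id": 0, "pred": "B", "gold": "B"},
    {"pair_id": 1, "pred": "A", "gold": "A"},
    {"pair_id": 1, "pred": "A", "gold": "B"},
]
print(instance_accuracy(demo))  # 0.5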
Recursive Belief Vision Language Action Models
Authors: Vaidehi Bagaria, Bijo Sebastian, Nirav Kumar Patel
First: 2026-02-24T08:02:16+00:00 · Latest: 2026-02-25T17:38:24+00:00
Abstract
Vision-language-action models must enable agents to execute long-horizon tasks under partial observability. However, most existing approaches remain observation-driven, relying on short context windows or repeated queries to vision-language models (VLMs). This leads to loss of task progress, action repetition under perceptual aliasing, and high inference latency. While semantic grounding is important, long-horizon manipulation fundamentally requires persistent, action-conditioned state representations. Current VLAs lack such representations and exhibit limited temporal and physical reasoning, making them ill-suited for multi-stage control. This paper introduces RB-VLA, a belief-centric architecture trained with self-supervised world-model objectives that maintains a compact latent state encoding task-relevant history, dynamics, and object interactions. Queried once per task, the VLM provides high-level intent, while the belief tracks task progress and enables phase-aware, causally grounded control under partial observability without storing raw observations or scaling memory with time. The belief and intent jointly condition a diffusion policy for robust closed-loop execution. RB-VLA outperforms prior VLAs on long-horizon benchmarks, achieving 52.5 percent and 37.5 percent higher success rates on multi-stage pick-and-place and stacking tasks, respectively, compared to pi_0. It also reduces inference latency by up to five times relative to baselines and eliminates memory growth across timesteps observed in existing VLAs. Ablations show the belief module is the primary driver of performance, increasing success rates from 32.5 percent without belief to 77.5 percent with belief.
Summary / 总结
Vision-language-action models must enable agents to execute long-horizon tasks under partial observability.
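A hedged sketch of the belief-centric idea described above: an action-conditioned recurrent state summarizes history in a fixed-size vector and, together with a once-per-task intent, would condition the policy. The abstract does not specify the belief parameterization; the GRU-style update, all dimensions, and the random weights below are illustrative assumptions, not the paper's architecture.

# Hedged sketch: fixed-size, action-conditioned belief update (assumed GRU-style form).
import numpy as np

rng = np.random.default_rng(0)
D_OBS, D_ACT, D_BEL = 32, 7, 64
W_z = rng.normal(0, 0.1, (D_BEL, D_BEL + D_OBS + D_ACT))
W_h = rng.normal(0, 0.1, (D_BEL, D_BEL + D_OBS + D_ACT))

def update_belief(belief, obs_feat, prev_action):
    """One recurrent step: the belief summarizes history without storing raw frames."""
    x = np.concatenate([belief, obs_feat, prev_action])
    z = 1.0 / (1.0 + np.exp(-W_z @ x))          # update gate
    cand = np.tanh(W_h @ x)                     # candidate state
    return (1 - z) * belief + z * cand

belief = np.zeros(D_BEL)
intent = rng.normal(size=16)                    # high-level intent, queried once per task
for t in range(5):
    obs = rng.normal(size=D_OBS)
    act = rng.normal(size=D_ACT)
    belief = update_belief(belief, obs, act)
# A policy would then be conditioned on [belief, intent]; memory stays fixed-size.
print(belief.shape)  # (64,)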
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models
Authors: Jingxuan Zhang, Yunta Hsieh, Zhongwei Wan, Haokun Lin, Xin Wang, Ziqi Wang, Yingtie Lei, Mi Zhang
First: 2026-02-23T19:55:54+00:00 · Latest: 2026-02-25T17:11:08+00:00
Comments: CVPR2026
Abstract
Vision-language-action (VLA) models unify perception, language, and control for embodied agents but face significant challenges in practical deployment due to rapidly increasing compute and memory demands, especially as models scale to longer horizons and larger backbones. To address these bottlenecks, we introduce QuantVLA, a training-free post-training quantization (PTQ) framework that, to our knowledge, is the first PTQ approach for VLA systems and the first to successfully quantize a diffusion transformer (DiT) action head. QuantVLA incorporates three scale-calibrated components: (1) a selective quantization layout that integerizes all linear layers in both the language backbone and the DiT while keeping attention projections in floating point to preserve the original operator schedule; (2) attention temperature matching, a lightweight per-head scaling mechanism that stabilizes attention logits and is folded into the dequantization scales at inference; and (3) output head balancing, a per-layer residual interface calibration that mitigates post-projection energy drift. The framework requires no additional training, uses only a small unlabeled calibration buffer, and supports integer kernels for low-bit weights and activations while leaving the architecture unchanged. Across representative VLA models on LIBERO, QuantVLA exceeds the task success rates of full-precision baselines, achieves about 70% relative memory savings on the quantized components, and delivers a 1.22x speedup in end-to-end inference latency, providing a practical pathway toward scalable low-bit embodied intelligence under strict compute, memory, and power constraints.
Summary / 总结
Vision-language-action (VLA) models unify perception, language, and control for embodied agents but face significant challenges in practical deployment due to rapidly increasing compute and memory demands, especially as models scale to longer horizons and larger backbones.
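A minimal sketch of one ingredient such a post-training quantization pipeline builds on: scale-calibrated symmetric weight quantization with integer storage and a dequantization scale into which a per-head temperature could be folded, as the abstract suggests. The per-channel int8 scheme and the temperature hook are illustrative choices, not the paper's exact configuration.

# Minimal sketch: per-channel symmetric int8 weight quantization with calibrated scales.
import numpy as np

def quantize_per_channel(W, n_bits=8):
    qmax = 2 ** (n_bits - 1) - 1                       # 127 for int8
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)
    W_int = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return W_int, scale

def dequantize(W_int, scale, head_temperature=1.0):
    # A per-head temperature (attention temperature matching) could be folded
    # into the dequantization scale here, per the abstract's description.
    return W_int.astype(np.float32) * scale * head_temperature

W = np.random.default_rng(0).normal(size=(4, 16)).astype(np.float32)
W_int, s = quantize_per_channel(W)
err = np.abs(W - dequantize(W_int, s)).max()
print(f"max abs quantization error: {err:.4f}")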
FlowCorrect: Efficient Interactive Correction of Generative Flow Policies for Robotic Manipulation
Authors: Edgar Welte, Yitian Shi, Rosa Wolf, Maximillian Gilles, Rania Rayyes
First: 2026-02-25T16:06:49+00:00 · Latest: 2026-02-25T16:06:49+00:00
Comments: 8 pages, 5 figures
Abstract
Generative manipulation policies can fail catastrophically under deployment-time distribution shift, yet many failures are near-misses: the robot reaches almost-correct poses and would succeed with a small corrective motion. We present FlowCorrect, a deployment-time correction framework that converts near-miss failures into successes using sparse human nudges, without full policy retraining. During execution, a human provides brief corrective pose nudges via a lightweight VR interface. FlowCorrect uses these sparse corrections to locally adapt the policy, improving actions without retraining the backbone while preserving the model's performance on previously learned scenarios. We evaluate on a real-world robot across three tabletop tasks: pick-and-place, pouring, and cup uprighting. With a low correction budget, FlowCorrect improves success on hard cases by 85% while preserving performance on previously solved scenarios. The results clearly demonstrate that FlowCorrect learns from only a few corrective demonstrations and enables fast, sample-efficient, incremental human-in-the-loop correction of generative visuomotor policies at deployment time in real-world robotics.
Summary / 总结
Generative manipulation policies can fail catastrophically under deployment-time distribution shift, yet many failures are near-misses: the robot reaches almost-correct poses and would succeed with a small corrective motion.
World Guidance: World Modeling in Condition Space for Action Generation
Authors: Yue Su, Sijin Chen, Haixin Shi, Mingyu Liu, Zhengshen Zhang, Ningyuan Huang, Weiheng Zhong, Zhengbang Zhu, Yuxiao Liu, Xihui Liu
First: 2026-02-25T15:27:09+00:00 · Latest: 2026-02-25T15:27:09+00:00
Comments: Project Page: https://selen-suyue.github.io/WoGNet/
Abstract
Leveraging future observation modeling to facilitate action generation presents a promising avenue for enhancing the capabilities of Vision-Language-Action (VLA) models. However, existing approaches struggle to strike a balance between maintaining efficient, predictable future representations and preserving sufficient fine-grained information to guide precise action generation. To address this limitation, we propose WoG (World Guidance), a framework that maps future observations into compact conditions and injects them into the action inference pipeline. The VLA is then trained to simultaneously predict these compressed conditions alongside future actions, thereby achieving effective world modeling within the condition space for action inference. We demonstrate that modeling and predicting this condition space not only facilitates fine-grained action generation but also exhibits superior generalization capabilities. Moreover, it learns effectively from a substantial amount of human manipulation video. Extensive experiments across both simulation and real-world environments validate that our method significantly outperforms existing methods based on future prediction. Project page is available at: https://selen-suyue.github.io/WoGNet/
Summary / 总结
Leveraging future observation modeling to facilitate action generation presents a promising avenue for enhancing the capabilities of Vision-Language-Action (VLA) models.
Are Foundation Models the Route to Full-Stack Transfer in Robotics?
Authors: Freek Stulp, Samuel Bustamante, João Silvério, Alin Albu-Schäffer, Jeannette Bohg, Shuran Song
First: 2026-02-25T15:19:44+00:00 · Latest: 2026-02-25T15:19:44+00:00
Comments: 12 pages, 4 figures
Abstract
In humans and robots alike, transfer learning occurs at different levels of abstraction, from high-level linguistic transfer to low-level transfer of motor skills. In this article, we provide an overview of the impact that foundation models and transformer networks have had on these different levels, bringing robots closer than ever to "full-stack transfer". Considering LLMs, VLMs and VLAs from a robotic transfer learning perspective allows us to highlight recurring concepts for transfer, beyond specific implementations. We also consider the challenges of data collection and transfer benchmarks for robotics in the age of foundation models. Are foundation models the route to full-stack transfer in robotics? Our expectation is that they will certainly stay on this route as a key technology.
Summary / 总结
In humans and robots alike, transfer learning occurs at different levels of abstraction, from high-level linguistic transfer to low-level transfer of motor skills.
Compact Circulant Layers with Spectral Priors
Authors: Joseph Margaryan, Thomas Hamelryck
First: 2026-02-25T14:48:25+00:00 · Latest: 2026-02-25T14:48:25+00:00
Abstract
Critical applications in areas such as medicine, robotics and autonomous systems require compact (i.e., memory efficient), uncertainty-aware neural networks suitable for edge and other resource-constrained deployments. We study compact spectral circulant and block-circulant-with-circulant-blocks (BCCB) layers: FFT-diagonalizable circular convolutions whose weights live directly in the real FFT (RFFT) half (1D) or half-plane (2D). Parameterizing filters in the frequency domain lets us impose simple spectral structure, perform structured variational inference in a low-dimensional weight space, and calculate exact layer spectral norms, enabling inexpensive global Lipschitz bounds and margin-based robustness diagnostics. By placing independent complex Gaussians on the Hermitian support we obtain a discrete instance of the spectral representation of stationary kernels, inducing an exact stationary Gaussian-process prior over filters on the discrete circle/torus. We exploit this to define a practical spectral prior and a Hermitian-aware low-rank-plus-diagonal variational posterior in real coordinates. Empirically, spectral circulant/BCCB layers are effective compact building blocks in both (variational) Bayesian and point estimate regimes: compact Bayesian neural networks on MNIST->Fashion-MNIST, variational heads on frozen CIFAR-10 features, and deterministic ViT projections on CIFAR-10/Tiny ImageNet; spectral layers match strong baselines while using substantially fewer parameters and with tighter Lipschitz certificates.
Summary / 总结
Critical applications in areas such as medicine, robotics and autonomous systems require compact (i.e., memory efficient), uncertainty-aware neural networks suitable for edge and other resource-constrained deployments.
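A minimal sketch of a 1D spectral circulant layer of the kind the abstract describes: the filter is parameterized directly in the RFFT half-spectrum, the layer applies a circular convolution via FFT, and the exact layer spectral norm is simply the maximum filter magnitude. Shapes and initialization are illustrative, not the paper's configuration.

# Minimal sketch: circulant layer parameterized in the RFFT domain.
import numpy as np

N = 16
rng = np.random.default_rng(0)
H = rng.normal(size=N // 2 + 1) + 1j * rng.normal(size=N // 2 + 1)
H[0] = H[0].real                       # DC bin real for a real-valued filter
H[-1] = H[-1].real                     # Nyquist bin real (even N)

def circulant_layer(x, H):
    """Circular convolution of a length-N signal with the filter whose RFFT is H."""
    return np.fft.irfft(np.fft.rfft(x) * H, n=N)

x = rng.normal(size=N)
y = circulant_layer(x, H)
spectral_norm = np.abs(H).max()        # exact operator norm of the circulant map
print(y.shape, round(float(spectral_norm), 3))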
Adaptive Diffusion Constrained Sampling for Bimanual Robot Manipulation
Authors: Haolei Tong, Yuezhe Zhang, Sophie Lueth, Georgia Chalvatzaki
Venue: ICRA 2026
First: 2025-05-19T19:12:29+00:00 · Latest: 2026-02-25T14:04:56+00:00
Comments: Accepted by IEEE International Conference on Robotics and Automation 2026 (ICRA 2026)
Abstract
Coordinated multi-arm manipulation requires satisfying multiple simultaneous geometric constraints across high-dimensional configuration spaces, which poses a significant challenge for traditional planning and control methods. In this work, we propose Adaptive Diffusion Constrained Sampling (ADCS), a generative framework that flexibly integrates both equality (e.g., relative and absolute pose constraints) and structured inequality constraints (e.g., proximity to object surfaces) into an energy-based diffusion model. Equality constraints are modeled using dedicated energy networks trained on pose differences in Lie algebra space, while inequality constraints are represented via Signed Distance Functions (SDFs) and encoded into learned constraint embeddings, allowing the model to reason about complex spatial regions. A key innovation of our method is a Transformer-based architecture that learns to weight constraint-specific energy functions at inference time, enabling flexible and context-aware constraint integration. Moreover, we adopt a two-phase sampling strategy that improves precision and sample diversity by combining Langevin dynamics with resampling and density-aware re-weighting. Experimental results on dual-arm manipulation tasks show that ADCS significantly improves sample diversity and generalization across settings demanding precise coordination and adaptive constraint handling.
Summary / 总结
Coordinated multi-arm manipulation requires satisfying multiple simultaneous geometric constraints across high-dimensional configuration spaces, which poses a significant challenge for traditional planning and control methods.
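A hedged sketch of the general mechanism described above: sample configurations from a weighted combination of constraint energies with Langevin dynamics. The quadratic toy energies and fixed weights below stand in for the paper's learned energy networks and Transformer-based weighting, which are not reproduced here.

# Hedged sketch: Langevin sampling on a weighted sum of toy constraint energies.
import numpy as np

rng = np.random.default_rng(0)

def energy_pose(q):      return 0.5 * np.sum((q - 1.0) ** 2)          # toy equality-style term
def energy_proximity(q): return 0.5 * np.sum(np.maximum(q, 0) ** 2)   # toy inequality-style term

def grad(f, q, eps=1e-5):
    g = np.zeros_like(q)
    for i in range(q.size):
        e = np.zeros_like(q); e[i] = eps
        g[i] = (f(q + e) - f(q - e)) / (2 * eps)
    return g

weights = np.array([0.7, 0.3])          # would come from a context-aware weighting module
q = rng.normal(size=4)
step = 1e-2
for _ in range(500):                    # Langevin dynamics on the combined energy
    g = weights[0] * grad(energy_pose, q) + weights[1] * grad(energy_proximity, q)
    q = q - step * g + np.sqrt(2 * step) * rng.normal(size=q.shape)
print(np.round(q, 2))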
GFPL: Generative Federated Prototype Learning for Resource-Constrained and Data-Imbalanced Vision Task
Authors: Shiwei Lu, Yuhang He, Jiashuo Li, Qiang Wang, Yihong Gong
First: 2026-02-25T12:57:45+00:00 · Latest: 2026-02-25T12:57:45+00:00
Abstract
Federated learning (FL) facilitates the secure utilization of decentralized images, advancing applications in medical image recognition and autonomous driving. However, conventional FL faces two critical challenges in real-world deployment: ineffective knowledge fusion caused by model updates biased toward majority-class features, and prohibitive communication overhead due to frequent transmissions of high-dimensional model parameters. Inspired by the human brain's efficiency in knowledge integration, we propose a novel Generative Federated Prototype Learning (GFPL) framework to address these issues. Within this framework, a prototype generation method based on a Gaussian Mixture Model (GMM) captures the statistical information of class-wise features, while a prototype aggregation strategy using Bhattacharyya distance effectively fuses semantically similar knowledge across clients. In addition, these fused prototypes are leveraged to generate pseudo-features, thereby mitigating feature distribution imbalance across clients. To further enhance feature alignment during local training, we devise a dual-classifier architecture, optimized via a hybrid loss combining Dot Regression and Cross-Entropy. Extensive experiments on benchmarks show that GFPL improves model accuracy by 3.6% under imbalanced data settings while maintaining low communication cost.
Summary / 总结
Federated learning (FL) facilitates the secure utilization of decentralized images, advancing applications in medical image recognition and autonomous driving.
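A minimal sketch of the Bhattacharyya distance between two Gaussian class prototypes, the similarity measure the abstract names for prototype aggregation. Prototypes are assumed to be (mean, covariance) pairs; how fused prototypes are actually merged in the paper is not reproduced here.

# Minimal sketch: Bhattacharyya distance between Gaussian prototypes.
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term2 = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term1 + term2

mu_a, cov_a = np.zeros(3), np.eye(3)
mu_b, cov_b = np.ones(3) * 0.1, np.eye(3) * 1.2
d = bhattacharyya(mu_a, cov_a, mu_b, cov_b)
# Prototypes with a small distance d could be fused across clients.
print(round(float(d), 4))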
MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation
Authors: Iman Ahmadi, Mehrshad Taji, Arad Mahdinezhad Kashani, AmirHossein Jadidi, Saina Kashani, Babak Khalaj
First: 2026-02-18T21:28:56+00:00 · Latest: 2026-02-25T11:49:07+00:00
Abstract
Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine-tuning, or prompt tuning, and often operate in an open-loop manner without robust environmental feedback, making them fragile in dynamic settings. MALLVI presents a Multi-Agent Large Language and Vision framework that enables closed-loop, feedback-driven robotic manipulation. Given a natural language instruction and an image of the environment, MALLVI generates executable atomic actions for a robot manipulator. After action execution, a Vision Language Model (VLM) evaluates environmental feedback and decides whether to repeat the process or proceed to the next step. Rather than using a single model, MALLVI coordinates specialized agents, Decomposer, Localizer, Thinker, and Reflector, to manage perception, localization, reasoning, and high-level planning. An optional Descriptor agent provides visual memory of the initial state. The Reflector supports targeted error detection and recovery by reactivating only relevant agents, avoiding full replanning. Experiments in simulation and real-world settings show that iterative closed-loop multi-agent coordination improves generalization and increases success rates in zero-shot manipulation tasks. Code available at https://github.com/iman1234ahmadi/MALLVI .
Summary / 总结
Task planning for robotic manipulation with large language models (LLMs) is an emerging area.
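A hedged sketch of the closed-loop coordination described above. The agent functions are stubs standing in for LLM/VLM calls; the names follow the abstract's roles (Decomposer, Localizer, Thinker, Reflector), but the interfaces and retry logic are illustrative assumptions, not the paper's implementation.

# Hedged sketch: closed-loop multi-agent execution with per-subtask verification.
def decomposer(instruction):            return ["pick cube", "place cube on tray"]
def localizer(subtask, image):          return {"target_xy": (0.4, 0.1)}
def thinker(subtask, grounding):        return {"action": "move_and_grasp", **grounding}
def execute(action):                    return "new_image"
def reflector(subtask, image):          return True   # VLM judges success from feedback

def run(instruction, image, max_retries=3):
    for subtask in decomposer(instruction):
        for _ in range(max_retries):
            action = thinker(subtask, localizer(subtask, image))
            image = execute(action)
            if reflector(subtask, image):       # proceed only after verified success
                break
        else:
            return False                        # targeted recovery exhausted
    return True

print(run("put the cube on the tray", "initial_image"))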
DexRepNet++: Learning Dexterous Robotic Manipulation with Geometric and Spatial Hand-Object Representations
Authors: Qingtao Liu, Zhengnan Sun, Yu Cui, Haoming Li, Gaofeng Li, Lin Shao, Jiming Chen, Qi Ye
Venue: IEEE Transactions on Robotics, vol. 42, pp. 799-818, 2026
First: 2026-02-25T11:38:07+00:00 · Latest: 2026-02-25T11:38:07+00:00
Comments: Accepted by IEEE Transactions on Robotics (T-RO), 2026
Abstract
Robotic dexterous manipulation is a challenging problem due to high degrees of freedom (DoFs) and complex contacts of multi-fingered robotic hands. Many existing deep reinforcement learning (DRL) based methods aim at improving sample efficiency in high-dimensional output action spaces. However, existing works often overlook the role of representations in achieving generalization of a manipulation policy in the complex input space during the hand-object interaction. In this paper, we propose DexRep, a novel hand-object interaction representation to capture object surface features and spatial relations between hands and objects for dexterous manipulation skill learning. Based on DexRep, policies are learned for three dexterous manipulation tasks, i.e., grasping, in-hand reorientation, and bimanual handover, and extensive experiments are conducted to verify their effectiveness. In simulation, for grasping, the policy learned with 40 objects achieves a success rate of 87.9% on more than 5000 unseen objects of diverse categories, significantly surpassing existing work trained with thousands of objects; for the in-hand reorientation and handover tasks, the policies also boost the success rates and other metrics of existing hand-object representations by 20% to 40%. The grasp policies with DexRep are deployed to the real world under multi-camera and single-camera setups and demonstrate a small sim-to-real gap.
Summary / 总结
Robotic dexterous manipulation is a challenging problem due to high degrees of freedom (DoFs) and complex contacts of multi-fingered robotic hands.
Parallel Split Learning with Global Sampling
Authors: Mohammad Kohankhaki, Ahmad Ayad, Mahdi Barhoush, Anke Schmeink
First: 2024-07-22T15:41:23+00:00 · Latest: 2026-02-25T10:39:10+00:00
Comments: Accepted at the 2025 IEEE 3rd International Conference on Foundation and Large Language Models (FLLM). This version corresponds to the accepted manuscript
Abstract
Distributed deep learning in resource-constrained environments faces scalability and generalization challenges due to large effective batch sizes and non-identically distributed client data. We introduce a server-driven sampling strategy that maintains a fixed global batch size by dynamically adjusting client-side batch sizes. This decouples the effective batch size from the number of participating devices and ensures that global batches better reflect the overall data distribution. Using standard concentration bounds, we establish tighter deviation guarantees compared to existing approaches. Empirical results on a benchmark dataset confirm that the proposed method improves model accuracy, training efficiency, and convergence stability, offering a scalable solution for learning at the network edge.
Summary / 总结
Distributed deep learning in resource-constrained environments faces scalability and generalization challenges due to large effective batch sizes and non-identically distributed client data.
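A minimal sketch of the server-driven sampling idea described above: the global batch size stays fixed by allocating per-client batch sizes, so the assembled batch better reflects the overall data distribution regardless of how many clients participate. The proportional allocation rule below is an illustrative choice consistent with, but not copied from, the paper.

# Minimal sketch: fixed global batch size via server-side per-client allocation.
import numpy as np

def allocate_client_batches(client_sizes, global_batch):
    sizes = np.asarray(client_sizes, dtype=float)
    raw = global_batch * sizes / sizes.sum()
    alloc = np.floor(raw).astype(int)
    # distribute the remainder to the clients with the largest fractional parts
    remainder = global_batch - alloc.sum()
    order = np.argsort(-(raw - alloc))
    alloc[order[:remainder]] += 1
    return alloc

print(allocate_client_batches([5000, 1200, 300, 90], global_batch=256))
# [194 47 12 3] -- sums to 256 regardless of the number of participating clients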
Joint-Aligned Latent Action: Towards Scalable VLA Pretraining in the Wild
Authors: Hao Luo, Ye Wang, Wanpeng Zhang, Haoqi Yuan, Yicheng Feng, Haiweng Xu, Sipeng Zheng, Zongqing Lu
First: 2026-02-25T09:46:42+00:00 · Latest: 2026-02-25T09:46:42+00:00
Comments: CVPR2026
Abstract
Despite progress, Vision-Language-Action models (VLAs) are limited by a scarcity of large-scale, diverse robot data. While human manipulation videos offer a rich alternative, existing methods are forced to choose between small, precisely-labeled datasets and vast in-the-wild footage with unreliable hand tracking labels. We present JALA, a pretraining framework that learns Jointly-Aligned Latent Actions. JALA bypasses full visual dynamic reconstruction, instead learning a predictive action embedding aligned with both inverse dynamics and real actions. This yields a transition-aware, behavior-centric latent space for learning from heterogeneous human data. We scale this approach with UniHand-Mix, a 7.5M video corpus (>2,000 hours) blending laboratory and in-the-wild footage. Experiments demonstrate that JALA generates more realistic hand motions in both controlled and unconstrained scenarios, significantly improving downstream robot manipulation performance in both simulation and real-world tasks. These results indicate that jointly-aligned latent actions offer a scalable pathway for VLA pretraining from human data.
Summary / 总结
Despite progress, Vision-Language-Action models (VLAs) are limited by a scarcity of large-scale, diverse robot data.
Accelerating Recommender Model ETL with a Streaming FPGA-GPU Dataflow
Authors: Yu Zhu, Wenqi Jiang, Piyumi Jasin Pathiranage, Yongjun He, Gustavo Alonso
First: 2025-01-21T10:53:17+00:00 · Latest: 2026-02-25T09:32:40+00:00
Abstract
The real-time performance of recommender models depends on the continuous integration of massive volumes of new user interaction data into training pipelines. While GPUs have scaled model training throughput, the data preprocessing stage - commonly expressed as Extract-Transform-Load (ETL) pipelines - has emerged as the dominant bottleneck. Production systems often dedicate clusters of CPU servers to support a single GPU node, leading to high operational cost. To address this issue, we present PipeRec, a hardware-accelerated ETL engine co-designed with online recommender model training. PipeRec introduces a training-aware ETL abstraction that exposes freshness, ordering, and batching semantics while compiling software-defined operators into reconfigurable FPGA dataflows and overlaps ETL with GPU training to maximize utilization under I/O constraints. To eliminate CPU bottlenecks, PipeRec implements a format-aware packer that streams training-ready batches directly into GPU memory via P2P DMA transfers, enabling zero-copy ingest and efficient GPU consumption. Our evaluation on three datasets shows that PipeRec accelerates ETL throughput by over 10x compared to CPU-based pipelines and up to 17x over state-of-the-art GPU ETL systems. When integrated with training, PipeRec maintains 64-91% GPU utilization and reduces end-to-end training time to 9.94% of the time taken by CPU-GPU pipelines.
Summary / 总结
The real-time performance of recommender models depends on the continuous integration of massive volumes of new user interaction data into training pipelines.
PD-VLA: Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
Authors: Wenxuan Song, Jiayi Chen, Pengxiang Ding, Han Zhao, Wei Zhao, Zhide Zhong, Zongyuan Ge, Zhijun Li, Donglin Wang, Jun Ma, Lujia Wang, Haoang Li
Venue: IROS 2025
First: 2025-03-04T06:12:08+00:00 · Latest: 2026-02-25T09:26:06+00:00
Comments: Accepted by IROS 2025, updated results on LIBERO
Abstract
Vision-Language-Action (VLA) models demonstrate remarkable potential for generalizable robotic manipulation. The performance of VLA models can be improved by integrating with action chunking, a critical technique for effective control. However, action chunking linearly scales up action dimensions in VLA models with increased chunking sizes. This reduces inference efficiency. To tackle this problem, we propose PD-VLA, the first parallel decoding framework for VLA models integrated with action chunking. Our framework reformulates autoregressive decoding as a nonlinear system solved by parallel fixed-point iterations. This approach preserves model performance with mathematical guarantees while significantly improving decoding speed. In addition, it enables training-free acceleration without architectural changes, as well as seamless synergy with existing acceleration techniques. Extensive simulations validate that our PD-VLA maintains competitive success rates while achieving 2.52 times the execution frequency on manipulators (with 7 degrees of freedom) compared with the fundamental VLA model. Furthermore, we experimentally identify the most effective settings for acceleration. Finally, real-world experiments validate its high applicability across different tasks.
Summary / 总结
Vision-Language-Action (VLA) models demonstrate remarkable potential for generalizable robotic manipulation.
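A hedged sketch of the core idea: instead of decoding an action chunk position by position, treat the whole chunk as the fixed point of the decoding map and refresh all positions in parallel (Jacobi-style) until convergence. The toy autoregressive rule below is a stand-in for the VLA's decoder and only illustrates the iteration scheme.

# Hedged sketch: parallel fixed-point (Jacobi) refinement of a whole action chunk.
import numpy as np

def decoder_step(chunk, context):
    """One Jacobi sweep: position i is updated from the PREVIOUS draft's position i-1,
    so all positions can be refreshed in parallel."""
    prev = np.concatenate(([context], chunk[:-1]))
    return np.tanh(0.5 * prev + 0.1 * (np.arange(chunk.shape[0]) + 1))  # toy AR rule

def parallel_decode(context, chunk_len=8, max_iters=20, tol=1e-6):
    chunk = np.zeros(chunk_len)                        # initial draft of the whole chunk
    for it in range(max_iters):
        new_chunk = decoder_step(chunk, context)       # all positions refreshed at once
        if np.max(np.abs(new_chunk - chunk)) < tol:    # converged to the AR fixed point
            return new_chunk, it + 1
        chunk = new_chunk
    return chunk, max_iters

actions, iters = parallel_decode(context=0.3)
print(iters, np.round(actions, 3))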
Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
Authors: Huilin Xu, Zhuoyang Liu, Yixiang Luomei, Feng Xu
First: 2025-12-09T14:25:24+00:00 · Latest: 2026-02-25T09:25:38+00:00
Comments: Under Review, 15 pages, 11 figures
Abstract
Aerial Vision-and-Language Navigation (VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and navigate complex urban environments using onboard visual observation. This task holds promise for real-world applications such as low-altitude inspection, search-and-rescue, and autonomous aerial delivery. Existing methods often rely on panoramic images, depth inputs, or odometry to support spatial reasoning and action planning. These requirements increase system cost and integration complexity, thus hindering practical deployment for lightweight UAVs. We present a unified aerial VLN framework that operates solely on egocentric monocular RGB observations and natural language instructions. The model formulates navigation as a next-token prediction problem, jointly optimizing spatial perception, trajectory reasoning, and action prediction through prompt-guided multi-task learning. Moreover, we propose a keyframe selection strategy to reduce visual redundancy by retaining semantically informative frames, along with an action merging and label reweighting mechanism that mitigates long-tailed supervision imbalance and facilitates stable multi-task co-training. Extensive experiments on the AerialVLN and OpenFly benchmarks validate the effectiveness of our method. Under the challenging monocular RGB-only setting, our model achieves strong results across both seen and unseen environments. It significantly outperforms existing RGB-only baselines and narrows the performance gap with state-of-the-art panoramic RGB-D counterparts. Comprehensive ablation studies further demonstrate the contribution of our task design and architectural choices.
Summary / 总结
Aerial Vision-and-Language Navigation (VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and navigate complex urban environments using onboard visual observation.
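A minimal sketch of a keyframe selection rule in the spirit of the strategy described above: keep a frame only if its feature is sufficiently dissimilar from the last retained keyframe. The cosine-similarity criterion and threshold are assumptions for illustration; the paper's exact selection rule is not reproduced here.

# Minimal sketch: redundancy-reducing keyframe selection over frame features.
import numpy as np

def select_keyframes(frame_feats, sim_threshold=0.95):
    keep = [0]                                         # always keep the first frame
    for i in range(1, len(frame_feats)):
        a, b = frame_feats[keep[-1]], frame_feats[i]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        if cos < sim_threshold:                        # visually new enough -> keep it
            keep.append(i)
    return keep

rng = np.random.default_rng(0)
feats = rng.normal(size=(30, 128))
feats[10:15] = feats[10]                               # a redundant, static stretch gets dropped
print(select_keyframes(feats))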
Primary-Fine Decoupling for Action Generation in Robotic Imitation
Authors: Xiaohan Lei, Min Wang, Wengang Zhou, Xingyu Lu, Houqiang Li
Venue: ICLR
First: 2026-02-25T08:36:45+00:00 · Latest: 2026-02-25T08:36:45+00:00
Comments: The Fourteenth International Conference on Learning Representations (ICLR), 2026
Abstract
The multi-modal distribution of robotic manipulation action sequences poses critical challenges for imitation learning. To handle this, existing approaches often model the action space as either a discrete set of tokens or a continuous, latent-variable distribution. However, both approaches present trade-offs: some methods discretize actions into tokens and therefore lose fine-grained action variations, while others that generate continuous actions in a single stage tend to produce unstable mode transitions. To address these limitations, we propose Primary-Fine Decoupling for Action Generation (PF-DAG), a two-stage framework that decouples coarse action consistency from fine-grained variations. First, we compress action chunks into a small set of discrete modes, enabling a lightweight policy to select consistent coarse modes and avoid mode bouncing. Second, a mode-conditioned MeanFlow policy is learned to generate high-fidelity continuous actions. Theoretically, we prove PF-DAG's two-stage design achieves a strictly lower MSE bound than single-stage generative policies. Empirically, PF-DAG outperforms state-of-the-art baselines across 56 tasks from Adroit, DexArt, and MetaWorld benchmarks. It further generalizes to real-world tactile dexterous manipulation tasks. Our work demonstrates that explicit mode-level decoupling enables both robust multi-modal modeling and reactive closed-loop control for robotic manipulation.
Summary / 总结
Multi-modal distribution in robotic manipulation action sequences poses critical challenges for imitation learning.
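A hedged sketch of the primary-fine decoupling idea: compress action chunks into a few discrete coarse modes (here a tiny k-means), select a mode first, then produce the fine-grained continuous action conditioned on it. The clustering, the hard-coded mode choice, and the refinement noise are illustrative stand-ins for the paper's learned components, not its architecture.

# Hedged sketch: coarse mode selection followed by mode-conditioned refinement.
import numpy as np

rng = np.random.default_rng(0)
# Toy demonstration chunks drawn from two clearly separated behavior modes.
chunks = np.concatenate([rng.normal(-2.0, 0.1, (50, 4)), rng.normal(2.0, 0.1, (50, 4))])

def kmeans(X, k=2, iters=10):
    centers = X[[0, len(X) // 2]].copy()               # deterministic init, one per cloud
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([X[assign == j].mean(0) for j in range(k)])
    return centers

modes = kmeans(chunks)                                  # stage 1: discrete coarse modes
mode_id = 1                                             # a lightweight policy would pick this
fine_action = modes[mode_id] + 0.05 * rng.normal(size=4)   # stage 2: mode-conditioned refinement
print(np.round(modes, 2), mode_id, np.round(fine_action, 2))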
Self-Correcting VLA: Online Action Refinement via Sparse World Imagination
Authors: Chenyv Liu, Wentao Tan, Lei Zhu, Fengling Li, Jingjing Li, Guoli Yang, Heng Tao Shen
First: 2026-02-25T06:58:06+00:00 · Latest: 2026-02-25T06:58:06+00:00
Abstract
Standard vision-language-action (VLA) models rely on fitting statistical data priors, limiting their robust understanding of underlying physical dynamics. Reinforcement learning enhances physical grounding through exploration yet typically relies on external reward signals that remain isolated from the agent's internal states. World action models have emerged as a promising paradigm that integrates imagination and control to enable predictive planning. However, they rely on implicit context modeling, lacking explicit mechanisms for self-improvement. To solve these problems, we propose Self-Correcting VLA (SC-VLA), which achieves self-improvement by intrinsically guiding action refinement through sparse imagination. We first design sparse world imagination by integrating auxiliary predictive heads to forecast current task progress and future trajectory trends, thereby constraining the policy to encode short-term physical evolution. Then we introduce an online action refinement module to reshape progress-dependent dense rewards, adjusting trajectory orientation based on the predicted sparse future states. Evaluations on challenging robot manipulation tasks from simulation benchmarks and real-world settings demonstrate that SC-VLA achieves state-of-the-art performance, yielding the highest task throughput with 16% fewer steps and a 9% higher success rate than the best-performing baselines, alongside a 14% gain in real-world experiments. Code is available at https://github.com/Kisaragi0/SC-VLA.
Summary / 总结
Standard vision-language-action (VLA) models rely on fitting statistical data priors, limiting their robust understanding of underlying physical dynamics.
Tacmap: Bridging the Tactile Sim-to-Real Gap via Geometry-Consistent Penetration Depth Map
Authors: Lei Su, Zhijie Peng, Renyuan Ren, Shengping Mao, Juan Du, Kaifeng Zhang, Xuezhou Zhu
First: 2026-02-25T06:40:59+00:00 · Latest: 2026-02-25T06:40:59+00:00
Comments: 8 pages
Abstract
Vision-Based Tactile Sensors (VBTS) are essential for achieving dexterous robotic manipulation, yet the tactile sim-to-real gap remains a fundamental bottleneck. Current tactile simulations suffer from a persistent dilemma: simplified geometric projections lack physical authenticity, while high-fidelity Finite Element Methods (FEM) are too computationally prohibitive for large-scale reinforcement learning. In this work, we present Tacmap, a high-fidelity, computationally efficient tactile simulation framework anchored in volumetric penetration depth. Our key insight is to bridge the tactile sim-to-real gap by unifying both domains through a shared deform map representation. Specifically, we compute 3D intersection volumes as depth maps in simulation, while in the real world, we employ an automated data-collection rig to learn a robust mapping from raw tactile images to ground-truth depth maps. By aligning simulation and real-world in this unified geometric space, Tacmap minimizes domain shift while maintaining physical consistency. Quantitative evaluations across diverse contact scenarios demonstrate that Tacmap's deform maps closely mirror real-world measurements. Moreover, we validate the utility of Tacmap through an in-hand rotation task, where a policy trained exclusively in simulation achieves zero-shot transfer to a physical robot.
Summary / 总结
Vision-Based Tactile Sensors (VBTS) are essential for achieving dexterous robotic manipulation, yet the tactile sim-to-real gap remains a fundamental bottleneck.
ADM-DP: Adaptive Dynamic Modality Diffusion Policy through Vision-Tactile-Graph Fusion for Multi-Agent Manipulation
Authors: Enyi Wang, Wen Fan, Dandan Zhang
Venue: ICRA 2026
First: 2026-02-25T06:35:19+00:00 · Latest: 2026-02-25T06:35:19+00:00
Comments: Accepted to IEEE International Conference on Robotics and Automation (ICRA 2026)
Abstract
Multi-agent robotic manipulation remains challenging due to the combined demands of coordination, grasp stability, and collision avoidance in shared workspaces. To address these challenges, we propose the Adaptive Dynamic Modality Diffusion Policy (ADM-DP), a framework that integrates vision, tactile, and graph-based (multi-agent pose) modalities for coordinated control. ADM-DP introduces four key innovations. First, an enhanced visual encoder merges RGB and point-cloud features via Feature-wise Linear Modulation (FiLM) to enrich perception. Second, a tactile-guided grasping strategy uses Force-Sensitive Resistor (FSR) feedback to detect insufficient contact and trigger corrective grasp refinement, improving grasp stability. Third, a graph-based collision encoder leverages shared tool center point (TCP) positions of multiple agents as structured kinematic context to maintain spatial awareness and reduce inter-agent interference. Fourth, an Adaptive Modality Attention Mechanism (AMAM) dynamically re-weights modalities according to task context, enabling flexible fusion. For scalability and modularity, a decoupled training paradigm is employed in which agents learn independent policies while sharing spatial information. This maintains low interdependence between agents while retaining collective awareness. Across seven multi-agent tasks, ADM-DP achieves 12-25% performance gains over state-of-the-art baselines. Ablation studies show the greatest improvements in tasks requiring multiple sensory modalities, validating our adaptive fusion strategy and demonstrating its robustness for diverse manipulation scenarios.
Summary / 总结
Multi-agent robotic manipulation remains challenging due to the combined demands of coordination, grasp stability, and collision avoidance in shared workspaces.
JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
Authors: Shuang Zeng, Dekang Qi, Xinyuan Chang, Feng Xiong, Shichao Xie, Xiaolong Wu, Shiyi Liang, Mu Xu, Xing Wei, Ning Guo
Venue: ICLR 2026
First: 2025-09-26T16:29:37+00:00 · Latest: 2026-02-25T06:06:37+00:00
Comments: Accepted to ICLR 2026. Project page: https://miv-xjtu.github.io/JanusVLN.github.io/
Abstract
Vision-and-Language Navigation requires an embodied agent to navigate through unseen environments, guided by natural language instructions and a continuous video stream. Recent advances in VLN have been driven by the powerful semantic understanding of Multimodal Large Language Models. However, these methods typically rely on explicit semantic memory, such as building textual cognitive maps or storing historical visual frames. This type of method suffers from spatial information loss, computational redundancy, and memory bloat, which impede efficient navigation. Inspired by the implicit scene representation in human navigation, analogous to the left brain's semantic understanding and the right brain's spatial cognition, we propose JanusVLN, a novel VLN framework featuring a dual implicit neural memory that models spatial-geometric and visual-semantic memory as separate, compact, and fixed-size neural representations. This framework first extends the MLLM to incorporate 3D prior knowledge from the spatial-geometric encoder, thereby enhancing the spatial reasoning capabilities of models based solely on RGB input. Then, the historical key-value caches from the spatial-geometric and visual-semantic encoders are constructed into a dual implicit memory. By retaining only the KVs of tokens in the initial and sliding window, redundant computation is avoided, enabling efficient incremental updates. Extensive experiments demonstrate that JanusVLN outperforms over 20 recent methods to achieve SOTA performance. For example, the success rate improves by 10.5-35.5 compared to methods using multiple data types as input and by 3.6-10.8 compared to methods using more RGB training data. This indicates that the proposed dual implicit neural memory, as a novel paradigm, explores promising new directions for future VLN research. Our project page: https://miv-xjtu.github.io/JanusVLN.github.io/.
Summary / 总结
Vision-and-Language Navigation requires an embodied agent to navigate through unseen environments, guided by natural language instructions and a continuous video stream.
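A minimal sketch of the fixed-size cache policy described above: retain only the key/value entries of the initial tokens plus a sliding window over the most recent ones, so memory stays constant as the video stream grows. The specific sizes are illustrative, not the paper's settings.

# Minimal sketch: which token indices stay cached under "initial + sliding window".
def retained_indices(num_tokens, num_initial=32, window=256):
    """Indices of cached tokens after seeing num_tokens tokens in total."""
    initial = list(range(min(num_initial, num_tokens)))
    recent = list(range(max(num_tokens - window, num_initial), num_tokens))
    return initial + recent

for n in (100, 1000, 10_000):
    print(n, "tokens seen ->", len(retained_indices(n)), "cached")
# cache size saturates at num_initial + window, independent of stream length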
LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies
Authors: Yue Yang, Shuo Cheng, Yu Fang, Homanga Bharadhwaj, Mingyu Ding, Gedas Bertasius, Daniel Szafir
First: 2026-02-25T03:33:39+00:00 · Latest: 2026-02-25T03:33:39+00:00
Abstract
General-purpose robots must master long-horizon manipulation, defined as tasks involving multiple kinematic structure changes (e.g., attaching or detaching objects) in unstructured environments. While Vision-Language-Action (VLA) models offer the potential to master diverse atomic skills, they struggle with the combinatorial complexity of sequencing them and are prone to cascading failures due to environmental sensitivity. To address these challenges, we propose LiLo-VLA (Linked Local VLA), a modular framework capable of zero-shot generalization to novel long-horizon tasks without ever being trained on them. Our approach decouples transport from interaction: a Reaching Module handles global motion, while an Interaction Module employs an object-centric VLA to process isolated objects of interest, ensuring robustness against irrelevant visual features and invariance to spatial configurations. Crucially, this modularity facilitates robust failure recovery through dynamic replanning and skill reuse, effectively mitigating the cascading errors common in end-to-end approaches. We introduce a 21-task simulation benchmark consisting of two challenging suites: LIBERO-Long++ and Ultra-Long. In these simulations, LiLo-VLA achieves a 69% average success rate, outperforming Pi0.5 by 41% and OpenVLA-OFT by 67%. Furthermore, real-world evaluations across 8 long-horizon tasks demonstrate an average success rate of 85%. Project page: https://yy-gx.github.io/LiLo-VLA/.
Summary / 总结
General-purpose robots must master long-horizon manipulation, defined as tasks involving multiple kinematic structure changes (e.g., attaching or detaching objects) in unstructured environments.
EO-1: An Open Unified Embodied Foundation Model for General Robot Control
Authors: Delin Qu, Haoming Song, Qizhi Chen, Zhaoqing Chen, Xianqiang Gao, Dong Wang, Xinyi Ye, Qi Lv, Modi Shi, Guanghui Ren, Cheng Ruan, Maoqing Yao, Haoran Yang, Jiacheng Bao, Bin Zhao, Xuelong Li
First: 2025-08-28T17:26:15+00:00 · Latest: 2026-02-25T03:30:53+00:00
Abstract
The human ability to seamlessly perform multimodal reasoning and physical interaction in the open world is a core goal for general-purpose embodied intelligent systems. Recent vision-language-action (VLA) models, which are co-trained on large-scale robot and visual-text data, have demonstrated notable progress in general robot control. However, they still fail to achieve human-level flexibility in interleaved reasoning and interaction. In this work, we introduce EO-Robotics, which consists of the EO-1 model and the EO-Data1.5M dataset. EO-1 is a unified embodied foundation model that achieves superior performance in multimodal embodied reasoning and robot control through interleaved vision-text-action pre-training. The development of EO-1 is based on two key pillars: (i) a unified architecture that processes multimodal inputs indiscriminately (image, text, video, and action), and (ii) a massive, high-quality multimodal embodied reasoning dataset, EO-Data1.5M, which contains over 1.5 million samples with emphasis on interleaved vision-text-action comprehension. EO-1 is trained through synergies between auto-regressive decoding and flow matching denoising on EO-Data1.5M, enabling seamless robot action generation and multimodal embodied reasoning. Extensive experiments demonstrate the effectiveness of interleaved vision-text-action learning for open-world understanding and generalization, validated through a variety of long-horizon, dexterous manipulation tasks across multiple embodiments. This paper details the architecture of EO-1, the data construction strategy of EO-Data1.5M, and the training methodology, offering valuable insights for developing advanced embodied foundation models. Project Page: https://eo-robotics.ai/eo-1.
Summary / 总结
The human ability to seamlessly perform multimodal reasoning and physical interaction in the open world is a core goal for general purpose embodied intelligent systems.
Constructive Vector Fields for Path Following in Fully-Actuated Systems on Matrix Lie Groups
Authors: Felipe Bartelt, Vinicius M. Gonçalves, Luciano C. A. Pimenta
First: 2026-02-25T00:06:08+00:00 · Latest: 2026-02-25T00:06:08+00:00
Abstract
This paper presents a novel vector field strategy for controlling fully-actuated systems on connected matrix Lie groups, ensuring convergence to and traversal along a curve defined on the group. Our approach generalizes our previous work (Rezende et al., 2022) and reduces to it when considering the Lie group of translations in Euclidean space. Since the proofs in Rezende et al. (2022) rely on key properties such as the orthogonality between the convergent and traversal components, we extend these results by leveraging Lie group properties. These properties also allow the control input to be non-redundant, meaning it matches the dimension of the Lie group, rather than the potentially larger dimension of the space in which the group is embedded. This can lead to more practical control inputs in certain scenarios. A particularly notable application of our strategy is in controlling systems on SE(3) -- in this case, the non-redundant input corresponds to the object's mechanical twist -- making it well-suited for controlling objects that can move and rotate freely, such as omnidirectional drones. In this case, we provide an efficient algorithm to compute the vector field. We experimentally validate the proposed method using a robotic manipulator to demonstrate its effectiveness.
Summary / 总结
This paper presents a novel vector field strategy for controlling fully-actuated systems on connected matrix Lie groups, ensuring convergence to and traversal along a curve defined on the group.
VLA Knows Its Limits
Authors: Haoxuan Wang, Gengyu Zhang, Yan Yan, Ramana Rao Kompella, Gaowen Liu
First: 2026-02-24T23:48:48+00:00 · Latest: 2026-02-24T23:48:48+00:00
Comments: Project page at https://hatchetproject.github.io/autohorizon/
Abstract
Action chunking has recently emerged as a standard practice in flow-based Vision-Language-Action (VLA) models. However, the effect and choice of the execution horizon - the number of actions to be executed from each predicted chunk - remains underexplored. In this work, we first show that varying the execution horizon leads to substantial performance deviations, with performance initially improving and then declining as the horizon increases. To uncover the reasons, we analyze the cross- and self-attention weights in flow-based VLAs and reveal two key phenomena: (i) intra-chunk actions attend invariantly to vision-language tokens, limiting adaptability to environmental changes; and (ii) the initial and terminal action tokens serve as stable anchors, forming latent centers around which intermediate actions are organized. Motivated by these insights, we interpret action self-attention weights as a proxy for the model's predictive limit and propose AutoHorizon, the first test-time method that dynamically estimates the execution horizon for each predicted action chunk to adapt to changing perceptual conditions. Across simulated and real-world robotic manipulation tasks, AutoHorizon is performant, incurs negligible computational overhead, and generalizes across diverse tasks and flow-based models.
Summary / 总结
Action chunking has recently emerged as a standard practice in flow-based Vision-Language-Action (VLA) models.
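A hedged sketch of the idea of reading the execution horizon off action self-attention: execute the predicted chunk only as far as attention mass stays concentrated on the stable anchor tokens. The anchor definition, threshold, and synthetic attention matrix are assumptions for illustration; the paper's exact estimator is not reproduced here.

# Hedged sketch: pick an execution horizon from action self-attention anchor mass.
import numpy as np

def estimate_horizon(self_attn, anchor_idx=(0, -1), threshold=0.5, min_h=1):
    """self_attn: (chunk_len, chunk_len) row-normalized attention over action tokens."""
    chunk_len = self_attn.shape[0]
    anchors = [i % chunk_len for i in anchor_idx]      # initial and terminal action tokens
    horizon = min_h
    for i in range(chunk_len):
        if self_attn[i, anchors].sum() >= threshold:   # still organized around the anchors
            horizon = i + 1
        else:
            break
    return horizon

A = np.full((8, 8), 0.05)
A[:5, 0], A[:5, -1] = 0.4, 0.3                         # rows 0-4: anchors carry most mass
A /= A.sum(axis=1, keepdims=True)
print(estimate_horizon(A))                             # -> 5 for this synthetic pattern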
Provably Safe Generative Sampling with Constricting Barrier Functions
Authors: Darshan Gadginmath, Ahmed Allibhoy, Fabio Pasqualetti
First: 2026-02-24T23:06:58+00:00 · Latest: 2026-02-24T23:06:58+00:00
Comments: 25 pages, 7 figures
Abstract
Flow-based generative models, such as diffusion models and flow matching models, have achieved remarkable success in learning complex data distributions. However, a critical gap remains for their deployment in safety-critical domains: the lack of formal guarantees that generated samples will satisfy hard constraints. We address this by proposing a safety filtering framework that acts as an online shield for any pre-trained generative model. Our key insight is to cooperate with the generative process rather than override it. We define a constricting safety tube that is relaxed at the initial noise distribution and progressively tightens to the target safe set at the final data distribution, mirroring the coarse-to-fine structure of the generative process itself. By characterizing this tube via Control Barrier Functions (CBFs), we synthesize a feedback control input through a convex Quadratic Program (QP) at each sampling step. As the tube is loosest when noise is high and intervention is cheapest in terms of control energy, most constraint enforcement occurs when it least disrupts the model's learned structure. We prove that this mechanism guarantees safe sampling while minimizing the distributional shift from the original model at each sampling step, as quantified by the KL divergence. Our framework applies to any pre-trained flow-based generative scheme requiring no retraining or architectural modifications. We validate the approach across constrained image generation, physically-consistent trajectory sampling, and safe robotic manipulation policies, achieving 100% constraint satisfaction while preserving semantic fidelity.
Summary / 总结
Flow-based generative models, such as diffusion models and flow matching models, have achieved remarkable success in learning complex data distributions.
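A minimal sketch of the per-step safety filter the abstract describes: at each sampling step, solve a small QP that keeps the update as close as possible to the generative model's proposal while satisfying a barrier condition. With a single affine constraint a.u >= b the QP has the closed form below; the toy constraint and the scalar setup are illustrative, not the paper's constricting-tube construction.

# Minimal sketch: closed-form CBF-QP projection for one affine safety constraint.
import numpy as np

def cbf_qp_filter(u_nominal, a, b):
    """argmin_u ||u - u_nominal||^2  s.t.  a @ u >= b  (single affine constraint)."""
    slack = a @ u_nominal - b
    if slack >= 0:
        return u_nominal                          # proposal already safe: no intervention
    return u_nominal + a * (-slack) / (a @ a)     # minimal-norm correction onto the boundary

# Toy barrier: keep the first coordinate of the sample update above 0.2.
a, b = np.array([1.0, 0.0]), 0.2
print(cbf_qp_filter(np.array([0.5, -0.3]), a, b))   # safe -> unchanged
print(cbf_qp_filter(np.array([-0.1, 0.4]), a, b))   # unsafe -> projected to [0.2, 0.4]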
Active Tactile Exploration for Rigid Body Pose and Shape Estimation
Authors: Ethan K. Gordon, Bruke Baraki, Hien Bui, Michael Posa
Venue: ICRA 2026
First: 2025-10-15T14:24:57+00:00 · Latest: 2026-02-24T22:21:28+00:00
Comments: Presented at ICRA 2026; 8 pages, 6 figures
Abstract
General robot manipulation requires the handling of previously unseen objects. Learning a physically accurate model at test time can provide significant benefits in data efficiency, predictability, and reuse between tasks. Tactile sensing can complement vision with its robustness to occlusion, but its temporal sparsity necessitates careful online exploration to maintain data efficiency. Direct contact can also cause an unrestrained object to move, requiring both shape and location estimation. In this work, we propose a learning and exploration framework that uses only tactile data to simultaneously determine the shape and location of rigid objects with minimal robot motion. We build on recent advances in contact-rich system identification to formulate a loss function that penalizes physical constraint violation without introducing the numerical stiffness inherent in rigid-body contact. Optimizing this loss, we can learn cuboid and convex polyhedral geometries with less than 10s of randomly collected data after first contact. Our exploration scheme seeks to maximize Expected Information Gain and results in significantly faster learning in both simulated and real-robot experiments. More information can be found at https://dairlab.github.io/activetactile
Summary / 总结
General robot manipulation requires the handling of previously unseen objects.
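A hedged sketch of probe selection by Expected Information Gain over a discrete set of shape/pose hypotheses, the criterion named in the abstract. The binary contact/no-contact observation model and the candidate probes are toy assumptions; the paper's continuous estimation problem is not reproduced here.

# Hedged sketch: expected information gain of a candidate tactile probe.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_information_gain(prior, p_contact_given_h):
    """prior: (H,) belief over hypotheses; p_contact_given_h: (H,) P(contact | h) at a probe."""
    h0 = entropy(prior)
    eig = 0.0
    for obs_prob in (p_contact_given_h, 1.0 - p_contact_given_h):   # contact / no contact
        marginal = (prior * obs_prob).sum()
        if marginal > 0:
            posterior = prior * obs_prob / marginal
            eig += marginal * (h0 - entropy(posterior))
    return eig

prior = np.array([0.25, 0.25, 0.25, 0.25])            # four competing shape hypotheses
probe_a = np.array([0.9, 0.9, 0.1, 0.1])              # informative: splits the hypotheses
probe_b = np.array([0.5, 0.5, 0.5, 0.5])              # uninformative everywhere
print(expected_information_gain(prior, probe_a) > expected_information_gain(prior, probe_b))  # True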
NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
Authors: Ishaan Rawal, Shubh Gupta, Yihan Hu, Wei Zhan
Venue: CVPR 2026
First: 2026-02-24T18:17:21+00:00 · Latest: 2026-02-24T18:17:21+00:00
Comments: Accepted to CVPR 2026
Abstract
Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with NoRD (No Reasoning for Driving). Compared to existing VLAs, NoRD achieves competitive performance while being fine-tuned on less than 60% of the data and no reasoning annotations, resulting in 3x fewer tokens. We identify that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. NoRD overcomes this by incorporating Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive performance on Waymo and NAVSIM with a fraction of the training data and no reasoning overhead, enabling more efficient autonomous systems.
Summary / 总结
Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures.
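A hedged sketch of the difficulty-bias issue discussed above. Standard GRPO normalizes each rollout group's advantages by the group's reward standard deviation, which rescales low- and high-variance scenarios to the same magnitude; Dr. GRPO is commonly described as dropping that per-group std division (and length normalization), which is the variant assumed below. The reward numbers are toy values.

# Hedged sketch: group advantages with (GRPO) and without (assumed Dr. GRPO) std normalization.
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)       # std-normalized (difficulty-biased)

def dr_grpo_advantages(rewards):
    r = np.asarray(rewards, dtype=float)
    return r - r.mean()                           # mean-centered only

easy_group = [0.9, 1.0, 1.0, 0.9]                 # low-variance rollouts
hard_group = [1.0, 0.0, 1.0, 0.0]                 # high-variance rollouts
for name, g in [("easy", easy_group), ("hard", hard_group)]:
    print(name, np.round(grpo_advantages(g), 2), np.round(dr_grpo_advantages(g), 2))
# GRPO maps both groups to +/-1.0 despite very different raw reward gaps;
# the mean-centered variant keeps the raw scale of each scenario's signal.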
ActionReasoning: Robot Action Reasoning in 3D Space with LLM for Robotic Brick Stacking
Authors: Guangming Wang, Qizhen Ying, Yixiong Jing, Olaf Wysocki, Brian Sheil
First: 2026-02-24T18:07:06+00:00 · Latest: 2026-02-24T18:07:06+00:00
Comments: 8 pages, 5 figures, accepted by the 2026 IEEE International Conference on Robotics and Automation
Abstract
Classical robotic systems typically rely on custom planners designed for constrained environments. While effective in restricted settings, these systems lack generalization capabilities, limiting the scalability of embodied AI and general-purpose robots. Recent data-driven Vision-Language-Action (VLA) approaches aim to learn policies from large-scale simulation and real-world data. However, the continuous action space of the physical world significantly exceeds the representational capacity of linguistic tokens, making it unclear if scaling data alone can yield general robotic intelligence. To address this gap, we propose ActionReasoning, an LLM-driven framework that performs explicit action reasoning to produce physics-consistent, prior-guided decisions for robotic manipulation. ActionReasoning leverages the physical priors and real-world knowledge already encoded in Large Language Models (LLMs) and structures them within a multi-agent architecture. We instantiate this framework on a tractable case study of brick stacking, where the environment states are assumed to be already accurately measured. The environmental states are then serialized and passed to a multi-agent LLM framework that generates physics-aware action plans. The experiments demonstrate that the proposed multi-agent LLM framework enables stable brick placement while shifting effort from low-level domain-specific coding to high-level tool invocation and prompting, highlighting its potential for broader generalization. This work introduces a promising approach to bridging perception and execution in robotic manipulation by integrating physical reasoning with LLMs.
Summary / 总结
Classical robotic systems typically rely on custom planners designed for constrained environments.
HALO: A Unified Vision-Language-Action Model for Embodied Multimodal Chain-of-Thought Reasoning
Authors: Quanxin Shou, Fangqi Zhu, Shawn Chen, Puxin Yan, Zhengyang Yan, Yikun Miao, Xiaoyi Pang, Zicong Hong, Ruikai Shi, Hao Huang, Jie Zhang, Song Guo
First: 2026-02-24T18:04:31+00:00 · Latest: 2026-02-24T18:04:31+00:00
Abstract
Vision-Language-Action (VLA) models have shown strong performance in robotic manipulation, but often struggle in long-horizon or out-of-distribution scenarios due to the lack of explicit mechanisms for multimodal reasoning and anticipating how the world will evolve under action. Recent works introduce textual chain-of-thought or visual subgoal prediction within VLA models to reason, but still fail to offer a unified human-like reasoning framework for joint textual reasoning, visual foresight, and action prediction. To this end, we propose HALO, a unified VLA model that enables embodied multimodal chain-of-thought (EM-CoT) reasoning through a sequential process of textual task reasoning, visual subgoal prediction for fine-grained guidance, and EM-CoT-augmented action prediction. We instantiate HALO with a Mixture-of-Transformers (MoT) architecture that decouples semantic reasoning, visual foresight, and action prediction into specialized experts while allowing seamless cross-expert collaboration. To enable HALO learning at scale, we introduce an automated pipeline to synthesize EM-CoT training data along with a carefully crafted training recipe. Extensive experiments demonstrate that: (1) HALO achieves superior performance in both simulated and real-world environments, surpassing baseline policy pi_0 by 34.1% on RoboTwin benchmark; (2) all proposed components of the training recipe and EM-CoT design help improve task success rate; and (3) HALO exhibits strong generalization capabilities under aggressive unseen environmental randomization with our proposed EM-CoT reasoning.
Summary / 总结
Vision-Language-Action (VLA) models have shown strong performance in robotic manipulation, but often struggle in long-horizon or out-of-distribution scenarios due to the lack of explicit mechanisms for multimodal reasoning and anticipating how the world will evolve under action.