Tsinghua CS · LLM Evaluation · AIOps

Yuhe Liu

M.Eng. candidate in Computer Science and Technology at Tsinghua University. I work on LLM evaluation, AIOps benchmarks, RAG and Agent systems, and domain-specific model adaptation.

Education: Tsinghua University, B.Eng. and M.Eng. in Computer Science
Research: LLM evaluation, operations intelligence, technical support QA
Selected venues: FSE 2025, FSE 2026, TIP 2023, IJCNN 2025
Location: Shenzhen, China

Research map: evaluation framework, domain data, and deployed LLM systems, connecting LLM evaluation, AIOps data, RAG pipelines, agent systems, and domain applications.

Research Focus

Building reliable LLM systems for operations domains

AIOps evaluation

Benchmarking LLM capabilities across language ability, operations tasks, and data modalities.

Automatic benchmark generation

Using RAG and Agent pipelines to transform product documentation into high-quality domain QA data.

Domain adaptation

Fine-tuning and evaluating models for telecom, technical support, CI logs, and operations workflows.

Publications

Selected papers

FSE 2026 · Lead Author

Eagle: A Comprehensive LLM Benchmarking Framework for Operations Capability

A benchmarking framework for evaluating operations-oriented capabilities of large language models.

FSE 2025 · Lead Author

OpsEval: A Comprehensive Benchmark Suite for Evaluating LLMs' Capability in IT Operations Domain

9,000+ AIOps QA pairs, multi-paradigm evaluation, and a public leaderboard for LLMs in IT operations.

TIP 2023 · Second Author

Skeleton-CutMix: Mixing Up Skeleton with Probabilistic Bone Exchange for Supervised Domain Adaptation

A cross-domain skeleton augmentation method for improving action recognition transfer.

IJCNN 2025 · Third Author

TechSupportEval: An Automated Evaluation Framework for Technical Support Question Answering

An automated framework for evaluating the quality of model responses in technical support QA scenarios.

Projects and Experience

Applied LLM evaluation and system deployment

2024.08 - 2025.08

AIOps Automatic Evaluation System Construction

Huawei, CAS, CAICT

Designed a taxonomy and a three-dimensional evaluation system for AIOps foundation models. Built RAG- and Agent-based benchmark generation pipelines and deployed the evaluation systems internally at Huawei and on the CAICT benchmarking platform.

2023.09 - 2024.07

OpsEval Benchmark and Leaderboard

NetMan Lab

Curated 9,000+ AIOps QA pairs, maintained the OpsEval leaderboard, and evaluated 20+ mainstream LLMs with LLM-as-a-Judge, RAGAS, and frequency-based analysis.

2024.08 - 2024.11

LLM-based Log Analysis for CI Pipeline

Tencent TEG, Algorithm Intern

Optimized abnormal log retrieval with keyword tables and context-window matching, then designed automatic answer evaluation with keyword matching, TF-IDF, and LLM-as-a-Judge.

2023.11 - 2024.07

Fine-tuning Large Language Models for the Telecom Domain

ZTE

Prepared telecom-domain QA data and compared parameter-efficient and full fine-tuning methods across PT, SFT, and DPO training paradigms.

Technical Skills

Stack

Teaching and Service

Campus work

Teaching Assistant for Software Engineering, Department of Computer Science, 2023, 2024, and 2025.

Cluster management for NetMan Lab, including internal network accounts and CPU/VM resources.

Research intern in scene reconstruction and second author of Skeleton-CutMix.

Contact

Open to research conversations on LLM evaluation and AIOps systems.

junetheriver@gmail.com