Hey! I'm Luis. I research machine learning models and the systems that train and serve them. From databases to LLMs to computational fluid dynamics.
This was not always the plan. During the COVID lockdowns and my last year of high school, I co-founded Faktor30[0]. Our first real client was a bank in Hamburg that needed a system we'd never built before. We were six people building software for whoever needed it. A bank. A food delivery company. The code was rarely the bottleneck. The hard part was building a team, defining a vision, and making sure everyone moved in the same direction.
I could build software but didn't understand the theory behind it. So I did my undergrad in computer science at TU Munich[1], building the theoretical foundations you can't get from coding alone. Then I joined Snowflake's Database Connectors team[2] and later Microsoft[3] to see how systems work at enterprise scale. Building data pipelines that thousands of teams depended on showed me what breaks when you go from ten users to ten thousand. I learned to combine theory and practice at scale. But I kept noticing that the people who invented the foundational ideas I was building on seemed to see something I didn't. Not just how to extend existing systems, but how to invent new ones.
I wanted to learn that. I pursued my Master's in computer science and spent two semesters studying and researching at UC Berkeley[4], where I was fortunate to work at the Sky Computing Lab[5] with Ion Stoica[6] and Joseph E. Gonzalez[7]. Through Matei Zaharia[8] I also worked with Deepti Raghavan[9] at Stanford[10]. I went there to learn how to deconstruct systems, challenge assumptions at every layer, focus on research problems that actually matter, and keep exploring when the path forward isn't obvious. The work started with batch data analytics[11, MLSys 2025]. Everyone treated it as a model or a hardware problem. It was neither. The inefficiency was in how data flowed through the system layers. Solution? Reorder the contents of the database tables to maximize KV cache reuse on the GPU. No model changes. No new hardware. That insight led to vCache[12, ICLR 2026], a production-ready semantic caching system with mathematically proven error-rate guarantees. Users define how often the system can be wrong. It guarantees that rate and outperforms every baseline. From there came vAttention[13, ICLR 2026] and SkyLight[14] for efficient inference, ALTO[15] for compound AI orchestration, and work on overthinking[16] in agentic systems. I learned that you can't invent real solutions by optimizing one component in isolation. You need to understand the entire system.
Berkeley taught me to think across the entire system. I wanted to apply that somewhere the system didn't exist yet. I wanted to go beyond LLMs. So I joined the founding team at UniversalAGI[17] to build foundation models for physical systems engineering. Aerospace, automotive, defense, energy, and other industries where state-of-the-art simulations require PhD-level knowledge, take hours to weeks, and only a handful of runs inform design decisions. Month-to-year design cycles. Our models let everyone evaluate engineering designs. In seconds, not weeks. These models are not LLMs. And building reliable ones at scale is uncharted territory. To make this concrete: a single training sample (tens of gigabytes) is orders of magnitude larger than a typical LLM sequence. Standard architectures break. Standard training stacks break. There's no vLLM for inference. So we build our own. Our own model architectures. Our own pre-training, fine-tuning, and inference stack. And our own training data. Architecture and training matter. But data is the bottleneck (as almost always in ML). This feels like the era before GPT-2. A difficult problem with high impact. Building a system, across disciplines, from scratch. That is what makes it exciting.
Picks and shovels.
Feel free to reach out at luis.gasparschroeder[at-symbol]gmail.com