jploudre 1 day ago [-]
I do programming as a side project — Marimo has been a huge unlock for me. Part of it has been just watching the videos that are both updates about the software and also little examples of how to think about data science. Marimo also helps curate useful python stuff to try.
Starting to use AI in Marimo, I was able both to ‘learn polars’ for speed and to create a custom AnyWidget so I could make a UI I could imagine that wouldn’t work with standard UI features.
Giving an LLM more context will be fab for me. Now if I could just teach Claude that this really is the ‘graph’ and it can’t ever re-assign a variable. It’s a gotcha of Marimo vs. Python. The hassle is worth it for the interactivity, but it makes me feel a bit like I’m writing C and the compiler is telling me I need a semicolon at the end of the line. I’ve made that error so many times…
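As an aside on that ‘graph’ gotcha: a reactive notebook can only build an unambiguous dependency graph if every variable has exactly one defining cell, which is why reassignment must be rejected. A minimal sketch of the idea using Python's `ast` module (an illustration only, not marimo's actual parser; cell IDs and function names are made up):

```python
import ast

def defs_and_refs(src: str):
    """Return (defined, referenced) top-level names for one cell's source."""
    defined, referenced = set(), set()
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                referenced.add(node.id)
    return defined, referenced - defined

def build_graph(cells: dict[str, str]):
    """Map each cell to the cells it depends on; reject duplicate definitions."""
    owner = {}  # variable name -> the one cell allowed to define it
    meta = {cid: defs_and_refs(src) for cid, src in cells.items()}
    for cid, (defined, _) in meta.items():
        for name in defined:
            if name in owner:
                raise ValueError(f"{name!r} defined in both {owner[name]} and {cid}")
            owner[name] = cid
    return {cid: {owner[r] for r in refs if r in owner}
            for cid, (_, refs) in meta.items()}

graph = build_graph({
    "c1": "x = 1",
    "c2": "y = x + 1",
    "c3": "z = x + y",
})
# graph == {"c1": set(), "c2": {"c1"}, "c3": {"c1", "c2"}}
```

Redefining `x` in a second cell would make `owner` ambiguous, which is exactly the error the compiler-style complaint above is about.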
data-ottawa 1 day ago [-]
I started using marimo for the reactive execution, after being spoiled by Observable and Pluto.jl. Being able to plug directly into Altair charts and tables was a huge boon. Then I discovered anywidget, which has been a game changer.
Now I use Claude to generate anywidgets for controls I need, and just focus on the heavy lifting with python, it's great. Being able to just have this all run in one flow with pair should make this 10x smoother.
As an example, I get spreadsheets sent by clients that all have different file types, formatting, names, and business rules. I had Claude build me a widget to define a set of data-cleaning steps (merge x+y fields, split with regex, etc.). Now this task, which used to take a lot of manual work and iteration, is just: upload a spreadsheet, preview and select my cleaning steps, run my algorithm, and wait for it to come out the other side (with labelled progress bars). When it's done I get a table element and some interactive Altair charts to click on to filter and fine-tune, then I can just export the table and send it.
This task used to be done manually by a team, then I turned it into 1-2 hours with Jupyter. Marimo let me turn it into 5-15 minutes. Visual inspection of the results by a human is a requirement, so it's not completely automatable, but a 15-minute turnaround every few weeks feels good enough.
Anyways, marimo rocks. The _only_ thing missing is the easy deploy for internal-users story as I cannot use molab (yet?).
manzt 1 day ago [-]
Hey, thanks and glad to hear the marimo + anywidget combo has been an unlock (I'm also the creator of anywidget). Clearly I'm biased, but custom widgets are a powerful primitive (marrying web & data ecosystems), and it's exciting to see coding tools making it even more accessible to build them out for specific or one-off tasks.
Re: deployment, we hear you & stay tuned. You can provide input here [1].
Side note: if you're curious, I have an RFC out for widget composition (widgets within widgets) [2]. Should be shipping soon.
The visual cleaning idea is really interesting. Would you mind sharing more details?
[1] https://github.com/marimo-team/marimo/issues/5963
[2] https://github.com/manzt/anywidget/pull/942
data-ottawa 23 hours ago [-]
It's nothing revolutionary.
It's essentially a table layout with a plus button at the bottom. When you click it adds a new step as a row, then you pick the operation, the input columns and output column name.
If you want to add another step you click the plus again and add another row the same way. Each row can access any table field or output field defined above it in the DAG.
Then, in Python, a for loop runs over the steps in order and updates the data frame (well, not in place: each function returns the new one). It uses a dictionary of function mappings and resolves input fields with kwargs.
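The step-runner described above can be sketched roughly like this. Everything here is hypothetical (the op names, the dict-of-lists "frame"); it only illustrates the dispatch-dictionary-plus-kwargs shape, not data-ottawa's actual widget:

```python
import re

def merge_fields(a, b):
    """Merge two columns into one, space-separated."""
    return [f"{x} {y}" for x, y in zip(a, b)]

def split_regex(col, pattern):
    """Keep the first token after splitting each value on a regex."""
    return [re.split(pattern, v)[0] for v in col]

# Dispatch table: op name -> function
OPS = {"merge": merge_fields, "split": split_regex}

def run_steps(frame: dict[str, list], steps: list[dict]) -> dict[str, list]:
    """Apply cleaning steps in order; each step may read any earlier column."""
    frame = dict(frame)  # return a new frame rather than mutating the input
    for step in steps:
        fn = OPS[step["op"]]
        args = [frame[name] for name in step.get("inputs", [])]
        frame[step["output"]] = fn(*args, **step.get("kwargs", {}))
    return frame

frame = {"first": ["Ada", "Alan"], "last": ["Lovelace", "Turing"]}
steps = [
    {"op": "merge", "inputs": ["first", "last"], "output": "full"},
    {"op": "split", "inputs": ["full"], "kwargs": {"pattern": r"\s+"}, "output": "token"},
]
out = run_steps(frame, steps)
# out["full"] == ["Ada Lovelace", "Alan Turing"]; out["token"] == ["Ada", "Alan"]
```

Because each step's output column becomes readable by later steps, the list of rows forms the same small DAG described above.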
manzt 1 day ago [-]
Really glad to hear that! The graph can get complex for big notebooks, and maintaining a full picture of variable dependencies across cells is a lot to ask a model to do correctly and hold in context. (It took us a little bit to get the parsing right in marimo!) With pair, it doesn't have to.
The model just "lives" in the environment, and when marimo says "you can't reuse that variable," it renames it and moves on. Hope you give pair a spin!
Jeremy Howard from fast.ai/answer.ai also works on similar stuff with solveit (https://solve.it.com) and ipyai (https://github.com/AnswerDotAI/ipyai).
midnightn 22 hours ago [-]
The reactive execution model as agent memory is clever — I ran into similar tradeoffs building a multi-agent trading system where each agent needs isolated state across cycles. Ended up using a persistent store (BigQuery) rather than in-process memory, but the appeal of having the runtime itself be the memory is that you get reproducibility for free.
I think it will be very interesting to see what this enables.
oegedijk 1 day ago [-]
Looks nice! Built an IPython persistent kernel that your agent can operate through CLI commands, which goes in a somewhat similar direction, though without all the Marimo niceties: https://github.com/oegedijk/agentnb
TheTaytay 1 day ago [-]
Thank you for this!
I am a big fan of Marimo and was trying to use it as my agent’s “REPL” a while back, because it’s naturally so good at describing its own current state and structure. It made me think that it would make a better state-preserving environment for the agent to work in. I’m very excited to play with this.
akshayka 1 day ago [-]
Thanks for the kind words.
We've had the same thought, and are experimenting in this direction in the context of recursive language models.
Let us know if you have feedback!
t-kalinowski 1 day ago [-]
Very cool!
We’ve been exploring a similar direction too, but with a plain REPL and a much thinner tool surface. In our case, it’s basically one tool for sending input, with interrupts and restarts handled through that same path. Marimo seems to expose much richer notebook structure and notebook-manipulation semantics, which is a pretty different point in the design space.
It seems like the tradeoff is between keeping the interaction model simple and the context small, versus introducing notebook structure earlier so the model works toward an artifact at the same time it iterates and explores. Curious how you think about that balance.
Repo: https://github.com/posit-dev/mcp-repl
I think the tradeoff is less stark than it seems. Our tool surface is also basically just "run Python", but the model may additionally use a semi-private API ("code mode") within that execution context to inspect and modify the notebook itself. So the notebook structure isn't something the model has to manage. Marimo's reactivity handles the graph automatically, and the model gets artifact construction as a side effect of exploration.
Where I'd imagine the approaches diverge more is in what the runtime gives back. In a plain REPL, the model carries state in its context. In marimo, it can offload state to the notebook and query it back: inspect variables, read cell outputs, check the dependency graph. That turns the notebook into working memory the model can use without it all living in the conversation. Over longer sessions especially, that matters.
This looks promising, especially the idea of using reactive notebooks as environments for agents.
How do you manage state consistency when multiple agents are interacting with the same notebook?
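The thread doesn't answer this directly, but one common pattern for this kind of consistency is to serialize all mutations through a single worker, so concurrent agents observe one total order of writes. A toy sketch of that pattern (not marimo's actual mechanism; all names invented):

```python
import threading, queue

class SerializedNotebook:
    """Toy model: all mutations funnel through one worker thread, so
    concurrent agents see a consistent, totally ordered state."""
    def __init__(self):
        self.cells = {}
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._q.get()
            if item is None:
                break
            cell_id, src, done = item
            self.cells[cell_id] = src  # the only place state is mutated
            done.set()

    def write_cell(self, cell_id, src):
        done = threading.Event()
        self._q.put((cell_id, src, done))
        done.wait()  # block until the mutation has been applied

    def close(self):
        self._q.put(None)
        self._worker.join()

nb = SerializedNotebook()
threads = [threading.Thread(target=nb.write_cell, args=(f"c{i}", f"x{i} = {i}"))
           for i in range(8)]
for t in threads: t.start()
for t in threads: t.join()
nb.close()
# all 8 writes land, in some single serialized order
```

Reactive re-execution would then hang off the worker too, so a cell never runs against a half-applied write.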
llamavore 1 day ago [-]
Looks cool. I love notebooks.
I built something similar with just plain CLI agent harnesses for Jupyter a while back: https://github.com/madhavajay/cleon
It supports Codex subscriptions and pi (it used to support Claude subs, and might still work since I didn’t modify the system prompt).
It has some bugs and needs some work, but getting help and code changes inline in Jupyter is way better than copy-pasting hard-to-select text from cells and cell outputs all day.
This is cool. Do you still use this? There have been ideas thrown around to add "prompt" cells to marimo that can similarly create outputs or downstream cells, with the prompts serialized to the notebook .py file as part of the DAG.
llamavore 15 hours ago [-]
Yeah I use it myself when I need to quickly debug something in Jupyter because I can just pip install cleon anywhere and then do @ question to get codex involved.
I took Jeremy’s solveit course and built this in homage to the concept of an AI agent within Jupyter.
Keen to collaborate with anyone if they want to take this paradigm further.
llamavore 15 hours ago [-]
I will add I’m sad to see Anthropic go anti-community on their Claude subscription usage, but since Codex works in pi, using pi as the underlying harness is probably the best thing to do. I used the open-source Codex Rust implementation in cleon to easily bring it into Python directly. Big thanks and props to OpenAI for building their coding harness in Rust and doing it open source.
manzt 4 days ago [-]
One of the authors here, happy to answer questions.
Building pair has been a different kind of engineering for me. Code mode is not a versioned API. Its consumer is a model, not a program. The contract is between a runtime and something that reads docs and reasons about what it finds.
We've changed the surface several times without migrating the skill. The model picks up new instructions and discovers its capabilities within a session, and figures out the rest.
gobdovan 1 day ago [-]
You could wrap the Python object model (pyobject) via a proxy that controls context and let the AI have a go at it.
You can customise that interface however you want, with a stable interface that does things like:
proxy.describe()
proxy.list_attrs()
proxy.get_attr("columns")
This way you get a general interface for the AI to interact with your data, while still keeping a very fluid interface.
I built a custom kernel for notebooks with PDB and a similar interface; the trick is to also have access to the same API yourself (preferably with some extra views for humans), so you see the same mediated state the AI sees.
By 'wrap' I mean build a capability-based, effect-aware, versioned-object system on top of objects (execs and namespaces too) instead of giving models direct access. Not sure if your specific runtime constraints make this easier or harder. Does this sound like something you'd be moving towards?
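For illustration, a minimal version of such a guarded, effect-aware proxy might look like the sketch below. This is an assumption-laden toy (the class and method names are invented, not gobdovan's system): a capability set restricts what the model can reach, and every access is logged as an observable effect.

```python
class GuardedProxy:
    """Capability-style wrapper: the model only reaches attributes that
    were explicitly allowed, and every access is recorded."""
    def __init__(self, obj, allowed: set[str]):
        self._obj, self._allowed, self.log = obj, allowed, []

    def describe(self) -> str:
        return f"{type(self._obj).__name__} exposing {sorted(self._allowed)}"

    def list_attrs(self) -> list[str]:
        return sorted(self._allowed)

    def get_attr(self, name: str):
        if name not in self._allowed:
            raise PermissionError(f"{name!r} is not exposed")
        self.log.append(name)  # effect-awareness: record what was touched
        return getattr(self._obj, name)

data = {"a": 1}
proxy = GuardedProxy(data, allowed={"keys", "items"})
keys = list(proxy.get_attr("keys")())  # allowed, and recorded in proxy.log
# keys == ["a"]; proxy.get_attr("pop") would raise PermissionError
```

Versioning would then mean snapshotting `log` plus the wrapped object per step, so a human reviewer can replay exactly what the model saw and did.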
manzt 1 day ago [-]
Really interesting idea! Part of the ethos here is that models are already really good at writing Python, and we want to bet on that rather than mediate around it. Python has the nice property of failing loudly (e.g., unknown keywords, type errors, missing attributes) so models can autocorrect quickly. And marimo's reactivity adds another layer of guardrails on top when it comes to managing context/state.
Anecdotally working on pair, I've found it really hard to anticipate what a model might find useful to accomplish a task, and being too prescriptive can break them out of loops where they'd otherwise self-correct. We ran into this with our original MCP approach, which framed access to marimo state as discrete tools (list_cells, read_cell, etc.). But there was a long tail of more tools we kept needing, and behind the scenes they were all just Python functions exposing marimo's state. That was the insight: just let the model write Python directly.
So generally my hesitation with a proxy layer is that it risks boxing the agent in. A mediated interface that helps today might become a constraint tomorrow as models get more capable.
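The shift described above, from many discrete tools to one "run Python" tool over a persistent namespace, can be sketched as follows. The helper names (`list_cells`, `read_cell`) are stand-ins for illustration, not marimo's API:

```python
import io, contextlib

# Toy notebook state the "tools" expose
NOTEBOOK = {"c1": "x = 1", "c2": "y = x + 1"}

def list_cells():
    return sorted(NOTEBOOK)

def read_cell(cell_id):
    return NOTEBOOK[cell_id]

# Instead of registering list_cells/read_cell/... as discrete MCP tools,
# put them in one namespace the model can program against directly.
NAMESPACE = {"list_cells": list_cells, "read_cell": read_cell}

def run_python(code: str) -> str:
    """The single tool: execute model-written code, capture what it prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, NAMESPACE)  # state persists across calls
    return buf.getvalue()

out = run_python("print(list_cells())")
# out == "['c1', 'c2']\n"
run_python("cells = {c: read_cell(c) for c in list_cells()}")
out2 = run_python("print(cells['c2'])")
# out2 == "y = x + 1\n" -- the 'cells' binding survived between calls
```

The long tail of would-be tools disappears: any composition of the exposed functions is something the model can just write, and failures surface as ordinary Python exceptions it can self-correct from.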
gobdovan 1 day ago [-]
Yeah, I'm talking more about a wrapper over the Python data model (pyobject) rather than an MCP-style API for kernel interaction. I'm not proposing you abstract interactions under a rigid proxy, but that you can use proxy objects to virtualise access to the runtime. You could still let the model believe it is calling normal Python code while, in actuality, it goes via your control plane. Seeing the demo, I'd imagine you already have parts of this nailed down, though.
manzt 1 day ago [-]
Ah, I think I misread your earlier comment. That's a more interesting version of the idea than what I responded to. We don't do this today, but marimo's reactivity already gives us some control plane benefits without virtualizing object access. That said, I can imagine there are many more things a proxy layer could do. Need to think on it, thanks for the clarification :)
mscolnick 1 day ago [-]
How do you teach the model to use this new API? Wouldn't it be more effective just using the polars/pandas API, which it has been well trained on?
gobdovan 1 day ago [-]
Codex just picks it up. The surface is basically a guarded object model, so pandas/polars-style operations stay close to the APIs the model already knows. There are some extra tricks, but they're probably out of scope for an HN comment.
In practice, Pandas/Polars API would lower to:
proxy -> attr("iloc") -> getitem(slice(1,10,None))
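That lowering can be demonstrated with a tiny recording proxy: attribute access and indexing append ops to a plan instead of executing, and a separate replay step applies the plan to a real object. This is a sketch of the general technique, not gobdovan's implementation:

```python
class RecordingProxy:
    """Records attribute access and indexing as a chain of ops, so
    pandas/polars-style expressions lower to an inspectable plan."""
    def __init__(self, ops=()):
        self.ops = list(ops)

    def __getattr__(self, name):
        # only called for names not set in __init__, so 'ops' stays direct
        return RecordingProxy(self.ops + [("attr", name)])

    def __getitem__(self, key):
        return RecordingProxy(self.ops + [("getitem", key)])

def replay(obj, ops):
    """Apply a recorded plan to a concrete object (the control plane)."""
    for kind, arg in ops:
        obj = getattr(obj, arg) if kind == "attr" else obj[arg]
    return obj

plan = RecordingProxy().iloc[1:10]
# plan.ops == [("attr", "iloc"), ("getitem", slice(1, 10))]
rows = replay(list(range(20)), [("getitem", slice(1, 10))])
# rows == [1, 2, ..., 9]
```

The model writes `proxy.iloc[1:10]` as if it were pandas, while the runtime gets a plan it can validate, log, or rewrite before replaying it against the real data.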
BloodAndCode 1 day ago [-]
Super loved the idea of maintaining consistency!
Artifacts will make it possible to not lose the thread and reproduce results when working in a team. Love it.
If a cell happens to take a long time to compute (large dataset) — how does the agent behave? Does it wait or keep going?
manzt 1 day ago [-]
Claude Code supports dynamically running long-lived shell commands in the background. Since marimo pair’s tool (run Python) is implemented as a bash script, the same applies. Also, Ctrl-C-ing the script interrupts the kernel, so you can cancel individual long-running tasks that way.
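The general shape of cancellable long-running work (a generic pattern for illustration, not marimo's actual interrupt mechanism) is chunked computation that checks a flag, so an interrupt leaves partial results intact:

```python
import threading, time

def long_compute(cancel: threading.Event, results: list):
    """Chunked work that checks a cancellation flag between chunks,
    so partial results survive an interrupt."""
    for i in range(1000):
        if cancel.is_set():
            return
        results.append(i * i)
        time.sleep(0.001)  # stand-in for one chunk of real work

cancel, results = threading.Event(), []
worker = threading.Thread(target=long_compute, args=(cancel, results))
worker.start()
time.sleep(0.05)
cancel.set()          # analogous to Ctrl-C interrupting the kernel
worker.join()
# the task stops early, but the partial results remain usable
```

In a kernel, the interrupt arrives as KeyboardInterrupt instead of an event flag, but the consequence is the same: the session and its accumulated state survive the cancellation.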
bharat1010 1 day ago [-]
The idea of an agent having actual working memory inside a live notebook session rather than just firing off ephemeral scripts is genuinely clever — this feels like a much more natural way for humans and models to collaborate.
bojangleslover 1 day ago [-]
This rules. Just closed on a bunch of data science I was doing on the Medicaid dataset thanks to this. Very timely, zero bugs.
Well done Trevor and team!
danieltanfh95 1 day ago [-]
Built https://github.com/danieltanfh95/replsh to pair with local Python sessions without additional dependencies, allowing LLMs to directly ground their investigation and coding against local repos and environments. It now supports Docker as well; SSH support will come in the near future.
millbj92 1 day ago [-]
Genuinely cool. As a cool side-effect you could use notebooks to store your prompts and never lose a prompt again.