[OC] The “Ship of Theseus” paradox in software: surviving lines of code in projects like React, LangChain, and NumPy, categorized by original commit year.
Tools: Python (ETL data pipeline and historical git blame extraction), GitHub Actions (automated monthly delta-processing), and React with Recharts for the interactive frontend visualization.
Context: I wanted to explore the philosophical paradox of the Ship of Theseus applied to software engineering. If every line of code in a repository is eventually rewritten, is it still the same project? This stacked area chart shows the surviving lines of code categorized by the year they were originally written. As time moves forward on the X-axis, you can see the foundational code shrinking as it gets refactored and replaced.
You can play with the interactive version and toggle between the different case studies here: [https://asifdotexe.github.io/Theseus/](https://asifdotexe.github.io/Theseus/)
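For readers curious how "surviving lines by year" can be measured, here is a minimal sketch, not the project's actual pipeline. It assumes the year is taken from the `author-time` field of `git blame --line-porcelain`; the function names `count_lines_by_year` and `blame_file` are hypothetical.

```python
import subprocess
from collections import Counter
from datetime import datetime, timezone


def count_lines_by_year(porcelain: str) -> Counter:
    """Count blamed lines per author year from `git blame --line-porcelain` output."""
    years = Counter()
    for line in porcelain.split("\n"):
        # Header lines are unindented; file content lines start with a tab,
        # so a source line that happens to contain "author-time" is not counted.
        if line.startswith("author-time "):
            epoch = int(line.split(" ", 1)[1])
            year = datetime.fromtimestamp(epoch, tz=timezone.utc).year
            years[year] += 1
    return years


def blame_file(repo: str, path: str, rev: str = "HEAD") -> Counter:
    """Run git blame on one file of a local clone (hypothetical helper)."""
    result = subprocess.run(
        ["git", "-C", repo, "blame", "--line-porcelain", rev, "--", path],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # tolerate files that are not valid UTF-8
        check=True,
    )
    return count_lines_by_year(result.stdout)
```

Summing these counters over every tracked file at one revision gives one vertical slice of a stacked area chart; repeating that at a series of revisions gives the time axis. Using author time rather than committer time is an assumption here, and the two can differ after rebases or cherry-picks.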
How does old code disappear completely and come back later on? Is it reverted?
OldSports-- on
As a reader of philosophical texts, here are some perspectives on this topic:
* **Mereological Essentialism**: The code base loses its identity as soon as the original code structure is altered through refactoring or the deletion of the initial commit.
* **Spatiotemporal Continuity**: The code base remains identical as long as a continuous Git history and an uninterrupted development process exist within the same repository.
* **Perdurantism**: The code base is understood as a four-dimensional object consisting of the sum of all its versions and developmental stages over time.
* **Functionalism**: The code base maintains its identity through the unchanged API specification and the continued fulfillment of the defined software purpose.
* **The Fork Dilemma**: A project reconstructed from the original source code competes with the modernized main version for the status of true identity.
HommeMusical on
That sounds fascinating! I wish I could read the graphic, but dark purple on black isn’t really a very felicitous combination.
swierdo on
Hey, this might actually be pretty useful.
Langchain was the first project I was aware of that encapsulated LLMs in software, and I used it a bit. But then it got really popular and the codebase blew up, seemingly without much planning and design (at which point I wrote off langchain as a mess). Your visual shows this explosion of code really well. It also shows that a lot of that messy code is gone now.
So I’ll have to try langchain again.
Basic_Aside_8764 on
The digital graveyard is always more crowded than people realize. It’s fascinating how libraries like React carry ghosts from a decade ago while the newer parts are just footnotes in the making.
Most builders think they’re creating something permanent, but we’re all just curators of decaying logic. This is a great reminder that the ‘soul’ of a project is in the lineage, not just the syntax. Nice work uncovering these artifacts.
FirstTasteOfRadishes on
What counts as ‘original code’?
LouisDuret on
I like the idea, but the implementation appears to be completely broken.
1. While the rainbow colors are fine, why add a strong gradient at the bottom of the chart? We basically can’t see what is going on with the original code, even by squinting. In general, everything is so dark it hurts.
2. Identity mode might be interesting for some repositories, but not these, and it is basically unreadable (only two colors: blue for the original year, and the same shade of orange with a tiny black stroke for all the others).
3. Some of the processing appears to be just bugged? For instance, Numpy shows a major refactoring in March 2024. When hovering on the chart at that date, it says 99.7% refactored, even though the chart itself appears to be over 90% original code from 2001.
4. The choice of repositories is very strange; only Numpy shows interesting graphs. Why not the repos of Git, Unix, NodeJS, VSCode?
5. And React may be interesting, but I suppose it is bugged? Maybe I don’t know the history of React, but I doubt the whole code base was removed 3 times in its history and each time restored a full year later.
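The mismatch in point 3 can be checked with simple arithmetic. This sketch assumes "refactored" means the share of surviving lines that do not come from the repository's earliest year; the tool's actual definition is not stated in this thread, and `refactored_share` is a hypothetical name.

```python
def refactored_share(lines_by_year: dict[int, int]) -> float:
    """Fraction of surviving lines NOT written in the earliest year present."""
    total = sum(lines_by_year.values())
    if total == 0:
        return 0.0
    original = lines_by_year[min(lines_by_year)]
    return 1 - original / total


# Illustrative numbers only, not real NumPy data:
# 900 of 1000 surviving lines date from the first year.
share = refactored_share({2001: 900, 2024: 100})
print(f"{share:.1%} refactored")
```

Under that assumed definition, a chart that is over 90% first-year code should report under 10% refactored, so a tooltip reading 99.7% would point to either a different definition or a processing bug.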
TheOneNeartheTop on
This is one of the more interesting things I’ve seen this year. Shows a very clear change in how things are being developed.
timbomcchoi on
WOW I never would’ve guessed that numpy of all codes changes so much…!
Don_Kino on
Nice! Maybe I missed it, but can we use an already cloned repo? (My code is not on GitHub.)
Extras on
This is cool. I wonder what the charts would look like for some older code. I’d love to see what OpenStack’s nova project looks like, or something like Apache.
Source: Git commit history and git blame data extracted directly from the official GitHub repositories of major open-source projects including [React](https://github.com/facebook/react), [NumPy](https://github.com/numpy/numpy), [LangChain](https://github.com/langchain-ai/langchain), [Claude Code](https://github.com/anthropics/claude-code) and [Zed](https://github.com/zed-industries/zed)
The source code for the automated data engine is here: [https://github.com/Asifdotexe/Theseus](https://github.com/Asifdotexe/Theseus)
Why does React’s codebase collapse multiple times? What happened in 2019 and 2023-2024?
By the way, you are missing a space when no “alarm” text box is showing:
https://preview.redd.it/jfd42a0qc4zg1.png?width=378&format=png&auto=webp&s=0f1b80ef28ca329f18116cc61f30afaba5e012a9