What docs-as-code means

75 points by remoquete 21 hours ago

I think this is a really good point in the post:

> If you don’t review, check, and merge docs the same way your org reviews, checks, and merges code, you’re not doing docs-as-code — you’re doing docs-as-bore.

While some WYSIWYG cloud-based docs platforms make it easier to make changes, that's not necessarily what you want. Docs are a critical component of how your users perceive your product - you want to have checks that it meets certain quality and accuracy standards. Just like your code.

And if you're an engineering lead company, you probably want your docs updates to be coordinated with your product releases. Git is just the logical place to put your docs in that case.

I've even created a company specifically to help with this workflow: https://www.doctave.com

Also, lots of comments here seem to be thinking of docstrings and other in-code documentation. I think that's really a different category that has a different set of goals and issues. This post is specifically about customer-facing documentation.

robertlagrant 12 hours ago

> And if you're an engineering lead company,
Nit: engineering-led

euroderf 14 hours ago

Having worked both as a developer and as a technical communicator (for software), I'm thinking that low friction for developers is paramount, and that therefore the way to get developers to put some effort into it is to have documents both (a) written in Markdown (or adoc or rST or typst) and (b) co-located with code under version control. Change the code, change the docs, no screwing around, BUT a quick & simple brain dump can suffice because of [see next paragraph].

To wit: I have yet to hear of a documentation system that provides fully bidirectional updates between such documents-as-code and edits made further downstream along the documentation production pipeline. That is to say, when a TC person or a reviewer makes edits to content, these changes should propagate back to the docs-as-code material.

Then everyone benefits, including senior devs whose scarce time is optimised by having TC people expand and polish their hasty scribblings, and junior devs who have well-maintained documentation at-hand in-place.

My 0.02€, YMMV. Maybe too niche.

smokel 14 hours ago

Documentation near code is a good idea, but unfortunately it covers only part of the problem.
There is a big portion of documentation that should be available to "other persons", such as architects or project managers, who may not want to visit the codebase.
Another challenge is that for these people, diagrams are typically more useful than text. This still requires some manual effort which is difficult to achieve with Markdown.
- mfuzzey 12 hours ago
  
  Agreed with your point of accessibility to people who don't live in the code.
  But that doesn't preclude documentation near code nor Markdown. It just means that you need a CI job to publish the doc stored in the git repo as (for example) HTML or PDF.
  For diagrams, stuff like PlantUML is great: edit as text, publish as images.
- ElevenLathe 8 hours ago
  
  I recently started keeping docs in Markdown but with a CD job to render them and publish to Confluence. We only have internal users but many of them are non-technical. This seems to be a sweet spot -- easy for us to update, but also easy for users to find.
  It also opens up the possibility of generating the docs themselves from any data that are available at build time. For example, we have a page with a pretty complex Graphviz visualization that gets built from some of the data files we ship with the product. When the data changes, so do these diagrams. They literally can't get out of sync without us noticing (the build breaks). I see more opportunities for this kind of thing all the time now that I'm looking.
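
A minimal sketch of the diagrams-from-build-data idea, assuming a hypothetical list of components and edges stands in for the shipped data files (the `to_dot` name and the input shape are made up for illustration). It emits Graphviz DOT text and raises if the data references an unknown component, which is the kind of failure that would break the build:

```python
def to_dot(components, edges):
    """Render a directed graph as Graphviz DOT text.

    `components` is an iterable of node names and `edges` an iterable of
    (source, target) pairs; both are hypothetical stand-ins for whatever
    data the product ships.
    """
    known = set(components)
    lines = ["digraph product {"]
    for name in sorted(known):
        lines.append(f'  "{name}";')
    for src, dst in sorted(edges):
        # Fail loudly so stale data stops the docs build instead of
        # silently producing a wrong diagram.
        for end in (src, dst):
            if end not in known:
                raise ValueError(f"edge references unknown component: {end}")
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be written to a file and rendered with Graphviz, for example `dot -Tsvg`, as one step of the docs build.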
  
  hinkley 7 hours ago
  
  I don’t know if it’s built in or a plugin but there is a way to embed a markdown document in version control into a confluence page. I don’t recall if it was any markdown page or had to be in bitbucket.
  I used a simple intro and then embedded the docs from version control for a couple chunks of our architecture. Helped a lot.
- spockz 9 hours ago
  
  We now use MermaidJS to render diagrams from text source in our documentation and that works pretty well!
- euroderf 13 hours ago
  
  Agreed on all points.
  Regarding para 2, I am assuming that there is something like a CMS in place that can pluck docfiles from version control and massage them and insert them into an outline/ToC. (And then propagate changes back to version control.)
  Regarding para 3, there are now many GUI tools for working with Mermaid et al. But are any of them properly integrated into documentation systems?
- robertlagrant 12 hours ago
  
  Architects should definitely be visiting the codebase. Why would project managers be updating docs? I'd have thought stuff like docs translations done by translators would be a better candidate for non-code editing.
Nathanba 6 hours ago

the second you put docs into version control next to code it's no longer low friction enough. Suddenly you have peer review for docs changes, which is far too much work.

sshine 20 hours ago

The problem with "documentation" in commercial code is that nobody reads it.

Any "documentation" that isn't embedded in code will never be opened again.

Any "documentation" in the form of doc-strings and comments will, over time, lie.

This is because competent programmers cannot agree on whether to even comment.

When any large percentage of competent programmers do not comment, it is better to not rely on comments.

Here's another pitch:

  Code-as-Docs

simonw 18 hours ago

If people aren't reading your documentation it means they don't trust your documentation.
The solution is to build that trust, by building a culture of active documentation maintenance.
My favorite trick for that is to keep the docs in the same repo as the code and actively enforce relevant documentation updates as part of the code review process.
Once developers learn that new code cannot land on main without accompanying documentation updates they learn to trust that documentation pretty fast.
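
One way to make that enforcement mechanical is a small check in CI. A minimal sketch, assuming a hypothetical layout where source lives under `src/` and documentation under `docs/`; the function name and both paths are illustrative, not taken from any particular tool:

```python
from pathlib import PurePosixPath

# Hypothetical repo layout: code under src/, documentation under docs/.
CODE_DIRS = ("src",)
DOC_DIRS = ("docs",)


def missing_doc_update(changed_paths):
    """Return True when a change touches code but no documentation."""
    def top(path):
        parts = PurePosixPath(path).parts
        return parts[0] if parts else ""

    touches_code = any(top(p) in CODE_DIRS for p in changed_paths)
    touches_docs = any(top(p) in DOC_DIRS for p in changed_paths)
    return touches_code and not touches_docs
```

In CI the list of paths would come from the diff against the target branch (for example the output of `git diff --name-only`), and a `True` result would fail the job or flag the PR for a reviewer.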
- sshine 6 hours ago
  
  > My favorite trick is to [...] actively enforce [...]
  If active enforcement were an option, and it led to success, I don't imagine there would be a problem.
  My favorite trick is to tell people really nicely they should write comments (preferably doc comments on internally exported interfaces).
  Yet, they don't. And so we're back to the dilemma:
  If a significant percentage of competent programmers choose to not comment their code, or update the comments that are in the code, the comments will lie over time. And so it is probably better to not include them.
  We haven't even reached the point in the conversation where we ask "But why is it you, dear competent programmers with decades of experience, think we should continue to minimize the amount of comments with good conscience?" or ask "Why is it, dear project managers, that we tolerate a total lack of communication through any other means than executable code?" And the answer is probably: Because a lot of autists produce a lot of valuable code, and they don't like to talk if they can avoid it, and we value their work and relay what they made to the wordy-word people.
  > If people aren't reading your documentation it means they don't trust your documentation.
  I don't trust your documentation unless there's alignment among developers. Alignment is a luxury you don't get in legacy shops.
  
  simonw 3 hours ago
  
  If people don't add documentation to their PR, I add that documentation to the PR for them (under my own name as a separate commit), then land the code.
  Do that often enough and people get the idea that documentation isn't optional.
btbuildem 10 hours ago

I read the article as referring to documentation for end users -- not internal documentation / comments written by developers who thought they knew what the code they wrote was doing.
j-krieger 19 hours ago

This is pretty much why Rust has doctests. You include a small test in your docstring.
- sshine 18 hours ago
  
  Executable tests in Rustdoc are amazing. For those not involved, they are executed when you run `cargo test` and they are included as markdown code blocks in your crate’s documentation.
  They’re not an excellent place for extensive testing. But they are super useful for making sure your documentation examples are updated and functional.
  `#![deny(missing_docs)]` is also a great way to ensure you don’t forget to document things.
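
For readers outside Rust, Python's standard `doctest` module follows the same idea: the example in the docstring is both documentation and a test. A minimal sketch (the `slugify` function is made up for illustration):

```python
def slugify(title):
    """Turn a page title into a URL slug.

    >>> slugify("What docs-as-code means")
    'what-docs-as-code-means'
    """
    # Lowercase, split on any whitespace, and join the words with hyphens.
    return "-".join(title.lower().split())
```

Saved in a file, the docstring example can be checked with `python -m doctest` followed by the file name; if the function's behaviour drifts from the documented output, the run reports a failure.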
- enraged_camel 18 hours ago
  
  Elixir as well.
YawningAngel 19 hours ago

Lots of people, myself included, read documentation.
- ks2048 19 hours ago
  
  Ditto. The problem is self-reinforcing: people don’t read docs because the docs are bad… we don’t spend time on docs because no one reads them…
  
  sshine 18 hours ago
  
  Not only is it self-reinforcing:
  It is inevitable because some competent programmers deliberately don’t comment their code, read comments, or delete other people’s comments when they are stale.
- sshine 18 hours ago
  
  I personally read and write a lot. I track things in git messages by cross-referencing issues, and I comment my code.
  But my point is: when you have a cultural divide among competent programmers on whether to comment, not commenting game-theoretically wins, because the outcome where you have comments that get updated some percent of the time is worse than not having those comments.
  Instead, embed what you want to say in a comment in the code itself, or in a test.
  Document your libraries and APIs if they are used by people outside your team.
llm_trw 20 hours ago

Here's another: literate programming.
If it's good enough for Knuth it's good enough for me.
- oersted 19 hours ago
  
  I tried to do it seriously a number of times. Perhaps I am doing it wrong, but my productivity drops like a stone.
  Writing down your thoughts as you go and maintaining them takes serious time. The romantic notion of writing (and reading) code like a book is appealing, but writing books is hard and arduous, it cannot be underestimated as a craft in its own right, and there is coding to be done.
  There is also the question of structuring the literate code. Telling a story of how you are building it or explaining how it works has a very different flow and order to how code is usually structured for good maintainability.
  Please correct me if I’m wrong, because I would love to dive into it, but I don’t think there has ever been any major piece of software developed following literate programming (at least as Knuth envisioned it). I also don’t think there is any significant book that contains a sizeable working program embedded in it throughout, that can be compiled and executed as-is.
  In practice Knuth was most concerned with embedding short code snippets in his papers and books. Having the whole thing be an actual compilable program was secondary, and it was mostly short academic proof-of-concept prototypes and algorithms.
  Don’t get me wrong, I love the concept, that’s why I have given it multiple serious tries over the years, likely I will again, and why I think I have some insight of what happens when you use it for “real-world” work.
  
  llm_trw 19 hours ago
  
  NoWeb can support multi-thousand page documents which can compile to tens of thousands of lines of code.
  I used it at a deep tech startup I worked at a number of years ago to document the theory behind why the code was doing what it was. Doubly useful since I could just use a regular bibtex citation system for papers which had done some part of what we were trying to do.
  My code became the de facto technical onboarding document, still in use today, despite the fact that none of the code in it has been updated since I left.
  For examples you can read: http://brokestream.com/tex-web.html TeX is written entirely in WEB.
  
  oersted 19 hours ago
  
  Thanks for the insight. Just to be clear, writing documentation after the fact with lots of code snippets is obviously good and is standard practice.
  You can take the next step and ensure that the entirety of the code is in your documentation as snippets. This usually doesn’t make much sense, there’s lots of code that is not worth explaining in a literate style. And what’s the point of the documentation containing the whole program if it is written after the fact and you already have a standard more maintainable codebase as the ground truth? The fact that your literate code didn’t get touched says a lot.
  To me the name Literate Programming implies that you write the code in a literate style from the outset. If you make it literate after the fact it is just normal documentation with snippets isn’t it?
  
  llm_trw 18 hours ago
  
  >The fact that your literate code didn’t get touched says a lot.
  The fact it's still used 5 years later without any of the code still being in production says more.
  >To me the name Literate Programming implies that you write the code in a literate style from the outset.
  This seems like a fundamental misunderstanding of what it means to write. You should perhaps look at how people who do it for a living write books or articles. The final document has little to do with what you spend most of your time editing.
  I absolutely hacked on the tangled source code of the program when trying to fix bugs or extend capabilities. Once I knew what I wanted to do I put it back in the literate program, usually finding a lot more bugs in the process.
  
  sundarurfriend 18 hours ago
  
  > I absolutely hacked on the tangled source code of the program when trying to fix bugs or extend capabilities. Once I knew what I wanted to do I put it back in the literate program
  Does NoWeb automate this "untangling" process in any way? I sometimes use Weave.jl [1] when I'm thinking out loud through code, and at times it would be nice to just work on the tangled code, refactor and reorganize things, and have it all untangle back into the original in some way. I have no idea how that would work though, and it would likely be pretty limited even if it existed, but I'm curious what the usual approach you take to this is.
  [1] https://weavejl.mpastell.com/stable/
  
  llm_trw 17 hours ago
  
  No, in noweb programs you insert chunks of code in multiple places and have conflicts when you try and automatically merge the code back too often.
  Org mode has a function which does this, but they didn't allow for arbitrary chunk nesting the last time I looked.
  Emacs has a number of very useful features in the modes for noweb/tex, one of which is jumping to the chunk which the code came from in the tangled source code on the pretty printed PDF. This follows the spirit of what you want. In fact SyncTeX support comes pretty much out of the box for noweb files and makes their editing a breeze, either as text or code.
  Of course if you're not on Emacs then god help you.
  
  textread 11 hours ago
  
  Are you using the following workflow?
  orgmode file --> export to pdf (aka weave)
  orgmode file --> org babel tangle
  Would you please help me understand your workflow for "jumping to the chunk which the code came from in the tangled source code on the pretty printed PDF"?
  Do the codeblocks in your pdfs contain hyperlinks back to the org file where they came from?
  
  llm_trw 9 hours ago
  
  No I'm using noweb. There is an option in noweb to add comments in the tangled code with the line and file from which they originate. Then there's an Emacs mode that lets you jump to that code. I wrote a little function that lets you instead jump to the same line in the PDF using SyncTeX.
  
  oersted 18 hours ago
  
  Perhaps I came across as too critical, I have a lot of respect for what you did and for the craft of technical writing. And I definitely understand that writing is not done linearly and is very iterative.
  Correct me if I'm wrong, but it sounds like you were documenting existing code and that the result was very valuable as documentation, but not necessarily as code. You were acting as a technical writer not a programmer, it's a bit of a disconnect to call it Literate Programming, even if you were using Literate Programming tooling (NoWeb).
  This kind of documentation is common practice all over industry and it is valued, but I don't think Literate Programming is considered to be widely adopted because of that.
  
  llm_trw 18 hours ago
  
  I'm having a hard time even understanding what the question is here.
  You seem to be confusing the tools with the work being done.
  You can write a prototype of a C function in Python to see if you understand the requirements before you commit to the much harder task of writing it in C. That doesn't mean you're not writing a C program.
  The same is true for literate programming. I can write code outside the main literate program when I'm not sure what it's meant to do before I put it back in.
  
  oersted 17 hours ago
  
  What I'm saying is simply that as you describe it, you are first writing the code normally, and then separately writing some documentation about it accompanied by code snippets for context.
  But if that's Literate Programming then everyone is doing it and it's not a very meaningful label, it's just documentation.
  I do get it, the distinction is that you are using NoWeb and you can convert between the documentation and the code, and that the documentation contains the entirety of the code. I suppose that's neat.
  At some point, this boils down to a pointless discussion of semantics (my fault). "Literate Programming" as you describe it does not sound like a style of "Programming". Actually, when you reverse the Programming/Writing emphasis, it simply becomes "Technical Writing", which is what everyone does, because that's what's actually needed. And it is done by great writers rather than great programmers (which may describe Knuth, with the utmost respect).
  I always interpreted it as writing the text and the code together, logging your thought process, thinking of code like a piece of literature as it is written, rather than adding some documentation to it later. The notion that writing it like this will yield better code, regardless of its value as documentation. I suppose that's why it was unproductive for me, it is a rather romantic interpretation (again, my fault).
  
  llm_trw 10 hours ago
  
  You write it however you want.
  I don't know how you code, but the first draft of code is never what ends up in the code base. Neither is the first draft of the documentation. You can write both together, but until you have an idea of what the structure of the code would look like, and how to split it up then you're better off doing multiple drafts.
  As always code is read much more often than it is written and literate programming is used for the reading part, not the writing part. The efficiency comes in not having to guess what 0x5F3759DF is there for.
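
To make that last point concrete: 0x5F3759DF is the magic constant from the widely known "fast inverse square root" trick. A sketch in Python, purely illustrative (the trick was written for C, and Python gains no speed from it), with the kind of explanation a reader would otherwise have to guess:

```python
import struct


def fast_inv_sqrt(x):
    """Approximate 1/sqrt(x) for a positive float x using the bit trick."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # The integer view of a positive float is roughly a scaled and
    # shifted log2 of its value, so subtracting half of it from this
    # constant approximates the bit pattern of x ** -0.5.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step tightens the first guess.
    return y * (1.5 - 0.5 * x * y * y)
```

Without the comments, the constant and the shift are opaque; with them, the function reads as an approximation followed by one refinement step.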
  
  oersted 18 hours ago
  
  It seems like the site got the HN hug-of-death. Mirror: https://web.archive.org/web/20221130150047/http://brokestrea...
  I have skimmed the TeX literate PDF (I did a number of times in the past too). Frankly isn’t it just like normal code with verbose comments? I have seen lots of code like that and it is not referred to as literate. The only difference is that this is a PDF, which makes it less practical and it is still not particularly readable as a book.
  It might have great book-like typography but not the "narrative" structure that helps you properly understand how it works without getting bogged down in details first. There's no coherent outline, no chapters or sections for major systems or design decisions, no overarching overviews, no relating different parts and giving context. There's also no story of how it was built or a log of his thoughts throughout the problem-solving process, that would have been another good angle. Instead it's just the code from top to bottom with embedded very local commentary. The code itself is actually rather hard to parse visually by modern typographic standards.
  The issue is that probably I am misinterpreting what Knuth intended. The Literate Programming concept was a product of its time, and it has evolved into more practical modern documentation standards that are not so tightly linked to the code and don't exhaustively cover every line. The only problematic thing about it might just be the grandiose name Literate Programming, without that it's mainly good common-sense advice for quality documentation, but not necessarily a practical programming paradigm like the name implies.
  
  llm_trw 17 hours ago
  
  Again, I'm having a hard time understanding what the issue is. It seems like you are deeply confused about what literate programming is and how it works.
  Have you read the original paper here in full: http://www.literateprogramming.com/knuthweb.pdf ?
  All of the navigation issues are taken care of by using <<chunks of code>> in a nested structure. You follow the numbers in those, like a follow your own adventure game, to find out whatever you need.
  The index has a listing of everything used in the program along with where it was defined and where it was used in case you want to find something specific.
  More modern tools, like NoWeb, turn all of this into hyperlinks so you can jump around the pretty printed version without having to look up page numbers.
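
For readers who have not seen the chunk mechanism, here is a toy sketch of the tangle step. It is not how noweb or WEB is implemented; the `tangle` function, the dict-based input, and the chunk names are assumptions made for illustration. A chunk body may reference other chunks as `<<name>>`, and tangling expands them recursively:

```python
import re

# A chunk reference on a line of its own, e.g. "    <<greet>>".
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def tangle(chunks, root):
    """Expand `<<name>>` references starting from the chunk `root`.

    `chunks` maps chunk names to their source text. The indentation of a
    reference line is applied to every line of its expansion.
    """
    def expand(name, indent, seen):
        if name in seen:
            raise ValueError(f"circular chunk reference: {name}")
        out = []
        for line in chunks[name].splitlines():
            match = CHUNK_REF.match(line)
            if match:
                out.extend(expand(match.group(2), indent + match.group(1),
                                  seen | {name}))
            else:
                out.append(indent + line)
        return out

    return "\n".join(expand(root, "", frozenset()))
```

The point of the structure is that the prose can introduce `main` first and explain `greet` pages later, while the tangled output still comes out in the order the compiler or interpreter needs.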
  
  oersted 17 hours ago
  
  I have read the paper in the past, I am well versed about WEB, and I believe I have done literate programming at length for a number of non-trivial projects.
  I have explained my thinking in a separate comment (apologies for creating two branches). In short, I do think you are right and that I had an overly romantic notion of Literate Programming in mind.
  
  jerf 10 hours ago
  
  Literate programming, as originally described by Knuth, is a good essential idea embodied as a bunch of accidental instantiations of the idea that have gone badly out of date. Knuth's ideas at the time add a layer on top of programming languages to allow you to rearrange the code in a lot of ways that the languages at the time didn't support well or at all. It essentially adds an independent concept of "function", and layers its own documentation overlay on top of whatever documentation ability the language already had.
  Problem is, in the meantime, languages got a lot better at functions, got more flexible in their organization, and built in better capabilities for documentation and comments, while literate programming goes in a different direction than the languages did. The result in the modern era is a rather bizarre multi-headed hydra of conflicting ideas about how things should be documented and tested.
  If someone wanted to resurrect the ideas, they need to not just try to get people to do what Knuth laid out decades ago super harder... they really need to sit down from the very beginning and work out how to update it in the modern era to be less redundant to what we already have. It could be as simple as taking modern doc strings and upgrading them a bit to allow highly-formatted comments to be embedded into code. Or, instead of trying to "weave" the code into a static book, allow the user to specify an entry point and then follow through everything that happens in the functions that it calls and turn that into a book, e.g., say "I'm going to enter this web framework through this path, tell me everything that happens". Or some other idea I don't have yet. Something that harmonizes with modern languages instead of fighting them.
  
  WillAdams 18 hours ago
  
  I've found that Literate Programming suits how I think/approach projects, and it has worked for some quite large projects in the past for me.
  I've been maintaining a list of programs published as books and resources for Literate Programming at:
  https://www.goodreads.com/review/list/21394355-william-adams...
  esp. see:
  https://www.pbrt.org/
  and
  https://mruckert.userweb.mwn.de/understanding_mp3/index.html
  My current project is:
  https://github.com/WillAdams/gcodepreview
  which uses a LaTeX package for this which I put together with a bit of help from tex.stackexchange --- the big advantage to it is that it allows editing the documentation/code with "normal" syntax highlighting, the disadvantage is that the .sty file has to be edited/updated to match the files which are being output and I still don't have a good setup for the readme.md
  I find having the typeset PDF w/ its hyperlinked ToC and marginalia and indices helps a lot in having a "nice" version which I can look through to remind myself of what was intended at a given point, and most importantly, to find _where_ that was written down. Working on a re-write now --- we'll see if this holds up for that.
  
  oersted 17 hours ago
  
  Awesome links, thank you. I did come across "Physically Based Rendering" at some point, I forgot about it. This is definitely an excellent example of Literate Programming.
- GuB-42 19 hours ago
  
  Knuth is not the average programmer by far. And I am not talking about coding skills. Knuth is a writer at heart. He was also from a time where writing code on paper was the norm. Literate programming is good for Knuth, but maybe not for most coders today, who grew up on fast computers and IDEs.
  
  llm_trw 19 hours ago
  
  >who grew up on fast computers and IDEs.
  >>When I was a child, I spake as a child, I understood as a child, I thought as a child: but when I became a man, I put away childish things.
mycall 20 hours ago

Depends how you write code. When I use Semantic Kernel, my KernelFunctions include well-defined documentation for the inputs and outputs, then using the System Prompt you can provide the concepts and glue between the various plugins. It is the function specification as a whole. Precision is important, although GPT is not yet perfect -- perhaps in another year or two it will be.
MrHamburger 18 hours ago

code-as-docs will never tell you why a method or a module exists in the first place.
- sshine 18 hours ago
  
  The commit history will.
  
  MrHamburger 18 hours ago
  
  Well from my experience, people who do not write comments are not keen on writing commit messages either. "Some updates", "Module ABC", "Changes"
  
  simonw 17 hours ago
  
  This is why I like all commit messages to link to an issue thread somewhere (such as GitHub issues).
  The great thing about issue threads is that you can add more context to them later on - unlike commit messages which are frozen in time.
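
A lightweight way to hold that line is a commit-message check. A sketch, assuming GitHub-style `#123` or `owner/repo#123` references; the pattern and the `links_to_issue` name are hypothetical and would need adjusting for other trackers:

```python
import re

# GitHub-style issue reference such as "#123" or "owner/repo#123",
# at the start of the message or after whitespace or an opening paren.
ISSUE_REF = re.compile(r"(?:^|\s|\()(?:[\w.-]+/[\w.-]+)?#\d+\b")


def links_to_issue(message):
    """Return True if the commit message mentions at least one issue."""
    return ISSUE_REF.search(message) is not None
```

A check like this only guarantees that a link exists; whether the linked thread explains anything is still up to the people writing in it.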
  
  MathMonkeyMan 3 hours ago
  
  And, again, issue threads are only helpful if the people corresponding in them are good communicators who put in the work to leave a legible record of what happened.
  Many Github issue threads that I read are incomprehensible. The thread is not where the story is (it's in the memories of the people involved). Explaining is work. This is why they make kids write reports as assignments in school.
  
  sshine 18 hours ago
  
  It seems like the “why” is eternally lost on us in this scenario.
  If I could wave a magic wand and make all the colleagues verbalise what they’re doing, I would.
  In the meantime, I’ll take the downvotes for pointing out that it’s fundamentally caused by a cultural gap, not just “those who don’t get it yet, and those who do.”
exe34 19 hours ago

my favourite documentation is minimal running code examples. give me example inputs to get the job done - i.e. not just "inputs: x is a y", but actually create a minimal version of y and show it going into f(x) and coming back out as a genuine object (as opposed to a mock) that I can inspect/prod until I understand what's happening.
xtiansimon 18 hours ago

Not at any level? My imagination is reeling—I’m thinking of Cold War era Spy Novels describing siloed groups who don’t know what any other group is doing. Each is laser focused on their own tasks and everything is a tactical choice. It’s the great reveal in the Movie adaptation when the separate threads come together and the grand national strategy is revealed. And now I’m also imagining the comedy versions of this genre, where the reveal has no purpose.
At some level there is _structure_, and it can be communicated. If for no other reason than to validate the evident structure in code.
> “…sitting at the same table as the almighty coding knights? […] Remove documentation…and your products cease to exist, their inner workings left to the imagination of…”
The author finished the sentence with “users”, but who are they talking about? Those coding knights and their imagination which sort of parallels my Spy genre example.
That said, the article seems a bit naive about the depth and breadth—reads like Ra-Ra.
- remoquete 18 hours ago
  
  Author here. What do you mean by depth and breadth?
  
  xtiansimon 11 hours ago
  
  -1, huh.
  Take my opinion with a grain of salt. Today I work for a small private firm, but years ago I worked in a corporate job (with tech writers). I drank the Kool-Aid. I believed the work I was doing was important and critical because someone up the vertical decided it was and they hired someone who hired someone who hired me to do a job--it's the corporate world.
  The tell for preaching to the choir, what triggered my response, are the hyperbolic strong statements ("products cease to exist", "your business don’t crumble overnight", "failures are dramatic") which sound defensive. The rest of the blog post reads as defining what exists and setting standards, and further explanation of why it's all important. This is the breadth and depth. What I'm not reading are on/off-ramps and shortcuts that all of the developers who disagree about the mission critical position of documentation would regard as sensible accommodations.
  Today I work in a small firm and we need documentation. What I learned from underpaying small business is documentation is important and has a role (otherwise our small firm wouldn't waste our time on it). But if I took the position argued here, I would be told I was wasting time, that I should focus, get it done, and move on to my job tasks. If you're working closely with writers, then you need to believe what they're doing is important without being told. If you don't believe this, then maybe your role does not need documentation (yet? or your group is small enough and people like the job security or you're just overworked, or something else). When I'm working collaboratively and I hear people tell me what I'm doing is not important, I have to believe there is some truth to it. A limit.
  Good luck in your work.
  
  remoquete 3 hours ago
  
  > When I'm working collaboratively and I hear people tell me what I'm doing is not important, I have to believe there is some truth to it.
  No, you don't have to. And that's the whole point.
intelVISA 19 hours ago

Good code is fairly self-documenting, but alas.
- axelfontaine 18 hours ago
  
  That may be true for the "what", but certainly not for the "why".
  
  intelVISA 18 hours ago
  
  Why is best inferred from context imo, otherwise it's a smell for a system with too much scope.
- planetafro 18 hours ago
  
  I can't even count how many times that I've had the DRY vs. verbosity of code conversation in trying to norm a team.
  I'm in the camp of a little verbosity and repetition for the sake of clarity is worth it.
ransom1538 19 hours ago

I prefer when developers and project managers create massive google docs for specs and descriptions. Double points if you share the document with only a handful of favorite employees. Also, ignore all requests to get permission to this document. Eye roll in all meetings if someone hasn't read this document. You can get to god mode if you hide comments around the doc.

th0masfrancis 19 hours ago

Love the theme of your blog

remoquete 19 hours ago

Thanks! It's a tweaked version of https://github.com/mrmierzejewski/hugo-theme-console

jerrac 18 hours ago

So, I feel like "Docs-as-Code" has some context I'm missing, so I'm going to comment on docs in general.

I think there are multiple kinds of docs for software.

* Comments explaining a specific section of code.

* API docs describing functions/classes/etc.

* Docs on how to use a library/class/etc. Usually including simple, isolated, examples.

* Tutorials on how to create simplified applications using the developed tools.

* Docs on how to deploy, configure, and maintain an application.

* Docs on how to use an application.

* Docs on how to troubleshoot an application.

* Docs on how to integrate applications.

* And likely others I'm missing.

Personally, I've been seriously frustrated by how bad most of the open source (haven't done much with proprietary code) documentation is. Case in point is Drupal and Symfony. Trying to use api.drupal.org is not fun, and Symfony's docs always cover the basics, and then there's nothing on pulling everything together into something complicated. So you try to dig into the actual code, and end up finding multiple layers of uncommented abstractions. Yes, I can eventually figure out what is going on if I put the effort in, but that's a lot of time that could be saved by a few lines of comments.

I usually end up asking JetBrains AI about what I need, then use what it says after I fix the errors it makes... It's also very good at summarizing everything I'd find if I used a normal search. But that all only works if others have already asked and answered my questions.

Some things I've been trying to do to improve my own code's documentation:

* Unless the line is super obvious, even if I think it is obvious, I try to leave a comment. Yes, it seems pointless, but I have gone back to old code I remember being obvious without said comment enough times that I think it is worth it.

* Avoiding "elegance" in favor of "explicitness". For example, I use full `if` statements instead of ternary operators even when ternary operators would look better. For whatever reason the syntax of ternary operators has never sunk in for me, and the explicitness of `if` is much easier to parse. I also use very descriptive function and variable names. Basically, if I have to think about what something means, I try to change it so I don't have to.

* Split out functions into smaller functions as much as I can. This means I can use descriptive function names. And I'm pretty sure it's just good practice.
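
To illustrate the last two points, here is a minimal Python sketch. Everything in it is hypothetical (the function names, the shipping rule, and the 30 kg limit are invented for illustration); it only shows the same decision written as a compact conditional expression and as a full `if` statement behind a small, descriptively named helper.

```python
def shipping_label_ternary(weight_kg, express_requested):
    # Compact form: a conditional expression (Python's ternary).
    return "EXPRESS" if express_requested and weight_kg <= 30 else "STANDARD"


def is_eligible_for_express(weight_kg, express_requested):
    # Small helper whose name states the rule it encodes.
    # The 30 kg limit is an invented value for illustration.
    return bool(express_requested) and weight_kg <= 30


def shipping_label(weight_kg, express_requested):
    # Explicit form: a full `if` statement, one branch per outcome.
    if is_eligible_for_express(weight_kg, express_requested):
        return "EXPRESS"
    else:
        return "STANDARD"
```

Both versions return the same labels; the explicit one is longer, but each branch and the rule behind it can be read without decoding the expression.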

I also have been trying to figure out ways to keep higher level docs closer to my code. I have some ideas, but haven't tried them yet. Has anyone ever written something that detects changes to a method/function, and then when you save your file it pops up asking if related docs need updating? Maybe add comments to the method pointing to where related docs live, and then your IDE/tool uses that to know what docs need updating?
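
A rough sketch of the pointer-comment idea, in Python. The `# docs: <path>` comment convention, the function names, and the example paths are all assumptions made up for this illustration, not an existing tool. Working out which functions changed is left out; the sketch assumes that list comes from somewhere else, such as a diff or the editor.

```python
import re

# Hypothetical convention: "# docs: <path>" comments directly above a function.
DOC_POINTER = re.compile(r"#\s*docs:\s*(\S+)")
FUNC_DEF = re.compile(r"\s*def\s+(\w+)\s*\(")


def doc_pointers(source):
    """Map each function name to the doc paths named in comments just above it."""
    pointers = {}
    pending = []
    for line in source.splitlines():
        pointer = DOC_POINTER.search(line)
        func = FUNC_DEF.match(line)
        if pointer:
            pending.append(pointer.group(1))
        elif func:
            if pending:
                pointers[func.group(1)] = pending
            pending = []
        elif line.strip():
            # Any other non-blank line breaks the link between comment and function.
            pending = []
    return pointers


def docs_to_review(source, changed_functions):
    """Return the sorted doc paths linked to the functions that changed."""
    pointers = doc_pointers(source)
    paths = set()
    for name in changed_functions:
        paths.update(pointers.get(name, []))
    return sorted(paths)


# Invented example source; only load_config carries doc pointers.
EXAMPLE = '''
# docs: docs/deploy.md
# docs: docs/config.md
def load_config(path):
    return path

def helper():
    return None
'''

print(docs_to_review(EXAMPLE, ["load_config", "helper"]))
```

An editor plugin or a pre-commit step could call something like `docs_to_review` and list the returned paths as a reminder, rather than blocking the save.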

simonw 18 hours ago

"Has anyone ever written something that detects changes to a method/function, and then when you save your file it pops up asking if related docs need updating?"
I've got a partial solution to that: I have automated tests that introspect my code for things that need documentation and then fail if those items aren't at least mentioned in the docs. Works really well.
I wrote about that here: https://simonwillison.net/2018/Jul/28/documentation-unit-tes...
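
The general shape of such a test, as a minimal sketch: this is not the code from the linked post, and the `Client` class, the `DOCS` string, and the helper name are hypothetical stand-ins. A real test would import the actual code and read the docs from disk, and the plain substring check used here is deliberately crude.

```python
import inspect


class Client:
    """Stand-in for the code under test (hypothetical)."""

    def connect(self):
        pass

    def fetch(self, key):
        pass

    def _retry(self):
        # Private helper: not expected to appear in the docs.
        pass


# Stand-in for the contents of a docs file; a real test would read it from disk.
DOCS = """
Client.connect opens a session.
Client.fetch returns the value stored under a key.
"""


def undocumented_methods(cls, docs_text):
    """Return public method names of cls that the docs never mention."""
    public = [
        name
        for name, _ in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    ]
    return [name for name in public if name not in docs_text]


def test_every_public_method_is_documented():
    missing = undocumented_methods(Client, DOCS)
    assert missing == [], f"Not mentioned in docs: {missing}"
```

Deleting the `Client.fetch` line from `DOCS`, or adding a new public method without documenting it, makes the test fail and names what is missing.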
- jerrac 17 hours ago
  
  That's a good way to do it. I was actually thinking of a Git hook or something in the CI pipeline as a place to start. So reading about how you implemented it was helpful. Thanks for sharing!

OhNoNotAgain_99 9 hours ago

[dead]

throwaway984393 20 hours ago

[dead]

cheschire 20 hours ago

This feels like a job ripe for startup disruption.

LLM documentation generators tend to benefit from context, and nothing provides better context than a mostly functional code base. The best part is the code doesn’t even need to compile for the LLM to build the context needed.

chaffroomba 19 hours ago

Problem I'm having as a developer with LLM documentation is their reliability, or rather lack of it. Every time there is an assertion I end up having to double-confirm it because they tend to be wrong as often as they're right. Reading imaginary hallucinated documentation is just about as useful as zero documentation.
While I could keep doing this for the rest of my life, my employer doesn't really appreciate the extra expense. A technical writer is much, much cheaper than the dozens of developers trying to confirm the docs.
- exe34 19 hours ago
  
  > Reading imaginary hallucinated documentation is just about as useful as zero documentation.
  no, it's worse. it's closer to reading outdated documentation that outright lies and gives examples that don't work, and will cause you to waste hours/days learning things that aren't relevant to the api anymore.
smokel 19 hours ago

The main problem is that documentation should be written on why code is written the way it is, and why it exists in the first place. This context is typically not available in the code itself. In the best case, this is encoded in requirements and design documentation, but more often than not, the information remains only in the heads of customers, architects, and developers.
The code itself merely describes how something is solved. Summarizing that in documentation can be useful at times, but it is not the full story. Especially for code that lives on for some time, the original design philosophy is often lost, and it is forced in horrible directions.
- spencerchubb 8 hours ago
  
  This article mostly seems to be about documentation for users of the code, not documentation within the code.
remoquete 20 hours ago

Author here. No, I don't think it does. https://passo.uno/ai-anxiety-tech-writer-howto/
- cheschire 20 hours ago
  
  You're speaking from a position of survivorship. The jobs you've been hired to perform give you confidence. It's all the jobs you haven't been hired for that I'm more focused on.
  
  remoquete 20 hours ago
  
  And the resulting failed documentation, I guess.
janice1999 19 hours ago

Code doesn't capture architectural decision making, one of the most important being why other solutions were not used. LLMs would need to be nearly omniscient (understanding underlying hardware, customer needs, budgets even) to derive the reasoning behind those decisions afterwards from code alone.
Etheryte 18 hours ago

This completely misses what valuable code documentation is for. Anyone can read the code and figure out what it does. Documenting the what can be convenient, but it's not really all that useful, especially if it rots over time. Even if it's painful at times, the what is all there in front of you in the code. Valuable documentation explains the why: why was one approach chosen over another, why do we need to do this to begin with, why are the edge cases the way they are. This information is not present in the code and no tool can ever extract it since it isn't there.
from-nibly 19 hours ago

A source code repository is actually a terrible place to get that context. All of the things like decisions and how this relates to the customer are completely missing.
joshuanapoli 19 hours ago

I think we will go the other way: A person will write the documentation and the AI will build the system that delivers the described product.
- chaffroomba 19 hours ago
  
  Isn't that what sw development today is? A description of a system and what we want it to do. With the advance of compilers, libraries, frameworks, linters, autocomplete systems and so on, we're already very close to describing the minimum amount of information the system needs in order to produce the correct result. To my knowledge actually physically writing the software has not been a bottleneck in a very, very long time.
  
  joshuanapoli 14 hours ago
  
  Right now, it takes skill and labor to move descriptions between representations for business goals, engineering (where we have frameworks and linters, etc.), and external/customer facing documentation.
  The customer is faced with an output from the design process. I think that we can turn that around now. Let customers edit part of the documentation, and let the AI adapt the system to their need.
  
  philipwhiuk 11 hours ago
  
  https://www.commitstrip.com/en/2016/08/25/a-very-comprehensi...
- euroderf 19 hours ago
  
  I would think that Requirements Engineering would be the ideal prep for AI code generation. A natural language counterpart to (e.g.) TLA+.
  Is this too damned obvious? Or am I missing something?
keybored 9 hours ago

> LLM documentation generators
I can’t wait for the LLM digression-generators to quit.

codelion 19 hours ago

With LLMs it is now quite easy to generate docs as needed. In fact we built a service to do just that - https://docs.codes/

Here is an example of how it is very useful especially for newer libraries.

remoquete 19 hours ago

You need good source material, including docs, to have LLMs generate docs that are accurate, reliable, and safe. LLMs have interesting applications in areas like SDK and API docs, for sure, but can't replace an entire function.
- codelion 18 hours ago
  
  Correct, but this is a good starting point for code that is written after the cutoff of a language model's training data, as you cannot otherwise generate accurate code from it for the newer versions of the library.
codelion 13 hours ago

Just realised that I missed adding the link to the example - https://github.com/unclecode/crawl4ai/issues/126