Here's an easy, if not always precise, way to remember:
* Hyphens connect things, such as compound words: double-decker, cut-and-dried, 212-555-5555.
* EN dashes make a range between things: Boston–San Francisco flight, 10–20 years. Both not only connect the endpoints but also indicate that all the space between is included. (Compare the last usage with the phone number example under Hyphens.)
* EM dashes break things, such as sentences or thoughts: 'What the—!'; A paragraph should express one idea—but rules are made to be broken.
Unicode has the original ASCII hyphen-minus (U+002d), as well as a dedicated hyphen (U+2010), other functional hyphens such as soft and non-breaking hyphens, and a dedicated minus sign (U+2212), and some variations of minus such as subscript, superscript, etc.
There's also the figure dash "‒" (U+2012), essentially a hyphen-minus that's the same width as numbers and used aesthetically for typesetting, afaik. And don't overlook two-em-dashes "⸺" and three-em-dashes "⸻" and horizontal bars "―", the latter used like quotation marks!
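For anyone who wants to check which character they are actually looking at, here is a minimal Python sketch that prints the official Unicode names of the code points mentioned above. The `DASHES` list and the `describe` helper are just names made up for this example, and the selection is illustrative rather than complete.

```python
import unicodedata

# Illustrative selection of the dash-like code points discussed above.
DASHES = [
    0x002D,  # ASCII hyphen-minus
    0x00AD,  # soft hyphen
    0x2010,  # dedicated hyphen
    0x2011,  # non-breaking hyphen
    0x2012,  # figure dash
    0x2013,  # en dash
    0x2014,  # em dash
    0x2015,  # horizontal bar
    0x2212,  # minus sign
    0x2E3A,  # two-em dash
    0x2E3B,  # three-em dash
]


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME' for a code point, using the Unicode database."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


if __name__ == "__main__":
    for cp in DASHES:
        print(describe(cp))
```

The names come from whatever Unicode version ships with your Python build, so a very old interpreter may not know the newer entries.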
> EM dashes break things, such as sentences or thoughts
Some style guides recommend "space, en dash, space" for this, and I prefer that myself – mainly because some software doesn't treat em dashes correctly as word separators for double click selection purposes.
For example, I'm pretty sure that at least some Kindle models would highlight both the word before and after the em dash when selecting one of them, which makes using the dictionary very annoying.
It's actually only your post that made me realize people don't normally put spaces around em dash. In French, Russian and a bunch of other languages proper typesetting is to use em dash as a standard dash character, and you always put spaces around them. So I did it in English as well, for many years now.
(I also now looked up and found out that in Spanish, apparently, you are supposed to put space only on one side of the dash, when used as a direct speech separator.)
I also put spaces around em dashes. It looks wrong—subtly wrong—to me to have the words glued together around the dash. It looks right — completely right — to me to have the dash standing on its own, as if it was a word in its own right.
The reason not to do this is observable in your post on my phone. The spaces cause the word wrapping algorithm to leave a dangling dash at the end of the line which looks ugly. Omitting spaces prevents the word break.
I mentioned that as an advantage in one of my other comments. An advantage both ways, because it depends on preference. I have the same preference as hansvm: I would rather see the dangling dash at the end of the line, so I prefer putting spaces around the dashes. Having the entire word-dash-word structure move to the next line feels ugly to me. As with most things, de gustibus non est disputandum. (And also, quidquid Latine dictum sit altum videtur).
It's the dangling dash at the beginning of the line that gets me. I see a lot of word break algorithms, including the one WebKit (and I suspect Blink) uses, which are happy to break "foo—bar" on either side of the em dash.
Funny, I'd rather have the break at the start or end of the emdash-implied break than just before or after it, not having to mentally handle some single dangling word divorced from its compatriots.
> The reason not to do this is observable in your post on my phone. The spaces cause the word wrapping algorithm to leave a dangling dash at the end of the line which looks ugly. Omitting spaces prevents the word break.
That's an interesting practicality but I don't think it's the cause of the rule: The rule probably long predates automated line breaking. Also, I think automatic line breaking will break compound words at the hyphen; it doesn't require spaces (which is also obvious from a software development point of view: the logic is relatively simple either way):
Lorem ipsum dolor sit amet, consectetur adipiscing double-
decker lorem ipsum dolor sit amet, consectetur ...
Ironically, on my phone the only line that ends with an em dash has no spaces in it.
If you want to not have a line break, you shouldn't rely on arbitrary behavior. You should use non-breaking characters like non-breaking spaces and word joiners.
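A rough sketch of what that could look like in Python, assuming you post-process the text before it reaches the layout engine. The function names are invented for this example, and whether a given renderer honors U+2060 (word joiner) or U+00A0 (no-break space) is something you would have to test yourself.

```python
WORD_JOINER = "\u2060"  # zero-width, asks the renderer not to break here
NBSP = "\u00a0"         # no-break space
EM_DASH = "\u2014"
EN_DASH = "\u2013"


def glue_unspaced_em_dashes(text: str) -> str:
    """Wrap every em dash in word joiners so 'foo<em dash>bar' stays together."""
    return text.replace(EM_DASH, WORD_JOINER + EM_DASH + WORD_JOINER)


def keep_spaced_dash_with_previous_word(text: str) -> str:
    """For the 'foo <dash> bar' style, make the space before the dash
    non-breaking so the dash cannot start a new line."""
    for dash in (EM_DASH, EN_DASH):
        text = text.replace(" " + dash + " ", NBSP + dash + " ")
    return text
```

The second helper still lets the line break after the dash, which matches the "dangling dash at the end of the line" preference some people voiced above; swap the two spaces if you prefer the opposite.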
To each their own: fully agreed, even though our tastes differ. I will mention one advantage of the spaces-around-dashes method: word wrap with default settings will break on the spaces around the dashes so that the entire word one, dash, word two combo doesn't end up pulled onto the next line as a whole unit. Whereas the advantage of the no-spaces method that you prefer is that word wrap will pull the entire word one, dash, word two combo onto the next line as a whole unit.
Why yes, I did list the opposite behavior as an advantage of each. Because that, too, is up to individual preference. :-)
That depends on the layout engine, I believe. Just tried it in Firefox (on macOS; not sure if it uses Core Text or something custom there), and it does sometimes break around the em dash in "foo—bar" style, not just "foo – bar" style.
I've definitely noticed the behavior you describe on some layout engines, too, and it's another reason why I personally prefer "foo – bar" style.
I've wondered about this for similar reasons. I usually omit the spaces but as I said in an earlier post I'll sometimes include them when I think the typography calls for it or when I want to add extra emphasis.
I've come to the conclusion it boils down to which style manual one follows. I've taken a careful look at numbers of high-end books which no doubt have been carefully typeset and I've found EM dashes with and without spaces.
It seems there is no definitive rule but I might be wrong.
For what it's worth, I was in the last class in my high school to learn typing on IBM Selectric typewriters. We were taught to type two spaces, two hyphens, then two spaces. Incidentally, we were taught two spaces after periods and colons. To this day, I find it hard to read text that doesn't have proper spacing after periods. (HTML and WYSIWYG word processors handle formatting, but e.g. fixed-font text editors don't)
It's funny that people think that conventions for typewritten text built around the limitations of typewriters define what is “proper” in environments where typewriters and their limitations are not involved.
Yes, this always grinds my gears too. There is already a slightly larger space after periods in contemporary typefaces.
The old typewriter typefaces were monospaced, i.e. every character was the same width, but this is no longer the case. Virtually all typefaces today are proportionally spaced, not monospaced. So it’s redundant to leave extra room after periods.
What does this have to do with what I wrote? I said nothing of the sort. In fact, I explicitly pointed out that HTML and WYSIWYG word processors address it automatically.
I was under the impression that you do "-" for hyphen, "--" for En dash, and "---" for Em dash. IIRC, LaTeX (or maybe the editor, it has been some time) even helpfully changes that for you to the correct dash.
> I was under the impression that you do "-" for hyphen, "--" for En dash, and "---" for Em dash. IIRC, LaTeX (or maybe the editor, it has been some time) even helpfully changes that for you to the correct dash.
The conversion of '--' to an en dash and '---' to an em dash is done by the TeX compiler, and appears in the rendered file, but I think that most TeX editors don't change the TeX code itself. (This is distinct from XeTeX-based compilers, which can handle non-ASCII Unicode characters like the em dash '—' directly in the source.)
(I think that the article's point is that, in some fonts, -- (two hyphens) is literally the (approximate) size of an em dash, not that it is always understood as meaning an em dash. At least in my font, --- (three hyphens) is far too long to literally look like an em dash:
---
--
—
–
(in order, three hyphens, two hyphens, em dash, en dash).)
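To make the '--' and '---' convention concrete, here is a toy Python imitation of it. This is only a sketch of the substitution as described in this thread, not how TeX itself implements it (there it happens through the font's ligature tables), and `tex_style_dashes` is a made-up name.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def tex_style_dashes(source: str) -> str:
    """Replace '---' with an em dash and '--' with an en dash.

    The longer sequence is handled first so that three hyphens do not
    turn into an en dash followed by a stray hyphen.
    """
    rendered = source.replace("---", EM_DASH)
    rendered = rendered.replace("--", EN_DASH)
    return rendered


if __name__ == "__main__":
    print(tex_style_dashes("pages 10--20 --- roughly"))
```

Single hyphens are left alone, so compound words like "double-decker" pass through unchanged.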
British typesetting style is a little different from US style in the way dashes are presented. In the UK, you might see a thin-space--en-dash---thin-space where a US typesetter would use an em-dash. Typewriter style generally follows book style. Since typesetters no longer use an extra space after punctuation, it's vestigial in typing.
How so? One is the only way to approximate an en or em dash on a typewriter or in a charset that doesn’t have one, the other seems like a workaround of a typesetting bug at best.
-, --, --- is, IIRC, how it is done in LaTeX and would be exceedingly simple to do on a typewriter. That being said, to break up sentences I use " -- " because I think it looks nicer than "---". I'll go now ;)
LaTeX is a markup language though, not ASCII art. I can get behind two dashes as a substitute if no en dash is available, but three seems too much and looks like halfway to a horizontal line to me ;)
TeX puts more space after periods/fullstops (which is why you're supposed to do special markup or other measures to mark '.' in the middle of sentences which aren't sentence-enders (e.g. like e.g.)). But it's generally smaller than the equivalent of two manual spaces.
(A nice thing in (La)TeX is that one could follow the "two spaces after a full-stop" rule, which then has the advantage of being an explicit marking for sentence boundaries (which your editor might be able to navigate; Emacs has a convention of assuming two spaces after a sentence-ending '.'), but then the TeX typesetting will take care of making it look right. I lost the habit of actually doing this, for better or worse, except when flycheck/checkdoc/package-linter.el makes me do it for docstrings.)
I used to feel similarly. Now I find the double space a visual distraction that doesn't in any way improve readability.
The effect of the double space is, I suspect, a product of the reader's expectations: if you expect it, its absence creates mental work, detracting from readability; if you don't expect it, its presence is what creates mental work.
Hard habit to break. I learned it so long ago too.
Haha I learned to type organically, and it was only in my mid-40s that I retrained myself to type the correct way. It took something like 40 hours of practice on keybr.com before I could get close enough to my regular typing speed, such that I could switch over to the 'correct' method without it impacting my work.
Retraining myself to stop doing double-spaces took maybe a week.
> Some style guides recommend "space, en dash, space" for this
The last paragraph of the article also addressed the subjective nature of spacing around the em dash:
> Spacing around an em dash varies. Most newspapers insert a space before and after the dash, and many popular magazines do the same, but most books and journals omit spacing, closing whatever comes before and after the em dash right up next to it.
As far as the selection detail, did you mean that you replace an em dash used like a comma or parenthesis with spaces and an en dash for specific highlight performance issues? Surely the spaces and an em dash would alleviate the selection highlight behavior and not muddy the waters of when to use an em vs. an en dash?
> Spacing around an em dash varies. Most newspapers insert a space before and after the dash, and many popular magazines do the same, but most books and journals omit spacing, closing whatever comes before and after the em dash right up next to it.
It's funny that they omit to mention the possibility of setting it off with a thin space ' ' or hair space ' ' (those are the thin-space and hair-space Unicode characters, though they show up full width for me), which I thought was preferred typographic practice.
(On Googling, maybe the reason that they don't mention it is that I was imagining it; I can't find any evidence for my belief.)
> those are the thin-space and hair-space Unicode characters, though they show up full width for me
Interestingly, both my browser and curl (grabbing the direct link to the comment) show the bytes as 0x20 for both. Perhaps the comment submission handler, or even the browser, collapsed your more specific U+2009 (thin) and U+200A (hair) spaces into the regular U+0020 space?
> Interestingly, both my browser and curl (grabbing the direct link to the comment) show the bytes as 0x20 for both. Perhaps the comment submission handler, or even the browser, collapsed your more specific U+2009 (thin) and U+200A (hair) spaces into the regular U+0020 space?
Probably! I think HN strips out emoji; maybe it just takes the safest approach and strips out all non-white-listed Unicode.
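If anyone wants to repeat that kind of check on text they have copied, a small Python sketch like the following shows which code points and UTF-8 bytes a string really contains. The helper names are invented for this example; it says nothing about what HN's submission handler actually does.

```python
def codepoints(text: str) -> list[str]:
    """List each character of the string as 'U+XXXX'."""
    return [f"U+{ord(ch):04X}" for ch in text]


def utf8_hex(text: str) -> str:
    """Show the UTF-8 encoding of the string as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


if __name__ == "__main__":
    for label, ch in [("space", "\u0020"), ("thin space", "\u2009"), ("hair space", "\u200a")]:
        print(label, codepoints(ch), utf8_hex(ch))
```

A plain space is the single byte 20, while the thin and hair spaces each take three bytes in UTF-8, so seeing 0x20 on the wire means the special spaces were already gone by then.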
Company I used to work for used AP for things like press releases and, I think, official blog posts and Chicago plus a couple different tech style guides for everything else.
Basically, we didn’t like some things in AP but we wanted to make it easy for journalists to copy/paste.
The good thing about style guides is that they’re guides, not laws :)
That’s one thing I really like about English: There’s no central authority decreeing what’s right and what’s wrong top down, and it feels like there is some room for individual preferences and experimentation.
Very refreshing, compared to e.g. German, which has more than one semi-official authority gate keeping “correctness” in speech and writing.
A semicolon connects, whereas an em-dash creates more of a pause and therefore separates. In addition, em-dashes can be used in pairs to create a parenthesis, which semicolons can’t. I think with time you will appreciate the difference.
Dashes surround a sub-clause - something like this - which is like a parenthetical addition to a sentence that could stand alone without it; semi-colons (';') connect a further sentence or part of one where perhaps a full-stop and additional word could have been. They also sometimes separate list items following a colon, especially if the things listed are longer sentences perhaps themselves containing commas that'd otherwise be ambiguous.
Em dashes are very similar to semicolons. You use em dashes if your related sentence is in the middle of another sentence, and semicolons if it's at the end.
They're frequently used in skilled and professional grade writing.
So as not to mislead anyone, the parent is mostly incorrect:
Here's an example sentence: Semicolons must have independent clauses—phrases that could form a full sentence on their own—on both sides of them; they are essentially alternatives for periods. Em dashes don't require independent clauses on either side.
In the italicized sentence,
* phrases that could form a full sentence on their own is not an independent clause but is valid between em dashes. on both sides of them, after the em dashes, is also not an independent clause. (The em dashes function like commas or parentheses here.)
* The parts before and after the semicolon are independent clauses. You could replace the semicolon with a period and you'd have perfectly valid grammar. I just chose to connect the two sentences a bit more.
I don't know if you can use em dashes as the parent comment describes, connecting three independent clauses:
* My favorite fruit is peaches—they are very sweet—I eat them all summer.
I think the above is wrong; it should be one of the following:
* My favorite fruit is peaches—they are very sweet—and I eat them all summer.: The last section is a dependent clause made by "and", not an independent clause.
* My favorite fruit is peaches—they are very sweet; I eat them all summer.: On both sides of the semicolon are independent clauses; I could replace the semicolon with a period.
Maybe there are examples I'm not thinking of? I infer that the rule might be that the punctuation following the em-dashed clauses should be the punctuation that would have been used without the em-dashed clause, but that's based on very limited evidence.
Many people don't use semicolons (;) in English but many do, and they are certainly part of correct grammar.
Semicolons are generally alternatives to periods, when you want more connection between the two sentences. Like periods, semicolons must have two full sentences—that is, what could be full sentences—on either side of them; the potential 'full sentences' are properly called independent clauses. (A dependent clause needs the rest of the sentence to form valid grammar; it can't function on its own. For example, in this paragraph's first sentence, when you want more connection between the two sentences is a dependent clause. Often they follow commas.)
Another use of semicolons is for lists in a paragraph where one of the list items has a comma in it (similar to the parsing problem for CSVs where some records contain commas): I only like wine; beer, but only ales; and orange juice.
G. Branden Robinson swears by U+2010 for hyphens in groff's Unicode output [0], but I see it as a hypercorrection. The most common convention by far (among authors who use Unicode and care about dashes) is to use U+002D for hyphens and U+2212 for minus signs. Not even the Unicode Consortium uses U+2010 for hyphens in its documents, and I'm not aware of any major organization that does.
As far as appearance goes, almost all fonts I've looked at make U+2010 identical to U+002D (i.e., they don't put any 'minus' into the 'hyphen-minus'), but a few make U+2010 a smidgeon shorter.
Intl.NumberFormat also prefers it, but then you can't paste negative numbers into most financial software, calculators, spreadsheets. Even back into inputs on the same webpage, if it does custom number parsing. Even though <input type=number> accepts U+2212 as a minus, it turns it into a regular minus when you spin it down to -2.
It looks much better though and more visible: −1 vs -1. I wish hyphen was a separate symbol from the ascii start, or that monospace fonts didn't tend to shorten "-" cause it makes little sense in monospace anyway.
— In the context of automatic text processing, it unambiguously indicates the function of a hyphen, as opposed to a minus
— Fonts can choose to make the hyphen-minus a bit wider than a regular hyphen, to accommodate the usage as a minus sign. In that case, U+2010 would be typographically more appropriate for a hyphen, similar to how U+2212 usually is typographically more appropriate for a minus sign.
The visual style of the hyphen-minus depends on the font. Some fonts display it more like a minus, others like a hyphen. So if you care about distinguishing hyphen and minus, it makes sense to use the dedicated hyphen and minus, and not use the hyphen-minus at all.
> Unicode has the original ASCII hyphen-minus (U+002d), as well as a dedicated hyphen (U+2010), other functional hyphens…
Which can be fun when parsing CSV files from various sources. I've hit numbers with U+2010 or others where you would expect a hyphen-minus. Presumably someone² has copied a negative number from a document where one of the alternate symbols was used, and pasted it into everyone's favourite data-mangler¹ which interpreted it as a string, and so on down the chain.
--------
[1] Excel. Sometimes a joy, sometimes the bane of my existence.
[2] It is surprising, horrifying even, how much manual manipulation of data goes on in banking, where you might naturally assume everything is more automated these days. Sometimes a laborious manual process done regularly is seen as cheaper than paying for it to be automated…
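For the CSV case, a defensive parser might map the look-alike characters to the ASCII hyphen-minus before converting. This is a minimal Python sketch under the assumption that each field holds a single number (not a range like 10–20, where an en dash means something else); `MINUS_LIKE` and `parse_number` are names invented here, and the set of characters is a guess at what turns up in practice.

```python
# Characters that sometimes stand in for a minus sign in pasted data.
# The exact set is an assumption; extend or trim it for your sources.
MINUS_LIKE = {
    "\u2212",  # minus sign
    "\u2010",  # hyphen
    "\u2011",  # non-breaking hyphen
    "\u2012",  # figure dash
    "\u2013",  # en dash
    "\ufe63",  # small hyphen-minus
    "\uff0d",  # fullwidth hyphen-minus
}

_TO_ASCII_MINUS = str.maketrans({ch: "-" for ch in MINUS_LIKE})


def parse_number(field: str) -> float:
    """Parse one numeric field, tolerating Unicode minus look-alikes."""
    cleaned = field.strip().translate(_TO_ASCII_MINUS)
    return float(cleaned)  # still raises ValueError on real garbage
```

Usage would be something like `parse_number(row[2])` inside a `csv.reader` loop; logging whenever the translation changed the field is probably worth it, so the upstream source can be fixed.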
It's infuriating that people are drawing this conclusion. LLMs pick up on em dash usage because professional and skilled writers use em dashes. They're a consistently useful, if niche, part of the literary toolkit.
But, no, now it's a problem because the majority of people's experience with writing is graded essays. And because LLMs emulate professionals, it's now a red flag if students write too much like professionals. What a joke.
Ha, good point, and an interesting question: What kinds of dashes did Dickinson intend?
It's a hard one to answer: We could look at published Emily Dickinson books from the time, but did Dickinson really pay that close attention to or have that much control over the type?
We could look at Dickinson's actual personal documents, but if they were handwritten, distinguishing dashes could be difficult even if there was intention there.
Fortunately we have troves of her handwritten documents; all of her poems were first printed posthumously. To me, she's using the punctuation as pacing or tonal markers as opposed to ligatures ("I'll clutch— and clutch— " vs "I'll clutch-and clutch-"). Many publishers style these marks as longer than normal m-dashes for that reason, which makes sense seeing as they are rarely used as asides.
Em-dashes have been the norm in every Dickinson poem I read, and I think it might have derived from the preferences of Victorian publishers, who I understand loved those long dashes.
I imagine it would have been up to the typesetter to make the call. The conventions for dash usage are fairly straightforward. You use em-dashes for asides, en dashes for ranges, and hyphens for most other cases. It's easy to figure out the right character from context (apart from en ranges vs hyphen ranges).
You want Robert Bringhurst, poet and typographic nerd. He gives them special withering attention in his Elements of Typographic Style. I think he referred to them as Victorian excrescences?
However this is the kind of rule that "existed" for a while and most likely will go away as most people can't be bothered with the difference and it all looks similar anyway
Or maybe who knows, it will keep going on because chatgpt knows it
Re last paragraph: dashes, etc. are confusing for perhaps most of us who aren't, say, typesetters, myself included. I use EM dashes a lot usually without a space between words and sometimes with spaces when I think the typography calls for it—or for extra emphasis.
Essentially, most of us guess the rules and often this doesn't matter much but it can in certain circumstances.
For example, in say machine conversion/transliteration. The ASCII dash is often used as a substitute for Unicode minus sign because it's easy to select [it's my usual practice], and anyway many don't know there is an actual difference. Whilst a human will usually know the difference by its use or context a machine may take the literal interpretation which could lead to say a numerical calculation error.
This problem has annoyed me for a long while. Why is it that wordprocessors and editors do not highlight these characters and query whether the usage is correct? Surely this ought not to be that difficult.
Another example is Roman numerals. The average person will enter say an uppercase 'I' for the Roman numeral one. Here's a typical example which is incorrect:
WWII
Here I entered the normal ASCII 'I' because it was too involved to find the correct Unicode character for Roman numeral one.
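For what it's worth, the dedicated Roman numeral characters (U+2160 and friends) carry compatibility decompositions to plain Latin letters, so text processing that applies NFKC normalization will fold them back to 'I', 'II', and so on. A small Python sketch of that, with `fold_roman_numerals` being a made-up helper name:

```python
import unicodedata

ROMAN_ONE = "\u2160"  # ROMAN NUMERAL ONE
ROMAN_TWO = "\u2161"  # ROMAN NUMERAL TWO


def fold_roman_numerals(text: str) -> str:
    """Apply NFKC, which (among other things) replaces the dedicated
    Roman numeral characters with ordinary Latin letters."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    fancy = "WW" + ROMAN_TWO
    print(unicodedata.name(ROMAN_TWO))
    print(fold_roman_numerals(fancy) == "WWII")
```

Note that NFKC changes many other characters too (ligatures, fullwidth forms, superscripts), so it is a blunt tool if the Roman numerals are all you care about.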
I'd like to know what others who are in typography, machine learning etc. think about this, and why WP programs and editors don't have simple ergonomics that allow for easy selection of the correct character.
† On a related matter, you'll note I've used single quotes whereas mmooss uses double quotes. This tells me that mmooss is likely in the US whereas I'm not. Again, this is not really a major problem for humans but it can be in transliteration, etc. Also, it's unclear (at least to me) what the default is for quoting quotes, i.e.: "" versus "' (right, I've refrained from using triple quotes).
Again, this seems country specific with I believe the US favoring double followed by single. Even when these rules are defined do people strictly adhere to them?
That doesn't seem to be an array at all, if the idea is to check whether a number is within a range. Seems like an interesting data type though, a combination of a range data type and a map/associative array.
I was thinking of a sparse array but any name will do. obj[~42] ?
One may have a bunch of key ranges each associated with a value, or one may have a key that should be "rounded" to the nearest key, or retrieve the one below or above it.
It feels like something basic enough to have in a language and I found it oddly complicated to write myself. Comparing it with all values doesn't seem like a very good solution.
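One way to avoid comparing against every key is to keep the range starts sorted and binary-search them. Here is a minimal Python sketch using the standard `bisect` module; the `RangeMap` class and its method names are invented for illustration, and it assumes each range runs from its start key up to the next start key.

```python
import bisect


class RangeMap:
    """Map sorted start keys to values; look up by floor or nearest key."""

    def __init__(self):
        self._starts = []  # sorted list of range start keys
        self._values = []  # values, parallel to _starts

    def set_from(self, start, value):
        """Associate value with keys from start up to the next start key."""
        i = bisect.bisect_left(self._starts, start)
        if i < len(self._starts) and self._starts[i] == start:
            self._values[i] = value
        else:
            self._starts.insert(i, start)
            self._values.insert(i, value)

    def floor(self, key):
        """Value for the greatest start key <= key (the range containing key)."""
        i = bisect.bisect_right(self._starts, key) - 1
        if i < 0:
            raise KeyError(key)
        return self._values[i]

    def nearest(self, key):
        """Value for the start key closest to key (ties go to the lower key)."""
        if not self._starts:
            raise KeyError(key)
        i = bisect.bisect_left(self._starts, key)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self._starts)]
        best = min(candidates, key=lambda j: abs(self._starts[j] - key))
        return self._values[best]
```

Lookups are O(log n); inserts are O(n) because of the list shifting, which is usually fine for a table that is built once and queried often.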
This one is U+4E00, CJK Unified Ideograph-4E00. So it's a common character between Chinese, Japanese, and Korean. This should be "one" in all three. And it does technically look a little different than a dash: https://unicodeplus.com/U+4E00
And this is different from Japanese's chuuonpu (U+30FC) which is a vowel elongation mark, and it's rendered horizontally or vertically depending on whether the text direction is horizontal or vertical, respectively.
AFAIK most computer keyboards don't have em dashes. Rather than hit ALT+0151 every time, I've always just strung along two hyphens, like: --
Absolutely proper and correct use of em dashes, en dashes, and hyphens is, to me, the most obvious tell of the LLM writer. In fact, I think that you can use it to date internet writing in general. For it seems to me that real em dashes were uncommon pre-2022.
This test feels biased by the fact that, like others have said, macOS provides keyboard shortcuts. For example, I'm only Gen Z and yet have tried for many years to use the proper dash characters in the right places, which is made much easier by virtue of being on a Mac.
Of course, I guess it's entirely possible—even accounting for OS—that this test remains statistically useful. It makes me kinda sad that my (very much human-generated) writing fails the Turing test....
Windows does too now via Windows+. which opens the "emoji keyboard" but you can switch to the "symbols" tab to see unicode. It does have multiple dashes in the quick access bar at the top or you can search.
I've used WinCompose¹² to add key composition to Windows for many years (after discovering the concept in Unix-land), which I still find more convenient than the other options I've tried (including the Windows Emoji keyboard).
[2] Though having checked just now, the sequences for en-dash and em-dash don't seem to be working. Perhaps one of my custom macros is interfering somehow… (it is behaving overall, ellipsis just worked as did the following diacritic and other symbols: áèîöūñ±⁰¹²∞¡¿‽π⬚). I'll have to poke at it later and see what is awry.
That has nothing to do with being on a Mac. Em-dashes and the compose-key work fine on Linux, and Android has them under the '-' of the on-screen keyboard when long-pressed.
(Windows probably has some way, but those are rarely discoverable.)
I disagree, there is absolutely no easy way to do it on Windows. You can install a third party program that emulates the compose key, but on macOS it "just works". And I think that makes a difference for 95% of users.
I've always (well...for 20 years) done a Google search for "em-dash" then copy/pasted the character off whatever result page comes up. Word and other fancy editors always provided a popup pane where these characters could be clicked to insert.
It's a bit funny. On macOS en and em dashes can be natively typed with alt+- and alt+shift+-. The responses to your comment are apparently suggesting these methods are just as easy as that:
1. Install and configure this extra tool, which also by default enables a ton of other things you may not want, and may as well be a third-party tool even though it's technically built by Microsoft
2. Do a Google search and copy-paste (!)
3. Use a keyboard shortcut to bring up a symbol picker, then click on the tab containing the en and em dashes, then click to type them in
I hate that this feature doesn't have a timeout, so when you want to type "--" you have to "- -" and then go back and delete the space. You can't just wait as with double-space vs space-wait-space. It can be turned off, but that turns off other locale-based punctuation like quotes.
Just install a proper keyboard layout with proper typography support once.
It is maddening that the whole world uses typewriter keyboards with some facelift in the era of Unicode and even blasphemous full color emoji font rendering. What has changed in decades? Windows logo key, power keys, media keys, IE and Outlook logo keys — all Microsoft's fancies.
So initially IBM made some ad hoc decisions on what keys would be suitable for a single user office computer (as opposed to data input and admin terminals they had). Then everyone copied that, because sending unexpected scan codes could lead to bad things (random BIOS and program code couldn't care less about your ideas of forward compatibility). Then Windows became the “basic system” installed on most computers. Microsoft really pushed forward the internationalisation at the time, making a lot of national layouts and code pages (sometimes contradicting the national standards, for better or for worse). Then everyone copied what they decided. What's more important, even single byte code pages had the basic typographic symbols, anyone could've been using them for three decades, but they were not added to most physical keyboard layouts.
I wonder if that was because they wanted Word to seem more sophisticated than it was, and to make people think it was a requirement for “proper documents”, or because programmers still treated all non-ASCII symbols as free data markup constants that would “never appear in a regular text”.
Alt+hyphen or alt+shift+hyphen is an endash/emdash. You may not have been aware of it because it's so subtle, but many people (including myself) used emdashes long before 2022
Typing <word><hyphenminus><hyphenminus><word><space> yields an em dash.
Typing <word><space><hyphenminus><hyphenminus><space><word><space> yields an en dash.
That this has been true for some 3 or 4 decades makes me doubt all the comments that em dashes are a "tell" of LLM authorship. On the other hand, I guess when we confine this possibility to web content, I can see how people haven't used Office for web authoring lately, and whatever they do use (like web-based content management systems) doesn't tend to have this feature.
> Typing <word><space><hyphenminus><hyphenminus><space><word><space> yields an en dash.
More importantly, typing just a single hyphen minus in this constellation triggers the autoreplace, too. (Typing the double hyphen is only necessary without spaces in order to distinguish between an intentional hyphen and an em dash.)
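As a rough illustration of the rules described in this subthread, here is a regex-based Python approximation. It is only a sketch of the behavior as commenters describe it, not Word's actual autoformat logic (which runs as you type and has more cases), and `autoformat_dashes` is a made-up name.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def autoformat_dashes(text: str) -> str:
    """Approximate the described replacements on already-typed text."""
    # word--word  ->  word<em dash>word (no spaces)
    text = re.sub(r"(?<=\w)--(?=\w)", EM_DASH, text)
    # word - word  or  word -- word  ->  word <en dash> word (spaces kept)
    text = re.sub(r"(?<=\w) --? (?=\w)", f" {EN_DASH} ", text)
    return text


if __name__ == "__main__":
    print(autoformat_dashes("foo--bar and foo -- bar and foo - bar"))
```

A single unspaced hyphen, as in "well-known", is left alone, which matches the point above that the double hyphen is only needed to tell an intentional hyphen from an em dash.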
Good point. Either way, it's kind of peculiar that getting an en dash in this manner demands flanking the hyphen(s) with spaces, and those spaces persist after replacement, when the typical usage of an en dash specifically doesn't demand spaces.
From TFA:
> August 1–August 31
From a top comment:
> Boston–San Francisco flight, 10–20 years
To achieve this using the replacement feature we're talking about would take something like <word><space><hyphenminus><space><word><space><alt+leftarrow><bksp><leftarrow><bksp><alt+rightarrow> which is ridiculous.
In professional typesetting, like a book, I sometimes see spaces flanking an em dash, however.
I can't get this to work in Powerpoint. It's funny, I clicked on this thread because I was struggling with trying to make an "emdash" in Powerpoint yesterday and couldn't find the correct search term for the "long hyphen" that I was looking for.
Works fine for me on PowerPoint for Mac, oddly enough. Unrelatedly, Mac also allows easy (non-alt-code) keyboard entry: option-hyphen yields an en dash, while option-shift-hyphen yields an em dash.
That's one of my favorite features of macOS keyboard layouts, but it's so close to one of my least favorite ones – option + space inserting a non-breaking space.
I almost never want that, and when typing "space, en dash, space", it happens quite easily and is usually impossible to tell visually.
`misc:typo` has been in xkb for about 15 years. There's also xkb-birman (matching the current state of the project that inspired all of it). If your national layout does not have level 3 and 4 symbols set, those should work straight away. If it does, it is highly likely that they clash, so you need to create a suitable subset. It is highly advised to find like-minded people, discuss the best options, and then gently push the result to upstream to make it available for everyone. After all, it's Linux, if you won't do it, no one will.
Certain corners of the world have absolutely cared about and employed the proper use of all the “dashes” well before, and all the way up to, 2022. I’d imagine LLMs have just consumed some of that material.
Pretty much everything professionally edited and typeset does, and those will generally be retained in Unicode text (obviously, not if it gets converted to ASCII). It’s less common in internet fora because not all users either know the use of dashes or have easy access to them on the devices they are using, and if it’s not both familiar and easy, people are going to skip it in quick messages.
I always use an em dash when possible when I should, and double en dash when I can't, just because I'm that kind of nerd. But it is the case that a double en dash on iOS autocorrects to an em dash, so I'm suspicious of the claim that em dashes are a tell for LLM writing.
Not in all fonts. In most monospace fonts, two hyphens will show with a small gap between them, for example.
I also personally prefer en dashes, surrounded by whitespace on both sides, over em dashes. Apparently some WYSIWYG software interprets two hyphens as an em dash, while other will interpret that as an en dash, so I'd rather just use the real thing if possible to avoid the ambiguity.
I've tried to use real hyphens and dashes since learning a bit about typography roughly 10–15 years ago. macOS makes it really easy with just alt and hyphen for en-dash, shift+alt and hyphen for em-dash. Definitely not an "obvious tell" of an LLM!
Thanks for the '⇧⌥<dash>' tip— from 2022–2025, I have been using macOS en's thinking they were em's.
(Side note: GPT says apostrophes should be used for pluralizing only for single letters to avoid confusion, but this seems more readable than "ens and ems" IMO.)
I recently got accused of using AI for some writing I submitted because I regularly use both en-dashes and em-dashes, and have for years. I said in another thread recently they are second and third, to semi-colons, as my favourite punctuation marks.
I was able to demonstrate my long use of them, prior to LLMs. And since I write in quarto markdown I don't need keyboard shortcuts.
Automatic conversions have been happening for a long time. In fact, a few years ago there was some combination of settings on my terminal locale settings and man (well, troff/groff most likely) was converting hyphens in param definitions to some sort of dash character, meaning I couldn't copy and paste out of the man page. I think it also affected perldoc for the same reason.
I don't doubt there are publishing platforms that do it automatically as well, so I wouldn't count on seeing them as an indicator of generated output, even if it may be processed in some manner.
It's like we spent twenty years writing (mindlessly copying) web pages with &mdash and only viewing them with lynx, and then somebody makes a graphical browser and the mistake is apparent, but I don't think the browser is in the wrong.
While this is true, this is an amazingly silly omission.
Serbian and Croatian XKB keyboard layouts have had em- and en-dashes since early 2000s even if they were not standardized: AltGr (right Alt) + hyphen (to the left of right Shift) produces an em-dash, and press Shift on top, and you get an en-dash.
This is how long I've had them easily accessible on any keyboard (I even have them converted to MacOS keyboard layouts for use with Karabiner).
The lack of em dash usage in popular culture speaks more about typical people than it does about whether a text's author was an LLM. In fact, the average person has never even noticed—let alone considered—that the em dash exists. If they've read for 20+ years, they've seen at LEAST hundreds of them.
Imagine being an NPC (a human bot), flattering yourself with the thought that people who understand the language are language bots...
21% of adults in the US are illiterate in 2024 and 54% of adults have a literacy below a 6th-grade level[1]. “The average person” isn’t really a high bar, unfortunately.
Is this a legitimate institute? The linked article offers many statistics and even financial figures but cites no sources or studies. There is a “TOLL FREE” (capitalized) phone number in the website footer, and the comments are full of prostitution ads.
Not at all. It's just inconvenient for most of the Windows-using world, as the characters are not accessible. It's ALT+[whatever] or Google-it-and-ctrl+V. Hence an awful lot of internet writing didn't really use any of that stuff properly.
Two chained hyphens, as was pretty much the norm back then.
And did you just call me an NPC?!? It's not a matter of "understanding the language" at all. It's a matter of convenience and of a sort of evolved convention.
On mac it's very easy to get an em-dash, just alt+shift+`-`. Though I do concur that it's more likely to come from an LLM, I don't think it should be considered a tell — I find it more of a predictor of the writer's age.
That’s interesting to note. I have usually taken the time to properly use en-dashes when it seems appropriate because I frequently deal with strings that represent academic years. At least where I live, these span two calendar years. I have noticed that a lot of college websites tend to use the en-dash properly (e.g. on their academic calendar webpages).
> Absolutely proper and correct use of em dashes, en dashes, and hyphens is, to me, the most obvious tell of the LLM writer.
Or just someone who likes to use the right characters. There was a report a few months back about how writing from autistic kids keeps getting mislabelled as LLM simply because they use the correct specific terms.
Please stop associating being precise with being an LLM.
(La)TeX would typeset -- as an en dash. --- gets you an em dash.
I, of course, used proper dashes in typeset documents, at least after I'd learnt about them in Knuth's The TeXbook. I have found myself occasionally use them in ASCII contexts just as ---. But I've never sought out the proper unicode character.
The option key is IMHO the most underrated feature of the Mac platform. Having another modifier for character input is insanely handy, and I know where to find numerous characters like trademark™, divide (÷), pound (£), degrees (°), pi (π) and so on.
What i've been using: Install https://github.com/samhocevar/wincompose and you can then press AltGr then three hyphens to insert one. or if you're on Linux just search for "compose key".
More sophisticated clients require we use dashes correctly. I first encountered it pre-pandemic, so in professional contexts it's not a sure-fire signal of LLM use. Should you see em dashes correctly used in Hacker News comments, or on Reddit for that matter, then it's a pretty reliable tell... Usually. ;)
I'd like to have the record show that I've been using them since before LLMs :)
Not sure when I started; my guess is that I got into the habit of using them in LaTeX when writing my thesis, and then at some point realized that they are easily reachable on standard macOS keyboard layouts (via "option" + "-").
As I mentioned above, I've had them easily accessible with a keyboard layout for >20 years on all the systems I've used — the only caveat is that I find it really ugly with no spaces around em-dashes, which is usually recommended for English.
Usually I split it into two sentences, but yes. I don’t really see semicolons used in most business communications, so I treat them as a tell that the text was generated by LLM. Maybe I’m overreacting and prejudiced against semicolon usage.
Most word processing applications auto-substitute EM dashes as appropriate - some do it for two consecutive hyphens, iirc. I don't know if they substitute EN dashes automatically ... I don't know if there's a logic for that without understanding the text.
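For what it's worth, a purely mechanical rule gets you part of the way for numeric ranges. Here is a minimal Python sketch, assuming that "digits, hyphen, digits" with no further hyphen or digit attached is a range. The function name `hyphen_to_range` is made up for illustration, and the rule is a guess at a heuristic, not what any word processor actually does:

```python
import re

EN_DASH = "\u2013"

# Two runs of digits joined by one hyphen, not embedded in a longer
# digit-and-hyphen chain (so phone numbers and ISO dates are left alone).
_RANGE = re.compile(r"(?<![\d-])(\d+)-(\d+)(?![\d-])")

def hyphen_to_range(text: str) -> str:
    """Replace the hyphen in simple numeric ranges with an en dash."""
    return _RANGE.sub(rf"\1{EN_DASH}\2", text)
```

It turns "10-20 years" into a proper range and skips "212-555-5555", but it cannot tell a range from a subtraction like "5-2", which is exactly the "understanding the text" problem.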
I've been using real em- and en-dashes for decades, in more or less the way M-W describes. MacOS and iOS make it easy to do, and growing up Mac kindled a life of typographical nerdage.
I wrote for a magazine during college days a few decades ago that uses the Chicago manual of style. I still use em dashes, en dashes, and hyphens regularly. They don't show up as such in markdown, but they are effectively: one dash for hyphen, two for em-dash, and one with spaces surrounding it for en dash.
The default US English Mac keyboard is so extremely good, and has been the way it is for so long, that I remain baffled that other platforms haven't simply copied it. I came to it relatively late in life and it's one of the reasons I wish I'd started using Macs sooner.
It's pretty decent but the fact that I can't type an arbitrary unicode character has been a huge annoyance of mine since I switched from Windows/WSL to Mac.
They have shortcuts for Í, Î, and Ï but not for many commonly used characters like arrows
You can add the "Unicode Hex Input" keyboard layout, which lets you enter BMP characters by holding down Option and entering its code point in hex (similar to the hex entry on Windows). Expanding the Emoji & Symbols pane minitech mentioned also lets you browse by category (e.g. arrows), and you can customise the categories and add a full Unicode character picker (not limited to BMP like the Windows Character Map) there as well.
It's very easy¹ on MacOS to make yourself a custom layout with the characters you commonly use. Personally, I put arrows on ⌥⇧HJKL, vi-style.² (Doing so for Linux is a little more work, as xkb is more complicated and less capable.)
Aside from the solutions other people have mentioned, if you have often-used symbols, you can set up a text replacement in keyboard settings. For instance, I have :x: for the multiplication sign.
Control+Command+Space or Fn+E or Edit > Emoji & Symbols if you know the character’s name. It’s not very convenient for repeated use, but it gets the job done in a pinch.
Yeah it's not great. Edit isn't always there. Fn+E seems to make the most sense. I've heard about ctrl+cmd+space but commonly forget it. Both of those open the same GUI which combines emojis, stickers, and unicode symbols—preferring the first two categories over the last. To type out a unicode symbol it takes at least three clicks on top of me starting to type in the name of my symbol
> Edit isn't always there. Fn+E seems to make the most sense. I've heard about ctrl+cmd+space but commonly forget it.
You can remap Fn/Globe directly to it if you want. It's also accessible from the Input menu bar item if you show that.
> Both of those open the same GUI which combines emojis, stickers, and unicode symbols—preferring the first two categories over the last. To type out a unicode symbol it takes at least three clicks on top of me starting to type in the name of my symbol
Are you using the expanded Character Viewer window[0], or the default collapsed Emoji & Symbols pane[1]? Because the expanded Character Viewer lets you customise and reorder the categories[2] (though that doesn't affect search), including adding a full Unicode view[3]. And they both default to the search bar when opened (though the Character Viewer opens unfocused for some reason).
This specific key combination is not US keyboard specific.
I like how they managed to group characters that are formally similar by binding them to the same keys.
I appreciate that the designer of the layout clearly attempted to make some kind of mnemonic connection to the degree they could. Makes it easier to discover and remember the key-combos, even without a cheat sheet.
That's c-cedille here, because to write English fluently you need to be able to type French loan words like façade—but not quite so often as someone in Switzerland, probably (especially so in some parts of the country!) so I assume you've got it somewhere even more prominent on your keyboard.
It's pretty bonkers (and mildly depressing, really) to imply that correct grammar and usage is a reason to accuse someone of using an LLM.
I mean if it's an obvious break from their normal style, sure. But by itself? Every time I hear this argument, it just seems like sour grapes from poor writers.
For a while, em dashes were really popular among LLM enthusiasts because of the idea that it would encourage the LLM to draw from training data that contained em dashes—which typically were higher quality training data written by a professional writer or somebody with a professional editor. Subjectively, I think it worked. I suspect that the LLMs trained to be used as chatbots were finetuned to use the em dash liberally for that reason. Now, after a few generations of these models, I think that the em dash is starting to have the effect of drawing from "slop" training data that was written by other LLMs rather than well-written human data.
Using spaces is not wrong. Typographically, a hair space or another thinner than usual space is usually used, but in plain text a space is often preferred. Style guides vary of opinion on this, but newspapers often space them. Without a space they end up looking like elongated hyphens joining the words on both sides. That's not their function.
It's not wrong for en-dashes (an en-dash set open—with space on either side—is generally an alternative to an em-dash set closed.) And it's not wrong on the trailing side of an em-dash used in dialogue to show an abrupt stop mid-sentence if the stop is followed by a new sentence. And there's a few other particular uses, but, generally, setting an em-dash open is wrong.
> but newspapers often space them.
I've never seen a newspaper set em-dashes open, but I have seen them use en-dashes set open instead of using em-dashes at all. Given the space premium in print newspapers, em-dashes set open, which would consume enormous horizontal space, would, other concerns aside, be an odd choice.
I'm married to an editor and friends with an editor at work. They both use em dashes appropriately—even with informal writing. I've now learned the keyboard shortcut just to confuse people in the age of AI slop.
As a diligent user of ALT+0151 for many years on Windows systems, I can contradict that it is a sign of LLM writing — perhaps in combination with other factors it can be used to increase the likelihood of LLM authorship, but alone, nope.
A few years back a journal editor meticulously reviewed all dashes in our manuscript and pointed out places where em dashes should have been used. Since then I started noticing different dashes everywhere around the internet.
That's the comment I was looking for to rally behind. I use the same character `-` for all purposes: minus, hyphen, em/en dash. It's easy to type and it makes practically no difference in meaning or legibility. I refuse to waste my time differentiating between multiple variations of a short horizontal line with a few pixels more or less. Ain't nobody got time for that.
Throwing my hat in here. The sub millimeter difference in the length of a dash conveys no additional meaning or clarity. It is impossible to argue me out of this position.
It's not like you can reliably write these consistently by hand either without going over the top in length to make it extremely obvious.
I'd wager serious money that if you put that on a sign and surveyed people, at least in the US, they'd all still conclude it is a "New York" to "London" flight.
What's the use of a communication tool, if it doesn't actually communicate anything to real people?
In my region at least, -5 ~ -2°C, or -5°C ~ -2°C.
If something is confusing people, we replace it with a suitable substitute. Re-educating people is really just a last resort. Is there anything keeping us from changing it other than ego?
Em dashes don’t convey much meaning or clarity for me.
Rather, seeing too short of a dash is like putting two clashing colors together or wearing two pieces of clothes that don’t match. It just looks instantly off.
I have read her in the past and can't say there were worlds of meaning between -'s. Can you link an example? I looked again and couldn't see any obvious ones. Generally she just completely abused the -. Does she even use a comma once? lol
This sort of anti-intellectualism is the perfect antidote for those who claim that improper grammar is nothing more than evidence of language "evolving."
I think many grammar rules are not intellectual but just randomly evolved conventions.
E.g. some English language rule says that a comma or ending period of a non-quoted sentence goes inside the quotes if there's something quoted at the end of that sentence. That rule feels anti-intellectual to me, as if there's some misunderstanding of how hierarchical placement in one-dimensional space works (since something that's not being quoted is being put inside quotes)
Spelling used to be more fluid and up to the writer/printer. Printers would also use different spellings as a mechanism to change the line width and otherwise format text to their liking.
I was going to post basically this. There is only one dash, and it's the one for which my keyboard has a key. Minus sign, hyphen, or any other use case. When MS word autocorrects to something else, I always angrily undo it, because I don't know or care what it's doing.
I don’t care about the length of the mark, but I did find this idea useful. Prone to excessive detail, I often find myself with a parenthetical inside of parenthetical. The developer in me insists on 2 closing parentheses. But it looks weird and nerdy. Although, using an em dash instead is probably just as nerdy.
> Dashes are used inside parentheses, and vice versa, to indicate parenthetical material within parenthetical material. ...
> The bakery’s reputation for scrumptious goods (ambrosial, even—each item was surely fit for gods) spread far and wide.
This is coming from someone who can only speak English: what a stupid language. How is having 3 symbols that are discernible only by their, almost identical, length a good idea? How would one grade a paper for correct usage, especially if handwritten?
En dashes, I'll grant you, are pointless. Those can go away.
However, em dashes are a different case. The main reason why it's desirable to use em dashes (beside convention) is for clarity of purpose. The hyphen is already a very overloaded character; they're extensively used to denote ranges and link compound words. Importantly, both of those usages do not correspond to pauses in spoken language. If you're voicing a hyphen you're supposed to barrel on through it. An em dash is much closer to a parenthesis, comma, or semicolon. It's a meaningful break in the sentence, in the way that a hyphen isn't.
Now, if it were up to me I'd choose a different character to replace em dashes (maybe underscores), but that's a separate argument.
I take this advice like "do not use a preposition to end a sentence with" and "pay close attention to 'much' and 'many'". Personal preferences from the 1800s taken as gospel by grammatical extremists, to the point where they're taken as some kind of solid rule in a vain attempt to forcefully shape language to a personal preference.
There are cases when you want to follow certain guidelines, for sure. If you write for a publication that adheres to Merriam-Webster, you'd better stay consistent and figure out the right AltGr code to type the right dashes. However, for the 99.99% of written media today, none of that matters.
> Personal preferences from the 1800s taken as gospel by grammatical extremists, to the point where they're taken as some kind of solid rule in a vain attempt to forcefully shape language to a personal preference.
This is also true of "less" and "fewer". I use "less" everywhere.
Ending sentences with prepositions is and has always been fine. It has never been a serious rule of grammar that you may not end a sentence with a preposition. It does sometimes make a sentence sound better to rewrite it so that it doesn't end with one though. For example, "do not use a preposition to end a sentence with" sounds awkward to my ears, probably because you deliberately crafted the sentence to end with a preposition even though that is not naturally what you'd end that sentence with. (The previous sentence doesn't sound awkward to me, interestingly.)
Getting "much" and "many" right is completely different. They mean different things. Confusing them makes you sound stupid. Less vs fewer is the same. It often doesn't matter but in some cases it really grates on the ears (eg "there wasnt much people there" just sounds awful).
Dashes are not in the same category. They are orthographical conventions. They aren't really grammar. They are more like spelling. You can spell things wrong and say it doesn't matter because spelling is arbitrary and you can use the wrong dashes too, but it makes you look either uncaring or ignorant. If you want to give a good first impression, learn the basic conventions of written English and follow them.
Yeah, trying to get people to take Em vs En vs Hyphen seriously is a fool's errand. Only typography nerds would take it seriously and there just aren't enough of them to make a difference. I'd guess that the vast majority of people have never even heard of these distinctions.
i refuse to care about this lowercase letters are all i will ever use i see no possible reason to use the other symbols
Suit yourself, but if you refuse to learn basic grammar you will be treated like you are stupid and uneducated. Like it or not, presentation matters. Getting the basics right, including things like spelling, grammar, etc, shows a basic attention to detail without which your services will likely do more harm than good.
You said "look at any dictionary", so I did. I notice you can't provide a link to a single dictionary that supports you, or even name what dialect supposedly doesn't have "etc."
Etc. is an abbreviation for etcetera. Correctly signifying contractions, abbreviations, and acronyms is far more commonplace than using the correct dash. Almost everyone would have learned about shortening words in high school; many people leave university without ever having heard of an em dash.
Etc is also an abbreviation of et cetera. Only Americans put pointless dots everywhere.
This is all stuff you learn in school. Punctuation isn't obscure or niche. You may not have learnt about semicolons or em dashes in school but you should have and I did. As did anyone that has ever read a novel. There are two semicolons on the first page of the first Harry Potter book, a novel read by approximately every child of my generation. There are loads of examples of the proper use of dashes and other "obscure" punctuation marks in any professionally typeset text.
Robert Bringhurst¹ prefers the en dash in the context of setting off phrases:
"The em dash is the nineteenth-century standard, still prescribed in many editorial style books, but the em dash is too long for use with the best text faces. Like the oversized space between sentences, it belongs to the padded and corseted aesthetic of Victorian typography.
"Used as a phrase marker – thus – the en dash is set with a normal word space either side."
Presently re-reading this book, The Elements of Typographic Style. It’s one of the few books I’ve gone out of my way to get a physical copy of – it’s just beautiful.
And I totally agree, space-set en dashes are vastly superior to em. I dislike the way it connects the word more closely to the word in the next clause than the phrase itself.
E.g.
He left—no explanation. Vs. He left – no explanation.
To me, left—no feels more like a weird gluing together than a separator for a different section.
Because I am exactly the kind of person to obsess about this sort of thing, when I was working on my last book, I spent a lot of time deciding how I wanted to style dashed subordinate clauses.
Personally, I think en dashes are too small and look like a mistaken use of a hyphen. I really only use them in their Chicago Manual of Style recommended uses like date ranges.
But I agree that em dashes without spaces around them look wrong. They glue the adjoining words together when the whole point is that the clause is secondary and should be set aside from the surrounding text.
I ended up using em dashes with a little blob of CSS to put a tiny amount of space on either side.
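If CSS isn't available (plain text, say), a rough equivalent is to pad the dash with hair spaces (U+200A). This is only a sketch of that alternative, not the CSS approach described above; the function name `space_em_dashes` is hypothetical:

```python
import re

HAIR_SPACE = "\u200a"
EM_DASH = "\u2014"

# An em dash together with any whitespace already sitting around it.
_DASH = re.compile(rf"\s*{EM_DASH}\s*")

def space_em_dashes(text: str) -> str:
    """Set every em dash off with a hair space on each side."""
    return _DASH.sub(f"{HAIR_SPACE}{EM_DASH}{HAIR_SPACE}", text)
```

Because existing whitespace around the dash is swallowed first, running it twice gives the same result as running it once, and dashes that were set open with full spaces get tightened to hair spaces too.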
"Used as a phrase marker – thus – the en dash is set with a normal word space either side."
"Used as a phrase marker—thus—the em dash is set without normal word spaces."
>the em dash is too long for use
above, the em-dash without spaces is smaller, at least in this typeface
I've taken to using dash offsets—just as an aside—in many places where I formerly used parentheses; I find it "less interrupts" the flow of the sentence.
So much this. Two weeks ago I learned that en dashes are used for numbers, but I thought they are what em dashes are for. Em dashes for me are too long and ugly.
* Use the minus sign /−/ (U+2212) when formatting numbers, because the default hyphen-minus /-/ (U+2D) just looks wrong: "It is −1 °C vs. -1 °C." Moreover, the correct minus has the same width as plus (− vs. +).
* Rare, but use the figure dash /‒/ (U+2012) or figure space / / (U+2007) if you need a placeholder character that is the same width as a single digit. For example, "Guess the PIN: 1‒34."
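If you want to check which of these look-alikes you actually have in a string, Python's standard `unicodedata` module will name them. A small sketch; the helper names `describe` and `format_signed` are invented for the example:

```python
import unicodedata

# hyphen-minus, hyphen, figure dash, en dash, em dash, minus sign
DASHES = "-\u2010\u2012\u2013\u2014\u2212"

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def format_signed(n: int) -> str:
    """Format an integer using the real minus sign (U+2212) for negatives."""
    return ("\u2212" + str(-n)) if n < 0 else str(n)

if __name__ == "__main__":
    for ch in DASHES:
        print(describe(ch))
```

Note that `format_signed` output is for display only: `int()` will not parse a string that uses U+2212, so keep the ASCII hyphen-minus in anything meant to be read back by code.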
Somewhat off topic, however, I'm thoroughly convinced that there is a very high probability something is AI generated when I see Em dashes. Anyone else noticing this?
ChatGPT for example almost always uses them. I'm sure they are more common in academic writing, but it's now super common on boards like Reddit.
I've been employing em-dashes extensively since I went on a JD Salinger binge circa 2002. Also, "incidentally", for the same reason. I use "Nb" a lot, from reading a bunch of DFW years ago. Oh, and that very-precise construction he does with "which" all the time, I stole that.
Before LLMs, I think em-dashes mostly signaled that you read books and paid attention to details, to the extent they signaled anything.
To generalize your point: A lot of the "brown m&ms" that we've walked around with for detecting a writer's status, education, etc., are less useful in an age of LLMs.[1]
We might even be entering some waves of counter-signaling.
[1] They'll never totally nail all of DFW's mannerisms, though.
When looking at the context of a given text, use of certain words or punctuation can very well indicate AI use.
The "original" example was delve. There is no doubt that AI (did, or still does) use this word at a significantly higher frequency than the average person. I would say the same about em dashes.
When browsing a Reddit thread about a video game, if you encounter numerous comments written perfectly, especially those containing indicators like em dashes, the word delve, or similar language, it certainly can raise the question: am I genuinely seeing comments from users who write this way in this specific context, or is this content more likely produced by an LLM?
It depends. Em dashes in news articles and written publications? Definitely expected. Em dashes on social media or reddit? Either someone who works in typesetting, or an LLM. Most likely an LLM, given the dying nature of printed media.
Only typography nerds and professional printers care about things like these. Popular media, even modern professional media, hasn't been paying all that much attention.
I’m not sure the same happened with “delve.” I saw an analysis of paper abstracts showing a clear uptick of “delve” starting with the mass-adoption of ChatGPT. Maybe it suddenly became a trendy word — especially in paper abstracts — or maybe more paper abstracts were edited by ChatGPT.
Combining the various "tells" of an LLM (em dashes, delve, grammatical signs etc) with the context (Reddit comments vs professional setting), you could establish a rough probability it was AI generated. At this point, it's the best we can hope for.
There are regular folk who tend to be pedantic with their writing. I'm not sure this is a good test of whether text is generated by LLM. Consider that some may use LLMs to correct spelling or grammar, and the LLMs may often edit an en dash to em dash.
To be clear, it's essentially impossible to know if a given text is autonomously LLM generated (a bot on social media for example) or is the result of revision of real human effort.
To what extent that distinction matters, I'm not sure.
I've encountered and used em dashes regularly for the last 20 years. If most of your reading and writing are associated with social media, I could see the trend you're describing appearing real within that limited context. But em dashes are not new and have been a feature of high quality writing for many decades.
Yes, several of the most popular (and even lesser-popular but newly open-sourced models such as Gemma 3 27b) overuse Em dashes. Even when prompting them to not use dashes, they almost can't help themselves and include them occasionally anyways as it must be part of their learned stylometry. It's just not a common symbol to use at all as most people generally use commas for the same purpose. I can't even remember learning about Em dashes in my college English classes.
I submitted an application which I typeset using LaTeX, and some people thought it was AI-generated because of en and em dashes. I have been using these since forever.
If it's posted through a publishing platform (not just a comment on one or on a public site), it's very possible they do an automatic conversion of some of the common cases. That could also be filtering down to comment boxes and stuff, I'm not sure.
That's not to say that generated content doesn't use them, just that using them as an indicator might require a bit of nuance based on where you're seeing them.
I saw a reel the other day where some Young People(tm) were talking about "the ChatGPT hyphen" (an em-dash.) There was much wailing and gnashing of (false) teeth from Old People(tm) in the comments.
There is a special kind of irony in the fact that habits that used to set one apart from the unwashed masses (like the proper use of punctuation) now serve as a signal for being non-human.
Everyone I know that writes a lot, especially for copy or product design, seems to use em dashes more heavily. I've even seen a Drake format meme where he is shaking his head at parentheses, commas, and colons but—finally—nodding in approval at the em dash.
Em and en dash usage is officially part of style guides such as The Chicago Manual of Style [1], so it's often a work requirement for many writers and editors to use them in writing. This is why these kinds of dashes are everywhere in newspaper and magazine articles.
Eventually, people learn to include them out of habit—especially as most people see them as aesthetically nicer than a simple hyphen (-).
Exactly. If I see an Em/En dash in a publication of really any kind, I don't think twice. Because that's the traditional context for them. Professional writing.
Yep, definitely been noticing it, especially on Reddit. It almost always makes me navigate away from the post, unless the author mentions that they’re using AI.
Hold on, I'm coming back to this thread, I think I've cracked it guys. Some real alpha for you right here:
If the em dash has spaces around it -- as seen in AP style -- it was probably written by a real human, because that's how it comes out most conveniently on a word processor.
But if the em dash has no spaces around it--Chicago style--there's a good chance you're looking at LLM slop.
I saw this comment a day ago but it only clicked today. The way we tell it's AI is the use of too formal grammar. I think that means they now pass the Turing test. Or at most a hair's breadth from passing.
The only people still using em-dashes are those who think it's somehow a signal of high intellect rather than being (extremely) behind the times. Case in point: this exact comment section where you see it with ~10000x the frequency of standard human writing, or even the average HN thread.
Just makes me roll my eyes really seeing a human use an em-dash. We're in the age of informality, and at least for me personally I've definitely filed the em-dash away as "a near guarantee the text was written by a machine". No matter how much and perhaps especially because HN commentators are coming out of the woodwork to insist they've been using it daily for years.
It is one of those things that doesn’t really matter for readability, but although they can’t necessarily put a finger on why, people may still notice that some documents or pages appear to be set with more care for details than others.
(edit: I guess if you don’t have to search on Google what the hell a ‘Microsoft Word’ is, then you’re officially old)
And the 1 and 8 aren't next to each other anymore, either. (See typewriters from the "18"00s.)
> those smart quotes
Fixing straight quotes is a hard problem[0]. My FOSS text editor, KeenWrite[1], includes my library, KeenQuotes[2], for replacing them at build time. It's not perfect, but can typeset my ~400 page novel without any errors.
> Did you know there is a dedicated ellipsis character?
When typesetting Markdown, KeenWrite first converts the document to XHTML (i.e., XML), then invokes ConTeXt to convert XML into TeX macros. One of those macros handles the ellipses by converting it to \dots{}:
for em dashes and ellipsis at least it's trivial to convert before displaying them... which I do in my own markdown-to-publication toolchain (but not here on HN).
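To illustrate how small that conversion step can be, here is a minimal Python sketch. It is not what KeenWrite or the toolchain above actually does; the function name `smarten` and the rule "exactly two hyphens means em dash, exactly three dots means ellipsis" are assumptions made for the example.

```python
import re

EM_DASH = "\u2014"
ELLIPSIS = "\u2026"

def smarten(text: str) -> str:
    """Swap ASCII stand-ins for typographic characters (illustrative only)."""
    # Exactly three dots become a single ellipsis character;
    # longer runs of dots are left untouched.
    text = re.sub(r"(?<!\.)\.\.\.(?!\.)", ELLIPSIS, text)
    # Exactly two hyphens become an em dash; longer runs such as
    # "----" (horizontal rules, ASCII art) are left untouched.
    text = re.sub(r"(?<!-)--(?!-)", EM_DASH, text)
    return text

print(smarten("Wait... what--no"))
```

Straight quotes are deliberately left out: as noted above, those are the hard part.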
Em dashes without surrounding spaces are such an ugly relic that triggers me to no end and is objectively wrong. The dash object is part of the sentence — not the two words it's separating.
You write pages 1,003–4, instead of typing out 1,003–1,004 which is just unnecessary.
Works the same with two digits, or even three: pp. 1,899–902.
This is standard practice and arguably clearer.
I've only ever seen it done with page ranges, though. I'm not sure if it's done with year ranges? E.g. 1984–5? Or 1989–92? You work with page ranges constantly in academia, I just don't see year ranges much in any form.
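For anyone who has to process such ranges automatically, the convention can be undone mechanically: the end of the range borrows whatever leading digits it omits from the start. A rough Python sketch, assuming the range uses an en dash (U+2013) and commas as thousands separators; `expand_range` is a made-up name for the example:

```python
from typing import Tuple

EN_DASH = "\u2013"

def expand_range(abbrev: str) -> Tuple[int, int]:
    """Expand an abbreviated range such as '1,003<en dash>4' to (1003, 1004)."""
    start_text, end_text = abbrev.split(EN_DASH)
    start_digits = start_text.replace(",", "").strip()
    end_digits = end_text.replace(",", "").strip()
    # If the end has fewer digits, prepend the missing leading digits
    # taken from the start of the range.
    missing = len(start_digits) - len(end_digits)
    if missing > 0:
        end_digits = start_digits[:missing] + end_digits
    return int(start_digits), int(end_digits)

print(expand_range("1,003" + EN_DASH + "4"))
print(expand_range("1,899" + EN_DASH + "902"))
```

Note that the sketch cannot tell an abbreviated end from a genuinely smaller number, which is exactly the ambiguity some replies below complain about.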
Literally never seen this (wish I could grep all comments I've ever replied to) and I do not understand what makes you say that it's clearer when it's dropping information, making it relative rather than a fully qualified number
In speech, it's common, and misunderstandings are usually not a problem (if you're not monologuing on a recording) because someone will just ask; but in writing it looks like the range is the wrong way around. Maybe I expect more care in writing because the feedback loop is longer, or maybe it's just habit and I think it's wrong in writing because I never see it?
Quick, tell me how wide this range is, just as an order of magnitude:
285368737954–285368783645
Would be a lot easier if I only included the range at the end which had actually changed, wouldn't it?
That's why it's clearer. Now obviously that was an extreme example, but it's also easier to see at a glance that 1,387–9 is just three pages, as opposed to 1,387–1,389.
If you format your numbers properly, you get "285,368,737,954–285,368,783,645"
That's a change of about 50K, which isn't really that hard to notice.
"285368737954-83645" is... well I have to assume somewhere in the 10-100K range? Hold on a second while I line up the digits again... uh... let me rewrite that to "37,954 - 83,645", okay now I can read it. No, that wasn't any easier. I kept getting lost tracking where in the first number I was leaving off. Much easier to compare 737 vs 783 - digit groupings are really useful!
(I'll agree that 1387-9 is pretty reasonable, it just breaks down the longer the number is. Also, if the page count is important, you can just say "1387-1389 (3 pages)". This feels like the sort of shorthand you used to get on Twitter)
Taken to an extreme without formatting, sure, but what ranges have that many digits in human-readable situations? And if there are those exception situations, you can word around it for that case ("285368760800±45691" or "45'691 years after 285'368'737'954")
Genuinely trying to think of examples, since e.g. books aren't ever that long and search results don't have that many pages (that you'd all read and refer back to). A salary range, perhaps, can get into the seven digits in extreme cases (not that you care about any individual digit when you make a lifetime's worth of money in a bit more than a year): "Prospective salary is 2'423'000 to 2'432'000" seems to convey the relevant info as well as "Prospective salary is 2'423'000 to 9'000" does (except that I wouldn't understand the latter and ask what this second number means, but that's plausibly attributable to me as an individual not being used to it)
MLA-style citations call for abbreviating page ranges in that way. I mostly see it in literary papers, and not many other contexts, so it would be easy to notice them rarely if at all. Outside of that context, I occasionally see it used for year ranges.
Copy/paste, "print", paste into the "from page" and "to page" fields.
Result:
> print pages in range from: 1, 003
> print pages in range to 4
Now I have two errors to fix: page 1003 to page 1004. Not nice. Who formats like this?!
-------------------
Also, some RPG books or encyclopedias I own have chapters that span like this:
p. 630 to p. 70 (book 2)
To me, it is now unclear: is that 70 with a reset page count, or 670 for book 2?
Since I just now learned that a quotation standard somewhere outside Germany exists that omits leading numbers, I now need to manually check where it ends.
TL;DR:
Don't make me think, and allow for automation. So just write one more number.
When I was editing an academic book published by a well-known university press, we were all asked to do that for the references. (And my colleagues, all doctors and lawyers, only knew Word and entered the references manually.)
I read Butterick's Hyphens and dashes some years ago and it stuck with me. Now I regularly use hyphens, en dashes, and em dashes correctly—I even memorized the Unicode sequences and enter them seamlessly on Linux with Ctrl-Shift-U!
We need a blog post documenting the ironic trend of people—themselves NPCs, actual human bots, just now realizing the em dash exists despite seeing it hundreds if not thousands of times before LLMs—flattering themselves by suggesting that anyone who understands the language at above a 5th grade level must be an LLM.
Taking knowledge of the three extra pixels that are "more correct" as some kind of indicator of intelligence is silly. Pretending you're somehow above them is just sad.
The comment above is not about being special, it is about proper typography that is still everywhere around us: books, serious websites, anything done by real designers. Those people had to try hard to miss all of that.
No, it is not “politically incorrect” to call people lacking curiosity and/or education like you see them.
No, someone's personal preferences or transitory fashions are not automatically promoted to the holy reference for the whole world.
One point that is very rarely mentioned is how to place em dashes around quotation marks.
If the em dash indicates an interruption (not a planned pause) of the actual speech, the em dashes go inside the quotes (often just one, before the closing quote).
If the em dash is the narrator interjecting with additional information, the em dashes go outside the quotes.
Besides this, the question of where to put spaces when multiple forms of punctuation are combined can be quite a complex topic.
I use em dashes all the time in writing, but unfortunately ChatGPT and co. use the em dash frequently—and most people use the em dash infrequently, not knowing how to type it on a keyboard—so it's starting to make my writing look AI-generated sometimes. I fear it'll have to go the way of words like "tapestry."
FWIW, you can type an em dash on Mac with shift + option + hyphen.
I like em dashes and use “Option Shift -” to summon them on macOS. However, LLMs tend to overuse them and compose absurdly long sentences. While proofreading a draft, I often instruct an LLM to “keep the original tone intact and don’t create overly complex sentences by fusing together simple ones.” That usually gets the job done.
Writers adore their em dashes. While they can sometimes clarify a concept by adding more context, overusing them can hurt readability. I prefer to read Hemingway-esque sentences that just say what they want to say and end sharply. So that’s how I write too—and sometimes the overuse of em dashes directly conflicts with that, making the content sound as if the author is confused about what they wanted to convey.
If you are looking for an alternative to kebab case for writing identifiers in a programming language that reserves - (U+002d) as an operator, chances are good you can use · (U+00B7 · MIDDLE DOT), which we use in middot case.
So isMorePleasantToRead, is_more_pleasant_to_read or is·more·pleasant·to·read is up to you.
On the bépo layout that I use, extremely well, as it sits between ’ (U+2019 ’ RIGHT SINGLE QUOTATION MARK) and ‑ (U+2011 ‑ NON-BREAKING HYPHEN), each being generated by altgr+shift and x . and k (which are all on the opposite side of the keyboard compared to altgr key).
At least from the point of view of digital gymnastics, it’s not really any worse than camel or snake case, though direct access to the dash could be said to make input slightly easier in kebab case.
So it really depends on the keyboard layout used (or whatever input device facility is used). What’s your favorite input method lately? Does it really not provide a convenient way to input more than the visible ASCII glyphs?
Plus, let’s be honest, identifiers are generally written in full expanse only once, then autocompletion is going to do it for us. And we all know we spend more time reading identifiers than declaring new ones.
The reason this works in Rust is that Rust follows Unicode's categorization of which code points are useful as identifiers: https://www.unicode.org/reports/tr31/
MIDDLE DOT is Other_ID_Continue
I know less about the other languages but it wouldn't surprise me if they did similar things.
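As far as I can tell, Python is one of those other languages: its identifier rules are also derived from UAX #31, so the middle dot is accepted anywhere except as the first character. A quick way to check this yourself, using only `str.isidentifier()` (the candidate names are just the examples from upthread):

```python
MIDDLE_DOT = "\u00b7"

candidates = [
    "isMorePleasantToRead",
    "is_more_pleasant_to_read",
    MIDDLE_DOT.join(["is", "more", "pleasant", "to", "read"]),
    "is-more-pleasant-to-read",   # kebab case: "-" is an operator
    MIDDLE_DOT + "leading",       # middle dot may continue, not start
]

# Map each candidate to whether Python would accept it as an identifier.
results = {name: name.isidentifier() for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {ok}")
```

Whether your linter, formatter, or colleagues accept it is a separate question.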
I use em-dashes correctly because a reader emailed me, and I was dreadfully embarrassed. You can actually see them become correct in my writing after the "I will pile drive you" AI thing.
It never occurred to me that doing this correctly might make people think I use LLMs in my writing.
Edit: I'm sure the many typos protect me from that, actually.
> If you want to be official about things, use the en dash to replace a hyphen in compound adjectives when at least one of the elements is a two-word compound.
How is a literal dictionary making fun of people who "wanna be official about things" lol. That's the entire basis for dictionaries themselves
It's Merriam-Webster - they are descriptivist rather than prescriptivist about language. They don't define correct usage per se, but rather document actual usage, though some usage may be given greater weight than others.
In this case, they are calling out the prescriptivist definition but are implying that it may be overkill and offering the more commonly used alternative.
I had one minor quarrel with this article: The use of spaces (of any kind) before and after the em dash or any dashes.
Personally, I am fond of using either a hair space or a thin space before and after the em dash. Not a full space!
To explore the various options, I wrote a little program to print the various combinations of dashes and spaces. I think what looks best depends a lot on what typeface you're using. But let's see how they look in the Verdana font used here. You should be able to paste this into your favorite word processor to see it in other fonts:
ASCII 0x2D hyphen-with no spaces
ASCII 0x2D hyphen - with U+200A hair spaces
ASCII 0x2D hyphen - with U+2009 thin spaces
ASCII 0x2D hyphen - with 0x20 full spaces
Unicode U+2010 hyphen‐with no spaces
Unicode U+2010 hyphen ‐ with U+200A hair spaces
Unicode U+2010 hyphen ‐ with U+2009 thin spaces
Unicode U+2010 hyphen ‐ with 0x20 full spaces
Unicode U+2013 en dash–with no spaces
Unicode U+2013 en dash – with U+200A hair spaces
Unicode U+2013 en dash – with U+2009 thin spaces
Unicode U+2013 en dash – with 0x20 full spaces
Unicode U+2014 em dash—with no spaces
Unicode U+2014 em dash — with U+200A hair spaces
Unicode U+2014 em dash — with U+2009 thin spaces
Unicode U+2014 em dash — with 0x20 full spaces
It looks like HN is really mangling this. Hair spaces are rendered wider than thin spaces?
If anyone wants to experiment, here is the Python code:
from dataclasses import dataclass

@dataclass
class Character:
    char: str
    name: str

DASHES = [
    Character( "-", "ASCII 0x2D hyphen" ),
    Character( "\u2010", "Unicode U+2010 hyphen" ),
    Character( "\u2013", "Unicode U+2013 en dash" ),
    Character( "\u2014", "Unicode U+2014 em dash" ),
]

SPACES = [
    Character( "", "no" ),
    Character( "\u200A", "U+200A hair" ),
    Character( "\u2009", "U+2009 thin" ),
    Character( "\x20", "0x20 full" ),
]

for dash in DASHES:
    for space in SPACES:
        print( f"{dash.name}{space.char}{dash.char}{space.char}with {space.name} spaces\n" )
If you're on Windows, install PowerToys, and check out the Keyboard Manager. It lets you set up shortcuts. I overload my keys using right alt for Greek letters (science stuff). Could do it for these dashes as well.
I used a lot of these, but actually stopped due to my text sometimes being called out as chatgpt output. I also thorw in the occasional spelling mistake. If a piece of text on reddit/x has "–" (not "-") in it, you can be 95% sure it's an LLM.
That is an interesting observation. I wonder what percentage of the training text data for LLMs contains proper dashes, since a large part of it is user-generated content.
All self-respecting journalistic outlets use proper symbols. Where does the LLM get their opinions on “foreign affairs” from? Probably from the likes of New York Times like a standard lib...
And it shouldn’t be hard for an LLM to learn to use proper symbols when synthesizing content from the everyman. It’s not like it works on the level of literal copy and paste.
For Windows users, PowerToys has a Quick Accent tool, that lets you type in an em dash or figure dash by holding down the hyphen (-) and then toggling the space bar. Interestingly, the en dash is not available.
> The en dash is the least loved of all; it’s not easily rendered by the average keyboard user (one has to select it as a special character, whereas the em dash can be conjured with two hyphens)
on macOS:
- - => - (hyphen/minus)
- ⌥ - => – (en dash)
- ⇧ ⌥ - => — (em dash)
There are so many of these convenient typographical shortcuts that a long time ago I made Apple layouts for Windows and Linux.
And many are mnemonic too, like:
- of course ÷ (division) is ⌥ / (slash, which is poor man's division)
- of course ¿ is ⇧ ⌥ / because ⇧ / is ? so logically ⇧ ⌥ / is ⌥ ? which is ¿
- guess what ≤ ≥ ± ≠ are
- ¬ (logical negation) is ⌥ L because it's a L sideways
- £ (pound) is ⌥ 3 because ⇧ 3 is # (octothorpe, abused as sharp or pound - the other kind)
I genuinely do not care one tiny bit about doing this right. At all. I will use the minus key for all of these like I always have and nothing bad will ever come of it. Find a better way to channel your limited energy.
0 0 000048 48 H LATIN CAPITAL LETTER H
1 1 00006F 6F o LATIN SMALL LETTER O
2 2 000077 77 w LATIN SMALL LETTER W
3 3 000020 20 SPACE
4 4 000074 74 t LATIN SMALL LETTER T
5 5 00006F 6F o LATIN SMALL LETTER O
6 6 000020 20 SPACE
7 7 000055 55 U LATIN CAPITAL LETTER U
8 8 000073 73 s LATIN SMALL LETTER S
9 9 000065 65 e LATIN SMALL LETTER E
10 10 000020 20 SPACE
11 11 000045 45 E LATIN CAPITAL LETTER E
12 12 00006D 6D m LATIN SMALL LETTER M
13 13 000020 20 SPACE
14 14 000044 44 D LATIN CAPITAL LETTER D
15 15 000061 61 a LATIN SMALL LETTER A
16 16 000073 73 s LATIN SMALL LETTER S
17 17 000068 68 h LATIN SMALL LETTER H
18 18 000065 65 e LATIN SMALL LETTER E
19 19 000073 73 s LATIN SMALL LETTER S
20 20 000020 20 SPACE
21 21 000028 28 ( LEFT PARENTHESIS
22 22 002013 E2 80 93 – EN DASH
23 25 000029 29 ) RIGHT PARENTHESIS
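For anyone who wants to produce a dump like the one above for their own text, here is a rough Python sketch. I don't know which tool generated the original; the column meanings (character index, UTF-8 byte offset, code point, UTF-8 bytes, glyph, Unicode name) are my reading of the output, and `dump` is a made-up function name.

```python
import unicodedata
from typing import List

def dump(text: str) -> List[str]:
    """Return one row per character: index, byte offset, code point, bytes, glyph, name."""
    rows = []
    byte_offset = 0
    for index, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        fields = [
            str(index),
            str(byte_offset),
            f"{ord(ch):06X}",
            " ".join(f"{b:02X}" for b in encoded),
        ]
        # Show the glyph only when it is visible; spaces are omitted.
        if ch.isprintable() and not ch.isspace():
            fields.append(ch)
        fields.append(unicodedata.name(ch, "<unnamed>"))
        rows.append(" ".join(fields))
        byte_offset += len(encoded)
    return rows

for row in dump("Dashes (\u2013)"):
    print(row)
```

The jump in the byte offset after the en dash shows that it takes three bytes in UTF-8, while every ASCII character takes one.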
I'm just gonna say it: this does not matter. Just use whatever you want. If you're afraid that someone is going to think less of you for it: the people who matter won't.
The more people proliferate this, the worse it'll be—frankly, we should be embarrassed that societal literacy and writing style knowledge is so poor that we jump to the "must be written by an LLM" conclusion whenever we see any sort of exotic character usage!
2) using them without surrounding thin space or hairspace breaks the horizontal rhythm and draws unnecessary attention to the punctuation; but thin and hair spaces are equally hard to type
3) Most people write markdown with mono space fonts, making these dashes and spaces indistinguishable.
That's actually kind of funny. Looks like it's the result of HN's Unicode filtering rules, though; the original website has different characters in its <title> tag.
I could never remember which was the longer dash. Now it's easy, because the en dash – is the approximate length of a capital N, and an em dash — is the approximate length of a capital M. Today I Learned!
I use the hyphen key, and hit it once for a hyphen or for a minus sign, and I use it twice for an em dash.
At some point, many things I type into started replacing "--" with an em dash, but my precambrian computer typing muscle memory is fine with "hyphenhyphen" meaning "em dash".
I will admit right here in front of god & everybody that I'm pretty sure I've never typed an en dash at all.
Fun fact: In Portuguese, the em dash is often used to introduce direct discourse, much like double quotes are used in English, but only when the direct discourse opens the paragraph. So instead of:
I’m all about spelling things correctly. To, too, two or their, there, they’re matter. But using the correct dash/hyphen is way too pedantic to me. In isolation, I can’t tell the difference between them.
I simply do not care. I will just use - (the one next to zero on the keyboard) everywhere. There are a grand total of zero situations where using one in place of the other hampers information reconstruction or reading comprehension (although the latter is subjective, I suppose)
Is this meaning to grep for a double hyphen from standard in, or to mark the start of positional arguments and then grep for a hyphen? If you want both, it should be:
$ python -m this | grep -- -- -
Which is just beautiful
(Your example causes the last hyphen to be grepped for, which happens to only match doubled-up ones because single ones don't occur in that text. The quotes/apostrophes do nothing because they're parsed by (ba)sh and so only the hyphens are passed to grep, not the quotes. The last hyphen can be omitted because reading from stdin is the default if neither filenames nor recursion options are passed.)
Oh, of course simply quoting it doesn't disable the special meaning of --, because quoting is handled by the shell and argument parsing is handled by the program.
Most people don't use the em dash. It's too hard to type and looks too similar to a hyphen.
As a result, a hallmark of GPT-generated text is its (over)use of the em dash--I have stopped using it for this reason and just use two hyphens now instead.
Not necessarily, I don't consider myself a skilled writer by any means but I use em dashes a great deal.
Em dashes allow me to get multiple ideas into a sentence with comparative ease and have it still make sense. Otherwise I'd have to add additional sentences to a paragraph which itself has issues. With a longer paragraph one has to worry about its readability and comprehensibility, and that means having to restructure it—remove redundancies, etc.—and that takes time.
Good writers can think ahead and do all that restructuring in their heads. When writing about an idea, concept or logical unit thereof they'll write out short, coherent and readable text all in one go, and it will make sense. I only wish I could do that.
As I see it, em dashes are more a crutch for bad writers like me (they allow our text to be at least comprehensible).
Yes, true! I was tired when I clumsily made that point above (I am not a skilled writer).
I learned how to use the em dash properly about 6 months before the release of ChatGPT and then when it was released I realized that it used them all the time. So, to convince people that I both know basic grammar and I am human I started to use "--" instead of "—".
It is usually clear that 2-3 thingies means a range of thingies, but I seem to remember there being situations where it could also have been a minus sign. Perhaps it was with placeholders, where 10-N could be either one. Problem is, iirc, the real minus sign is longer than the hyphen, looking like an en dash (the one meant for ranges) and so it defeats the purpose... hence I totally use hyphens as minus signs, but en dashes for ranges, which makes sense in my head because a range has a certain span/length whereas a minus sign is just a little mark to indicate that something is negative. I see lots of people/software use en dashes for ranges but the existence of a real minus sign is, from my perspective, mostly just noted in typographic resources, so I think this reflects most people's usages (for the people that care for these details)
I do like that the em dash is as long as it feels that broken-off thoughts should be
Not everything has to be functional, sometimes things can also just look nice for the sake of it
The correct minus sign looks a lot clearer than a hyphen-minus when printing out negative numbers, especially at small font sizes. I have in the past written code to convert them.
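The conversion itself is tiny. A minimal Python sketch of the idea, not the code mentioned above; `format_number` and its default of one decimal place are assumptions for the example:

```python
MINUS_SIGN = "\u2212"  # U+2212 MINUS SIGN

def format_number(value: float, digits: int = 1) -> str:
    """Format a number for display, using a true minus sign for negatives."""
    text = f"{value:,.{digits}f}"
    # Python formats negatives with the ASCII hyphen-minus (U+002D);
    # swap it for the typographic minus sign.
    return text.replace("-", MINUS_SIGN)

print(format_number(-1234.5))
print(format_number(42))
```

This is for display only: the result will not parse back with `float()` in every tool, so keep the ASCII form in anything meant to be machine-read.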
Here's an easy, if not always precise way to remember:
* Hyphens connect things, such as compound words: double-decker, cut-and-dried, 212-555-5555.
* EN dashes make a range between things: Boston–San Francisco flight, 10–20 years: both connect not only the endpoints, but define that all the space between is included. (Compare the last usage with the phone number example under Hyphens.)
* EM dashes break things, such as sentences or thoughts: 'What the—!'; A paragraph should express one idea—but rules are made to be broken.
Unicode has the original ASCII hyphen-minus (U+002d), as well as a dedicated hyphen (U+2010), other functional hyphens such as soft and non-breaking hyphens, and a dedicated minus sign (U+2212), and some variations of minus such as subscript, superscript, etc.
There's also the figure dash "‒" (U+2012), essentially a hyphen-minus that's the same width as numbers and used aesthetically for typesetting, afaik. And don't overlook two-em-dashes "⸺" and three-em-dashes "⸻" and horizontal bars "―", the latter used like quotation marks!
> EM dashes break things, such as sentences or thoughts
Some style guides recommend "space, en dash, space" for this, and I prefer that myself – mainly because some software doesn't treat em dashes correctly as word separators for double click selection purposes.
For example, I'm pretty sure that at least some Kindle models would highlight both the word before and after the em dash when selecting one of them, which makes using the dictionary very annoying.
It's actually only your post that made me realize people don't normally put spaces around em dash. In French, Russian and a bunch of other languages proper typesetting is to use em dash as a standard dash character, and you always put spaces around them. So I did it in English as well, for many years now.
(I also now looked up and found out that in Spanish, apparently, you are supposed to put space only on one side of the dash, when used as a direct speech separator.)
I also put spaces around em dashes. It looks wrong—subtly wrong—to me to have the words glued together around the dash. It looks right — completely right — to me to have the dash standing on its own, as if it was a word in its own right.
The reason not to do this is observable in your post on my phone. The spaces cause the word wrapping algorithm to leave a dangling dash at the end of the line which looks ugly. Omitting spaces prevents the word break.
I mentioned that as an advantage in one of my other comments. An advantage both ways, because it depends on preference. I have the same preference as hansvm: I would rather see the dangling dash at the end of the line, so I prefer putting spaces around the dashes. Having the entire word-dash-word structure move to the next line feels ugly to me. As with most things, de gustibus non est disputandum. (And also, quidquid Latine dictum sit altum videtur).
It's the dangling dash at the beginning of the line that gets me. I see a lot of word break algorithms, including the one WebKit (and I suspect Blink) uses, which are happy to break "foo—bar" on either side of the em dash.
Funny, I'd rather have the break at the start or end of the emdash-implied break than just before or after it, not having to mentally handle some single dangling word divorced from its compatriots.
> The reason not to do this is observable in your post on my phone. The spaces cause the word wrapping algorithm to leave a dangling dash at the end of the line which looks ugly. Omitting spaces prevents the word break.
That's an interesting practicality but I don't think it's the cause of the rule: The rule probably long predates automated line breaking. Also, I think automatic line breaking will break compound words at the hyphen; it doesn't require spaces (which is also obvious from a software development point of view: the logic is relatively simple either way):
Ironically, on my phone the only line that ends with an em dash has no spaces in it.
If you want to not have a line break, you shouldn't rely on arbitrary behavior. You should use non-breaking characters like non-breaking spaces and word joiners.
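In code, that could look roughly like the sketch below. It relies on U+2060 WORD JOINER and U+00A0 NO-BREAK SPACE, which the Unicode line breaking rules treat as non-breaking; whether a given renderer honors them is still up to the renderer. The function names are made up for the example.

```python
EM_DASH = "\u2014"
WORD_JOINER = "\u2060"  # zero-width, prohibits a break on either side
NBSP = "\u00a0"         # space that does not allow a line break

def glue_unspaced_dashes(text: str) -> str:
    """foo<dash>bar style: forbid breaks on both sides of the dash."""
    return text.replace(EM_DASH, WORD_JOINER + EM_DASH + WORD_JOINER)

def attach_spaced_dashes(text: str) -> str:
    """foo <dash> bar style: keep the dash on the same line as the word before it."""
    spaced = " " + EM_DASH + " "
    return text.replace(spaced, NBSP + EM_DASH + " ")

print(repr(glue_unspaced_dashes("foo" + EM_DASH + "bar")))
print(repr(attach_spaced_dashes("foo " + EM_DASH + " bar")))
```

The second variant still lets the line break after the dash, so the dash can end a line but never start one.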
Preventing the word break doesn't seem very desirable, especially if it causes a large gap.
Funny—I'm the exact opposite. The extra spaces distract my eyes. To each their own! :)
To each their own: fully agreed, even though our tastes differ. I will mention one advantage of the spaces-around-dashes method: word wrap with default settings will break on the spaces around the dashes so that the entire word one, dash, word two combo doesn't end up pulled onto the next line as a whole unit. Whereas the advantage of the no-spaces method that you prefer is that word wrap will pull the entire word one, dash, word two combo onto the next line as a whole unit.
Why yes, I did list the opposite behavior as an advantage of each. Because that, too, is up to individual preference. :-)
That depends on the layout engine, I believe. Just tried it in Firefox (on macOS; not sure if it uses Core Text or something custom there), and it does sometimes break around the em dash in "foo—bar" style, not just "foo – bar" style.
I've definitely noticed the behavior you describe on some layout engines, too, and it's another reason why I personally prefer "foo – bar" style.
It's not your own. You write mostly for others to read.
P.S. I also prefer smileys with noses, :-), as opposed to the noseless smileys, :), that most people these days seem to prefer. :-)
I've wondered about this for similar reasons. I usually omit the spaces but as I said in an earlier post I'll sometimes include them when I think the typography calls for it or when I want to add extra emphasis.
I've come to the conclusion it boils down to which style manual one follows. I've taken a careful look at numbers of high-end books which no doubt have been carefully typeset and I've found EM dashes with and without spaces.
It seems there is no definitive rule but I might be wrong.
Grammar nasi but isn't it "It looks right — completely right, to me — to have the dash standing on its own"...
people don't normally put spaces around em dash
For what it's worth, I was in the last class in my high school to learn typing on IBM Selectric typewriters. We were taught to type two spaces, two hyphens, then two spaces. Incidentally, we were taught two spaces after periods and colons. To this day, I find it hard to read text that doesn't have proper spacing after periods. (HTML and WYSIWYG word processors handle formatting, but e.g. fixed-font text editors don't)
It's funny that people think that conventions for typewritten text built around the limitations of typewriters define what is “proper” in environments where typewriters and their limitations are not involved.
Yes, this always grinds my gears too. There is already a slightly larger space after periods in contemporary typefaces.
The old typewriter typefaces were monospaced, i.e. every character was the same width, but this is no longer the case. Virtually all typefaces today are proportionally spaced, not monospaced. So it’s redundant to leave extra room after periods.
What does this have to do with what I wrote? I said nothing of the sort. In fact, I explicitly pointed out that HTML and WYSIWYG word processors address it automatically.
I was taught that and abandoned it as a pointless anachronism. How often are you reading long form text in a monospace font?
Often enough, thanks.
What is a "standard dash character"? There is no such thing in English; only hyphen, EN dash, EM dash (and some odds and ends).
I grew up in the UK, and have always used space, minus, space.
The first keyboard I used was my dad's typewriter, and I don't recall it having any 'dash' other than the minus sign.
I was under the impression that you do "-" for hyphen, "--" for En dash, and "---" for Em dash. IIRC, LaTeX (or maybe the editor, it has been some time) even helpfully changes that for you to the correct dash.
> I was under the impression that you do "-" for hyphen, "--" for En dash, and "---" for Em dash. IIRC, LaTeX (or maybe the editor, it has been some time) even helpfully changes that for you to the correct dash.
The conversion of '--' to an en dash and '---' to an em dash is done by the TeX compiler, and appears in the rendered file, but I think that most TeX editors don't change the TeX code itself. (This is distinct from XeTeX-based compilers, which can handle non-ASCII Unicode characters like the em dash '—' directly in the source.)
(I think that the article's point is that, in some fonts, -- (two hyphens) is literally the (approximate) size of an em dash, not that it is always understood as meaning an em dash. At least in my font, --- (three hyphens) is far too long to literally look like an em dash:
---
--
—
–
(in order, three hyphens, two hyphens, em dash, en dash).)
Google Docs also does these replacements.
British typesetting style is a little different from US style in the way dashes are presented. In the UK, you might see a thin-space--en-dash---thin-space where a US typesetter would use an em-dash. Typewriter style generally follows book style. Since typesetters no longer use an extra space after punctuation, it's vestigial in typing.
en-US style is a single em-dash. en-GB style is a single en-dash with spaces on either side.
space, minus, space is on the same level as manually typing two spaces after a period
How so? One is the only way to approximate an en or em dash on a typewriter or in a charset that doesn’t have one, the other seems like a workaround of a typesetting bug at best.
-, --, --- is, IIRC, how it is done in LaTeX and would be exceedingly simple to do on a typewriter. That being said, to break up sentences I use " -- " because I think it looks nicer than "---". I'll go now ;)
LaTeX is a markup language though, not ASCII art. I can get behind two dashes as a substitute if no en dash is available, but three seems too much and looks like halfway to a horizontal line to me ;)
Until ~10 years ago, I used to type two spaces after a period.
I still do, and I maintain that it’s easier to read text with double spaces after periods.
TeX puts more space after periods/fullstops (which is why you're supposed to do special markup or other measures to mark '.' in the middle of sentences which aren't sentence-enders (e.g. like e.g.)). But it's generally smaller than the equivalent of two manual spaces.
(A nice thing in (La)TeX is that one could follow the "two spaces after a full-stop" rule, which then has the advantage of being an explicit marking for sentence boundaries (which your editor might be able to navigate; Emacs has a convention of assuming two spaces after a sentence-ending '.'), but then the TeX typesetting will take care of making it look right. I lost the habit of actually doing this, for better or worse, except when flycheck/checkdoc/package-linter.el makes me do it for docstrings.)
I used to feel similarly. Now I find the double space a visual distraction that doesn't in any way improve readability.
The effect of the double space is, I suspect, a product of the reader's expectations: if you expect it, its absence creates mental work, detracting from readability; if you don't expect it, its presence is what creates mental work.
I'm still doing it when I am typing at a physical keyboard. Hard habit to break. I learned it so long ago too.
You can tell when I've edited something on both a phone and a physical keyboard, based on the inconsistent use of spaces.
Retraining myself to stop doing double-spaces took maybe a week.
Most word processors can be configured to flag double spaces. That gives feedback to break the habit.
> Some style guides recommend "space, en dash, space" for this
The last paragraph of the article also addressed the subjective nature of spacing around the em dash:
> Spacing around an em dash varies. Most newspapers insert a space before and after the dash, and many popular magazines do the same, but most books and journals omit spacing, closing whatever comes before and after the em dash right up next to it.
As far as the selection detail, did you mean that you replace an em dash used like a comma or parenthesis with spaces and an en dash for specific highlight performance issues? Surely the spaces and an em dash would alleviate the selection highlight behavior and not muddy the waters of when to use an em vs. an en dash?
> Spacing around an em dash varies. Most newspapers insert a space before and after the dash, and many popular magazines do the same, but most books and journals omit spacing, closing whatever comes before and after the em dash right up next to it.
It's funny that they omit to mention the possibility of setting it off with a thin space ' ' or hair space ' ' (those are the thin-space and hair-space Unicode characters, though they show up full width for me), which I thought was preferred typographic practice.
(On Googling, maybe the reason that they don't mention it is that I was imagining it; I can't find any evidence for my belief.)
> those are the thin-space and hair-space Unicode characters, though they show up full width for me
Interestingly, at least in my browser and grabbing the direct link to the comment with curl, show the bytes as 0x20 for both. Perhaps the comment submission handler, or even the browser, collated your more specific U+2009 (thin) and U+200A (hair) spaces into the regular U+0020 space?
> Interestingly, at least in my browser and grabbing the direct link to the comment with curl, show the bytes as 0x20 for both. Perhaps the comment submission handler, or even the browser, collated your more specific U+2009 (thin) and U+200A (hair) spaces into the regular U+0020 space?
Probably! I think HN strips out emoji; maybe it just takes the safest approach and strips out all non-white-listed Unicode.
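If anyone wants to repeat the check without squinting at curl output, dumping the code points and UTF-8 bytes of a string is enough. A minimal Python sketch using only the standard library; the helper name `describe_chars` is hypothetical:

```python
import unicodedata


def describe_chars(text):
    """Return (code point, Unicode name, UTF-8 bytes in hex) per character."""
    rows = []
    for ch in text:
        utf8_hex = " ".join(f"{byte:02x}" for byte in ch.encode("utf-8"))
        rows.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), utf8_hex))
    return rows


# Regular space, thin space, hair space, written as escapes so they
# survive any whitespace normalization on the way here.
for row in describe_chars("\u0020\u2009\u200a"):
    print(row)
# ('U+0020', 'SPACE', '20')
# ('U+2009', 'THIN SPACE', 'e2 80 89')
# ('U+200A', 'HAIR SPACE', 'e2 80 8a')
```

A thin or hair space that survived submission would show up as three bytes starting with `e2 80`, not as a single `20`.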
The AP Style Manual, a/the leading source for US journalism at least, says
Outside of journalism, usually there is no padding, only, I'm with you: For searches, the spaces make the words easier to parse. Those rules predate computers, I would guess.
> <word> <dash> <word>
That one I’d usually parse as a hyphen, as in e.g. well-known. “Word space dash space word” is much clearer, in my view.
> The AP Style Manual, a/the leading source for US journalism
One of the things I can easily get away with by not being a US journalist :)
It’s quite hard to mistake an em dash for a hyphen in a proportional font.
self-fulfilling
self—fulfilling
One of these looks very, very wrong.
I agree, although I still prefer spaces between —.
Chicago Manual of Style has no spaces, so there’s some variation at least.
CMOS is not journalism, so it's not variation from the GP?
A wider number of people use either of them. Every place I’ve worked has used CMOS, which I now use with others.
Company I used to work for used AP for things like press releases and, I think, official blog posts and Chicago plus a couple different tech style guides for everything else.
Basically, we didn’t like some things in AP but we wanted to make it easy for journalists to copy/paste.
I have been doing this for purely aesthetic reasons my whole life. Style guides be damned, I hate connected em dashes.
The good thing about style guides is that they’re guides, not laws :)
That’s one thing I really like about English: There’s no central authority decreeing what’s right and what’s wrong top down, and it feels like there is some room for individual preferences and experimentation.
Very refreshing, compared to e.g. German, which has more than one semi-official authority gate keeping “correctness” in speech and writing.
In fairness, especially in the Anglo-Saxon dominated world post-WWII, English was under no threat to be swamped by German or French words.
> Some style guides recommend "space, en dash, space" for this
Which one does that? I threw up a little in my mouth and wish to avoid such style guides in the future!
Better avoid British journalism then, and many other languages on top of that.
It’s very common outside of America, even in English.
https://news.ycombinator.com/item?id=43501482
I'm not a native English speaker, but don't you use the ";" in English ?
To me, it feels like it is the same purpose as the EM dashes.
And I discovered the EM with ChatGPT, I've never seen it before.
A semicolon connects, whereas an em-dash creates more of a pause and therefore separates. In addition, em-dashes can be used in pairs to create a parenthesis, which semicolons can’t. I think with time you will appreciate the difference.
https://thenarrativearc.org/blog/2020/2/4/epic-grammar-battl...
Dashes surround a sub-clause - something like this - which is like a parenthetical addition to a sentence that could stand alone without it; semi-colons (';') connect a further sentence or part of one where perhaps a full-stop and additional word could have been. They also sometimes separate list items following a colon, especially if the things listed are longer sentences perhaps themselves containing commas that'd otherwise be ambiguous.
Em dashes are very similar to semicolons. You use em dashes if your related sentence is in the middle of another sentence, and semicolons if it's at the end.
They're frequently used in skilled and professional grade writing.
So as not to mislead anyone, the parent is mostly incorrect:
Here's an example sentence: Semicolons must have independent clauses—phrases that could form a full sentence on their own—on both sides of them; they are essentially alternatives for periods. Em dashes don't require independent clauses on either side.
In the italicized sentence,
* phrases that could form a full sentence on their own is not an independent clause but is valid between em dashes. on both sides of them, after the em dashes, is also not an independent clause. (The em dashes function like commas or parentheses here.)
* The parts before and after the semicolon are independent clauses. You could replace the semicolon with a period and you'd have perfectly valid grammar. I just chose to connect the two sentences a bit more.
I don't know if you can use em dashes as the parent comment describes, connecting three independent clauses:
* My favorite fruit is peaches—they are very sweet—I eat them all summer.
I think the above is wrong; it should be one of the following:
* My favorite fruit is peaches—they are very sweet—and I eat them all summer.: The last section is a dependent clause made by "and", not an independent clause.
* My favorite fruit is peaches—they are very sweet; I eat them all summer.: On both sides of the semicolon are independent clauses; I could replace the semicolon with a period.
Maybe there are examples I'm not thinking of? I infer that the rule might be that the punctuation following the em-dashed clauses should be the punctuation that would have been used without the em-dashed clause, but that's based on very limited evidence.
Many people don't use semicolons (;) in English but many do, and they are certainly part of correct grammar.
Semicolons are generally alternatives to periods, when you want more connection between the two sentences. Like periods, semicolons must have two full sentences—that is, what could be full sentences—on either side of them; the potential 'full sentences' are properly called independent clauses. (A dependent clause needs the rest of the sentence to form valid grammar; it can't function on its own. For example, in this paragraph's first sentence, when you want more connection between the two sentences is a dependent clause. Often they follow commas.)
Another use of semicolons is for lists in a paragraph where one of the list items has a comma in it (similar to the parsing problem for CSVs where some records contain commas): I only like wine; beer, but only ales; and orange juice.
I prefer the dedicated minus (U+2212) over the hyphen-minus (U+002d) for mathematical use because they look different in most font faces.
Are there cases where the dedicated hyphen (U+2010) is preferred over the hyphen-minus?
G. Brandon Robinson swears by U+2010 for hyphens in groff's Unicode output [0], but I see it as a hypercorrection. The most common convention by far (among authors who use Unicode and care about dashes) is to use U+002D for hyphens and U+2212 for minus signs. Not even the Unicode Consortium uses U+2010 for hyphens in its documents, and I'm not aware of any major organization that does.
As far as appearance goes, almost all fonts I've looked at make U+2010 identical to U+002D (i.e., they don't put any 'minus' into the 'hyphen-minus'), but a few make U+2010 a smidgeon shorter.
[0] https://news.ycombinator.com/item?id=38121765
Edit: G. Branden Robinson (note spelling) is the maintainer of groff.
https://www.gnu.org/software/groff/
Intl.NumberFormat also prefers it, but then you can't paste negative numbers into most financial software, calculators, spreadsheets. Even back into inputs on the same webpage, if it does custom number parsing. Even though <input type=number> accepts U+2212 as a minus, it turns it into a regular minus when you spin it down to -2.
It looks much better though and more visible: −1 vs -1. I wish hyphen was a separate symbol from the ascii start, or that monospace fonts didn't tend to shorten "-" cause it makes little sense in monospace anyway.
It has two potential benefits:
— In the context of automatic text processing, it unambiguously indicates the function of a hyphen, as opposed to a minus
— Fonts can choose to make the hyphen-minus a bit wider than a regular hyphen, to accommodate the usage as a minus sign. In that case, U+2010 would be typographically more appropriate for a hyphen, similar to how U+2212 usually is typographically more appropriate for a minus sign.
The visual style of the hyphen-minus depends on the font. Some fonts display it more like a minus, others like a hyphen. So if you care about distinguishing hyphen and minus, it makes sense to use the dedicated hyphen and minus, and not use the hyphen-minus at all.
A regular hyphen arguably looks better when used as a hyphen and not a minus.
> Unicode has the original ASCII hyphen-minus (U+002d), as well as a dedicated hyphen (U+2010), other functional hyphens…
Which can be fun when parsing CSV files from various sources. I've hit numbers with U+2010 or others where you would expect a hyphen-minus to be. Presumably someone² has copied a negative number from a document where one of the alternate symbols was used, and pasted it into everyone's favourite data-mangler¹ which interpreted it as a string, and so on down the chain.
--------
[1] Excel. Sometimes a joy, sometimes the bane of my existence.
[2] It is surprising, horrifying even, how much manual manipulation of data goes on in banking, where you might naturally assume everything is more automated these days. Sometimes a laborious manual process done regularly is seen as cheaper than paying for it to be automated…
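For the CSV case, one defensive option is to rewrite a dash-like leading sign before handing the field to the number parser. A minimal Python sketch: the set of characters accepted as a minus and the name `parse_number` are my own assumptions, not anything standard. As far as I'm aware, Python's built-in `float()` rejects U+2212 as a sign, which is why the rewrite is needed at all.

```python
# Dash-like characters that sometimes stand in for a minus sign in pasted
# data. Which ones to accept is a judgment call; this set is an assumption.
MINUS_LOOKALIKES = {
    "\u2212": "MINUS SIGN",
    "\u2010": "HYPHEN",
    "\u2011": "NON-BREAKING HYPHEN",
    "\u2012": "FIGURE DASH",
    "\u2013": "EN DASH",
}


def parse_number(field: str) -> float:
    """Parse a numeric CSV field, tolerating a non-ASCII leading minus."""
    cleaned = field.strip()
    # Only the leading sign is rewritten, so a range like "10\u201320" still
    # raises ValueError instead of being silently misread.
    if cleaned[:1] in MINUS_LOOKALIKES:
        cleaned = "-" + cleaned[1:]
    return float(cleaned)


print(parse_number("\u22121.5"))  # -1.5
print(parse_number("\u201042"))   # -42.0
```

Restricting the rewrite to the first character is deliberate: replacing every dash in the field would turn genuine ranges or part numbers into something that parses but means the wrong thing.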
The em dash is now a GPT-ism and is not advisable unless you want people to think your writing is the output of an LLM.
My advice is to take pleasure and have confidence in good writing, over misspent energy worrying about things like this.
If you practice your skills, you will reap the rewards.
The letter 'm' is now a GPT-ism and is not advisable unless you want people to think your writing is the output of an LLM.
No, thanks—I’ll keep using them as I always have.
Someone else said the same. How can that be when most word processors, and at least some phone keyboards, automatically insert em dashes?
It's infuriating that people are drawing this conclusion. LLMs pick up on em dash usage because professional and skilled writers use em dashes. They're a consistently useful, if niche, part of the literary toolkit.
But, no, now it's a problem because the majority of people's experience with writing is graded essays. And because LLMs emulate professionals, it's now a red flag if students write too much like professionals. What a joke.
Emily Dickinson wept—
Ha, good point, and an interesting question: What kinds of dashes did Dickinson intend?
It's a hard one to answer: We could look at published Emily Dickinson books from the time, but did Dickinson really pay that close attention to or have that much control over the type?
We could look at Dickinson's actual personal documents, but if they were handwritten, distinguishing dashes could be difficult even if there was intention there.
Fortunately we have troves of her handwritten documents; all of her poems were first printed posthumously. To me, she's using the punctuation as pacing or tonal markers as opposed to ligatures ("I'll clutch— and clutch— " vs "I'll clutch-and clutch-"). Many publishers style these marks as longer than normal m-dashes for that reason, which makes sense seeing as they are rarely used as asides.
I interpret her marks—
as breathless pauses—
that— having no unicode—
should be given to m—
and space—
https://www.edickinson.org/editions/2/image_sets/12170035
Em-dashes have been the norm in every Dickinson poem I read, and I think it might have derived from the preferences of Victorian publishers, who I understand loved those long dashes.
Great comment. Thank you!
I imagine it would have been up to the typesetter to make the call. The conventions for dash usage are fairly straightforward. You use em-dashes for asides, en dashes for ranges, and hyphens for most other cases. It's easy to figure out the right character from context (apart from en ranges vs hyphen ranges).
I had a quick search, attempting to find a great author who hated em dashes and preferred the vastly superior en dash. I found nothing.
This list of authors' punctuation quirks is interesting though.
https://lithub.com/the-punctuation-marks-loved-and-hated-by-...
You want Robert Bringhurst, poet and typographic nerd. He gives them special withering attention in his Elements of Typographic Style. I think he referred to them as Victorian excrescences?
Recently ran into this. Didn't realize it was that obvious.
And you'd better not 'delve' into anything
EN dashes are also great for date ranges: 1/1/2025–3/28/2025
You are right of course
However this is the kind of rule that "existed" for a while and most likely will go away as most people can't be bothered with the difference and it all looks similar anyway
Or maybe who knows, it will keep going on because chatgpt knows it
"There's also the figure dash…"
Re last paragraph: dashes, etc. are confusing for perhaps most of us who aren't, say, typesetters, myself included. I use EM dashes a lot usually without a space between words and sometimes with spaces when I think the typography calls for it—or for extra emphasis.
Essentially, most of us guess the rules and often this doesn't matter much but it can in certain circumstances.
For example, in say machine conversion/transliteration. The ASCII dash is often used as a substitute for Unicode minus sign because it's easy to select [it's my usual practice], and anyway many don't know there is an actual difference. Whilst a human will usually know the difference by its use or context a machine may take the literal interpretation which could lead to say a numerical calculation error.
This problem has annoyed me for a long while. Why is it that wordprocessors and editors do not highlight these characters and query whether the usage is correct? Surely this ought not to be that difficult.
Another example is Roman numerals. The average person will enter say an uppercase 'I' for the Roman numeral one. Here's a typical example which is incorrect:
WWII
Here I entered the normal ASCII 'I' because it was too involved to find the correct Unicode character for Roman numeral one.
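For what it's worth, the difference is easy to see from code. A small Python sketch with the standard `unicodedata` module; it uses the precomposed U+2161 (Roman numeral two) for the "II", which is my choice for the illustration:

```python
import unicodedata

ascii_version = "WWII"         # two ASCII capital letter I's
roman_version = "WW\u2161"     # U+2161 ROMAN NUMERAL TWO

print(unicodedata.name("\u2161"))      # ROMAN NUMERAL TWO
print(unicodedata.numeric("\u2161"))   # 2.0
print(ascii_version == roman_version)  # False

# NFKC compatibility normalization folds the dedicated numeral back to
# plain letters, which is one way to make a search match either spelling.
print(unicodedata.normalize("NFKC", roman_version) == ascii_version)  # True
```

So a machine can tell the two apart, and can also be told to treat them as the same, but only if someone decides which behaviour is wanted.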
I'd like to know what others who are in typography, machine learning etc. think about this, and why WP programs and editors don't have simple ergonomics that allow for easy selection of the correct character.
† On a related matter, you'll note I've used single quotes whereas mmooss uses double quotes. This tells me that mmooss is likely in the US whereas I'm not. Again, this is not really a major problem for humans but it can be in transliteration, etc. Also, it's unclear (at least to me) what the default is for quoting quotes, i.e.: "" versus "' (right, I've refrained from using triple quotes).
Again, this seems country specific with I believe the US favoring double followed by single. Even when these rules are defined do people strictly adhere to them?
I've always wanted an array or object with range keys like: arr[0–2] = 123; if(arr[1.5555]>122){}
That doesn't seem to be an array at all, if the idea is to check whether a number is within a range. Seems like an interesting data type though, a combination of a range data type and a map/associative array.
I was thinking of a sparse array but any name will do. obj[~42] ?
One may have a bunch of key ranges each associated with a value or one may have a key that should be "rounded" to the nearest key or retrieve the one below or above it.
It feels like something basic enough to have in a language and I found it oddly complicated to write myself. Comparing it with all values doesn't seem like a very good solution.
Not that I know many languages.
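One way to get this without comparing against every key is to keep the range starts sorted and binary-search them. A minimal Python sketch with the standard `bisect` module; the class name `RangeMap`, the half-open `[start, end)` convention, and the lack of overlap checking are all assumptions made for the example:

```python
import bisect


class RangeMap:
    """Map half-open numeric ranges [start, end) to values."""

    def __init__(self):
        self._starts = []   # sorted range starts
        self._entries = []  # (start, end, value), parallel to _starts

    def add(self, start, end, value):
        # Overlapping ranges are not detected; callers must avoid them.
        index = bisect.bisect_left(self._starts, start)
        self._starts.insert(index, start)
        self._entries.insert(index, (start, end, value))

    def get(self, key, default=None):
        # Find the last range whose start is <= key, then check its end.
        index = bisect.bisect_right(self._starts, key) - 1
        if index >= 0:
            start, end, value = self._entries[index]
            if start <= key < end:
                return value
        return default

    def nearest_start(self, key):
        """Return the range start closest to key (ties go to the lower one)."""
        if not self._starts:
            raise KeyError(key)
        index = bisect.bisect_left(self._starts, key)
        candidates = self._starts[max(index - 1, 0):index + 1]
        return min(candidates, key=lambda start: abs(start - key))


arr = RangeMap()
arr.add(0, 2, 123)
arr.add(5, 10, 7)
print(arr.get(1.5555))       # 123
print(arr.get(3))            # None
print(arr.nearest_start(4))  # 5
```

Lookups are logarithmic in the number of ranges; inserts are linear because of the list shifts, which is usually fine when the ranges are built once and queried often.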
In Python it’s a colon.
Nice, covers at least some of the abstraction "problem".
A Figure Dash is perfect for phone numbers (especially when working with tabular numbers).
Also, not to be confused with "一", which is a different thing entirely……
This one is U+4E00, CJK Unified Ideograph-4E00. So it's a common character between Chinese, Japanese, and Korean. This should be "one" in all three. And it does technically look a little different than a dash: https://unicodeplus.com/U+4E00
And this is different from Japanese's chuuonpu (U+30FC) which is a vowel elongation mark, and it's rendered horizontally or vertically depending on whether the text direction is horizontal or vertical, respectively.
ー
AFAIK most computer keyboards don't have em dashes. Rather than hit ALT+0151 every time, I've always just strung along two hyphens, like: --
Absolutely proper and correct use of em dashes, en dashes, and hyphens is, to me, the most obvious tell of the LLM writer. In fact, I think that you can use it to date internet writing in general. For it seems to me that real em dashes were uncommon pre-2022.
This test feels biased by the fact that, like others have said, macOS provides keyboard shortcuts. For example, I'm only Gen Z and yet have tried for many years to use the proper dash characters in the right places, which is made much easier by virtue of being on a Mac.
Of course, I guess it's entirely possible—even accounting for OS—that this test remains statistically useful. It makes me kinda sad that my (very much human-generated) writing fails the Turing test....
The compose key, for those who use it, also makes it very easy to do em/en dashes, and I use them quite regularly as a result.
Came to say this as well. I use the compose key to write em dashes and other symbols on a daily basis. Very handy!
`misc:typo` is easier
Windows does too now via Windows+. which opens the "emoji keyboard" but you can switch to the "symbols" tab to see unicode. It does have multiple dashes in the quick access bar at the top or you can search.
I've used WinCompose¹² to add key composition to Windows for many years (after discovering the concept in Unix-land), which I still find more convenient than the other options I've tried (including the Windows Emoji keyboard).
----
[1] https://wincompose.info/
[2] Though having checked just now, the sequences for en-dash and em-dash don't seem to be working. Perhaps one of my custom macros is interfering somehow… (it is behaving overall, ellipsis just worked as did the following diacritic and other symbols: áèîöūñ±⁰¹²∞¡¿‽π⬚). I'll have to poke at it later and see what is awry.
That has nothing to do with being on a Mac. Em-dashes and the compose-key work fine on Linux, and Android has them under the '-' of the on-screen keyboard when long-pressed.
(Windows probably has some way, but those are rarely discoverable.)
I disagree, there is absolutely no easy way to do it on Windows. You can install a third party program that emulates the compose key but on macos it "just works". And I think that makes a difference for 95% of users
Install PowerToys, hold dash and then press space. This works for all the variants for any keyboard character.
I've always (well...for 20 years) done a Google search for "em-dash" then copy/paste the character off whatever result page comes up. Word and other fancy editors always provided a popup pane where these characters could be clicked to insert.
It's a bit funny. On macOS en and em dashes can be natively typed with alt+- and alt+shift+-. The responses to your comment are apparently suggesting these methods are just as easy as that:
1. Install and configure this extra tool, which also by default enables a ton of other things you may not want, and may as well be a third-party tool even though it's technically built by Microsoft
2. Do a Google search and copy-paste (!)
3. Use a keyboard shortcut to bring up a symbol picker, then click on the tab containing the en and em dashes, then click to type them in
I mean, come on.
yeah, this is exactly my point haha. these are not at all the same
Hit Windows+. click on the "Symbols" tab and they're right there under general punctuation.
Released back in 2019 for Windows 10.
That's true, I do use them a lot on iOS as well—similarly, it's a long-press on '-' to get an en or em dash.
EURKEY layout in particular has them easily accessible.
On iPhone, type two hyphens to make an em dash:
-- into —
If OP wrote their post on an iPhone, they would have inadvertently appeared as an LLM by their own test.
You can also hold the hyphen key to select an en dash.
Does that become an en dash if it's between two numbers?
It does indeed! One of my favourite iOS keyboard features.
I hate that this feature doesn't have a timeout, so when you want to type "--" you have to "- -" and then go back and delete the space. You can't just wait as with double-space vs space-wait-space. It can be turned off, but that turns off other locale-based punctuation like quotes.
Just install a proper keyboard layout with proper typography support once.
It is maddening that the whole world uses typewriter keyboards with some facelift in the era of Unicode and even blasphemous full color emoji font rendering. What has changed in decades? Windows logo key, power keys, media keys, IE and Outlook logo keys — all Microsoft's fancies.
So initially IBM made some ad hoc decisions on what keys would be suitable for a single user office computer (as opposed to data input and admin terminals they had). Then everyone copied that, because sending unexpected scan codes could lead to bad things (random BIOS and program code couldn't care less about your ideas of forward compatibility). Then Windows became the “basic system” installed on most computers. Microsoft really pushed forward the internationalisation at the time, making a lot of national layouts and code pages (sometimes contradicting the national standards, for better or for worse). Then everyone copied what they decided. What's more important, even single byte code pages had the basic typographic symbols, anyone could've been using them for three decades, but they were not added to most physical keyboard layouts.
I wonder if that was because they wanted Word to seem more sophisticated than it was, and to make people think it was a requirement for “proper documents”, or because programmers still treated all non-ASCII symbols as free data markup constants that would “never appear in a regular text”.
> So initially IBM made some ad hoc decisions on what keys would be suitable for a single user office computer
Didn't it match ASCII and possibly typewriter keyboards?
ASR-33 right?
Alt+hyphen or alt+shift+hyphen is an endash/emdash. You may not have been aware of it because it's so subtle, but many people (including myself) used emdashes long before 2022
(edit: apparently only on Mac, see reply below)
I believe that's only on MacOS.
I think Microsoft Office (maybe just Word, but definitely not Windows) has a similar default shortcut.
You don't need a shortcut on Word.
You just type two hyphens (--) and Word will convert it to an em dash.
Across the Office suite:
Typing <word><hyphenminus><hyphenminus><word><space> yields an em dash.
Typing <word><space><hyphenminus><hyphenminus><space><word><space> yields an en dash.
That this has been true for some 3 or 4 decades makes me doubt all the comments that em dashes are a "tell" of LLM authorship. On the other hand, I guess when we confine this possibility to web content, I can see how people haven't used Office for web authoring lately, and whatever they do use (like web-based content management systems) doesn't tend to have this feature.
> Typing <word><space><hyphenminus><hyphenminus><space><word><space> yields an en dash.
More importantly, typing just a single hyphen minus in this constellation triggers the autoreplace, too. (Typing the double hyphen is only necessary without spaces in order to distinguish between an intentional hyphen and an em dash.)
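Put together, the rules described in this subthread can be approximated with two regular expressions. This is only a sketch of the behaviour as described here, not Office's actual implementation; the name `autoformat_dashes` is invented, and it works on finished text instead of reacting to keystrokes:

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"


def autoformat_dashes(text: str) -> str:
    """Approximate the dash autoreplace rules described above."""
    # word<space>-<space>word or word<space>--<space>word -> spaced en dash
    text = re.sub(r"(?<=\w) --? (?=\w)", f" {EN_DASH} ", text)
    # word--word with no spaces -> em dash
    text = re.sub(r"(?<=\w)--(?=\w)", EM_DASH, text)
    return text


print(autoformat_dashes("wait--what"))
print(autoformat_dashes("10 -- 20 years"))
print(autoformat_dashes("well-known"))
```

A single hyphen between words without spaces is left alone, which is why the double hyphen is only needed in the unspaced case.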
Good point. Either way, it's kind of peculiar that getting an en dash in this manner demands flanking the hyphen(s) with spaces, and those spaces persist after replacement, when the typical usage of an en dash specifically doesn't demand spaces.
From TFA:
> August 1–August 31
From a top comment:
> Boston–San Francisco flight, 10–20 years
To achieve this using the replacement feature we're talking about would take something like <word><space><hyphenminus><space><word><space><alt+leftarrow><bksp><leftarrow><bksp><alt+rightarrow> which is ridiculous.
In professional typesetting, like a book, I sometimes see spaces flanking an em dash, however.
I can't get this to work in Powerpoint. It's funny, I clicked on this thread because I was struggling with trying to make an "emdash" in Powerpoint yesterday and couldn't find the correct search term for the "long hyphen" that I was looking for.
Works fine for me on PowerPoint for Mac, oddly enough. Unrelatedly, Mac also allows easy (non-alt-code) keyboard entry: option-hyphen yields an en dash, while option-shift-hyphen yields an em dash.
Turns into different things (like a bulleted list) in different situations in Word, though.
Seems like you're correct. Interesting!
That's one of my favorite features of macOS keyboard layouts, but it's so close to one of my least favorite ones – option + space inserting a non-breaking space.
I almost never want that, and when typing "space, en dash, space", it happens quite easily and is usually impossible to tell visually.
You always want a non-breaking space before a dash.
How so? Wouldn't this prevent line breaks around dashes bracketing parenthetical statements? That's the opposite of what I want!
Line starting with a dash can be mistaken for dialogue or dashed list, which is not what you want.
In any case: non-breaking space (or otherwise suppressed linebreak, if you don’t use a space) is the rule.
Works here on Linux too, so not just Macs.
I've been Googling "em dash" and copypasting from the Google results for a solid 15 years now. Long before LLMs.
I modify the keymap to use AltGr+dash as em dash. Very easy in Linux with xmodmap, bit more complicated in Windows with the Keyboard Layout Creator.
`misc:typo` has been in xkb for about 15 years. There's also xkb-birman (matching the current state of the project that inspired all of it). If your national layout does not have level 3 and 4 symbols set, those should work straight away. If it does, it is highly likely that they clash, so you need to create a suitable subset. It is highly advised to find like-minded people, discuss the best options, and then gently push the result to upstream to make it available for everyone. After all, it's Linux, if you won't do it, no one will.
Just use the Raycast emojipicker, it's very good. Better and faster than the macOS one
Certain corners of the world have absolutely cared about and employed the proper use of all the “dashes” well before but all the way up to 2022. I’d imagine LLMs have just consumed some of that material.
Pretty much everything professionally edited and typeset does, and those will generally be retained in Unicode text (obviously, not if it gets converted to ASCII). It’s less common in internet fora because not all users either know the use of dashes or have easy access to them on the devices they are using, and if it's not both familiar and easy, people are going to skip it in quick messages.
I used to intern for a literary magazine and I can confirm that half my copy-editing was enforcing proper use of em-dashes. This was well before 2022.
I always use an em dash when possible when I should, and double en dash when I can't, just because I'm that kind of nerd. But it is the case that a double en dash on iOS autocorrects to an em dash, so I'm suspicious of the claim that em dashes are a tell for LLM writing.
Most editors should auto-change a double dash into an em dash. I think Google Docs does, for example.
Why not a double hyphen, which has the same result?
Not in all fonts. In most monospace fonts, two hyphens will show with a small gap between them, for example.
I also personally prefer en dashes, surrounded by whitespace on both sides, over em dashes. Apparently some WYSIWYG software interprets two hyphens as an em dash, while others will interpret that as an en dash, so I'd rather just use the real thing if possible to avoid the ambiguity.
In Linux/Xorg with a compose/multi-key one can do:
<Multi_key> <minus> <minus> <period> : "–" U2013 # EN DASH
<Multi_key> <minus> <minus> <minus> : "—" U2014 # EM DASH
More in /usr/share/X11/locale/en_US.UTF-8/Compose
I've tried to use real hyphens and dashes since learning a bit about typography roughly 10–15 years ago. macOS makes it really easy with just alt and hyphen for en-dash, shift+alt and hyphen for em-dash. Definitely not an "obvious tell" of an LLM!
Thanks for the '⇧⌥<dash>' tip— from 2022–2025, I have been using macOS en's thinking they were em's.
(Side note: GPT says apostrophes should be used for pluralizing only for single letters to avoid confusion, but this seems more readable than "ens and ems" IMO.)
I recently got accused of using AI for some writing I submitted because I regularly use both en-dashes and em-dashes, and have for years. I said in another thread recently they are second and third, to semi-colons, as my favourite punctuation marks.
I was able to demonstrate my long use of them, prior to LLMs. And since I write in quarto markdown I don't need keyboard shortcuts.
The engineers of various AIs are probably reading your comment and making adjustments.
Or we are both just AIs, as a portion of HN comments are, commenting back and forth about other AIs.
As are super pedantic humans.
Em- and en-dashes have been well-supported by LaTeX, the smartypants family of Markdown extensions, and plain HTML for more than 20 years.
In your support, though, calling the extension “smartypants” really hints at the target audience :)
Automatic conversions have been happening for a long time. In fact, a few years ago there was some combination of settings on my terminal locale settings and man (well, troff/groff most likely) was converting hyphens in param definitions to some sort of dash character, meaning I couldn't copy and paste out of the man page. I think it also affected perldoc for the same reason.
I don't doubt there are publishing platforms that do it automatically as well, so I wouldn't count on seeing them as an indicator of generated output, even if it may be processed in some manner.
This is because the original was written using the wrong markup. When the output was ascii, nobody noticed, but it matters when the output is unicode.
That's revisionism. It was considered correct historically, before someone decided to unilaterally declare all existing man pages "wrong".
It's like we spent twenty years writing (mindlessly copying) web pages with &mdash and only viewing them with lynx, and then somebody makes a graphical browser and the mistake is apparent, but I don't think the browser is in the wrong.
Context: https://lists.debian.org/debian-devel/2023/10/msg00085.html
Money quote:
While this is true, this is an amazingly silly omission.
Serbian and Croatian XKB keyboard layouts have had em- and en-dashes since early 2000s even if they were not standardized: AltGr (right Alt) + hyphen (to the left of right Shift) produces an em-dash, and press Shift on top, and you get an en-dash.
This is how long I've had them easily accessible on any keyboard (I even have them converted to MacOS keyboard layouts for use with Karabiner).
http://srpski.org/dunav/raspored-c.html
It's `&mdash;` in HTML or Markdown.
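For anyone who wants to check what the HTML named references decode to, here is a small sketch using Python's standard `html` module. The helper name `decode_dash_entities` is just something made up for illustration; `&mdash;` and `&ndash;` are the standard HTML names for the em and en dash.

```python
import html

def decode_dash_entities(markup: str) -> str:
    """Decode HTML character references (e.g. &mdash;, &ndash;) to Unicode."""
    # html.unescape handles all named and numeric character references,
    # not just the two dash entities discussed here.
    return html.unescape(markup)

decoded = decode_dash_entities("one&mdash;two, pages 10&ndash;20")
# Print the code points of the non-ASCII characters that came out.
print([f"U+{ord(ch):04X}" for ch in decoded if ord(ch) > 127])
```

Whether a given Markdown renderer passes the entity through unchanged depends on the renderer, so treat that part as something to verify for your own toolchain.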
If you use eg. a Japanese IME, you can also get it by typing a normal hyphen and selecting the em dash from the picker.
The lack of em dash usage in popular culture speaks more about typical people than it does about whether a text's author was an LLM. In fact, the average person has never even noticed—let alone considered—that the em dash exists. If they've read for 20+ years, they've seen at LEAST hundreds of them.
Imagine being an NPC (a human bot), flattering yourself with the thought that people who understand the language are language bots...
21% of adults in the US are illiterate in 2024 and 54% of adults have a literacy below a 6th-grade level[1]. “The average person” isn’t really a high bar, unfortunately.
1: https://www.thenationalliteracyinstitute.com/post/literacy-s...
Is this a legitimate institute? The linked article offers many statistics and even financial figures but cites no sources or studies. There is a “TOLL FREE” (capitalized) phone number in the website footer, and the comments are full of prostitution ads.
Wikipedia[1] contains more links to data, if you want to sample a few more sources.
1: https://en.wikipedia.org/wiki/Literacy_in_the_United_States
Not at all. It's just inconvenient for most of the Windows-using world, as the characters are not accessible. It's ALT+[whatever] or Google-it-and-ctrl+V. Hence an awful lot of internet writing didn't really use any of that stuff properly.
See, e.g., Boss Szabo's blog: https://unenumerated.blogspot.com/2018/03/the-many-tradition...
Two chained hyphens, as was pretty much the norm back then.
And did you just call me an NPC?!? It's not a matter of "understanding the language" at all. It's a matter of convenience and of a sort of evolved convention.
On mac it's very easy to get an em-dash, just alt+shift+`-`. Though I do concur that it's more likely to come from an LLM, I don't think it should be considered a tell — I find it more of a predictor of the writer's age.
That’s interesting to note. I have usually taken the time to properly use en-dashes when it seems appropriate because I frequently deal with strings that represent academic years. At least where I live, these span two calendar years. I have noticed that a lot of college websites tend to use the en-dash properly (e.g. on their academic calendar webpages).
Just configure something like RightAlt to work as a compose key:
Compose--- produces —
Compose--. produces –
Lots of other characters like áăǎ°±€ are available through compose: https://whynothugo.nl/journal/2024/07/12/typing-non-english-...
> Absolutely proper and correct use of em dashes, en dashes, and hyphens is, to me, the most obvious tell of the LLM writer.
Or just someone who likes to use the right characters. There was a report a few months back about how writing from autistic kids keeps getting mislabelled as LLM simply because they use the correct specific terms.
Please stop associating being precise with being an LLM.
(La)TeX would typeset -- as an en dash. --- gets you an em dash.
I, of course, used proper dashes in typeset documents, at least after I'd learnt about them in Knuth's The TeXbook. I have found myself occasionally using them in ASCII contexts just as ---. But I've never sought out the proper Unicode character.
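If you want the same `--` / `---` convention outside of TeX, a rough sketch in Python could look like the following. This is only an approximation of the ligature behaviour described above, not how TeX implements it, and `tex_dashes` is a made-up name.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"

def tex_dashes(text: str) -> str:
    """Replace TeX-style ASCII dash runs with Unicode dashes."""
    # Replace the longer run first; otherwise "---" would be consumed
    # as "--" followed by a stray hyphen.
    return text.replace("---", EM_DASH).replace("--", EN_DASH)

print(tex_dashes("pages 10--20"))
print(tex_dashes("wait---what"))
```

Single hyphens are left alone, so compound words and phone numbers are unaffected. Runs of four or more hyphens are not handled specially here.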
The Mac has had them as part of the standard keyboard layout since 1984. Using Apple kit since then, they have long been burned into my muscle memory:
option-[-] for en dash –
shift-option-[-] for em dash —
The option key is IMHO the most underrated feature of the Mac platform. Having another modifier for character input is insanely handy, and I know where to find numerous characters like trademark™, divide (÷), pound (£), degrees (°), pi (π) and so on.
Someone should parse HN api and figure out total dash usage and see if there is a spike in recent times aha
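A minimal sketch of the counting half of that idea, in Python. Fetching comments from the HN API is deliberately left out; the function below assumes you already have the comment bodies as a list of plain strings, and the names `DASHES` and `count_dashes` are hypothetical.

```python
from collections import Counter

# Code points of interest, mapped to readable labels.
DASHES = {
    "\u002d": "hyphen-minus",
    "\u2013": "en dash",
    "\u2014": "em dash",
}

def count_dashes(comments):
    """Count occurrences of each dash character across a list of strings."""
    counts = Counter()
    for text in comments:
        for ch in text:
            if ch in DASHES:
                counts[DASHES[ch]] += 1
    return counts

sample = ["a\u2014b", "1\u20132 x-y -"]
print(dict(count_dashes(sample)))
```

To look for a spike you would bucket the comments by date before counting, and probably normalise by total characters, since comment volume changes over time. A literal `--` shows up here as two hyphen-minus characters, so typed-out double hyphens would need separate handling.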
I write poems a fair bit and use em dash a lot. (maybe too much and incorrectly)
If em dashes were uncommon pre-2022, they wouldn't have ended up in the LLM training sets.
I use --- to represent em dash in prose here, e.g. [1][2]. The behavior is just a residual of long time exposure to TeX.
[1] https://news.ycombinator.com/item?id=41833665
[2] https://news.ycombinator.com/item?id=41774199
iPhones will autocorrect two hyphens to an em dash.
What I've been using: install https://github.com/samhocevar/wincompose and you can then press AltGr then three hyphens to insert one. Or if you're on Linux, just search for "compose key".
More sophisticated clients require we use dashes correctly. I first encountered it pre-pandemic, so in professional contexts it's not a sure-fire signal of LLM use — should you see em dashes correctly used in the Hacker News comments, or Reddit for that matter, then it's a pretty reliable tell... Usually. ;)
I'd like to have the record show that I've been using them since before LLMs :)
Not sure when I started; my guess is that I got into the habit of using them in LaTeX when writing my thesis, and then at some point realized that they are easily reachable on standard macOS keyboard layouts (via "option" + "-").
As I mentioned above, I've had them easily accessible with a keyboard layout for >20 years on all the systems I've used — the only caveat is that I find it really ugly with no spaces around em-dashes, which is usually recommended for English.
My LLM prompts all have “don’t use em dashes or semicolons ever” when I send the output to someone else. ;)
I get not using em/en dashes, but semicolons don't really have an alternative in many cases (other than rephrasing), do they?
Usually I split it into two sentences, but yes. I don’t really see semicolons used in most business communications, so I treat them as a tell that the text was generated by an LLM. Maybe I’m overreacting and prejudiced against semicolon usage.
Most word processing applications auto-substitute EM dashes as appropriate - some do it for two consecutive hyphens, iirc. I don't know if they substitute EN dashes automatically ... I don't know if there's a logic for that without understanding the text.
I've been using real em- and en-dashes for decades, in more or less the way M-W describes. MacOS and iOS make it easy to do, and growing up Mac kindled a life of typographical nerdage.
"Windows" + "." brings up symbols, and at the very top were em dashes. I've been using that since it was added.
On my Linux laptop, I confess to manually Googling them every time.
I use a compose key on Linux to write those. By default you should have these compositions available: --- → — || --. → –
I wrote for a magazine during college days a few decades ago that uses the Chicago manual of style. I still use em dashes, en dashes, and hyphens regularly. They don't show up as such in markdown, but they are effectively: one dash for hyphen, two for em-dash, and one with spaces surrounding it for en dash.
On Macs:
Hyphen -: -
En Dash –: alt -
Em Dash —: alt shift -
The default US English Mac keyboard is so extremely good, and has been the way it is for so long, that I remain baffled that other platforms haven't simply copied it. I came to it relatively late in life and it's one of the reasons I wish I'd started using Macs sooner.
It's pretty decent but the fact that I can't type an arbitrary unicode character has been a huge annoyance of mine since I switched from Windows/WSL to Mac.
They have shortcuts for Í, Î, and Ï but not for many commonly used characters like arrows
You can add the "Unicode Hex Input" keyboard layout, which lets you enter BMP characters by holding down Option and entering its code point in hex (similar to the hex entry on Windows). Expanding the Emoji & Symbols pane minitech mentioned also lets you browse by category (e.g. arrows), and you can customise the categories and add a full Unicode character picker (not limited to BMP like the Windows Character Map) there as well.
It's very easy¹ on MacOS to make yourself a custom layout with the characters you commonly use. Personally, I put arrows on ⌥⇧HJKL, vi-style.² (Doing so for Linux is a little more work, as xkb is more complicated and less capable.)
¹ https://software.sil.org/ukelele/
² https://codeberg.org/datatravelandexperiments/kps-keyboard-l...
Aside from the solutions other people have mentioned, if you have often-used symbols, you can set up a text replacement in keyboard settings. For instance, I have :x: for the multiplication sign.
Control+Command+Space or Fn+E or Edit > Emoji & Symbols if you know the character’s name. It’s not very convenient for repeated use, but it gets the job done in a pinch.
Yeah it's not great. Edit isn't always there. Fn+E seems to make the most sense. I've heard about ctrl+cmd+space but commonly forget it. Both of those open the same GUI which combines emojis, stickers, and unicode symbols—preferring the first two categories over the last. To type out a unicode symbol it takes at least three clicks on top of me starting to type in the name of my symbol
sigh
Thanks for the suggestions
> Edit isn't always there. Fn+E seems to make the most sense. I've heard about ctrl+cmd+space but commonly forget it.
You can remap Fn/Globe directly to it if you want. It's also accessible from the Input menu bar item if you show that.
> Both of those open the same GUI which combines emojis, stickers, and unicode symbols—preferring the first two categories over the last. To type out a unicode symbol it takes at least three clicks on top of me starting to type in the name of my symbol
Are you using the expanded Character Viewer window[0], or the default collapsed Emoji & Symbols pane[1]? Because the expanded Character Viewer lets you customise and reorder the categories[2] (though that doesn't affect search), including adding a full Unicode view[3]. And they both default to the search bar when opened (though the Character Viewer opens unfocused for some reason).
[0]: https://imgur.com/hTtrbcA
[1]: https://imgur.com/3L31DQu
[2]: https://imgur.com/Ch1PI5L
[3]: https://imgur.com/epayzwe
This specific key combination is not US keyboard specific. I like how they managed to group characters that are formally similar by binding them to the same keys.
Examples:
en and em are on -
Below are maybe Swiss specific?
~ is on N
@ is on G
| and \ and / are on 7
√ is on V
¥ is on Y and € is on E
∑ on W ( ∑ is a rotated W :)
etc.
Yeah, mostly the same on my US keyboard, except a couple like "@" (that's shift-2 on basically all US keyboards, and is printed on the key) and |/\, which are more prominent on US keyboards (two simply have their own keys, no shift modifier, even). I get the © symbol for option+g (which still kind-of makes sense!)
I appreciate that the designer of the layout clearly attempted to make some kind of mnemonic connection to the degree they could. Makes it easier to discover and remember the key-combos, even without a cheat sheet.
Ah! © is on C (makes sense!)
That's c-cedille here, because to write English fluently you need to be able to type French loan words like façade—but not quite so often as someone in Switzerland, probably (especially so in some parts of the country!) so I assume you've got it somewhere even more prominent on your keyboard.
Except for international where € is opt-shift-2 (next to the pound/hash), next to the dollar
modifiers:
opt-e+letter é (acute/aigu)
opt-`+letter è (grave)
opt-i+letter û (circumflex)
opt-u+letter ü (umlaut)
opt-n+letter ñ (for the mañana)
my favorite example of this is ellipsis … opt-; (the key with the colon over the semicolon is sort of a rotated ellipsis)
thank you for teaching me √
Word and Outlook have replaced "hyphenhyphen" with an Em dash for decades.
Or, I mean, it does SOMETHING. I've never checked, and just always assumed I was getting the em dash.
Compose - - - works for M-Dash (KDE / Linux).
For other combos — see /usr/share/X11/locale/en_US.UTF-8/Compose
See also: System Settings > Keyboard > Key Bindings > Position of Compose key
It's pretty bonkers (and mildly depressing, really) to imply that correct grammar and usage is a reason to accuse someone of using an LLM.
I mean if it's an obvious break from their normal style, sure. But by itself? Every time I hear this argument, it just seems like sour grapes from poor writers.
I have typed Alt+0151 almost every day for decades—and now with some annoyance I am limiting their use due to the "that's how LLMs write."
For a while, em dashes were really popular among LLM enthusiasts because of the idea that it would encourage the LLM to draw from training data that contained em dashes—which typically were higher quality training data written by a professional writer or somebody with a professional editor. Subjectively, I think it worked. I suspect that the LLMs trained to be used as chatbots were finetuned to use the em dash liberally for that reason. Now, after a few generations of these models, I think that the em dash is starting to have the effect of drawing from "slop" training data that was written by other LLMs rather than well-written human data.
I disagree—LLMs don't use them properly. They always put a space between the words before – and after – the dashed part.
Using spaces is not wrong. Typographically, a hair space or another thinner than usual space is usually used, but in plain text a space is often preferred. Style guides vary of opinion on this, but newspapers often space them. Without a space they end up looking like elongated hyphens joining the words on both sides. That's not their function.
> Using spaces is not wrong.
It's not wrong for en-dashes (and en-dash set open—with space on either side—is generally an alternative to an em-dash set closed.) And it's not wrong on the trailing side of an em-dash used in dialogue to show an abrupt stop mid-sentence if the stop is followed by a new sentence. And there's a few other particular uses, but, generally, setting an em-dash open is wrong.
> but newspapers often space them.
I've never seen a newspaper set em-dashes open, but I have seen them use en-dashes set open instead of using em-dashes at all. Given the space premium in print newspapers, em-dashes set open, which would consume enormous horizontal space, would, other concerns aside, be an odd choice.
This is US vs British English. You will struggle to find an em dash in any British publication.
My comment was about spaces specifically. The Guardian and the BBC use en-dashes instead of em-dashes, but both do so with spaces.
I'm married to an editor and friends with an editor at work. They both use em dashes appropriately—even with informal writing. I've now learned the keyboard shortcut just to confuse people in the age of AI slop.
As a diligent user of ALT+0151 for many years on Windows systems, I can contradict that it is a sign of LLM writing — perhaps in combination with other factors it can be used to increase the likelihood of LLM authorship, but alone, nope.
A few years back a journal editor meticulously reviewed all dashes in our manuscript and pointed out places where em dashes should have been used. Since then I started noticing different dashes everywhere around the internet.
Most obvious tell of the former/current Stripe employee, imho.
What's the significance of Stripe here?
I refuse to care about this. A single dash is all I will ever use. I see no possible reason to use the other two.
That's the comment I was looking for to rally behind. I use the same character `-` for all purposes: minus, hyphen, em/en dash. It's easy to type and it makes practically no difference in meaning or legibility. I refuse to waste my time differentiating between multiple variations of a short horizontal line with a few pixels more or less. Ain't nobody got time for that.
By the same logic, why bother with capital letters then?
Legibility. Same with punctuation marks.
Throwing my hat in here. The sub millimeter difference in the length of a dash conveys no additional meaning or clarity. It is impossible to argue me out of this position.
It's not like you can reliably write these consistently by hand either without going over the top in length to make it extremely obvious.
Here's some examples where the en dash could make things more clear:
-5--2°C
post-war-pre-digital era
See sections 10-O-15-Q
Try Our New York-London Flight Connection!
-5°C to -2°C
post-war - pre-digital era (not a sentence any sane person would use anyway).
See sections 10-O - 15-Q
Try our New York-London flight connection! (no kind of dash clears this one up without fixing capitalisation).
The last one was a gotcha: it's their newly established York–London flight!
Try Our New York–London Flight Connection.
Or if it was New York:
Try Our New York – London Flight Connection.
Note the additional spaces. Agree on the capitalization though.
> Try Our New York – London Flight Connection.
I'd wager serious money that if you put that on a sign and surveyed people, at least in the US, they'd all still conclude it is a "New York" to "London" flight.
What's the use of a communication tool, if it doesn't actually communicate anything to real people?
York doesn't have an active airport
In my region at least, -5 ~ -2°C, or -5°C ~ -2°C. If something is confusing people, we replace it with a suitable substitute. Re-educating people is really just a last resort. Is there anything keeping us from changing it other than ego?
Have you heard of "to"?
Sorry, lol? You didn't really think this through. This is what that looks like using en/em
-5—2
That looks like dogshit.
It's a mistake in the first place to decide to use only dashes and no spaces to convey all of this lol
-5 - 2 (Everyone knows a sign has no space - if you are building your sign for idiots try some of these:)
-5 > 2 -5->2 -5 <-> 2 -5 to 2 -5...2 Between -5 and 2
blah blah blah
-5 - 2°C
Em dashes don’t convey much meaning or clarity for me.
Rather, seeing too short of a dash is like putting two clashing colors together or wearing two pieces of clothes that don’t match. It just looks instantly off.
It’s just not aesthetically pleasing for me.
Length of breath/pause with a longer dash. Read some -- Emily Dickinson poems – you'll find a world ––– of meaning ––– in the millimeter.
Poetry routinely breaks grammar rules. A lot of poems rely on very specific white space layouts that you'd never see in writing.
And your example shows how you can just use multiple dashes instead of having three different ones.
I have read her in the past and can't say there were world's of meaning between -'s. Can you link an example? I looked again and couldn't see any obvious ones. Generally she just completely abused the -. Does she even use a comma once? lol
worlds. world's would indicate that a world owns something.
Also, you can just write -s instead of -'s as the apostrophe indicates possession
Exactly the type of comment I'd expect to see on an HN discussion about different types of dashes.
This sort of anti-intellectualism is the perfect antidote for those who claim that improper grammar is nothing more than evidence of language "evolving."
I think many grammar rules are not intellectual but just randomly evolved conventions.
E.g. some English language rule says that a comma or ending period of a non-quoted sentence goes inside the quotes if there's something quoted at the end of that sentence. That rule feels anti-intellectual to me, as if there's some misunderstanding of how hierarchical placement in one-dimensional space works (since something that's not being quoted is being put inside quotes)
Spelling used to be more fluid and up to the writer/printer. Printers would also use different spellings as a mechanism to change the line width and otherwise format text to their liking.
https://www.ruf.rice.edu/~kemmer/Histengl/spelling.html
That "rule" is the rule in America but not elsewhere. Please break it. It is stupid.
What is more intellectual about wanting to complicate the language for one reason, versus wanting to simplify it for another?
I was going to post basically this. There is only one dash, and it's the one for which my keyboard has a key. Minus sign, hyphen, or any other use case. When MS word autocorrects to something else, I always angrily undo it, because I don't know or care what it's doing.
-proud dash luddite
I don’t care about the length of the mark, but I did find this idea useful. Prone to excessive detail, I often find myself with a parenthetical inside of parenthetical. The developer in me insists on 2 closing parentheses. But it looks weird and nerdy. Although, using an em dash instead is probably just as nerdy.
> Dashes are used inside parentheses, and vice versa, to indicate parenthetical material within parenthetical material. ...
> The bakery’s reputation for scrumptious goods (ambrosial, even—each item was surely fit for gods) spread far and wide.
Long live the parenthetical!
I wish it was more popular, it neatly indicates meaning so very well.
This is coming from someone who can only speak English: what a stupid language. How is having 3 symbols that are discernible only by their, almost identical, length a good idea? How would one grade a paper for correct usage, especially if handwritten?
I agree with you completely.
En dashes, I'll grant you, are pointless. Those can go away.
However, em dashes are a different case. The main reason why it's desirable to use em dashes (beside convention) is for clarity of purpose. The hyphen is already a very overloaded character; they're extensively used to denote ranges and link compound words. Importantly, both of those usages do not correspond to pauses in spoken language. If you're voicing a hyphen you're supposed to barrel on through it. An em dash is much closer to a parenthesis, comma, or semicolon. It's a meaningful break in the sentence, in the way that a hyphen isn't.
Now, if it were up to me I'd choose a different character to replace em dashes (maybe underscores), but that's a separate argument.
Just use two dashes. Or like you said, use parentheses, commas, or semi-colons
Two dashes are fine, the other options have different literary functions than em dashes, and shouldn't generally be used as replacements.
I take this advice like "do not use a preposition to end a sentence with" and "pay close attention to 'much' and 'many'". Personal preferences from the 1800s taken as gospel by grammatical extremists, to the point where they're taken as some kind of solid rule in a vain attempt to forcefully shape language to a personal preference.
There are cases when you want to follow certain guidelines, for sure. If you write for a publication that adheres to Merriam-Webster, you'd better stay consistent and figure out the right AltGr code to type the right dashes. However, for the 99.99% of written media today, none of that matters.
"Much" and "many" are not interchangeable:
"I have too many water in the cup."
"How much people are in attendance?"
These sound obviously incorrect.
> Personal preferences from the 1800s taken as gospel by grammatical extremists, to the point where they're taken as some kind of solid rule in a vain attempt to forcefully shape language to a personal preference.
This is also true of "less" and "fewer". I use "less" everywhere.
Ending sentences with prepositions is and has always been fine. It has never been a serious rule of grammar that you may not end a sentence with a preposition. It does sometimes make a sentence sound better to rewrite it so that it doesn't end with one though. For example, "do not use a preposition to end a sentence with" sounds awkward to my ears, probably because you deliberately crafted the sentence to end with a preposition even though that is not naturally what you'd end that sentence with. (The previous sentence doesn't sound awkward to me, interestingly.)
Getting "much" and "many" right is completely different. They mean different things. Confusing them makes you sound stupid. Less vs fewer is the same. It often doesn't matter but in some cases it really grates on the ears (eg "there wasnt much people there" just sounds awful).
Dashes are not in the same category. They are orthographical conventions. They aren't really grammar. They are more like spelling. You can spell things wrong and say it doesn't matter because spelling is arbitrary and you can use the wrong dashes too, but it makes you look either uncaring or ignorant. If you want to give a good first impression, learn the basic conventions of written English and follow them.
Real monsters use a single dash but with a wider font.
me too, do not think it makes a difference in actual writing, like handwriting.
And that is why no one will remember your name.
Yeah, trying to get people to take Em vs En vs Hyphen seriously is a fool's errand. Only typography nerds would take it seriously and there just aren't enough of them to make a difference. I'd guess that the vast majority of people have never even heard of these distinctions.
uh, really?
i really like using em dashes -- for some reason, it feels "better" in my head than using something like a comma or a semi-colon.
Then why didn't you use an em dash?
you can use double dashes to symbolize an em-dash (i prefer using an actual em-dash, which is option+shift+- in macos).
https://en.wikipedia.org/wiki/Dash#Approximating_the_em_dash...
i refuse to care about this lowercase letters are all i will ever use i see no possible reason to use the other symbols
Suit yourself, but if you refuse to learn basic grammar you will be treated like you are stupid and uneducated. Like it or not, presentation matters. Getting the basics right, including things like spelling, grammar, etc, shows a basic attention to detail without which your services will likely do more harm than good.
> etc,
actually it's "etc."
(I wouldn't usually be a pedant, but if you think the difference between "--" and "—" matters, you should probably try to get the basics right too.)
Wrong. Look at any dictionary. Etc is completely fine. What next, are you going to pretend you write N.A.S.A. or Mr. White? Come on
https://www.merriam-webster.com/dictionary/etc. - even the URL has the period, and I did in fact look this up before replying :)
https://www.merriam-webster.com/dictionary/etc even redirects to the correct URL with a "."
Merriam-Webster is an American dictionary and therefore totally irrelevant to me.
You said "look at any dictionary", so I did. I notice you can't provide a link to a single dictionary that supports you, or even name what dialect supposedly doesn't have "etc."
Etc. is an abbreviation for etcetera. Correctly signifying contractions, abbreviations, and acronyms is far more commonplace than using the correct dash. Almost everyone would have learned about shortening words in high school; many people leave university without ever having heard of an em dash.
Etc is also an abbreviation of et cetera. Only Americans put pointless dots everywhere.
This is all stuff you learn in school. Punctuation isn't obscure or niche. You may not have learnt about semicolons or em dashes in school but you should have and I did. As did anyone that has ever read a novel. There are two semicolons on the first page of the first Harry Potter book, a novel read by approximately every child of my generation. There are loads of examples of the proper use of dashes and other "obscure" punctuation marks in any professionally typeset text.
> Only Americans
I was raised and educated in Africa, specifically the GCSE curriculum. I was taught to use etc.
>Mr. White
As opposed to what, exactly?
Mr White, which is correct English. I believe Americans might put a dot after these abbreviations, but nobody else does.
I prefer &c
The various dashes are not "basic grammar"; they are for pedants to argue about amongst one another while the rest of the world just gets things done.
Robert Bringhurst¹ prefers the en dash in the context of setting off phrases:
"The em dash is the nineteenth-century standard, still prescribed in many editorial style books, but the em dash is too long for use with the best text faces. Like the oversized space between sentences, it belongs to the padded and corseted aesthetic of Victorian typography.
"Used as a phrase marker – thus – the en dash is set with a normal word space either side."
¹https://archive.org/details/isbn_9780881791327/page/80/mode/...
Presently re-reading this book, The Elements of Typographic Style. It’s one of the few books I’ve gone out of my way to get a physical copy of – it’s just beautiful.
And I totally agree, space-set en dashes are vastly superior to em. I dislike the way it connects the word more closely to the word in the next clause than the phrase itself.
E.g. He left—no explanation. Vs. He left – no explanation.
To me, left—no feels more like a weird gluing together than a separator for a different section.
Because I am exactly the kind of person to obsess about this sort of thing, when I was working on my last book, I spent a lot of time deciding how I wanted to style dashed subordinate clauses.
Personally, I think en dashes are too small and look like a mistaken use of a hyphen. I really only use them in their Chicago Manual of Style recommended uses like date ranges.
But I agree that em dashes without spaces around them look wrong. They glue the adjoining words together when the whole point is that the clause is secondary and should be set aside from the surrounding text.
I ended up using em dashes with a little blob of CSS to put a tiny amount of space on either side.
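The CSS itself isn't shown above, so the following is not that approach. It is a plain-text alternative sketched in Python: pad each em dash with a hair space (U+200A) on both sides. The function name `space_em_dashes` is made up, and whether a hair space renders as "a tiny amount of space" depends on the font.

```python
import re

HAIR_SPACE = "\u200a"
EM_DASH = "\u2014"

def space_em_dashes(text: str) -> str:
    """Surround every em dash with hair spaces, replacing any existing whitespace."""
    # \s* swallows whatever whitespace is already around the dash
    # (including hair spaces), so running this twice changes nothing.
    return re.sub(r"\s*\u2014\s*", HAIR_SPACE + EM_DASH + HAIR_SPACE, text)

print(repr(space_em_dashes("He left\u2014no explanation.")))
```

One trade-off compared with doing it in CSS: the extra characters end up in the text itself, so copy-paste and search see them too.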
"Used as a phrase marker – thus – the en dash is set with a normal word space either side."
"Used as a phrase marker—thus—the em dash is set without normal word spaces."
>the em dash is too long for use
above, the em-dash without spaces is smaller, at least in this typeface
I've taken to using dash offsets—just as an aside—in many places where I formerly used parentheses; I find it interrupts the flow of the sentence less.
I think of that as “British” style (as opposed to American). I think it’s more common here and I certainly prefer it
So much this. Two weeks ago I learned that en dashes are used for numbers, but I thought they are what em dashes are for. Em dashes for me are too long and ugly.
That's how you use them in Germany. N-dash with spaces around, instead of an m-dash, as Americans do.
Mr Bringhurst is wrong. Em dashes have nothing to do with Victorian aesthetics.
Additionally:
* Use the minus sign /−/ (U+2212) when formatting numbers, because the default hyphen-minus /-/ (U+2D) just looks wrong: "It is −1 °C vs. -1 °C." Moreover, the correct minus has the same width as plus (− vs. +).
* Rare, but use the figure dash /‒/ (U+2012) or figure space / / (U+2007) if you need a placeholder character that is the same width as a single digit. For example, "Guess the PIN: 1‒34."
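A minimal Python sketch of the two bullets above. The helper names `format_signed` and `mask_pin` are made up for illustration and do not come from any library. The first swaps the ASCII hyphen-minus that Python's formatter emits for U+2212; the second drops a figure dash in place of one digit.

```python
MINUS_SIGN = "\u2212"    # − MINUS SIGN
FIGURE_DASH = "\u2012"   # ‒ FIGURE DASH


def format_signed(value, digits=0):
    """Format a number, replacing the ASCII hyphen-minus with U+2212."""
    text = f"{value:.{digits}f}"
    return text.replace("-", MINUS_SIGN)


def mask_pin(pin, hidden_index):
    """Replace one digit of a PIN with a figure dash placeholder."""
    return pin[:hidden_index] + FIGURE_DASH + pin[hidden_index + 1:]


print(f"It is {format_signed(-1)} °C")
print(mask_pin("1234", 1))
```

Whether the figure dash really matches digit width depends on the font, so this only gets the characters right, not the rendering.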
Somewhat off topic, however, I'm thoroughly convinced that there is a very high probability something is AI generated when I see Em dashes. Anyone else noticing this?
ChatGPT for example almost always uses them. I'm sure they are more common in academic writing, but it's now super common on boards like Reddit.
I've been employing em-dashes extensively since I went on a JD Salinger binge circa 2002. Also, "incidentally", for the same reason. I use "Nb" a lot, from reading a bunch of DFW years ago. Oh, and that very-precise construction he does with "which" all the time, I stole that.
Before LLMs, I think em-dashes mostly signaled that you read books and paid attention to details, to the extent they signaled anything.
To generalize your point: A lot of the "brown m&ms" that we've walked around with for detecting a writer's status, education, etc., are less useful in an age of LLMs.[1]
We might even be entering some waves of counter-signaling.
[1] They'll never totally nail all of DFW's mannerisms, though.
What is this very precise construction?
Something like, “the monks wore brown habits, which habits were made from wool”.
The slight ambiguity if you don’t do that now irks me, having seen a way to eliminate it.
So you're saying that when you see an Em dash in someone's prose, it's a big minus?
As I said in another comment, it depends highly on the context and previous / alternative knowledge of the source.
(How about when you see a pun in an HN thread?)
:)
It’s largely the Baader-Meinhof phenomenon. You’ve started noticing it because you just learned about it.
I feel this is a broad oversimplification.
When looking at the context of a given text, use of certain words or punctuation can very well indicate AI use.
The "original" example was delve. There is no doubt that AI (did, or still does) use this word at a significantly higher frequency than the average person. I would say the same about em dashes.
When browsing a Reddit thread about a video game, if you encounter numerous comments written perfectly, especially those containing indicators like em dashes, the word delve, or similar language, it certainly can raise the question: am I genuinely seeing comments from users who write this way in this specific context, or is this content more likely produced by an LLM?
It sucks that people understanding their own language marks them as possibly AI.
No, it's not. AI uses em dashes far more frequently than the average human.
Why is this getting downvoted? ChatGPT is completely obsessed with em dashes. I don't even know how to make it on my keyboard.
Yeah, people are saying "well you didn't know about em dashes before LLMs".
No, I learned about em dashes in school, I just literally don't know how to type them on my keyboard and I'm too stubborn to learn how to.
It depends. Em dashes in news articles and written publications? Definitely expected. Em dashes on social media or reddit? Either someone who works in typesetting, or an LLM. Most likely an LLM, given the dying nature of printed media.
Only typography nerds and professional printers care about things like these. Popular media, even modern professional media, hasn't been paying all that much attention.
Plausible. But apparently per TFA it's actually spelled Baader–Meinhof, with an en-dash not a hyphen.
yep. been using them for years. others have too. it’s not weird
same thing happened with “delve” — these are just words and grammar, people use them
there is no accurate way to tell whether text came out of a neural network or not
I’m not sure the same happened with “delve.” I saw an analysis of paper abstracts showing a clear uptick of “delve” starting with the mass-adoption of ChatGPT. Maybe it suddenly became a trendy word — especially in paper abstracts — or maybe more paper abstracts were edited by ChatGPT.
Combining the various "tells" of an LLM (em dashes, delve, grammatical signs etc) with the context (Reddit comments vs professional setting), you could establish a rough probability it was AI generated. At this point, it's the best we can hope for.
Gemini is in love with the phrase "It's important to..."
Whenever I see that at the start of a paragraph I know that there's an 80% chance it was written by Gemini.
There are regular folk who tend to be pedantic with their writing. I'm not sure this is a good test of whether text is generated by LLM. Consider that some may use LLMs to correct spelling or grammar, and the LLMs may often edit an en dash to em dash.
To be clear, it's essentially impossible to know if a given text is autonomously LLM generated (a bot on social media for example) or is the result of revision of real human effort.
To what extent that distinction matters, I'm not sure.
I've encountered and used em dashes regularly for the last 20 years. If most of your reading and writing are associated with social media, I could see the trend you're describing appearing real within that limited context. But em dashes are not new and have been a feature of high quality writing for many decades.
Yes, several of the most popular (and even lesser-popular but newly open-sourced models such as Gemma 3 27b) overuse Em dashes. Even when prompting them to not use dashes, they almost can't help themselves and include them occasionally anyways as it must be part of their learned stylometry. It's just not a common symbol to use at all as most people generally use commas for the same purpose. I can't even remember learning about Em dashes in my college english classes.
I submitted an application which I typeset using LaTeX, and some people thought it was AI-generated because of en and em dashes. I have been using these since forever.
If it's posted through a publishing platform (not just a comment on one or on a public site), it's very possible they do an automatic conversion of some of the common cases. That could also be filtering down to comment boxes and stuff, I'm not sure.
That's not to say that generated content doesn't use them, just that using them as an indicator might require a bit of nuance based on where you're seeing them.
I’ve noticed this, too. ChatGPT especially overuses them relative to other models. It’s an easy telltale sign that something is probably LLM-written.
I saw a reel the other day where some Young People(tm) were talking about "the ChatGPT hyphen" (an em-dash.) There was much wailing and gnashing of (false) teeth from Old People(tm) in the comments.
There is a special kind of irony in the fact that habits that used to set one apart from the unwashed masses (like the proper use of punctuation) now serve as a signal for being non-human.
Everyone I know that writes a lot, especially for copy or product design, seems to use em dashes more heavily. I've even seen a Drake format meme where he is shaking his head at parentheses, commas, and colons but—finally—nodding in approval at the em dash.
I wonder if it's a more recent phenomenon.
Em and en dash usage is officially part of style guides such as The Chicago Manual of Style [1], so it's often a work requirement for many writers and editors to use them in writing. This is why these kinds of dashes are everywhere in newspaper and magazine articles.
Eventually, people learn to include them out of habit—especially as most people see them as aesthetically nicer than a simple hyphen (-).
[1] https://www.chicagomanualofstyle.org/qanda/data/faq/topics/H...
Exactly. If I see an Em/En dash in a publication of really any kind, I don't think twice. Because that's the traditional context for them. Professional writing.
Yep, definitely been noticing it, especially on Reddit. It almost always makes me navigate away from the post, unless the author mentions that they’re using AI.
I’m bored with y’alls keyboard habits.
Not all though. Many people on HN use em-dashes and other proper punctuation.
Hold on, I'm coming back to this thread, I think I've cracked it guys. Some real alpha for you right here:
If the em dash has spaces around it -- as seen in AP style -- it was probably written by a real human, because that's how it comes out most conveniently on a word processor.
But if the em dash has no spaces around it--Chicago style--there's a good chance you're looking at LLM slop.
I saw this comment a day ago but it only clicked today. The way we tell it's AI is the use of too formal grammar. I think that means they now pass the Turing test. Or at most a hair's breadth from passing.
The only people still using em-dashes are those who think it's somehow a signal of high intellect rather than being (extremely) behind the times. Case in point: this exact comment section where you see it with ~10000x the frequency of standard human writing, or even the average HN thread.
Just makes me roll my eyes really seeing a human use an em-dash. We're in the age of informality, and at least for me personally I've definitely filed the em-dash away as "a near guarantee the text was written by a machine". No matter how much and perhaps especially because HN commentators are coming out of the woodwork to insist they've been using it daily for years.
This level of thinly veiled insecurity is just projection on your part.
Maybe you're projecting? Not everyone has an agenda beyond just thinking it looks good.
Yes! It's a tell-tale sign something is written by AI.
it is not
Today in “typesetting before we had typewriters”: …
At least we have dedicated O/0, and l/1 keys now. But we still see a lot of "straight" quotes instead of “those smart quotes Microsoft Word likes to generate”. And dashes. Did you know there is a dedicated ellipsis character? This is often set with slightly more space between dots than ..., and it by definition never wraps across a line between those dots. You still see (C) instead of ©.
It is one of those things that doesn’t really matter for readability, but although they can’t necessarily put a finger on why, people may still notice that some documents or pages appear to be set with more care for details than others.
(edit: I guess if you don’t have to search on Google what the hell a ‘Microsoft Word’ is, then you’re officially old)
> dedicated O/0, and l/1 keys now
And the 1 and 8 aren't next to each other anymore, either. (See typewriters from the "18"00s.)
> those smart quotes
Fixing straight quotes is a hard problem[0]. My FOSS text editor, KeenWrite[1], includes my library, KeenQuotes[2], for replacing them at build time. It's not perfect, but can typeset my ~400 page novel without any errors.
> Did you know there is a dedicated ellipsis character?
Yes! Here's where it gets parsed:
https://gitlab.com/DaveJarvis/KeenQuotes/-/blob/main/src/mai...
Then emitted:
https://gitlab.com/DaveJarvis/KeenQuotes/-/blob/main/src/mai...
Then transformed into an HTML entity:
https://gitlab.com/DaveJarvis/KeenQuotes/-/blob/main/src/mai...
When typesetting Markdown, KeenWrite first converts the document to XHTML (i.e., XML), then invokes ConTeXt to convert XML into TeX macros. One of those macros handles the ellipsis by converting it to \dots{}:
https://gitlab.com/DaveJarvis/keenwrite-themes/-/blob/main/x...
This renders as the Unicode character in the final document: …
> set with more care for details
Some of us old folks care about these details. ;-)
[0]: https://stackoverflow.com/a/73466438/59087
[1]: https://keenwrite.com/
[2]: https://whitemagicsoftware.com/keenquotes
People have approximated ellipsis by using `. . .`.
I use ellipsis. Which ironically is way too short when viewed in monotype…
I use ellipses & dashes… perhaps the former will convince people I am human.
I hate smart quotes because it's super weird to use the «French» and „German“ quotation marks.
for em dashes and ellipsis at least it's trivial to convert before displaying them... which I do in my own markdown-to-publication toolchain (but not here on HN).
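A rough Python sketch of that kind of pre-display conversion. It assumes the LaTeX-style convention mentioned elsewhere in this thread (two hyphens for an en dash, three for an em dash); `typographize` is a hypothetical name, not part of anyone's actual toolchain.

```python
def typographize(text):
    """Convert ASCII stand-ins to their typographic characters."""
    # Order matters: handle the three-hyphen form before the two-hyphen form.
    text = text.replace("...", "\u2026")   # three dots    -> ellipsis
    text = text.replace("---", "\u2014")   # three hyphens -> em dash
    text = text.replace("--", "\u2013")    # two hyphens   -> en dash
    return text


print(typographize("pages 10--20 --- wait..."))
```

Blind replacement like this will also rewrite things such as `--help` flags or runs of hyphens used as separators, so a real pipeline would need to skip code spans first.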
Using em dashes without surrounding spaces is such an ugly relic that triggers me to no end and is objectively wrong. The dash object is part of the sentence — not the two words it's separating.
I agree, this bugs me too.
Using em-dash with spaces takes up way too much space. Use an en-dash then instead.
The perfect way is to surround it with hair spaces.
> spans pages 128–34.
Who omits the 1 from the second number?! That is awful!
Who keeps the 1?
You write pages 1,003–4, instead of typing out 1,003–1,004 which is just unnecessary.
Works the same with two digits, or even three: pp. 1,899–902.
This is standard practice and arguably clearer.
I've only ever seen it done with page ranges, though. I'm not sure if it's done with year ranges? E.g. 1984–5? Or 1989–92? You work with page ranges constantly in academia, I just don't see year ranges much in any form.
Literally never seen this (wish I could grep all comments I've ever replied to) and I do not understand what makes you say that it's clearer when it's dropping information, making it relative rather than a fully qualified number
In speech, it's common, and misunderstandings are usually not a problem (if you're not monologuing on a recording) because someone will just ask; but in writing it looks like the range is the wrong way around. Maybe I expect more care in writing because the feedback loop is longer, or maybe it's just habit and I think it's wrong in writing because I never see it?
I think you're just not used to it.
Quick, tell me how wide this range is, just as an order of magnitude:
285368737954–285368783645
Would be a lot easier if I only included the range at the end which had actually changed, wouldn't it?
That's why it's clearer. Now obviously that was an extreme example, but it's also easier to see at a glance that 1,387–9 is just three pages, as opposed to 1,387–1,389.
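To make the abbreviation rule concrete, here is a small Python sketch. It uses the simplest possible rule (drop whatever leading digits the two numbers share); actual style guides have more detailed rules, and `abbreviate_range` is just an illustrative name.

```python
EN_DASH = "\u2013"


def abbreviate_range(start, end):
    """Render start–end, dropping the leading digits the two numbers share."""
    first, last = str(start), str(end)
    shared = 0
    if len(first) == len(last):
        # Count identical leading digits, always keeping at least one digit.
        while shared < len(first) - 1 and first[shared] == last[shared]:
            shared += 1
    # With nothing shared, print the end number in full, with separators.
    tail = last[shared:] if shared else f"{end:,}"
    return f"{start:,}{EN_DASH}{tail}"


for start, end in [(1003, 1004), (1899, 1902), (1387, 1389), (124, 127)]:
    print(abbreviate_range(start, end))
```

For the twelve-digit example above, the same rule leaves `83645` as the tail.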
If you format your numbers properly, you get "285,368,737,954–285,368,783,645"
That's a change of about 50K, which isn't really that hard to notice.
"285368737954-83645" is... well I have to assume somewhere in the 10-100K range? Hold on a second while I line up the digits again... uh... let me rewrite that to "37,954 - 83,645", okay now I can read it. No, that wasn't any easier. I kept getting lost tracking where in the first number I was leaving off. Much easier to compare 737 vs 783 - digit groupings are really useful!
(I'll agree that 1387-9 is pretty reasonable, it just breaks down the longer the number is. Also, if the page count is important, you can just say "1387-1389 (3 pages)". This feels like the sort of shorthand you used to get on Twitter)
>"285368737954-83645" is... well I have to assume somewhere in the 10-100K range?
83645 is five digits, so certainly in the ~10,000 range.
Thus why I have to assume it's somewhere between 10K and 100K, yes :)
Taken to an extreme without formatting, sure, but what ranges have that many digits in human-readable situations? And if there are those exception situations, you can word around it for that case ("285368760800±45691" or "45'691 years after 285'368'737'954")
Genuinely trying to think of an example, since e.g. books aren't ever that long and search results don't have that many pages (that you'd all read and refer back to). A salary range, perhaps, can get into the seven digits in extreme cases (not that you care about any individual digit when you make a lifetime's worth of money in a bit more than a year): "Prospective salary is 2'423'000 to 2'432'000" seems to convey the relevant info as well as "Prospective salary is 2'423'000 to 9'000" does (except that I wouldn't understand the latter and ask what this second number means, but that's plausibly attributable to me as an individual not being used to it)
MLA-style citations call for abbreviating page ranges in that way. I mostly see it in literary papers, and not many other contexts, so it would be easy to notice them rarely if at all. Outside of that context, I occasionally see it used for year ranges.
It's definitely standard, but in what way is it clearer? An abbreviation is never more clear than the full thing it abbreviates.
EDIT: I saw your explanation below, and you make a very good point.
copy/paste, "print", paste in from page, to to page
Result:
> print pages in range from: 1, 003
> print pages in range to 4
Now I have two errors to fix: page 1003 to page 1004. Not nice. Who formats like this?!
-------------------
Also, some RPG books or encyclopedias I own have chapters that span like this:
p. 630 to p. 70 (book 2)
To me, it is now unclear: is that 70 with a reset page count, or 670 for book 2?
Since I just now learned that a quotation standard somewhere outside Germany exists that omits leading numbers, I now need to manually check where it ends.
TL;DR:
Don't make me think, and allow for automation. So just write one more number.
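For what it's worth, the automation side can be handled with a few lines. A Python sketch, assuming the abbreviated end number always borrows its missing leading digits from the start number (which is exactly the assumption that breaks for the "book 2" case above); `expand_range` is a hypothetical helper.

```python
import re
from typing import Tuple


def expand_range(text: str) -> Tuple[int, int]:
    """Parse 'start–end' where end may omit leading digits shared with start."""
    # Split on a hyphen-minus, hyphen, figure dash, or en dash.
    start_text, end_text = re.split(r"[-\u2010\u2012\u2013]", text, maxsplit=1)
    # Strip thousands separators and any other non-digit characters.
    start_digits = re.sub(r"\D", "", start_text)
    end_digits = re.sub(r"\D", "", end_text)
    if len(end_digits) < len(start_digits):
        # Borrow the missing leading digits from the start number.
        missing = len(start_digits) - len(end_digits)
        end_digits = start_digits[:missing] + end_digits
    return int(start_digits), int(end_digits)


print(expand_range("1,003\u20134"))
print(expand_range("630-70"))
```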
closest thing we have on hn to being a reddit like comment/remark lol
When I was editing an academic book published by a well-known university press, we were all asked to do that for the references. (And my colleagues, all doctors and lawyers, only knew Word and entered the references manually.)
What if it's 124 to 127? would you really type 124–127, or 124–7?
> would you really type 124–127?
Yes, every time. The clarity for the reader is more important than the time I save by leaving out '12'.
> would you really type 124–127
literally yes
The latter, I believe.
I read Butterick's Hyphens and dashes some years ago and it stuck with me. Now I regularly use hyphens, en dashes, and em dashes correctly—I even memorized the Unicode sequences and enter them seamlessly on Linux with Ctrl-Shift-U!
https://practicaltypography.com/hyphens-and-dashes.html
Came here to post the same link! That book is wonderfully opinionated and has helped clarify some typographic concepts for me
We need a blog post documenting the ironic trend of people—themselves NPCs, actual human bots, just now realizing the em dash exists despite seeing it hundreds if not thousands of times before LLMs—flattering themselves by suggesting that anyone who understands the language at above a 5th grade level must be an LLM.
Taking knowledge of the three extra pixels that are "more correct" as some kind of indicator of intelligence is silly. Pretending you're somehow above them is just sad.
Must be lonely at the top.
This thread is rampant with anti-intellectualism that deserves to be called out.
You aren't special for using em dashes, and it doesn't make someone an NPC to notice that AIs frequently make use of them.
The comment above is not about being special, it is about proper typography that is still everywhere around us: books, serious websites, anything done by real designers. Those people had to try hard to miss all of that.
No, it is not “politically incorrect” to call people lacking curiosity and/or education like you see them.
No, someone's personal preferences or transitory fashions are not automatically promoted to the holy reference for the whole world.
One point that is very rarely mentioned is how to place em dashes around quotation marks.
If the em dash indicates an interruption (not a planned pause) of the actual speech, the em dashes go inside the quotes (often just one, before the closing quote).
If the em dash is the narrator interjecting with additional information, the em dashes go outside the quotes.
Besides this, the question of where to put spaces when multiple forms of punctuation are combined can be quite a complex topic.
this is the definition of bikeshedding
No it isn't.
I use em dashes all the time in writing, but unfortunately ChatGPT and co. use the em dash frequently—and most people use the em dash infrequently, not knowing how to type it on a keyboard—so it's starting to make my writing look AI-generated sometimes. I fear it'll have to go the way of words like "tapestry."
FWIW, you can type an em dash on Mac with shift + option + hyphen.
I use them as well. For blog posts I suppose I'll need to switch to regular hyphens lest people think all my writing is LLM-spam.
That said, I don't even think you need the [shift] for em dash on Mac – just [option] + [hyphen] works for me.
That's an en-dash
Oh yeah, true. They look the exact same in the HN editor haha
Embrace the confusion with the AI—it’s a sign of progress!
Three^W Four top-level comments so far with this concern. Nice try AIs but I won’t downgrade my writing.
I like em dashes and use “Option Shift -” to summon them on macOS. However, LLMs tend to overuse them and compose absurdly long sentences. While proofreading a draft, I often instruct an LLM to “keep the original tone intact and don’t create overly complex sentences by fusing together simple ones.” That usually gets the job done.
Writers adore their em dashes. While they can sometimes clarify a concept by adding more context, overusing them can hurt readability. I prefer to read Hemingway-esque sentences that just say what they want to say and end sharply. So that’s how I write too—and sometimes the overuse of em dashes directly conflicts with that, making the content sound as if the author is confused about what they wanted to convey.
If you are looking for an alternative to kebab case for writing identifiers in a programming language that reserves - (U+002d) as an operator, chances are good you can use · (U+00B7 · MIDDLE DOT), as we do in middot case.
So isMorePleasantToRead, is_more_pleasant_to_read or is·more·pleasant·to·read is up to you.
But how pleasant is it to write?
On the bépo layout that I use, extremely well, as it sits between ’ (U+2019 ’ RIGHT SINGLE QUOTATION MARK) and ‑ (U+2011 ‑ NON-BREAKING HYPHEN), each being generated by altgr+shift and x . and k (which are all on the opposite side of the keyboard compared to altgr key).
At least from the point of view of digital gymnastics, it’s not really any worse than camel or snake case, though direct access to the dash arguably makes kebab case slightly easier to type.
So it really depends on the keyboard layout used (or whatever input device facility is used). What’s your favorite input method lately? Does it really not provide a convenient way to input more than the visible ASCII glyphs?
Plus, let’s be honest, identifiers are generally written in full expanse only once, then autocompletion is going to do it for us. And we all know we spend more time reading identifiers than declaring new ones.
This is intriguing to me, do you know which (programming) languages tolerate this?
Python
C C++ Ruby Javascript Rust
Go throw an invalid character U+00B7 '·' in identifier
Java throw error: illegal character: '\u00b7'
C# is really annoyed with it apparently:
Program.cs(1,60): error CS1056: Unexpected character `·'
Program.cs(1,60): error CS1525: Unexpected symbol `identifier', expecting `,', `;', or `='
Program.cs(1,99): error CS1056: Unexpected character `·'
Program.cs(1,99): error CS1525: Unexpected symbol `identifier'
That’s it for the top of the TIOBE index that I tested for this message.
Thank you very much for testing it! I'm plugging away on Advent of Code 2015 in C, I'll give this a go to see if I like it
The reason this works in Rust is that Rust follows Unicode's categorization of which code points are useful as identifiers: https://www.unicode.org/reports/tr31/
MIDDLE DOT is Other_ID_Continue
I know less about the other languages but it wouldn't surprise me if they did similar things.
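Python is an easy one to check, since `str.isidentifier()` applies the same identifier rules as the parser. A quick sketch (the variable name is just the example from upthread):

```python
# MIDDLE DOT (U+00B7) may continue an identifier but may not start one.
print("is·more·pleasant".isidentifier())
print("·leading".isidentifier())

# The same name used directly as a variable in source code.
is·more·pleasant·to·read = True
print(is·more·pleasant·to·read)
```

This relies on the source file being UTF-8, which is the default in Python 3.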
I use em-dashes correctly because a reader emailed me, and I was dreadfully embarrassed. You can actually see them become correct in my writing after the "I will pile drive you" AI thing.
It never occurred to me that doing this correctly might make people think I use LLMs in my writing.
Edit: I'm sure the many typos protect me from that, actually.
> If you want to be official about things, use the en dash to replace a hyphen in compound adjectives when at least one of the elements is a two-word compound.
How is a literal dictionary making fun of people who "wanna be official about things" lol. That's the entire basis for dictionaries themselves
It's Merriam-Webster - they are descriptivist rather than prescriptivist about language. They don't define correct usage per se, but rather document actual usage, though some usage may be given greater weight than others.
In this case, they are calling out the prescriptivist definition but are implying that it may be overkill and offering the more commonly used alternative.
I had one minor quarrel with this article: The use of spaces (of any kind) before and after the em dash or any dashes.
Personally, I am fond of using either a hair space or a thin space before and after the em dash. Not a full space!
To explore the various options, I wrote a little program to print the various combinations of dashes and spaces. I think what looks best depends a lot on what typeface you're using. But let's see how they look in the Verdana font used here. You should be able to paste this into your favorite word processor to see it in other fonts:
ASCII 0x2D hyphen-with no spaces
ASCII 0x2D hyphen - with U+200A hair spaces
ASCII 0x2D hyphen - with U+2009 thin spaces
ASCII 0x2D hyphen - with 0x20 full spaces
Unicode U+2010 hyphen‐with no spaces
Unicode U+2010 hyphen ‐ with U+200A hair spaces
Unicode U+2010 hyphen ‐ with U+2009 thin spaces
Unicode U+2010 hyphen ‐ with 0x20 full spaces
Unicode U+2013 en dash–with no spaces
Unicode U+2013 en dash – with U+200A hair spaces
Unicode U+2013 en dash – with U+2009 thin spaces
Unicode U+2013 en dash – with 0x20 full spaces
Unicode U+2014 em dash—with no spaces
Unicode U+2014 em dash — with U+200A hair spaces
Unicode U+2014 em dash — with U+2009 thin spaces
Unicode U+2014 em dash — with 0x20 full spaces
It looks like HN is really mangling this. Hair spaces are rendered wider than thin spaces?
If anyone wants to experiment, here is the Python code:
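(The code itself did not survive in this copy of the thread. What follows is a reconstruction sketch, not the original program; the names `DASHES`, `SPACES`, and `sample_lines` are assumptions, chosen so that the output matches the sixteen lines listed above.)

```python
DASHES = [
    ("ASCII 0x2D hyphen", "\u002d"),
    ("Unicode U+2010 hyphen", "\u2010"),
    ("Unicode U+2013 en dash", "\u2013"),
    ("Unicode U+2014 em dash", "\u2014"),
]
SPACES = [
    ("no spaces", ""),
    ("U+200A hair spaces", "\u200a"),
    ("U+2009 thin spaces", "\u2009"),
    ("0x20 full spaces", "\u0020"),
]


def sample_lines():
    """Build one sample line per dash/space combination."""
    lines = []
    for dash_name, dash in DASHES:
        for space_name, space in SPACES:
            # The chosen space goes on both sides of the dash.
            lines.append(f"{dash_name}{space}{dash}{space}with {space_name}")
    return lines


for line in sample_lines():
    print(line)
```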
If you're on Windows, install PowerToys, and check out the Keyboard Manager. It lets you set up shortcuts. I overload my keys using right alt for Greek letters (science stuff). Could do it for these dashes as well.
Dupe—https://news.ycombinator.com/item?id=43447819 ("How to use an en-dash and em-dash correctly?", 43 comments)
More interestingly it is the same highly niche subject from two different websites in two days. HN is...different
This isn't niche, it is the sort of thing every child used to have drilled into them in primary school.
Yeah exactly, used to. It's niche now, and has been for decades.
It isn't niche just because education has taken a nosedive.
I used a lot of these, but actually stopped due to my text sometimes being called out as chatgpt output. I also thorw in the occasional spelling mistake. If a piece of text on reddit/x has "–" (not "-") in it, you can be 95% sure it's an LLM.
That is an interesting observation. I wonder what percentage of the training text data for LLMs contains proper dashes, since a large part of it is user-generated content.
All self-respecting journalistic outlets use proper symbols. Where does the LLM get their opinions on “foreign affairs” from? Probably from the likes of New York Times like a standard lib...
And it shouldn’t be hard for an LLM to learn to use proper symbols when synthesizing content from the everyman. It’s not like it works on the level of literal copy and paste.
For Windows users, PowerToys has a Quick Accent tool, that lets you type in an em dash or figure dash by holding down the hyphen (-) and then toggling the space bar. Interestingly, the en dash is not available.
> The en dash is the least loved of all; it’s not easily rendered by the average keyboard user (one has to select it as a special character, whereas the em dash can be conjured with two hyphens)
on macOS:
- - => - (hyphen/minus)
- ⌥ - => – (en dash)
- ⇧ ⌥ - => — (em dash)
There are so many of these convenient typographical shortcuts that a long time ago I made Apple layouts for Windows and Linux.
And many are mnemonic too, like:
- of course ÷ (division) is ⌥ / (slash, which is poor man's division)
- of course ¿ is ⇧ ⌥ / because ⇧ / is ? so logically ⇧ ⌥ / is ⌥ ? which is ¿
- guess what ≤ ≥ ± ≠ are
- ¬ (logical negation) is ⌥ L because it's a L sideways
- £ (pound) is ⌥ 3 because ⇧ 3 is # (octothorpe, abused as sharp or pound - the other kind)
For anyone finding em-dashes too small, behold the majesty of U+2E3B, the triple-em dash: ⸻
I genuinely do not care one tiny bit about doing this right. At all. I will use the minus key for all of these like I always have and nothing bad will ever come of it. Find a better way to channel your limited energy.
That's a problem on the HN side only, not in the article
I'm just gonna say it: this does not matter. Just use whatever you want. If you're afraid that someone is going to think less of you for it: the people who matter won't.
For those who downvoted this - how does a millimeter of difference in the length of a line matter?
Well-meaning can vary if you don't put spaces around your dashes, and a well—meaning writer wants to ease the job of the reader.
ıt might simpıy not matter though, a miııimeter here and there, ı suppose.
The difference in dash length really doesn't matter and your example is not the same at all, but it probably made you feel really smart.
Did you mix those up on purpose?
Note also that the "hyphen" on your keyboard is actually a "hyphen-minus". Unicode provides separate characters for hyphen (‐) and minus (−).
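If you want to see what a given dash actually is, Python's standard `unicodedata` module will report the official character name. A short sketch (`describe` is an illustrative helper name):

```python
import unicodedata


def describe(text):
    """Return (code point, character, Unicode name) for each character."""
    return [(f"U+{ord(ch):04X}", ch, unicodedata.name(ch)) for ch in text]


# Hyphen-minus, hyphen, non-breaking hyphen, figure dash, en dash, em dash, minus.
for code, ch, name in describe("-\u2010\u2011\u2012\u2013\u2014\u2212"):
    print(code, ch, name)
```

Pasting a suspicious title or sentence into `describe` is a quick way to tell which of the lookalikes it contains.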
The only problem with correctly using the Em or En dash is that people will automatically assume the text was written by an LLM -_-
The more people proliferate this, the worse it'll be—frankly, we should be embarrassed that societal literacy and writing style knowledge is so poor that we jump to the "must be written by an LLM" conclusion whenever we see any sort of exotic character usage!
Yeah, I've stopped using them because of this reason.
My therapist: "homoglyph and punycode attacks are made up term by computer people to justify their paycheck".
Also Merriam-Webster:
The problem with en and em dash is that
1) they are too hard to type.
2) using them without surrounding thin space or hairspace breaks the horizontal rhythm and draws unnecessary attention to the punctuation; but thin and hair spaces are equally hard to type
3) Most people write markdown with mono space fonts, making these dashes and spaces indistinguishable.
if anyone's wondering, the post title is wrong -- both of the first two characters are en dashes (U+2013).
That's actually kind of funny. Looks like it's the result of HN's Unicode filtering rules, though; the original website has different characters in its <title> tag.
I could never remember which was the longer dash. Now it's easy, because the en dash – is the approximate length of a capital N, and an em dash — is the approximate length of a capital M. Today I Learned!
I use the hyphen key, and hit it once for a hyphen or for a minus sign, and I use it twice for an em dash.
At some point, many things I type into started replacing "--" with an em dash, but my precambrian computer typing muscle memory is fine with "hyphenhyphen" meaning "em dash".
I will admit right here in front of god & everybody that I'm pretty sure I've never typed an en dash at all.
Hyphens - I'm normal, breaking up thoughts. En / Em - I'm an AI or I'm using AP style guide to write articles.
In that case, I guess I must be an AI—I use em-dashes all the time in casual text.
Let's not forget the minus sign at U+2212. I was making a Simulink-like diagram editor and the dashes just didn't look good. U+2212 worked nicely.
The eternal debate between minimalism and the ornate.
There's room for both: when presentation matters I use them; when it doesn't, I don't.
Fun fact: In Portuguese, the em dash is often used to introduce direct discourse, much like double quotes are used in English, but only when the direct discourse opens the paragraph. So instead of:
"Hello," said John, "how are you today?"
You'd see:
— Hello — said John — how are you today?
Use three repetitions of the ASCII minus for em-dash, two for en-dash.
Do not use the Unicode characters, or people will think you are an AI bot.
I’m all about spelling things correctly. To, too, two or their, there, they’re matter. But using the correct dash/hyphen is way too pedantic to me. In isolation, I can’t tell the difference between them.
Invoking these from the mac keyboard:
While I'm here, Shift+Return for a soft return (i.e. not a new paragraph).
In LaTeX it's simple to remember: a hyphen (-), an en dash (--), and an em dash (---).
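The same two-hyphen/three-hyphen convention is easy to apply to plain text. This is only a rough sketch: `tex_dashes` is a made-up helper name, and it assumes the input contains no code snippets or command-line flags where a literal `--` must survive.

```python
def tex_dashes(text: str) -> str:
    """Replace '---' with an em dash and '--' with an en dash."""
    # Order matters: handle the longer run first, otherwise '---'
    # would be consumed as '--' plus a stray hyphen.
    return text.replace("---", "\u2014").replace("--", "\u2013")


print(tex_dashes("pages 10--20"))
print(tex_dashes("wait---what"))
```

Single hyphens, as in "well-known", are left alone.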
I've been writing for years and never used en or em dashes before LLMs.
a human has never used an em dash in the wild
My personal rule is simple: I just use - for everything.
DON'T YOU DARE
This shows both the en dashes and hyphens for page ranges. Is one preferred?
Or, you can avoid an awful lot of headache by just sticking to hyphens.
Thanks, but I'll keep using good old U+002D. Widening a glyph is a font/typesetting concern and doesn't make it a different character.
I simply do not care. I will just use - (the one next to zero on the keyboard) everywhere. There are a grand total of zero situations where using one in place of the other hampers information reconstruction or reading comprehension (although the latter is subjective, I suppose)
Is this meaning to grep for a double hyphen from standard in, or to mark the start of positional arguments and then grep for a hyphen? If you want both, it should be:
Which is just beautiful
(Your example causes the last hyphen to be grepped for, which happens to only match doubled-up ones because single ones don't occur in that text. The quotes/apostrophes do nothing because they're parsed by (ba)sh and so only the hyphens are passed to grep, not the quotes. The last hyphen can be omitted because reading from stdin is the default if neither filenames nor recursion options are passed.)
Oh, of course simply quoting it doesn't disable the special meaning of --, because quoting is handled by the shell and argument parsing is handled by the program.
(Although that turns out not to matter for this particular grep invocation; the -- is still interpreted as a pattern and - as standard input.)
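The point about quoting can be illustrated without running grep at all. Python's `shlex.split` approximates POSIX shell word splitting (it is not bash itself, so treat this as a sketch), and the command strings below are assumed examples rather than the exact command from the thread.

```python
import shlex

# What the program would receive as its argument list, with and without quotes.
quoted = shlex.split("grep '--' -")
unquoted = shlex.split("grep -- -")

print(quoted)
print(unquoted)
print(quoted == unquoted)
```

Both lists come out identical: the quotes are gone before the program ever sees its arguments, so any special handling of `--` is entirely up to the program's own option parser.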
As long as "this" is not a typical README.md with code snippets.
I'm sick of em dashes 'cause somehow they've become the tell that it's AI-generated text.
Super- or subhuman intelligence can be identified in the pre-Mason–Dixon line era.
On macOS you can enter these by doing the following:
* em dash: ⌥ + ⇧ + - (alt + shift + hyphen)
* en dash: ⌥ + - (alt + hyphen)
Most people don't use the em dash. It's too hard to type and looks too similar to a hyphen.
As a result, a hallmark of GPT-generated text is its (over)use of the em dash--I have stopped using it for this reason and just use two hyphens now instead.
Most people don't use em dashes... apart from professional and skilled writers, who use them regularly.
It's a bit of a problem that the same character is both a mark of LLMs and skilled writing.
Not necessarily, I don't consider myself a skilled writer by any means but I use em dashes a great deal.
Em dashes allow me to get multiple ideas into a sentence with comparative ease and have it still make sense. Otherwise I'd have to add additional sentences to a paragraph, which itself has issues. With a longer paragraph one has to worry about its readability and comprehensibility, and that means having to restructure it—remove redundancies, etc.—and that takes time.
Good writers can think ahead and do all that restructuring in their heads. When writing about an idea, concept or logical unit thereof they'll write out short, coherent and readable text all in one go, and it will make sense. I only wish I could do that.
As I see it, em dashes are more a crutch for bad writers like me (they allow our text to be at least comprehensible).
Yes, true! I was tired when I clumsily made that point above (I am not a skilled writer).
I learned how to use the em dash properly about 6 months before the release of ChatGPT and then when it was released I realized that it used them all the time. So, to convince people that I both know basic grammar and I am human I started to use "--" instead of "—".
BTW, the double dash "--" was a common substitute for an em dash in the ASCII days before Unicode.
You mean en dash?
No. Em dashes are to separate two thoughts in one sentence. En dashes are for denoting ranges.
OK, fair enough. I was referring to the fact that, visually, the hyphen is more easily distinguished from the em dash than from the en dash.
Ah I see, I thought you were referring to my first point.
If it's important in English, it should have a key on the keyboard. It follows that if it doesn't have a key, it's not important.
Here's my AutoHotkey script for making my favorite punctuation hotkeys on my Windows laptops the same as my Mac:
edit...downvoted, why? weird
Hot take - differentiating between these at all is dumb. There is virtually no situation when using one instead of another improves clarity.
It is usually clear that 2-3 thingies means a range of thingies, but I seem to remember there being situations where it could also have been a minus sign. Perhaps it was with placeholders, where 10-N could be either one. Problem is, iirc, the real minus sign is longer than the hyphen, looking like an en dash (the one meant for ranges) and so it defeats the purpose... hence I totally use hyphens as minus signs, but en dashes for ranges, which makes sense in my head because a range has a certain span/length whereas a minus sign is just a little mark to indicate that something is negative. I see lots of people/software use en dashes for ranges but the existence of a real minus sign is, from my perspective, mostly just noted in typographic resources, so I think this reflects most people's usages (for the people that care for these details)
I do like that the em dash is as long as it feels that broken-off thoughts should be
Not everything has to be functional, sometimes things can also just look nice for the sake of it
It might not be completely true that nobody cares, but I feel that almost nobody cares.
> comma, a colon, or parenthesis
They're all different. There is a difference between clear writing and typesetting. Why mix them up? A narcissism of small differences?
minus (US negative) enters the chat..
Seriously. If you want your − + to match, in terms of crossbar vertical position and width.
For comparison—
− + minus sign
- + hyphen
– + en dash
— + em dash
−+-+–+—
The correct minus sign looks a lot clearer than a hyphen-minus when printing out negative numbers, especially at small font sizes. I have in the past written code to convert them.
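A conversion like that can be quite short. This is not the code mentioned above, just a heuristic sketch: `typographic_minus` is a hypothetical name, and the rule (a hyphen-minus directly before a digit and not directly after a word character or a dot) is an assumption that will not suit every input.

```python
import re

MINUS = "\u2212"  # U+2212 MINUS SIGN


def typographic_minus(text: str) -> str:
    """Swap the hyphen-minus for a real minus sign on negative numbers."""
    # Only match a hyphen-minus that starts a number: it must be followed
    # by a digit and must not follow a word character or a dot, so ranges
    # like 10-20 and compounds like well-known are left untouched.
    return re.sub(r"(?<![\w.])-(?=\d)", MINUS, text)


print(typographic_minus("lows of -5 and -12, range 10-20"))
```

Binary subtraction written without spaces (such as `x-1`) is deliberately left alone, since it cannot be told apart from a hyphenated token by this rule.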
emdashes are on the rise thanks to people copying and pasting chatgpt
“So, you want to be accused of being an AI…”