Show HN: A bi-directional, persisted KV store that is faster than Redis
hpkv.io
We've been working on a KV store for the past year or so which is 2-6x faster than Redis (benchmark link below) yet disk-persisted! So you get the speed of in-memory KV stores but with disk persistence. To achieve this we've created our own custom filesystem, optimized for our particular use case, and we do smart batching for writes and predictive fetching for reads.
In addition to basic operations, it also provides atomic inc/dec, atomic JSON patch, range scans and a unique key-monitoring mechanism (pub-sub) over WebSockets, which essentially allows you to receive notifications on registered key changes directly from the KV store. So, for example, in a realtime web application you can receive notifications directly in your front-end, with no back-end implementation (no WebSocket server management, no relay, etc.), and still be secure and not expose your API keys on the front-end. We have REST, WebSocket and RIOC APIs and we can't wait to hear your feedback.
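To make the key-monitoring idea concrete, here is a minimal in-process Python sketch; the class and callback shape are invented for illustration, and HPKV's real WebSocket API will look different:

```python
# Toy illustration of key monitoring (pub-sub on key changes).
# In HPKV this happens over a WebSocket; here it's in-process.
from collections import defaultdict

class WatchableKV:
    def __init__(self):
        self._data = {}
        self._watchers = defaultdict(list)  # key -> list of callbacks

    def watch(self, key, callback):
        """Register a callback fired whenever `key` changes."""
        self._watchers[key].append(callback)

    def set(self, key, value):
        self._data[key] = value
        for cb in self._watchers[key]:
            cb(key, value)  # push the change to every subscriber

    def get(self, key):
        return self._data.get(key)

kv = WatchableKV()
events = []
kv.watch("user:42", lambda k, v: events.append((k, v)))
kv.set("user:42", {"name": "Ada"})
# events now holds [("user:42", {"name": "Ada"})]
```

In the hosted product the callback would instead be a message pushed down an authenticated WebSocket to the browser, which is what removes the need for your own relay server.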
We're only providing the free tier for now, but let us know and we can increase the limits for you if you have a specific use case. Please either send us an email at support@hpkv.io or use http://hpkv.io/contact if you prefer that.
sign up: http://hpkv.io/signup
documentation: http://hpkv.io/docs
realtime pub-sub: http://hpkv.io/blog/2025/03/real-time-pub-sub
benchmark vs Redis: http://hpkv.io/blog/2025/02/redis-vs-hpkv-benchmark
Looking forward to hearing your feedback :)
What disks give 600ns persistence _with fsync/fdatasync_? Never heard of anything under 50us p50.
The 600ns figure represents our optimized write path, not a full fsync operation. We achieve it, among other things, through:
1- as mentioned, we are not using any traditional filesystem and we're bypassing several VFS layers.
2- free space management is a combination of two RB trees, providing O(log n) for slice and O(log n + k) for merge, k being the number of adjacent free spaces
3- the majority of the write path employs a lock-free design, and where needed we use per-CPU write buffers
The transactional guarantees we provide are via:
1- atomic individual operations with retries
2- various conflict resolution strategies (timestamp, etc.)
3- durability through controlled persistence cycles with configurable commit intervals
Depending on the plan, we provide a persistence guarantee of between 30 seconds and 5 minutes.
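As a rough sketch of the two-index free-space scheme described above (one index ordered by offset for merging adjacent regions, one ordered by size for carving out slices), here is a toy Python version. Python's stdlib has no RB tree, so sorted lists with `bisect` stand in for them; real RB trees would make the inserts O(log n) as claimed, and all names here are illustrative:

```python
import bisect

class FreeSpace:
    """Toy free-space manager: one index by offset (for merges),
    one by (size, offset) (for best-fit slices)."""
    def __init__(self, total):
        self.by_offset = [(0, total)]   # (offset, size), sorted by offset
        self.by_size = [(total, 0)]     # (size, offset), sorted by size

    def slice(self, size):
        """Carve `size` bytes out of the smallest region that fits."""
        i = bisect.bisect_left(self.by_size, (size, -1))
        if i == len(self.by_size):
            raise MemoryError("no region large enough")
        rsize, roff = self.by_size.pop(i)
        j = bisect.bisect_left(self.by_offset, (roff, rsize))
        self.by_offset.pop(j)
        if rsize > size:                 # reinsert the remainder
            self._insert(roff + size, rsize - size)
        return roff

    def free(self, offset, size):
        """Return a region, merging with adjacent free neighbours."""
        i = bisect.bisect_left(self.by_offset, (offset, 0))
        # merge with the right neighbour if it starts where we end
        if i < len(self.by_offset) and self.by_offset[i][0] == offset + size:
            noff, nsize = self.by_offset.pop(i)
            self.by_size.remove((nsize, noff))
            size += nsize
        # merge with the left neighbour if it ends where we start
        if i > 0:
            poff, psize = self.by_offset[i - 1]
            if poff + psize == offset:
                self.by_offset.pop(i - 1)
                self.by_size.remove((psize, poff))
                offset, size = poff, size + psize
        self._insert(offset, size)

    def _insert(self, offset, size):
        bisect.insort(self.by_offset, (offset, size))
        bisect.insort(self.by_size, (size, offset))

fs = FreeSpace(1024)
a = fs.slice(100)    # a == 0
b = fs.slice(100)    # b == 100
fs.free(a, 100)
fs.free(b, 100)      # coalesces back into one 1024-byte region
```

The size-ordered index gives the O(log n) slice lookup; the offset-ordered index is what makes merging with the k adjacent neighbours cheap.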
I didn't necessarily mean exactly fsync. I guess I'll ask: Is it actually flushed to persistent disk in 600ns such that if the node crashes, the data can always be read again? Or does that not fully flush?
Yes, in that case data can potentially be lost: 30 seconds in a worst-case scenario without HA.
So it's not actually persistence then.
That's extremely deceptive, and (IANAL) I think false advertisement. I'd clarify it.
That's also not HA, that's durability. Concerning.
I have a product to sell you with a postgres interface but p99 write latency of 100 nanoseconds. It's postgres but our driver says "write done" before a write completes. It's revolutionary!
There's a tiny 50,000,000x difference between the now admitted 30 seconds and the previously claimed 600 nanoseconds.
And hold on, 600ns can't possibly be right...
A memory copy plus updating whatever internal memory structures you have is definitely going to be over 1us. Even a non-fsync NVMe write is still >=1us, so this is grossly misleading.
Our p50 is indeed 600ns for writes, the way I explained it. I understand that at this point this can read as a "trust me bro" kind of statement, but I can offer you something: we can have a quick call and I'll provide you access to a temp server with HPKV installed on it, along with access to our test suite, and you'll have a chance to run your own tests.
this can be a good learning opportunity for both of us (potentially more for us) :)
if you're interested, please send us an email to support@hpkv.io and we can arrange that
for the time being, have a look at this please: http://hpkv.io/videos/performance_local.webm
this is 1M records, 3M operations on a single node, single thread, recorded in real time (1x).
I understand that without access to the source of the test program it's hard to trust, but we can arrange that if you decide to take that call :)
The question from most of us isn't "did you get that number," it's "what does that number actually mean?" Writes don't need to return any data, so you can sort of set that latency number arbitrarily by changing the meaning of "write done." I can make "redis with 0 write latency" by returning a "write done" immediately after the packet lands, but then the meaning of "write done" is effectively nil.
In every persistent database, that number indicates that an entry was written to a persistent write-ahead log and that the written value will stay around if the machine crashes immediately after the write. Clearly you don't do this because it's impossible to do in 600 ns. For most of the non-persistent databases (e.g. Redis, memcached), write latency is about how long it takes for something to enter the main data structure and become globally readable. Usually, "write done" also means that the key is globally readable with no extra performance cost (i.e. it was not just dumped into a write-ahead log in memory and then returned).
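The write-ahead-log contract described here can be sketched as follows (a minimal illustration, not any particular database's implementation); the `fsync` call is the step that no disk completes in hundreds of nanoseconds:

```python
import os, json, time, tempfile

# Minimal write-ahead-log append: "write done" is only reported after
# fsync, so the record survives a crash right after the call returns.
def wal_append(fd, key, value):
    record = json.dumps({"k": key, "v": value}).encode() + b"\n"
    os.write(fd, record)   # lands in the OS page cache
    os.fsync(fd)           # forces it to stable storage; this costs
                           # tens of microseconds+ on real disks

path = os.path.join(tempfile.mkdtemp(), "wal.log")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
t0 = time.perf_counter()
wal_append(fd, "user:42", "ada")
elapsed = time.perf_counter() - t0
os.close(fd)
# on any real disk, `elapsed` is far above 600ns
```

Skipping the `os.fsync` line is exactly the difference being debated: the write then only exists in volatile memory until some later flush cycle.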
In a world where you spoke about the product more credibly or where the code was open source, I might accept that this was the case. As it stands, it looks like:
1. This was your "marketing gimmick" number that you are trying to sell (every database that isn't postgres has one).
2. You got it primarily by compromising on the meaning of "write done," and not on the basis of good engineering.
Thank you for your thoughtful critique.
To clarify what our numbers actually mean and address your main question directly:
1- The 600ns figure represents precisely what you described: an in-memory "write done" where memory structures are updated and the data becomes globally readable to all processes. This is indeed comparable to what Redis without persistence or memcached provides. Even on this comparable measurement basis (which isn't our marketing gimmick, but the same standard used by in-memory stores), we're still 2-6x faster than Redis depending on access patterns.
For full persistence guarantees, our mean latency increases to 2582ns per record (600ns in-memory operation + 1982ns disk commit) for our benchmark scenario with 1M records and 100-byte values. This represents the complete durability cycle. It should be compared with, for example, Redis with AOF enabled.
2- I agree that the meaning of "write done" requires clear context. We've been focusing on the in-memory performance advantages in our communications without adequately distinguishing between in-memory and persistence guarantees.
We weren't trying to hide the disk persistence number; we simply used "write done" because our comparison was against Redis without persistence. But mentioning persistence alongside it caused understandable confusion. That was on us.
Based on your feedback, we'll update our documentation to provide more precise metrics that clearly separate these operational phases and their respective guarantees.
UPDATE:
Clarification on the mean disk write measurement:
The mean value is calculated from the total time to flush the whole write buffer (processed in parallel depending on the number of available CPU cores) divided by the number of records. The total time for processing and writing the 1M records described above was 1982ms, which makes the mean write time per record 1982ns.
> For full persistence guarantees, our mean latency increases to 2582ns per record (600ns in-memory operation + 1982ns disk commit)
By the way, this set of numbers also makes you look stupid, and you should consider redoing those measurements. No disk out there has less than 10 microseconds of write latency, and the ones in the cloud are closer to 50 us. Citing 2 micros here makes your 600 ns number also look 10x too optimistic.
I would suggest taking this whole thread as less of an opportunity to do marketing "damage control" and more of an opportunity to get honest feedback about your engineering and measurement practices. From the outside, they don't look good.
I also see the update in response to this comment, and it puts everything into perspective. You haven't changed the meaning of "write done," you have just been comparing your reciprocal throughput against Redis's latency, and I think you have been confusing those two.
"600 ns" then really means "1.6M QPS of throughput," which is a good number but is well within the capabilities of many similar offerings (including several databases that are truly persistent). It also says nothing about your latency. If you want to say you are 2-6x faster than Redis, you are going to have to compare that number to Redis's throughput.
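Using the thread's own numbers, the difference between reciprocal throughput and latency works out as follows (a back-of-the-envelope sketch):

```python
# Batch numbers from the thread: 1M records flushed in 1982 ms total.
records = 1_000_000
total_flush_s = 1.982

# Dividing total batch time by record count gives reciprocal
# throughput, not the latency of any individual write.
mean_per_record_ns = total_flush_s / records * 1e9   # 1982 ns
flush_qps = records / total_flush_s                  # ~504k records/s

# Reading "600 ns p50 write" the same way gives the throughput figure:
in_memory_qps = 1 / 600e-9                           # ~1.67M ops/s

# Neither number says what a single caller waits for a durable write,
# which is bounded below by the device's flush latency.
```

The point is that a pipeline can retire a record every 600ns on average while each individual record still spends tens of microseconds or more in flight before it is durable.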
Wait, "depending on the plan"?
You're already monetizing your non-persistent non-database?
I don’t get it. How could you be fsyncing the WAL in 600ns? What are the transactional guarantees that you are offering?
That's a great question. The 600ns figure represents our optimized write path, not a full fsync operation. We achieve it, among other things, through:
1- we are not using any traditional filesystem and we're bypassing several VFS layers
2- free space management is a combination of two RB trees, providing O(log n) for slice and O(log n + k) for merge, k being the number of adjacent free spaces
3- the majority of the write path employs a lock-free design, and where needed we use per-CPU write buffers
The transactional guarantees we provide are via:
1- atomic individual operations with retries
2- various conflict resolution strategies (timestamp, etc.)
3- durability through controlled persistence cycles with configurable commit intervals
Depending on the plan, we provide a persistence guarantee of between 30 seconds and 5 minutes.
What storage backend are you using?
A write operation on an SSD takes tens of microseconds, even without any VFS layers.
Sorry for not being clear again. By saying this number does not represent a full fsync operation, I meant it doesn't include the SSD write time. It is the time to update HPKV's internal memory structures plus add to the write buffers.
This is fair because we provide transactional guarantees and immediate consistency, regardless of the state of the append-only write buffer entry. At that speed, for a given key, the value might change and a new write buffer entry might be added for that key before the buffer has had a chance to flush (as you mentioned, the actual disk write is slower), but conflict resolution still ensures that the last valid entry is written and the rest are skipped. Until that flush completes, HPKV is acting like an in-memory KV store.
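The coalescing behaviour described (a newer buffered write for the same key supersedes older unflushed entries) can be sketched like this; the class and names are illustrative, not HPKV's actual implementation:

```python
class CoalescingWriteBuffer:
    """Buffer writes in memory; on flush, only the last value per key
    is persisted -- earlier superseded entries are skipped."""
    def __init__(self, persist):
        self._pending = {}       # key -> latest value (insertion-ordered)
        self._persist = persist  # callable that does the actual disk write

    def put(self, key, value):
        # Overwriting the dict entry is the conflict resolution:
        # last write wins, older buffered entries are never flushed.
        self._pending[key] = value

    def flush(self):
        for key, value in self._pending.items():
            self._persist(key, value)
        self._pending.clear()

written = []
buf = CoalescingWriteBuffer(lambda k, v: written.append((k, v)))
buf.put("counter", 1)
buf.put("counter", 2)   # supersedes the first entry before any flush
buf.put("name", "ada")
buf.flush()
# written == [("counter", 2), ("name", "ada")]
```

This is also why the flush can beat one-fsync-per-write schemes on throughput: hot keys collapse to a single disk write per flush cycle, at the cost of the durability window discussed above.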
You’re getting a lot of crap (rightly) for your lack of clarity and fuzzy language use on this point…
But that also points out the demand for the seemingly-unachievable promises you’re making. I wonder if it’s worth stirring up some out-of-production DIMM-connected Optane and using that as a basis for a truly fast-persisted append-only log. If that gives you the ability to achieve something that’s really in demand, you can go from there to a production basis, even if it’s just a stack of MRAM on a PCI-e card or something until the tech (re-) arises.
You can just use NVDIMMs, which are generally 8, 16, or 32GB DIMM modules that have enough flash and backup power to copy all data to the flash storage if power is lost on the host.
https://www.micron.com/content/dam/micron/global/public/prod...
Something must be in the water.. this is the third similar tool in three days on HN
https://news.ycombinator.com/item?id=43379262
https://news.ycombinator.com/item?id=43371097
Yeah, Redis fucked around and found out.
https://redis.io/blog/redis-adopts-dual-source-available-lic...
That was a year ago.
I guess we know how long it takes to make a redis clone.
Correct.
lol, I guess so haha
> 2-6x faster than Redis (benchmark link below) yet disk persisted!
That's a false contradistinction: Redis is also disk persisted.
The benchmark you did mentions Redis benchmarking guide and this guide has following paragraph:
> Redis is, mostly, a single-threaded server from the POV of commands execution (actually modern versions of Redis use threads for different things). It is not designed to benefit from multiple CPU cores. People are supposed to launch several Redis instances to scale out on several cores if needed. It is not really fair to compare one single Redis instance to a multi-threaded data store.
Did you just benchmark against only a single Redis instance and claim a performance win? Even if so, how do benchmarks compare against the source-available competitor DragonflyDB?
Finally, the documentation doesn't mention how persistence exactly works or what durability guarantees we should expect.
Thanks for taking the time to write feedback :)
> That's a false contradistinction: Redis is also disk persisted.
The performance gain mentioned was vs. Redis in memory. So we weren't claiming that Redis can't be persisted (of course it can); we were saying that Redis without persistence (which is faster than Redis with persistence) was still this much slower than HPKV with persistence. But you're correct that we probably should have been clearer in explaining this :)
>Did you just benchmarked against only single Redis instance and claimed performance win?
A single node of Redis vs. a single node of HPKV, so it's an apples-to-apples comparison.
>Even if so, how do benchmarks compare against source-available competitor DragonflyDB?
A benchmark against DragonflyDB is coming soon :)
Sorry about the lack of that information in the documentation; we'll update it. For now, the durability guarantee on Pro is 30 seconds, and on Business with HA it's 5 minutes.
They asked about instances and you responded with nodes.
From the redis comment it sounds like the way to scale a redis node is to increase the size and run multiple instances in parallel.
Saying it's "apples to apples" would be like setting the thread limit of a competitor to 1, then saying it's a fair benchmark.
Benchmark against KeyVal too, please.
sure! :)
> That's a false contradistinction: Redis is also disk persisted.
This feels wildly disingenuous.
Interesting. I did some work on a related but different product idea (https://www.haystackdb.dev/) a few years back. Gave up though as it seemed hard to get traction / find customers. What’s your thinking on that? How are you going to reach your initial customers?
Would love to have a chat about possible collaboration or if I could help out in some way. Nice to see foundational tech coming out of the EU!
Thank you :) It would be interesting to have a chat for sure. Would you mind dropping a note to the email I mentioned in the OP, and I'll reach out to you?
How will it be faster than my Redis or KeyVal which is very close if your servers are far away? Network time matters here, right?
Of course. Speeds down to 15us can be achieved over the network via our custom protocol within the same region. For sub-microsecond latency, you need HPKV running on the same machine as yours :)
If it's based on some research papers, could you link them please?
One thing we'd like to know your opinion on is our key monitoring via WebSocket (pub-sub) feature. You can read more about it in our documentation under WebSocket.
Is it something you think is useful and might have a use case for, or do you see no value in it? In other words, is it something that might make you consider using HPKV?
Why no open source :<
Why pay what you're asking instead of using dragonfly or something like that and just putting a beefier node?
Well, that's a technical choice depending on the context, but I can list some of the advantages of HPKV:
-Persistent by default without any performance penalties
-The pub/sub feature, which is unique to HPKV and allows a bi-directional WebSocket connection from clients to the database
-Lower cost as we need less expensive infrastructure to provide the same service
-Simple API to use
Is this 2-6x faster because of multi threading/core? Or is this actually 2-6x faster on a single core machine?
The test was done on a single node and a single thread. With multi-threaded and batch operations, HPKV was still faster on the same machine.
Does it have ACID guarantees?
We provide some elements of ACID guarantees, but not full ACID compliance as traditionally defined in database systems:
Atomicity: Yes, for individual operations. Each key-value operation is atomic (it either completes fully or not at all).
Consistency: Partial. We ensure data validity through our conflict resolution strategies, but we don't support multi-key constraints or referential integrity.
Isolation: Limited. Operations on individual keys are isolated, but we don't provide transaction isolation levels across multiple keys.
Durability: Yes. Our persistence model allows for tunable durability guarantees with corresponding performance trade-offs.
So while we provide strong guarantees for individual operations, HPKV is not a full ACID-compliant database system. We've optimized for high-performance key-value operations with practical durability assurances rather than complete ACID semantics.
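"Atomic individual operations with retries" typically means something like a compare-and-swap loop. Here is a generic sketch of that pattern (the API is invented for illustration and is not HPKV's actual interface):

```python
import threading

class CASStore:
    """Single-key atomicity via compare-and-swap; a lock stands in
    for whatever atomic primitive the real store uses."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def compare_and_swap(self, key, expected, new):
        with self._lock:
            if self._data.get(key) != expected:
                return False        # someone else changed it; caller retries
            self._data[key] = new
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)

def atomic_incr(store, key):
    """Retry until our CAS wins: the basis of atomic inc/dec."""
    while True:
        cur = store.get(key)
        if store.compare_and_swap(key, cur, (cur or 0) + 1):
            return (cur or 0) + 1

store = CASStore()
workers = [
    threading.Thread(target=lambda: [atomic_incr(store, "hits") for _ in range(1000)])
    for _ in range(4)
]
for t in workers: t.start()
for t in workers: t.join()
# store.get("hits") == 4000, with no lost updates despite 4 racing threads
```

This gives per-key atomicity without any cross-key isolation, which matches the "atomic individual operations, no multi-key transactions" guarantee described above.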
> Consistency: Partial. We ensure data validity through our conflict resolution strategies, but we don't support multi-key constraints or referential integrity.
That's not what consistency means in ACID.
> Durability: Yes. Our persistence model allows for tunable durability guarantees with corresponding performance trade-offs.
> ~600ns p50 for writes with disk persistence
I'm pretty sure there's no durability there. That statement is pretty disingenuous in itself, but it'd be nice to see a number for durability (which, granted, is not something you advertise the product for).
My main concern is that all these speed benefits are going to be eclipsed by the 0.5ms of network latency.
OK, thanks. Those tradeoffs aren't suitable for my purposes.
Not open source, not interested.
It looks neat though, but I won't burn myself on anything closed source if there's open source and/or self-hosted alternatives.
The comparison to Redis felt misleading when I realized it was a commercial product.
I'm not sure I like this recent trend of people registering new accounts on HN to use "Show HN" to pitch commercial products.
Show HN is fun to see projects people have been working on and want to share for the good of the community. It's less fun when it's being used as a simple advertising mechanism to drive purchases.
It's a suitable comparison I think. Redis is also not open-source.
https://github.com/redis/redis/blob/unstable/LICENSE.txt
As of 12 months ago, sure, but before that Redis was open source for 15 years.
Show HN is not limited to open source projects.
I never said it was. Seeing green accounts (just registered) doing nothing other than promote products doesn't feel consistent with the spirit of Show HN.
There's nothing wrong with green accounts doing Show HNs, really.
Agree, especially for something as common as Redis. Being locked in is not worth the better perf on something that's already super fast.
But then I wonder what the business model is here. Even without being open source, I'm constantly asking myself who pays for DBs that aren't in a major cloud.
Doesn't always help. Price hikes on CockroachDB for instance have been crazy.
Thanks for taking the time to comment :) We'd still be happy if you decided to use it and give us your thoughts. As I mentioned in one of the comments below, we're hoping to go open source in future :)
> we're hoping to go open source in future :)
That's just lip service, either you intend to or you don't, it's not up to hope
We hope you will see the dev tooling space is based around open source and you will alienate many potential users by not being open source
Let me know when it's open source and I'd be happy to give it a try! It doesn't even have to be free - I'm fine with Unreal's model where you get access to the source even though commercial use requires a paid license.
I want something running locally on my machine that doesn't rely on calling home.
your concern is understandable. we'll be in touch :)
Give me one reason to use a closed-source Redis alternative rather than one of many open ones, starting with KeyDB. If I wanted a closed clone, I'd probably go with DragonflyDB (whose license is "feel free to run it in production unless you offer it as a managed redis service").
Does not appear to be open-source.
Correct. However, we're actually planning to make the system open source in the future; we can't set an exact date as it depends on various factors, but hopefully not too far out. :)
Will be looking forward to that!
However, it feels a bit weird: at this level of performance going SaaS only kinda defeats the purpose, no?
Our approach is actually hybrid. On the other side of the performance coin, we have resource efficiency. That resource efficiency lets us provide a performant, low-latency managed KV store at lower cost, so the economics make sense. The idea is that not everyone requires sub-microsecond latency, and for that group the value proposition is a low-latency KV store that is feature-rich, with a novel bi-directional WS API. For people who do need sub-microsecond latency, we're planning a custom setup that lets them make a local vectored interface call to get sub-microsecond speeds. In between, we have the Business plan, which provides the custom binary protocol used in the benchmark :)
In that case you need to provide an SLA for your speed claims. Otherwise the claims are basically moot.
That's a fair point and you're correct. We will have the SLAs for latency documented and provided soon. In the meantime, please try it out and give us your feedback :)
The site is very snappy, which matches well your pitch.
However your principal selling point - the nanosecond-level speed - falls flat because it's a property important in self-hosted scenarios. Once you put your super speedy stuff behind a web-based API, that selling point becomes completely meaningless. The fact that once our data hits your servers it is handled really quickly doesn't mean much. I am sure you are perfectly aware of that.
That is, your pitch is disconnected from your actual offering. If you are selling speed, it needs to be a product, not a service. It doesn't need to be open source though; just look at something like kdb+.
thanks for the feedback :)
Our main target for the "performance" value proposition is companies and businesses that will set up HPKV either locally (Enterprise plan) for nanosecond performance, or in the cloud provider of their choosing working via the RIOC API (Business plan), getting into the ~15 microsecond range over the network. However, you're totally right that this doesn't really matter much if you're using REST or WebSocket. For the Pro tier, our value proposition is still the fastest managed KV store (you still get <80ms for writes with a ~30ms ping to our servers) plus features such as bi-directional WS, atomic operations and range scans on top of the basic operations.
But given your comment, I think we should perhaps rethink how we're presenting the product. Thanks again for the feedback :)
What's the tech stack? If you can share.
Amazing!
thank you :)
That person only has one comment but they also made a database 8 months ago. Crazy coincidence, you could get together and compare notes.