Event-driven startup

Alexey Zimarev
Mar 14, 2021 · 14 min read

During the last year, I’ve done quite a few webinars, talks and workshops about Event Sourcing. A lot of the time, the audience asked the same question: when should I not use this pattern? In broader terms, the question came up during the Ask Me Anything session with Udi Dahan organised by Virtual DDD. The question was: what are the circumstances where things like Event-Driven Architecture (EDA), Service-Oriented Architecture (SOA), CQRS, et al. aren’t applicable?

During the session, Udi argued that when building one of the first iterations of software for a startup, you’d need to focus on validating the idea before thinking about things like scalability and, overall, doing it right. Therefore, the more complex software design patterns are premature in that kind of environment.

It resonates with one of my own answers to the audience when asked about Event Sourcing in particular. I usually tell people that going with Event Sourcing might introduce unnecessary complexity when you don’t really know what you’re doing. There’s one caveat here, though. I might not have expressed that thought well enough, so let me elaborate on it a bit more.

Quality of early software versions

Early versions of any software are built with many workarounds, hacks and shortcuts. Deliberate acceptance of such compromises on quality is what we call technical debt, meaning that it will have to be repaid with interest. In a startup environment, such decisions are often well-justified. When we don’t really know if the software will solve the problem at hand, or whether the problem is even worth solving, we might end up writing off the debt together with the software that accumulated it. Like a startup company burning investors’ money and then going out of business, early versions of software can go bankrupt, and the debt is then written off the books.

Keeping this in mind, we can clearly see that spending a lot of time on technical excellence in a piece of software that might never be used doesn’t make much sense.

I won’t talk about EDA or Event Sourcing just yet, as I don’t believe that applying those patterns makes your software better by definition. It might, but it can also go the opposite way. However, I’d like to point out that Domain-Driven Design (DDD), which is not really a software pattern or architecture, might save you months of work, especially during the early phases. So, how could DDD help?

Domain-driven design (DDD) is an approach to developing software for complex needs by deeply connecting the implementation to an evolving model of the core business concepts. — Eric Evans

Let’s get back to the question and some of the answers I mentioned earlier. Presuming the startup idea hasn’t been properly validated, we might not even know if the problem we want to solve is relevant. That is, in fact, the very area where DDD shines. Collaboration with your future customers (users), called domain experts in DDD, brings tons of valuable insights about your customers’ struggles and how you can solve their problems. Conducting a few Event Storming sessions with real customers will definitely change your view of the domain and your proposed solutions. You will find things you didn’t know about, things you forgot to consider, things you thought were important but aren’t, and the other way around.

That being said, I am not telling you to use Aggregates, Repositories or Value Objects. That is not the point. But you’d be able to see, understand, and model your system’s behaviour in a way the code alone never would.

In fact, just doing that, in iterations, could prevent your software from going bankrupt in the first place! And if that’s the case, should we plan for bankruptcy from the start, or should we prepare our system to run a marathon instead of a sprint (and die)?

Another issue with poorly designed software, especially in a startup environment, is that Phase Two never comes. What was intended as tech debt essentially becomes a burden you won’t have time to address. Why? Because startups can rarely afford to solidify their systems: they have limited engineering capacity, always struggle with funding, and have a very ambitious product backlog. It’s okay to scrap five versions of badly written software because it doesn’t fit the purpose. But the sixth version, the successful one, will be badly written too, because we planned to scrap it as well and only later found it useful. Don’t expect a refactoring break to make it better. It works, so you’ll need to build more and more features on this weak foundation. Eventually, it will grow into a Big Ball of Mud and disintegrate under its own weight.

Image credit: Bonkers World

Now, the question is — can we find some middle ground?

Event Sourcing in a startup

I sometimes build small products on my own to get away from the daily routine and to keep my technical and product-oriented skills in shape. It looks like a startup of one, and I am sure some of my colleagues do the same.

Once, I built a working system to support my holiday property rental business. It was a monolith, and it used a document database for persistence. Lots of workarounds, hardcoded bits and the usual hacks were right there. In such a scenario, I am also the domain expert, as I know exactly what issues I want to solve, simply because they are my own issues. In that regard, I am not ready to call it a clean experiment, as you’d rarely get that level of understanding of the problem space from your prospective customers. Still, it’s good enough to call a lab experiment.

Models will be wrong

After a short while, I found out that my domain model was wrong. I hadn’t spent enough time modelling a variety of scenarios and had only focused on the most obvious ones. When I figured out a better model, I found myself stuck with the system I’d built. Why was that? Simply because all I had was a bunch of documents in the database representing the current system state. The state itself was correct, but because the model was wrong, it didn’t contain enough data. I could also say that the model was not entirely wrong, but it was missing an important context, which I didn’t even know existed.

What I realised at that time is something I clearly remember today. The behaviour of my system was right. All the commands I had were valid and useful. However, as I used state-based persistence, I didn’t capture the behaviour explicitly; I just updated the system state instead, as we do “by default” almost everywhere. For the new model, I needed a different representation of that behaviour, expressed as another piece of state. So, here is what I learned:

Reconstructing state from the behaviour is very easy. Reverse engineering the behaviour from a piece of state is extremely hard, and sometimes impossible.

Here’s an example of such a state: a Booking document in MongoDB:

{
  "_id": "ac2fd0edd2d74f249afea3f9014934ad",
  "amount": "5600",
  "bookingChannel": "booking.com",
  "checkInDate": [{"$numberLong": "637498908000000000"}, 0],
  "checkOutDate": [{"$numberLong": "637500636000000000"}, 0],
  "externalBookingNumber": "2955008750",
  "guest": {
    "name": "Ole Nordmann",
    "email": "ole.fake0@brooking.me",
    "phone": "+78123123123"
  },
  "paidInFull": false,
  "prepaid": false,
  "property": {
    "_id": "0392c950d8ea4840850d098af0de12df",
    "name": "Great Apartment"
  }
}
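For contrast, here is a minimal sketch of what the same booking could look like when the behaviour is captured as events instead of a single state document. The event names and fields are illustrative assumptions, not my actual model, but they show why reconstructing state from behaviour is easy: the document above is just a left fold over the event history.

// Illustrative booking events; names and fields are assumptions, not the real model.
type BookingEvent =
  | { type: "BookingImported"; propertyId: string; bookingChannel: string;
      externalBookingNumber: string; checkInDate: string; checkOutDate: string; amount: string }
  | { type: "GuestDetailsRecorded"; name: string; email: string; phone: string }
  | { type: "PrepaymentRecorded" }
  | { type: "BookingPaidInFull" };

// The state document shown above is a simple left fold over the history.
function reconstructState(history: BookingEvent[]): Record<string, unknown> {
  return history.reduce<Record<string, unknown>>((state, e) => {
    switch (e.type) {
      case "BookingImported": {
        const { type, ...data } = e;
        return { ...state, ...data, prepaid: false, paidInFull: false };
      }
      case "GuestDetailsRecorded":
        return { ...state, guest: { name: e.name, email: e.email, phone: e.phone } };
      case "PrepaymentRecorded":
        return { ...state, prepaid: true };
      case "BookingPaidInFull":
        return { ...state, paidInFull: true };
      default:
        return state;
    }
  }, {});
}

Going the other way, from the final document back to the individual facts that produced it, is the hard part I was stuck with.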

I eventually found out that when I needed to check availability for a new booking, going through all the future bookings for that room was very hard. And if you deal with room categories instead of individual rooms, it becomes even harder. I needed something called a Day, a concept you’ll find in many domains that deal with scheduling.

If my system had been event-sourced from day one, I could have dropped and rebuilt the system state or introduced new state representations easily by writing new read-model projections. I could have created them in isolation from the currently running production system and still used production data, as they wouldn’t even touch anything that already works.

Projections can present the original data in different ways.

Compare that with the option of changing the state database schema. For such a change, I’d need a migration, which must run exactly once. I’d have to rigorously test that migration before it goes to production; otherwise, the whole system stops working. A migration cannot run continuously in production, side by side with the live system, so that you can experiment with it. No, you run it, and it’s done. If it goes horribly wrong, you restore the whole thing from a backup (if you have one) and try again.

How would I produce all those Day things from a bunch of bookings? I’d need to split each booking, per room, into a series of days, where each day can be free or occupied. I also wanted to manage ongoing tasks, so a day would also carry things like “clean the apartment after departure”, which belongs to a different bounded context altogether! First of all, that would not even be a migration; it would be a component that I’d have to execute regularly to update those days from new, cancelled and updated bookings. The Booking thing is not going away; the Day thing is a derivative of the Booking and, potentially, of other things.
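As a sketch of what that component could look like, here is a hypothetical projection that folds a booking event into per-day read-model documents. The event shape, the Day document and the upsert callback are assumptions for illustration; the real model has more to it.

// Hypothetical Day read-model document; the real one would carry more information.
type DayStatus = "Free" | "Occupied";

interface DayDocument {
  id: string;        // e.g. "property-1:room-2:2021-03-14"
  propertyId: string;
  roomId: string;
  date: string;      // ISO date
  status: DayStatus;
  tasks: string[];   // e.g. "CleanAfterDeparture"; in reality this lives in another context
}

// Hypothetical event; the name and fields are illustrative.
interface RoomBooked {
  type: "RoomBooked";
  propertyId: string;
  roomId: string;
  checkInDate: string;   // ISO date
  checkOutDate: string;  // ISO date
}

// Projection handler: splits one booking into occupied days.
// It runs continuously, for every new booking event, unlike a one-off migration.
function projectRoomBooked(e: RoomBooked, upsert: (doc: DayDocument) => void): void {
  const dayMs = 24 * 60 * 60 * 1000;
  for (let t = Date.parse(e.checkInDate); t < Date.parse(e.checkOutDate); t += dayMs) {
    const date = new Date(t).toISOString().slice(0, 10);
    upsert({
      id: `${e.propertyId}:${e.roomId}:${date}`,
      propertyId: e.propertyId,
      roomId: e.roomId,
      date,
      status: "Occupied",
      tasks: [],
    });
  }
}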

I remember spending many hours trying to fix my state-based system. I don’t even remember if I gave up or eventually did it. I do remember that every minute of those hours, I regretted not having my system event-sourced from the start.

Experimentation

As a follow-up to the previous section, I can also share my experience with another iteration of that system of mine. This time, it’s fully event-sourced. I made quite a few mistakes in the model (again), as I had to build a quick working prototype serving real users seeking accommodation in my holiday apartment, and it worked. Now, did I gain anything from Event Sourcing? I definitely did, and it’s something that any startup company would highly value.

As I mentioned before, the behaviour in my system is rather obvious for many scenarios. The UI and UX part is not, but only when it comes to showing information on the screen. Again, as commands represent the intent, they are part of the system’s behavioural model, and that part is mostly fine. I discovered the need to build a few screens that would require either heavy queries against the existing representation of the state, or elements of state that were entirely missing. And you know what? I had no issue at all building those query models from the events I already had. Besides, I could build two different read-models and run them side by side, continuously fed by new events. I can build two versions of one particular UI component, deploy them both to production, and do A/B testing on real users without damage to any other piece of the system, including the source-of-truth database, such as EventStoreDB.
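As a minimal sketch of that idea, assuming illustrative event and read-model shapes: both projections below consume the same events, each writing to its own collection, so two UI variants can be A/B tested without touching the write side.

// Stand-ins for two read-model collections; in reality these would be database collections.
const guestListV1 = new Map<string, unknown>();
const guestListV2 = new Map<string, unknown>();

// Hypothetical event envelope.
interface RecordedEvent {
  type: string;
  data: Record<string, unknown>;
}

function projectGuestListV1(e: RecordedEvent): void {
  if (e.type === "GuestCheckedIn") {
    guestListV1.set(String(e.data.guestId), e.data);
  }
}

function projectGuestListV2(e: RecordedEvent): void {
  // The same information, shaped differently, e.g. keyed per property.
  if (e.type === "GuestCheckedIn") {
    guestListV2.set(`${e.data.propertyId}:${e.data.guestId}`, e.data);
  }
}

// Every event appended to the source-of-truth store is handed to both projections.
function dispatch(e: RecordedEvent): void {
  projectGuestListV1(e);
  projectGuestListV2(e);
}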

In a way, I was able to seek and explore different solutions for the system UI without changing any data structures at all. It brought me very close to the concept of finding the best solution by going wide rather than iterating on what I already have.

Iterations will find you a good solution, but not the best one.

Another aspect of supporting experimentation is using events to trigger other behaviour, usually called integration. For example, I have a subsystem that receives parsed incoming emails from SendGrid and incoming texts via Twilio or Nexmo (now Vonage). Those things are events, and I treated them as such from the start. All I knew was that those messages could be coming from guests or from a booking channel, like Airbnb or booking.com. But I didn’t know the format of those events, whether I needed to parse the text, and, essentially, what to do next. So, at the start, I just saved those events to my event store, as-is. After accumulating some data, I understood how to process those events and trigger other behaviour, like communicating with guests, recording new bookings, or processing cancellations. That is not Event Sourcing but what we call EDA (Event-Driven Architecture), and when it comes to integration, it’s an invaluable tool.
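A minimal sketch of that approach, assuming a hypothetical append function in front of the event store and illustrative event names: the webhook handler records the fact as-is and makes no decisions about what it means yet.

// Hypothetical inbound event; the shape is an assumption for illustration.
interface InboundMessageReceived {
  type: "EmailReceived" | "TextReceived";
  source: "SendGrid" | "Twilio" | "Nexmo";
  receivedAt: string;                 // ISO timestamp
  payload: Record<string, unknown>;   // raw parsed payload, stored untouched
}

// Stand-in for appending to the event store; the real system writes to EventStoreDB.
const inboundStream: InboundMessageReceived[] = [];
async function append(event: InboundMessageReceived): Promise<void> {
  inboundStream.push(event);
}

// Webhook handler for a parsed inbound email: no parsing decisions yet, just capture the fact.
export async function onInboundEmail(body: Record<string, unknown>): Promise<void> {
  await append({
    type: "EmailReceived",
    source: "SendGrid",
    receivedAt: new Date().toISOString(),
    payload: body,
  });
}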

Issues with Event Sourcing

Of course, silver bullets don’t exist. There’s always a fly in the ointment, and I’ve got one too.

Although I said the behaviour of my system is rather obvious, it has quirks. For example, I can receive a block on availability from the calendar feed of the booking provider. I modelled it as an IncompleteBookingAdded event, part of the Booking aggregate. However, that might not be entirely correct. It might be a block for a different reason, like a block created by the booking channel based on certain limitations I have set for that channel myself; for example, a minimum stay duration block. As the calendar feed doesn't include any essential booking details, I never really know what it is or how to deal with it. When I receive a text from the booking channel, I get much more information and can reliably parse it into a proper booking, but what shall I do with the matching calendar feed event? Do I need this IncompleteBooking thing to be part of my Booking aggregate? Probably not. But I already have those events, so what do I do with them? I'd rather move them out to a separate aggregate type and, therefore, a separate stream, but I can't, as events are immutable.

Another problem was purely technical. I started using NodaTime but forgot to configure the serialisation properly. So, right now, many of the events already in the production database have some dates serialised incorrectly. After I configured the serialisation properly, they no longer deserialise!

I expect many such things to happen, but I accepted the challenge. For me, the benefit of explicitly storing the behaviour outweighs the issues I am currently facing. I also know that the issues I have can be solved by applying patterns from Greg’s book Versioning in an Event Sourced System, so the problems I have are purely technical, and technical problems aren’t the hardest ones to solve.
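For the date problem specifically, one of those patterns is to transform old events into the current shape when reading them, instead of rewriting the immutable history. Here is a rough sketch, assuming (purely for illustration) that the badly serialised events ended up with the check-in date stored as raw .NET ticks rather than an ISO-8601 string:

// Hypothetical stored-event envelope.
interface StoredEvent {
  type: string;
  data: Record<string, unknown>;
}

// Upcaster-style fix applied on read: detect the old format and convert it.
function upcastCheckInDate(e: StoredEvent): StoredEvent {
  const raw = e.data["checkInDate"];
  // Assume the broken format is a long digit string of .NET ticks.
  if (typeof raw === "string" && /^\d{18,}$/.test(raw)) {
    // .NET ticks are 100-nanosecond intervals since 0001-01-01; convert to Unix milliseconds.
    const unixMs = Number(raw) / 10_000 - 62_135_596_800_000;
    return { ...e, data: { ...e.data, checkInDate: new Date(unixMs).toISOString() } };
  }
  return e; // already in the current format
}

The same idea extends to upcasting whole streams, or copying and transforming them into new ones.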

Microservices

Now I’d like to touch on the SOA bit. Are services overkill in a startup? I highly doubt it, provided you don’t go overboard, of course.

In the system I mentioned, there are four services right now:

  • Backoffice
  • Guest portal
  • Messaging
  • Calendar feed sync

The reason for me to split those subsystems into individual services is quite obvious. They have very different concerns, and I can clearly identify those as bounded contexts, although some of them might be multi-context services.

Services are awesome

Could I have built them all as a monolith? Yes, why not. Would it be easier to maintain? Definitely not. Here are my arguments.

First, I don’t want to deploy the whole thing when only a part of it changes. The Backoffice is for me, as the property manager. It has authentication and authorisation bits that don’t apply to any other component. I am also free to deploy it at any time, as it would only affect me. Deploying the Guest Portal, on the other hand, might happen right when a guest is trying to check in or, worse, is in the middle of a payment session. Deploying the Messaging service without a proper rollout strategy might lead to lost messages, so I must have at least one instance running at all times. The Calendar sync service has a similar limitation, but a less extreme one: in the worst case, I miss a sync call from the booking channel, but they will try again later. Therefore, I am free to design different rollout strategies for each of the services, keeping the balance between provisioning too much and too little.

Second, I feel better working on a contextualised piece of software. It doesn’t cause me much cognitive overload, as I know exactly what it does and what I need to do with it. Before I even start talking about different developers or teams working on different services, it is already a relief when I work alone. I have this tendency to pick things up when I see them, so in a monolithic system, I often find myself dragged away from the initial task as I start to notice other, unrelated things that I want to fix “by the way”. As a result, I might never finish the task I really wanted to do. It might not be a problem for people who are more disciplined and focused; it’s just my experience.

As a consequence, I also keep issues in different repositories. When I work on a single service, I can clearly see what needs to be done right there, without applying any issue filters or similar. Seeing incoming text-message processing issues on the same board as work on a better guest check-in experience is quite distracting.

Two of my current services also have a UI. Since the services are separated, I don’t need to care about using the same visual styling, SPA framework, or build system. I choose what’s best for the job in that particular subdomain and service without affecting anything else.

I can also build a new version of the Guest Portal and do A/B tests with two different versions, without any concerns about parallel changes in the Backoffice or any other isolated component.

Concerns about services

As with Event Sourcing, splitting the system into services comes with some trade-offs.

First, I need some transport medium to share events between the services. Luckily, as my system is event-sourced, I already persist events in a way that lets me consume them elsewhere in real time. I will have to introduce integration events later on, but I don’t have an immediate need for that, and I accept the coupling for the moment. Again, it’s a technical problem, and it’s easy to solve.
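To make that concrete, here is a rough sketch of how another service could follow events in real time. I’m assuming the official @eventstore/db-client package and a category stream name here; treat the connection string, stream name and event types as placeholders rather than the actual contracts of my system.

import { EventStoreDBClient, START } from "@eventstore/db-client";

const client = EventStoreDBClient.connectionString("esdb://localhost:2113?tls=false");

// A catch-up subscription: reads the existing events first, then keeps
// receiving new ones as they are appended by the owning service.
async function followBookings(): Promise<void> {
  const subscription = client.subscribeToStream("$ce-Booking", {
    fromRevision: START,
    resolveLinkTos: true, // the category stream contains links to the real events
  });

  for await (const resolved of subscription) {
    if (!resolved.event) continue;
    // React in this service, e.g. keep the calendar feed or a read model in sync.
    console.log(resolved.event.type, resolved.event.data);
  }
}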

Second, I have to build and maintain proper delivery pipelines for all the services. Once more, it’s a technical problem, but it has to be solved immediately. Here I am in luck (maybe not the right word) again, as I know how to do it. I have a managed Kubernetes cluster in Digital Ocean, my own private GitLab instance, and a good foundation for automated deployments using Pulumi. Since I have the knowledge and experience needed to do all that DevOps stuff, everything on the operational side runs smoothly. Some might prefer using serverless functions, which might do the job even better (though not always) with even less effort. So, for me, maintaining a reasonable fleet of microservices is not nearly as much of a burden as it used to be in the .NET world a few years ago.

Conclusion

As I previously mentioned, dealing with Event Sourcing, DDD, microservices, etc., in a startup environment might not be a thing for you. This statement is especially relevant if you don’t have much experience doing some or any of it. Then again, if you don’t do it, you never get the experience anyway…

As I got more proficient in applying these patterns and more comfortable with the DevOps part, I stopped worrying about the so-called accidental complexity that many developers associate with Event Sourcing and SOA. It’s like a muscle: you need to train it, and then you start using it. Alas, I keep seeing systems being built without any model, with monolithic databases, by teams that struggle to focus and keep arguing about their endless dependencies.

Like any other human activity, software development is full of compromises and trade-offs, so you need to choose your battles and pick the lesser evil. So, before you decide to build a monolith first, make sure to consider the other point of view.

As for me, I’d rather build an event-sourced, event-driven system separated into a few services. Even in a startup. Especially in a startup!

Originally published at https://zimarev.com.

Alexey is an Event Sourcing and Domain-Driven Design enthusiast and promoter. He works as a Developer Advocate at Event Store and Chief Architect at ABAX.