A company is an organic, indeterministic, complex system, running on humans.
An operating system is a quite formal, deterministic, complex system, running on logic gates.
They have one thing in common: both provide access to the underlying resources. And some of them are actually successful in doing so!
Maybe, there’s something to be learnt from approaching the organic system (company) like an OS?
Could you put your rules and routines under version control?
For every change that touches the source code, there should be a well-defined process for how that happens. Somebody should probably check whether the changes introduce any unintended side effects. Otherwise you're just one of those cowboy developers who pushes to master and makes it hard for everyone else to work on this thing!
As an engineer of a company, how would you put the rules, the routines, the behavioural patterns, the processes under version control, and what would it enable?
Would it be useful to use tags, so people can refer to specific versions?
Would it be useful to create pull requests, so everyone can review the change before it gets merged, and spot inconsistencies/risks?
Would it be useful to test the new code before it's merged? To experiment with the changes, and see whether they really work?
What scheduling strategies does your company use to decide what to work on day-to-day?
Most operating systems seem to balance allocating resources to high-priority, time-sensitive tasks with still making progress on longer, lower-priority tasks.
Does your company work on multiple products or have multiple target audiences to serve?
What are the priorities of these goals/products, and are they explicit?
How do competing goals (processes) fight for the resources at this time?
What kind of scheduling algorithm does it work with?
Is it First come, first served?
That’s pretty chaotic - but could be pretty satisfying as well. New ideas are always appealing.
Is it Priority scheduling?
Do you always work on the highest-impact, most urgent task, ignoring its size?
Do you get the reward of completing tasks often this way?
Is it Shortest remaining time first?
Surely the most efficient way to complete the largest number of projects… But would those projects matter at all?
Is it Fixed priority pre-emptive scheduling?
Do lower priority tasks regularly get interrupted by incoming, higher priority tasks?
Or a mix of, let’s say 5 different strategies?
Is it explicit _how_ and _why_ the scheduler mixes the algorithms?
Do you know why a certain thing is on your roadmap?
Which scheduling algorithm selected it?
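To make the comparison concrete, here is a minimal sketch of how three of these strategies would order the same backlog. The `Task` fields, the task names and the numbers are all made up for illustration, and the shortest-first function is a simplification: with a fixed backlog and no new arrivals there is nothing to pre-empt, so it just sorts by remaining size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    arrival: int   # order in which the request came in (hypothetical)
    priority: int  # lower number = more important (hypothetical)
    size: int      # estimated remaining effort in days (hypothetical)


def first_come_first_served(tasks):
    """Work on things in the order people asked for them."""
    return [t.name for t in sorted(tasks, key=lambda t: t.arrival)]


def priority_scheduling(tasks):
    """Work on the most important thing first, whatever its size."""
    return [t.name for t in sorted(tasks, key=lambda t: t.priority)]


def shortest_remaining_first(tasks):
    """Work on whatever can be finished soonest."""
    return [t.name for t in sorted(tasks, key=lambda t: t.size)]


backlog = [
    Task("rewrite billing", arrival=1, priority=2, size=30),
    Task("fix login outage", arrival=2, priority=1, size=2),
    Task("tweak button colour", arrival=3, priority=3, size=1),
]

for strategy in (first_come_first_served, priority_scheduling, shortest_remaining_first):
    print(f"{strategy.__name__}: {strategy(backlog)}")
```

Same backlog, three different answers to "what do we do on Monday?". Which one your company follows is probably visible in what actually got shipped last quarter.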
Does your company provide a “thread-safe” environment, without excessive use of locks and unnecessary waiting time?
Most of the work in a department/team is inherently concurrent - but usually this doesn’t manifest itself in a formal, easily followable concurrency model.
I do believe that bad concurrency models cause a lot of headaches for organisations. Mutexes and locks are error-prone, and threads that force you to use them should be considered harmful.
Does your company use locks to synchronise processes: do different processes wait for one team to complete their work before another phase starts? Or do they overlap?
Do you wait for your manager to approve things?
Can departments work on the same thing without stepping on each other’s toes?
Or is your company using an actor-based concurrency model?
What is that experience like for your coworkers?
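For readers who haven't met the actor model: each actor owns its state, and the only way to reach it is to drop a message in its inbox. A rough sketch of a team working that way, using only Python's standard `queue` and `threading` modules, might look like this. `TeamActor` and the request names are hypothetical, and a real actor framework does far more than this.

```python
import queue
import threading


class TeamActor:
    """A team that owns its state and is only reachable through its inbox."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.done = []  # private state: only this actor's own thread writes to it
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        # The sender drops the message off and carries on; nobody waits on a lock.
        self.inbox.put(message)

    def stop(self):
        # None is used as a "no more work" marker, then we wait for the thread to end.
        self.inbox.put(None)
        self._thread.join()

    def _run(self):
        while True:
            message = self.inbox.get()
            if message is None:
                break
            self.done.append(f"{self.name} handled {message}")


design = TeamActor("design")
for request in ["logo", "landing page", "pricing table"]:
    design.send(request)
design.stop()
print(design.done)
```

Nobody outside the design team touches `done` while work is in progress, so there is nothing to lock. The requesters just send their message and get on with their day.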
Are the caches (immediate communication channels) structured correctly, and are they the right size?
What are the immediate and semi-immediate communication channels your company uses?
What’s your company’s L1 cache - the one that’s quick to access, but limited in size? Slack? Calls?
Is the L1 cache’s size appropriate?
What’s your company’s L2 cache - the one that’s bigger but slower?
Is it email? F2F meetings? A project management tool’s commenting system?
Is the L2 cache’s size appropriate?
Does your company have an L3 cache?
What does that look like, and what is it used for?
With this angle, we could potentially spot issues where people are using instant messaging for things where fast response time is not essential, like discussing strategic moves. Then there are issues that actually need a faster reaction time but are sitting somewhere in the L3 cache, blocking entire important projects from moving forward.
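As a toy illustration of spotting that kind of mismatch, the sketch below picks the slowest channel that would still answer in time, and flags a conversation that happened somewhere else. The channel names and response times are invented; plug in whatever your company's L1, L2 and L3 really are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    typical_response_hours: float  # hypothetical figure, not a measurement


# Ordered from fastest to slowest, like L1, L2, L3.
CHANNELS = [
    Channel("chat", 0.25),
    Channel("email", 24),
    Channel("wiki", 168),
]


def suggested_channel(needed_within_hours):
    """Pick the slowest channel that still answers in time, keeping the fast ones free."""
    fitting = [c for c in CHANNELS if c.typical_response_hours <= needed_within_hours]
    # If nothing is fast enough, fall back to the fastest channel available.
    return fitting[-1] if fitting else CHANNELS[0]


def is_mismatch(used_channel, needed_within_hours):
    """True when the topic was discussed in a channel that is too fast or too slow for it."""
    return used_channel != suggested_channel(needed_within_hours).name


# A strategy discussion that can wait a few weeks, but happened in chat:
print(is_mismatch("chat", needed_within_hours=500))
# A blocker that needs an answer within the hour, but sits in the wiki:
print(is_mismatch("wiki", needed_within_hours=1))
```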
Error handling
What are the places where an “exception” is thrown? Where would you catch an error?
How do you monitor errors?
How would you debug an error in your organisation? Can you trace it back to its root cause?
Do you have something like a stack trace available? (Retros?)
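In code, tracing back to a root cause is what exception chaining gives you. Here is a small sketch of the idea in Python, where `raise ... from ...` records the original error on `__cause__`. The exception names and the scenario are invented for the sake of the analogy.

```python
class BlockedApproval(Exception):
    """Hypothetical low-level problem."""


class MissedDeadline(Exception):
    """Hypothetical high-level symptom that everyone notices."""


def get_approval():
    raise BlockedApproval("budget sign-off stuck in the queue")


def ship_feature():
    try:
        get_approval()
    except BlockedApproval as cause:
        # Keep the original error attached instead of swallowing it.
        raise MissedDeadline("feature slipped a sprint") from cause


def root_cause(error):
    """Follow the chain of causes back to where things first went wrong."""
    while error.__cause__ is not None:
        error = error.__cause__
    return error


try:
    ship_feature()
except MissedDeadline as error:
    origin = root_cause(error)
    print(f"symptom: {error}")
    print(f"root cause: {origin}")
```

A retro that only records "the feature slipped" is the symptom without the chain. The interesting question is whether your organisation keeps the `from cause` part.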