What is a distributed system?

Published in tech industry, job hunting, on Mar 7, 2021

The way I think of distributed systems has a lot to do with virtual cloud providers like AWS and Google Cloud. Historically apps have been built with a single system like Ruby On Rails, Django, Laravel and other large Node.js, Java and Python apps. This works but at massive scale presents new challenges to handle all that data and maintain fast application performance for millions of users around the world. Apps are hosted on web servers, machines that require electricity and cooling. Web servers can be hosted "in the cloud" on server farms maintained by private corporations. Cloud service providers make billions selling distributed system technologies to corporations and individuals. If you want to build a modern distributed system that runs in the cloud you're likely going to need to pay a lot of money to Amazon, Google or Microsoft.

Monolth vs Micoservice

In contrast to monolithic applications stand "microservices". Microservices are paid products from cloud providers that couldn't be bought from traditional server providers. Microservices are machines that boot up for fractions of a second, run some code and then turn off to a warm state. If a microservice hasn't been used for a while it's known as a "cold start" making response times slower. Microservice technology from Amazon, Lambda functions, are invoked every time someone say "Alexa". Google Cloud functions are invoked when someone says "Hey Google". Massive scale voice assistants and much of the tech industry require microservice computers that boot up and shut down at will to run code.

Applications on machines that you maintain instead of fractional machines fired up by cloud service providers are called "monoliths". A monolith could by a Ruby On Rails, Django or Laravel app. Ruby On Rails, Django and and Laravel are all open source web frameworks built by the community to make the web a better place. There are tons more commercial options for buying server hosting when building with a microservice

Educative has an article with definition of distributed computer systems. Educative has a lot of great resources like a course on Airflow and Python and grokking the system design interview.

From Educative: "A distributed system is a computer network that consists of independent components communicating through a decentralized protocol".

This Free Code Camp article is also legit. Keep doing what you do free code camp ✊

Interested in front end web development? Check out Vue, Inertia and TailwindCSS.

Examples of distributed systems

These are some examples of distributed system technologies.

  • Scuba: a fast, scalable, distributed, in-memory database built at Facebook for web application performance monitoring.
  • MapReduce from Google research for processing and generating large data sets.
  • Cassandra from Facebook research is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers.
  • Crypto networks for making networks secure and managing information. From the linked slides:

secure keys cryptography

  • Kafka for handling trillions of messages.

Distributed systems can be used to build powerful technology that handles insane amounts of information like Netflix, Uber and many major enterprises today. It's not easy to do and takes teams of engineers to run the data and the business at scale. Being able to work with distributed systems can be a game changer for businesses and is a sout after skill in the labor market.

Interested in passing a system design interview? Try out this course (paid product) by Educative called Grokking the System Design Interview.

Ethics

  • Environmental considerations: Running distibuted systems can use up tons of electricity. In the case of Bitcoin and proof of work this results in massive computer servers subsidized by coal fired power plants solving math problems for money. For more info see article: Bitcoin is ruining the planet. Be conscious about what programs you're running and the real world implications!

  • Concentrated corporate power: There are only so many cloud service providers and companies that offer cloud services. The investment to host server farms and cutting edge technology at scale is huge. Amazon, Google and Microsoft are the major players in this space. They've bought up their competitors. Building distributed systems necessitates paying these corporate entities to host your application. At the beginning of this year the AWS team showed they're willing to pull their platform out from a company at will, in the case of Parler. There's net neutrality regulation but that does not extend to a cloud service provider like AWS (yet).

How to scale database writes?

Scaling database reads can be tricky, but scaling database writes can be dastardly. Database writes happen when information is recorded to tables, an activity that happens often in today's web applications. These are three available methods for scaling database writes.

  • Sharding: sharding is hard. It involves segmenting out your database by a primary key, or using other methods to keep databases in sync across multiple machines. For a full breakdown of sharding and how it works I recommend reading through this Digital Ocean article. Digital Ocean provides web hosting materials and the foundations for how to build things on the internet. If web apps are today's railroads Digital Ocean sells steel and beams, among other things 🚂

  • Clustering: (managed version): fully managed sharding. Your joins stay intact. Google Spanner is a popular service for clustering structured databases. Amazon has Arora for managed clustered database.

  • PubSub by Google or Amazon Kinesis to create data streams. Great for building things like social media-style activity feeds.

  • Use a CDN and ElasticSearch (or equivalent): Fastly comes recommended as a CDN. There's also Cloudflare, a successful company in San Francisco for CDN hosting. Other innovative databse technologies include Cassandra, GraphQL, CockroachDB. Distributed database tools can be enormously helpful when mining cryptocurrency.

Resources

"We Just Rolled Up A Snowball And Tossed It Into Hell. Now Let's See What Chance It Has." - Ethan Hunt, Mission: Impossible II