Software Engineer, Distributed Systems

fal · May 7, 2026

🌍 Hybrid · San Francisco, CA, USA · Full-time

💰 $180,000 – $250,000/yr

distributed-systems python rust gpu-orchestration system-design ai-infrastructure performance-optimization

Job Description

About fal

fal is the generative media ecosystem powering the next generation of AI products. We build the infrastructure, tools, and model access that teams need to move from idea to production, and do it at scale without compromise. For developers and enterprises, fal is the foundation that makes generative media not just possible, but practical: a unified platform where high-performance inference, orchestration, and observability come together to unlock new categories of AI-native products.

As generative media reshapes industries across a market projected to grow by hundreds of billions of dollars over the next decade, fal is becoming the ecosystem that ambitious teams build on.

About This Role

We're seeking an experienced software engineer who thrives on building large-scale computing platforms. As a Software Engineer, Distributed Systems at fal, you'll design and implement the core infrastructure that powers the next generation of AI applications. You bring deep expertise in large-scale distributed systems that handle high complexity, enormous traffic volumes, and massive data flows, with a particular strength in achieving reliability and scale with minimal operational overhead.

This is an excellent opportunity for a senior engineer ready to take ownership of critical platform components that directly impact fal's ability to serve enterprise AI workloads globally.

Key Responsibilities

Build and maintain core Python/Rust platform components including request routing, AI workload orchestration, scheduling, GPU autoscaling, large-scale file storage, and queueing systems
Produce forward-looking architectural designs for platform evolution as fal scales to 100x its current traffic while maintaining low latency across global regions
Use AI extensively to automate the routine parts of building complex yet reliable systems, improving engineering velocity
Profile, optimize, and tune low-level CPU and memory performance to meet demanding SLA requirements
Collaborate with product and infrastructure teams to define technical direction and drive cross-functional initiatives

Required Experience & Skills

3+ years of hands-on experience building distributed compute and orchestration platforms in Python or Rust
Strong foundational knowledge of distributed systems fundamentals including consensus mechanisms, scheduling algorithms, fault tolerance patterns, and capacity planning
Deep understanding of computational complexity theory and memory allocation strategies in high-performance systems
Proven track record designing and implementing systems that scale reliably under real production load
Extensive experience building and using observability tools (monitoring, tracing, profiling) to drive performance and reliability decisions
Excellent communication skills with demonstrated ability to drive technical decisions and align stakeholders across teams
Self-starter mentality: executes quickly, takes ownership of complex problems, and constantly seeks improvement and learning opportunities

Nice-to-Have Qualifications

Experience with AI/ML inference or training infrastructure platforms
Background in high-performance systems programming including async runtimes, zero-copy optimizations, and memory-safe concurrency patterns
Track record building multi-tenant compute platforms with strong isolation and resource guarantees
Deep understanding of networking fundamentals and performance characteristics relevant to distributed systems
Familiarity with GPU workload characteristics, scheduling constraints, and optimization techniques
Experience with Kubernetes, container orchestration, or similar infrastructure platforms

Why Join fal

fal is at the forefront of the generative media revolution, building the infrastructure layer that powers AI-native products. You'll work on technically challenging problems at scale, collaborate with world-class engineers, and have significant impact on the future of AI infrastructure. The role offers competitive compensation, equity participation, and comprehensive benefits in a fast-growing, well-funded company.

💰 Compensation: $180,000–$250,000 USD annually, plus equity and benefits. This range covers Mid-, Senior-, and Staff-level positions.