About the role
This role is for one of Weekday's clients.
Minimum Experience: 10+ years
Location: Hyderabad
Job Type: Full-time
Requirements
• Architecture & Design
o Create comprehensive reference architectures using AWS services such as VPC, ALB/NLB, EC2, ECS/EKS, Lambda, API Gateway, S3, DynamoDB/Aurora, OpenSearch, CloudFront, and Route 53.
o Design systems for resilience and high availability, incorporating multi‑AZ configurations, active‑active/active‑passive setups, cross‑Region disaster recovery, defined RTO/RPO goals, automated failover mechanisms, throttling, retry strategies, dead-letter queues (DLQs), and circuit breakers.
o Promote security-by-design principles through least-privilege IAM policies, KMS, Secrets Manager, VPC endpoints/PrivateLink, WAF/Shield, GuardDuty, Security Hub, and threat modeling.
• Practical experience developing Generative AI applications on AWS using LangGraph, including:
o Designing and managing multi-step agent workflows with LangGraph for tasks like retrieval-augmented generation (RAG), tool invocation, workflow branching, and maintaining stateful interactions.
o Integrating LangGraph with Amazon Bedrock features such as model invocation, guardrails, embeddings, Knowledge Bases, and Agents, as well as AWS services including Lambda, API Gateway, Step Functions, DynamoDB, and S3.
o Implementing secure, scalable, and cost-efficient Generative AI solutions involving evaluation, prompt management, latency improvements, caching techniques, and content safety controls.
o Developing production-grade GenAI microservices or platform components equipped with observability (logs, metrics, traces), CI/CD pipelines, and automated testing.
• Build & Platform Engineering
o Lead the use of Infrastructure as Code (AWS CDK, Terraform, CloudFormation), manage Git-based workflows, and automate pipelines using tools like CodePipeline, GitHub Actions, and Azure DevOps across various environments.
o Establish observability standards using CloudWatch, X-Ray, and OpenTelemetry; define SLOs and error budgets; correlate logs and traces; and develop automated runbooks.
• Performance, Reliability & Cost
o Conduct load and chaos testing, perform capacity planning, define autoscaling policies, and implement data partitioning and caching strategies.
o Optimize total cost of ownership (TCO) by leveraging Savings Plans and Reserved Instances, adopting Graviton processors, right-sizing resources, applying storage lifecycle policies, and utilizing cost allocation tags.
• Leadership & Stakeholder Collaboration
o Serve as the technical lead for cross-functional teams by breaking down initiatives into actionable architecture epics.
o Collaborate with Product, Security, and Operations teams to define roadmaps and acceptance criteria; effectively communicate design decisions and trade-offs to senior stakeholders.
o Mentor engineers and elevate engineering standards and architectural discipline through reviews and guild participation.
Basic Qualifications
• Minimum of 8 years’ experience designing and building production systems on AWS, with at least 3 years in an architect or technical lead role involving hands-on development.
• Demonstrated success delivering scalable, highly available, and resilient services employing multi-AZ and cross-Region patterns and disaster recovery strategies with defined RTO/RPO.
• Expertise in development using one or more programming languages such as TypeScript/Node.js, Python, or Java, and working with microservices, serverless, or container platforms like Lambda, ECS/Fargate, and EKS.
• In-depth knowledge of AWS networking (VPC, subnets, routing, NAT, TGW, PrivateLink), security (IAM, KMS, Secrets Manager), and data services (DynamoDB, Aurora, S3, event streaming with SNS, SQS, Kinesis).
• Hands-on experience delivering Generative AI solutions on Amazon Bedrock, including working with models, guardrails, RAG, Knowledge Bases, and integrating enterprise data sources.
• Strong proficiency in Infrastructure as Code (IaC), CI/CD automation, testing, and observability.
Preferred Qualifications
• Certifications:
o AWS Certified Solutions Architect – Professional (required)
o AWS Certified DevOps Engineer – Professional (preferred)
o AWS AI/ML or Generative AI specialty certifications (preferred, if available)
• Experience with SageMaker (JumpStart, model hosting, tuning) and vector search technologies such as OpenSearch and pgvector.
• Familiarity with Zero-Trust security models, compliance standards (SOC 2, PCI DSS, ISO 27001), data loss prevention (DLP), and data residency requirements.
• Experience with event-driven and streaming architectures, schema governance, idempotency, and eventual consistency patterns.
• Prior responsibility for migration or modernization projects, such as moving from monoliths to microservices, migrating on-premises workloads to AWS, or evolving lift-and-shift deployments.
Soft Skills (Critical)
• Executive-level communication skills, delivering clear, concise narratives and visuals tailored for both technical and non-technical audiences.
• Technical leadership and mentoring capabilities, including setting guidelines, conducting design and code reviews, and empowering teams to work independently.
• Strong stakeholder management, encompassing roadmap alignment, expectation management, risk and issue handling, and conflict resolution.
• Product-oriented mindset focused on outcomes, data-driven decisions, automation, and iterative delivery.
Must-have skills
AWS Architecture, Amazon Bedrock
Good-to-have skills
Python