Mrityunjay Kumar
MRIT-yen-jay koo-MARR
- Palo Alto, California, US
I am eager to pursue new opportunities in the field of distributed systems and I am confident that my skills and experience make me a valuable asset to any organization in this area.
Currently, I work on Microsoft's PowerPoint backend service engineering team. I have over 7 years of professional experience in software development, including over 2 years with VMware's vSAN storage team and 3 years as a Senior Software Engineer on the Data Science team at Talentica Software. Additionally, I spent 2 years on the multimedia team at MediaTek.
Current Experience
Software Engineer - 2
PowerPoint backend engineering team
Research Experience
Graduate Research Assistant
Distributed Multi-core Transactional Database
- Implementing asynchronous replication with multi-process Paxos to remove contention.
- Implemented a fast C++ logging library (header-only or compiled) that supports aligned memory allocators, with features such as rotating logs and auto-flush, in approximately 1,200 LOC.
- Implemented an optimistic replay protocol to guarantee serializability and wrote a verification pipeline for logged streams.
- Wrote various evaluation micro-benchmarks to inspect contention points and identified flows with the help of the mutrace and gprof tools.
Previous Experience
Member Of Technical Staff - III
Led the effort to design the FSCK user-space tool infrastructure for VMware Distributed Storage. Core member of the snapshot development team.
- Distributed FSCK [Infrastructure]: Conceptualized and implemented a distributed FSCK tool for analyzing and repairing storage metadata. The tool analyzes the state of on-disk metadata after crash recovery by forming analysis matrices and cross-consistency patterns between multiple metadata structures such as B+ trees, bitmaps, and segment usage tables. The service can be scheduled or run on demand to check the health of the storage system's key-value store. It is also used by other storage features, such as snapshots, un-map, and segment cleaning, to verify the consistency and integrity of the metadata.
- Distributed FSCK [Core Algorithms]: End-to-end implementation of a cloud-native microservice to validate the integrity of the file system key-value store (such as B+ trees). New validation algorithms can be attached in both online and offline modes, leveraging lazy evaluation at runtime.
- Snapshot Crash Consistency: Implemented serialized transactional consistency of snapshots in the presence of crashes by replaying committed reads from persisted WALs.
- Snapshot Telemetry & Monitoring: Implemented snapshot capacity aggregation from scratch to support statistics collection, with p99 latency in the range of 30 ms.
Teaching Assistant
Taught Introduction to Data Science (CSE-351) with Prof. Martin Radfar. Helped set quizzes and midterms. Wrote a test module to auto-generate scores for homework assignments and quizzes.
Senior Software Engineer
I joined the data science team as a software engineer and worked for almost three years on applications ranging from search engines, a file sync application, NLP pipelines, and chatbots to machine learning algorithms in network-based solutions. We developed a novel algorithm that quantifies the answerability of a question.
- Network Traffic Estimation: Implemented an ensemble regression model for network bandwidth prediction using SLA metrics and real feedback from the native Speedtest tool.
- ML Model Realtime Deployment: Implemented an ingestion pipeline for model deployment using Storm, Kafka, Python, and Java.
- Indoor Location Positioning for BLE Assets: Achieved 95% accuracy in classifying the region for static assets and 68% accuracy in regression-based region triangulation using RSSI and interference correction.
- Network Traffic Identification: Developed a multi-modal machine learning model for classifying audio and video streaming network traffic, using an ensemble of regression models and auto-encoders.
- Data Dump Tool for AWS Athena: Added an ad-hoc client to the Storm topology to support collection of real-time data to disk and the batch pipeline, using Java and Spark.
- Live BLE Asset View: Developed a live Bluetooth Low Energy asset view portal to support the data collection team, improving the correctness of site data calibration by 70%, using JavaScript, Kafka, and Python.
- Machine Learning Algorithm Object Storage Service: Designed a continuous delivery pipeline for machine learning models using gRPC, protobuf, Redis, RabbitMQ, and AWS S3, improving deployment frequency by 40%. Wrote multiple clients in Java, Python, Go, and C++ to facilitate object serialization, enabling weekly model update releases.
- Single-Cell Identity Classification: Auto-encoder-based neural network that identifies single cells with 83% accuracy for cell-based print technology.
- File Sync Application: Led development of the file sync tool; implemented a delta file sync service in C++ to reduce load on the sync server by 200%; wrote a stateless software update tool to push new releases without killing the running tool; developed a client application update framework to support release notifications for a multi-tenant architecture using Java, C++, and Python; designed a rate limiter service to prevent throttling of the indexing service, improving sync frequency by 30%.
- Financial Document Search Engine: Improved search relevancy by 45% and the online ontology enhancer by 30%. NLP pipeline for document clustering, keyword extraction, and text classification.
Software Engineer
Joined the multimedia team and helped with development of the audio player in feature phones. I also led development of in-house tools for stress-based testing of Wi-Fi/BT/GPS.
- Audio Player Module: Improved audio player sub-modules (Playlist); integrated the BT stack in the MMI layer.
- Internal Tool Development: Wrote software-layer code for the stress-test-based combo (GPS-WiFi-BT) tool; implemented an event-driven asynchronous architecture for the packet data interface.
Projects
Backup File System in Linux Kernel
Encryption based System tool for Linux Kernel
Music Recommendation System
Implemented a music recommendation system based on mood and activities.
Research Papers
Conference Papers
- R. Guntur and Mrityunjay Kumar. Learning to fingerprint the latent structure in question articulation. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 73–80, Dec 2018.
Thesis / Patents
- Rolis: A Software Approach to Efficiently Replicating Multi-core Transactions. EuroSys 2022. (Contributed to early-stage development and evaluation of the paper's novel ideas.) Stony Brook University, 2020.