Nabeel Mukhtar

Data Scientist, Software Architect and Developer

Curriculum Vitae

Summary

12+ years of experience in architecture and development of large scale enterprise applications.
Experienced in developing big data analytics and machine learning applications using R, Python, Java and Hadoop.
Well versed in Object Oriented Analysis and Design.
Avid open source developer and contributor.

Skills

Areas of Expertise

Machine Learning, Software Architecture, Text Analysis, Natural Language Processing, Computer Vision, Probabilistic Robotics, Rule Engines.

Programming Languages

Java, Python, R, Scala, C/C++, Octave.

Libraries/Frameworks

Spring, Hibernate, JEE, scikit-learn, TensorFlow, nltk, Hadoop, Pig, Lucene, Drools, OpenNLP, Spark, UIMA.

Platforms

JBoss, Tomcat, Liferay, WebSphere, Docker, Kubernetes, Azure ML, Amazon EC2, Azure ML, Cloudera, BigInsights, MapR, HortonWorks, Google AppEngine.

Databases

Oracle, SQL Server, MySQL, PostgreSQL, MongoDB, Redis, Elasticsearch, Neo4J, BigTable, HBase.

Experience

Senior Software Architect (Creative Chaos, Karachi)

Projects

Podium Data

Podium is an end-to-end data management and preparation platform built from the ground up to leverage low-cost, high-performance Big Data technologies. Podium’s core features (listed below) fill in important data management functionality through an integrated, easy to use browser interface. This allows organizations to take full advantage of the cost effectiveness of Big Data technologies without their inherent complexities, significantly shrinking the time to market for data availability and value.

Roles and Responsibilities

Configured application for certification on multiple platforms like Cloudera, BigInsights, MapR and Hortonworks.

Developed the backend for Data Analytics Engine that runs on Tez, MapReduce and Spark.

Implemented Hadoop Impersonation and Authorization Using Kerberos, Apache Sentry and Apache Ranger.

Wrote the module for Data Export to various targets like FTP, S3, HDFS etc. in multiple formats including PARQUET, AVRO, ORC and TEXT.

Senior Consultant Machine Learning (GfK Etilize, Karachi)

Projects

Smart Data Extraction

Smart Data Extraction uses machine learning and business rules processing to automate as much of product data extraction process as possible.

Roles and Responsibilities

Worked as Software Architect, Developer and Data Scientist.

Architected and developed the search and indexing engine for products.

Worked on full text search of product rich content (documents and images) using Elasticsearch.

Researched techniques for using Named Entity Recognition for product attribute extraction.

Implemented kNN product classification using Elasticsearch MLT API.

Researched Sentiment Analysis techniques for product reviews.

Developed the ETL process for loading data from various heterogeneous data sources into a multi-tenant Elasticsearch data store.

Developed the REST/HATEOAS API for product search using Spring REST.

Principal Data Scientist (Elastica Inc, Karachi)

Projects

Elastica CloudSoc

Elastica is the leader in Data Science Powered™ Cloud Application Security. Its CloudSOC™ platform empowers the companies to confidently leverage Cloud and SaaS applications while staying safe, secure and compliant.

Roles and Responsibilities

Worked as Data Scientist, Architect and Developer

Architected and developed the machine learning workflow for anomaly detection subsystem.

Implemented feature extraction and scaling from user logs related to session, geography, categorical distributions and work habits.

Created methods for detecting outliers using K-means clustering.

Implemented rule-based detection of outliers using Drools rule engine.

Build exploratory models for other clustering/outlier detection algorithms including 1-Class Classification, ORCLUS, DBSCAN and LOF.

Created visualizations for clusters and categorical distributions for exploratory data analysis.

Developed sensitivity analysis method for finding sensitive outlier features.

Implemented bootstrapping for generating synthetic data.

Wrote functions for generating activity graphs from logs for graph analysis.

Investigated supervised learning algorithms like Support Vector Machines and Deep Learning.

Software Architect (Creative Chaos, Karachi)

Projects

Outage Analyzer - Compuware Corporation

Outage Analyzer analyzes real-time stream of Gomez measurement data from different web sites and identifies anomalies in it. Once the anomalies are identified they are classified in an adjacency matrix that identifies their probable causes and affected regions. When these anomalies exceed a certain threshold, they are classified as outages and shown in the UI overlaid on a world map.

Roles and Responsibilities

Worked as Software Developer/Data Scientist

Developed the anomaly detection engine of Outage Analyzer for streaming data.

Implemented the method of detecting anomalies from Gaussian and Binomial distributions of real time data.

Network Sentry/Analytics

Network Sentry/Analytics provides a new perspective on the connections being made to a network. It leverages the data collected by Network Sentry network access control (NAC) to deliver reports and trends that keep one informed on the devices, users, and connections being made to your network. Organizations gain long-term visibility and answers they need to help ensure wireless network capacity, software licenses, mobile device support, and compliance. Network Sentry/Analytics can be leveraged to report on multiple Network Sentry servers. The data is aggregated from each server into a data mart and archived, analyzed, correlated, and reported.

Roles and Responsibilities

Worked as Software Architect/Software Developer

Architected the data analytics and reporting platform for the application.

Worked as a developer on the data collection module.

Implemented both pull-based and push-based data collection workflows.

BezNext

BEZNext uses a new approach to capacity management of Data Warehouses based on modeling and performance prediction. A model is a mathematical representation of the essential elements of the Data Warehouse system’s architecture, physical and virtual configuration, parallel processing and other characteristics that impact performance: Things such as processor speed, network latency, memory capacity, I/O speeds as well as software limitations such as job concurrency, degree of parallelism and workload priority. BEZNext utilizes a queuing network model approach that represents the IT infrastructure as a network of queues and servers. It then employs analytics to answer what–if questions to evaluate and compare the options and alternatives for capacity management.

Roles and Responsibilities

Developed model service to align and aggregate various metrics and characterize workloads into discrete classes.

Developed the data collection module for Microsoft SQL Server.

Developed the database advisor module for Microsoft SQL Server.

Developed the auto discovery module for VMWare.

Developed the workload management module for Teradata.

Researched performance data collection methods for .NET applications.

face2face

face2face is a location-aware social network that solves the problems of privacy and safety that have become synonymous with social networking – so that more people can benefit from the advantages that come from knowing which friends are nearby.

Roles and Responsibilities

Architected the backend of the application.

Worked as a developer on the host module for the social networking app.

Developed gateways to external services like Facebook, Twitter and LinkedIn.

Implemented the social network analysis services using graph database.

Senior Software Engineer (Etilize Inc, Karachi)

Projects

ConQuire CMS

ConQuire CMS is an extensive product information management system. It allows the extraction of product specifications into atomic parameters that are fully searchable and can be displayed in a variety of templates. It supports the extraction and display of content in multiple languages and markets.

Roles and Responsibilities

Developed the taxonomy editor for the CMS that allows user to create parameters, display attributes and templates and group them into various categories.

Was also involved in the development of multi-market/multilingual module for ConQuire.

Worked on the cross-sell/up-sell module of the application that suggested products similar to a product.

Worked on the data synchronization module whose function was to synchronize the data between various geographically dispersed servers.

inQuire Catalog Search

inQuire is a catalog search engine that uses parameterized and faceted search over a wide category of products.

Roles and Responsibilities

Worked on the parametric search module and also developed the synchronization module for data sync between ConQuire CMS and inQuire Search.

Senior Software Architect (E-Dev Technology, Karachi)

Projects

Dupont E-Pass Application Redesign - Dupont

E-Pass (electronic passport) application was developed to replace the use of static passwords with two-factor authentication for Dupont’s major IT platforms. It involves not only the management of SecurID cards and tokens, but also the creation of a “tree-of-trust” and a central registration application to allow responsibility and maintenance for the RSA tokens to be distributed throughout the company.

Roles and Responsibilities

Worked as the Software Architect in the elaboration phase of the project and developed a software design document. Later worked as a developer on the services layer of the application and also designed the notification and scheduling modules.

IDEA Suite

IDEA is a model based Java application generator. It allows the user to design the application using an abstract model and then automatically generates JSP-based front-end and an XML-based persistence engine resulting in a fully executable Java web application that can be deployed on any application server. The persistence engine supports a variety of configurations including RDBMS, XML-DB and flat files. The display engine supports multiple themes and navigation patterns.

Roles and Responsibilities

Worked as a Software Architect in the elaboration phase of the project and later developed the persistence engine and the workflow engine for the application.

Education

B.E. Computer Systems Engineering

N.E.D. University of Engineering and Technology, Karachi

Passed with 83% marks.

Self-Driving Car Engineering Nanodegree

Udacity

Projects

Finding Lane Lines on the Road

The goals / steps of this project are the following:

Make a pipeline that finds lane lines on the road

Reflect on your work in a written report

Build a Traffic Sign Recognition Classifier

The goals / steps of this project are the following:

Load the data set (see below for links to the project data set)

Explore, summarize and visualize the data set

Design, train and test a model architecture

Use the model to make predictions on new images

Analyze the softmax probabilities of the new images

Summarize the results with a written report

Behavioral Cloning

The goals / steps of this project are the following:

Use the simulator to collect data of good driving behavior

Build, a convolution neural network in Keras that predicts steering angles from images

Train and validate the model with a training and validation set

Test that the model successfully drives around track one without leaving the road

Summarize the results with a written report

Advanced Lane Finding

The goals / steps of this project are the following:

Compute the camera calibration matrix and distortion coefficients given a set of chessboard images.

Apply a distortion correction to raw images.

Use color transforms, gradients, etc., to create a thresholded binary image.

Apply a perspective transform to rectify binary image (“birds-eye view”).

Detect lane pixels and fit to find the lane boundary.

Determine the curvature of the lane and vehicle position with respect to center.

Warp the detected lane boundaries back onto the original image.

Output visual display of the lane boundaries and numerical estimation of lane curvature and vehicle position.

Vehicle Detection

The goals / steps of this project are the following:

Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a classifier Linear SVM classifier

Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector.

Note: for those first two steps don’t forget to normalize your features and randomize a selection for training and testing.

Implement a sliding-window technique and use your trained classifier to search for vehicles in images.

Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.

Estimate a bounding box for vehicles detected.

Certificate of Specialization in Data Mining

University of Illinois at Urbana-Champaign through Coursera

XSeries Certificate, Big Data

University of California, Berkeley through edX

Independent Coursework

Machine Learning - Andrew Ng (Stanford University)
Tackling the Challenges of Big Data (MIT Professional Education)
Software Architecture: Principles and Practices - SEI (Carnegie Mellon University)
Natural Language Processing - Dan Jurafsky & Chris Manning (Stanford University)
Statistical Learning - Trevor Hastie & Rob Tibshirani (Stanford University)
Introduction to Data Science – Bill Howe (University of Washington)
Programming A Robotic Car - Sebastian Thrun (Udacity)
Mining Massive Datasets – Stanford University
Applied Cryptography - David Evans (Udacity)
Neural Networks for Machine Learning - Geoffrey Hinton (University of Toronto)
Social Network Analysis - Lada Adamic (University of Michigan)
Practical Machine Learning - Jeff Leek (John Hopkins University)

Memberships

Member of ACM, ACM SIGKDD, ACM SIGIR, ACM SIGMETRICS.
Member of IEEE Computer Society.

Languages

English
Urdu
German
Persian