8070 - Data Analyst IV

Location: McLean, VA

Data Engineering: 
•Cleanse, manipulate, and analyze large datasets (structured and unstructured data – XMLs, JSONs, PDFs) using the Hadoop platform.
•Develop Python, Spark, and Hive scripts to filter/map/aggregate data. Use Sqoop to transfer data to and from Hadoop.
•Manage and implement data processes, including data quality scripts.

Analysis and Modeling:
•Perform R&D and exploratory analysis using statistical techniques and machine learning clustering methods to understand data.
•Develop data profiling, deduplication, and matching logic for analysis.
•Big Data languages such a
•5+ years of experience processing large volumes and varieties of data (structured and unstructured data such as XMLs, JSONs, PDFs), including writing code for parallel processing.
•3+ years of programming experience in at least two of the following for data processing and analysis: Python, Spark, Java.
•Strong SQL experience
•2+ years of experience using the Hadoop platform and performing analysis.
•Familiarity with the Hadoop cluster environment and its resource management configurations for analysis work.
•Python, Spark, and Hive for analytics and developing dashboards.