The goal of the Coleridge Initiative at NYU is to use data to transform the way governments access and use data for the social good. We are a fast-growing university-based startup that has already created dozens of pilot projects, worked with over 100 federal, state and local agencies, and trained over 450 agency staff. Our program directors, Julia Lane, Rayid Ghani, and Frauke Kreuter, have designed and implemented training programs, research projects and a secure data facility that are attracting national attention, including from the Commission on Evidence-Based Policymaking and the Federal Data Strategy.
Our team works with government agencies to break down data barriers around the secure use of confidential data. We do this in two ways. First, we have developed a secure environment for data, the Administrative Data Research Facility (ADRF; https://coleridgeinitiative.org/computing). Second, we are building new tools for data stewardship, data discovery and collaboration with some of the top scientists in the nation. We work with government agencies to (1) identify critical agency problems, (2) train staff to solve them, and (3) create products that have value. You can read more about our work at https://coleridgeinitiative.org.
Role & Responsibilities
We are seeking an enthusiastic, analytically minded Research Information Scientist with extensive experience working with data and research processes, as well as demonstrated experience in information or content management. The Research Information Scientist will be the lead on the full life cycle of data ingestion and storage in the ADRF. This is detail-oriented work, and the successful candidate will have complementary technical skills in data management, programming, and user experience as well as knowledge of current technologies, metadata standards and encoding standards (e.g. XML).
The Research Information Scientist will design and develop highly robust, repeatable and scalable workflow patterns to ingest, integrate and publish a wide variety of data from internal and external sources. The successful candidate will be responsible for ensuring that the ADRF's data workflows and pipelines are enterprise-grade (reliable, scalable and secure) and for maintaining infrastructure and operations to support data science activities. The Research Information Scientist will focus on performance tuning, quickly identifying bottlenecks by reviewing SQL execution plans in order to maximize ADRF resource utilization and system performance. The successful candidate will also work directly with ADRF development and operations team members, as well as collaborators and clients, to build out semi-automated approaches to data management, with an emphasis on data quality automation as the Coleridge Initiative builds to scale.
The Research Information Scientist's responsibilities will include:
Managing the data ingestion process, troubleshooting and resolving any resulting issues, and ensuring the integrity and security of data housed in the ADRF
Performing preliminary quality assessment on data files, correcting obvious issues and then formatting files for ingestion
Contributing, as part of a team, to ADRF platform enhancement projects using appropriate technologies in research and large-scale data management (e.g., Hadoop and contemporaries, parallel databases, cloud services), and/or interactive visualization and specialized data presentation interfaces
Implementing and documenting data ingestion best practices
Required Experience & Skills
Master's Degree in Information Science, Library Science, or Computer Science
Proven experience successfully managing the full ETL and data preparation life cycle of large datasets in a data warehouse
Experience with relational and non-relational databases and other data storage and access technologies, such as MySQL, PostgreSQL, Aurora, Citus Data, Oracle, Hadoop, Spark, and/or AWS Athena
Strong communication skills and the ability to work well as part of a team
Additional Desired Experience & Skills
Experience with development of web applications and APIs using open source software
Experience working with large-scale administrative datasets
Knowledge of key open source software resources
Prior experience in SQL and working with database technologies like Postgres
Demonstrated ability to write analytical reports
Please include a resume and cover letter.
For people in the EU, information on your privacy rights under GDPR is available at www.nyu.edu/it/gdpr
New York University is an Equal Opportunity Employer. New York University is committed to a policy of equal treatment and opportunity in every aspect of its hiring and promotion process without regard to race, color, creed, religion, sex, pregnancy or childbirth (or related medical condition), sexual orientation, partnership status, gender and/or gender identity or expression, marital, parental or familial status, caregiver status, national origin, ethnicity, alienage or citizenship status, veteran or military status, age, disability, predisposing genetic characteristics, domestic violence victim status, unemployment status, or any other legally protected basis. Women, racial and ethnic minorities, persons of minority sexual orientation or gender identity, individuals with disabilities, and veterans are encouraged to apply for vacant positions at all levels.
NYU aims to be among the greenest urban campuses in the country and carbon neutral by 2040. Learn more at nyu.edu/sustainability
Internal Number: 73003
About New York University
Founded in 1831, New York University is now one of the largest private universities in the United States. Of the more than 3,000 colleges and universities in America, New York University is one of only 60 member institutions of the distinguished Association of American Universities.