HarvardX Data Science Series Updates

Principal components plot of benign and malignant breast tumors from the dslabs brca dataset, one of seven datasets I added to this R package.

I began programming as a student in the HarvardX Data Science Professional Program, a massive open online course series from Rafael Irizarry on edX. To become more comfortable with coding and project workflows, I contributed dozens of edits to the course textbook, Introduction to Data Science, via its open source GitHub repository. I started as a volunteer, systematically reviewing the textbook and serving as a community teaching assistant, and my diligence earned me a full-time position improving the course series.

As the new lead content developer for the program, my first major project was to write additional assessments for the courses to add value to the certificate. I formatted these problem sets as case studies and included the relevant data in the course series R package, dslabs, available on CRAN. I added seven new datasets to dslabs with documentation and scripts. I also wrote a guest post on Simply Statistics describing the new datasets and demonstrating some potential uses.

The new problem sets launched in the July 2019 versions of the HarvardX Data Science Professional Certificate courses, which currently have over 75,000 unique enrolled learners. In that course release, I also added supplemental notes and links to the accompanying textbook on every content page. I currently supervise a team of TAs who answer student queries on the course discussion boards, and I maintain the existing materials.

My current projects involve revising the HarvardX Genomics Data Analysis XSeries for release in March 2020 and co-authoring the official solution manual for the Introduction to Data Science textbook.

Amy Gill
Lead Content Developer, Data Science

Biomedical researcher, data scientist and educator.