Penn Courses API Auto-Docs

Quick Info / Links

After working on the backend of Penn Course Alert and Penn Course Plan, I had a strong understanding of the Penn Courses API and our unified backend codebase, which put me in a good position to improve that codebase's documentation. Clear, detailed, and up-to-date documentation is especially important in a student organization with new members joining every semester. So I designed a system that automatically infers an API specification from our code and generates a documentation page with these qualities. The docs are populated with comments and Django help_text strings extracted from our codebase, and the system also supports various optional customizations.

Project Description

Clear, detailed, and up-to-date documentation is very important for any complex codebase. But it is especially important for a complex codebase managed by a student organization, where member turnover is high (approximately 25% of Labs members graduate each year, and the club's most knowledgeable members are concentrated disproportionately in the graduating group). At the same time, our members develop our codebase as a part-time project (with their primary "employment" being their college studies). Many open-source projects with part-time developers suffer from a lack of focus or effort allocated to less exciting tasks such as documentation. This combination of high value and high risk of neglect was the main problem targeted by my auto-docs system. By inspecting the codebase, the system would generate an accurate spec and high-quality default documentation without any need for manual input. Additionally, improving the comprehensibility of the generated docs with comments and customizations would be easy and enforceable via code reviews.

My starting point for this system was the AutoSchema class from Django REST Framework (DRF) and the Redoc project. However, simply installing Redoc and using DRF's default AutoSchema class resulted in docs which were quite barebones. These docs could really only serve as a reference for developers already familiar with the codebase, as many intricacies and details of the API were not represented or explained anywhere in the docs.

So, I set out to improve DRF's AutoSchema class in both its default behavior and its customizability. I also modified the code underlying Redoc with the same goals. I created an AutoSchema subclass called PcxAutoSchema, which is the workhorse behind our auto-docs system. I also wrote some code to modify and improve the resulting OpenAPI schema (built up by DRF using the codebase's various PcxAutoSchema instances) in ways beyond the reach of the AutoSchema class. I implemented all these changes in a docs_settings.py file, which also includes a section at the top for manual customizations that didn't belong anywhere else in the codebase. I even included comprehensive meta-docs in this file, so future members can understand how to maintain the docs as they modify the codebase and develop new features.
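To give a feel for the schema post-processing step, here is a minimal, standard-library-only sketch. It is not the actual code in docs_settings.py: the function name fill_missing_descriptions, the CourseList class, and the sample schema are all hypothetical, and the schema is treated as a plain dict of the kind an OpenAPI generator produces. The idea is to fill in missing operation descriptions from a view's docstring and to attach a description to each tag after the schema has been generated.

```python
import copy
import inspect

# Keys of an OpenAPI path item that denote operations.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fill_missing_descriptions(schema, views_by_path, tag_descriptions=None):
    """Return a copy of an OpenAPI schema dict with descriptions filled in.

    Hypothetical helper: views_by_path maps a path string to the class that
    serves it, and that class's docstring is used wherever an operation has
    no description of its own.
    """
    result = copy.deepcopy(schema)  # leave the generated schema untouched
    used_tags = set()
    for path, path_item in result.get("paths", {}).items():
        view = views_by_path.get(path)
        doc = inspect.getdoc(view) if view is not None else None
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            if not operation.get("description") and doc:
                operation["description"] = doc
            used_tags.update(operation.get("tags", []))
    # Top-level tag objects carry a per-section description for the renderer.
    descriptions = tag_descriptions or {}
    result["tags"] = [
        {"name": name, "description": descriptions.get(name, "")}
        for name in sorted(used_tags)
    ]
    return result


# Stand-in for a view class; only its docstring matters in this sketch.
class CourseList:
    """List every course offered in the given semester."""


generated = {
    "openapi": "3.0.2",
    "paths": {
        "/courses/": {"get": {"operationId": "listCourses", "tags": ["courses"]}}
    },
}
improved = fill_missing_descriptions(
    generated,
    {"/courses/": CourseList},
    {"courses": "Course catalog endpoints."},
)
```

Working on a deep copy keeps the generator's output intact, so a pass like this can be layered on top of whatever the schema classes produce without changing them.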

Many of my improvements to DRF's AutoSchema class are generalizable beyond the Penn Courses codebase. I intend to open a pull request in the Django REST Framework repo so that DRF's many users can benefit from these improvements, just as Labs has.