Codebase Study
What This Repository Is
This repository is a customized academic website built on top of the academicpages fork of the Minimal Mistakes Jekyll theme. The site is primarily content-driven: most of the “application logic” lives in Jekyll configuration, Liquid templates, front matter, and static assets rather than in a conventional backend or frontend app framework.
At a high level:
- Source content lives in Markdown files under _pages, _posts, and several Jekyll collections.
- Jekyll reads _config.yml plus those source files.
- Layouts under _layouts and partials under _includes render the final HTML.
- SCSS under _sass and assets/css/main.scss compiles into site CSS.
- JavaScript from assets/js/_main.js is bundled into assets/js/main.min.js through the Node-based minification script in package.json.
- Generated site output is written to _site/.
This repo is best understood as a static-site publishing system with some utility scripts for generating publication and talk entries.
Technology Stack
Core stack
- Jekyll static site generator
- Liquid templating
- GitHub Pages-compatible plugin set via the github-pages gem
- Minimal Mistakes / academicpages theme structure
- Sass/SCSS for styling
- jQuery-based frontend behavior
Runtime and build dependencies
- Ruby/Bundler for Jekyll builds
- Node/npm only for JavaScript bundling and minification
- Optional Python and Jupyter for content-generation scripts in markdown_generator/
- Optional R/RStudio workflow indicated by Website.Rproj and the _source/*.Rmd files
Repository Shape
Key top-level paths
| Path | Role |
|---|---|
| _config.yml | Main Jekyll/site configuration |
| _config.dev.yml | Local development overrides |
| _pages/ | Manually authored top-level pages and route entry points |
| _posts/ | Blog posts |
| _publications/ | Publication collection items |
| _research/ | Research collection items |
| _talks/ | Talks/presentations collection items |
| _teaching/ | Teaching collection items |
| _layouts/ | Jekyll layouts |
| _includes/ | Reusable Liquid partials |
| _data/ | Navigation and UI text data |
| _sass/ | Theme and custom SCSS partials |
| assets/ | CSS entrypoint, JS source, fonts |
| files/ | Static downloadable assets such as PDFs and HTML teaching materials |
| images/ | Static site images |
| markdown_generator/ | Python/Jupyter utilities for generating content files |
| talkmap/ | Standalone Leaflet map page and location data |
| _site/ | Generated output; ignored by Git |
| _source/ | R Markdown source material for some post-related workflows |
Content counts at time of review
- _posts: 17 files
- _publications: 36 files
- _research: 8 files
- _talks: 1 file
- _teaching: 4 files
- _shortpapers: 46 files
_shortpapers exists in the repository but is not configured as a Jekyll collection. That matters operationally and is discussed later.
Build and Render Flow
Main path from source to HTML
- Jekyll starts with _config.yml.
- _config.dev.yml can override settings for local development.
- Files in _pages and configured collections are loaded into site/page variables.
- Layouts in _layouts compose the page shell and page structure.
- Includes in _includes render navigation, author profile, archive cards, analytics, footer, and page metadata.
- SCSS is compiled from assets/css/main.scss, which imports the theme partials plus _sass/_custom.scss.
- JS is served from assets/js/main.min.js.
- Output is emitted into _site/.
Recommended local commands
The repository itself points to this workflow:
bundle install
bundle exec jekyll serve --config _config.yml,_config.dev.yml
If JavaScript source changes:
npm install
npm run build:js
package.json also supports watch:js, but the main site is still Jekyll-first rather than Node-first.
Main Configuration
_config.yml
This file is the control plane of the site. Important settings:
- Site identity: title, name, description, url, repository
- Social and SEO:
  - Open Graph image
  - Twitter handle
  - ORCID, Google Scholar, DBLP, GitHub, LinkedIn-style author links
- Analytics:
  - GoatCounter is enabled through _includes/analytics-providers/goatcounter.html
- Include/exclude behavior:
  - _pages and files are explicitly included
  - node_modules, package.json, vendor JS source folders, and other non-site files are excluded
- Markdown and syntax highlighting: kramdown, rouge
- Collections: publications, research, talks, teaching
- Plugins: jekyll-paginate, jekyll-sitemap, jekyll-gist, jekyll-feed, jekyll-redirect-from
- HTML compression:
  - enabled through compress_html
_config.dev.yml
This is a clean local override file:
- sets url and baseurl to empty values so local builds use relative links
- disables analytics
- expands Sass output
- injects a Disqus shortname for local dev theme testing
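The override behavior can be pictured as a deep merge in which the later config file wins. The sketch below is plain Python, not Jekyll's own code, and the sample values (including https://example.org and the analytics and sass keys) are hypothetical stand-ins rather than this repository's actual settings.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict where values from `override` win over `base`.

    Nested dicts are merged key by key; any other value is replaced outright.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical stand-ins for _config.yml and _config.dev.yml.
main_config = {
    "url": "https://example.org",
    "analytics": {"provider": "goatcounter"},
    "sass": {"sass_dir": "_sass", "style": "compressed"},
}
dev_config = {
    "url": "",
    "baseurl": "",
    "analytics": {"provider": False},
    "sass": {"style": "expanded"},
}

# Later files take precedence, mirroring --config _config.yml,_config.dev.yml.
effective = deep_merge(main_config, dev_config)
```

Only the keys named in the dev file change; everything else in the main config carries through untouched, which is why the override file can stay small.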
Collection configuration
The configured collections are:
| Collection | Output | Permalink pattern |
|---|---|---|
| publications | true | /:collection/:path/ |
| research | false | /:collection/:path/ |
| talks | true | /:collection/:path/ |
| teaching | true | /:collection/:path/ |
Two details are especially important:
- research.output is currently false, so Jekyll will not emit standalone research item pages even though the collection is iterated on the /research/ page.
- The default block for the research collection is commented out, so research items rely on their own front matter for layout and metadata.
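A rough Python sketch of how the permalink pattern and the output flag interact for a single item. It assumes Jekyll still computes a URL for items in a collection with output set to false while writing no page for them; that is my understanding of Jekyll rather than something verified in this repository. The file names are hypothetical.

```python
from pathlib import PurePosixPath


def expand_permalink(pattern, collection, relative_path):
    """Fill :collection and :path in a permalink pattern.

    `relative_path` is the file path inside the collection directory;
    the extension is dropped before substitution.
    """
    path_without_ext = str(PurePosixPath(relative_path).with_suffix(""))
    return (pattern
            .replace(":collection", collection)
            .replace(":path", path_without_ext))


def describe_item(collection, relative_path, output,
                  pattern="/:collection/:path/"):
    """Return the item's URL and whether a page would be written for it."""
    return {
        "url": expand_permalink(pattern, collection, relative_path),
        "page_written": bool(output),
    }


# Hypothetical file names; only the collection names come from the table.
publication = describe_item("publications", "example-paper.md", output=True)
research = describe_item("research", "example-project.md", output=False)
```

Under that assumption the research item has a URL but no page behind it, which is the mismatch the first bullet describes.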
Layout System
Layout chain
The actual render hierarchy is:
compress -> default -> one of archive, single, single-portfolio, talk
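The chain can be modeled as a parent lookup that is followed until no parent remains. This is an illustrative Python sketch of the hierarchy as described here, not code from the theme.

```python
# Parent layout of each layout, as described in this document.
LAYOUT_PARENT = {
    "archive": "default",
    "single": "default",
    "single-portfolio": "default",
    "talk": "default",
    "default": "compress",
    "compress": None,
}


def layout_chain(layout, parents=LAYOUT_PARENT):
    """Return layouts from the outermost wrapper to the requested layout."""
    chain = []
    seen = set()
    while layout is not None:
        if layout in seen:
            raise ValueError(f"layout cycle at {layout!r}")
        if layout not in parents:
            raise KeyError(f"unknown layout {layout!r}")
        seen.add(layout)
        chain.append(layout)
        layout = parents[layout]
    return list(reversed(chain))
```

For example, `layout_chain("talk")` walks talk, then default, then compress, and returns them outermost first.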
_layouts/compress.html
- theme-provided HTML minifier/compressor
- disabled in development via compress_html.ignore.envs
_layouts/default.html
This is the shared page shell:
- includes <head>
- renders the top masthead/nav
- inserts the page content
- renders the footer
- loads assets/js/main.min.js
_layouts/archive.html
Used for listing pages such as:
- /publications/
- /research/
- /posts/
- /talks/
- /cv/
It wraps content in the site shell and sidebar, then renders the page title plus whatever list markup the page body emits.
_layouts/single.html
This is the standard detail layout for:
- posts
- publications
- teaching items
- general pages using the default single-page structure
It handles:
- hero image logic
- breadcrumbs
- schema metadata
- title/date/venue/citation display
- table of contents support
- taxonomy footer
- related posts
- comments hook
_layouts/single-portfolio.html
This is the research-item detail layout. It is close to single.html but stripped down:
- no date/modified-date display block beyond venue/date context
- intended for research/project-style presentation
- used by files in _research/
_layouts/talk.html
Used for talk detail pages. It specializes the metadata header to show:
- date
- talk type
- venue
- location
Include Layer
The site behavior depends heavily on _includes.
Core includes
- head.html
  - SEO include
  - feed link
  - viewport and CSS link
- masthead.html
  - top navigation from _data/navigation.yml
- sidebar.html
  - renders author profile and optional page-specific sidebar blocks
- author-profile.html
  - renders avatar, bio, and social/contact links
  - defaults to site.author unless page.author points into _data/authors.yml
- footer.html
  - social links plus powered-by footer
- scripts.html
  - loads assets/js/main.min.js
  - includes analytics and comment-provider scripts
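The author fallback in author-profile.html can be summarized in a few lines. This is a simplified Python sketch of the rule as stated above, not a translation of the Liquid template; the author names and the "guest" key are hypothetical.

```python
def resolve_author(site_author, page_author=None, authors_data=None):
    """Pick the author record used by the sidebar.

    Falls back to `site_author` unless `page_author` is a key
    present in `authors_data` (the parsed _data/authors.yml).
    """
    authors_data = authors_data or {}
    if page_author is not None and page_author in authors_data:
        return authors_data[page_author]
    return site_author


# Hypothetical records for illustration only.
site_author = {"name": "Site Owner"}
authors_data = {"guest": {"name": "Guest Author"}}
```

Since no page in this repository sets an author key, every page would take the `site_author` branch.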
Archive card includes
- archive-single.html
  - most important custom include in this repo
  - publication entries are rendered differently from normal posts
  - publication titles are not linked to an internal page by default
  - optional abstract is rendered in a <details> block
  - venue/news/icon links are shown inline
  - research items render their excerpt inside a clickable overlay container
- archive-single-cv.html
  - simplified list entry for CV sections
- archive-single-talk.html
  - talk-specific listing format
- archive-single-talk-cv.html
  - talk entry formatting for CV page
Notable custom or legacy files
- _includes/archive-single_backup.html
  - untracked backup include
  - not part of the active render path
Page Layer
Navigation
Top-level navigation is defined in _data/navigation.yml and currently exposes:
- Publications
- Research
- Teaching
Several other pages exist but are not in the nav:
- home: /
- posts: /posts/
- talks: /talks/
- CV: /cv/
- software: /software/
- teaching materials: /teaching-materials/
- tags: /tags/
- terms: /terms/
- sitemap: /sitemap/
- talk map: /talkmap.html
Important pages
_pages/about.md
- serves as the home page (/)
- large content-heavy landing page
- mixes biography, recruiting information, office hours, projects, awards, and links to publications/assets
- relies on author_profile: true for the sidebar
_pages/publications.md
- loops through site.publications reversed
- renders each item via archive-single.html
- adds a Google Scholar link if the author config supplies it
_pages/research.md
- manually introduces the research agenda
- sorts research items by order_number
- renders them using archive-single.html type="grid"
- expects research items to have teaser/excerpt imagery
_pages/teaching.md
- fully hand-authored page rather than derived from site.teaching
- current content is prose plus manual course listings
_pages/cv.md
- uses archive layout
- embeds a PDF
- also loops through site.publications, site.talks, and site.teaching
- contains clearly stale imported content from a different site in several sections
_pages/software.md and _pages/teaching-materials.md
- both appear to be inherited or copied from another academicpages-derived site
- they are functional pages, but the content does not match the rest of this repository’s subject matter
Content Model
Posts: _posts/
- standard Jekyll blog posts
- rendered through single.html
- surfaced on /posts/
- grouped by year in _pages/posts.html
Publications: _publications/
Publication files follow a fairly regular front matter schema:
- title
- collection: publications
- permalink
- optional excerpt
- date
- venue
- optional link
- optional paperurl
- citation
- sometimes abstract, news, code, github
Publications are a first-class collection and are surfaced correctly through the site.
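Because the schema is regular, it is cheap to check a file against it. The sketch below is a minimal Python check that avoids a YAML dependency by reading only top-level keys. The required-key list is my reading of the schema above (every field not marked optional), and the sample file content is hypothetical.

```python
REQUIRED_KEYS = ("title", "collection", "permalink", "date", "venue", "citation")


def front_matter_keys(text):
    """Return the top-level keys in a file's leading '---' front matter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Top-level keys start in column 0; indented lines belong to a nested value.
        if line and not line[0].isspace() and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def missing_keys(text, required=REQUIRED_KEYS):
    """List required keys that the front matter does not define."""
    present = set(front_matter_keys(text))
    return [key for key in required if key not in present]


# Hypothetical publication file, deliberately missing `citation`.
SAMPLE = """---
title: "Example Paper"
collection: publications
permalink: /publications/example-paper/
date: 2020-01-01
venue: "Example Venue"
---
Body text.
"""
```

Running `missing_keys` over every file in _publications/ would flag entries that drifted from the schema; a real check would likely use a proper YAML parser instead of this line-based shortcut.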
Research: _research/
Research entries use:
- layout: single-portfolio
- collection: research
- order_number
- excerpt that often contains raw HTML image markup
- header.og_image
These files are treated as project cards on the /research/ page.
Important caveat:
- Because research.output is currently false, the collection can be iterated but its detail pages are not emitted.
- archive-single.html still links research excerpts to post.url.
- That means the listing page is designed as if research detail pages exist, while config currently disables them.
Talks: _talks/
- currently only one talk file exists
- rendered through talk.html
- surfaced on /talks/
- talkmap_link is disabled in config, so the talk map is not linked from the talks page
Teaching: _teaching/
- configured as a collection with output enabled
- current files look like template/demo entries rather than the hand-authored teaching page
- there is a split between the collection and the custom /teaching/ page content
Short papers: _shortpapers/
This is the largest architectural inconsistency currently in the repo.
- The folder name starts with _, which implies Jekyll-special handling.
- The folder is not listed in collections: in _config.yml.
- Jekyll collection membership is determined by directory/configuration, not by front matter alone.
- Even though the files inside _shortpapers/ declare collection: publications, they do not become publication items just because of that front matter.
Operationally, these files are best treated as dormant content unless the site config is updated to ingest them explicitly.
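This kind of mismatch can be caught mechanically by comparing underscore-prefixed directories against the configured collections. The Python sketch below does that on plain lists. The reserved-directory set is a non-exhaustive assumption about Jekyll's built-in folders, and the inputs are copied from this document rather than read from disk.

```python
# Underscore directories with a built-in meaning; a non-exhaustive assumption.
RESERVED_DIRS = {"_layouts", "_includes", "_sass", "_data",
                 "_posts", "_drafts", "_site"}


def dormant_content_dirs(dir_names, collections, included=()):
    """Return underscore-prefixed directories Jekyll has no reason to read."""
    dormant = []
    for name in sorted(dir_names):
        if not name.startswith("_"):
            continue
        if name in RESERVED_DIRS or name in included:
            continue
        if name[1:] in collections:
            continue
        dormant.append(name)
    return dormant


# Directory and collection names as listed in this document.
top_level = ["_pages", "_posts", "_publications", "_research", "_talks",
             "_teaching", "_shortpapers", "_layouts", "_includes", "_data",
             "_sass", "_site", "_source", "assets", "files", "images"]
configured = ["publications", "research", "talks", "teaching"]

dormant = dormant_content_dirs(top_level, configured, included=["_pages"])
```

With these inputs the check reports _shortpapers and _source, which matches the observation that neither directory takes part in the active build.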
Styling and Frontend Behavior
CSS pipeline
assets/css/main.scss is the single SCSS entrypoint. It imports:
- theme primitives from _sass/
- font and popup vendor styles
- _sass/_custom.scss
_sass/_custom.scss contains actual repo-specific styling:
- .zoom hover enlargement for icon links
- .container and .overlay styling used by research card excerpts
JavaScript pipeline
The active JS entrypoint is assets/js/main.min.js, produced from:
- vendor jQuery
- several jQuery plugins
- assets/js/_main.js
assets/js/_main.js handles:
- sticky footer spacing
- responsive sticky sidebar behavior
- author profile “Contact” toggle
- smooth scrolling
- image lightbox initialization via Magnific Popup
Apparently unused collapse assets
- assets/js/collapse.js
- assets/css/collapse.css
I did not find these referenced from the active layout/includes. They look like leftover assets or assets intended for manually embedded HTML, not for the main site shell.
Data and Static Assets
_data/
- _data/navigation.yml: top nav items
- _data/ui-text.yml: theme text strings
- _data/authors.yml: stock template data; appears unused because no page sets author: ...
files/
This is the download bucket for the site:
- CV PDFs
- research/teaching statements
- award documents
- publication PDFs
- static HTML teaching materials under files/html/
There are also mirrored/generated-looking subdirectories such as files/_site/, which appear to be legacy output rather than primary source.
images/
Contains:
- profile/headshot images
- favicon/browser assets
- research illustrations
- software gallery images
- some _site mirrored assets that look generated or copied
Content-Generation Utilities
markdown_generator/
This folder is the repo’s main automation area for content authoring.
Files of note:
- publications.py
  - generates Markdown files from publications.tsv
- talks.py
  - generates talk Markdown files from talks.tsv
- pubsFromBib.py
  - generates publication Markdown from BibTeX
- *.ipynb
  - notebook versions of the same workflows
These scripts write directly into:
- _publications/
- _talks/
This is the most code-like part of the repository outside the theme assets.
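The general shape of a TSV-to-Markdown generator is sketched below in Python. It is not the code in publications.py: the column names (date, title, venue, citation, slug), the permalink form, and the output file naming are assumptions chosen for illustration, and the sample row is made up.

```python
import csv
import io


def render_publication(row):
    """Build the Markdown text (front matter only) for one TSV row."""
    def quoted(value):
        # Escape backslashes and double quotes so the YAML scalar stays valid.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [
        "---",
        f"title: {quoted(row['title'])}",
        "collection: publications",
        f"permalink: /publications/{row['slug']}/",
        f"date: {row['date']}",
        f"venue: {quoted(row['venue'])}",
        f"citation: {quoted(row['citation'])}",
        "---",
        "",
    ]
    return "\n".join(lines)


def publications_from_tsv(tsv_text):
    """Map output file name to Markdown text for every row of the TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return {f"{row['date']}-{row['slug']}.md": render_publication(row)
            for row in reader}


# Hypothetical input with a header row and one publication.
SAMPLE_TSV = (
    "date\ttitle\tvenue\tcitation\tslug\n"
    "2020-01-01\tExample Paper\tExample Venue\tA. Author (2020).\texample-paper\n"
)
files = publications_from_tsv(SAMPLE_TSV)
```

Returning a mapping instead of writing files keeps the sketch side-effect free; a real script would write each entry into _publications/.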
talkmap/
- standalone Leaflet map page
- location data stored in org-locations.js
- page embedded through _pages/talkmap.html
- not integrated deeply into the main build system beyond the iframe page
_source/
Contains .Rmd files and a CSV source file. This suggests an older or parallel content-generation pipeline for some posts or visualizations, but it is not part of the active Jekyll build path directly.
Operational Caveats and Maintenance Notes
1. _shortpapers/ is not wired into Jekyll
This is the biggest structural issue in the repo as currently checked in.
- Content exists.
- The directory name suggests “source content”.
- The files resemble publication entries.
- The config does not register the collection.
Result: those files are very likely ignored by the active site build.
2. Research collection configuration conflicts with research page behavior
- _pages/research.md renders research items as if they are clickable projects.
- archive-single.html links research excerpts to post.url.
- _config.yml sets research.output: false.
If the intent is a grid of cards linking to full research pages, current config works against that intent.
3. The repo contains significant template residue and stale imported content
Examples:
- _pages/software.md
- _pages/teaching-materials.md
- parts of _pages/cv.md
- _data/authors.yml
- portfolio/ untracked sample content
- publications_backup/ untracked backups
- _includes/archive-single_backup.html
These do not all break the site, but they blur the boundary between active source and inherited template material.
4. Generated or mirrored output exists outside _site/
Examples:
- images/_site/
- files/_site/
These are easy to mistake for primary source files. They should be treated carefully.
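One way to locate such directories is to walk the tree and report every _site folder that is not the top-level build output. This is a small Python sketch; skipping node_modules and .git is my own choice for keeping the walk short.

```python
import os


def stray_site_dirs(repo_root):
    """Return relative paths of `_site` directories below the top level."""
    stray = []
    for current, dirnames, _files in os.walk(repo_root):
        # Prune folders that are never primary source.
        dirnames[:] = [d for d in dirnames if d not in {"node_modules", ".git"}]
        if "_site" not in dirnames:
            continue
        dirnames.remove("_site")  # never descend into generated output
        rel = os.path.relpath(current, repo_root)
        if rel != ".":  # the top-level _site/ is the expected build output
            stray.append(os.path.join(rel, "_site").replace(os.sep, "/"))
    return sorted(stray)
```

Calling `stray_site_dirs(".")` from the repository root would list candidates for review; whether they are safe to delete still needs a manual look.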
5. .gitignore is minimal
Current ignore rules do not cover:
- .DS_Store
- notebook checkpoints
- backup directories
- ad hoc generated files
That is why local junk shows up easily in git status.
6. No test suite or CI configuration is present
- there is no .github/workflows/
- there are no automated tests
- validation is effectively “does Jekyll build and does the rendered page look right”
For this kind of repository that is common, but it means breakage is mostly caught manually.
7. Current local build is blocked on Ruby/Jekyll compatibility
When reviewed in this workspace, the following command failed:
bundle exec jekyll build --config _config.yml,_config.dev.yml --destination /tmp/academic-webpages-doc-check
Observed failure mode:
- Jekyll 3.9.0 is running under Ruby 4.0.x
- Ruby no longer ships csv as a default gem (it became a bundled gem in Ruby 3.4), so under Bundler it is only loadable when the Gemfile declares it
- Jekyll fails with cannot load such file -- csv
Practical implication:
- the repository structure is valid as a Jekyll site
- but the current local environment cannot build it without dependency adjustment
The lowest-friction fix is likely to add gem "csv" to Gemfile or run under an older Ruby/Jekyll-compatible environment.
8. Frontend build tooling is old and theme-oriented
package.json still identifies the project as minimal-mistakes version 3.4.2. The Node tooling is only used for minification and is not central to development, but it reflects the template’s age.
Practical Mental Model for Future Changes
If you need to change the site, use this sequence:
- Check _config.yml first for collection behavior, output flags, permalinks, author data, and plugins.
- Check _data/navigation.yml next if the change affects the top navigation.
- Edit _pages/ if the change is a top-level section or landing page.
- Edit collection files if the change is item-level content.
- Edit _includes/archive-single*.html if list display behavior changes.
- Edit _layouts/*.html if page shell or detail-page structure changes.
- Edit _sass/_custom.scss if the change is site-specific styling.
- Rebuild assets/js/main.min.js only if _main.js or JS plugins change.
Best Entry Points by Goal
Add or edit a publication
- edit a file in _publications/
- or regenerate content from markdown_generator/
Add or edit a research area card
- edit a file in _research/
- if you need clickable detail pages, revisit research.output in _config.yml
Change the homepage
- edit _pages/about.md
Change navigation
- edit _data/navigation.yml
Change publication card formatting
- edit _includes/archive-single.html
Change site-wide metadata or social links
- edit _config.yml
Change author sidebar content
- edit author: fields in _config.yml
- or wire pages to _data/authors.yml if multi-author behavior is desired
Summary
This is a static academic site whose real “application” is the Jekyll content model plus a lightly customized academicpages theme. The repository is understandable once you view it through that lens:
- content lives in Markdown collections
- layout logic lives in Liquid includes/layouts
- presentation lives in Sass and a small jQuery bundle
- automation is limited to publication/talk generation scripts
The main maintenance risk is not complexity. It is ambiguity: some directories are active source, some are generated artifacts, some are imported template residue, and at least one major content directory (_shortpapers/) is currently not wired into the build.