Codebase Study

What This Repository Is

This repository is a customized academic website built on top of the academicpages fork of the Minimal Mistakes Jekyll theme. The site is primarily content-driven: most of the “application logic” lives in Jekyll configuration, Liquid templates, front matter, and static assets rather than in a conventional backend or frontend app framework.

At a high level:

  1. Source content lives in Markdown files under _pages, _posts, and several Jekyll collections.
  2. Jekyll reads _config.yml plus those source files.
  3. Layouts under _layouts and partials under _includes render the final HTML.
  4. SCSS under _sass and assets/css/main.scss compiles into site CSS.
  5. JavaScript from assets/js/_main.js is bundled into assets/js/main.min.js through the Node-based minification script in package.json.
  6. Generated site output is written to _site/.

This repo is best understood as a static-site publishing system with some utility scripts for generating publication and talk entries.

Technology Stack

Core stack

  • Jekyll static site generator
  • Liquid templating
  • GitHub Pages-compatible plugin set via the github-pages gem
  • Minimal Mistakes / academicpages theme structure
  • Sass/SCSS for styling
  • jQuery-based frontend behavior

Runtime and build dependencies

  • Ruby/Bundler for Jekyll builds
  • Node/npm only for JavaScript bundling and minification
  • Optional Python and Jupyter for content-generation scripts in markdown_generator/
  • Optional R/RStudio workflow indicated by Website.Rproj and the _source/*.Rmd files

Repository Shape

Key top-level paths

Path                  Role
_config.yml           Main Jekyll/site configuration
_config.dev.yml       Local development overrides
_pages/               Manually authored top-level pages and route entry points
_posts/               Blog posts
_publications/        Publication collection items
_research/            Research collection items
_talks/               Talks/presentations collection items
_teaching/            Teaching collection items
_layouts/             Jekyll layouts
_includes/            Reusable Liquid partials
_data/                Navigation and UI text data
_sass/                Theme and custom SCSS partials
assets/               CSS entrypoint, JS source, fonts
files/                Static downloadable assets such as PDFs and HTML teaching materials
images/               Static site images
markdown_generator/   Python/Jupyter utilities for generating content files
talkmap/              Standalone Leaflet map page and location data
_site/                Generated output; ignored by Git
_source/              R Markdown source material for some post-related workflows

Content counts at time of review

  • _posts: 17 files
  • _publications: 36 files
  • _research: 8 files
  • _talks: 1 file
  • _teaching: 4 files
  • _shortpapers: 46 files

_shortpapers exists in the repository but is not configured as a Jekyll collection. That matters operationally and is discussed later.

Build and Render Flow

Main path from source to HTML

  1. Jekyll starts with _config.yml.
  2. _config.dev.yml can override settings for local development.
  3. Files in _pages and configured collections are loaded into site/page variables.
  4. Layouts in _layouts compose the page shell and page structure.
  5. Includes in _includes render navigation, author profile, archive cards, analytics, footer, and page metadata.
  6. SCSS is compiled from assets/css/main.scss, which imports the theme partials plus _sass/_custom.scss.
  7. JS is served from assets/js/main.min.js.
  8. Output is emitted into _site/.

The repository itself points to this workflow:

bundle install
bundle exec jekyll serve --config _config.yml,_config.dev.yml

If JavaScript source changes:

npm install
npm run build:js

package.json also supports watch:js, but the main site is still Jekyll-first rather than Node-first.

Main Configuration

_config.yml

This file is the control plane of the site. Important settings:

  • Site identity:
    • title, name, description, url, repository
  • Social and SEO:
    • Open Graph image
    • Twitter handle
    • ORCID, Google Scholar, DBLP, GitHub, LinkedIn-style author links
  • Analytics:
    • GoatCounter is enabled through _includes/analytics-providers/goatcounter.html
  • Include/exclude behavior:
    • _pages and files are explicitly included
    • node_modules, package.json, vendor JS source folders, and other non-site files are excluded
  • Markdown and syntax highlighting:
    • kramdown
    • rouge
  • Collections:
    • publications, research, talks, teaching
  • Plugins:
    • jekyll-paginate
    • jekyll-sitemap
    • jekyll-gist
    • jekyll-feed
    • jekyll-redirect-from
  • HTML compression:
    • enabled through compress_html

_config.dev.yml

This is a clean local override file:

  • sets url and baseurl to empty values so local builds use relative links
  • disables analytics
  • expands Sass output
  • injects a Disqus shortname for local dev theme testing
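
The override behavior can be pictured as a recursive merge in which the later file wins. The sketch below is a simplified Python model of that idea, not Jekyll's own Ruby implementation; the sample keys and values (including the example.org URL) are illustrative assumptions rather than the contents of the real config files.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values from `override` win over `base`.

    Nested dicts are merged key by key; any other value is replaced outright.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical excerpts standing in for _config.yml and _config.dev.yml.
main_config = {
    "url": "https://example.org",
    "analytics": {"provider": "goatcounter"},
    "sass": {"sass_dir": "_sass", "style": "compressed"},
}
dev_config = {
    "url": "",
    "baseurl": "",
    "analytics": {"provider": False},
    "sass": {"style": "expanded"},
}

effective = deep_merge(main_config, dev_config)
```

Under these assumptions, `effective` keeps `sass_dir` from the main file while picking up the empty `url`, the disabled analytics provider, and the expanded Sass style from the development file.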

Collection configuration

The configured collections are:

Collection    Output  Permalink pattern
publications  true    /:collection/:path/
research      false   /:collection/:path/
talks         true    /:collection/:path/
teaching      true    /:collection/:path/

Two details are especially important:

  1. research.output is currently false, so Jekyll will not emit standalone research item pages even though the collection is iterated on the /research/ page.
  2. The default block for the research collection is commented out, so research items rely on their own front matter for layout and metadata.
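
To make the effect of the output flag concrete, here is a minimal Python sketch that models the collection settings as a dict and derives the URL an item would receive. The dict layout, the function name `item_url`, and the sample filenames are assumptions made for illustration; an item's own permalink front matter, where present, would take precedence and is not modeled here.

```python
from pathlib import PurePosixPath
from typing import Optional

# Values copied from the collection table; the dict layout is an assumption of this sketch.
COLLECTIONS = {
    "publications": {"output": True, "permalink": "/:collection/:path/"},
    "research": {"output": False, "permalink": "/:collection/:path/"},
    "talks": {"output": True, "permalink": "/:collection/:path/"},
    "teaching": {"output": True, "permalink": "/:collection/:path/"},
}


def item_url(collections: dict, name: str, relative_path: str) -> Optional[str]:
    """Return the URL an item would be served at, or None if no page is emitted."""
    settings = collections[name]
    if not settings.get("output", False):
        return None  # the item can still be iterated, but gets no standalone page
    path_without_ext = str(PurePosixPath(relative_path).with_suffix(""))
    return (
        settings["permalink"]
        .replace(":collection", name)
        .replace(":path", path_without_ext)
    )


# Hypothetical filenames, used only to exercise the function.
talk_url = item_url(COLLECTIONS, "talks", "2012-03-01-talk-1.md")
research_url = item_url(COLLECTIONS, "research", "project-a.md")
```

With the settings above, the talk resolves to a URL while the research item resolves to `None`, which is the situation the first detail describes.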

Layout System

Layout chain

The actual render hierarchy is:

compress -> default -> one of archive, single, single-portfolio, talk
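
One way to read this chain: each layout names its parent, and rendering walks upward until it reaches a layout with no parent. The Python sketch below models that walk with a plain dict; the mapping restates the hierarchy given here, while the dict and the function name `layout_chain` are illustrative assumptions.

```python
from typing import Dict, List, Optional

# Child layout -> parent layout, restating the hierarchy described in this section.
PARENT_LAYOUT: Dict[str, Optional[str]] = {
    "compress": None,
    "default": "compress",
    "archive": "default",
    "single": "default",
    "single-portfolio": "default",
    "talk": "default",
}


def layout_chain(name: str, parents: Dict[str, Optional[str]] = PARENT_LAYOUT) -> List[str]:
    """Return the layouts wrapping `name`, outermost first."""
    chain: List[str] = []
    current: Optional[str] = name
    while current is not None:
        if current in chain:
            raise ValueError(f"layout cycle detected at {current!r}")
        chain.append(current)
        current = parents[current]
    return list(reversed(chain))


talk_chain = layout_chain("talk")
```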

_layouts/compress.html

  • theme-provided HTML minifier/compressor
  • disabled in development via compress_html.ignore.envs

_layouts/default.html

This is the shared page shell:

  • includes <head>
  • renders the top masthead/nav
  • inserts the page content
  • renders the footer
  • loads assets/js/main.min.js

_layouts/archive.html

Used for listing pages such as:

  • /publications/
  • /research/
  • /posts/
  • /talks/
  • /cv/

It wraps content in the site shell and sidebar, then renders the page title plus whatever list markup the page body emits.

_layouts/single.html

This is the standard detail layout for:

  • posts
  • publications
  • teaching items
  • general pages using the default single-page structure

It handles:

  • hero image logic
  • breadcrumbs
  • schema metadata
  • title/date/venue/citation display
  • table of contents support
  • taxonomy footer
  • related posts
  • comments hook

_layouts/single-portfolio.html

This is the research-item detail layout. It is close to single.html but stripped down:

  • omits the standalone date/modified-date block; dates appear only as part of the venue/date context
  • intended for research/project-style presentation
  • used by files in _research/

_layouts/talk.html

Used for talk detail pages. It specializes the metadata header to show:

  • date
  • talk type
  • venue
  • location

Include Layer

The site behavior depends heavily on _includes.

Core includes

  • head.html
    • SEO include
    • feed link
    • viewport and CSS link
  • masthead.html
    • top navigation from _data/navigation.yml
  • sidebar.html
    • renders author profile and optional page-specific sidebar blocks
  • author-profile.html
    • renders avatar, bio, and social/contact links
    • defaults to site.author unless page.author points into _data/authors.yml
  • footer.html
    • social links plus powered-by footer
  • scripts.html
    • loads assets/js/main.min.js
    • includes analytics and comment-provider scripts
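
The author fallback described for author-profile.html can be sketched in a few lines of Python. This is a model of the lookup rule only, not the Liquid source; the function name `resolve_author` and all sample data are hypothetical.

```python
def resolve_author(site_author: dict, authors_data: dict, page_front_matter: dict) -> dict:
    """Pick the author record to display for a page.

    A page-level `author` key wins only if it matches an entry in the
    authors data file; otherwise the site-wide author is used.
    """
    key = page_front_matter.get("author")
    if key and key in authors_data:
        return authors_data[key]
    return site_author


# Hypothetical stand-ins for site.author and _data/authors.yml.
site_author = {"name": "Site Owner"}
authors_data = {"guest": {"name": "Guest Author"}}

default_case = resolve_author(site_author, authors_data, {"title": "About"})
override_case = resolve_author(site_author, authors_data, {"author": "guest"})
unknown_case = resolve_author(site_author, authors_data, {"author": "nobody"})
```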

Archive card includes

  • archive-single.html
    • most important custom include in this repo
    • publication entries are rendered differently from normal posts
    • publication titles are not linked to an internal page by default
    • optional abstract is rendered in a <details> block
    • venue/news/icon links are shown inline
    • research items render their excerpt inside a clickable overlay container
  • archive-single-cv.html
    • simplified list entry for CV sections
  • archive-single-talk.html
    • talk-specific listing format
  • archive-single-talk-cv.html
    • talk entry formatting for CV page

Notable custom or legacy files

  • _includes/archive-single_backup.html
    • untracked backup include
    • not part of the active render path

Page Layer

Top-level navigation is defined in _data/navigation.yml and currently exposes:

  • Publications
  • Research
  • Teaching

Several other pages exist but are not in the nav:

  • home /
  • posts /posts/
  • talks /talks/
  • CV /cv/
  • software /software/
  • teaching materials /teaching-materials/
  • tags /tags/
  • terms /terms/
  • sitemap /sitemap/
  • talk map /talkmap.html

Important pages

_pages/about.md

  • serves as the home page (/)
  • large content-heavy landing page
  • mixes biography, recruiting information, office hours, projects, awards, and links to publications/assets
  • relies on author_profile: true for the sidebar

_pages/publications.md

  • loops through site.publications reversed
  • renders each item via archive-single.html
  • adds a Google Scholar link if the author config supplies it

_pages/research.md

  • manually introduces the research agenda
  • sorts research items by order_number
  • renders them using archive-single.html type="grid"
  • expects research items to have teaser/excerpt imagery
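
The two orderings used by the publications and research pages can be illustrated with a small Python sketch. It assumes that the publications collection arrives in ascending date order before being reversed; the sample items and function names are hypothetical.

```python
def newest_first(items):
    """Model `site.publications reversed`, assuming ascending date order as input."""
    return list(reversed(sorted(items, key=lambda item: item["date"])))


def by_order_number(items):
    """Model the research page's sort on the `order_number` front matter key."""
    return sorted(items, key=lambda item: item["order_number"])


# Hypothetical items; ISO date strings sort correctly as plain text.
publications = [
    {"title": "Paper B", "date": "2021-06-01"},
    {"title": "Paper A", "date": "2019-03-15"},
    {"title": "Paper C", "date": "2023-01-10"},
]
research = [
    {"title": "Project Y", "order_number": 2},
    {"title": "Project X", "order_number": 1},
]

publication_titles = [item["title"] for item in newest_first(publications)]
research_titles = [item["title"] for item in by_order_number(research)]
```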

_pages/teaching.md

  • fully hand-authored page rather than derived from site.teaching
  • current content is prose plus manual course listings

_pages/cv.md

  • uses archive layout
  • embeds a PDF
  • also loops through site.publications, site.talks, and site.teaching
  • contains clearly stale imported content from a different site in several sections

_pages/software.md and _pages/teaching-materials.md

  • both appear to be inherited or copied from another academicpages-derived site
  • they are functional pages, but the content does not match the rest of this repository’s subject matter

Content Model

Posts: _posts/

  • standard Jekyll blog posts
  • rendered through single.html
  • surfaced on /posts/
  • grouped by year in _pages/posts.html
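
Grouping by year is done in Liquid on the posts page; the following Python sketch shows the same idea under the assumption that posts are listed newest first within each year. The function name and sample posts are hypothetical.

```python
from datetime import date
from itertools import groupby


def group_by_year(posts):
    """Return {year: [titles]} with years and posts ordered newest first."""
    ordered = sorted(posts, key=lambda post: post["date"], reverse=True)
    return {
        year: [post["title"] for post in group]
        for year, group in groupby(ordered, key=lambda post: post["date"].year)
    }


# Hypothetical posts used only to exercise the grouping.
posts = [
    {"title": "Post 1", "date": date(2020, 3, 1)},
    {"title": "Post 2", "date": date(2021, 7, 9)},
    {"title": "Post 3", "date": date(2020, 11, 20)},
]
archive = group_by_year(posts)
```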

Publications: _publications/

Publication files follow a fairly regular front matter schema:

  • title
  • collection: publications
  • permalink
  • optional excerpt
  • date
  • venue
  • optional link
  • optional paperurl
  • citation
  • sometimes abstract, news, code, github
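
A schema like this lends itself to a quick consistency check. The Python sketch below treats the keys not marked optional in the list above as required; that split is an inference from the list, not a rule enforced by the site, and the function name and sample data are hypothetical.

```python
# Keys listed without an "optional" or "sometimes" qualifier are treated as required.
REQUIRED_KEYS = ("title", "collection", "permalink", "date", "venue", "citation")
OPTIONAL_KEYS = ("excerpt", "link", "paperurl", "abstract", "news", "code", "github")


def check_front_matter(front_matter: dict) -> dict:
    """Report required keys that are absent and keys outside the known schema."""
    known = set(REQUIRED_KEYS) | set(OPTIONAL_KEYS)
    missing = [key for key in REQUIRED_KEYS if key not in front_matter]
    unknown = sorted(key for key in front_matter if key not in known)
    return {"missing": missing, "unknown": unknown}


# Hypothetical front matter with one required key missing and one unexpected key.
sample = {
    "title": "An Example Paper",
    "collection": "publications",
    "permalink": "/publications/example-paper",
    "date": "2020-01-15",
    "venue": "Example Venue",
    "slides": "/files/example.pdf",
}
report = check_front_matter(sample)
```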

Publications are a first-class collection and are surfaced correctly through the site.

Research: _research/

Research entries use:

  • layout: single-portfolio
  • collection: research
  • order_number
  • excerpt that often contains raw HTML image markup
  • header.og_image

These files are treated as project cards on the /research/ page.

Important caveat:

  • Because research.output is currently false, the collection can be iterated but its detail pages are not emitted.
  • archive-single.html still links research excerpts to post.url.
  • That means the listing page is designed as if research detail pages exist, while config currently disables them.

Talks: _talks/

  • currently only one talk file exists
  • rendered through talk.html
  • surfaced on /talks/
  • talkmap_link is disabled in config, so the talk map is not linked from the talks page

Teaching: _teaching/

  • configured as a collection with output enabled
  • current files look like template/demo entries and do not correspond to the hand-authored teaching page
  • there is a split between the collection and the custom /teaching/ page content

Short papers: _shortpapers/

This is the largest architectural inconsistency currently in the repo.

  • The folder name starts with _, so Jekyll skips it by default unless it is registered as a collection or explicitly included.
  • The folder is not listed in collections: in _config.yml.
  • Jekyll collection membership is determined by directory/configuration, not by front matter alone.
  • Even though the files inside _shortpapers/ declare collection: publications, they do not become publication items just because of that front matter.

Operationally, these files are best treated as dormant content unless the site config is updated to ingest them explicitly.
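
A check for this kind of drift can be sketched in Python by comparing underscore-prefixed directory names against the configured collections. The directory list and collection names restate what this document reports; the set of "known special" directories and the function name are assumptions of the sketch.

```python
def unregistered_underscore_dirs(dir_names, collections, known_special):
    """Return underscore-prefixed directories that are neither collections nor known special dirs."""
    flagged = []
    for name in dir_names:
        if not name.startswith("_"):
            continue
        if name in known_special or name[1:] in collections:
            continue
        flagged.append(name)
    return sorted(flagged)


# Top-level directories as listed in this document.
TOP_LEVEL_DIRS = [
    "_pages", "_posts", "_publications", "_research", "_talks", "_teaching",
    "_layouts", "_includes", "_data", "_sass", "_site", "_source",
    "_shortpapers", "assets", "files", "images",
]
COLLECTIONS = {"publications", "research", "talks", "teaching"}
# Assumption: directories Jekyll or the site config already gives a fixed role.
KNOWN_SPECIAL = {"_pages", "_posts", "_layouts", "_includes", "_data", "_sass", "_site"}

flagged = unregistered_underscore_dirs(TOP_LEVEL_DIRS, COLLECTIONS, KNOWN_SPECIAL)
```

Under these assumptions the check flags `_shortpapers` and also `_source`, which matches the description of `_source/` as sitting outside the active build path.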

Styling and Frontend Behavior

CSS pipeline

assets/css/main.scss is the single SCSS entrypoint. It imports:

  • theme primitives from _sass/
  • font and popup vendor styles
  • _sass/_custom.scss

_sass/_custom.scss contains actual repo-specific styling:

  • .zoom hover enlargement for icon links
  • .container and .overlay styling used by research card excerpts

JavaScript pipeline

The active JS entrypoint is assets/js/main.min.js, produced from:

  • vendor jQuery
  • several jQuery plugins
  • assets/js/_main.js

assets/js/_main.js handles:

  • sticky footer spacing
  • responsive sticky sidebar behavior
  • author profile “Contact” toggle
  • smooth scrolling
  • image lightbox initialization via Magnific Popup

Apparently unused collapse assets

  • assets/js/collapse.js
  • assets/css/collapse.css

I did not find these referenced from the active layout/includes. They look like leftover assets or assets intended for manually embedded HTML, not for the main site shell.

Data and Static Assets

_data/

  • _data/navigation.yml: top nav items
  • _data/ui-text.yml: theme text strings
  • _data/authors.yml: stock template data; appears unused because no page sets author: ...

files/

This is the download bucket for the site:

  • CV PDFs
  • research/teaching statements
  • award documents
  • publication PDFs
  • static HTML teaching materials under files/html/

There are also mirrored/generated-looking subdirectories such as files/_site/, which appear to be legacy output rather than primary source.

images/

Contains:

  • profile/headshot images
  • favicon/browser assets
  • research illustrations
  • software gallery images
  • some _site mirrored assets that look generated or copied

Content-Generation Utilities

markdown_generator/

This folder is the repo’s main automation area for content authoring.

Files of note:

  • publications.py
    • generates Markdown files from publications.tsv
  • talks.py
    • generates talk Markdown files from talks.tsv
  • pubsFromBib.py
    • generates publication Markdown from BibTeX
  • *.ipynb
    • notebook versions of the same workflows

These scripts write directly into:

  • _publications/
  • _talks/

This is the most code-like part of the repository outside the theme assets.
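
The general shape of such a generator is: read one TSV row, build front matter, and derive a dated filename. The Python sketch below illustrates that shape only. The column names (`pub_date`, `title`, `venue`, `citation`, `url_slug`), the permalink format, and the helper names are assumptions for illustration and have not been checked against publications.py or publications.tsv.

```python
import csv
import io


def yaml_quote(text: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def publication_file(row: dict) -> tuple:
    """Return (filename, markdown_text) for one TSV row."""
    stem = f"{row['pub_date']}-{row['url_slug']}"
    front_matter = [
        "---",
        "title: " + yaml_quote(row["title"]),
        "collection: publications",
        f"permalink: /publications/{stem}",
        f"date: {row['pub_date']}",
        "venue: " + yaml_quote(row["venue"]),
        "citation: " + yaml_quote(row["citation"]),
        "---",
    ]
    return stem + ".md", "\n".join(front_matter) + "\n"


# Hypothetical one-row TSV; the column names are assumptions of this sketch.
SAMPLE_TSV = (
    "pub_date\ttitle\tvenue\tcitation\turl_slug\n"
    "2020-01-15\tAn Example Paper\tExample Venue\tA. Author (2020).\texample-paper\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
filename, text = publication_file(rows[0])
```

A real run would write `text` to `_publications/` under `filename`; the sketch stops short of touching the filesystem so it can be tried safely.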

talkmap/

  • standalone Leaflet map page
  • location data stored in org-locations.js
  • page embedded through _pages/talkmap.html
  • not integrated deeply into the main build system beyond the iframe page

_source/

Contains .Rmd files and a CSV source file. This suggests an older or parallel content-generation pipeline for some posts or visualizations, but it is not directly part of the active Jekyll build path.

Operational Caveats and Maintenance Notes

1. _shortpapers/ is not wired into Jekyll

This is the biggest structural issue in the repo as currently checked in.

  • Content exists.
  • The directory name suggests “source content”.
  • The files resemble publication entries.
  • The config does not register the collection.

Result: those files are very likely ignored by the active site build.

2. Research collection configuration conflicts with research page behavior

  • _pages/research.md renders research items as if they are clickable projects.
  • archive-single.html links research excerpts to post.url.
  • _config.yml sets research.output: false.

If the intent is a grid of cards linking to full research pages, current config works against that intent.

3. The repo contains significant template residue and stale imported content

Examples:

  • _pages/software.md
  • _pages/teaching-materials.md
  • parts of _pages/cv.md
  • _data/authors.yml
  • portfolio/ untracked sample content
  • publications_backup/ untracked backups
  • _includes/archive-single_backup.html

These do not all break the site, but they blur the boundary between active source and inherited template material.

4. Generated or mirrored output exists outside _site/

Examples:

  • images/_site/
  • files/_site/

These are easy to mistake for primary source files. Before editing or deleting anything there, confirm whether the live site actually references it.
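
Directories like these can be located mechanically. The Python sketch below walks a tree and reports any `_site` directory that is nested below the root, skipping the real top-level build output; the function name is hypothetical, and the demonstration runs against a throwaway directory rather than the repository itself.

```python
import tempfile
from pathlib import Path


def stray_site_dirs(repo_root) -> list:
    """List `_site` directories nested below the repo root, as POSIX-style relative paths."""
    root = Path(repo_root)
    found = []
    for candidate in root.rglob("_site"):
        relative = candidate.relative_to(root)
        nested = len(relative.parts) > 1
        inside_build_output = relative.parts[0] == "_site"
        if candidate.is_dir() and nested and not inside_build_output:
            found.append(relative.as_posix())
    return sorted(found)


# Demonstration on a temporary tree that mimics the layout described here.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("_site", "images/_site", "files/_site", "files/html"):
        (Path(tmp) / sub).mkdir(parents=True)
    strays = stray_site_dirs(tmp)
```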

5. .gitignore is minimal

Current ignore rules do not cover:

  • .DS_Store
  • notebook checkpoints
  • backup directories
  • ad hoc generated files

That is why local junk shows up easily in git status.

6. No test suite or CI configuration is present

  • there is no .github/workflows/
  • there are no automated tests
  • validation is effectively “does Jekyll build and does the rendered page look right”

For this kind of repository that is common, but it means breakage is mostly caught manually.

7. Current local build is blocked on Ruby/Jekyll compatibility

When reviewed in this workspace, the following command failed:

bundle exec jekyll build --config _config.yml,_config.dev.yml --destination /tmp/academic-webpages-doc-check

Observed failure mode:

  • Jekyll 3.9.0 is running under Ruby 4.0.x
  • Ruby no longer ships csv as a default gem (since Ruby 3.4 it is a bundled gem), so under Bundler it must be declared in the Gemfile
  • Jekyll fails with cannot load such file -- csv

Practical implication:

  • the repository structure is valid as a Jekyll site
  • but the current local environment cannot build it without dependency adjustment

The lowest-friction fix is likely to add gem "csv" to the Gemfile, or to run the build under an older Ruby version that Jekyll 3.9.0 supports.

8. Frontend build tooling is old and theme-oriented

package.json still identifies the project as minimal-mistakes version 3.4.2. The Node tooling is only used for minification and is not central to development, but it reflects the template’s age.

Practical Mental Model for Future Changes

If you need to change the site, use this sequence:

  1. Check _config.yml first for collection behavior, output flags, permalinks, author data, and plugins.
  2. Check _data/navigation.yml next if the change affects the top navigation.
  3. Edit _pages/ if the change is a top-level section or landing page.
  4. Edit collection files if the change is item-level content.
  5. Edit _includes/archive-single*.html if list display behavior changes.
  6. Edit _layouts/*.html if page shell or detail-page structure changes.
  7. Edit _sass/_custom.scss if the change is site-specific styling.
  8. Rebuild assets/js/main.min.js only if _main.js or JS plugins change.

Best Entry Points by Goal

Add or edit a publication

  • edit a file in _publications/
  • or regenerate content from markdown_generator/

Add or edit a research area card

  • edit a file in _research/
  • if you need clickable detail pages, revisit research.output in _config.yml

Change the homepage

  • edit _pages/about.md

Change navigation

  • edit _data/navigation.yml

Change publication card formatting

  • edit _includes/archive-single.html
  • edit _config.yml

Change author sidebar content

  • edit author: fields in _config.yml
  • or wire pages to _data/authors.yml if multi-author behavior is desired

Summary

This is a static academic site whose real “application” is the Jekyll content model plus a lightly customized academicpages theme. The repository is understandable once you view it through that lens:

  • content lives in Markdown collections
  • layout logic lives in Liquid includes/layouts
  • presentation lives in Sass and a small jQuery bundle
  • automation is limited to publication/talk generation scripts

The main maintenance risk is not complexity. It is ambiguity: some directories are active source, some are generated artifacts, some are imported template residue, and at least one major content directory (_shortpapers/) is currently not wired into the build.