Science in the Moment

Live discovery feed of new papers and their citation context, with topic maps and influence trails.

New works (query: machine learning)

  • Genetic algorithms in search, optimization, and machine learning
    Choice Reviews Online • 1989
  • C4.5: Programs for Machine Learning
    J. R. Quinlan • 1992
  • Gaussian Processes for Machine Learning
    Carl Edward Rasmussen, Christopher K. I. Williams • 2005
  • UCI Machine Learning Repository
    Arthur Asuncion • 2007
  • Pattern Recognition and Machine Learning
    Christopher Bishop • Journal of Electronic Imaging • 2007
  • Data Mining: Practical Machine Learning Tools and Techniques
    Ian H. Witten, Eibe Frank, Mark A. Hall • Elsevier eBooks • 2011
  • Genetic algorithms in search, optimization, and machine learning
    David E. Goldberg • 1988
  • Machine Learning: A Probabilistic Perspective
    Kevin P. Murphy • 2012
  • UCI Repository of machine learning databases
    Catherine Blake • 1998
  • Programs for Machine Learning
    Steven L. Salzberg, Alberto M. Segre • 1994
  • Proceedings of the 24th international conference on Machine learning
    John Langford, Joëlle Pineau • 2007
  • TensorFlow: A system for large-scale machine learning
    Martín Abadi, Paul Barham, Jianmin Chen • arXiv (Cornell University) • 2016
  • Machine learning: Trends, perspectives, and prospects
    Michael I. Jordan, Tom M. Mitchell • Science • 2015
  • Machine learning in automated text categorization
    Fabrizio Sebastiani • ACM Computing Surveys • 2002
  • Pattern Recognition and Machine Learning
    Springer eBooks • 2006
  • Pattern Recognition and Machine Learning
    W.R. Howard • Kybernetes • 2007
  • Ensemble Methods in Machine Learning
    Thomas G. Dietterich • Lecture notes in computer science • 2000
  • TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
    Martín Abadi, Ashish Agarwal, Paul Barham • arXiv (Cornell University) • 2016
  • Scikit-learn: Machine Learning in Python
    Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort • Journal of Machine Learning Research • 2011
Provenance: OpenAlex

Build goals

Live discovery feed of new papers and their citation context, with topic maps and influence trails.

Stack

  • Frontend: React 18, Mapbox GL or deck.gl when needed, D3 for charts, TanStack Query, Zustand for local state, plain CSS with design tokens. No runtime CSS frameworks.
  • API: Python 3.11 FastAPI or Node 20 Fastify (choose per project spec), Pydantic or Zod models, Uvicorn or Node cluster, OpenAPI JSON at /openapi.json.
  • Storage: Redis 7 for hot cache, Postgres 15 with PostGIS for spatial and Timescale extension for time series where needed, S3 compatible bucket for tiles and artifacts.
  • Ingest: Async fetchers using conditional requests (ETag or Last-Modified), paging, retry with backoff and jitter, circuit breakers, structured logs.
  • Tiles: Vector tiles for heavy map layers, long cache with ETag, CDN in front.
  • Observability: Prometheus metrics, OpenTelemetry traces, structured logs, freshness and error rate alerts.
  • Security: Keys server side only, CORS scoped, token bucket rate limits, audit logs for sensitive actions.
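
The "retry with backoff and jitter" item in the ingest line can be sketched as follows. This is a minimal illustration, not the project's fetcher: the function names, the default base and cap, and the choice of "full jitter" (a uniform draw between zero and the exponential ceiling) are all assumptions made for the example.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 0.5, cap: float = 30.0, attempts: int = 5,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one sleep duration per retry: capped exponential backoff with full jitter."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: draw uniformly in [0, ceiling] so clients do not retry in lockstep.
        yield rng.uniform(0.0, ceiling)


def retry(call: Callable[[], T],
          is_retryable: Callable[[Exception], bool],
          sleep: Callable[[float], None] = time.sleep,
          **backoff: float) -> T:
    """Run call(); on a retryable error, sleep for the next delay and try again.

    `attempts` (passed through **backoff) is the number of retries, so call()
    runs at most attempts + 1 times before the last error is re-raised.
    """
    delays = backoff_delays(**backoff)
    while True:
        try:
            return call()
        except Exception as exc:
            delay = next(delays, None)
            if delay is None or not is_retryable(exc):
                raise
            sleep(delay)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; an async fetcher would swap in an awaitable equivalent.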

Data sources

Source            Endpoint                          Cadence   Access     Auth  Notes
arXiv API         export.arxiv.org/api/query        daily     Atom       None  Preprints and abstracts
Crossref REST     api.crossref.org                  frequent  REST JSON  None  DOI metadata and citations
OpenAlex          api.openalex.org                  frequent  REST JSON  None  Works, authors, venues, citations
Semantic Scholar  api.semanticscholar.org/graph/v1  frequent  REST JSON  None  Citations and related
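
As an illustration of how an adapter might address one of these sources, the sketch below builds a Crossref works query that identifies the caller with a `mailto` parameter, which is how Crossref routes requests to its polite pool. The function name and the example address are hypothetical; only the URL is constructed, no request is sent.

```python
from urllib.parse import urlencode


def crossref_works_url(query: str, mailto: str, rows: int = 20) -> str:
    """Build a Crossref /works search URL that opts in to the polite pool.

    `query`, `rows`, and `mailto` are Crossref REST query parameters; the
    contact address should be a real mailbox owned by the operator.
    """
    params = {"query": query, "rows": rows, "mailto": mailto}
    return "https://api.crossref.org/works?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical contact address, for illustration only.
    print(crossref_works_url("machine learning", "ops@example.org", rows=5))
```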

Architecture

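The API is Python FastAPI. Topics come from embedding-based topic models with seeded randomness so that runs are reproducible. Cross-ID resolution links the DOI, arXiv, and OpenAlex identifiers of the same work. Elasticsearch serves search. Provider responses are cached, and providers are called through their polite pools.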
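
Cross-ID resolution could start from identifier normalization plus a tiered match score, roughly as below. This is a sketch under stated assumptions: the function names, the record shape (plain dicts with `doi`, `arxiv_id`, `title`, `year`), and the confidence values 1.0, 0.95, and 0.6 are invented for illustration and are not taken from the project.

```python
import re
from typing import Optional

_DOI_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)", re.IGNORECASE)
_ARXIV_PREFIX = re.compile(r"^(https?://arxiv\.org/abs/|arxiv:)", re.IGNORECASE)
_ARXIV_VERSION = re.compile(r"v\d+$")


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Strip resolver prefixes and lowercase (DOIs are case-insensitive)."""
    if not raw:
        return None
    return _DOI_PREFIX.sub("", raw.strip()).lower() or None


def normalize_arxiv_id(raw: Optional[str]) -> Optional[str]:
    """Strip URL or scheme prefixes and the trailing version suffix (v1, v2, ...)."""
    if not raw:
        return None
    return _ARXIV_VERSION.sub("", _ARXIV_PREFIX.sub("", raw.strip())) or None


def _title_key(title: Optional[str]) -> str:
    # Lowercase and collapse punctuation so small formatting differences do not matter.
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def match_confidence(a: dict, b: dict) -> float:
    """Return an illustrative confidence that two provider records are the same work."""
    doi_a, doi_b = normalize_doi(a.get("doi")), normalize_doi(b.get("doi"))
    if doi_a and doi_a == doi_b:
        return 1.0  # assumed: an exact DOI match is treated as certain
    ax_a, ax_b = normalize_arxiv_id(a.get("arxiv_id")), normalize_arxiv_id(b.get("arxiv_id"))
    if ax_a and ax_a == ax_b:
        return 0.95  # assumed: same preprint, version ignored
    title_a, title_b = _title_key(a.get("title")), _title_key(b.get("title"))
    if title_a and title_a == title_b and a.get("year") is not None and a.get("year") == b.get("year"):
        return 0.6  # assumed: title and year alone are weak evidence
    return 0.0
```

A resolver built on this would merge records only above a chosen threshold and keep the score on the merged row, so low-confidence links can be reviewed.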

Models

Models are expressed in DB tables and mirrored as API schemas. All timestamps are UTC. All coordinates are WGS84. IDs are stable; where soft deletes are needed, a row is retired by setting valid_to rather than being removed.

  • work(id, title, abstract, ts, authors[], topics[], doi, arxiv_id, openalex_id)
  • edge(src, dst, weight, kind)
  • author(id, name, works[])
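
The three tables above could be mirrored in code along these lines. The spec names Pydantic for API schemas; this sketch swaps in standard-library dataclasses to stay dependency-free. The field types, the optional `valid_to` column on `Work`, and the `is_live` helper are assumptions, since the source lists column names only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


def _require_utc(name: str, value: Optional[datetime]) -> None:
    # Naive datetimes have utcoffset() None, so they are rejected as well.
    if value is not None and value.utcoffset() != timedelta(0):
        raise ValueError(f"{name} must be a timezone-aware UTC datetime")


@dataclass(frozen=True)
class Work:
    id: str
    title: str
    ts: datetime
    abstract: Optional[str] = None
    authors: tuple[str, ...] = ()
    topics: tuple[str, ...] = ()
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None
    openalex_id: Optional[str] = None
    valid_to: Optional[datetime] = None  # assumed soft-delete column

    def __post_init__(self) -> None:
        _require_utc("ts", self.ts)
        _require_utc("valid_to", self.valid_to)

    def is_live(self, at: datetime) -> bool:
        """A row is live until its valid_to instant, if one is set."""
        return self.valid_to is None or at < self.valid_to


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    weight: float
    kind: str  # for example "cites"; the set of kinds is not given in the source


@dataclass(frozen=True)
class Author:
    id: str
    name: str
    works: tuple[str, ...] = ()
```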

Algorithms

  • Topic clustering with stability constraints
  • Cross ID resolution with confidence scores
  • Citation velocity scoring by recent cites

API surface

  • GET /works?q=&since=&until=&topic=&author=&venue=&page=
  • GET /citations?work_id=
  • GET /topics?since=&until=

UI and visualization

  • Force directed topic map with performance tuned forces
  • Animated citation trails
  • Semantic search with keyboard shortcuts and saved collections

Performance budgets

  • Map interactions p95 under 16 ms with 5k nodes
  • Search paging p95 under 300 ms
  • First Contentful Paint (FCP) under 2 s on a mid-tier laptop over broadband.
  • API p95 under 300 ms for common list endpoints, p99 under 800 ms.
  • Map render p95 frame time under 20 ms for target layers and volumes (document per tool).
  • Frontend app code under 180 KB gzip excluding map library.
  • API memory under 200 MB under normal load.
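
The percentile budgets above can be checked mechanically against recorded latency samples. The sketch below uses the nearest-rank definition of a percentile, which is an assumption: load-testing tools may interpolate instead, so values near a threshold can differ slightly. The function names are hypothetical.

```python
import math
from typing import Mapping, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_budgets(samples_ms: Sequence[float],
                  budgets_ms: Mapping[float, float]) -> dict[float, bool]:
    """Map each percentile to True when it is strictly under its budget in milliseconds."""
    return {pct: percentile(samples_ms, pct) < limit for pct, limit in budgets_ms.items()}


if __name__ == "__main__":
    # Budgets from the list above: p95 under 300 ms, p99 under 800 ms.
    latencies = [120.0] * 90 + [900.0] * 10
    print(check_budgets(latencies, {95: 300.0, 99: 800.0}))
```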

Accessibility

  • WCAG 2.2 AA, automated axe checks clean, no critical issues.
  • Keyboard navigable controls, focus rings visible, ARIA roles correct.
  • Color contrast at or above 4.5 to 1, colorblind safe palettes.
  • Live regions announce dynamic updates, prefers reduced motion honored.

Evidence pack and quality gates

  • Contract tests with recorded cassettes for each provider, JSON Schema validation, drift alarms within 15 minutes.
  • Load tests with k6, thresholds enforced in CI for p95 and p99.
  • Lighthouse performance and a11y reports stored as CI artifacts.
  • Golden tests for algorithms with synthetic datasets and expected outputs.
  • Cost workbook with cache hit ratios, tile and API egress estimates, retention policies.
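
The drift alarms on contract tests need some way to compare a live provider response with a recorded cassette. The source calls for JSON Schema validation; the sketch below substitutes a much simpler structural diff (field names and value types only) to show the idea without a schema library. The function names and the sample payloads are hypothetical.

```python
from typing import Any, List


def shape(value: Any) -> Any:
    """Reduce a JSON value to its structure: keys, nesting, and leaf type names."""
    if isinstance(value, dict):
        return {key: shape(item) for key, item in value.items()}
    if isinstance(value, list):
        # Assume homogeneous lists and describe them by their first element.
        return [shape(value[0])] if value else []
    return type(value).__name__


def diff_shapes(expected: Any, actual: Any, path: str = "$") -> List[str]:
    """List the paths at which two shapes differ, in sorted key order."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        problems: List[str] = []
        for key in sorted(expected.keys() | actual.keys()):
            if key not in actual:
                problems.append(f"{path}.{key}: missing")
            elif key not in expected:
                problems.append(f"{path}.{key}: unexpected")
            else:
                problems.extend(diff_shapes(expected[key], actual[key], f"{path}.{key}"))
        return problems
    if isinstance(expected, list) and isinstance(actual, list):
        if expected and actual:
            return diff_shapes(expected[0], actual[0], f"{path}[]")
        return []
    if expected != actual:
        return [f"{path}: expected {expected}, got {actual}"]
    return []


def detect_drift(recorded: Any, live: Any) -> List[str]:
    return diff_shapes(shape(recorded), shape(live))
```

A non-empty result from `detect_drift` is what would raise the alarm; unexpected new fields might be downgraded to a warning, since they rarely break an adapter.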

CI configuration

name: ci
on: [push, pull_request]
jobs:
  api:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgis/postgis:15-3.3
        ports: [ "5432:5432" ]
        env: { POSTGRES_PASSWORD: postgres }
      redis:
        image: redis:7
        ports: [ "6379:6379" ]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "20" }
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
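      # Quote the extras so the shell does not treat [dev] as a glob pattern.
      - run: pip install -e "packages/api[dev]"
      # No "|| true": a failed install, schema load, or test run must fail the job.
      - run: psql postgresql://postgres:postgres@localhost:5432/postgres -v ON_ERROR_STOP=1 -f packages/api/src/db/schema.sql
      - run: pytest -q packages/api/src/tests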
      - run: cd packages/web && npm ci && npm run build && npm test --silent

Risks and mitigations

  • Risk: topic clusters change between runs. Mitigation: fix random seeds and cache embeddings.
  • Risk: provider rate limits. Mitigation: use the providers' polite pools and cache responses.
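
The first mitigation can be made concrete with a seeded clustering run and a stability measure. The sketch below uses a small pure-Python k-means as a stand-in for the project's embedding-based topic model, which the source does not specify, and the Rand index to compare two labelings regardless of how cluster numbers are assigned. All names and defaults are assumptions.

```python
import random
from itertools import combinations
from typing import List, Sequence

Point = Sequence[float]


def kmeans(points: Sequence[Point], k: int, seed: int, iterations: int = 20) -> List[int]:
    """Lloyd's algorithm with seeded initial centroids; returns one label per point."""
    rng = random.Random(seed)  # a fixed seed makes the initial centroids reproducible
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of point pairs that both labelings treat alike (together or apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)
```

With the seed fixed, repeated runs on the same cached embeddings return identical labels. Comparing runs across different seeds with `rand_index` gives a rough stability figure that a golden test could bound from below.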

Acceptance checklist

  • CI green on main, all quality gates met.
  • Freshness SLOs met for hot regions or feeds.
  • Performance budgets met or better.
  • A11y audits pass with zero critical findings.
  • Provenance and license panels render correct metadata.
  • Runbook covers stale feed handling, provider errors, and key rotation.

Implementation sequence

  • Adapters and schemas, ID resolution suite
  • Topics and velocity scoring with golden tests
  • Search API, maps, trails, and collections
  • Evidence pack and a11y audits

Runbook

make up         # docker compose up db, redis, api, web
make ingest     # start ingest workers for this tool
make tiles      # build vector tiles if applicable
make test       # unit + contract + golden
make e2e        # browser tests