Science in the Moment

Live discovery feed of new papers and their citation context, with topic maps and influence trails.

New works (query: machine learning)

Scikit-learn: Machine Learning in Python
Fabián Pedregosa, Gaël Varoquaux, Alexandre Gramfort • arXiv (Cornell University) • 2012
Genetic algorithms in search, optimization, and machine learning
• Choice Reviews Online • 1989
C4.5: Programs for Machine Learning
J. R. Quinlan • 1992
Data Mining: Practical Machine Learning Tools and Techniques
Ian H. Witten, Eibe Frank, Mark A. Hall • Elsevier eBooks • 2011
UCI Machine Learning Repository
Arthur Asuncion • Medical Entomology and Zoology • 2007
Pattern Recognition and Machine Learning
Christopher Bishop • Journal of Electronic Imaging • 2007
Genetic Algorithms in Search, Optimization and Machine Learning
David E. Goldberg • 1988
Gaussian Processes for Machine Learning
Carl Edward Rasmussen, Christopher K. I. Williams • The MIT Press eBooks • 2005
Proceedings of the 24th international conference on Machine learning
• 2007
Machine learning: a probabilistic perspective
Kevin P. Murphy • 2012
Machine learning: Trends, perspectives, and prospects
Michael I. Jordan, Tom M. Mitchell • Science • 2015
Pattern Recognition and Machine Learning
• 2006
TensorFlow: A system for large-scale machine learning
Martín Abadi, Paul Barham, Jianmin Chen • arXiv (Cornell University) • 2016
Scikit-learn: Machine Learning in Python
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort • Journal of Machine Learning Research • 2011
UCI Repository of machine learning databases
Catherine Blake • Medical Entomology and Zoology • 1998
Machine learning in automated text categorization
Fabrizio Sebastiani • ACM Computing Surveys • 2002
Programs for Machine Learning
Steven L. Salzberg, Alberto M. Segre • 1994
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Martín Abadi, Ashish Agarwal, Paul Barham • arXiv (Cornell University) • 2016
Pattern Recognition and Machine Learning
• Kybernetes • 2007
Ensemble Methods in Machine Learning
Thomas G. Dietterich • Lecture notes in computer science • 2000

Provenance: OpenAlex

Build goals

Live discovery feed of new papers and their citation context, with topic maps and influence trails.

Stack

Frontend: React 18, Mapbox GL or deck.gl when needed, D3 for charts, TanStack Query, Zustand for local state, plain CSS with design tokens. No runtime CSS frameworks.
API: Python 3.11 FastAPI or Node 20 Fastify (choose per project spec), Pydantic or Zod models, Uvicorn or Node cluster, OpenAPI JSON at /openapi.json.
Storage: Redis 7 for hot cache, Postgres 15 with PostGIS for spatial data and the Timescale extension for time series where needed, S3-compatible bucket for tiles and artifacts.
Ingest: Async fetchers with conditional requests (ETag or Last-Modified), paging, retry with backoff and jitter, circuit breakers, structured logs.
Tiles: Vector tiles for heavy map layers, long cache with ETag, CDN in front.
Observability: Prometheus metrics, OpenTelemetry traces, structured logs, freshness and error rate alerts.
Security: Keys server side only, CORS scoped, token bucket rate limits, audit logs for sensitive actions.
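
The "retry with backoff and jitter" item in the Ingest line can be sketched as follows. This is a minimal illustration, not the project's fetcher: the names TransientError, backoff_delays, and fetch_with_retry are hypothetical, and the base delay, cap, and retry count are assumed values. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, HTTP 429 or 503)."""


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """One full-jitter delay per retry: uniform in [0, min(cap, base * 2**n)]."""
    if rng is None:
        rng = random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** n)) for n in range(retries)]


def fetch_with_retry(fetch: Callable[[], T], retries: int = 4,
                     sleep: Callable[[float], None] = time.sleep,
                     rng: Optional[random.Random] = None) -> T:
    """Call fetch(); on TransientError wait a jittered delay and try again.

    The last error is re-raised once all retries are used up. Any other
    exception type propagates immediately without a retry.
    """
    delays = backoff_delays(retries, rng=rng)
    for attempt in range(retries + 1):
        try:
            return fetch()
        except TransientError:
            if attempt == retries:
                raise
            sleep(delays[attempt])
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Passing sleep and rng as parameters keeps the function testable without real waiting; a production fetcher would likely wrap this in a circuit breaker as the Ingest line describes.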

Data sources

Source	Endpoint	Cadence	Access	Auth	Notes
arXiv API	export.arxiv.org/api/query	daily	Atom	None	Preprints and abstracts
Crossref REST	api.crossref.org	frequent	REST JSON	None	DOI metadata and citations
OpenAlex	api.openalex.org	frequent	REST JSON	None	Works, authors, venues, citations
Semantic Scholar	api.semanticscholar.org/graph/v1	frequent	REST JSON	None	Citations and related
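
As a rough illustration of how two of the endpoints in the table might be queried, the sketch below builds request URLs with the standard library only. The function names and the contact address are hypothetical. The parameters used (search_query, start, max_results, sortBy, sortOrder for arXiv; search, page, per-page, mailto for OpenAlex) are ones those providers have documented, but they should be checked against the current provider docs before use.

```python
from urllib.parse import urlencode

ARXIV_ENDPOINT = "https://export.arxiv.org/api/query"
OPENALEX_WORKS = "https://api.openalex.org/works"


def arxiv_url(terms: str, start: int = 0, max_results: int = 25) -> str:
    """Build an arXiv Atom query URL, newest submissions first."""
    params = {
        "search_query": f"all:{terms}",
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_ENDPOINT}?{urlencode(params)}"


def openalex_url(terms: str, contact_email: str,
                 page: int = 1, per_page: int = 25) -> str:
    """Build an OpenAlex works search URL.

    The mailto parameter identifies the caller for polite-pool access.
    """
    params = {
        "search": terms,
        "page": page,
        "per-page": per_page,
        "mailto": contact_email,
    }
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


if __name__ == "__main__":
    print(arxiv_url("machine learning"))
    print(openalex_url("machine learning", "team@example.org"))
```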

Architecture

Python FastAPI service; embedding-based topic models with seeded randomness; cross-ID resolution across DOI, arXiv, and OpenAlex identifiers; Elasticsearch for search; response caching and polite-pool usage for provider APIs.
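
Cross-ID resolution could look roughly like the sketch below: normalize each identifier, then score a candidate pair. Everything here is an assumption for illustration. The function names are hypothetical, the confidence values (1.0, 0.9, and a 0.8 cap for title similarity) are arbitrary, only new-style arXiv IDs are handled, and the sample identifiers in the usage lines are made up.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

_DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
_ARXIV_RE = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")  # new-style IDs only
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> Optional[str]:
    """Lowercase a DOI and strip resolver prefixes; None if it is not DOI-shaped."""
    s = raw.strip().lower()
    for prefix in _DOI_PREFIXES:
        if s.startswith(prefix):
            s = s[len(prefix):]
            break
    return s if _DOI_RE.match(s) else None


def normalize_arxiv_id(raw: str) -> Optional[str]:
    """Strip 'arXiv:' or abs-URL prefixes and the version suffix."""
    s = re.sub(r"^(arxiv:|https?://arxiv\.org/abs/)", "", raw.strip(),
               flags=re.IGNORECASE)
    m = _ARXIV_RE.match(s)
    return m.group(1) if m else None


def match_confidence(a: dict, b: dict) -> float:
    """Assumed scoring: shared DOI 1.0, shared arXiv ID 0.9, else title similarity * 0.8."""
    doi_a = normalize_doi(a.get("doi") or "")
    doi_b = normalize_doi(b.get("doi") or "")
    if doi_a and doi_a == doi_b:
        return 1.0
    ax_a = normalize_arxiv_id(a.get("arxiv_id") or "")
    ax_b = normalize_arxiv_id(b.get("arxiv_id") or "")
    if ax_a and ax_a == ax_b:
        return 0.9
    title_a = " ".join((a.get("title") or "").lower().split())
    title_b = " ".join((b.get("title") or "").lower().split())
    if not title_a or not title_b:
        return 0.0
    return round(0.8 * SequenceMatcher(None, title_a, title_b).ratio(), 3)


if __name__ == "__main__":
    left = {"doi": "https://doi.org/10.1234/EXAMPLE.5678"}  # made-up DOI
    right = {"doi": "doi:10.1234/example.5678"}
    print(match_confidence(left, right))
```

Capping title-only matches below the identifier scores keeps a fuzzy match from ever outranking an exact identifier match.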

Models

Models are expressed in DB tables and mirrored as API schemas. All timestamps are UTC. All coordinates are WGS84. Stable IDs, soft deletes by valid_to when needed.

work(id, title, abstract, ts, authors[], topics[], doi, arxiv_id, openalex_id)
edge(src, dst, weight, kind)
author(id, name, works[])
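
One way to mirror these three tables as typed schemas is sketched below with standard-library dataclasses; the Stack section names Pydantic for this, so treat the choice as a dependency-free stand-in. Field types are assumptions (the source lists only field names), and the UTC check reflects the "All timestamps are UTC" rule above.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class Work:
    id: str
    title: str
    abstract: Optional[str]
    ts: datetime                      # must be timezone-aware UTC
    authors: Tuple[str, ...] = ()     # author IDs (assumed)
    topics: Tuple[str, ...] = ()      # topic IDs or labels (assumed)
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None
    openalex_id: Optional[str] = None

    def __post_init__(self) -> None:
        # utcoffset() is None for naive datetimes, so they are rejected too.
        if self.ts.utcoffset() != timedelta(0):
            raise ValueError("ts must be a timezone-aware UTC datetime")


@dataclass(frozen=True)
class Edge:
    src: str        # source work ID
    dst: str        # target work ID
    weight: float
    kind: str       # for example "cites" (assumed value)


@dataclass(frozen=True)
class Author:
    id: str
    name: str
    works: Tuple[str, ...] = ()       # work IDs


if __name__ == "__main__":
    w = Work(id="w1", title="Example", abstract=None,
             ts=datetime(2024, 1, 1, tzinfo=timezone.utc))
    print(asdict(w))
```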

Algorithms

Topic clustering with stability constraints
Cross ID resolution with confidence scores
Citation velocity scoring based on recent citations
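
The source does not define the velocity formula, so the sketch below is one plausible reading: each citation contributes a weight that halves every half_life_days, which makes recent citations dominate. The function name and the 30-day half-life are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def citation_velocity(cite_times: Iterable[datetime], now: datetime,
                      half_life_days: float = 30.0) -> float:
    """Sum of exponentially decayed citation weights.

    A citation made right now counts 1.0, one made half_life_days ago
    counts 0.5, and so on. Citations dated after `now` are ignored.
    """
    score = 0.0
    for t in cite_times:
        age_days = (now - t).total_seconds() / 86400.0
        if age_days < 0:
            continue
        score += 0.5 ** (age_days / half_life_days)
    return score


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    cites = [now, now - timedelta(days=30)]
    print(citation_velocity(cites, now))
```

A fixed input like this, with a known expected score, is also the shape a golden test for the scorer could take.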

API surface

GET /works?q=&since=&until=&topic=&author=&venue=&page=
GET /citations?work_id=
GET /topics?since=&until=
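
The filters on GET /works could be validated along these lines. This is a framework-free sketch rather than the FastAPI handler itself; it assumes since and until are ISO dates (YYYY-MM-DD) and page is 1-based, neither of which the source specifies. WorksQuery and parse_works_query are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
from urllib.parse import parse_qs


@dataclass(frozen=True)
class WorksQuery:
    q: Optional[str] = None
    since: Optional[date] = None
    until: Optional[date] = None
    topic: Optional[str] = None
    author: Optional[str] = None
    venue: Optional[str] = None
    page: int = 1


def parse_works_query(query_string: str) -> WorksQuery:
    """Parse and validate the /works query string; raises ValueError on bad input."""
    # parse_qs drops parameters with blank values, so "until=" means "not set".
    raw = {key: values[0] for key, values in parse_qs(query_string).items()}
    since = date.fromisoformat(raw["since"]) if "since" in raw else None
    until = date.fromisoformat(raw["until"]) if "until" in raw else None
    if since and until and since > until:
        raise ValueError("since must not be later than until")
    page = int(raw.get("page", "1"))
    if page < 1:
        raise ValueError("page must be 1 or greater")
    return WorksQuery(q=raw.get("q"), since=since, until=until,
                      topic=raw.get("topic"), author=raw.get("author"),
                      venue=raw.get("venue"), page=page)


if __name__ == "__main__":
    print(parse_works_query("q=machine+learning&since=2024-01-01&until=&page=2"))
```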

UI and visualization

Force directed topic map with performance tuned forces
Animated citation trails
Semantic search with keyboard shortcuts and saved collections

Performance budgets

Map interactions p95 under 16 ms with 5k nodes.
Search paging p95 under 300 ms.
First Contentful Paint (FCP) under 2 s on a mid-tier laptop over broadband.
API p95 under 300 ms for common list endpoints, p99 under 800 ms.
Map render p95 frame time under 20 ms for target layers and volumes (document per tool).
Frontend app code under 180 KB gzip excluding map library.
API memory under 200 MB under normal load.
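
To make the p95 and p99 latency budgets concrete, the sketch below checks a list of latency samples against the API figures above (300 ms and 800 ms). It uses the nearest-rank percentile definition, which is an assumption; k6 and Prometheus each compute percentiles their own way, so results may differ slightly. The names are hypothetical.

```python
import math
from typing import Dict, Sequence

# Percentile -> limit in milliseconds, taken from the API budget line above.
API_BUDGETS_MS: Dict[int, float] = {95: 300.0, 99: 800.0}


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_budgets(latencies_ms: Sequence[float],
                  budgets: Dict[int, float] = API_BUDGETS_MS) -> Dict[int, bool]:
    """Map each percentile to True when it is strictly under its budget."""
    return {p: percentile(latencies_ms, p) < limit for p, limit in budgets.items()}


if __name__ == "__main__":
    samples = [float(ms) for ms in range(1, 101)]  # synthetic latencies, 1..100 ms
    print(check_budgets(samples))
```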

Accessibility

WCAG 2.2 AA, automated axe checks clean, no critical issues.
Keyboard navigable controls, focus rings visible, ARIA roles correct.
Color contrast at or above 4.5:1, colorblind-safe palettes.
Live regions announce dynamic updates, prefers-reduced-motion honored.
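
The 4.5:1 contrast threshold can be checked numerically. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio for sRGB colors; the function names are hypothetical and only six-digit hex colors are accepted.

```python
def _channel(c8: int) -> float:
    """Linearize one 8-bit sRGB channel per the WCAG 2.x relative luminance formula."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError("expected a six-digit hex color")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05), ranging from 1 to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, threshold: float = 4.5) -> bool:
    """True when the pair meets the 4.5:1 budget stated above."""
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    print(round(contrast_ratio("#000000", "#ffffff"), 2))
    print(meets_aa("#767676", "#ffffff"))
```

A check like this can run over the design tokens in CI alongside the automated axe checks, which catch contrast problems only on rendered pages.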

Evidence pack and quality gates

Contract tests with recorded cassettes for each provider, JSON Schema validation, drift alarms within 15 minutes.
Load tests with k6, thresholds enforced in CI for p95 and p99.
Lighthouse performance and a11y reports stored as CI artifacts.
Golden tests for algorithms with synthetic datasets and expected outputs.
Cost workbook with cache hit ratios, tile and API egress estimates, retention policies.

CI configuration

name: ci
on: [push, pull_request]
jobs:
  api:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgis/postgis:15-3.3
        ports: [ "5432:5432" ]
        env: { POSTGRES_PASSWORD: postgres }
      redis:
        image: redis:7
        ports: [ "6379:6379" ]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "20" }
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
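      # Quote the extras spec so the shell does not treat [dev] as a glob pattern.
      - run: pip install -e "packages/api[dev]"
      # No "|| true" on these steps: a failed schema load or test run must fail the job.
      - run: psql postgresql://postgres:postgres@localhost:5432/postgres -v ON_ERROR_STOP=1 -f packages/api/src/db/schema.sql
      - run: pytest -q packages/api/src/tests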
      - run: cd packages/web && npm ci && npm run build && npm test --silent

Risks and mitigations

Cluster stability across runs: fix random seeds and cache embeddings.
Provider rate limits: share polite-pool access across fetchers and cache responses.
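
The "fix seeds" mitigation can be illustrated with a tiny k-means in pure Python. This is a stand-in for whatever clustering the topic model actually uses, which the source does not name. The point is only that a private, seeded random generator makes repeated runs over the same cached embeddings produce identical labels. The function name and the sample points are hypothetical.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points: Sequence[Vector], k: int, seed: int,
                  iters: int = 20) -> List[int]:
    """Lloyd's k-means with seeded initialization; returns one label per point."""
    rng = random.Random(seed)  # private RNG, unaffected by global random state
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center, ties go to the lowest center index.
        labels = [min(range(k), key=lambda c: _dist2(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical 2-D "embeddings": two well-separated groups.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans_labels(pts, k=2, seed=42) == kmeans_labels(pts, k=2, seed=42))
```

Label numbers can still differ between seeds even when the grouping is the same, so comparisons across runs should look at which points share a cluster rather than at raw label values.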

Acceptance checklist

CI green on main, all quality gates met.
Freshness SLOs met for hot regions or feeds.
Performance budgets met or better.
A11y audits pass with zero critical findings.
Provenance and license panels render correct metadata.
Runbook covers stale feed handling, provider errors, and key rotation.

Implementation sequence

Adapters and schemas, ID resolution suite
Topics and velocity scoring with golden tests
Search API, maps, trails, and collections
Evidence pack and a11y audits

Runbook

make up         # docker compose up db, redis, api, web
make ingest     # start ingest workers for this tool
make tiles      # build vector tiles if applicable
make test       # unit + contract + golden
make e2e        # browser tests