Science in the Moment
Live discovery feed of new papers and their citation context, with topic maps and influence trails.
New works (query: machine learning)
- Genetic algorithms in search, optimization, and machine learning • Choice Reviews Online • 1989
- C4.5: Programs for Machine Learning • J. R. Quinlan • 1992
- Gaussian Processes for Machine Learning • Carl Edward Rasmussen, Christopher K. I. Williams • 2005
- UCI Machine Learning Repository • Arthur Asuncion • 2007
- Pattern Recognition and Machine Learning • Christopher Bishop • Journal of Electronic Imaging • 2007
- Data Mining: Practical Machine Learning Tools and Techniques • Ian H. Witten, Eibe Frank, Mark A. Hall • Elsevier eBooks • 2011
- Genetic algorithms in search, optimization, and machine learning • David E. Goldberg • 1988
- Gaussian Processes for Machine Learning • Carl Edward Rasmussen, Christopher K. I. Williams • 2005
- Machine Learning: A Probabilistic Perspective • Kevin P. Murphy • 2012
- UCI Repository of machine learning databases • Catherine Blake • 1998
- Programs for Machine Learning • Steven L. Salzberg, Alberto M. Segre • 1994
- Proceedings of the 24th international conference on Machine learning • John Langford, Joëlle Pineau • 2007
- TensorFlow: A system for large-scale machine learning • Martín Abadi, Paul Barham, Jianmin Chen • arXiv (Cornell University) • 2016
- Machine learning: Trends, perspectives, and prospects • Michael I. Jordan, Tom M. Mitchell • Science • 2015
- Machine learning in automated text categorization • Fabrizio Sebastiani • ACM Computing Surveys • 2002
- Pattern Recognition and Machine Learning • Springer eBooks • 2006
- Pattern Recognition and Machine Learning • W.R. Howard • Kybernetes • 2007
- Ensemble Methods in Machine Learning • Thomas G. Dietterich • Lecture notes in computer science • 2000
- TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems • Martín Abadi, Ashish Agarwal, Paul Barham • arXiv (Cornell University) • 2016
- Scikit-learn: Machine Learning in Python • Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort • Journal of Machine Learning Research • 2011
Provenance: OpenAlex
Build goals
Live discovery feed of new papers and their citation context, with topic maps and influence trails.
Stack
- Frontend: React 18, Mapbox GL or deck.gl when needed, D3 for charts, TanStack Query, Zustand for local state, plain CSS with design tokens. No runtime CSS frameworks.
- API: Python 3.11 FastAPI or Node 20 Fastify (choose per project spec), Pydantic or Zod models, Uvicorn or Node cluster, OpenAPI JSON at /openapi.json.
- Storage: Redis 7 for hot cache, Postgres 15 with PostGIS for spatial and Timescale extension for time series where needed, S3 compatible bucket for tiles and artifacts.
- Ingest: Async fetchers using conditional requests (ETag or Last-Modified), paging, retry with backoff and jitter, circuit breakers, structured logs.
- Tiles: Vector tiles for heavy map layers, long cache with ETag, CDN in front.
- Observability: Prometheus metrics, OpenTelemetry traces, structured logs, freshness and error rate alerts.
- Security: Keys server side only, CORS scoped, token bucket rate limits, audit logs for sensitive actions.
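The "retry with backoff and jitter" item in the ingest layer could look roughly like the sketch below. It uses the common "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing cap. The names `TransientError`, `backoff_delays`, and `fetch_with_retry`, and the default base and cap values, are assumptions made for illustration, not part of the spec.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, HTTP 429 or 5xx)."""


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter delays: the n-th delay is uniform in [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** n)) for n in range(attempts)]


def fetch_with_retry(fetch: Callable[[], T], attempts: int = 5,
                     sleep: Callable[[float], None] = time.sleep,
                     rng: Optional[random.Random] = None) -> T:
    """Call fetch(), retrying on TransientError with jittered exponential backoff."""
    delays = backoff_delays(attempts - 1, rng=rng)
    for attempt in range(attempts):
        try:
            return fetch()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Passing `sleep` and `rng` as parameters keeps the helper testable without real waiting. A circuit breaker would sit one level above this, counting consecutive exhausted retries per provider.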
Data sources
Source | Endpoint | Cadence | Access | Auth | Notes |
---|---|---|---|---|---|
arXiv API | export.arxiv.org/api/query | daily | Atom | None | Preprints and abstracts |
Crossref REST | api.crossref.org | frequent | REST JSON | None | DOI metadata and citations |
OpenAlex | api.openalex.org | frequent | REST JSON | None | Works, authors, venues, citations |
Semantic Scholar | api.semanticscholar.org/graph/v1 | frequent | REST JSON | None | Citations and related |
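As an illustration of how a fetcher might address one of these sources, here is a minimal sketch that builds an OpenAlex works query. It assumes the `/works` path and the `search`, `filter`, `per-page`, `page`, and `mailto` query parameters; `mailto` is how OpenAlex identifies polite-pool callers. The function name and the example address are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint path under the api.openalex.org host listed in the table.
OPENALEX_WORKS = "https://api.openalex.org/works"


def openalex_works_url(query: str, since: str, mailto: str,
                       page: int = 1, per_page: int = 25) -> str:
    """Build a works search URL limited to publications on or after `since` (YYYY-MM-DD)."""
    params = {
        "search": query,
        "filter": f"from_publication_date:{since}",
        "per-page": per_page,
        "page": page,
        "mailto": mailto,  # contact address that opts the caller into the polite pool
    }
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


print(openalex_works_url("machine learning", "2024-01-01", "ops@example.org"))
```

Only URL construction is shown; the actual request would go through the ingest layer's retry and caching path.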
Architecture
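The API is Python FastAPI. Topic models are embedding-based and use seeded randomness so runs are reproducible. Works are matched across providers by cross-ID resolution, Elasticsearch handles search, and provider responses are cached and fetched through polite pools.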
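Cross-ID resolution can be sketched as a pairwise scoring function: exact identifier matches score highest, and a normalized title comparison is the fallback. The confidence values (1.0, 0.95, and the 0.8 ceiling for title-only matches) and the `Record` type are assumptions chosen for illustration. Treating two different DOIs as a definite mismatch is a simplification, since a preprint and its published version can carry different DOIs.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical provider record with the identifiers used for matching."""
    title: str
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lowercase a DOI and strip common URL or scheme prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(c if c.isalnum() else " " for c in title.lower())
    return " ".join(cleaned.split())


def match_confidence(a: Record, b: Record) -> float:
    """Return a confidence in [0, 1] that two records describe the same work."""
    doi_a, doi_b = normalize_doi(a.doi), normalize_doi(b.doi)
    if doi_a and doi_b:
        return 1.0 if doi_a == doi_b else 0.0
    if a.arxiv_id and b.arxiv_id:
        return 0.95 if a.arxiv_id == b.arxiv_id else 0.0
    ratio = SequenceMatcher(None, normalize_title(a.title),
                            normalize_title(b.title)).ratio()
    return round(0.8 * ratio, 3)  # title-only matches are capped below ID matches
```

A resolver built on this would merge records only above a chosen threshold and keep the score alongside the merged ID so low-confidence links can be audited.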
Models
Models are expressed as DB tables and mirrored as API schemas. All timestamps are UTC. All coordinates are WGS84. IDs are stable; soft deletes use a valid_to column when needed.
- work(id, title, abstract, ts, authors[], topics[], doi, arxiv_id, openalex_id)
- edge(src, dst, weight, kind)
- author(id, name, works[])
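The three models above might be mirrored on the API side along the following lines. The sketch uses standard-library dataclasses as a stand-in for the Pydantic models named in the stack. The field types, and the choice to reject naive timestamps, are assumptions based on the "all timestamps are UTC" rule.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Work:
    id: str
    title: str
    abstract: Optional[str]
    ts: datetime
    authors: list[str] = field(default_factory=list)  # author IDs
    topics: list[str] = field(default_factory=list)
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None
    openalex_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Enforce the UTC rule: reject naive timestamps, convert aware ones.
        if self.ts.tzinfo is None:
            raise ValueError("ts must be timezone-aware")
        self.ts = self.ts.astimezone(timezone.utc)


@dataclass
class Edge:
    src: str      # ID of the source work
    dst: str      # ID of the destination work
    weight: float
    kind: str     # assumed label for the edge type, e.g. "cites"


@dataclass
class Author:
    id: str
    name: str
    works: list[str] = field(default_factory=list)  # work IDs
```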
Algorithms
- Topic clustering with stability constraints
- Cross ID resolution with confidence scores
- Citation velocity scoring by recent cites
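One simple reading of "citation velocity by recent cites" is the number of citations inside a trailing window, normalized to a per-30-day rate. The sketch below makes that assumption. The 90-day window and the function name are illustrative, and a production score might instead decay older citations smoothly.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def citation_velocity(cite_times: Iterable[datetime], now: datetime,
                      window_days: int = 90) -> float:
    """Citations per 30 days, counted over the trailing `window_days`."""
    start = now - timedelta(days=window_days)
    recent = sum(1 for t in cite_times if start < t <= now)
    return recent * 30.0 / window_days


now = datetime(2024, 4, 1, tzinfo=timezone.utc)
cites = [now - timedelta(days=d) for d in (1, 10, 89, 200)]
# Three of the four citations fall inside the 90-day window.
print(citation_velocity(cites, now))  # 1.0
```

Because the inputs are plain timestamps, this is easy to cover with a golden test on a synthetic citation history.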
API surface
- GET /works?q=&since=&until=&topic=&author=&venue=&page=
- GET /citations?work_id=
- GET /topics?since=&until=
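The filter semantics of `GET /works` are not spelled out here, so the following is only one plausible interpretation, written as a pure function over in-memory records rather than as a FastAPI route. It assumes `q` is a case-insensitive title substring, `since` is inclusive, `until` is exclusive, results are ordered newest first, and `page` is 1-based. The `venue` filter is omitted because the work model above has no venue field.

```python
from datetime import datetime, timezone
from typing import Any, Optional

Work = dict[str, Any]  # assumed keys: title, ts, topics, authors


def list_works(works: list[Work], q: Optional[str] = None,
               since: Optional[datetime] = None, until: Optional[datetime] = None,
               topic: Optional[str] = None, author: Optional[str] = None,
               page: int = 1, page_size: int = 20) -> dict[str, Any]:
    """Apply all provided filters (AND semantics), sort newest first, return one page."""
    def keep(w: Work) -> bool:
        if q and q.lower() not in w["title"].lower():
            return False
        if since and w["ts"] < since:
            return False
        if until and w["ts"] >= until:
            return False
        if topic and topic not in w["topics"]:
            return False
        if author and author not in w["authors"]:
            return False
        return True

    matched = sorted((w for w in works if keep(w)),
                     key=lambda w: w["ts"], reverse=True)
    start = (page - 1) * page_size
    return {"total": len(matched), "page": page,
            "items": matched[start:start + page_size]}
```

In the real service the same predicate would be pushed down into the search index or SQL query instead of filtering in Python.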
UI and visualization
- Force directed topic map with performance tuned forces
- Animated citation trails
- Semantic search with keyboard shortcuts and saved collections
Performance budgets
- Map interactions p95 under 16 ms with 5k nodes
- Search paging p95 under 300 ms
- First Contentful Paint (FCP) under 2 s on a mid-tier laptop over broadband.
- API p95 under 300 ms for common list endpoints, p99 under 800 ms.
- Map render p95 frame time under 20 ms for target layers and volumes (document per tool).
- Frontend app code under 180 KB gzip excluding map library.
- API memory under 200 MB under normal load.
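To make the latency budgets concrete, here is a small sketch of a percentile check over recorded request latencies. It uses the nearest-rank definition of a percentile, which is one of several common definitions; load tools such as k6 may interpolate differently. The `BUDGETS_MS` table copies the API list-endpoint limits above, and the function names are hypothetical.

```python
import math
from typing import Sequence

# API list-endpoint budgets from the list above, in milliseconds.
BUDGETS_MS = {95: 300.0, 99: 800.0}


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_budgets(samples_ms: Sequence[float]) -> dict[int, bool]:
    """Map each percentile to True when the measured value is within budget."""
    return {p: percentile(samples_ms, p) <= limit for p, limit in BUDGETS_MS.items()}


# 94 fast requests and 6 slow ones: p95 lands on a slow request and misses its budget.
latencies = [100.0] * 94 + [500.0] * 6
print(check_budgets(latencies))  # {95: False, 99: True}
```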
Accessibility
- WCAG 2.2 AA, automated axe checks clean, no critical issues.
- Keyboard navigable controls, focus rings visible, ARIA roles correct.
- Color contrast at or above 4.5:1, colorblind safe palettes.
- Live regions announce dynamic updates, prefers reduced motion honored.
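The contrast requirement can be checked mechanically when choosing palette colors. The sketch below follows the WCAG definitions of relative luminance and contrast ratio for sRGB colors. The helper names and the six-digit hex input format are assumptions.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio, from 1 (identical colors) to 21 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)),
                             reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: str, bg: str, threshold: float = 4.5) -> bool:
    """True when the pair reaches the 4.5:1 threshold named in the list above."""
    return contrast_ratio(fg, bg) >= threshold
```

For example, `meets_aa("#767676", "#ffffff")` holds while the slightly lighter `#777777` on white falls just short, which is why automated checks are worth running on every palette change.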
Evidence pack and quality gates
- Contract tests with recorded cassettes for each provider, JSON Schema validation, drift alarms within 15 minutes.
- Load tests with k6, thresholds enforced in CI for p95 and p99.
- Lighthouse performance and a11y reports stored as CI artifacts.
- Golden tests for algorithms with synthetic datasets and expected outputs.
- Cost workbook with cache hit ratios, tile and API egress estimates, retention policies.
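A drift alarm for provider contracts can start from something as small as the check below, which compares a recorded or live payload against the fields the adapter depends on. It is a hand-rolled stand-in for full JSON Schema validation. The `EXPECTED_WORK_FIELDS` mapping is an illustrative subset, not the actual contract.

```python
from typing import Any

# Hypothetical subset of fields an adapter relies on, with expected Python types.
EXPECTED_WORK_FIELDS: dict[str, type] = {
    "id": str,
    "title": str,
    "cited_by_count": int,
}


def schema_drift(payload: dict[str, Any], expected: dict[str, type]) -> list[str]:
    """Return a description of each missing or mistyped field; empty means no drift."""
    problems = []
    for key, typ in expected.items():
        if key not in payload:
            problems.append(f"missing field: {key}")
        elif not isinstance(payload[key], typ):
            got = type(payload[key]).__name__
            problems.append(f"wrong type: {key} (expected {typ.__name__}, got {got})")
    return problems


print(schema_drift({"id": "W1", "cited_by_count": "3"}, EXPECTED_WORK_FIELDS))
```

A contract test would run this against each cassette, and a scheduled job would run it against a live sample and raise the alarm on a non-empty result.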
CI configuration
name: ci
on: [push, pull_request]
jobs:
  api:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgis/postgis:15-3.3
        ports: [ "5432:5432" ]
        env: { POSTGRES_PASSWORD: postgres }
        # Wait until Postgres accepts connections before the steps run.
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7
        ports: [ "6379:6379" ]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "20" }
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      # These steps must fail the job on error so the quality gates are enforced.
      - run: pip install -e "packages/api[dev]"
      - run: psql postgresql://postgres:postgres@localhost:5432/postgres -v ON_ERROR_STOP=1 -f packages/api/src/db/schema.sql
      - run: pytest -q packages/api/src/tests
      - run: cd packages/web && npm ci && npm run build && npm test --silent
Risks and mitigations
- Cluster stability across runs: fix seeds and cache embeddings.
- Rate limits: share polite pools and cache responses.
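The seed-fixing mitigation can be illustrated with a tiny k-means in which all randomness flows through one seeded generator, so repeated runs over the same cached embeddings give the same labels. This is a pure-Python sketch on 2-D points. The real topic model, its embedding dimension, and the clustering algorithm it uses are not specified here.

```python
import random
from typing import Sequence

Point = tuple[float, ...]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, seed: int = 0, iters: int = 20) -> list[int]:
    """Lloyd's algorithm with seeded initialization; returns a cluster label per point."""
    rng = random.Random(seed)             # the only source of randomness
    centers = rng.sample(list(points), k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: _dist2(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
assert kmeans(pts, 2, seed=42) == kmeans(pts, 2, seed=42)  # same seed, same labels
```

Label numbering can still differ between seeds, so stability checks across runs should compare cluster membership rather than raw label values.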
Acceptance checklist
- CI green on main, all quality gates met.
- Freshness SLOs met for hot regions or feeds.
- Performance budgets met or better.
- A11y audits pass with zero critical findings.
- Provenance and license panels render correct metadata.
- Runbook covers stale feed handling, provider errors, and key rotation.
Implementation sequence
- Adapters and schemas, ID resolution suite
- Topics and velocity scoring with golden tests
- Search API, maps, trails, and collections
- Evidence pack and a11y audits
Runbook
make up # docker compose up db, redis, api, web
make ingest # start ingest workers for this tool
make tiles # build vector tiles if applicable
make test # unit + contract + golden
make e2e # browser tests