How to automatically monitor new ML research papers on arXiv by keyword

Staying on Top of ML Research

With roughly 10,000 new papers appearing on arXiv every month, keeping up with a specific niche by browsing manually is nearly impossible.

The Automation

I built an arXiv scraper on Apify with four features:

  1. Keyword search: Define the topics you care about (e.g., “diffusion models”, “LLM alignment”, “RLHF”)
  2. Scheduled runs: Set it to check daily or hourly
  3. Structured output: Returns paper title, authors, abstract, arXiv URL, PDF link, and categories
  4. Easy integration: JSON output works with any webhook, Slack bot, or Notion database
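
The fields listed in step 3 could be modeled as a small typed record. The sketch below is illustrative only: the JSON key names (`title`, `url`, `abstract`, `pdfUrl`, `authors`, `categories`) are assumptions, not the scraper's documented schema, and `paper_from_json` and `matches_keywords` are hypothetical helpers for parsing and client-side filtering.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """One scraped paper. Field names are assumed, not a documented schema."""
    title: str
    url: str
    abstract: str = ""
    pdf_url: str = ""
    authors: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)


def paper_from_json(item: dict) -> Paper:
    """Build a Paper from one JSON item, tolerating missing optional keys."""
    return Paper(
        title=item["title"],
        url=item["url"],
        abstract=item.get("abstract", ""),
        pdf_url=item.get("pdfUrl", ""),  # assumed key name
        authors=list(item.get("authors", [])),
        categories=list(item.get("categories", [])),
    )


def matches_keywords(paper: Paper, keywords: list[str]) -> bool:
    """Case-insensitive substring match against title and abstract."""
    text = f"{paper.title} {paper.abstract}".lower()
    return any(kw.lower() in text for kw in keywords)
```

Parsing into a record like this makes a missing required field fail early, at the point of ingestion, rather than inside a Slack or Notion integration.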

Example use: Slack Bot

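import requests

APIFY_TOKEN = "YOUR_APIFY_TOKEN"      # placeholder: your Apify API token
SLACK_WEBHOOK = "YOUR_SLACK_WEBHOOK"  # placeholder: Slack incoming-webhook URL

# Run the scraper and wait for it to finish
result = requests.post(
    "https://api.apify.com/v2/acts/technicaldost~arxiv-paper-scraper/run-sync",
    params={"token": APIFY_TOKEN},
    json={"keywords": ["diffusion models"], "maxResults": 10},
)
result.raise_for_status()  # stop here if the run failed

# Post each paper to Slack (assumes the response is a JSON list of papers)
for paper in result.json():
    requests.post(SLACK_WEBHOOK, json={
        "text": f"*New paper*: {paper['title']}\n{paper['url']}"
    })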
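
Scheduled runs with the same keywords will likely return papers that were already posted. A minimal de-duplication sketch follows, assuming each paper has a stable `url` key (as in the example above) and that a local JSON file is an acceptable place to remember what was sent. `filter_new` and the file name `seen_papers.json` are hypothetical.

```python
import json
from pathlib import Path


def filter_new(papers: list[dict], seen_path: Path) -> list[dict]:
    """Return papers whose URL is not yet recorded, then record them."""
    seen = set(json.loads(seen_path.read_text())) if seen_path.exists() else set()
    fresh = []
    for paper in papers:
        if paper["url"] not in seen:
            seen.add(paper["url"])
            fresh.append(paper)
    # Persist the updated set so the next scheduled run skips these papers.
    seen_path.write_text(json.dumps(sorted(seen)))
    return fresh
```

In the loop above, iterating over `filter_new(result.json(), Path("seen_papers.json"))` instead of `result.json()` would post only papers that have not been seen before.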

Why This Matters

Researchers and engineers waste hours browsing arXiv. An automated pipeline means:

  • Far fewer missed papers in your niche
  • Daily digest delivered to your preferred platform
  • Easy collaboration with teams (shared paper feeds)
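
For the daily digest mentioned above, one message per run is often easier to read than one message per paper. The sketch below builds a single text block. It assumes the same `title` and `url` keys used in the Slack example, and `build_digest` is a hypothetical helper.

```python
def build_digest(papers: list[dict], topic: str) -> str:
    """Format papers as one numbered, Slack-friendly text message."""
    if not papers:
        return f"No new papers for *{topic}* today."
    lines = [f"*{len(papers)} new paper(s) for {topic}*"]
    for i, paper in enumerate(papers, start=1):
        # Title on one line, link indented beneath it.
        lines.append(f"{i}. {paper['title']}\n   {paper['url']}")
    return "\n".join(lines)
```

The returned string can be sent with a single `requests.post("YOUR_SLACK_WEBHOOK", json={"text": digest})` call, which keeps a shared channel to one notification per day.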

Try it on the Apify Store; a free tier is available.
