How to automatically monitor new ML research papers on arXiv by keyword

Kirsten Korosec Sean OKane Anthony Ha Theresa Loconsolo

June 25, 2026

Staying on Top of ML Research

With ~10,000 new papers on arXiv every month, keeping up with your specific niche by browsing manually is nearly impossible.

The Automation

I built an arXiv scraper on Apify with the following features:

Keyword search: Define the topics you care about (e.g., “diffusion models”, “LLM alignment”, “RLHF”)
Scheduled runs: Set it to check daily or hourly
Structured output: Returns paper title, authors, abstract, arXiv URL, PDF link, and categories
Easy integration: JSON output works with any webhook, Slack bot, or Notion database
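
To make the structured output concrete, here is a minimal sketch of how one result could be modeled on the consuming side. Only the `title` and `url` keys appear in the Slack example in this post; the other key names (`authors`, `abstract`, `pdfUrl`, `categories`) and the `Paper` class itself are assumptions for illustration, so check them against the scraper's actual output before relying on them.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """One scraper result. Key names other than title/url are assumed."""
    title: str
    url: str
    authors: list[str] = field(default_factory=list)
    abstract: str = ""
    pdf_url: str = ""
    categories: list[str] = field(default_factory=list)

    @classmethod
    def from_record(cls, record: dict) -> "Paper":
        # Use .get() so a missing optional field does not raise KeyError.
        return cls(
            title=record["title"],
            url=record["url"],
            authors=list(record.get("authors", [])),
            abstract=record.get("abstract", ""),
            pdf_url=record.get("pdfUrl", ""),  # hypothetical key name
            categories=list(record.get("categories", [])),
        )

    def matches(self, keywords: list[str]) -> bool:
        # Case-insensitive substring match against title and abstract.
        text = f"{self.title} {self.abstract}".lower()
        return any(k.lower() in text for k in keywords)
```

A local `matches` check like this can serve as a second filter on top of the scraper's own keyword search, for example to route papers to different channels by topic.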

Example use: Slack Bot

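import requests

# Placeholder: the Apify API expects an API token with each request.
APIFY_TOKEN = "YOUR_APIFY_TOKEN"

# Run the scraper and wait for it to finish.
# Assumes the response body is a JSON list of paper objects.
result = requests.post(
    "https://api.apify.com/v2/acts/technicaldost~arxiv-paper-scraper/run-sync",
    params={"token": APIFY_TOKEN},
    json={"keywords": ["diffusion models"], "maxResults": 10},
    timeout=300,
)
result.raise_for_status()

# Post each paper to Slack
for paper in result.json():
    requests.post("YOUR_SLACK_WEBHOOK", json={
        "text": f"*New paper*: {paper['title']}\n{paper['url']}"
    }, timeout=30)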
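
With scheduled runs, the same paper can come back on consecutive days, and the loop above would post it again each time. One way to avoid that is to remember which URLs have already been posted. The sketch below is an assumption about how you might do this, not part of the scraper: the helper names and the JSON state file are hypothetical, and it only relies on each paper having a `url` key.

```python
import json
from pathlib import Path


def load_seen(path: Path) -> set[str]:
    # Return the URLs posted by earlier runs, or an empty set on the first run.
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def select_new(papers: list[dict], seen: set[str]) -> list[dict]:
    # Keep only papers whose URL has not been posted yet, preserving order.
    # Note: this adds the new URLs to `seen` in place.
    fresh = []
    for paper in papers:
        if paper["url"] not in seen:
            fresh.append(paper)
            seen.add(paper["url"])
    return fresh


def save_seen(path: Path, seen: set[str]) -> None:
    # Persist the URL set as a sorted JSON list for the next run.
    path.write_text(json.dumps(sorted(seen)))
```

In the Slack loop you would call `load_seen` first, iterate over `select_new(result.json(), seen)` instead of the raw results, and call `save_seen` once posting succeeds.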

Why This Matters

Researchers and engineers waste hours browsing arXiv. An automated pipeline means:

Far fewer missed papers in your niche, since every keyword match is surfaced for you
Daily digest delivered to your preferred platform
Easy collaboration with teams (shared paper feeds)
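
For the daily digest, it can be nicer to send one combined message than one post per paper. Here is a minimal sketch, assuming each paper is a dict with `title` and `url` keys as in the Slack example; `build_digest` and its `max_items` parameter are hypothetical names, and the `<url|text>` form is Slack's link syntax for message text.

```python
def build_digest(papers: list[dict], max_items: int = 10) -> str:
    # Combine several papers into one Slack message instead of one post each.
    if not papers:
        return "No new papers today."
    lines = [f"*{len(papers)} new paper(s)*"]
    for paper in papers[:max_items]:
        # Slack renders <url|text> as a clickable link.
        lines.append(f"• <{paper['url']}|{paper['title']}>")
    remaining = len(papers) - max_items
    if remaining > 0:
        lines.append(f"...and {remaining} more")
    return "\n".join(lines)
```

The returned string can be sent as the `text` field of a single webhook request to a shared channel, which also covers the team feed case.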

Try it on the Apify Store — free tier available.
