AI and the open web

There seems to be a protectionist trend at the moment, with large platforms restricting access to their data more tightly. It is seen mostly as a response to large language models, such as the GPT models behind ChatGPT, scooping up data from the web. If it leads to more closed behaviour on the web, it will be a negative trend.

Protectionist trend – Reddit, now Twitter

In June, Reddit raised the prices for its API. Reddit’s owners are planning to take the company public, and they need to boost revenue from the social news site before they do. Reddit co-founder and CEO Steve Huffman told The New York Times: “The Reddit corpus of data is really valuable, but we don’t need to give all of that value to some of the largest companies in the world for free.”

This has led to an ongoing strike by volunteer moderators that has caused mass disruption on the platform. Huffman has said that he is not backing down. He told The Associated Press: “Protest and dissent is important. The problem with this one is it’s not going to change anything because we made a business decision that we’re not negotiating on.” The dispute has reached an impasse.

Yesterday, Elon Musk announced that Twitter is putting a limit on how many posts you can read per day. This is what he said in a tweet:

To address extreme levels of data scraping & system manipulation, we’ve applied the following temporary limits:

  • Verified accounts are limited to reading 6000 posts/day
  • Unverified accounts to 600 posts/day
  • New unverified accounts to 300/day

Later, Musk tweeted that the limits had been raised to 10,000, 1,000, and 500 posts per day respectively.

“Several hundred organizations (maybe more) were scraping Twitter data extremely aggressively, to the point where it was affecting the real user experience,” Musk said.

It doesn’t make much sense that companies would be scraping data at this scale; it is an inefficient way to gather that kind of data. Even if Twitter is worried that some companies are getting around paying for API access by scraping web pages, restricting usage for regular users seems like cutting off your nose to spite your face. Usually, businesses want to encourage people to use their service as much as possible, because that is how they make money!

How will it play out?

It is hard to tell how this will play out. It is a battle to monetize a new frontier. The data holders want a slice of the pie if their platforms are prime sources for language models to build knowledge and interact in a more human-like fashion.

It could be that this is being used opportunistically to justify increasing prices for API access. Blame the bots! The truth is that it is hard to know what the reality is unless you are behind the scenes.

Users suffer as they are caught in the middle. The market for third-party apps shrinks and can become untenable for some small businesses. That is bad for consumer choice.

Web standards need to adapt. At the moment, I assume AI bots crawl pages much like search engine bots do, respecting the robots.txt file. As far as I know, there is no explicit way to grant or withhold permission for data to be used in training language models. You may have to explicitly block a bot to opt out; OpenAI, for example, has published instructions for blocking its bot.
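As a rough sketch, opting out currently looks something like the robots.txt entry below. The user-agent token shown (GPTBot) is the one OpenAI documents for its crawler; other AI crawlers would need their own entries, and the exact tokens depend on each vendor’s documentation.

    # Block OpenAI's crawler from the whole site.
    # "GPTBot" is OpenAI's documented user-agent token; other AI
    # crawlers need their own entries.
    User-agent: GPTBot
    Disallow: /

    # Search engine crawlers are unaffected unless listed separately.

Note that robots.txt is advisory: well-behaved crawlers honour it, but it does not technically prevent scraping, which is part of why platforms are reaching for harder measures.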

It is likely that regulation will be required in the long term. The major players are large companies, and they have a big advantage. Much will depend on whether they choose to defend their high ground aggressively.

Final thoughts

Personally, I don’t find this alarming. This is a familiar fight. It is just something that we need to figure out.

Open information and commerce have always sat uneasily together. This is a battle over information: who produces it, how you access it, and who gets paid for it. In Reddit’s case, it is galling that the rich data it wants to monetize is moderated by volunteers for free; it will be an interesting test case for how this side of the AI revolution evolves. How this is settled is important, because it will shape what the web becomes.

We should try to preserve openness; it is a great strength of the web. There needs to be a viable commercial solution that satisfies business needs. If one is not found, we will need regulation to mitigate the harm being done.
