Throttling Control - robots

This space is made available to users of the Open Patent Services (OPS) web service and now also to users of EPO’s bulk data subscription products such as 14. EPO worldwide bibliographic database (DOCDB), 14.11 EPO worldwide legal status database (INPADOC), 14.12 EP full text data, 14.1 EP bibliographic data (EBD) and more.

Users can ask each other questions, exchange experiences and solutions, post ideas. The moderator will use this space to announce changes or other relevant information.

martien
Posts: 31
Joined: Tue Jul 16, 2013 1:45 pm

Throttling Control - robots

Post by martien » Wed Apr 25, 2018 10:06 am

Hi Support,

I have implemented the rules according to the fair use policy, and normally it works quite well.
However, in some situations I still get the error CLIENT.RobotDetected, although to my understanding
there should be no reason to wait before the next call:

####### Ex 1 #### Change from yellow to black after 4 sec.
20180316 09:01:24 0 UA get: QUERY1
20180316 09:01:25 0 busy (images=green:100, inpadoc=green:45, other=green:1000, retrieval=green:100, search=yellow:15)
20180316 09:01:25 0 QuotaWeekUsed: 705572105; QuotaHourUsed: 164828218
20180316 09:01:29 0 UA get: QUERY2
20180316 09:01:29 0 overloaded (images=green:50, inpadoc=green:30, other=green:1000, retrieval=green:50, search=black:0)
20180316 09:01:29 0 QuotaWeekUsed: 705572385; QuotaHourUsed: 164828498 ThrottlingControlQuota
20180316 09:01:29 0 UA-status: 403 Forbidden CLIENT.RobotDetected

####### Ex 2 #### Change from green to black
20180424 03:11:49 0 UA get: QUERY1
20180424 03:11:50 0 idle (images=green:200, inpadoc=green:60, other=green:1000, retrieval=green:200, search=green:30)
20180424 03:11:50 0 QuotaWeekUsed: 536305934; QuotaHourUsed: 7117918
20180424 03:11:50 0 UA get: QUERY2
20180424 03:11:50 0 busy (images=green:100, inpadoc=green:45, other=green:1000, retrieval=green:100, search=black:0)
20180424 03:11:50 0 QuotaWeekUsed: 536306214; QuotaHourUsed: 7118198 ThrottlingControlQuota
20180424 03:11:50 0 UA-status: 403 Forbidden CLIENT.RobotDetected

In both cases, I would have expected a search=red warning first. Having a green status and
then being locked out for 900 seconds, with no way to prevent it, is not really fun.

Do you have any advice on how to prevent these situations?

Regards, Martien
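
For readers who want to react to these status lines in code, here is a minimal Python sketch of parsing them. It assumes the string has exactly the shape shown in the logs above (a system state followed by service=colour:limit pairs); parse_throttle and PAIR are illustrative names, not part of OPS.

```python
import re

# Matches one "service=colour:limit" pair, e.g. "search=yellow:15".
PAIR = re.compile(r"(\w+)=(\w+):(\d+)")


def parse_throttle(status):
    """Split a status string into (system_state, {service: (colour, limit)})."""
    state = status.split("(", 1)[0].strip()
    services = {
        name: (colour, int(limit))
        for name, colour, limit in PAIR.findall(status)
    }
    return state, services


# Status line copied from Ex 1 above (09:01:29).
line = ("overloaded (images=green:50, inpadoc=green:30, other=green:1000, "
        "retrieval=green:50, search=black:0)")
state, services = parse_throttle(line)
print(state, services["search"])
```

A client could then stop sending to any service whose colour is black or whose limit is 0, rather than waiting for the 403.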


EPO / OPS Support
Posts: 1298
Joined: Thu Feb 22, 2007 5:32 pm

Re: Throttling Control - robots

Post by EPO / OPS Support » Wed Apr 25, 2018 11:58 am

Hi,

If you read the instructions in the user guide: https://www.epo.org/searching-for-paten ... html#tab-3 it is stated that each node protects itself separately. The user guide explains that there is more than one node (two, in fact), and each reports its colour status separately. Just because node 1 is green, it doesn't mean node 2 is also green. For example, you could have received a red from node 2 many requests earlier without noticing it.

The yellow status came at 09:01:25. That is when you should slow right down to the stated number of requests per minute (or a bit below). As soon as you see a request to reduce your rate, you have to drop right down.

The stated number of requests per minute has to be respected exactly and promptly. Wednesdays in particular are known to be very busy days.

regards,
Vesna for OPS support
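
As a rough illustration of "slow right down to the number (or a bit below)", the per-minute limit can be turned into a pause between calls. This is only a sketch: min_delay_seconds and the 0.8 safety factor are assumptions for illustration, not values from the OPS manual.

```python
def min_delay_seconds(limit_per_minute, safety=0.8):
    """Seconds to wait between calls so the rate stays below the limit.

    safety < 1 keeps the rate "a bit below" the advertised number.
    A limit of 0 (black) means no request should be sent at all,
    which is signalled here by returning None.
    """
    if limit_per_minute <= 0:
        return None
    return 60.0 / (limit_per_minute * safety)


print(min_delay_seconds(15, safety=1.0))  # pause that sits exactly at the limit
print(min_delay_seconds(15))              # pause with 20% headroom
```

With search=yellow:15, a 4 second pause corresponds to running exactly at the limit, so it leaves no headroom. This alone does not account for what the other node or other users are doing.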


martien
Posts: 31
Joined: Tue Jul 16, 2013 1:45 pm

Re: Throttling Control - robots

Post by martien » Wed Apr 25, 2018 10:58 pm

Hi Vesna,

Regarding example 1: I didn't get a "red" condition in the 15 days before (which is as far back as my log goes).
After "yellow" we slowed down a bit (4 secs) and got "black" as the next result, without getting "red" first.

It seems that one particular query used more than 25% of the request limit.

Regarding example 2: I now see in my logs that there were "red" conditions until 6 secs before:
20180424 03:11:44 0 busy (images=green:100, inpadoc=green:45, other=green:1000, retrieval=green:100, search=red:15)
followed by "green".

Thanks for your explanation, especially that I may not rely on "green".

Regards, Martien


EPO / OPS Support
Posts: 1298
Joined: Thu Feb 22, 2007 5:32 pm

Re: Throttling Control - robots

Post by EPO / OPS Support » Thu Apr 26, 2018 7:21 am

Hi,

Did you read pages 40-42 of our OPS manual, especially the part about multiple instances and throttling? I find the explanation very good, so I suggest you read it.
There are a few issues in OPS:
- First of all, there are two nodes, and both have to be observed at all times.
- Then there is the 60-second rolling window principle of the throttles on each specific service, which gets really complicated when you run queries with more than one constituent. For example, a query for an INPADOC family with biblio and legal events uses multiple instances (2 throttles, but 3 times :-). Each of them has a different throttle value at the time you start your query, and their windows do not start at the same time: the 60-second rolling window applies to each throttle separately (inpadoc=family, retrieval=biblio and then inpadoc=legal once more, all 3 set at different times).
- The next issue is that throttling regulates not only your usage but that of everyone using a specific service at the same time as you. As soon as users start querying, the system starts decreasing the number of allowed queries per minute for all users, not only for you. In this way we try to ensure that everyone can get something.

And then there is the fact that Wednesdays are the heaviest day of the week because this is our publication day.

The recommendation in our manual states: “As a user adhering to the fair use policy, you should moderate your usage of the service over a 60 second window, according to the most negative response (number of queries) data received.” I think that is the most important information to remember.

OPS documentation is available here: https://www.epo.org/searching-for-paten ... html#tab-3

I hope this helps, but please read the documents; they explain it better than I can.

Regards,
Vesna for OPS support


martien
Posts: 31
Joined: Tue Jul 16, 2013 1:45 pm

Re: Throttling Control - robots

Post by martien » Thu Apr 26, 2018 1:55 pm

Hi Vesna,

I definitely hadn't noticed the table on p. 42 before.
Thanks again,

Martien


Hans-W
Posts: 1
Joined: Sat Jul 18, 2020 4:31 am

Re: Throttling Control - robots

Post by Hans-W » Sat Jul 18, 2020 4:58 am

Hi,

I have two questions regarding throttling and self-throttling.
There are several limits for data requests and up to now I have taken the following limits from the documentation:
Hour-Limit: 450 MB
Week-Limit: 4 GB
Maintenance: Monday to Sunday: 5:00-5:30 CET
Self-throttling headers: several traffic lights

Then there is the fair use policy, which leads to my first question: does that mean that if I limit the program to one request every 60 seconds, I will never exceed the fair use limit?

And the second question: If I have a list of several patent documents that I want to download, do I have to consider further limits? And if so which ones?

Regards, Hans
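
The volume limits listed in this post can be checked against the quota counters that appear in the logs earlier in the thread. The sketch below assumes that those counters are in bytes and that MB and GB are decimal units; neither is confirmed here, and the limits themselves are simply the figures quoted in the post. quota_left is an illustrative name.

```python
HOUR_LIMIT = 450 * 1000**2  # 450 MB; decimal megabytes are an assumption
WEEK_LIMIT = 4 * 1000**3    # 4 GB; decimal gigabytes are an assumption


def quota_left(hour_used, week_used):
    """Return the bytes still available under the tighter of the two quotas."""
    return min(HOUR_LIMIT - hour_used, WEEK_LIMIT - week_used)


# Counters copied from the first log in this thread (Ex 1, 09:01:25),
# assumed to be byte counts.
print(quota_left(hour_used=164828218, week_used=705572105))
```

This only covers data volume. The per-minute request limits from the throttling status are a separate constraint and have to be observed as well.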


EPO / OPS Support
Posts: 1298
Joined: Thu Feb 22, 2007 5:32 pm

Re: Throttling Control - robots

Post by EPO / OPS Support » Mon Jul 20, 2020 8:49 am

Hi,

You need to observe the throttling to know how much you can download per minute and to see whether the service is overall too busy to handle any queries at all at that moment.

Unfortunately, I don't understand your second question. Please contact me directly at patentdata and send me an example of the URL you had in mind when you asked that question.

Regards
Vesna for OPS

