Throttling Control - robots

This space is made available to users of Open Patent Services (OPS) web-service and now also to users of EPO’s raw data subscription products such as 14.7 DOCDB - worldwide bibliographic master database, 14.11 Worldwide legal status (INPADOC), 14.12 EP full text in XML, 14.1 EBD and more.

Users can ask each other questions, exchange experiences and solutions, and post ideas. The moderator will use this space to announce changes or other relevant information.

martien
Posts: 17
Joined: Tue Jul 16, 2013 1:45 pm

Throttling Control - robots

Post by martien » Wed Apr 25, 2018 10:06 am

Hi Support,

I have implemented the rules according to the fair use policy, and normally it works quite well.
However, in some situations I still get the error CLIENT.RobotDetected, although as I understand it
there should be no reason to wait before the next call:

####### Ex 1 #### Change from yellow to black after 4 sec.
20180316 09:01:24 0 UA get: QUERY1
20180316 09:01:25 0 busy (images=green:100, inpadoc=green:45, other=green:1000, retrieval=green:100, search=yellow:15)
20180316 09:01:25 0 QuotaWeekUsed: 705572105; QuotaHourUsed: 164828218
20180316 09:01:29 0 UA get: QUERY2
20180316 09:01:29 0 overloaded (images=green:50, inpadoc=green:30, other=green:1000, retrieval=green:50, search=black:0)
20180316 09:01:29 0 QuotaWeekUsed: 705572385; QuotaHourUsed: 164828498 ThrottlingControlQuota
20180316 09:01:29 0 UA-status: 403 Forbidden CLIENT.RobotDetected

####### Ex 2 #### Change from green to black
20180424 03:11:49 0 UA get: QUERY1
20180424 03:11:50 0 idle (images=green:200, inpadoc=green:60, other=green:1000, retrieval=green:200, search=green:30)
20180424 03:11:50 0 QuotaWeekUsed: 536305934; QuotaHourUsed: 7117918
20180424 03:11:50 0 UA get: QUERY2
20180424 03:11:50 0 busy (images=green:100, inpadoc=green:45, other=green:1000, retrieval=green:100, search=black:0)
20180424 03:11:50 0 QuotaWeekUsed: 536306214; QuotaHourUsed: 7118198 ThrottlingControlQuota
20180424 03:11:50 0 UA-status: 403 Forbidden CLIENT.RobotDetected
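Each status line in these logs has the same shape: an overall system state (idle, busy, overloaded) followed by service=colour:limit pairs, where the limit is the number of requests per minute currently allowed for that service. A minimal Python sketch for splitting such a line, assuming the value is taken verbatim as logged above (in OPS it is assumed to arrive in the X-Throttling-Control response header; the function name parse_throttling_status is hypothetical):

```python
import re
from typing import Dict, Tuple

# One "service=colour:limit" entry, e.g. "search=yellow:15".
_ENTRY = re.compile(r"(\w+)=(green|yellow|red|black):(\d+)")


def parse_throttling_status(value: str) -> Tuple[str, Dict[str, Tuple[str, int]]]:
    """Split e.g. 'busy (images=green:100, search=yellow:15)' into the
    overall state and a per-service map of (colour, requests per minute)."""
    state, _, rest = value.strip().partition(" ")
    services = {
        name: (colour, int(limit))
        for name, colour, limit in _ENTRY.findall(rest)
    }
    return state, services
```

Applied to the 09:01:29 line of Ex 1, this yields the state "overloaded" and search = ("black", 0), which is the value a client would need to act on.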

In both cases I would have expected a search=red warning first. Having a green status and then
being locked out for 900 seconds, with no way to prevent it, is not really fun.

Do you have any advice on how to prevent these situations?

Regards, Martien


EPO / OPS Support
Posts: 942
Joined: Thu Feb 22, 2007 5:32 pm

Re: Throttling Control - robots

Post by EPO / OPS Support » Wed Apr 25, 2018 11:58 am

Hi,

If you read the instructions in the user guide at https://www.epo.org/searching-for-paten ... html#tab-3, you will see it stated that each node protects itself differently. The user guide explains that there is more than one node (two, in fact), and each reports its colour status separately. Just because node 1 is green, it doesn't mean node 2 is also green. For example, you could have received a red from node 2 many requests earlier without noticing it.

Yellow status was at 09:01:25; that is when you should slow right down to the mentioned number of requests per minute (or a bit below). As soon as you see a request to reduce, you have to drop right down.
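Slowing down "to the number (or a bit below)" amounts to spacing requests to one service at least 60 / limit seconds apart, stretched by a safety margin. A minimal sketch; the function name and the 0.8 default margin are illustrative assumptions, not values from OPS:

```python
def min_interval_seconds(allowed_per_minute: int, safety: float = 0.8) -> float:
    """Seconds to wait between two requests to one service so the rate stays
    a bit below the announced requests-per-minute limit.

    safety < 1 keeps the rate under the limit; 0.8 is an arbitrary example.
    """
    if allowed_per_minute <= 0:
        # Limit 0 (black): no requests allowed at all, so back off entirely.
        return float("inf")
    return 60.0 / (allowed_per_minute * safety)
```

With search=yellow:15 from Ex 1, the limit on its own permits one search every 4 seconds, exactly the gap used there; with the margin the sketch waits 5 seconds instead.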

The number of requests per minute has to be respected correctly and on time; Wednesdays in particular are known to be very busy days.

regards,
Vesna for OPS support


martien
Posts: 17
Joined: Tue Jul 16, 2013 1:45 pm

Re: Throttling Control - robots

Post by martien » Wed Apr 25, 2018 10:58 pm

Hi Vesna,

Regarding example #1: I did not get a "red" condition at any time in the 15 days before (which is where my log starts).
After "yellow" we slowed down a bit (4 secs) and got "black" as the next result, without any "red" first.

It seems that one particular query used more than 25% of the request limit.

Regarding ex #2: I now see in my logs that there were "red" conditions until 6 secs before:
20180424 03:11:44 0 busy (images=green:100, inpadoc=green:45, other=green:1000, retrieval=green:100, search=red:15)
followed by "green".

Thanks for your explanation, especially the point that I cannot rely on "green".

Regards, Martien


EPO / OPS Support
Posts: 942
Joined: Thu Feb 22, 2007 5:32 pm

Re: Throttling Control - robots

Post by EPO / OPS Support » Thu Apr 26, 2018 7:21 am

Hi,

Have you read pages 40-42 of our OPS manual, especially the part about multiple instances and throttling? I find the explanation very good, so I suggest you read it.
There are a few issues in OPS:
- first of all, there are two nodes, and both have to be observed at all times;
- then there is the 60-second rolling window principle applied to the throttle of each specific service, which gets really complicated when you are doing queries with more than one constituent. For example, a query for an inpadoc family with biblio and legal events will use multiple instances (2, but 3 times :-). Each of them will have a different throttle value at the time you start your query, and they will not be set off at the same time. This is where the 60-second rolling window principle comes into play for each of the 2 throttles (inpadoc = family, retrieval = biblio, and then inpadoc = legal once more), all 3 set at different times;
- the next issue is that throttling does not only regulate your usage but also that of everyone using a specific service at the same time as you. As soon as users start querying, the system starts decreasing the number of allowed queries per minute for all users, not only for you. In this way we try to ensure that everyone gets something.
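On the client side, the rolling window can be approximated by remembering when each service was used and only sending a composite query if every throttle it touches still has room. A minimal sketch: the class name is hypothetical, the constituent list ["inpadoc", "retrieval", "inpadoc"] for a family query with biblio and legal events is an assumption based on the description above, and the server's own windows are not visible to the client:

```python
import time
from collections import Counter, deque
from typing import Deque, Dict, Iterable, Optional

WINDOW_SECONDS = 60.0  # length of the rolling window described above


class RollingWindow:
    """Client-side count of requests per service over a 60-second rolling window."""

    def __init__(self) -> None:
        self._sent: Dict[str, Deque[float]] = {}

    def _prune(self, service: str, now: float) -> Deque[float]:
        # Drop timestamps that have slid out of the window.
        sent = self._sent.setdefault(service, deque())
        while sent and now - sent[0] >= WINDOW_SECONDS:
            sent.popleft()
        return sent

    def count(self, service: str, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        return len(self._prune(service, now))

    def can_send(self, constituents: Iterable[str], limits: Dict[str, int],
                 now: Optional[float] = None) -> bool:
        """True only if every throttle touched by the query still has room.

        A service missing from `limits` is treated as having no room.
        """
        now = time.monotonic() if now is None else now
        needed = Counter(constituents)
        return all(self.count(svc, now) + n <= limits.get(svc, 0)
                   for svc, n in needed.items())

    def record(self, constituents: Iterable[str], now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        for svc in constituents:
            self._prune(svc, now).append(now)
```

This simplification records all constituents at the same instant, whereas the reply notes that on the server they are set at different times, so leaving extra headroom below the limits is the safer choice.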

And then there is the fact that Wednesdays are the heaviest day of the week, because that is our publication day.

The recommendation in our manual states: “As a user adhering to the fair use policy, you should moderate your usage of the service over a 60 second window, according to the most negative response (number of queries) data received.” I think that is the most important point to remember.
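The quoted rule means a fresh green must not overwrite a red seen shortly before, whichever of the two nodes sent it. A minimal sketch, assuming the colour and limit for each service have already been extracted from every response's status line; the class name is hypothetical:

```python
import time
from collections import deque
from typing import Deque, Dict, Optional, Tuple

WINDOW_SECONDS = 60.0  # the 60-second window from the fair use recommendation
SEVERITY = {"green": 0, "yellow": 1, "red": 2, "black": 3}


class MostNegativeTracker:
    """Keep every (colour, limit) seen per service for 60 seconds and report
    the most negative values, regardless of which node sent them."""

    def __init__(self) -> None:
        self._seen: Dict[str, Deque[Tuple[float, str, int]]] = {}

    def observe(self, service: str, colour: str, limit: int,
                now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._seen.setdefault(service, deque()).append((now, colour, limit))

    def effective(self, service: str,
                  now: Optional[float] = None) -> Optional[Tuple[str, int]]:
        """Worst colour and lowest limit seen in the last 60 s, or None."""
        now = time.monotonic() if now is None else now
        seen = self._seen.get(service)
        if not seen:
            return None
        # Forget observations older than the window.
        while seen and now - seen[0][0] >= WINDOW_SECONDS:
            seen.popleft()
        if not seen:
            return None
        worst = max((colour for _, colour, _ in seen), key=SEVERITY.__getitem__)
        lowest = min(limit for _, _, limit in seen)
        return worst, lowest
```

Fed with ex #2 (search=red:15 at 03:11:44, then search=green:30 six seconds later), the tracker still reports red with 15 requests per minute at 03:11:50. The worst colour and the lowest limit are taken independently, so both may come from different responses; that is deliberate, since each is the most negative value received.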

OPS documentation is available here: https://www.epo.org/searching-for-paten ... html#tab-3

I hope this helps, but please read the documents; they explain it better than I can.

Regards,
Vesna for OPS support


martien
Posts: 17
Joined: Tue Jul 16, 2013 1:45 pm

Re: Throttling Control - robots

Post by martien » Thu Apr 26, 2018 1:55 pm

Hi Vesna,

I definitely hadn't noticed the table on p. 42 before.
Thanks again,

Martien

