@hlmnrmr hlmnrmr commented Apr 1, 2015

So, this is an initial proposal. We kept it very simple, at least in terms of the endpoints available. The idea is as follows:

  • /items/ delivers individual News Items in ninjs.
  • /items/ delivers a list of News Items in ninjs as well. Notice that here we return a list of items, which is different from packages.
  • /packages/ delivers an individual Package Item in ninjs representation, potentially embedding the content of the News Items that are part of the package, either on request via a parameter or as the default behavior.
  • /packages/ delivers a list of Package Items, where the Package Items include only references.
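
To make the "references only" versus "embedded content" distinction concrete, here is a minimal sketch in Python. The sample data, the `item_refs` key, the `embed` flag and the `render_package` helper are all hypothetical and only illustrate the idea; they are not taken from the proposal or from the ninjs schema.

```python
# Hypothetical sample data; the layout is an assumption for illustration only.
ITEMS = {
    "item-1": {"uri": "item-1", "type": "text", "headline": "Storm hits coast"},
    "item-2": {"uri": "item-2", "type": "picture", "headline": "Flooded street"},
}

PACKAGE = {"uri": "pkg-1", "type": "composite", "item_refs": ["item-1", "item-2"]}


def render_package(package, items, embed=False):
    """Render a package with references only, or with its items embedded."""
    rendered = {"uri": package["uri"], "type": package["type"]}
    if embed:
        # Single-package view: include the full News Items.
        rendered["items"] = [items[ref] for ref in package["item_refs"]]
    else:
        # List view: include only references to the News Items.
        rendered["items"] = [{"uri": ref} for ref in package["item_refs"]]
    return rendered
```

With `embed=False` a client gets only the item URIs and has to fetch each item from /items separately; with `embed=True` one request returns the whole story.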

From this proposal it is clear that we see News Item and Package Item as separate resources. They are actually separate resources and that should be reflected in the API. A Package Item is seen as a content type (i.e. composite) but it has an entirely different nature than text, video, audio, photo.

A package is built on editorial choice and has a specific purpose. Delivering Package Items as part of /items would not be good practice. The /packages endpoint will become even more relevant once the distribution system is in place, but even now a clear separation from News Items can be considered a good strategy.

A few things to discuss:

1- You may wonder where /search is. Well, we removed it. Why? Because /search makes sense only when searching across different resources. We have checked, and searching for Package Items does not constitute a use case at the moment. Our proposal is to target only News Items for searching and therefore to move the search to the specific /items endpoint via the "q" parameter.
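
As a rough illustration of what searching via "q" on /items could look like from a client, here is a small Python sketch. The host name, the `max_results` parameter name and the `items_search_url` helper are assumptions made for the example; only the /items endpoint and the "q" parameter come from the proposal.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host, not part of the proposal


def items_search_url(q, max_results=None):
    """Build a search URL against /items using the "q" parameter."""
    params = {"q": q}
    if max_results is not None:
        # "max_results" is a hypothetical name for the page-size parameter.
        params["max_results"] = max_results
    return f"{BASE_URL}/items?{urlencode(params)}"
```

For example, `items_search_url("flood warning")` yields a URL ending in `/items?q=flood+warning`.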

2- We didn't look much at the data schema for this first draft. We went with ninjs because we consider it to have significant value from a business point of view. This is something that requires further discussion though.

3- Feedback on any other aspect is very much welcome as well.

content-api.apib Outdated
Contributor

We need to have a hard limit in place and document it.

I would also use a default soft limit, e.g. 50 or 100 so that by default one cannot accidentally pull a huge amount of data (up to the hard limit, which might be quite high).

Well, from the Apiary doc I gather that 25 is the default.

Meh, of course. I was replying directly to Adrian's comment without looking at the context in which it was written. My bad.

Contributor

25 is the default, but the user should not be able to specify, for example, 1000 and get that number of records.

Would you raise an error if the given value exceeds the hard limit, or would you silently retrieve a maximum of HARD_LIMIT items?
IMO an error is better as it makes it more explicit that there exists a hard limit, and it also immediately warns a client that there might be something wrong with its configuration or the code itself.
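
A minimal Python sketch of the "raise an error" option, for comparison with silent clamping. The default of 25 is the one mentioned in this thread; the hard limit value of 100, the `LimitExceeded` exception and the `resolve_max_results` helper are assumptions for illustration, not part of the API document.

```python
DEFAULT_MAX_RESULTS = 25  # default mentioned in this thread
HARD_LIMIT = 100          # assumed value; the thread does not fix a number


class LimitExceeded(ValueError):
    """Raised when a client asks for more items than the hard limit allows."""


def resolve_max_results(requested=None):
    """Return the page size to use, rejecting values above the hard limit."""
    if requested is None:
        return DEFAULT_MAX_RESULTS
    if requested < 1:
        raise ValueError("max_results must be a positive integer")
    if requested > HARD_LIMIT:
        # Fail loudly instead of silently returning HARD_LIMIT items.
        raise LimitExceeded(
            f"max_results={requested} exceeds the hard limit of {HARD_LIMIT}"
        )
    return requested
```

In an HTTP API this exception would presumably be mapped to a 4xx response whose body states the limit, so a misconfigured client finds out immediately.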

Member

IMO the hard limit doesn't have to be a number of items. In the case of Reuters, the limit is that you can get content for the last 30 days. We can do something like that.

Or to generalize: several "hard" limits might coexist at the same time (configurable?). If an organization allows retrieving 1000s of items at the same time, as long as they are not "too old", then we should let them do it IMO (as long as it is technically feasible, of course).
The potential problem of somebody requesting "too much" data should be solved on the quota level anyway, and we'll need to add that sooner or later.
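
One possible shape for several coexisting, configurable hard limits, sketched in Python. The `HardLimits` class, its field names and the `violations` helper are hypothetical; the 30-day figure in the usage note is only the Reuters example from above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class HardLimits:
    """Per-installation limits; None means that kind of limit is disabled."""
    max_items: Optional[int] = None
    max_age: Optional[timedelta] = None


def violations(limits, requested_items, oldest_requested, now):
    """Return a list describing which hard limits a request would break."""
    problems = []
    if limits.max_items is not None and requested_items > limits.max_items:
        problems.append("too many items")
    if limits.max_age is not None and now - oldest_requested > limits.max_age:
        problems.append("content too old")
    return problems
```

An organization that only cares about age could configure `HardLimits(max_age=timedelta(days=30))`, in which case a request for thousands of recent items passes, while a request reaching back 45 days does not.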

@amagdas
Contributor

amagdas commented Apr 2, 2015

👍

content-api.apib Outdated

We had a discussion regarding this for liveblog, and the conclusion was that information about whether the backend is Elasticsearch, Mongo or something else should not be relevant to the open API.
That way the backend can be changed without affecting the open API.

I definitely agree here. Clients should be oblivious to what system we use in the backend for storing data. We should specify what search parameters they can use and then transparently transform these queries into the actual elasticsearch queries or whatever is used on the server.
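
A minimal sketch, in Python, of keeping the backend out of the public contract: the API accepts only documented parameters and a single function translates them into a backend query. The `PUBLIC_PARAMS` tuple and both helper names are hypothetical. The translation shown targets an Elasticsearch-style query body purely as an example; swapping the backend would mean replacing only `to_backend_query`.

```python
PUBLIC_PARAMS = ("q",)  # documented search parameters; only "q" is named in this thread


def parse_public_query(params):
    """Validate client parameters against the documented public set."""
    unknown = sorted(set(params) - set(PUBLIC_PARAMS))
    if unknown:
        raise ValueError(f"unsupported search parameters: {unknown}")
    return {name: params[name] for name in PUBLIC_PARAMS if name in params}


def to_backend_query(public_query):
    """Translate the public query into an Elasticsearch-style request body.

    This is the only place that knows which backend is in use.
    """
    if "q" not in public_query:
        return {"query": {"match_all": {}}}
    return {"query": {"query_string": {"query": public_query["q"]}}}
```

Clients never see or send the backend query, so raw Elasticsearch syntax cannot leak into the public API by accident.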

@vied12

vied12 commented Apr 2, 2015

👌
Quite simple, I like it.

Can you add a link that defines ninjs, please?

@plamut

plamut commented Apr 2, 2015

@vied12 You mean this? http://dev.iptc.org/ninjs

content-api.apib Outdated

How will this range be given? As [starting_date, enddate], as { gt: starting_date, lt: enddate }, or as date_since, date_until?
We need to standardize the date_range format.

I agree, and because I like having only simple parameters, I would prefer
begin_date and end_date or
from_date and to_date
instead of date_range

1.) I prefer flat parameters as well, e.g. start_date and end_date, especially because one of the pair can be omitted in queries like "give me the items created after XYZ".

2.) We need to make sure that it's perfectly clear whether the date range boundaries are inclusive or exclusive (a common source of confusion). Preferably by finding even more descriptive, yet still nice, parameter names (e.g. newer_than)

3.) Since the parameters' data type will most likely be datetime (to be decided...), their names should reflect that to avoid any confusion.
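
To illustrate points 1 and 2, here is a small Python sketch of flat, optional date parameters with explicitly documented boundaries. The half-open convention (start inclusive, end exclusive) is just one possible choice picked for the example, not something decided in this thread, and `parse_bound` and `in_range` are hypothetical helpers.

```python
from datetime import datetime


def parse_bound(value):
    """Parse an ISO 8601 datetime string, or pass through a missing bound."""
    return None if value is None else datetime.fromisoformat(value)


def in_range(created, start_date=None, end_date=None):
    """Check whether `created` falls within [start_date, end_date).

    Assumed convention: start_date is inclusive, end_date is exclusive.
    Either bound may be omitted.
    """
    start = parse_bound(start_date)
    end = parse_bound(end_date)
    if start is not None and created < start:
        return False
    if end is not None and created >= end:
        return False
    return True
```

Whatever convention is chosen, spelling it out in the parameter documentation (or in the names themselves) avoids off-by-one surprises at the boundaries.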

@vied12 almost all open APIs use since and until, so from my point of view it is just a matter of conformity.
But we may still need this range for more than one field, and then we need to find a solution for that.
E.g. in liveblog we need ranges for the updated and created fields.
http://stackoverflow.com/questions/7655403/twitter-api-tweets-for-time-range
http://stackoverflow.com/questions/6205161/select-date-range-to-get-insight-of-the-page

@ioanpocol
Contributor

I also added this comment on the API document.
As a client, I want to get all new items and packages every 10 minutes.
If an item is in a package, will I get it twice: once as an item and once when I'm processing the package?
Could the packages endpoint return both packages and items that are not in a package, and the items endpoint return only items from packages?
Probably some filter/search capabilities should be available only on the packages endpoint. There we could have the option to filter by composite or simple items, and by contained type such as video, image or text.

@plamut

plamut commented Apr 3, 2015

> As a client, I want to get all new items and packages every 10 minutes.

The question is whether or not this represents a common use case. Holman's and my thinking is that it is more likely that someone wants to fetch either:

  1. Unstructured pieces of information ("items"), e.g. a bunch of new images that are not necessarily related to each other
    - OR -
  2. Complete news stories ("packages") as prepared by the editors.

It seemed less likely someone would simultaneously want to fetch both structured and unstructured content, thus the /search endpoint has been removed (for now at least).
The idea is to keep the initial version of the API simple and only add features if clients request them, because after all only the clients can tell us what they really need. At this stage we are mostly guessing, and we would like to avoid implementing things that eventually nobody would really need.

I hope the rationale behind the proposal is now clearer. On top of that, it also allows us to defer decisions regarding the things you pointed out until there is a clear need to add such features. At that point we will have much more information and will be able to actually ask the clients about their exact needs.

+ date_range removed in favor of start_date and end_date for collections
+ hard limit added for collections
+ Elasticsearch query reference removed from *q* parameter