Posted about 8 years ago by Denis Laxalde
Following the introduction post about rethinking the web user
interface of CubicWeb, this article addresses the topic of the Web API used to
exchange data between the client and the server. As mentioned earlier, this
question is rather central and deserves particular attention, and better early
than late. Of the two candidate representations previously identified,
Hydra and JSON API, this article focuses on the latter.
Hopefully, this will give better insight into the capabilities and limits of
this specification and help us make a decision, though a similar experiment
with another candidate would be good to have. Still in the process of blog
driven development, this post raises several open questions from which a
discussion will hopefully emerge...
A glance at JSON API
JSON API is a specification for building APIs that use JSON as a
data exchange format between clients and a server. The media type is
application/vnd.api+json. Version 1.0 has been available since mid-2015.
The format has interesting features such as the ability to build
compound documents (i.e. responses made of several, usually
related, resources) or to specify filtering, sorting and pagination.
A document following the JSON API format basically represents resource
objects, their attributes and relationships as well as some links
also related to the data of primary concern.
Taking the example of a Ticket resource modeled after the tracker cube,
we could have a JSON API document formatted as:
GET /ticket/987654
Accept: application/vnd.api+json
{
"links": {
"self": "https://www.cubicweb.org/ticket/987654"
},
"data": {
"type": "ticket",
"id": "987654",
"attributes": {
"title": "Let's use JSON API in CubicWeb",
"description": "Well, let's try, at least..."
},
"relationships": {
"concerns": {
"links": {
"self": "https://www.cubicweb.org/ticket/987654/relationships/concerns",
"related": "https://www.cubicweb.org/ticket/987654/concerns"
},
"data": {"type": "project", "id": "1095"}
},
"done_in": {
"links": {
"self": "https://www.cubicweb.org/ticket/987654/relationships/done_in",
"related": "https://www.cubicweb.org/ticket/987654/done_in"
},
"data": {"type": "version", "id": "998877"}
}
}
},
"included": [{
"type": "project",
"id": "1095",
"attributes": {
"name": "CubicWeb"
},
"links": {
"self": "https://www.cubicweb.org/project/cubicweb"
}
}]
}
In this JSON API document, top-level members are links, data
and included. The latter is used to ship some resources (here a
"project") related to the "primary data" (a "ticket") through the "concerns"
relationship as denoted in the relationships object (more on this later).
While the decision of including or not these related resources along with the
primary data is left to the API designer, JSON API also offers a specification
to build queries for inclusion of related resources. For example:
GET /ticket/987654?include=done_in
Accept: application/vnd.api+json
would lead to a response including the full version resource along with the
above content.
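As a rough illustration of how a client might consume such a compound document, the sketch below looks up the target of a to-one relationship among the included resources. It is only a sketch in Python: the helper names index_included and resolve_related are made up for this example, and TICKET is a trimmed copy of the document shown earlier.

```python
# Trimmed copy of the ticket document from the example above.
TICKET = {
    "data": {
        "type": "ticket",
        "id": "987654",
        "relationships": {
            "concerns": {"data": {"type": "project", "id": "1095"}},
            "done_in": {"data": {"type": "version", "id": "998877"}},
        },
    },
    "included": [
        {"type": "project", "id": "1095", "attributes": {"name": "CubicWeb"}},
    ],
}


def index_included(document):
    """Map (type, id) to each resource object found under "included"."""
    return {
        (resource["type"], resource["id"]): resource
        for resource in document.get("included", [])
    }


def resolve_related(document, relationship):
    """Return the included resource targeted by a to-one relationship.

    Returns None when the target was not shipped in "included".
    """
    linkage = document["data"]["relationships"][relationship]["data"]
    return index_included(document).get((linkage["type"], linkage["id"]))


project = resolve_related(TICKET, "concerns")
print(project["attributes"]["name"])       # CubicWeb
print(resolve_related(TICKET, "done_in"))  # None
```

The second lookup returns None because the version resource is not part of included here; a request using ?include=done_in would be expected to make it available.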
Enough for the JSON API overview. Next I'll present how various aspects of
data fetching and modification can be achieved through the use of JSON API in
the context of a CubicWeb application.
CRUD
CRUD of resources is handled in a fairly standard way in JSON API, relying
on HTTP protocol semantics.
For instance, creating a ticket could be done as:
POST /ticket
Content-Type: application/vnd.api+json
Accept: application/vnd.api+json
{
"data": {
"type": "ticket",
"attributes": {
"title": "Let's use JSON API in CubicWeb",
"description": "Well, let's try, at least..."
},
"relationships": {
"concerns": {
"data": { "type": "project", "id": "1095" }
}
}
}
}
Then updating it (assuming we got its id from a response to the above
request):
PATCH /ticket/987654
Content-Type: application/vnd.api+json
Accept: application/vnd.api+json
{
"data": {
"type": "ticket",
"id": "987654",
"attributes": {
"description": "We'll succeed, for sure!"
}
}
}
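For what it's worth, here is a minimal Python sketch of how a client could assemble the two requests above. The helper names resource_document and build_request are hypothetical, and nothing is sent over the network: the functions only return the pieces an HTTP client would need. Serializing with json.dumps has the side benefit of always producing well-formed JSON.

```python
import json

MEDIA_TYPE = "application/vnd.api+json"


def resource_document(type_, attributes, id_=None, relationships=None):
    """Build a JSON API document whose primary data is one resource object."""
    data = {"type": type_, "attributes": attributes}
    if id_ is not None:
        # Present when updating; omitted on creation, as in the POST example.
        data["id"] = id_
    if relationships:
        # Each relationship maps to a (type, id) pair, i.e. a to-one linkage.
        data["relationships"] = {
            name: {"data": {"type": target_type, "id": target_id}}
            for name, (target_type, target_id) in relationships.items()
        }
    return {"data": data}


def build_request(method, path, document):
    """Return (method, path, headers, body) ready to hand to an HTTP client."""
    headers = {"Content-Type": MEDIA_TYPE, "Accept": MEDIA_TYPE}
    return method, path, headers, json.dumps(document)


create = build_request(
    "POST",
    "/ticket",
    resource_document(
        "ticket",
        {
            "title": "Let's use JSON API in CubicWeb",
            "description": "Well, let's try, at least...",
        },
        relationships={"concerns": ("project", "1095")},
    ),
)
update = build_request(
    "PATCH",
    "/ticket/987654",
    resource_document(
        "ticket", {"description": "We'll succeed, for sure!"}, id_="987654"
    ),
)
```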
Relationships
In JSON API, a relationship is in fact a first-class resource, as it is defined
by a noun and a URI through a link object. In this respect, the
client just receives a couple of links and can then operate on them
using the proper HTTP verb. Fetching or updating relationships is done using
the special <resource url>/relationships/<relation type> endpoint (the self
member of relationships items in the first example). Quite naturally, the
specification relies on the GET verb for fetching targets, PATCH for
(re)setting a relation (i.e. replacing its targets), POST for adding targets
and DELETE for dropping them.
GET /ticket/987654/relationships/concerns
Accept: application/vnd.api+json
{
"data": {
"type": "project",
"id": "1095"
}
}
PATCH /ticket/987654/relationships/done_in
Content-Type: application/vnd.api+json
Accept: application/vnd.api+json
{
"data": {
"type": "version",
"id": "998877"
}
}
The request and response bodies of this <resource
url>/relationships/<relation type> endpoint consist of so-called resource
identifier objects, which are lightweight representations of resources
usually containing only their "type" and "id" (enough to
uniquely identify them).
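The mapping between relationship operations and HTTP verbs described above can be sketched in a few lines of Python. The function name relationship_request and the operation names ("fetch", "replace", "add", "remove") are invented for this illustration; targets are given as (type, id) pairs, with a list standing for a to-many relationship.

```python
# Operation names are made up for this sketch; the verbs are those listed above.
RELATIONSHIP_VERBS = {
    "fetch": "GET",
    "replace": "PATCH",
    "add": "POST",
    "remove": "DELETE",
}


def identifier(target):
    """Turn a (type, id) pair into a resource identifier object."""
    type_, id_ = target
    return {"type": type_, "id": id_}


def relationship_request(resource_url, relation, operation, targets=None):
    """Return (method, url, body) for an operation on a relationship."""
    method = RELATIONSHIP_VERBS[operation]
    url = "%s/relationships/%s" % (resource_url.rstrip("/"), relation)
    if method == "GET":
        return method, url, None
    if isinstance(targets, list):
        data = [identifier(target) for target in targets]  # to-many linkage
    elif targets is None:
        data = None  # an empty to-one linkage
    else:
        data = identifier(targets)  # to-one linkage
    return method, url, {"data": data}


print(relationship_request("/ticket/987654", "concerns", "fetch"))
print(relationship_request(
    "/ticket/987654", "done_in", "replace", ("version", "998877")))
```

The second call yields the PATCH request shown earlier, with a single resource identifier object as its body.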
Related resources
Remember the related member appearing in relationships links in the first
example?
[ ... ]
"done_in": {
"links": {
"self": "https://www.cubicweb.org/ticket/987654/relationships/done_in",
"related": "https://www.cubicweb.org/ticket/987654/done_in"
},
"data": {"type": "version", "id": "998877"}
}
[ ... ]
While this is not a mandatory part of the specification, it is useful
for fetching relationship targets. In contrast with the
.../relationships/... endpoint, this one is expected to return plain
resource objects (with attributes and relationships information in
particular).
GET /ticket/987654/done_in
Accept: application/vnd.api+json
{
"links": {
"self": "https://www.cubicweb.org/998877"
},
"data": {
"type": "version",
"id": "998877",
"attributes": {
"number": 4.2
},
"relationships": {
"version_of": {
"links": {"self": "https://www.cubicweb.org/998877/relationships/version_of"},
"data": { "type": "project", "id": "1095" }
}
}
},
"included": [{
"type": "project",
"id": "1095",
"attributes": {
"name": "CubicWeb"
},
"links": {
"self": "https://www.cubicweb.org/project/cubicweb"
}
}]
}
Meta information
The JSON API specification allows including non-standard information using a
so-called meta object. This can be found in various places in the document
(top-level, resource objects or relationships objects). Use of this
field is completely free (and optional). For instance, we could use this field
to store the workflow state of a ticket:
{
"data": {
"type": "ticket",
"id": "987654",
"attributes": {
"title": "Let's use JSON API in CubicWeb"
},
"meta": { "state": "open" }
}
}
Permissions
Permissions are part of the metadata to be exchanged during request/response
cycles. As such, the best place to convey this information is probably within
the headers. According to JSON API's FAQ, this is also the recommended
way for a resource to advertise supported actions.
So for instance, the response to a GET request could include an Allow header,
indicating which request methods are allowed on the primary resource
requested:
GET /ticket/987654
Allow: GET, PATCH, DELETE
A HEAD request could also be used for querying allowed actions on links
(such as relationships):
HEAD /ticket/987654/relationships/comments
Allow: POST
This approach has the advantage of being standard HTTP: no particular
knowledge of the permissions model is required and the response body is not
cluttered with this metadata.
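On the client side, exploiting such a header could look like the following Python sketch. The function names allowed_methods and can are hypothetical, and the headers are assumed to be available as a plain dictionary keyed by "Allow" (real HTTP libraries usually offer case-insensitive header lookups).

```python
def allowed_methods(headers):
    """Parse an Allow header value into a set of upper-case method names."""
    value = headers.get("Allow", "")
    return {method.strip().upper() for method in value.split(",") if method.strip()}


def can(headers, method):
    """Tell whether the given HTTP method was advertised as allowed."""
    return method.upper() in allowed_methods(headers)


ticket_headers = {"Allow": "GET, PATCH, DELETE"}
print(can(ticket_headers, "PATCH"))  # True
print(can(ticket_headers, "POST"))   # False
```

A user interface could use this, for example, to decide whether to display an edit button for the ticket.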
Another possibility would be to use the meta member of JSON API data.
{
"data": {
"type": "ticket",
"id": "987654",
"attributes": {
"title": "Let's use JSON API in CubicWeb"
},
"meta": {
"permissions": ["read", "update"]
}
}
}
Clearly, this would minimize the number of client/server requests.
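Reading permissions from the meta member would then be straightforward for a client, as in this small Python sketch. The "permissions" key and its "read"/"update" values are simply those of the example above, not something defined by JSON API, and the function name permissions is made up.

```python
def permissions(document):
    """Return the set of permissions advertised in the primary data's meta."""
    meta = document["data"].get("meta", {})
    return set(meta.get("permissions", []))


ticket = {
    "data": {
        "type": "ticket",
        "id": "987654",
        "attributes": {"title": "Let's use JSON API in CubicWeb"},
        "meta": {"permissions": ["read", "update"]},
    }
}
print("update" in permissions(ticket))  # True
print("delete" in permissions(ticket))  # False
```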
More Hypermedia controls
With the example implementation described above, it appears already possible to
manipulate several aspects of the entity-relationship database following a
CubicWeb schema: resources fetching, CRUD operations on entities, set/delete
operations on relationships. All these "standard" operations are discoverable
by the client simply because they are baked into the JSON API format: for
instance, adding a target to some relationship is possible by POSTing to the
corresponding relationship resource something that
conforms to the schema.
So, implicitly, this already gives us a fairly good level of Hypermedia
control, so that we're not far from having a mature REST architecture
according to the Richardson Maturity Model. But beyond these "standard"
discoverable actions, the JSON API specification does not yet address
Hypermedia controls in a generic manner (see this interesting
discussion about extending the specification for this
purpose).
So the question is: would we want more? Or, in other words, do we need to
define "actions" which would not map directly to a concept in the application
model?
In the case of a CubicWeb application, the most obvious example (that I could
think of) of where such an "action" would be needed is workflow state
handling. Roughly, workflows in CubicWeb are modeled through two entity
types, State and TrInfo (for "transition information"), the former being
handled through the latter, and a relationship in_state between the
workflowable entity type at stake and its current State. It is not
so clear how one would model this in terms of HTTP resources. (Arguably we
wouldn't want to expose the complexity of the Workflow/TrInfo/State data model to
the client, nor can we simply expose this in_state relationship, as a client
would not be able to simply change the state of an entity by updating the
relation). So what would be a custom "action" to handle the state of a
workflowable resource? Back to our tracker example, how would we advertise to
the client the possibility to perform "open"/"close"/"reject" actions on a
ticket resource? Open question...
Request for comments
In this post, I tried to give an overview of a possible usage of JSON
API to build a Web API for CubicWeb. Several aspects were discussed
from simple CRUD operations, to relationships handling or non-standard
actions. In many cases, there are open questions for which I'd love to receive
feedback from the community.
Recalling that this topic is a central part of the experiment towards building
a client-side user interface to CubicWeb, the more discussion it gets, the
better!
For those wanting to play with the experiments themselves, have a look
at the code. This is a
work-in-progress/experimental implementation, relying on Pyramid for content
negotiation and route traversals.
What's next? Maybe an alternative experiment relying on Hydra? Or an
orthogonal one playing with the schema client-side?
Posted about 8 years ago by Julien Cristau
An effort to port CubicWeb to a dual python 2.6/2.7 and 3.3+ code base was started by Rémi Cardona in summer of 2014. The first task was to port all of CubicWeb's dependencies:
logilab-common 0.63
logilab-database 1.14
logilab-mtconverter 0.9
logilab-constraint 0.6
yams 0.40
rql 0.34
Once that was out of the way, we could start looking at CubicWeb itself. We first set out to make sure we used python3-compatible syntax in all source files, then started to go and make as much of the test suite as possible pass under both python2.7 and python3.4. As of the 3.22 release, we are almost there. The remaining pain points are:
cubicweb's setup.py hadn't been converted. This is fixed in the 3.23 branch as of https://hg.logilab.org/master/cubicweb/rev/0b59724cb3f2 (don't follow that link, the commit is huge)
the CubicWebServerTC test class uses twisted to start an http server thread, and twisted itself is not available for python3
the current method to serialize schema constraints into CWConstraint objects gives different results on python2 and python3, so it needs to be fixed (https://www.logilab.org/ticket/296748)
various questions around packaging and deployment: what happens to e.g. the cubicweb-common package installing into python2's site-packages directory? What does the ${prefix}/share/cubicweb directory become? How do cubes express their dependencies? Do we need a flag day? What does that mean for applications?
Posted about 8 years ago by Denis Laxalde
This post is the introduction to a series of articles dealing with an on-going
experiment on building a JavaScript user interface to CubicWeb, to ultimately
replace the web component of the framework. The idea of this series
is to present the main topics of the experiment, with open questions, in order
to engage the community as much as possible. The other side of this
is to experiment with a blog driven development process, so getting feedback
is the very point of it!
As of today, three main topics have been identified:
the Web API to let the client and server communicate,
the issue of representing the application schema client-side, and,
the construction of components of the web interface (client-side).
As part of the first topic, we'll probably rely on another experimental work
about REST-fulness undertaken recently in pyramid-cubicweb (see this
head for source code). Then, it appears quite clearly that
we'll need, sooner or later, a representation of data on the client side and
that, quite obviously, the underlying format would be JSON. Apart from
exchanging entity (database) information, we already anticipate the
need for the HATEOAS part of REST. We already took some time to look at
the existing possibilities. At first glance, it seems that Hydra is the
most promising in terms of capabilities. It's also built using semantic web
technologies, which definitely earns bonus points for CubicWeb. On the other
hand, it seems a bit isolated and very experimental, while JSON API
follows a more pragmatic approach (it describes itself as an anti-bikeshedding
tool) and appears to have more traction from various people. For this reason,
we chose it for our first draft, but this topic seems so central in a new UI,
and so hard to hide as an implementation detail, that it definitely deserves more
discussion. Other candidates could be Siren, HAL or Uber.
Concerning the schema, it seems that there is consensus around
JSON-Schema so we'll certainly give it a try.
Finally, while nothing is certain as of today, we'll probably start
building components of the web interface using React, which is also
getting quite popular these days. Beyond that choice, the first practical task
in this topic will concern the primary view system. This task, being neither
too simple nor too complicated, will hopefully result in a clearer overview of
what the project will imply. Then, the question of editing will come up at
some point. In this respect, perhaps it'll be a good time to put the UX
question at a central place, in order to avoid design issues that we had in
the past.
Feedback welcome!
Posted about 8 years ago by Nicolas Chauvat
This CubicWeb blog has been asleep for some months, while development remained active. Let me try to summarize the recent progress.
CubicWeb 3.21
CubicWeb 3.21 was published in July 2015. The announcement was sent to the mailing list and changes were listed in the documentation.
The main goal of this release was to reduce the technical debt. The code was improved, but the changes
were not directly visible to users.
CubicWeb 3.22
CubicWeb 3.22 was published in January 2016. A mail was sent to the mailing list and the documentation was updated with the list of changes.
The main achievements of this release were the inclusion of a new procedure to massively import data when using a Postgresql backend, improvements of migrations and customization of generic JSON exports.
Roadmap and bi-monthly meetings
After the last-minute cancellation of the May 2015 roadmap meeting, we failed to reschedule in June, the summer arrived, then the busy-busy end of the year... and voilà, we are in 2016.
During that time, Logilab has been working on massive data import, full-js user interfaces exchanging JSON with the CubicWeb back-end, 3D in the browser, switching CubicWeb to Python3, moving its own apps to Bootstrap, using CubicWeb-Pyramid in production and improving management/supervision, etc. We will be more than happy to discuss this with the rest of the (small but strong) CubicWeb community.
So let's wish a happy new year to everyone and meet again in March for a new roadmap session!
Posted almost 9 years ago by David Douard
CubicWeb can now be powered by Pyramid (thank you so much
Christophe) instead of Twisted.
I aim at moving all our applications to CubicWeb/Pyramid,
so I wonder what will be the best way to deliver them. For now, we have a
setup made of Apache + Varnish + CubicWeb/Twisted. In some
applications we have two CubicWeb instances with naive load balancing
managed by Varnish.
When moving to cubicweb-pyramid, there are several options.
By default, a cubicweb-pyramid instance started via the cubicweb-ctl
pyramid command runs a waitress WSGI HTTP server.
I read that it is common to deliver WSGI applications with nginx + uwsgi,
but I wanted to play with mongrel2 (which I had
already tested with CubicWeb a while ago), and to give the
circus + chaussette stack a try.
I ran my tests:
using ab, the simple Apache benchmark tool (aka ApacheBench);
on a clone of our logilab.org forge;
on my laptop (Intel Core i7, 2.67GHz, quad core, 8 GB),
using a postgresql 9.1 database server.
Setup
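In order to be able to start the application as a WSGI app, a small
Python script is required. I extracted a small part of the
cubicweb-pyramid ccplugin.py file into an elo.py file for
this:
# Import paths assumed from cubicweb-pyramid's ccplugin.py of that time.
from cubicweb.cwconfig import CubicWebConfiguration as cwcfg
from pyramid_cubicweb import wsgi_application_from_cwconfig
appid = 'elo2'
cwconfig = cwcfg.config_for(appid)
# WSGI callable looked up by the servers tested in this post
application = wsgi_application_from_cwconfig(cwconfig)
# start the repository's looping tasks
repo = cwconfig.repository()
repo.start_looping_tasks()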
I tested 5 configurations: twisted, pyramid, mongrel2+wsgid, uwsgi and circus+chaussette.
When possible, they were tested with 1 worker and 4 workers.
Legacy Twisted mode
Using the good old legacy twisted setup:
cubicwebctl start -D -l info elo
The config settings worth noting are:
webserver-threadpool-size=6
connections-pool-size=6
Basic Pyramid mode
Using the pyramid command that uses waitress:
cubicwebctl pyramid --no-daemon -l info elo
Mongrel2 + wsgid
I have not been able to use uwsgi-mongrel2 as wsgi backend for
mongrel2, since this uwsgi plugin is not provided by the uwsgi debian
packages. I've used wsgid instead (sadly, the project appears to be dead).
The mongrel config is:
main = Server(
uuid="f400bf85-4538-4f7a-8908-67e313d515c2",
access_log="/logs/access.log",
error_log="/logs/error.log",
chroot="./",
default_host="localhost",
name="test",
pid_file="/pid/mongrel2.pid",
bind_addr="0.0.0.0",
port=8083,
hosts = [
Host(name="localhost",
routes={'/': Handler(send_spec='tcp://127.0.0.1:5000',
send_ident='2113523d-f5ff-4571-b8da-8bddd3587475',
recv_spec='tcp://127.0.0.1:5001',
recv_ident='')
})
]
)
servers = [main]
and the wsgid server is started with:
wsgid --recv tcp://127.0.0.1:5000 --send tcp://127.0.0.1:5001 --keep-alive \
--workers <N> --wsgi-app elo.application --app-path .
uwsgi
The config file used to start uwsgi is:
[uwsgi]
stats = 127.0.0.1:9191
processes = <N>
wsgi-file = elo.py
http = :8085
plugin = http,python
virtualenv = /home/david/hg/grshells/venv/jpl
enable-threads = true
lazy-apps = true
The tricky config option there is lazy-apps, which must be set;
otherwise the worker processes are forked after loading the cubicweb
application, which the latter does not support. If you omit this, only one
worker will get the requests.
circus + chaussette
For the circus setup, I have used this configuration file:
[circus]
check_delay = 5
endpoint = tcp://127.0.0.1:5555
pubsub_endpoint = tcp://127.0.0.1:5556
stats_endpoint = tcp://127.0.0.1:5557
statsd = True
httpd = True
httpd_host = localhost
httpd_port = 8086
[watcher:webworker]
cmd = /home/david/hg/grshells/venv/jpl/bin/chaussette --fd $(circus.sockets.webapp) elo2.app
use_sockets = True
numprocesses = 4
[env:webworker]
PATH=/home/david/hg/grshells/venv/jpl/bin:/usr/local/bin:/usr/bin:/bin
CW_INSTANCES_DIR=/home/david/hg/grshells/grshell-jpl/etc
PYTHONPATH=/home/david/hg/grshells//grshell-jpl
[socket:webapp]
host = 127.0.0.1
port = 8085
Results
The benchmarks are very simple: 100 requests from 1 worker or 500 requests
from 5 concurrent workers, getting the main index page of the
application:
One ab worker
ab -n 100 -c 1 http://127.0.0.1:8085/
We get: [figure not recovered]
Response times are: [figure not recovered]
Five ab workers
ab -n 500 -c 5 http://127.0.0.1:8085/
We get: [figure not recovered]
Response times are: [figure not recovered]
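For readers without ab at hand, the kind of measurement performed by the two ab invocations above can be approximated with the Python sketch below. It is not the tool used for these tests, and the names http_get and bench are invented: bench times a number of calls to a fetch function spread over a pool of threads, mirroring the -n and -c options of ab.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get(url):
    """Fetch a URL and read the whole response body."""
    with urllib.request.urlopen(url) as response:
        response.read()


def bench(fetch, requests, concurrency):
    """Call fetch() `requests` times from `concurrency` threads and time it."""

    def timed(_):
        start = time.perf_counter()
        fetch()
        return time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        timings = sorted(pool.map(timed, range(requests)))
    total = time.perf_counter() - start
    return {
        "requests": len(timings),
        "requests_per_second": len(timings) / total,
        "median": timings[len(timings) // 2],
        "slowest": timings[-1],
    }


# Rough equivalent of "ab -n 500 -c 5 http://127.0.0.1:8085/":
# stats = bench(lambda: http_get("http://127.0.0.1:8085/"), 500, 5)
```

Passing the fetch function as a parameter keeps the timing logic independent of the HTTP client, so it can be tried out without a running server.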
Conclusion
As expected, the legacy (and still default) twisted-based server is
the least efficient method to serve a cubicweb application.
When comparing results with only one CubicWeb worker, the
pyramid+waitress solution that comes with cubicweb-pyramid is the
most efficient, but the mongrel2 + wsgid and circus + chaussette
solutions have mostly similar performance when only one worker is
activated. Surprisingly, the uwsgi solution is significantly less
efficient, and in particular has some requests that take significantly
longer than with other solutions (even the legacy twisted-based server).
The price for activating several workers is small (around 3%) but
significant when only one client is requesting the application. It is
still unclear why.
When there are several workers requesting the application, it's no
surprise that solutions with 4 workers behave significantly better (we
are still far from a linear response however, roughly 2x better for
4x the horsepower; maybe the hardware is the main reason for this
unexpected non-linear response).
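To put a number on this non-linear behaviour, one can compute a parallel efficiency, i.e. the observed speedup divided by the number of workers. The short Python sketch below uses the rough figures quoted above (about 2x better with 4 workers); the function name scaling_efficiency is made up for the occasion.

```python
def scaling_efficiency(speedup, workers):
    """Fraction of the ideal (linear) speedup actually obtained."""
    return speedup / workers


# Roughly 2x better throughput with 4 workers, as observed above.
print(scaling_efficiency(2.0, 4))  # 0.5
```

In other words, each of the four workers delivers about half of what a perfectly linear scaling would give.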
I am quite surprised that uwsgi behaved significantly worse than the 2
other scalable solutions.
Mongrel2 is still very efficient, but sadly the wsgid server I've
used for these tests has not been developed for 2 years, and the uwsgi
plugin for mongrel2 is not yet available on Debian.
On the other hand, I am very pleasantly surprised by circus +
chaussette. Circus also comes with some nice features like a web
dashboard which allows adding or removing workers dynamically.
[Less]
|
Posted
almost 9 years
ago
by
David Douard
CubicWeb can now be powered by Pyramid (thank you so much
Christophe) instead of Twisted.
I aim at moving all our applications to CubicWeb/Pyramid,
so I wonder what will be the best way to deliver them. For now, we have a
setup made of Apache +
... [More]
Varnish + Cubicweb/Twisted. In some
applications we have two CubicWeb instances with a naive load balacing
managed by Varnish.
When moving to cubicweb-pyramid, there are several options.
By default, a cubicweb-pyramid instance started via the cubicweb-ctl
pyramid command, is running a waitress wsgi http server.
I read it is common to deliver wsgi applications with nginx + uwsgi,
but I wanted to play with mongrel2 (that I
already tested with Cubicweb a while ago), and give a try to the
circus + chaussette stack.
I ran my tests :
using ab the simple Apache benchmark tool (aka ApacheBench) ;
on a clone of our logilab.org forge ;
on my laptop (Intel Core i7, 2.67GHz, quad core, 8Go),
using a postgresql 9.1 database server.
Setup
In order to be able to start the application as a wsgi app, a small
python script is required. I extracted a small part of the
cubicweb-pyramid ccplugin.py file into a elo.py file for
this:
appid = 'elo2'
cwconfig = cwcfg.config_for(appid)
application = wsgi_application_from_cwconfig(cwconfig)
repo = cwconfig.repository()
repo.start_looping_tasks()
I tested 5 configurations: twisted, pyramid, mongrel2+wsgid, uwsgi and circus+chaussette.
When possible, they were tested with 1 worker and 4 workers.
Legacy Twisted mode
Using good old legacy twisted setup:
cubicwebctl start -D -l info elo
The config setting that worth noting are:
webserver-threadpool-size=6
connections-pool-size=6
Basic Pyramid mode
Using the pyramid command that uses waitress:
cubicwebctl pyramid --no-daemon -l info elo
Mongrel2 + wsgid
I have not been able to use uwsgi-mongrel2 as wsgi backend for
mongrel2, since this uwsgi plugin is not provided by the uwsgi debian
packages. I've used wsgid instead (sadly, the project appears to be dead).
The mongrel config is:
main = Server(
uuid="f400bf85-4538-4f7a-8908-67e313d515c2",
access_log="/logs/access.log",
error_log="/logs/error.log",
chroot="./",
default_host="localhost",
name="test",
pid_file="/pid/mongrel2.pid",
bind_addr="0.0.0.0",
port=8083,
hosts = [
Host(name="localhost",
routes={'/': Handler(send_spec='tcp://127.0.0.1:5000',
send_ident='2113523d-f5ff-4571-b8da-8bddd3587475',
recv_spec='tcp://127.0.0.1:5001',
recv_ident='')
})
]
)
servers = [main]
and the wsgid server is started with:
wsgid --recv tcp://127.0.0.1:5000 --send tcp://127.0.0.1:5001 --keep-alive \
--workers <N> --wsgi-app elo.application --app-path .
uwsgi
The config file used to start uwsgi is:
[uwsgi]
stats = 127.0.0.1:9191
processes = <N>
wsgi-file = elo.py
http = :8085
plugin = http,python
virtualenv = /home/david/hg/grshells/venv/jpl
enable-threads = true
lazy-apps = true
The tricky config option there is lazy-apps which must be set,
otherwise the worker processes are forked after loading the cubicweb
application, which this later does not support. If you omit this, only one
worker will get the requests.
circus + chaussette
For the circus setup, I have used this configuration file:
[circus]
check_delay = 5
endpoint = tcp://127.0.0.1:5555
pubsub_endpoint = tcp://127.0.0.1:5556
stats_endpoint = tcp://127.0.0.1:5557
statsd = True
httpd = True
httpd_host = localhost
httpd_port = 8086
[watcher:webworker]
cmd = /home/david/hg/grshells/venv/jpl/bin/chaussette --fd $(circus.sockets.webapp) elo2.app
use_sockets = True
numprocesses = 4
[env:webworker]
PATH=/home/david/hg/grshells/venv/jpl/bin:/usr/local/bin:/usr/bin:/bin
CW_INSTANCES_DIR=/home/david/hg/grshells/grshell-jpl/etc
PYTHONPATH=/home/david/hg/grshells//grshell-jpl
[socket:webapp]
host = 127.0.0.1
port = 8085
Results
The benchmarks are very simple: 100 requests from 1 ab worker, or 500
requests from 5 concurrent ab workers, each fetching the main index
page of the application:
One ab worker
ab -n 100 -c 1 http://127.0.0.1:8085/
We get: [figure missing]
Response times are: [figure missing]
Five ab workers
ab -n 500 -c 5 http://127.0.0.1:8085/
We get: [figure missing]
Response times are: [figure missing]
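For readers without ab at hand, the two runs above can be approximated in Python. This is a rough sketch rather than a replacement for ab: it uses threads from the standard library, the helper names (`fetch_page`, `bench`, `summarize`) are made up for this example, and the numbers it reports are not comparable to ab's own output.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_page(url):
    """Fetch one page and discard the body (raises on HTTP errors)."""
    with urllib.request.urlopen(url) as response:
        response.read()


def bench(url, requests, concurrency, fetcher=fetch_page):
    """Issue `requests` fetches from `concurrency` threads.

    Returns the per-request timings and the total wall time, in seconds.
    """
    def timed(_):
        start = time.perf_counter()
        fetcher(url)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        timings = list(pool.map(timed, range(requests)))
    wall = time.perf_counter() - wall_start
    return timings, wall


def summarize(timings, wall):
    """Reduce raw timings to a few figures similar in spirit to ab's report."""
    return {
        "requests_per_second": len(timings) / wall,
        "mean_ms": 1000 * sum(timings) / len(timings),
        "max_ms": 1000 * max(timings),
    }
```

With a server listening on port 8085 as in the configurations above, `summarize(*bench("http://127.0.0.1:8085/", 100, 1))` would correspond to the first run and `summarize(*bench("http://127.0.0.1:8085/", 500, 5))` to the second.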
Conclusion
As expected, the legacy (and still default) twisted-based server is
the least efficient method to serve a cubicweb application.
When comparing results with only one CubicWeb worker, the
pyramid+waitress solution that comes with cubicweb-pyramid is the
most efficient, but the mongrel2 + wsgid and circus + chaussette
solutions have broadly similar performance when only one worker is
activated. Surprisingly, the uwsgi solution is significantly less
efficient, and in particular has some requests that take significantly
longer than with the other solutions (even the legacy twisted-based server).
The price for activating several workers is small (around 3%) but
significant when only one client is requesting the application. It is
still unclear why.
When there are several ab workers requesting the application, it's not
a surprise that solutions with 4 workers behave significantly better (we
are still far from a linear response however, roughly 2x better for
4x the horsepower; maybe the hardware is the main reason for this
unexpected non-linear response).
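To put a number on "2x better for 4x the horsepower", the figures can be read through Amdahl's law. This is an interpretation added here, not something measured in the benchmark: the model assumes a fixed fraction of the work cannot be parallelized, and the function names are made up for this sketch.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of the ideal linear speedup actually obtained."""
    return speedup / workers


def amdahl_serial_fraction(speedup, workers):
    """Serial fraction s solving speedup = 1 / (s + (1 - s) / workers)."""
    return (workers / speedup - 1) / (workers - 1)


# Rough figures quoted in the text: about 2x faster with 4 workers.
efficiency = parallel_efficiency(2.0, 4)
serial = amdahl_serial_fraction(2.0, 4)
print("efficiency: %.0f%%" % (100 * efficiency))
print("serial fraction under Amdahl's law: %.2f" % serial)
```

Under that model, a 2x gain with 4 workers means 50% efficiency and about a third of the work behaving as if it were serialized, whatever the actual cause.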
I am quite surprised that uwsgi behaved significantly worse than the 2
other scalable solutions.
Mongrel2 is still very efficient, but sadly the wsgid server I've
used for these tests has not been developed for 2 years, and the uwsgi
plugin for mongrel2 is not yet available on Debian.
On the other hand, I am very pleasantly surprised by circus +
chaussette. Circus also comes with some nice features, like a web
dashboard that allows adding or removing workers dynamically.