Richer and better structured data in Open Food Facts

When we designed the Open Food Facts database structure in 2012, we tried to make it as simple and smart as possible so that users could quickly enter food product data. For instance, we let users enter nutrition facts per 100 g or per serving, and we then automatically computed one from the other using the serving size. We also automatically converted between salt and sodium, between energy in kJ and energy in kcal, and so on.
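
As a rough illustration of the automatic conversions described above, here is a minimal Python sketch. The function names are hypothetical and are not taken from the Open Food Facts codebase. The sodium-to-salt factor of 2.5 follows the EU labelling convention and 4.184 kJ per kcal is the thermochemical calorie; the exact constants used by Open Food Facts are assumed here, not confirmed by this post.

```python
# Illustrative sketch only: names and constants are assumptions,
# not the actual Open Food Facts implementation.

SODIUM_TO_SALT = 2.5   # EU labelling convention: salt = sodium x 2.5
KJ_PER_KCAL = 4.184    # thermochemical calorie


def per_serving_from_100g(value_100g: float, serving_size_g: float) -> float:
    """Scale a per-100 g value to one serving of the given size in grams."""
    if serving_size_g <= 0:
        raise ValueError("serving size must be positive")
    return value_100g * serving_size_g / 100.0


def per_100g_from_serving(value_serving: float, serving_size_g: float) -> float:
    """Scale a per-serving value back to 100 g."""
    if serving_size_g <= 0:
        raise ValueError("serving size must be positive")
    return value_serving * 100.0 / serving_size_g


def salt_from_sodium(sodium_g: float) -> float:
    """Convert grams of sodium to grams of salt."""
    return sodium_g * SODIUM_TO_SALT


def sodium_from_salt(salt_g: float) -> float:
    """Convert grams of salt to grams of sodium."""
    return salt_g / SODIUM_TO_SALT


def kcal_from_kj(energy_kj: float) -> float:
    """Convert energy from kilojoules to kilocalories."""
    return energy_kj / KJ_PER_KCAL


def kj_from_kcal(energy_kcal: float) -> float:
    """Convert energy from kilocalories to kilojoules."""
    return energy_kcal * KJ_PER_KCAL
```

For example, under these assumptions a product with 12 g of sugars per 100 g and a 30 g serving has 3.6 g of sugars per serving, and 0.4 g of sodium corresponds to 1 g of salt.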

This worked great in the beginning, but over the years we found that for some uses we needed our models to better reflect the complexity of some food products:

  • We added nutrition facts for prepared products
  • We found that rounding rules for nutrition facts labels can impact computations of some scores like the Nutri-Score
  • We started to estimate the percentage of each ingredient so that we could estimate nutrients that are not on the nutrition facts label (like most minerals, vitamins and fatty acids)
  • We now have nutrition data from sources other than the packaging, as hundreds of food manufacturers send us food product data directly through our free pro platform.

So last year, thanks to a grant from the NGI0 Commons Fund from NLnet and the European Commission, we embarked on a journey with our reusers, contributors and volunteers to rethink how we store and process nutrition data in Open Food Facts.

After many brainstorming sessions, we decided on a new structure that keeps the same smart and simple approach for most use cases, while retaining the full richness of the data for the use cases that need it:

  • We now separately store input nutrition sets from different sources: packaging data, data from manufacturers, data from other open databases like the USDA, nutrition data estimated from ingredients etc.
  • We also compute an aggregated nutrition set that combines the different input sources and gives preference to the most trusted ones (manufacturer data and data from packaging). This aggregated set clearly indicates which source each nutrient came from, so that reusers can decide, for instance, whether or not to use estimated values.
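
The idea of combining several input sets into one aggregated set can be sketched as follows in Python. This is not the actual Open Food Facts algorithm or schema: the source names, the exact priority order, the per-nutrient merging and the `value`/`source` field names are all assumptions made for illustration. The authoritative structure is the one in the OpenAPI documentation.

```python
from typing import Dict, List

# Hypothetical source names, ordered from most to least trusted.
# The relative order of manufacturer and packaging data is an assumption.
SOURCE_PRIORITY: List[str] = ["manufacturer", "packaging", "usda", "estimate"]


def aggregate(input_sets: Dict[str, Dict[str, float]]) -> Dict[str, dict]:
    """Build one aggregated set, keeping for each nutrient the value from the
    most trusted source that provides it, and recording that source."""
    aggregated: Dict[str, dict] = {}
    for source in SOURCE_PRIORITY:
        for nutrient, value in input_sets.get(source, {}).items():
            # A nutrient already filled by a more trusted source is kept.
            if nutrient not in aggregated:
                aggregated[nutrient] = {"value": value, "source": source}
    return aggregated


def without_estimates(aggregated: Dict[str, dict]) -> Dict[str, dict]:
    """Drop nutrients whose value was estimated from ingredients."""
    return {n: e for n, e in aggregated.items() if e["source"] != "estimate"}


# Example input: values per 100 g from two sources (made-up numbers).
example_sets = {
    "packaging": {"energy-kcal": 250.0, "sugars": 12.0},
    "estimate": {"sugars": 11.4, "potassium": 0.3},
}
example_aggregated = aggregate(example_sets)
```

In this sketch, `sugars` is taken from the packaging because it outranks the estimate, while `potassium` is only available as an estimate and is labelled as such, so a reuser who wants label data only can call `without_estimates`.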

This refactor took almost a year as nutrition data is at the core of many algorithms in Open Food Facts systems. We also took careful steps to ensure that we could maintain backward compatibility with the hundreds of reusers of our API and database so that they could continue to read and write nutrition data seamlessly with the older API and schema version.

The new nutrition data schema structure is detailed in our OpenAPI documentation.

This project was funded through the NGI0 Commons Fund, a fund established by NLnet with financial support from the European Commission’s Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101135429. Additional funding is made available by the Swiss State Secretariat for Education, Research and Innovation (SERI).