How I helped Open Food Facts to better recognise ingredients in my country
And how you can too!
Whether you are a long-time contributor to Open Food Facts who would like to go further by contributing to the taxonomy, or an aspiring developer who wants to contribute to a first open-source project, this article is made for you.
1-Introduction
Open Food Facts is the Wikipedia of food products. Like Wikipedia, it offers a lot of information for free, and everyone can contribute to it (it is open source). Open Food Facts also has a mobile app that lets us scan the barcode of a food product and understand all the labels, ingredients and tables that appear on it. For example, if you have allergies or dietary restrictions, scanning the barcode tells you whether a product contains, or may contain traces of, ingredients you want to avoid.
Recently, Open Food Facts reached 2 700 000 products globally! However, some countries are much less well covered: in Croatia, where I live, only 2 000 products are referenced in Open Food Facts.
The same goes for ingredients: only about 15% of the ingredients referenced in English in the Open Food Facts database also exist in Croatian.
Remark: an ingredient (or a label, or a category) together with its translations in all languages is what we call a “taxonomy” entry (see Yukti's blog post: https://blog.openfoodfacts.org/en/news/on-taxonomies).
If we look at the ingredients of a Croatian product, we can see that many of them are not referenced in the Open Food Facts database: they are the highlighted ingredients in Figure 1.
Remark: this screenshot was taken before the new version of the Open Food Facts website, so the design is a bit different.
Too bad! But since Open Food Facts is open-source and everybody can contribute to it… let’s try to contribute!
2-Prerequisites
Create an account on Open Food Facts: https://hr.openfoodfacts.org/
Create an account on GitHub: https://github.com/
Install Git if you are using Windows (alternatively, you can install WSL): https://git-scm.com/downloads
Remark: in this article, we use git commands for local development (to create a new branch, commit and push, as we will see later), but be aware that a more user-friendly tool called GitHub Desktop lets you do the same more easily.
Install a text editor to edit the taxonomy files, for example Visual Studio Code: https://code.visualstudio.com/Download
3-Find a good example
We will not describe how to scan a product and extract its ingredients list in this article, but this is also done by Open Food Facts contributors like you and me. Let us pick a product that already has its ingredients list in Open Food Facts. We will stick to Croatian products, but the same can be done for any product of any country where the ingredients in the local language are missing from the taxonomy.
As it is our first contribution, let us start safely by picking a product that has its ingredients list in both English and Croatian. This way, it will be easier to add the Croatian ingredient names under the English ones.
For example, let us pick this product (see Figure 2a) called “choco” from the brand “delicia”: https://hr.openfoodfacts.org/product/3859889106191/choco-delicia
Remark: we picked this product as an arbitrary example. We were not asked by this brand to choose this product, and we did not choose it to tarnish their reputation.
As we can see in Figure 2b, it has both HR and GB/USA languages for the ingredients list.
4-Clone the project locally
4-1-Fork the repository of Open Food Facts
Go to the Open Food Facts repository called openfoodfacts-server: https://github.com/openfoodfacts/openfoodfacts-server.
Then follow the steps described in the GitHub documentation, starting from (2), to fork the repository into your own GitHub account: https://docs.github.com/en/get-started/quickstart/fork-a-repo#forking-a-repository.
4-2-Clone the repository locally
Follow the steps described in GitHub documentation to clone the repository that you just forked: https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository#cloning-a-repository.
Remark: the project's size is about 3 GB. Be sure you have enough space on your computer.
You may be prompted for a token to be able to clone the repository. In that case, follow the steps described in the GitHub documentation to create a token: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token#creating-a-fine-grained-personal-access-token
4-3-Create a new branch
Now, we will create a new branch, because it is good practice to do development work in a branch separate from the main branch.
Open a terminal (Terminal on Mac/Linux, WSL or Git on Windows) and go inside the openfoodfacts-server folder. Enter the following command to create a new branch and switch to it at the same time:
git checkout -b <name of your new branch>
Choose an explicit name that helps you and others understand what the branch is about. In our example, we want to add the Croatian (hr) ingredients of the Choco product from the Delicije brand to the taxonomy, so we run:
git checkout -b hr_taxonomy_delicije_choco
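To try the command safely before running it in the real project, here is a minimal sketch that uses a throwaway repository in a temporary folder. The folder and the "Demo" identity are assumptions made only for this illustration; the branch name is the one used in this article.

```shell
# Create a scratch repository (a stand-in for openfoodfacts-server)
demo="$(mktemp -d)"
cd "$demo"
git init -q

# A branch needs at least one commit to start from
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "initial commit"

# Create the new branch and switch to it in one step
git checkout -q -b hr_taxonomy_delicije_choco

# Print the branch we are now on
git branch --show-current
```

In the real openfoodfacts-server folder, only the git checkout -b line is needed; git branch --show-current is a handy way to confirm you are on the new branch before editing files.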
5-Your contribution
Now, we have the project locally in the folder openfoodfacts-server. Inside this folder, there are a lot of folders and files. There is no need to understand all of them: we will only work in the taxonomies folder. This folder contains all the taxonomies, from additives to vitamins, including countries, labels, categories, etc. and of course ingredients!
Let’s open the ingredients.txt file using Visual Studio Code (or a text editor).
At the same time, in our web browser, let us have a look at the ingredients that are unknown: https://hr.openfoodfacts.org/product/3859889106191/choco-delicia. We scroll down and click on the button Details of the analysis of the ingredients. See Figure 3. The highlighted ingredients are missing from the ingredients.txt file. Today, these ingredients are no longer highlighted, as we will see at the end of the article.
For example, the second one, ekstradjevičansko kokosovo ulje, is not recognized. Nema problema (no problem)! We look at the English version of the ingredients (go to the product page: https://hr.openfoodfacts.org/product/3859889106191/choco-delicia, and click on the Edit the page button to be able to see all pictures) as depicted in Figure 4:
We see that ekstradjevičansko kokosovo ulje in Croatian is extra virgin coconut oil in English.
We then search for extra virgin coconut oil in the ingredients.txt file (Figure 5).
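The search can also be done from the terminal instead of the editor. The sketch below runs grep on a small sample file; the sample entries (including the French lines) are invented for illustration and are much simpler than the real ingredients.txt.

```shell
# Hypothetical, simplified sample; the real file is far larger
sample="$(mktemp)"
cat > "$sample" <<'EOF'
en:coconut oil
fr:huile de coco

<en:coconut oil
en:extra virgin coconut oil
fr:huile de coco extra vierge
EOF

# -n prints the line number, -i ignores case
grep -n -i "^en:.*extra virgin coconut oil" "$sample"

# In the project folder, the same search would target the real file:
# grep -n -i "extra virgin coconut oil" taxonomies/ingredients.txt
```

The line number printed by grep tells you where to jump in your editor to add the Croatian name.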
We add the Croatian word for this ingredient (Figure 6; mind the order: English first, then the other languages in alphabetical order).
We continue with the next missing term: smeđi šećer od šećerne trske (raw cane sugar in English).
We continue with the third term: kapljice tamne čokolade (dark chocolate chips in English).
Remark: we do not write the 10%. This percentage is used to analyse the amount of each ingredient, but it is not needed for the taxonomy.
We continue with the fourth term: kakao masa (cocoa mass in English).
Remark: there is already a different Croatian word for this ingredient: kakaova masa (see Figure 7a). The two are synonyms, so we add a comma after the existing word, followed by kakao masa (see Figure 7b).
Figure 7: (a) search of cocoa mass and (b) adding kakao masa as a synonym in the ingredients.txt file.
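As a rough illustration of what this edit does to the file, the sketch below applies it to a two-line sample entry. The sample is an assumption (the real cocoa mass entry contains many more languages), and in practice the change is simply typed by hand in the editor; sed is used here only to make the before and after reproducible.

```shell
# Hypothetical, simplified entry before the edit
entry="$(mktemp)"
cat > "$entry" <<'EOF'
en:cocoa mass
hr:kakaova masa
EOF

# Append the synonym after a comma on the existing hr: line
# ("&" in the replacement stands for the matched text)
sed 's/^hr:kakaova masa$/&, kakao masa/' "$entry" > "$entry.new" && mv "$entry.new" "$entry"

# Show the entry after the edit
cat "$entry"
```

After the edit, the hr: line lists both words separated by a comma, which is how synonyms are written within one language line.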
We continue with sojin lecitin (soya lecithin in English).
Remark: this ingredient was not extracted correctly. Because sojin lecitin was at the end of a line on the package, it was split into sojin leci- and tin. Hence, we have to edit the product in the web browser and write sojin lecitin correctly.
We continue with sredstvo za rahljenje (raising agent in English).
Remark: it is not an ingredient! We should write it in another taxonomy file called additives_classes.txt. We update that file in exactly the same way as ingredients.txt.
We continue with amonijev bikarbonat (ammonium bycarbonate in English).
Remark: as you may have guessed from the way it is written in the ingredients list (“sredstvo za rahljenje: amonijev bikarbonat”), amonijev bikarbonat should not be written in ingredients.txt but in another file called additives.txt, which references all the E101, E102, etc. that we can find on food products.
Remark 2: the English word ammonium bycarbonate is not in the additives.txt file. But because an internet search only returns results for ammonium bicarbonate, we will assume that ammonium bycarbonate is a synonym for ammonium bicarbonate. The latter is in the additives.txt file: it is E503(ii).
Remark 3: we could additionally write ammonium bycarbonate as a synonym for ammonium bicarbonate in English, but let us leave it as it is for now.
And finally, we continue with the last term: aroma vanilije (vanilla flavor in English).
6-Save, commit, push and pull request
We are done with our contribution. We save all the files that we modified. We close them. We come back to our terminal in our openfoodfacts-server folder.
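Before staging anything, it can be reassuring to check which files git sees as modified. The sketch below reproduces the situation in a throwaway repository; the file content and the "Demo" identity are assumptions for illustration only.

```shell
# Scratch repository with one committed taxonomy file
demo="$(mktemp -d)"
cd "$demo"
git init -q
mkdir taxonomies
printf 'en:cocoa mass\nhr:kakaova masa\n' > taxonomies/ingredients.txt
git add taxonomies/
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q -m "initial commit"

# Simulate our contribution: add a Croatian synonym
printf 'en:cocoa mass\nhr:kakaova masa, kakao masa\n' > taxonomies/ingredients.txt

# List modified files (" M" means modified but not yet staged)
git status --short

# Summarise how many lines changed in each file
git diff --stat
```

In the real project, running git status --short and git diff inside openfoodfacts-server shows exactly what will go into the commit, which helps catch accidental edits.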
We stage the files that we modified, then commit them:
$ git add taxonomies/
$ git commit -m "adding Croatian translation for ingredients of choco delicia"
Choose an explicit message that helps you and others understand what the commit is about.
$ git push
Because the branch exists only locally, you will get a message suggesting that you add --set-upstream origin <name of your new branch>, which creates the branch on the remote repository. Just copy the suggestion, paste it and press Enter (something like: git push --set-upstream origin hr_taxonomy_delicije_choco).
Go to your forked openfoodfacts-server repository on the GitHub website. You should see a new highlighted message, “hr_taxonomy_delicije_choco had recent pushes less than a minute ago”, with a “Compare & pull request” button next to it. Click on this button. It opens a pull request from the new branch of your forked openfoodfacts-server repository into the main branch of the Open Food Facts openfoodfacts-server repository. See Figure 8. Start the title of the pull request with “taxonomy:”, otherwise your pull request will fail. This prefix is used to sort the different pull requests.
Add a message for the reviewer to explain what it is about.
Finally, click on the “Create Pull Request” button.
You should see a couple of automated tasks being executed (Figure 9). At the end, we expect everything to be green. Another contributor should then review the changes and, if everything is fine, approve them.
After about one working day, our pull request is approved! See Figure 10.
Our contribution is not yet visible on the website of Open Food Facts. We have to wait for the next release to be able to see it. As you can see in Figure 10, after the merge into openfoodfacts:main branch, there is an automatic action: chore(main): release 1.9.0. The commit message we wrote earlier is added into the next release notes: https://github.com/openfoodfacts/openfoodfacts-server/pull/7154
Now, we just have to wait for the next release to be deployed. It can take a couple of weeks!
Meanwhile, we can delete the branch from our repository, as there is no reason to keep it once it has been merged into the main branch of the Open Food Facts repository.
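The branch can be deleted with the button GitHub offers on the merged pull request, or from the terminal. The sketch below shows the terminal route end to end, using a local bare repository as a stand-in for your fork on GitHub; the temporary folders and the "Demo" identity are assumptions for illustration.

```shell
# A local bare repository plays the role of the fork ("origin")
work="$(mktemp -d)"
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/clone"
cd "$work/clone"
git config user.name "Demo"
git config user.email "demo@example.com"

# Publish a default branch, then a feature branch, as in the article
git commit -q --allow-empty -m "initial commit"
default_branch="$(git branch --show-current)"
git push -q origin "$default_branch"
git checkout -q -b hr_taxonomy_delicije_choco
git commit -q --allow-empty -m "taxonomy work"
git push -q --set-upstream origin hr_taxonomy_delicije_choco

# Clean-up: leave the branch, delete it locally, then on the remote
git checkout -q "$default_branch"
git branch -D hr_taxonomy_delicije_choco
git push -q origin --delete hr_taxonomy_delicije_choco

# Only the default branch should remain, locally and on the remote
git branch -a
```

The sketch uses git branch -D (forced delete) because git branch -d refuses to delete a branch that git cannot see as merged locally. In the real project, the two clean-up commands are all you need once the pull request has been merged.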
7-The result
Some time has passed. The new release has been deployed.
As we did for Figure 3, let us look in our web browser at the ingredients that are unknown: https://hr.openfoodfacts.org/product/3859889106191/choco-delicia. We scroll down and click on the button Details of the analysis of the ingredients. See Figure 11. There are no highlighted ingredients! All ingredients are known!
Additionally, we can see in Figure 12 that, after the ingredients analysis, Open Food Facts informs us of the Nova group of the product (its level of processing, see https://hr.openfoodfacts.org/nova for more information), and tells us that the product is palm oil free, vegan and vegetarian. In our case, Palm oil free and Vegan were already written on the package of the product, but that is not always the case.
8-Conclusion & Next steps
We have learned how to make our first contribution to the open-source project of Open Food Facts.
Do we have to do that all over again for all food products?
Not really. Look at this ingredients list for another product of the same brand called Detox (https://hr.openfoodfacts.org/product/3859889106160/detox-delicia, Figure 13 Top), before (Figure 13 Middle) and after (Figure 13 Bottom) our contribution.
The ingredients that we added for the previous product are also used for that product and for all other products in the Open Food Facts database.
We can continue with our next contribution for other products, in Croatian or in any other language. Now that we understand how it works, we can also cover products that have their ingredients list in a single language.
Acknowledgements: thanks to Charles Nepote for fruitful discussions.
Article by Benoît Boucher