Upgrading an application on Elastic Beanstalk from Amazon Linux 1 to 2

These days i have the pleasure to upgrade an application running on AWS Elastic Beanstalk from platform Multi-container Docker to new platform Docker. The former is based on Amazon Linux 1 (AL1) while the latter runs on Amazon Linux 2 (AL2). I follow this guide.

I have encountered the following advantages so far:

  • links between containers are now fully bidirectional: on AL1 those links could only be defined and used in one direction. If you wanted bidirectional communication you had to implement service discovery yourself
  • the limit on 10 containers per app is gone: in my tests i could easily spin up 12 containers.
    • apparently this limitation came from the automatic translation of an Elastic Beanstalk app into a task definition. The limitation is documented as task definition max containers.
    • on AL1 one got the following error:
frontend-staging - ERROR: Service:AmazonECS, Code:ClientException, Message:Too many containers., Class:com.amazonaws.services.ecs.model.ClientException
  • i could restart single containers without affecting the other running containers: on AL1 this would cause a restart of all containers for reasons unknown to me.
  • simplified logging – all services log to /var/log/eb-docker/containers/eb-current-app/eb-stdouterr.log which can be streamed to CloudWatch.

Magento and timezones

Today i had the issue that after dumping and importing a Magento Database from one instance of MySQL into another one the order dates got transported 2 hours into the future. How come?!?

Investigation and testing showed that the source database was running with timezone „GMT“ aka „UTC“ aka +00. The target database was running in system timezone which was CEST aka „Europe/Berlin“ aka +02. Still no problem in sight as Magento anyway stores all values in timezone „GMT“!

Turns out Magento uses data type timestamp for order dates etc. MySQL documentation states:

MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval. (This does not occur for other types such as DATETIME.) By default, the current time zone for each connection is the server’s time. The time zone can be set on a per-connection basis. As long as the time zone setting remains constant, you get back the same value you store. If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored. This occurs because the same time zone was not used for conversion in both directions. The current time zone is available as the value of the time_zone system variable. For more information, see Section 5.1.13, “MySQL Server Time Zone Support”.

https://dev.mysql.com/doc/refman/5.7/en/datetime.html

Exactly this happened: If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored.

What was the solution? Well first i wanted to set the target timezone in the dump file. This way target MySQL would convert into GMT during importing and back when the shop reads values. This works except that all values get converted into summer time as it is summer now.

Alternative and better solution: Just set the timezone for each of the shop’s connections by adding the follwing in app/etc/local.xml:

<initStatements><![CDATA[SET NAMES utf8;SET time_zone = '+00:00';]]></initStatements>

This way MySQL doesn’t convert timestamp values in those connections and Magento gets GMT as expected from the database.

So in my view there are some interesting observations:

  • Why does Magento 1 by design a double conversion of date values by storing them as timestamp in MySQL (except when the server’s system timezone is GMT)?
    • Magento creates GMT dates
    • storing such a value in MySQL interprets it as in the connections timezone and transforms it into GMT for storage.
    • reading is done vice versa: read as GMT and convert into connections timezone
  • datetime would be a more fitting data type in my view
  • Magento 2 consequently fixes this by setting in web/vendor/magento/framework/DB/Adapter/Pdo/Mysql.php:
$this->_connection->query("SET time_zone = '+00:00'");
  • MySQL automagically converts timestamps with the correct timezone offset valid for the indivdiual timestamp in the connection’s timezone

Today i corrupted MySQL…here’s how

If you want to change MySQL setting innodb_log_file_size to improve performance the top answer on stack overflow says you have to delete the redo log in /var/lib/mysql/ib_logfile* so that MySQL creates it with the new size. Before deleting you obviously shutdown mysql.

But a normal shutdown of MySQL is a fast shutdown which might leave changes stored in logs but not in table pages. This is normally OK as mysql works through the logs and processes all changes.

But by deleting the redo log…you might create an inconsistent database! As table pages might be marked dirty but the changes can’t be loaded anymore from the redo log. I did it many times with fast shutdown not knowing the risk but today mysql got corrupted with the database still working but in inconsistent state.

How to prevent this?

Old answer was: switch off fast shutdown by SET GLOBAL innodb_fast_shutdown = 0
After shutdown you can now safely delete the redo log. But this is only relevant up to MySQL 5.6.8.

So what is the correct way in modern mysql to change innodb_log_file_size?

  • shutdown mysql
  • change innodb_log_file_size
  • start mysql
    • documentation says: “If InnoDB detects that the innodb_log_file_size differs from the redo log file size, it writes a log checkpoint, closes and removes the old log files, creates new log files at the requested size, and opens the new log files.“

Of course we had backups so nothing happened except for the work.

Lessons learned

  • MySQL is really robust: it starts and allows creating a dump even with
    • corrupted databases
    • corrupted mysql database (missing tables etc)
  • read the documentation
  • read all answers on stack overflow carefully – not just the top one 😉

Chrome vs. Firefox

Just as Chrome has its jumping the shark moment* Microsoft moves to Chromium engine…they don’t have any luck with browsers since 1995 😉

I’m migrating to Firefox…

(*) So what’s wrong with Chrome?

  • removing Adblocker API
  • flagging slow sites
  • hiding parts of the current URL in the address bar – protocol + „www.“ are no longer shown. This can be fixed:
    1. goto chrome://flags
    2. Search UI Hide.
    3. Disable these 3.
  • this might be controversial: promoting encryption (HTTPS) in an aggressive way as this is per se a good measure for websites with data entry but is also a burden for website owners.

Many turn to centralized services like AMP by Google and CloudFlare to encrypt their website and make it faster. So these measures weaken an independent web and strengthen big corporations like Google.

Docker in production

Use option userland-proxy=false to avoid having a separate docker-proxy process per mapped port from a container to host. Somehow this option survived with default true into Docker 18.09.

It even isn’t documented anymore: Docs for Docker 17.09 still had a section about option userland-proxy and a good explanation how it works. This page even mentions the advantage of running without userland proxies but with iptables rules instead:

this alternative is preferred for performance reasons

https://docs.docker.com/v17.09/engine/userguide/networking/default_network/binding/

It even contains a warning that with iptables rules port conflicts are harder to detect.

But documentation for Docker 18.09 doesn’t contain an explanation about it nor its performance penalty.

So having a userland process copying traffic from host port to container port from my point of view doesn’t make sense as the same can be accomplished by iptables rules in the kernel. As the necessary rules are completely managed by Docker this is a simple improvement.

Check logging

Check logging in /var/lib/docker/containers/*/*log

  • check if you have verbose containers
  • in my case biggest producer of logs was PHP-FPM. So i had to disable access logs in its config.
  • setup log rotation in Docker

Example config

So my current config /etc/docker/daemon.json looks like this:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "userland-proxy": false
}

Suprises in software development

Sometimes software development can be quite surprising. For example, one day you get confused because code you wrote doesn’t run as expected. You think about it some time and find a workaround but still can’t explain why it ran unexpectedly. Because you got other work to do you postpone thinking about it further.

Then the next day you read an article which references a change in a programming language and suddenly you get the explanation for that unexpected behaviour the day before. Tada!

This just happened to me with PHP’s ternary operator:

$b = 1;
$c = 2;
echo (!empty($b)) ? $b : (!empty($c)) ? $c : 0;

What do you expect to be echoed? I expect 1 to be echoed which reflects the behaviour of most other programming languages. PHP prints 2 instead. This was the cause for that unexpected behaviour in my program – which i accidentally worked around by not using the ternary operator but didn’t understand why.

PHP groups the ternary operator from left as other programming languages group it from right. This behaviour is explained as request for change in PHP RFC: Deprecate left-associative ternary operator. This article is referenced as RFC for implementation in PHP 7.4. So along with many other changes PHP moves towards doing less unexpected things which is very good.

Associativity in general is explained in this article.

Restore customers from backup in Magento

I recently had two different clients who deleted customers from their instance of Magento1 and Magento2. Although it says:

Once Is Chance, Twice is Coincidence, Third Time Is A Pattern

unknown

i think this is a pattern which needs to be fixed in UI.

But first we have to recover the missing customers. Hopefully you have a sql dump of the shops’s database!

# restore backup into database mage_restore
> gunzip -c magento.sql.gz|mysql mage_restore
# extract customer from this database
> mysqldump -t --skip-extended-insert mage_restore customer_address_entity customer_address_entity_datetime customer_address_entity_decimal customer_address_entity_int customer_address_entity_text customer_address_entity_varchar customer_entity customer_entity_datetime customer_entity_decimal customer_entity_int customer_entity_text customer_entity_varchar wishlist wishlist_item > customers_restore.sql
# import in shop database
> mysql magento < customers_restore.sql

Add

--insert-ignore

if partial data exists in tables (that is: not alle customers were deleted or customers created accounts in the shop after the delete event).

Match orders in Magento 1

After importing the customers we have to link their orders back to their accounts as the reference in an order is set to NULL when the customer is deleted.

-- link orders
UPDATE magento.sales_flat_order as orig inner join mage_restore.sales_flat_order as res on orig.entity_id=res.entity_id set orig.customer_id=res.customer_id;
UPDATE sales_flat_order_grid INNER JOIN sales_flat_order ON sales_flat_order_grid.entity_id = sales_flat_order.entity_id SET sales_flat_order_grid.customer_id = sales_flat_order.customer_id; 
-- check
SELECT orig.entity_id, res.entity_id, orig.customer_id, res.customer_id from magento.sales_flat_order as orig inner join mage_restore.sales_flat_order as res on orig.entity_id=res.entity_id; 

Match orders in Magento 2

-- link orders
UPDATE magento2.sales_order as orig inner join mage_restore.sales_order as res on orig.entity_id=res.entity_id set orig.customer_id=res.customer_id;
-- if you get warnings due to missing keys see below to fix!
UPDATE sales_order_grid INNER JOIN sales_order ON sales_order_grid.entity_id = sales_order.entity_id SET sales_order_grid.customer_id = sales_order.customer_id;
-- check
SELECT orig.entity_id, res.entity_id, orig.customer_id, res.customer_id from magento2.sales_order as orig inner join mage_restore.sales_order as res on orig.entity_id=res.entity_id; 

Fix missing keys

If partial data exists in tables add keyword

ignore

in first UPDATE.
This situation arises if – after customer deletion – formerly existing customers created new accounts. So the old one can’t be created due to duplicate email addresses.
Check via

select orig.entity_id, res.entity_id, orig.customer_id, res.customer_id, c.email, c2.entity_id from magento2.sales_order as orig inner join mage_restore.sales_order as res on orig.entity_id=res.entity_id inner join mage_restore.customer_entity as c on res.customer_id=c.entity_id inner join magento2.customer_entity as c2 on c.email=c2.email where res.customer_id is not null and orig.customer_id is null;

You can fix those as well by matching the new customer account on the email address:

update magento2.sales_order as orig inner join mage_restore.sales_order as res on orig.entity_id=res.entity_id inner join mage_restore.customer_entity as c on res.customer_id=c.entity_id inner join magento2.customer_entity as c2 on c.email=c2.email set orig.customer_id=c2.entity_id where res.customer_id is not null and orig.customer_id is null;

In Magento2 you have to reindex customer_grid as final step!

Microsoft announces WSL 2

In an exciting move Microsoft announces WSL2 for Windows 10 as a successor for WSL – Windows Subsystem for Linux. WSL2 uses a real Linux kernel to be able to run native Linux executables on Windows.

The announcement mentions key features of WSL2 such as

„dramatic file system performance increases”
„you can run more Linux apps in WSL 2 such as Docker”

So probably this setup will be better suited for web development than MacOS (though not as good as the real deal Linux)
I will try my benchmark project fwshop2 with Docker on this new platform to compare to Linux and MacOS. The results for Linux and MacOS are documented in this article.

Tuning for Docker on Mac

Install Ubuntu 😉

Seriously: In a project all developers using a Mac had poor performance compared to those using a Ubuntu Linux system. While Docker per se is a Linux technology and obviously runs best on Linux the runtimes on Mac were way too slow to be acceptable. So we began research to optimize performance on Mac and ease our daily work.

During research i came across an article about tuning disk performance for Docker on Mac. All proposed steps seemed reasonable accept updating Docker to ‚edge‘ – this seems not necessary any more as the software is matured. Further insight gave an article about using NFS which we didn’t pursue.

First, we defined a disk heavy test task in our project to run on a container to measure runtimes for later comparison.

> time bin/console build
real    41m34.257s

For comparison: The same task ran in 4 minute 25 seconds on a Linux system! That’s 89% faster!

> time bin/console build
real    4m25s

1. Check disk image format

If you upgraded your Mac from an earlier MacOS version like Sierra or El Capitan to High Sierra or Mojave Docker might still be using its old disk format. If you’re still on .qcow2 then upgrade to .raw if system disk filesystem is APFS. Unfortunately this is easiest done by resetting Docker to factory defaults which deletes all images so all container images have to be downloaded and built again.

Check for disk image format: we want .raw for performance on APFS.

After resetting we build our images and measure 19% of speedup:

> time bin/console build
real    33m48.665s

2. Use :cached: for volumes

Simply add :cached: to volumes with many files/much disk I/O. Use an override file so your platform specific changes don’t have to be put in project wide docker-compose.yml. So here is a snippet from docker-compose.override.yml:

version: '3'
services:
  nginx:
    volumes:
      - .:/var/www/html:cached

We measure and achieve about of 44% of speedup:

> time bin/console build
real    18m57.775s
# a newer machine is even faster
> time bin/console build
real    16m20.238s

So all in we achieved 54% faster runtime for our test task! Unfortunately i didn’t find further improvements so we checked another approach.

3. Use Docker with Vagrant

We create a Vagrant machine based on Ubuntu 18.04 to run Docker. Vagrant’s provisioning is used to install latest Docker version. Key is to use NFS for good I/O performance. The Vagrantfile looks as follows:

Vagrant.configure("2") do |config|
    config.vm.box = "ubuntu/bionic64"

    config.vm.network :private_network, ip: "192.168.111.16"

    config.vm.provider :virtualbox do |v|
        v.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
        v.customize ["modifyvm", :id, "--memory", 2576]
        v.customize ["modifyvm", :id, "--name", "dockerdev"]
        v.customize ["modifyvm", :id, "--cpus", 2]
    end

    config.vm.synced_folder "../", "/vagrant", :type => "nfs", mount_options:['nolock,vers=3,udp,noatime,actimeo=1']
    config.vm.provision "shell", inline: <<-SHELL
      sudo apt-get update && sudo apt-get install \
        apt-transport-https \
        ca-certificates \
        curl \
        software-properties-common
      curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
      sudo apt-key fingerprint 0EBFCD88
      sudo add-apt-repository \
        "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
        $(lsb_release -cs) \
        stable"
      sudo apt-get update && sudo apt-get -y install docker-ce docker-compose
      sudo usermod -aG docker vagrant
    SHELL
    # config.vm.provision :shell, path: "bootstrap.sh"
    config.ssh.forward_agent = true
end

After creating the box with vagrant up && vagrant ssh we start our Docker services. The newer machine reaches a pretty good time:

> time bin/console build
real	6m3.552s

So this is a pretty impressive improvement – all in all 85,5% – which allows working with Docker on a Mac.

Conclusion

Surprisingly Docker in a VirtualBox managend by Vagrant is faster than Docker for Mac!

This solution might be worth a try: https://github.com/davidalger/warden Under the hood it uses mutagen instead of NFS – don’t know, how that performs.

Rsnapshot auf Synology DSM

Ich nutze gerne das Backup-Tool rsnapshot, weil es einfach und zuverlässig funktioniert. Das erzeugte Backup ist platzsparend (via Nutzung von hard links) und erlaubt trotzdem sofortigen Zugriff auf Dateien ohne vorheriges zeitraubendes Entpacken.

Für Offsite-Backups bietet sich ein Synology NAS an, weil es gut funktioniert und sehr zuverlässig ist.

Auf der Synology ist rsnapshot nicht via DSM Package Manager installierbar.
Da es nur von Perl abhängt klappt die Installation auch einfach manuell:

1. Schritt:

Im DSM Package Manager in Rubrik „Entwickler“ Perl installieren.

Für die folgenden Schritte öffnen wir eine Shell auf der Synology:

2. Schritt:

rsnapshot aus tar entpacken und vorbereiten:

wget https://github.com/rsnapshot/rsnapshot/releases/download/1.4.2/rsnapshot-1.4.2.tar.gz
tar xvzf rsnapshot-1.4.2.tar.gz
cd rsnapshot-1.4.2
./configure 

3. Schritt:

Manuell installieren:

sudo su
cp rsnapshot rsnapshot-diff /usr/local/bin/
chmod 755 /usr/local/bin/rsnapshot /usr/local/bin/rsnapshot-diff
cp rsnapshot.conf.default /usr/local/etc/rsnapshot.conf
exit 

4. Schritt:

/usr/local/etc/rsnapshot.conf anpassen: Pfade eintragen, MySQL-Backup erstellen, ssh auskommentieren etc.

Ich muss folgende Ziele kommentieren, da rm beim Löschen alter Backups Fehler erzeugt und das Backup nicht durchgeläuft.

#cmd_rm     /bin/rm

5. Schritt:

crontab einrichten, z.B.

0 */4 * * *        rsnapshot hourly
40 1 * * *         rsnapshot daily
20 3 * * 6         rsnapshot weekly
10 5 1 * *         rsnapshot monthly

Fertig!