Using imapsync to migrate mailboxes

Recently I had to migrate some mailboxes from one IMAP server to another. After a bit of research into IMAP migration tools running on Linux I decided to use imapsync. It has all the relevant features and is actively maintained. Running it showed that it tries to do the right things by default. As I will have to do such migrations in the future, I'm documenting my setup here.

Installation

imapsync is a great tool, but somehow it doesn't come with any preparation for perlbrew.

imapsync has a lot of dependencies (mostly Perl libs) which are installed via the system's package manager. But sometimes you just want to execute a migration and not have all those libs on the system afterwards. A Docker setup is provided which solves this easily. But Perl has its own established solution for this problem, so we want to use that: perlbrew.

So I prepared a cpanfile which will hopefully be included upstream in the future.

requires 'App::cpanminus';
requires 'Authen::NTLM';
requires 'CGI';
requires 'Compress::Zlib';
requires 'Crypt::OpenSSL::RSA';
requires 'Data::Dumper';
requires 'Data::Uniqid';
requires 'Digest::HMAC';
requires 'Digest::HMAC_MD5';
requires 'Digest::MD5';
requires 'Dist::CheckConflicts';
requires 'Encode';
requires 'Encode::Byte';
requires 'Encode::IMAPUTF7';
requires 'File::Copy::Recursive';
requires 'File::Tail';
requires 'IO::Socket::INET';
requires 'IO::Socket::INET6';
requires 'IO::Socket::SSL';
requires 'IO::Tee';
requires 'JSON';
requires 'JSON::WebToken';
requires 'JSON::WebToken::Crypt::RSA';
requires 'HTML::Entities';
requires 'LWP::UserAgent';
requires 'Mail::IMAPClient';
requires 'MIME::Base64';
requires 'Module::Implementation';
requires 'Module::Runtime';
requires 'Module::ScanDeps';
requires 'Net::SSLeay';
requires 'Package::Stash';
requires 'Package::Stash::XS';
requires 'PAR::Packer';
requires 'Parse::RecDescent';
requires 'Pod::Usage';
requires 'Readonly';
requires 'Regexp::Common';
requires 'Sys::MemInfo';
requires 'Term::ReadKey';
requires 'Test::Fatal';
requires 'Test::Mock::Guard';
requires 'Test::MockObject';
requires 'Test::More';
requires 'Test::Pod';
requires 'Test::Requires';
requires 'Test::Deep';
requires 'Text::ParseWords';
requires 'Try::Tiny';
requires 'Unicode::String';
requires 'URI::Escape';

Just save it as cpanfile and run cpanm --installdeps . – the whole workflow is sketched below.
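The whole perlbrew workflow then looks roughly like this (the Perl version is just an example, and the last line assumes imapsync is run from its source checkout):

# install a local perl and cpanm without touching system packages
perlbrew install perl-5.28.0
perlbrew use perl-5.28.0
perlbrew install-cpanm
# install the dependencies listed in the cpanfile of the current directory
cpanm --installdeps .
# run imapsync against the brewed perl instead of the system libs
perl ./imapsync --version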

Running

There are good examples for testing the migration of a single mailbox (https://imapsync.lamiral.info/examples/imapsync_example.sh) and for doing it in bulk (https://imapsync.lamiral.info/examples/sync_loop_unix.sh). From there you can craft your own scripts.
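As a rough sketch of what such a script ends up calling (hosts, users and password files are placeholders – check the linked examples for the full option set):

# dry run first, then drop --dry to actually copy the mailbox
./imapsync --host1 imap.example.org --user1 alice@example.org --passfile1 secret1.txt \
           --host2 imap.example.net --user2 alice@example.net --passfile2 secret2.txt \
           --ssl1 --ssl2 --dry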

Don’t forget to test your new mail setup using https://www.mail-tester.com/ 😉

WordPress upload permissions

Here is a quick fix for a WordPress problem that I found documented nowhere:

I have a WordPress instance running on shared hosting where uploaded files need the group read bit set in their permissions to be readable by the web server. But uploaded files were created with mode 0600 instead of 0640 and were thus not viewable on the website.

First I tried setting FS_CHMOD_FILE, but this didn't work as it is not used for uploaded files.

Then I found that WordPress determines the permission bits of uploaded files by taking the permissions of the target directory (and, if needed, of its parent directory) and removing the execute bits. So to solve the problem just set the permissions of wp-content/uploads and all its subdirectories to 0750. This way WordPress sets file permissions for uploads to 0640 as required.
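As a one-off fix for the directories and the already uploaded files this boils down to (run from the WordPress root):

find wp-content/uploads -type d -exec chmod 0750 {} +
find wp-content/uploads -type f -exec chmod 0640 {} +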

Upgrading an application on Elastic Beanstalk from Amazon Linux 1 to 2

These days I have the pleasure of upgrading an application running on AWS Elastic Beanstalk from the platform Multi-container Docker to the new platform Docker. The former is based on Amazon Linux 1 (AL1) while the latter runs on Amazon Linux 2 (AL2). I am following this guide.

I have encountered the following advantages so far:

  • links between containers are now fully bidirectional: on AL1 those links could only be defined and used in one direction. If you wanted bidirectional communication you had to implement service discovery yourself
  • the limit of 10 containers per app is gone: in my tests I could easily spin up 12 containers.
    • apparently this limitation came from the automatic translation of an Elastic Beanstalk app into a task definition. The limitation is documented as task definition max containers.
    • on AL1 one got the following error:
frontend-staging - ERROR: Service:AmazonECS, Code:ClientException, Message:Too many containers., Class:com.amazonaws.services.ecs.model.ClientException
  • I could restart single containers without affecting the other running containers: on AL1 this would trigger a restart of all containers, for reasons unknown to me.
  • simplified logging – all services log to /var/log/eb-docker/containers/eb-current-app/eb-stdouterr.log which can be streamed to CloudWatch.

Magento and timezones

Today I had the issue that after dumping a Magento database from one MySQL instance and importing it into another one, the order dates were shifted 2 hours into the future. How come?!?

Investigation and testing showed that the source database was running with timezone “GMT” aka “UTC” aka +00. The target database was running in the system timezone, which was CEST aka “Europe/Berlin” aka +02. Still no problem in sight, as Magento stores all values in timezone “GMT” anyway!

It turns out Magento uses the data type TIMESTAMP for order dates etc. The MySQL documentation states:

MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval. (This does not occur for other types such as DATETIME.) By default, the current time zone for each connection is the server’s time. The time zone can be set on a per-connection basis. As long as the time zone setting remains constant, you get back the same value you store. If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored. This occurs because the same time zone was not used for conversion in both directions. The current time zone is available as the value of the time_zone system variable. For more information, see Section 5.1.13, “MySQL Server Time Zone Support”.

https://dev.mysql.com/doc/refman/5.7/en/datetime.html

Exactly this happened: If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored.
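The effect is easy to reproduce with a scratch table (the database name test and the dates are just examples):

mysql test <<'SQL'
SET time_zone = '+00:00';
CREATE TABLE tz_demo (d TIMESTAMP);
INSERT INTO tz_demo VALUES ('2019-07-01 12:00:00');
-- simulate reading with the connection timezone of the new server (CEST)
SET time_zone = '+02:00';
SELECT d FROM tz_demo;  -- 2019-07-01 14:00:00, two hours "in the future"
DROP TABLE tz_demo;
SQL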

What was the solution? Well, first I wanted to set the target timezone in the dump file. This way the target MySQL would convert into GMT during the import and back when the shop reads values. This works, except that all values get converted into summer time, as it is summer right now.

An alternative and better solution: just set the timezone for each of the shop's connections by adding the following in app/etc/local.xml:

<initStatements><![CDATA[SET NAMES utf8;SET time_zone = '+00:00';]]></initStatements>

This way MySQL doesn’t convert timestamp values in those connections and Magento gets GMT as expected from the database.

So in my view there are some interesting observations:

  • Why does Magento 1 by design perform a double conversion of date values by storing them as TIMESTAMP in MySQL (except when the server's system timezone is GMT)?
    • Magento creates GMT dates
    • storing such a value in MySQL interprets it as being in the connection's timezone and transforms it into GMT for storage
    • reading is done vice versa: read as GMT and converted into the connection's timezone
  • DATETIME would be a more fitting data type in my view
  • Magento 2 fixes this consistently by setting, in web/vendor/magento/framework/DB/Adapter/Pdo/Mysql.php:
$this->_connection->query("SET time_zone = '+00:00'");
  • MySQL automagically converts timestamps with the correct timezone offset valid for the individual timestamp in the connection's timezone

Today I corrupted MySQL… here's how

If you want to change the MySQL setting innodb_log_file_size to improve performance, the top answer on Stack Overflow says you have to delete the redo log files in /var/lib/mysql/ib_logfile* so that MySQL recreates them with the new size. Before deleting you obviously shut down MySQL.

But a normal shutdown of MySQL is a fast shutdown, which might leave changes stored in the logs but not yet written to the table pages. This is normally OK, as MySQL works through the logs on the next start and applies all changes.

But by deleting the redo log… you might create an inconsistent database, as table pages might be marked dirty while the changes can no longer be loaded from the redo log. I had done this many times after a fast shutdown, not knowing the risk, but today MySQL got corrupted: the database was still working but in an inconsistent state.

How to prevent this?

The old answer was: switch off fast shutdown with SET GLOBAL innodb_fast_shutdown = 0.
After the shutdown you can then safely delete the redo log. But this is only relevant up to MySQL 5.6.8.

So what is the correct way in modern MySQL to change innodb_log_file_size? The steps are the following (a command sketch follows the list):

  • shutdown mysql
  • change innodb_log_file_size
  • start mysql
    • documentation says: “If InnoDB detects that the innodb_log_file_size differs from the redo log file size, it writes a log checkpoint, closes and removes the old log files, creates new log files at the requested size, and opens the new log files.”
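Put into commands, this boils down to something like this (a sketch assuming a systemd service named mysql and the config in /etc/mysql/my.cnf – adjust to your distribution):

sudo systemctl stop mysql
# set the new value in the [mysqld] section of the config, e.g.
#   innodb_log_file_size = 512M
sudo systemctl start mysql
# verify the new size
mysql -e "SHOW VARIABLES LIKE 'innodb_log_file_size'"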

Of course we had backups, so nothing was lost except for the extra work.

Lessons learned

  • MySQL is really robust: it starts and allows creating a dump even with
    • corrupted databases
    • a corrupted mysql system database (missing tables etc.)
  • read the documentation
  • read all answers on Stack Overflow carefully – not just the top one 😉

Chrome vs. Firefox

Just as Chrome has its jumping-the-shark moment*, Microsoft moves to the Chromium engine… they haven't had any luck with browsers since 1995 😉

I’m migrating to Firefox…

(*) So what’s wrong with Chrome?

  • removing the ad blocker API
  • flagging slow sites
  • hiding parts of the current URL in the address bar – protocol + “www.” are no longer shown. This can be fixed:
    1. go to chrome://flags
    2. search for “UI Hide”
    3. disable the three matching flags
  • this might be controversial: promoting encryption (HTTPS) in an aggressive way. This is per se a good measure for websites with data entry, but it is also a burden for website owners.

Many turn to centralized services like AMP by Google and Cloudflare to encrypt their websites and make them faster. So these measures weaken an independent web and strengthen big corporations like Google.

Docker in production

Use the option userland-proxy=false to avoid having a separate docker-proxy process for every port mapped from a container to the host. Somehow this option survived with a default of true into Docker 18.09.

It isn't even documented anymore: the docs for Docker 17.09 still had a section about the userland-proxy option and a good explanation of how it works. That page even mentions the advantage of running without userland proxies and with iptables rules instead:

this alternative is preferred for performance reasons

https://docs.docker.com/v17.09/engine/userguide/networking/default_network/binding/

It even contains a warning that with iptables rules port conflicts are harder to detect.

But the documentation for Docker 18.09 no longer contains an explanation of the option or its performance penalty.

So from my point of view, having a userland process copy traffic from a host port to a container port doesn't make sense, as the same can be accomplished by iptables rules in the kernel. As the necessary rules are completely managed by Docker, switching this off is a simple improvement.
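A quick way to check the effect on a host (the nginx container and port 8080 are just an example):

docker run -d --name web -p 8080:80 nginx
pgrep -af docker-proxy            # with userland-proxy=false no proxy process shows up for the port
sudo iptables -t nat -nL DOCKER   # the DNAT rule that handles the port mapping in the kernel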

Check logging

Check logging in /var/lib/docker/containers/*/*log

  • check if you have verbose containers (see the one-liner after this list)
  • in my case the biggest producer of logs was PHP-FPM, so I had to disable access logging in its config
  • set up log rotation in Docker
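A one-liner to spot the noisiest containers (assuming the json-file log driver, as used in the config below):

sudo du -h /var/lib/docker/containers/*/*-json.log | sort -h | tail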

Example config

So my current config /etc/docker/daemon.json looks like this:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "userland-proxy": false
}
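Don't forget that the Docker daemon has to be restarted for changes in daemon.json to take effect (assuming a systemd-based host):

sudo systemctl restart docker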

Surprises in software development

Sometimes software development can be quite surprising. For example, one day you get confused because code you wrote doesn't run as expected. You think about it for some time and find a workaround, but still can't explain why it behaved unexpectedly. Because you have other work to do, you postpone thinking about it further.

Then the next day you read an article which references a change in a programming language and suddenly you get the explanation for that unexpected behaviour the day before. Tada!

This just happened to me with PHP’s ternary operator:

$b = 1;
$c = 2;
echo (!empty($b)) ? $b : (!empty($c)) ? $c : 0;

What do you expect to be echoed? I expect 1, which reflects the behaviour of most other programming languages. PHP prints 2 instead. This was the cause of the unexpected behaviour in my program – which I had accidentally worked around by not using the ternary operator, without understanding why.
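The fix is to make the intended grouping explicit with parentheses – easy to verify on the command line:

# PHP's left-to-right grouping: prints 2
php -r '$b = 1; $c = 2; echo (!empty($b)) ? $b : (!empty($c)) ? $c : 0;'
# explicit parentheses give the expected result: prints 1
php -r '$b = 1; $c = 2; echo (!empty($b)) ? $b : ((!empty($c)) ? $c : 0);'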

PHP groups the ternary operator from the left, whereas other programming languages group it from the right. This behaviour, and the request to change it, is explained in the PHP RFC “Deprecate left-associative ternary operator”, which is targeted for implementation in PHP 7.4. So along with many other changes, PHP moves towards doing fewer unexpected things, which is very good.

Associativity in general is explained in this article.

Restore customers from backup in Magento

I recently had two different clients who deleted customers from their Magento 1 and Magento 2 instances respectively. Although the saying goes:

Once Is Chance, Twice is Coincidence, Third Time Is A Pattern

unknown

I think this is already a pattern which needs to be fixed in the UI.

But first we have to recover the missing customers. Hopefully you have an SQL dump of the shop's database!

# restore backup into database mage_restore
> gunzip -c magento.sql.gz|mysql mage_restore
# extract the customer tables from this database
> mysqldump -t --skip-extended-insert mage_restore customer_address_entity customer_address_entity_datetime customer_address_entity_decimal customer_address_entity_int customer_address_entity_text customer_address_entity_varchar customer_entity customer_entity_datetime customer_entity_decimal customer_entity_int customer_entity_text customer_entity_varchar wishlist wishlist_item > customers_restore.sql
# import in shop database
> mysql magento < customers_restore.sql

Add

--insert-ignore

to the mysqldump command if partial data exists in the tables (that is: not all customers were deleted, or customers created new accounts in the shop after the delete event).

Match orders in Magento 1

After importing the customers we have to link their orders back to their accounts as the reference in an order is set to NULL when the customer is deleted.

-- link orders
UPDATE magento.sales_flat_order as orig inner join mage_restore.sales_flat_order as res on orig.entity_id=res.entity_id set orig.customer_id=res.customer_id;
UPDATE sales_flat_order_grid INNER JOIN sales_flat_order ON sales_flat_order_grid.entity_id = sales_flat_order.entity_id SET sales_flat_order_grid.customer_id = sales_flat_order.customer_id; 
-- check
SELECT orig.entity_id, res.entity_id, orig.customer_id, res.customer_id from magento.sales_flat_order as orig inner join mage_restore.sales_flat_order as res on orig.entity_id=res.entity_id; 

Match orders in Magento 2

-- link orders
UPDATE magento2.sales_order as orig inner join mage_restore.sales_order as res on orig.entity_id=res.entity_id set orig.customer_id=res.customer_id;
-- if you get warnings due to missing keys see below to fix!
UPDATE sales_order_grid INNER JOIN sales_order ON sales_order_grid.entity_id = sales_order.entity_id SET sales_order_grid.customer_id = sales_order.customer_id;
-- check
SELECT orig.entity_id, res.entity_id, orig.customer_id, res.customer_id from magento2.sales_order as orig inner join mage_restore.sales_order as res on orig.entity_id=res.entity_id; 

Fix missing keys

If partial data exists in the tables, add the keyword

IGNORE

to the first UPDATE (i.e. UPDATE IGNORE …).
This situation arises if – after the customer deletion – formerly existing customers created new accounts, so the old account can't be recreated due to the duplicate email address.
Check via:

select orig.entity_id, res.entity_id, orig.customer_id, res.customer_id, c.email, c2.entity_id from magento2.sales_order as orig inner join mage_restore.sales_order as res on orig.entity_id=res.entity_id inner join mage_restore.customer_entity as c on res.customer_id=c.entity_id inner join magento2.customer_entity as c2 on c.email=c2.email where res.customer_id is not null and orig.customer_id is null;

You can fix those as well by matching the new customer account on the email address:

update magento2.sales_order as orig inner join mage_restore.sales_order as res on orig.entity_id=res.entity_id inner join mage_restore.customer_entity as c on res.customer_id=c.entity_id inner join magento2.customer_entity as c2 on c.email=c2.email set orig.customer_id=c2.entity_id where res.customer_id is not null and orig.customer_id is null;

In Magento 2 you have to reindex customer_grid as the final step!
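With the standard Magento CLI that should be (run from the Magento 2 root):

bin/magento indexer:reindex customer_grid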

Microsoft announces WSL 2

In an exciting move Microsoft announces WSL 2 for Windows 10 as the successor to WSL – the Windows Subsystem for Linux. WSL 2 uses a real Linux kernel to be able to run native Linux executables on Windows.

The announcement mentions key features of WSL2 such as

“dramatic file system performance increases”
“you can run more Linux apps in WSL 2 such as Docker”

So this setup will probably be better suited for web development than MacOS (though not as good as the real deal, Linux).
I will try my benchmark project fwshop2 with Docker on this new platform to compare it to Linux and MacOS. The results for Linux and MacOS are documented in this article.