Daniel Ennis (Aikar)

Tuning the JVM – G1GC Garbage Collector Flags for Minecraft

Introduction

After many weeks of studying the JVM, flags, and testing various combinations, I came up with a highly tuned set of garbage collection flags for Minecraft. I tested these on my server, and they have been in use there for years. I then announced my research to the public, and to this day many servers have been using my flag recommendations, reporting great improvement in garbage collection behavior.

These flags are the result of a ton of effort, and of seeing them in production across various server sizes, plugin lists, and server types. They have proven themselves repeatedly.

I strongly suggest using these flags to start your server. These flags help keep your server running CONSISTENTLY, without any large garbage collection spikes. CPU usage may be slightly higher, but your server will be more reliable overall, with a stabler TPS.

The JVM Startup Flags to use

Use these flags exactly, only changing Xms and Xmx. These flags work and scale accordingly to any size of memory, even 500MB.
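As a reference point, a launch line built from the flags documented below looks roughly like this (change Xms/Xmx and the jar name to suit; the TargetSurvivorRatio and G1MaxNewSizePercent values shown here are assumed, as the explanations below don't pin down exact numbers for them):

java -Xms10G -Xmx10G -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions \
  -XX:TargetSurvivorRatio=90 -XX:G1NewSizePercent=50 -XX:G1MaxNewSizePercent=80 \
  -XX:G1MixedGCLiveThresholdPercent=35 -XX:MaxGCPauseMillis=100 \
  -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -XX:+ParallelRefProcEnabled \
  -jar paperclip.jar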


Recommended Memory

I recommend using up to 10GB, no matter how few players you have! If you can't afford 10GB of memory, give as much as you can, but ensure you leave the operating system some memory too. G1GC operates better with more memory.

Going over 10GB starts to become more subjective, but should be OK. Very few servers really need more than 10GB, though.

If you are running with 10GB or less memory for MC, you should not adjust these parameters.

Higher Old Generation Memory Needs

If you have a high player count and use more than 10GB of memory, and are seeing old generation lag spikes, you may want to adjust the following:

  • -XX:G1MaxNewSizePercent=60
  • -XX:G1NewSizePercent=40

Technical Explanation of the Flags:

  1. -Xms matching -Xmx – Why: You should never run your server in a situation where -Xmx can push the system completely out of memory. Your server should always be expected to use the entire -Xmx! You should then ensure the OS has extra memory on top of that -Xmx for non-MC/OS-level things. Therefore, you should never run MC with an -Xmx setting you can't support if Java uses it all. Now, that means if -Xms is lower than -Xmx – YOU HAVE UNUSED MEMORY! Unused memory is wasted memory. G1 (and probably even CMS, up to a certain threshold, but I'm only stating what I'm sure about) operates better the more memory it's given. G1 adaptively chooses how much memory to give to each region to optimize pause time. If it has more memory than it needs to reach an optimal pause time, G1 will simply push that extra into the old generation, and it will not hurt you (this may not be the case for CMS, but it is the case for G1). The fundamental idea of improving GC behavior is to ensure short-lived objects die young and never get promoted. The more memory G1 has, the better assurance you get that objects are not prematurely promoted to the old generation. G1 operates differently than previous collectors and is able to handle larger heaps more efficiently. If it does not need the memory given to it, it will not use it. The entire engine operates differently and does not suffer from too-large heaps, and it is industry-accepted advice to keep Xms and Xmx the same under G1!
  2. UnlockExperimentalVMOptions – needed for some of the options below.
  3. TargetSurvivorRatio: I'm sure you're all used to seeing this one suggested. Good news! It's actually a good flag to use :D This setting controls how much of the Survivor space is ABLE to be used before promotion. If Survivor gets too full, objects start promoting to Old Gen. The default leaves headroom to handle memory allocation spikes; however, MC's allocation rate is for the most part pretty steady (steadily high…), and when it's steady it's safe to raise this value to avoid premature promotions.
  4. G1NewSizePercent / G1MaxNewSizePercent: These are the important ones. With CMS and other collectors, tweaking the New Generation results in a FIXED-SIZE New Gen, usually set explicitly with -Xmn. With G1, things are better! You can now specify a percentage range for the overall desired size of the new generation. With these settings, we tell G1 not to use its default of 5% for new gen, and instead give it at least 50%! Minecraft has an extremely high memory allocation rate, at least 800 megabytes per second on a 30-player server! And these are mostly short-lived objects (e.g. BlockPosition). This means MC REALLY needs more focus on the New Generation to even support that allocation rate. If your new gen is too small, you will be running new gen collections 1-2+ times per second, which is really bad. You will have so many pauses that TPS risks suffering, and the server will not be able to keep up with the cost of the GCs. Combine that with objects now promoting faster, and your Old Gen grows faster too. Given more New Gen, we are able to slow down the interval of Young Gen collections, giving short-lived objects more time to die young and producing more efficient GC behavior overall. If you run with a larger heap (15GB+), you may want to lower the minimum to, say, 40%, but don't go lower than 30%. This lets G1 rely more on its own heuristics.
  5. G1MixedGCLiveThresholdPercent: Controls when Old Gen regions are included in 'mixed' collections alongside the Young GC, keeping Old Gen tidy without running a full Old Gen collection. A region is only included in mixed collections when its live data is below this percent. Mixed collections are not as heavy as a full old collection, so small incremental cleanups of old keep memory usage light.
  6. AlwaysPreTouch: AlwaysPreTouch gets the memory set up and reserved at process start, ensuring it is contiguous. This improves the operating system's memory access speed.
  7. +DisableExplicitGC: Many plugins think they know how to control memory and try to invoke garbage collection themselves. Plugins that do this trigger a full garbage collection, causing a massive lag spike. This flag stops plugins from doing that, protecting you from their bad code.
  8. MaxGCPauseMillis=100: This setting controls how much memory G1 actually uses within the Minimum and Maximum range specified for your New Generation; it is a "goal" for how long you want your server to pause for collections. 100ms is equal to 2 ticks, so we are aiming to lose at most 2 ticks per collection. This results in a short TPS drop, but Spigot and Paper can both make up for that drop instantly, meaning it has no meaningful impact on your TPS. 100ms is also shorter than players can perceive.
  9. +ParallelRefProcEnabled: Optimizes the reference-processing phase of GC to use multiple threads.

Using Large Pages

Also, for Large Pages it's even more important to use -Xms = -Xmx! Large Pages needs all of the memory specified for it reserved up front, or you could end up without the gains. That memory will not be usable by the OS anyway, so let Java have it.
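On Linux, getting those gains typically means reserving huge pages before the JVM starts; a sketch, assuming 2MB pages and a 10GB heap (10240MB / 2MB = 5120 pages):

# Reserve 5120 x 2MB huge pages for a 10GB heap, then launch Java
# with -XX:+UseLargePages so it actually uses them.
sudo sysctl -w vm.nr_hugepages=5120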
Additionally, use these flags (the Metaspace flag is Java 8 only; don't use it on Java 7):

Code:
 -XX:+UseLargePages
 -XX:+UseLargePagesInMetaspace

Thanks to https://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection for helping reinforce my understanding of the flags and for introducing improvements!


Changelog

  • 10/4/2018: Removed AggressiveOpts and InitiatingHeapOccupancyPercent. AggressiveOpts is removed in Java 11, and IHOP may hurt performance in Java 11. You should remove them for Java 8 too.
  • 8/18/2018: Adjusted MixedGCLiveThreshold to 35 (from 50) to ensure mixed GCs start earlier.
    Added notes about the recommended use of 10GB of memory.
    Added more flag documentation.
  • 5/24/2018: Added -XX:+ParallelRefProcEnabled

 

Truth about Saturated Fats, and why Coconut Oil is not the worst ever

You may have recently seen the article "Coconut oil isn't healthy. It's never been healthy." being shared from USA Today/BBC.
The AHA decided (or people with money decided) that since the public has increased its use of coconut oil, they needed to send a "refresher warning" about the dangers of "Saturated Fat."
However, there’s a major problem here. These articles/studies assume that Total Cholesterol (TC) is the way to calculate risk for CVD (Cardiovascular Disease).
This is based on the “Lipid Hypothesis” (LH) theory.
However, more and more studies are coming out disproving the LH. As reported in the documentary "Fat Head", linked at the bottom of this post, the LH was founded on manipulated data that excluded data sources showing inconsistent results regarding fat intake and heart disease.
 
I’ve done quite a bit of research into this stuff lately, and found some great pubmed articles about saturated fats and cholesterol.
 
 
Studies continuously show that saturated fats alone might not actually be linked to CVD.
 
The study linked above shows that replacing saturated fats in your diet with carbohydrates actually increases your risk of CVD.
 
Ultimately it boils down to this: your Total LDL Cholesterol – DOES – NOT – MATTER –
 
This is referred to as “LDL-C” on blood work.
 
What matters is LDL-P, the count of SMALL particles.
 
Yes, saturated fats increase LDL-C, but they do so by creating the larger, fluffy particles that do not cause CVD and may even be beneficial/protective.
 
What actually matters is your HDL to LDL ratio, and the LDL-P counts.
 
So, about that Coconut Oil….
To call out the EXTREMELY important information from those 2 pubmed articles:
 
Lauric Acid, the C12 chain of saturated fat, which makes up 50% or more of coconut oil, is found to increase HDL in such great numbers that the increase is disproportionate to the LDL increase, overall meaning it improves your cholesterol by giving you more HDL than LDL.
 
Your cholesterol ratio will lower from consuming C12-chain saturated fats, which coconut oil is high in, along with the massively useful MCT chains (C6 and C8).
 
Saturated fat will be a primary way to raise your HDL.
And the LDL rise it causes will be of the fluffy kind, which is not a concern.
 
If you want to learn more about how the lies of the Lipid Hypothesis started, and learn more about cholesterol, see this documentary: https://www.youtube.com/watch?v=Pue5qVW5k8A
 
 
Now, go use that coconut oil and feel good about it.

Coding With Aikar – Live Streams!

Hello,

Lately I have been live streaming while working on various Empire Minecraft or website-related tasks! I like the idea of live coding, as it invites others to learn from you and, on the flip side, allows others to offer constructive criticism to improve your code.

I’ve done some work to be able to split stream to 4 (or more if needed) services at the same time, so you can subscribe and watch on any of the following services:

NOTICE: Due to Mixer being a much superior platform, I've decided to end streaming to other platforms so that I can use FTL.
Please subscribe to and view my streams on Mixer only.
The method I used to stream to multiple services at once is still described below.

 

Please subscribe to get alerts on when I go live!


https://aikardiscord.emc.gs

If you are interested in Live Streaming to multiple services at once, follow this guide:

http://linustechtips.com/main/topic/174603-how-to-live-stream-to-multiple-services-with-a-rtmp-server/

And here is an example config file:
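A sketch with nginx's RTMP module, following the linked guide; the push URLs and stream keys are placeholders:

rtmp {
    server {
        listen 1935;
        application live {
            live on;
            record off;
            # Re-push the single inbound stream to each service:
            push rtmp://live.twitch.tv/app/YOUR_TWITCH_KEY;
            push rtmp://a.rtmp.youtube.com/live2/YOUR_YOUTUBE_KEY;
        }
    }
}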

I'm also using a fancy Bash script with xdotool to automate switching scenes in OBS: install xdotool, then assign 2 scenes the hotkeys control+alt+[ and control+alt+], as in the sketch below.
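A sketch of what that script can look like (the OBS window title and the delay are assumptions):

#!/bin/bash
# Find the OBS window, then alternate the two scene hotkeys
# (ctrl+alt+[ and ctrl+alt+]) on a timer.
OBS_WINDOW=$(xdotool search --name "OBS" | head -n 1)
while true; do
    xdotool key --window "$OBS_WINDOW" ctrl+alt+bracketleft
    sleep 10
    xdotool key --window "$OBS_WINDOW" ctrl+alt+bracketright
    sleep 10
done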

Spigot Tick Limiter: Don’t use “max-tick-time”!

Something that many of us in the Spigot development community really dislike is Spigot's Tick Limiter. Here is my response to what a user wrote in a Spigot thread:

PaperSpigot does not offers "max-tick-time", that's why I consider spigot over it^^

There's a reason PaperSpigot doesn't offer it: you shouldn't use it! That system is fundamentally broken in its implementation and can cause inconsistencies on your server.

I strongly recommend using Paper, which offers better performance than Spigot, WAY more features, and does not have this buggy system.

paper.emc.gs

Why it’s broken

While building my Entity Activation Range implementation, I found that it is not safe to simply skip ticking some entities: for the entities you do skip, you must still run certain parts of their code to keep everything else in a consistent, expected state.

The Tick Limiter blows all of that out of the water, ignoring all of the research and code written to ensure entities stay consistent.

Additionally, Tile Entities have never received a real full pass to ensure they all behave when skipping ticks, and I am confident that skipping ticks FULLY BREAKS some tile entities, as they expect a consistent tick rate relative to the server's current tick ID. Skip a tick? Now code that acts on TickID % 20 == 0 may never be hit!

The Tick Limiter also suffers from bugs that skip ticking entities at random on each partial pass. You may have an entity that goes 2 seconds without a tick! I have an upcoming PR that will fix that specific bug, but it doesn't fix the overall flaws in the design of the system.

Then, the idea behind it is simply flawed. Entities and tile entities are not some optional element of the server. It is unstudied/unproven what drastically bad things can happen when you throw vanilla logic out the door about how an entity's tick rate relates to the rest of the server.

Not to mention it skips ticking players! That means things like chunk loading and the Entity/Tile Entity state sent to the client get delayed.

And finally, it's not helping you. If your server is so lagged out that the tick limiter is kicking into effect:

DO SOMETHING ABOUT IT

Don't band-aid it with a system that HIDES the fact that it's lagging so badly. If the Tick Limiter is "doing something" for you, your server is overloaded and not able to support the current load being presented to it. You should reduce your player count, reduce your entity count, upgrade your hardware, or consider adding new Minecraft servers to your network to distribute the load.

Your server should be able to stay under 50 ms per tick (20 TPS) with what Entity Activation Range provides you, which is a SAFE form of skipping entity ticking: your entities stay in sync with the server's tick rate, resulting in less unknown state on your server.

How To Disable It

Set each of the max-tick-time settings to 1000 to ensure the limiter never runs on your server, as shown below.
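In spigot.yml, that looks like this (both sub-settings raised to 1000, well above the 50 ms a full tick takes, so the limiter never triggers):

world-settings:
  default:
    max-tick-time:
      tile: 1000
      entity: 1000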

Getting it Removed / Future Improvements to Minecraft

I asked Spigot to remove this patch, as it has forced us to do extra, hard work just to figure out how to make REAL performance improvements function alongside that system. It would have been better for everyone if that system just didn't exist; I have 1 pending PR that was made difficult because of it. I have another idea that would DRASTICALLY improve server performance, but it is simply not possible to support the idea of a tick limiter with it.

But thankfully I've given this idea to Mojang, and hopefully they will implement it and force Spigot to remove it.

Spigot doesn't want to remove it because uninformed users of the feature don't realize how bad it is, THINKING it's OK to use and doing good for their server when it's not.

Please, share this post and get users to stop using this broken Spigot feature and push for its removal.

Friends don’t let friends use the tick limiter.
Put in your Spigot signature:

[url=http://aikar.co/2015/10/08/spigot-tick-limiter-dont-use-max-tick-time/]You should not use Spigot's "max-tick-time" - Learn Why[/url]

Advanced Minecraft Marketing Analytics

I've mentioned a few times the power of my analytics system in the Spigot community and recently in AdminCraft on Reddit. Most advertising campaigns simply track impressions and clicks. But as server owners, we need more data than that… we need to know who actually joined the server and continues to play!

I have designed such a system, where I can track clicks, how many of those clicks joined the game server, and, out of those, who stays to the day 5, 15, 30, 60, and 90 markers! I can see when campaigns generate a lot of clicks yet few players, or worse: players who don't actually stay.

This article gives a rough idea of how I did it, in case someone is curious to implement it on their own.

I want to be straight that the implementation isn't perfect or the most optimized, but hey, it gives you data where you normally would have none, so that's a clear winner!

Requirements / Disclaimer

This system only works for advertising campaigns that involve website clicks. Campaigns that serve purely as a banner with a server connect address cannot realistically be tracked.

My implementation all revolves around Google Analytics, but the concept can work for any form of marketing system you want to adapt it to.

You should be using analytics.js (I do not think this works with ga.js, the old Analytics system) on every page of your site.

You will also need to be an experienced developer or have one on hand.

I WILL NOT HELP ANYONE IMPLEMENT THIS. I do not do contract work / consulting services, but feel free to ask questions on Spigot IRC.

Tracking that click

First off, my system requires that all inbound campaigns land the user on the website. If someone sees your server connect address and connects directly, you can't track that – but hey! That's usually a FREE conversion, so don't complain 🙂

Here is an example database table that I use:

CREATE TABLE `conversions` (
 `clientid` varchar(255) CHARACTER SET latin1 NOT NULL,
 `convert_id` varchar(255) CHARACTER SET latin1 NOT NULL,
 `ip` int(10) unsigned NOT NULL,
 `ua` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
 `date` int(11) unsigned NOT NULL,
 `lp` text CHARACTER SET latin1,
 `converted` tinyint(4) unsigned NOT NULL DEFAULT '0',
 PRIMARY KEY (`clientid`),
 KEY `ip` (`ip`),
 KEY `converted` (`converted`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Our goal is to assign a Google Analytics Client ID to every user. Additionally, I assign a unique ID to every user, separate from the Client ID, because some users have addons that constantly change their Google Analytics Client ID; this prevents inserting a new database entry every time their Client ID changes.

When a user comes in and has a Google Client ID set, assign them a convert ID, then insert an entry into the database.
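A sketch of grabbing the Client ID with analytics.js once the tracker is ready (the backend endpoint here is hypothetical):

// Runs once the analytics.js tracker has loaded:
ga(function(tracker) {
  var clientId = tracker.get('clientId');
  // Send the client ID and landing page to your backend, which generates
  // the convert ID and inserts the conversions row.
  navigator.sendBeacon('/track/landing', JSON.stringify({
    clientid: clientId,
    lp: location.href
  }));
});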

Explanation of fields:

  • clientid: Google Analytics Client ID
  • convert_id: Field you generate to associate to a session
  • ip: the user's IP address – to be passed later to Google Analytics for the conversion and matched to a newly joined player
  • ua: the user's User Agent – to be passed later to Google Analytics for the conversion
  • date: the current time when seen
  • lp: the URL the player first landed on – so you know which page they visited
  • converted: whether this record has triggered a conversion or not

So, once you have their IP and GA Client ID logged in the database from that first page load… you've got them 🙂 If they join, you will know what campaign they came from.

Monitoring Conversions

Now that you have them in the database, you need a monitoring system that checks your game server. This is the trickiest part of pulling this off, as how you do it will really vary based on your server's data infrastructure.

We have a users database table that we can easily query to find who has played in the past few hours and has not triggered a conversion event. So we run a query, get each user's IP address, and then look for matching conversions-table entries on the same IP address that are not yet marked as converted, as sketched below.
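The exact query depends on your schema, but the shape is something like this (the users table, its columns, and the 2-hour window are assumptions):

-- Find unconverted website visits matching the IPs of recent players.
SELECT c.clientid, c.convert_id, c.lp
FROM users u
JOIN conversions c ON c.ip = u.ip AND c.converted = 0
WHERE u.last_played > UNIX_TIMESTAMP() - 7200;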

In the event 2 people join on the same IP from 2 different devices, you will have 2 entries in the conversions table, each matching up to a player, still resulting in 2 conversions. If 2 accounts join but only 1 device hit the website, then only 1 conversion should be marked initially, but the second player is likely to trigger a 2nd conversion event once they hit the website with the new device (which won't match the convert_id you generated).

Now… the actual marking of a conversion when this happens!

Setting up your conversion event

Now you need to configure Google Analytics to have a goal to count the conversions.

Go to Admin > Your Property > Your Primary View > Goals

Add a new Goal using an Event like so:

This step doesn't have to be done first, but it does need to be done before you can view the conversions, and there's no benefit to doing it after rather than before.

Google Analytics Measurement Protocol

The way the Google Analytics tracking system works is extensively documented, and Google even encourages you to do your own backend, server-side reporting of events on behalf of the user!

Read all the documentation: Google Analytics Measurement Protocol.

Now we've got a list of rows from the conversions table of who joined the game server, with their GA Client ID, IP, and URL.

Now your backend monitoring system needs to call out to Google Analytics in exactly the same fashion the client browsers do.
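A sketch of that call as a single Measurement Protocol hit (the property ID and shell variables are placeholders; the parameter names come from the Measurement Protocol documentation):

# Report a conversion event server-side, overriding IP and User Agent
# with the values saved in the conversions row.
curl -s https://www.google-analytics.com/collect \
  --data-urlencode "v=1" \
  --data-urlencode "tid=UA-XXXXXX-1" \
  --data-urlencode "cid=${CLIENT_ID}" \
  --data-urlencode "uip=${USER_IP}" \
  --data-urlencode "ua=${USER_AGENT}" \
  --data-urlencode "t=event" \
  --data-urlencode "ec=campaign" \
  --data-urlencode "ea=joined-server"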

 

I do even further events, like day 5/15/30/60/90 tracking, so I pass in other events for those. To support that type of data, you will have to analyze your data and figure out the best way to compute who should be counted as converted for those cases.

Tagging Every Inbound Link

Now that you have the infrastructure set up, you need to ensure every direct campaign you control is tagged! This means setting utm_* parameters on your inbound links.

Learn more about UTM Parameters

I personally use our emc.gs URL shortener, then create links that redirect to pages with the UTM parameters, so I can hand out "reddit.emc.gs" and it will tag that link with the Reddit tracking parameters. If you want to set up a URL shortener like ours, check out YOURLS.

I have modified it for our needs though, mapping &gac = &utm_campaign, &gas = &utm_source, and &gam = &utm_medium for quick changes of properties, like reddit.emc.gs/?gam=ad-4

I suggest setting up the redirect system on your main website domain, not an alternate domain like we did, because many advertising systems dislike domain mismatches; we had to set up empireminecraft.com/go/foo to redirect to emc.gs/foo, which then redirects back to the target of emc.gs/foo, just to satisfy some of those platforms.

Enjoy having data about your campaigns and who sends you the best traffic 🙂

Bash Colors for Minecraft Shell Scripts

If you ever wanted to write scripts in Node.JS (or any JavaScript runtime) for Minecraft, you may be interested in the ability to convert Minecraft color codes into their Bash color code equivalents.

I have written a few methods (color map credit to mcrcon) that will help with outputting colored text using Minecraft color codes.
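A sketch of the shape of those methods (the ANSI mapping approximates mcrcon's; exact escape codes may differ):

// Map Minecraft color codes to ANSI terminal escapes (approximate).
var colors = {
  '0': '\x1b[30m', '1': '\x1b[34m', '2': '\x1b[32m', '3': '\x1b[36m',
  '4': '\x1b[31m', '5': '\x1b[35m', '6': '\x1b[33m', '7': '\x1b[37m',
  '8': '\x1b[90m', '9': '\x1b[94m', 'a': '\x1b[92m', 'b': '\x1b[96m',
  'c': '\x1b[91m', 'd': '\x1b[95m', 'e': '\x1b[93m', 'f': '\x1b[97m',
  'r': '\x1b[0m'
};

// Wrap text in a Minecraft color code (§ is \u00A7), resetting after it.
function c(code, text) {
  return '\u00A7' + code + text + '\u00A7r';
}

// Replace every §-code with its ANSI escape so the terminal renders it.
function mccolor(str) {
  return str.replace(/\u00A7([0-9a-fr])/g, function(match, code) {
    return colors[code] || '';
  }) + '\x1b[0m';
}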

Usage looks like:

console.log(mccolor(c("c", "red text") + c("a", "green text")));

Async Development – Task Chain – Java Control Flow for Bukkit

Reposting because Google bugged out and ended up dropping this post from its index :/ So I'm making it look new to get it back on Google.

So as any Bukkit developer knows, the API is not thread safe! And to make matters worse, there is no concrete Java Control Flow API in Bukkit.

However, at Empire Minecraft our server depends very heavily on our MySQL database to provide features. Running database queries on the main thread is common but undesired, so proper Java control flow is needed.

Running queries async creates complicated Java control flow issues: you need to run this query… now you need to access the Bukkit API, so you return to sync processing… oh wait, now you need to act again with another database query!

It's easy to avoid all that control flow trouble by running everything sync – but then performance can be hurt.

Node.JS/JavaScript has this problem in great detail, where it turns code into callback hell, so there are plenty of flow control libraries out there like Q, Async, Chainsaw, and more.

Therefore, to avoid Java running into this same callback mess, I wrote an elegant Java flow control system on top of the Bukkit Scheduler.
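As an illustration of the shape this enables (hypothetical method names, sketching the chain style rather than the exact API), a chain can bounce between async and sync steps without nesting callbacks:

// Async DB query first, then back on the main thread for the Bukkit API.
newChain()
    .asyncFirst(() -> database.loadUserData(playerId))
    .syncLast(data -> player.sendMessage("Balance: " + data.getBalance()))
    .execute();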


Productivity – A key skill in development

It's been a while since I posted, but I've recently gotten a few things done that I felt the need to share.

A major asset for a developer is the ability to be productive. Remember, time is money. One thing that always irks me when working with junior developers or designers is watching them perform actions that could be done much quicker with a proper IDE or some practice. So much time is wasted doing repetitive things every single day.

Ever watched someone navigate a folder hierarchy to get to a file they want to open? A good 30 seconds or more can be spent there, especially if that file is on a network mount!

With a proper IDE with good navigation support, the time spent navigating to a desired file can be drastically reduced. I personally own a license for IntelliJ IDEA 14 Professional and use it for my Java projects, and until recently my web projects too. I value efficiency so much that I have now also purchased PhpStorm 8, even though IDEA supports PHP – mainly for things like debugging, but that's another topic.

PhpStorm tailors the experience more toward web development, but most of the key features are still in IDEA.

Take opening a file: an extensive codebase that's been developed over many years is likely nested pretty deep, and reaching a file can take a while. But with IDEA and PhpStorm (and likely other IDEs too; I know Vim does similar), a key combo gives you a nice search box where you can type a class or file name and find the exact file you want.

If that file is on a network mount, in a varying file structure where you can't remember exactly where it resides, how many minutes per ACTION are you wasting opening that file?

Then inspections… the ability to see errors in your code before you even run it. I'm not talking about just syntax errors here. A typo in a variable name? That's not a syntax error, but a good IDE will show you that one variable is unused and another hasn't been defined yet, giving you a sign of a problem before you even leave your editor.

This is just the tip of the iceberg. IDEs are built to save you time, yet so many people feel they are hardcore because they built a website in Notepad++. You are not hardcore; you are wasting time. Use the tools that have been built to make you more productive, and strive to increase your productivity.

I personally recommend JetBrains products, as they have a STRONG focus on productivity (Help -> Productivity Guide) and combine into 1 product many of the small productivity features scattered across other, smaller editors. You have so many goodies to play with to make you more productive. The key point: use an IDE, be productive, save yourself and your company time, and stop trying to show off by claiming you did it all in Notepad. The only people you're impressing are juniors who don't know any better.

If you have a good developer skillset, don’t waste your precious time on inefficiencies in workflow.

Apache Macros – Simplify your config

Using Apache Macros

Many people host small hobby websites, or websites for family members, friends, and clients, on a single server. This leads to quite a lot of repetition of the same Apache site definitions over and over again. Thankfully, the Apache Macros mod solves many of these issues: it lets you create config templates that can be re-used across multiple sections of config, with variables passed in and filled at each use.

Let's get it installed!

sudo apt-get install libapache2-mod-macro
sudo a2enmod macro

Now, you can start using Macros in your site definitions, to replace common configurations.

» Official Documentation for Apache Macros

Examples of Apache Macros
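A sketch reconstructing macros like the ones described below (the bodies are illustrative; your GrantAccess and ForceDomain will vary):

<Macro Domain $domain>
    ServerName $domain
    ServerAlias www.$domain
</Macro>

<Macro Log $domain>
    CustomLog /var/log/apache2/sites/$domain_access.log combined
</Macro>

<Macro GrantAccess $path>
    <Directory $path>
        Require all granted
    </Directory>
</Macro>

<Macro ForceDomain $domain>
    # Redirect the bare domain to the www. form.
    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www\. [NC]
    RewriteRule ^ http://www.$domain%{REQUEST_URI} [R=301,L]
</Macro>

<Macro Site $domain>
    Use Domain $domain
    Use Log $domain
    Use GrantAccess /var/www/$domain
    Use ForceDomain $domain
</Macro>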

Here you can see a pretty simple Domain macro: it sets the ServerName and adds a www. alias. Next, look at the Site macro; in this example it calls the other macros for Log, GrantAccess, and ForceDomain.

To use this, one can simply add this line inside the <VirtualHost>:

Use Site mysite.com

Then accessing mysite.com will redirect to www.mysite.com and log to /var/log/apache2/sites/mysite.com_access.log. Needless to say, that cuts out 99% of the configuration you'd otherwise write for a simple WordPress site you host for a relative.

And since Apache Macros are expanded when the config is loaded, using them has no impact on your server's runtime performance!

For enterprise-grade setups you're likely already using Puppet to get the same benefits and only running 1 product per server anyway, but for those of us kicking it at hobby level, Apache Macros helps quite a bit! Enjoy 🙂

Filtering Spam before Forwarding Email with Postfix/SpamAssassin

One feature many cPanel/shared webhosts have is an option to forward your email to a different address. Very useful if you want to have multiple email addresses but check them all in one place (Gmail) like I do. But if you're like me, you've likely migrated onto your own dedicated server that you manage yourself, and it's likely you're making mistakes with email forwarding and spam filtering!

The problem is that when you receive spam, you are also forwarding that spam to your email provider, which makes them upset with you and tarnishes your server's IP reputation. I did this for years! I always thought Gmail would be smart enough to see the path in the headers and realize the mail was forwarded – but then, thinking about it: why would Gmail trust that those servers actually sent the email and that I didn't just spoof those Received: lines to blame someone else?

When I recently migrated my host, I put a lot more effort into filtering the spam before it even hits Gmail, and learned quite a few things.

Filtering Spam with Postfix

First off: initial connection client checks. These stop a majority of the spammers, and it's so simple!
Add this line to your /etc/postfix/main.cf:

smtpd_client_restrictions =
    permit_mynetworks
    permit_sasl_authenticated
    reject_unauth_destination
    reject_rbl_client zen.spamhaus.org
    reject_rbl_client bl.spamcop.net
    reject_rbl_client cbl.abuseat.org
    reject_unknown_client
    permit

This enforces a number of restrictions on the connecting client, most notably the Spamhaus Zen check, which knocks out a huge share of spammer connections!

Filtering Spam with SpamAssassin

If you haven't already installed SpamAssassin, do so now. There is more to this than I want to put into this post, so follow this site's guide: http://plecko.com.hr/?p=389

His instructions look spot on to me. One key thing I did not do on my setup, and just realized I needed to: enable CRON=1! I've been running with stale SA rules… but his guide covers it!

Next up is this page: http://wiki.apache.org/spamassassin/ImproveAccuracy

One thing it mentions is missing Perl modules that SpamAssassin can try to use. For me, I had to run these commands to get them all installed:

sudo apt-get install libgeoip-dev
sudo cpan Geo::IP Mail::DKIM Encode::Detect DBI IO::Socket::IP Digest::SHA1 Net::Patricia

I don't know what some of them are for, but SpamAssassin is obviously trying to use them, so give them to it!

Passing SPF Checks

Then there is SRS rewriting. One problem with forwarding email is that it makes every forwarded email fail SPF checks, because it looks like your server is sending mail for InsertBigNameDomain.com, which does not authorize you to send mail on its behalf.

SPF is considered a "broken" implementation, and it is preferred that sysadmins use DKIM instead as the way to verify the authenticity of an email; to keep SPF happy, though, you need to rewrite the return path to be your own server's domain instead.

I used this guide: https://www.mind-it.info/forward-postfix-spf-srs/
which summarizes down to:

sudo apt-get install cmake sysv-rc-conf
cd /usr/local/src/
wget https://github.com/roehling/postsrsd/archive/master.zip
unzip master
cd postsrsd-master/
make
sudo make install
sudo postconf -e "sender_canonical_maps = tcp:127.0.0.1:10001"
sudo postconf -e "sender_canonical_classes = envelope_sender"
sudo postconf -e "recipient_canonical_maps = tcp:127.0.0.1:10002"
sudo postconf -e "recipient_canonical_classes = envelope_recipient"
sudo sysv-rc-conf postsrsd on
sudo service postsrsd restart
sudo service postfix reload

Now when you inspect a received email's headers, you will see that the Return-Path is now something like <SRS0+9CLa=52=paypal.com=service@starlis.com>
And your SPF will now pass. (You do have SPF records set for your own domain, right?)
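For reference, an SPF record is just a TXT record on your domain; a minimal example with a placeholder IP:

; Authorize your MX hosts, A record, and one extra sending IP.
example.com.  IN TXT  "v=spf1 mx a ip4:203.0.113.10 ~all"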

Dropping the Spam

Now the final part… getting rid of that spam before it goes to Gmail!

In /etc/postfix/header_checks (you likely will need to create this file), add this simple line:

/^X-Spam-Level: \*{5,}.*/ DISCARD spam

then in /etc/postfix/main.cf:

header_checks = regexp:/etc/postfix/header_checks

This will drop the spam, but you may want to only drop higher-scoring spam. Instead, you could change the 5 to a 7, and then add this to your /etc/spamassassin/local.cf (it might already be there, commented out):

rewrite_header Subject *****SPAM*****

This makes it so that any spam that doesn't get dropped has SPAM prepended to the subject line, which Gmail suggests you do if you do end up forwarding spam to them.

With this approach, low-score (5-6) spam will still be forwarded, but Gmail stays happy because you told them it's spam ahead of time, and 7+ spam won't even be forwarded.

Taking these steps will help you maintain a good mail sending reputation (hopefully I don't have to repair mine too much…). Good luck 🙂

Final note for Gmail users

And one final step if you are using Gmail: ensure EVERY email address that you forward to Gmail is added as a "Send mail as" account. Gmail uses this list to know an address is forwarded, and will be more lenient in its spam rules. I don't know whether other ESPs do this, but Gmail has requested you do this if you forward mail to them.


I am a Senior Software Engineer and entrepreneur. I am an enthusiast and love creating things. In my free time I operate my own side company, Starlis LLC, working in Minecraft.

I enjoy doing things right and learning modern technologies.