Archive for the ‘server’ Category

Webserver switched from lighttpd to nginx

Saturday, December 5th, 2009

For some time now, I have been experiencing weird bugs with my webserver. I was running lighttpd, version 1.4.22. Two annoying bugs occurred:

The first was more of a nuisance: uploading files through a POST resulted in error 417 "Expectation Failed" on the first attempt (while the second attempt worked).

The second bug resulted in a reproducible denial of service, as it crashed the webserver. It occurred whenever a Firefox browser attempted to connect to the server through HTTPS.

I had hoped these bugs would be fixed by upgrading from 1.4.22 to 1.4.23 and then 1.4.24, but that did not happen. As I needed a solution, I decided to switch from lighttpd to nginx.

The transition went surprisingly smoothly. The configuration is a bit more complicated than it was with lighttpd, but it is easily set up and quite well explained on nginx's website, too.

Now nginx runs all my websites and interacts with PHP through FastCGI, and the problems seem to be gone.

New Server - Update (5): ROC - my personal royal PitA

Wednesday, December 2nd, 2009

Yeah, I guess it was a bit naive, but I tried it again. After successfully using the harddiscs in the Compaq DL360 without the ROC-Chip, at full speed and fully write-cached, I thought I would re-insert the ROC-Chip and give it another try. I configured the harddiscs to run as a single RAID-1 drive...

...and yeah, you have probably guessed it already: it failed. The ROC-Chip seems to detect the on-disc write-cache and disable it, apparently hardwired. While installing the NetBSD operating system, the performance went down like hell, and I got furious.

The only solution seems to be a 64-bit PCI SmartArray-Controller-Card, which needs to be connected via a special internal SCSI-cable to the mainboard in order to bypass the onboard controller. I am currently trying to find it on the internet, but eBay was no help so far. Maybe I have found a reseller who can help; I have yet to contact them. I hope they can provide such a cable, as the hardware-RAID seems to be a lost cause otherwise on that box, and I really don't want to create a new RAIDFrame RAID-1 on a system with only two harddrives total. Especially not when the machine has (theoretical) hardware-RAID capability, as the DL360 does.

Well, we'll see how it will turn out.

New Server - Update (4): Harddisks, SmartArray, and ROC

Wednesday, May 6th, 2009

In several postings in January, I wrote about a new server, which should replace the current server "unknown". I wrote about serious problems delaying that exchange, and these problems lay in the harddisk-performance.

As I wrote, the "new" server is a Compaq DL-360 (yes, that is the complete model-name; it is the first generation of the DL-360). It came with two 18.2GB harddisks, and had wonderfully fast performance. I replaced both harddisks with compaq-certified 72GB UW-320 harddisks, and installed NetBSD 5.0 on it. While unpacking pkgsrc.tar.bz2, I wondered why the machine took so long: over six minutes for a tarball of about 40MB. Two minutes would have been more appropriate. At that time, I had no further time for experimenting and analyzing, but last Sunday I took up the matter again. I plugged the 72GB harddisks into Ymir, reconfigured the Mylex DAC960PD-Controller to accept the new disk-pair as a RAID-1 drive, and ran

# tar -xjf pkgsrc.tar.bz2

It finished after 1 minute 55 seconds, and that on a PPro-200. The DL-360 has a Pentium III 1266 and three times as much RAM.

After this result, I examined the DL-360 again, searched the internet for information, and started a discussion on IRCnet-channel #netbsd. The advice given most often was "that sounds like a write-back-cache-problem". And it was - or better, is.

What had confused me all the time was the fact that the machine claimed to have a SmartArray-Controller. Well, it has, but only a stripped-down one: it lacks a write-back-cache. The integrated SmartArray-Controller is called "ROC" in the manual, which is short for "RAID-On-a-Chip". It is an addon-card, which I have removed now. Unless I can find a SmartArray-Controller-Card (yes, I know, they are sold on eBay), and more importantly the cable which connects the extra SmartArray-Controller with the scsi-backplane to override the ROC-Controller, I will install NetBSD on the machine in a RAIDFrame RAID-1 again. In the end, a Software-RAID is better than no RAID at all, and RAIDFrame is really great. Unknown has been running it since I first set up the machine, which is several years ago now.

Back online

Saturday, April 18th, 2009

After one and a half days of downtime, the system is back online, running happily on NetBSD 5.0_RC3. With it, the blogs and all the other services hosted and provided by this machine are of course available again.
I will provide a detailed report of the upgrade-process later, and the page about this machine will be updated later as well.

Server-outage within the next week

Sunday, April 12th, 2009

Hello everyone, with this post I'd like to inform you about a planned outage of this server within the next few days, and apologize for it. The cause is an OS-upgrade from NetBSD 3.1 to NetBSD 5.0 RC3.

The exact date of the outage is not fixed yet, but it can be expected to happen on Thursday. It is hard to say how long it will take, because I have to upgrade the server's operating system, and as with all such major system-upgrades, every piece of installed software has to be recompiled against the newly installed libs. Though I will of course prioritize and compile the most needed applications such as lighttpd, mysql and php first, it will still take some time to be done.

Please be patient, as I will be back online, and stay tuned. :-)

Nidhoegg: Updating a Security Host to NetBSD 5.0 RC3

Monday, April 6th, 2009

My network at home is secured against attacks from outside (and inside as well) by a strict set of ACLs within my Cisco C836 Router, and a SUN Ultra-5 serving as a proxy-host for several protocols. Until recently, this machine was running a NetBSD 3.0.1 Userland with a self-compiled 3.1RC2 Kernel. Its pkgsrc-installed applications were just as outdated, and so I decided to upgrade the whole box.

Usually, when computers engaged in securing your networks are to be upgraded, one goes for stability and security, and definitely not for the newest OS-Version. In this case though, because "RC" in NetBSD does not mean "development" (that would be "-current"), and because NetBSD 5.0 has certain features NetBSD 4.0.1 does not have, i.e. wapbl, I decided to really go and try NetBSD 5.0RC3. Following is a report about how the upgrade-process worked out...  :)

The Upgrade

Installing NetBSD 5.0RC3

I downloaded the bootable ISO-image from a French mirror-server, because either I was too blind, or the German mirrors currently do not have NetBSD-daily-images. After burning the image onto a CD, making some final backups, and sending a small prayer to /dev/null, the almighty Computer-God, I rebooted the machine. Starting from CD on a SPARC-box is usually as easy as it gets. After System-Initialization (a.k.a. POST on PCs), simply press Stop-A, and at the OK-Prompt enter

OK> boot cdrom

This time, however, the system told me that there was a problem reading the device. I remembered having had problems with the concurrent use of CD-ROM and Harddisk, and opened the Ultra-5. As I remembered, the CD-ROM-Drive was unplugged. I reattached it, and voila, now it worked.

So I ran the installer and partitioned my 8GB Harddisk into a single partition using FFSv2 with the option "log" enabled. This option was introduced by the wapbl-driver donated by Wasabi Systems Inc., and significantly improved by the NetBSD-developers. It provides journalling on top of FFSv2, and improves the overall performance of the filesystem as well. What I was ignorant of at this point in the process was the fact that the sparc64-port of NetBSD does not boot from FFSv2, because it has some issues with the different superblocks of FFSv2. Because I did not recognize FFSv2 as the culprit, I tried several things before finally using FFSv1 for the root-partition ("/"), and splitting up the partitions like this:

Filesystem        Size  Mounted on
/dev/wd0a         1.0G  /
/dev/wd0d         2.4G  /var
/dev/wd0g         2.9G  /usr
/dev/wd0e         482M  /home
mfs:165            62M  /tmp
/dev/wd1d          28G  /data

All partitions (except /tmp and / ) are using FFSv2/wapbl (option log). The drive wd1 was added later, after the installation process was finished and the CD-ROM drive was unplugged again. Its partition wd1d is using FFSv2 with wapbl, too, of course. :)

I added the log-option to the root-partition using FFSv1, too, just as a test. mount(8) tells me it is using the option, though I can hardly imagine that. wapbl(4) states this about FFSv1 and the log-option:

WAPBL requires the super block to be in the UFS2 format. Older FFSv1 file systems will need to be updated to the newer super block layout with the -c option to fsck_ffs(8).

Since I definitely did not update the superblocks (remember the booting-issue), and the machine definitely is booting now, I don't trust mount(8)'s output in this regard anymore, and have changed the option for the root-partition from log to softdep.

Note: log and softdep are mutually exclusive, and in RC3 combining them leads to a kernel-trap, though the kernel claims to ignore softdep. I have already submitted a problem report regarding this issue (kern/41161).

Configuring and Installing Packages

Now that the installation was completed, I started to download the pkgsrc-current tarball, again from the French server, since the German servers do not seem to mirror it either. After untarring pkgsrc into /data, I set some CFLAGS, since I like to squeeze out every bit of performance available, as long as it does not break the system. My CFLAGS on the Ultra-5 look like this:

CFLAGS=-mcpu=ultrasparc -mtune=ultrasparc -m64 -mvis -O3 \
       -funroll-loops -fomit-frame-pointer

Of course the CXXFLAGS are equal to CFLAGS. :)

Now I bootstrapped pkgsrc (I know that on NetBSD this is not necessary, but I do it out of habit, and because the mk.conf is automatically generated and the tools are compiled with the above CFLAGS as well. It's a matter of taste, and according to some people probably of waste (of time) as well, but I don't mind :) ). In parallel to the bootstrapping, I configured the DHCP-, NTP- and SSH-server that came with NetBSD-base and launched them, so that those were already up and running again.

After that was done, I started installing the services I required:

  • misc/screen
  • net/bind9 - name-server
  • net/dante - socks-proxy
  • www/squid30 - webproxy
  • www/adzap - advertisement-zapper for squid
  • net/ra-rtsp-proxy - real-audio rtsp-proxy

With screen installed, I left the machine to itself and moved to the more comfortable living-room and my laptop. I ssh'ed to nidhoegg, launched screen, and continued from there with installing the apps in the list above. ntpd and named9 are running chrooted, btw.

I mentioned www/adzap in the list above, a squid-redirector I came to love over the last years while using squid 2.x. Sadly, it is incompatible with squid 3.0, apparently because the way squid talks to its redirectors has changed significantly. Maybe, if I find the time, I will look into this and patch it so that it works with squid 3.0, too. Currently, adzap is not in use anymore, because the adzap-processes were dying so fast that browsing was impossible.

But before I could use the squid at all, I had to build my own NetBSD-kernel. Why, you might ask. The answer is simple: because I use diskd as storage-method, and diskd requires the following options set in the kernel, which aren't in GENERIC:

options         SYSVMSG         # System V-like message queues
options         SYSVSEM         # System V-like semaphores
options         SYSVSHM         # System V-like memory sharing

options         SHMMAXPGS=8192
options         MSGMNB=16384
options         MSGSSZ=64
options         MSGTQL=512

Ok, since NetBSD 4.99.35 this can be set through sysctl(8) and the kern.ipc.* variables, but I was not aware of that before writing this post. And besides, it was a good test / experiment to cross-compile a SPARC64-Kernel on an x86-System (I used my primergy-monster to compile the kernel). The kernel was built in about half an hour. Here is the report:

===> Summary of results:
         build.sh command: ./build.sh -m sparc64 -u -N 2 -U   \
                -T obj.sparc64/tooldir -D obj.sparc64/destdir \
                -R obj.sparc64/releasedir -V OBJMACHINE=1 -j8 \
                kernel=NIDHOEGG
         build.sh started: Sun Apr  5 18:16:21 CEST 2009
         NetBSD version:  5.0_RC3
         MACHINE:         sparc64
         MACHINE_ARCH:    sparc64
         Build platform:  NetBSD 4.0 i386
         HOST_SH:         /bin/sh
         TOOLDIR path:    /usr/build/src/obj.sparc64/tooldir
         DESTDIR path:    /usr/build/src/obj.sparc64/destdir
         RELEASEDIR path: /usr/build/src/obj.sparc64/releasedir
         makewrapper:     /usr/build/src/obj.sparc64/tooldir/
                                           bin/nbmake-sparc64
         Updated /usr/build/src/obj.sparc64/tooldir/bin/
                                          nbmake-sparc64
         Building kernel without building new tools
         Building kernel:  NIDHOEGG
         Build directory:  /usr/build/src/sys/arch/sparc64/
                               compile/obj.sparc64/NIDHOEGG
         Kernels built from NIDHOEGG:
          /usr/build/src/sys/arch/sparc64/compile/
                       obj.sparc64/NIDHOEGG/netbsd
         build.sh ended:   Sun Apr  5 18:42:30 CEST 2009
===> .
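As a quick check of the "about half an hour" figure, the two timestamps in the report above can be subtracted. This is only a small sketch; the variable names are mine, and the time zone is dropped because both stamps are CEST.

```python
from datetime import datetime

# Timestamps copied from the "build.sh started" and "build.sh ended"
# lines of the report (both CEST, so the zone is left out).
started = datetime(2009, 4, 5, 18, 16, 21)
ended = datetime(2009, 4, 5, 18, 42, 30)

elapsed = ended - started
minutes, seconds = divmod(int(elapsed.total_seconds()), 60)
print(f"kernel build took {minutes}m {seconds}s")  # kernel build took 26m 9s
```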

Of course the building of the toolchain is not included in this report; it covers solely the kernel. I forgot to log how long that took, but from experience with earlier NetBSD 5.0 builds (though for the x86-platform), I can say that it usually takes about an hour.

The compilation went through without incident, and the kernel booted like a charm.

Squid was started and after the adzap-issue was identified and the adzap-call was removed from squid's configuration, all worked fine.

Conclusion

All in all it was a smooth process. The small problems that came up were quickly solved, some of them were caused by me being tired and inattentive, others because I had to learn that things do not always work as I wished them to (e.g. booting from FFSv2 and squid-3.0/adzap).

Since Sunday, nidhoegg has been running stable and fast, and yes, while copying files between disks, I really do think that wapbl's performance-improvements are noticeable. I copied several large files, and though I did not measure the time, it felt way faster than on FFSv1, which was used on nidhoegg while running NetBSD 3.1RC2.

For those interested in the dmesg-output, read nidhoegg's page.

Hardening Sendmail - supplement

Tuesday, March 31st, 2009

In my last post I wrote about hardening sendmail against DDoS-Attacks. As someone has pointed out to me, I have missed an important option:

define(`confMAX_DAEMON_CHILDREN', `count')dnl

This option defines the maximum number of sendmail-processes allowed before sendmail starts rejecting incoming connections with a temporary error.

count should be chosen with great care. I recommend checking the average number of concurrent sendmail-processes on a "standard" day, and tripling that number. This way, you ensure that even in peak-times you will have a high enough limit, but in case of real trouble the number of processes won't explode, and the machine will remain operable.

Example: If you have an average of 20 concurrent sendmail-processes, set count to 60.  I would never recommend a value below 30, though.
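The rule of thumb above (triple the average, but never go below 30) can be written down as a tiny helper. This is just a sketch of my own recommendation, not anything sendmail provides; the function name and the `floor` parameter are made up for illustration.

```python
def suggest_max_daemon_children(average_processes: int, floor: int = 30) -> int:
    """Return a value for confMAX_DAEMON_CHILDREN.

    Triples the observed average of concurrent sendmail processes,
    but never returns less than the floor (30 in the text above).
    """
    return max(floor, 3 * average_processes)


print(suggest_max_daemon_children(20))  # 60, as in the example above
print(suggest_max_daemon_children(5))   # 30, the floor applies
```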

Hardening Sendmail against DDoS

Sunday, March 29th, 2009

For some time now, I had been experiencing strange behavior from my server: from time to time, without an ascertainable pattern, the server would stop reacting to network-requests. The teamspeak-server, which runs on it, would kick anyone connected to it, and nothing particularly special could be found in the logs. When this happened last Thursday, and I was kicked out of the teamspeak-server myself, I tried to ssh onto my server - which took about 30 seconds. This was irritating, and I ran "uptime" to check the server's load - it was way beyond 70. The next thing was a call to top, and here I saw the culprit: sendmail. A call to ps verified that sendmail was running with way too many processes, all in "RCPT TO:" state or something similar. I stopped sendmail and killed the remaining processes manually, so that I could work again in real time. Looking through maillog, I began to understand what was going on: spammers were DDoSing my mail-server. Though I already had some settings in my sendmail.mc that would make the server unattractive for spammers, they were obviously not sufficient, and especially not against DDoS-attacks. So I changed my configuration a bit.

Connection Controlling

FEATURE(`access_db')dnl
FEATURE(`delay_checks', `friend')dnl
FEATURE(`ratecontrol', `nodelay',`terminate')dnl
FEATURE(`conncontrol', `nodelay',`terminate')dnl

After the already existing line "FEATURE(`access_db')dnl" I added the lines to enable rate and connection-controlling.

The option "nodelay" is important, because I am using the delay-checks-feature, and these checks are not to be delayed.

The "terminate"-option tells sendmail to kill all connections exceeding the later defined limits with a temporary error-message. The properly configured and standard-compliant smtp-client will try again later, spammers usually don't.

The rate-control feature enables control over how often a single host is allowed to connect within a defined window. It was introduced in sendmail version 8.13.0, and uses the access.db for defining the limits for single hosts, groups of hosts or all hosts.

The window-size is defined using this option:

define(`confCONNECTION_RATE_WINDOW_SIZE',`window')dnl

with a default-value for window of "60s" (60 seconds).

My access.db-entries for the rate-control-feature look like this:

ClientRate:localhost                    0
ClientRate:localhost.localdomain        0
ClientRate:127.0.0.1                    0
ClientRate:                             5

The first three lines tell sendmail to ignore rate-limits for the localhost, and the last line imposes a limit of 5 connections per window for all other hosts.

In case of a DDoS, it might not be sufficient to limit the connections of a single host per minute, because a DDoS comes from multiple hosts at the same time. This is why Sendmail comes with another option:
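To illustrate the idea behind these entries, here is a rough sketch of a per-host limit over a time window. It is not how sendmail implements the feature internally; the class name, the data structures and the choice not to count refused connections are my assumptions, and only the limits and the 60-second window are taken from the text.

```python
from collections import defaultdict, deque

# Limits mirroring the access.db entries above: 0 means "no limit",
# and the empty key is the default for all other hosts.
CLIENT_RATE = {"localhost": 0, "localhost.localdomain": 0, "127.0.0.1": 0, "": 5}
WINDOW_SECONDS = 60  # the default window size


class RateControl:
    """Hypothetical per-host connection counter over a sliding window."""

    def __init__(self, limits, window):
        self.limits = limits
        self.window = window
        self.history = defaultdict(deque)  # host -> times of accepted connections

    def allow(self, host, now):
        limit = self.limits.get(host, self.limits[""])
        if limit == 0:
            return True  # rate-limiting disabled for this host
        recent = self.history[host]
        # Forget connections that have dropped out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= limit:
            return False  # over the limit: the connection would be refused
        recent.append(now)
        return True


rc = RateControl(CLIENT_RATE, WINDOW_SECONDS)
# Six connections from one (made-up) host within six seconds.
print([rc.allow("mail.example.org", t) for t in range(6)])
```

In this sketch the first five connections are accepted and the sixth is refused, while localhost is never limited.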

define(`confCONNECTION_RATE_THROTTLE', `5')dnl

This defines the overall number of new connections the server accepts per second, regardless of the host, before it starts delaying further incoming connection-requests. The connections will not be rejected but stalled until the next second. For the above example this means that when 20 connection-requests arrive at once, the first five (1-5) are processed in second one, the second five (6-10) in second two, the third five (11-15) in second three, and the final five (16-20) in second four.
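The worked example can be reproduced with a few lines of code. This is only a model of the arithmetic, assuming all requests arrive at the same moment; the function name is made up.

```python
def schedule(requests: int, per_second: int) -> dict[int, range]:
    """Map each second to the 1-based request numbers processed in it."""
    slots = {}
    for second, first in enumerate(range(1, requests + 1, per_second), start=1):
        last = min(first + per_second - 1, requests)
        slots[second] = range(first, last + 1)
    return slots


# 20 simultaneous requests with a throttle of 5 per second.
for second, reqs in schedule(20, 5).items():
    print(f"second {second}: requests {reqs[0]}-{reqs[-1]}")
```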

The conn-control feature enables control over the number of concurrent connections a single host is allowed to run simultaneously. Like rate-control, this feature was introduced with Sendmail 8.13, and the access.db is used to define settings for single-hosts, host-ranges and "all hosts", too.

My access.db-entries for the conn-control-feature look like this:

ClientConn:localhost                    0
ClientConn:localhost.localdomain        0
ClientConn:127.0.0.1                    0
ClientConn:                             3

The entries are read similarly to those for rate-control. The last line defines a default of 3 concurrent connections; the first three disable the feature for localhost.

Greeting Pause

A common technique of spammers, trojans and viruses is the so-called slamming. The SMTP-Standard requires the client to wait with the HELO/EHLO-Command until the server has sent its greeting line. Slamming is to ignore this, and to start sending immediately.

FEATURE(`greet_pause', `2000')dnl

With the above feature, Sendmail can be configured to delay the sending of this greeting. The value is in milliseconds, so in the example above, the greeting-pause would be two seconds. A client issuing the HELO/EHLO during this pause will cause Sendmail to answer with

554 smtp.nifelhei.info not accepting messages

and the greeting will not be sent. Sendmail will log such attempts with a message like

rejecting messages from <host> due to pre-greeting traffic.

and terminate the connection.

You can use the access.db again to define host-specific greeting-pause times, or to exclude certain hosts from the pause. The following example would exclude localhost from the delay. You can use the same mechanism to whitelist smtp-servers that do slamming but are otherwise "friendly".

GreetPause:localhost                    0
GreetPause:localhost.localdomain        0
GreetPause:127.0.0.1                    0

Please note:

RFC 2821 specifies 5 minutes as the maximum timeout for the initial connection greeting. Therefore, if you specify a time longer than 300000 milliseconds (i.e. 5 minutes), sendmail will not wait longer than 5 minutes, to maintain RFC compliance.
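The decision described in this section can be summed up in a short sketch. It only models the logic as I understand it and is not taken from sendmail's source; the function names and the idea of measuring when the client first sends data are assumptions for illustration.

```python
RFC_MAX_GREETING_MS = 300_000  # 5 minutes, the limit mentioned in the note above


def effective_pause_ms(configured_ms: int) -> int:
    """The pause actually applied: never longer than 5 minutes."""
    return min(configured_ms, RFC_MAX_GREETING_MS)


def verdict(first_client_data_ms: int, configured_ms: int) -> str:
    """Decide what happens to a client.

    first_client_data_ms is the time after connecting at which the
    client first sent anything (hypothetical measurement).
    """
    if first_client_data_ms < effective_pause_ms(configured_ms):
        return "reject"  # pre-greeting traffic: the client was slamming
    return "greet"       # the client waited for the greeting


print(verdict(50, 2000))          # reject
print(verdict(2500, 2000))        # greet
print(effective_pause_ms(600_000))  # 300000
```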

Recipient-Controlling

After setting up the controlling mechanisms for incoming connections, there is another level of control that can be applied. Many spammers try to send a single mail with hundreds of recipients. This is also known as "recipient flooding". Sendmail can be configured to limit the number of recipients a message may have, as well as to throttle all those clients that get more recipients rejected than a certain threshold, by pausing a hardcoded full second after each further recipient. The options are as follows:

define(`confBAD_RCPT_THROTTLE', `2')dnl
define(`confMAX_RCPTS_PER_MESSAGE', `25')dnl

BAD_RCPT_THROTTLE sets the threshold which invokes the one-second-delay: once that many recipients in a single SMTP transaction have been rejected, sendmail sleeps for one second after each subsequent RCPT TO: in that transaction. For the example above this means that after two rejected recipients, every further RCPT TO: is answered only after a pause of one full second.

MAX_RCPTS_PER_MESSAGE limits the absolute maximum number of recipients for each message to the value given (25 for the above example). Every RCPT TO: exceeding this number will be rejected with an appropriate message. The standard-compliant server will collect the rejected RCPT TOs and requeue the message for all yet outstanding recipients. (Yes, spammers won't.)
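What this means for a well-behaved sending server can be sketched as follows: the recipient list is effectively delivered in batches no larger than the limit. The function name and the example addresses are made up; the limit of 25 is the value from the example above.

```python
def batches(recipients: list[str], max_per_message: int = 25) -> list[list[str]]:
    """Split a recipient list into per-transaction batches.

    Models a compliant sender that requeues the recipients which were
    rejected for exceeding the limit and retries them later.
    """
    return [
        recipients[i:i + max_per_message]
        for i in range(0, len(recipients), max_per_message)
    ]


# 60 hypothetical recipients end up in three transactions: 25, 25 and 10.
rcpts = [f"user{n}@example.org" for n in range(60)]
print([len(b) for b in batches(rcpts)])  # [25, 25, 10]
```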

Timeouts

Sendmail, in order to get as many mails through as possible, has very generous timeout-defaults. These values are often measured in days, where today seconds or minutes would suffice. Long timeouts mean that resources stay bound for a long time by connections that are probably unsolicited. I have defined much shorter values for several timeouts:

define(`confTO_INITIAL', `30s')dnl
define(`confTO_CONNECT', `30s')dnl
define(`confTO_ACONNECT', `1m')dnl
define(`confTO_ICONNECT', `30s')dnl
define(`confTO_HELO', `30s')dnl
define(`confTO_MAIL', `30s')dnl
define(`confTO_RCPT', `30s')dnl
define(`confTO_DATAINIT', `1m')dnl
define(`confTO_DATABLOCK', `1m')dnl
define(`confTO_DATAFINAL', `1m')dnl
define(`confTO_RSET', `30s')dnl
define(`confTO_QUIT', `30s')dnl
define(`confTO_MISC', `30s')dnl
define(`confTO_COMMAND', `30s')dnl
define(`confTO_CONTROL', `30s')dnl
define(`confTO_LHLO', `30s')dnl
define(`confTO_AUTH', `30s')dnl
define(`confTO_STARTTLS', `30s')dnl

I won't go into much detail about each timeout, because that would be beyond the scope of this posting, but these values are much more reasonable than the defaults.

Other means of protecting your server against spammers:

TCPWrappers

Besides everything sendmail can be configured to do and not to do, sendmail has another advantage: It can be compiled to use TCP Wrapper.

While scanning the logs for the causes of the astronomical load, I noticed millions of attempts from hosts of dial-in providers, which usually strongly indicates private hosts infected by spam-bots.

I have added these networks to my /etc/hosts.deny file, with the effect that the number of connections to the server was reduced almost immediately. While one might question the wisdom of blocking whole networks, think about this: why would a private dial-in host need to run its own smtp-server that connects to your smtp-server? Usually, a private person can use the mail-server of his/her provider, and that one won't be blocked, because I am blocking the dial-in-subnets specifically.

Here is the current (March 29, 2009) list of blocked networks:

sendmail: .adsl.alicedsl.de
sendmail: .tukw.qwest.net
sendmail: .internetdsl.tpnet.pl
sendmail: .dynamicIP.rima-tde.net
sendmail: .staticIP.rima-tde.net
sendmail: .home.otenet.gr
sendmail: .pppoe.mtu-net.ru
sendmail: .static.link.com.eg
sendmail: .adsl-1.sezampro.yu
sendmail: .speedy.telkom.net.id
sendmail: .pool.ukrtel.net
sendmail: .taiwanmobile.net
sendmail: .veloxzone.com.br
sendmail: .bielskpodlaski.mm.pl
sendmail: .bb-static.vsnl.net.in
sendmail: .dynamic.163data.com.cn
sendmail: .vsnl.net.in
sendmail: .adsl.tpnet.pl
sendmail: .airtelbroadband.in
sendmail: .ip.adsl.hu
sendmail: .tktelekom.pl
sendmail: .radiocom.ro
sendmail: .static.asianet.co.th
sendmail: .static.versatel.nl
sendmail: .dsl.telesp.net.br
sendmail: .cable.telstraclear.net
sendmail: .bb.netvision.net.il
sendmail: .ip.fastwebnet.it
sendmail: .pppoe.avangarddsl.ru
sendmail: .adsl.proxad.net
sendmail: .adsl.sta.mcn.ru
sendmail: .adsl.paltel.net
sendmail: .iam.net.ma
sendmail: .mobile.playmobile.pl
sendmail: .broadband3.iol.cz
sendmail: .business.telecomitalia.it
sendmail: .sonora.tx.cebridge.net
sendmail: .3g.claro.net.br
sendmail: .wi.res.rr.com
sendmail: .mtnl.net.in
sendmail: .static.gvt.net.br
sendmail: .dynamic.orange.es
sendmail: .ttnet.net.tr
sendmail: .ip.cybergrota.com.pl
sendmail: .static.user.ono.com
sendmail: .dsl.brasiltelecom.net.br
sendmail: .bk21-dsl.surnet.cl
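The entries in the list above all start with a dot. According to hosts_access(5), such a pattern matches any host name whose last components equal the pattern. The following sketch shows that matching rule only; it is not TCP Wrapper's code, the function name and the example host names are made up, and comparing in lower case is my assumption.

```python
def matches(hostname: str, pattern: str) -> bool:
    """Match a host name against a hosts.deny-style name pattern."""
    hostname = hostname.lower()  # assumption: compare case-insensitively
    pattern = pattern.lower()
    if pattern.startswith("."):
        # Leading dot: the end of the host name must equal the pattern.
        return hostname.endswith(pattern)
    return hostname == pattern


blocked = [".adsl.alicedsl.de", ".dynamicIP.rima-tde.net"]

for host in ("host-1-2-3-4.adsl.alicedsl.de", "mail.example.org"):
    denied = any(matches(host, p) for p in blocked)
    print(host, "denied" if denied else "allowed")
```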

Conclusion

After changing the configuration using the possibilities described above, the load of the server decreased enormously, and there are far fewer sendmail-processes running at the same time now, thus binding far fewer resources. DDoS-spam-attacks are still not impossible, but they will have a harder time bringing the machine down now. :)

New Server - Update (3)

Tuesday, January 6th, 2009

Work on the new server has been postponed until further notice. As it turned out, the 72GB harddisks seem to be a bit broken, at least judging by the performance they show. Until I have a solution for this problem (most likely two new harddisks, which is a rather expensive solution), this project has come to a halt.

New Server - Update (2)

Sunday, January 4th, 2009

Finally it works! The problem lay in the compaq's dual-processor-capability. Because of it, the machine requires NetBSD's GENERIC_MP-Kernel, even when only one CPU is installed. Though this seems strange at first glance, it's quite logical, because the MP-Kernel contains drivers for hardware related to the multi-processor-environment, which is always present, independent of how many processors are actually installed. The standard GENERIC-Kernel, of course, does not support such components, and thus it froze.

Now, pkgsrc is bootstrapped and the packages (sendmail, lighttpd, etc.) are being installed.