Debugging a memory leak on Windows 10

For a while now, my Windows PC has been suddenly running out of memory (OOM) after 3 weeks of usage. Whenever I inspected Task Manager, I found no clue as to why the system would go OOM when plenty of memory should still have been available. Today I caught the system “red-handed” again, and this time the data in Task Manager does not add up.

96% memory usage, even though the per-process memory figures do not add up to that.

RamMap result – the memory is occupied by the Nonpaged Pool. After reading some material online, this seems to point to a memory leak.

PoolMon diagnostic, locked on to the suspect – dxgmms1.sys. (Its Bytes value kept increasing and was never released.)

Command used: .\poolmon.exe -p -d /g "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\triage\pooltag.txt"

(Note: You could skip the installation of Visual Studio and only download the WDK if you only need this tool – https://learn.microsoft.com/en-us/windows-hardware/drivers/other-wdk-downloads#step-2-install-the-wdk)
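The "bytes keep growing and are never released" check above can also be done on recorded numbers rather than by watching the screen. The following is only a minimal sketch: it assumes the per-tag byte counts have already been copied by hand into dictionaries at a few points in time. The function name `find_leak_suspects`, the tag names, and all numbers are hypothetical and are not real PoolMon output.

```python
def find_leak_suspects(snapshots):
    """Return (tag, growth_in_bytes) pairs for tags whose byte count never
    drops between consecutive snapshots and grows overall, largest first."""
    if len(snapshots) < 2:
        return []
    # Only consider tags that appear in every snapshot.
    tags = set(snapshots[0])
    for snap in snapshots[1:]:
        tags &= set(snap)
    suspects = []
    for tag in tags:
        series = [snap[tag] for snap in snapshots]
        never_released = all(b >= a for a, b in zip(series, series[1:]))
        growth = series[-1] - series[0]
        if never_released and growth > 0:
            suspects.append((tag, growth))
    return sorted(suspects, key=lambda item: item[1], reverse=True)


# Hypothetical pool tag byte counts noted down at three points in time.
snapshots = [
    {"TagA": 1_000_000, "TagB": 500_000, "TagC": 200_000},
    {"TagA": 2_000_000, "TagB": 450_000, "TagC": 200_000},
    {"TagA": 3_000_000, "TagB": 520_000, "TagC": 200_000},
]
suspects = find_leak_suspects(snapshots)
```

In this made-up data only `TagA` qualifies: `TagB` shrinks at one point and `TagC` never grows. A tag that only ever grows is a hint, not proof, since a legitimately busy driver can look the same over a short window.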

I checked the file: it is a legitimate, Microsoft-signed, DirectX-related driver. Unfortunately, I do not have a good way to narrow down a possible mitigation any further. For now, I have updated my graphics card driver and upgraded Windows 10 to the latest 22H2, and will see whether anything improves.

Ref:
https://blog.csdn.net/weixin_40188600/article/details/82853017
https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/finding-a-memory-leak

Following the takeover of the LTT channel, it is time to revisit session cookies

Google article: Phishing campaign targets YouTube creators with cookie theft malware

With 2-factor authentication becoming more and more mainstream, attackers are turning back to an old method: tricking people into downloading malware that steals the cookies stored on their local computer, then using those session cookies to log in to their active accounts.
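Why a stolen cookie sidesteps the second factor can be shown with a toy model. This is a minimal sketch, not how any real site implements sessions: the `SessionStore` class and its method names are hypothetical, and the two boolean arguments stand in for real password and 2FA checks.

```python
import secrets


class SessionStore:
    """Toy session store: both factors are checked once, at login only."""

    def __init__(self):
        self._sessions = {}  # session token -> user name

    def login(self, user, password_ok, second_factor_ok):
        # A session token is issued only if both checks pass.
        if not (password_ok and second_factor_ok):
            return None
        token = secrets.token_hex(16)
        self._sessions[token] = user
        return token

    def user_for(self, token):
        # Later requests are authenticated by the token alone.
        return self._sessions.get(token)


store = SessionStore()
cookie = store.login("creator", password_ok=True, second_factor_ok=True)

# Whoever holds a copy of the cookie is treated as the logged-in user.
stolen_copy = cookie
attacker_sees = store.user_for(stolen_copy)
```

In this model `attacker_sees` is the victim's user name, even though the attacker never passed either check. The second factor protects the login step, not the session that follows it.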

It seems that capable security software on a Windows system is still a must, and that is not changing any time soon.

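Awaiting the incident report: doubts about Alibaba Cloud's multi-zone availability

Alibaba Cloud outage notice: https://www.alibabacloud.com/zh/notice/repair1218

Generally speaking, the failure of a single availability zone within a region is nothing to fear, yet the consequences of this Alibaba Cloud Hong Kong Zone C incident looked more like a region-wide failure. The console was basically unusable, and my own ECS resources in Zone C had still not recovered after 10+ hours. Several large customers that reported outages should have had multi-zone setups, yet they could not restore service either. This really calls Alibaba Cloud's multi-availability-zone design into question.

Update: post-incident review report – https://www.alibabacloud.com/zh/notice/066572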

Just a very good video to learn about the foundations of Cloud Spanner:

https://www.youtube.com/watch?v=QPpSzxs_8bc

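Quick note

For the first time in a while, I looked at the server's monitoring graphs and found the connection count abnormally high (relative to the site's traffic). Puzzled, I opened netstat / tcpdump and was greeted by a wall of SYN_RECV.

It was not enough to amount to a SYN flood, so I simply banned the source IP range and called it a day. (The traffic then switched to coming from a bunch of AWS IP ranges. Well, well…)

New deployment: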

iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --update --seconds {value} --hitcount {value} --name "syn-fw" -j DROP
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --set --name "syn-fw"
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --update --seconds {value} --hitcount {value} --name "syn-fw" -j LOG --log-prefix "[syn-fw] " --log-level 4
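To get a feel for how `--seconds` and `--hitcount` interact, the first two rules can be modelled in a few lines of Python. This is a simplified sketch, not the kernel's implementation: it ignores the LOG rule and the module's cap on stored timestamps, and the class name `RecentModel`, the window of 10 seconds, the hit count of 3, and the source address are all made-up values (the rules above leave `{value}` unspecified).

```python
from collections import defaultdict


class RecentModel:
    """Simplified model of the DROP and --set rules for one list name."""

    def __init__(self, seconds, hitcount):
        self.seconds = seconds
        self.hitcount = hitcount
        self.stamps = defaultdict(list)  # source IP -> timestamps seen

    def new_connection(self, src, now):
        stamps = self.stamps[src]
        # Count the packets from this source that fall inside the window.
        hits = sum(1 for t in stamps if now - t <= self.seconds)
        if hits >= self.hitcount:
            # Rule 1 (--update ... -j DROP): a match also refreshes the
            # entry, so a source that keeps sending stays blocked.
            stamps.append(now)
            return "DROP"
        # Rule 2 (--set): record this packet and let it continue.
        stamps.append(now)
        return "PASS"


model = RecentModel(seconds=10, hitcount=3)
verdicts = [model.new_connection("198.51.100.9", t) for t in (0, 1, 2, 3, 20)]
```

With these numbers the first three new connections pass, the fourth is dropped, and the one arriving at t=20 passes again because every earlier timestamp has aged out of the window.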


Refs:
https://serverfault.com/a/1033162
http://www.snowman.net/projects/ipt_recent/

Bonus: Fail2ban filter

[INCLUDES]
before = common.conf

[Definition]
_daemon = kernel
failregex = ^%(__prefix_line)s\[syn-fw\].*SRC=<HOST> DST=.*$
ignoreregex =
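What the `failregex` picks out of a log line can be tried quickly in Python. This is only a rough sketch: the pattern below is a simplified, IPv4-only stand-in for Fail2ban's `<HOST>` and `__prefix_line` expansions, the function name `extract_source` is hypothetical, and the sample line is an illustrative, shortened example using documentation addresses rather than a real log entry.

```python
import re

# Simplified stand-in for the failregex: find the [syn-fw] prefix and
# capture the IPv4 address that follows SRC=.
SYN_FW = re.compile(r"\[syn-fw\].*SRC=(?P<host>\d{1,3}(?:\.\d{1,3}){3}) DST=")


def extract_source(line):
    """Return the source address of a [syn-fw] log line, or None."""
    match = SYN_FW.search(line)
    return match.group("host") if match else None


# Illustrative line only, shortened from the usual iptables LOG layout.
sample = ("kernel: [syn-fw] IN=eth0 OUT= SRC=203.0.113.7 "
          "DST=198.51.100.2 PROTO=TCP DPT=80 SYN")
source = extract_source(sample)
```

For testing the actual filter against real logs, Fail2ban's own `fail2ban-regex` tool is the better choice, since it applies the real `<HOST>` expansion.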

The new era

# nginx -V
nginx version: nginx/1.21.3
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
built with OpenSSL 3.0.0 7 sep 2021
TLS SNI support enabled
configure arguments: --with-openssl=.../openssl-3.0.0 --with-openssl-opt='enable-ec_nistp_64_gcc_128 enable-tls1_3'

Took such a long time to compile though… 😅