commit 4c6e3623d8e1dafccb08c7ea80e1a1fbcb158fc8
parent f92512849b9e2f42d5846baa868aad52f3c12dd6
Author: Andrew Laack <andrew@laack.co>
Date: Tue, 2 Jun 2026 10:35:17 -0500
WIP blog post; tracking searches over time
Diffstat:
4 files changed, 51 insertions(+), 70 deletions(-)
diff --git a/posts/wip/captchas.md b/posts/wip/captchas.md
@@ -1,52 +0,0 @@
-# CAPTCHAs
-
-## Background
-
-I use VPNs most of the time despite concerns their usage may limit my fourth amendment rights [1]. One annoyance of using VPNs is being hit with CAPTCHAs. This isn't an issue when I use my Searxng instance because it doesn't have CAPTCHAs, but all modern, freely available, search engines do.
-
-## Methodology
-
-EDIT: TODO - I used librewolf instead of mullvad browser because of some issues with multi-search engine opening
-
-Each query was sent while connected to a U.S. based ProtonVPN exit node. While the exit nodes changed over time, the exit node was consistent across each search engine on the basis of a given query. To achieve this, I used the Multi engine search Firefox extension [2]. Additionally, I used my browser of choice, that supports Javascript, the Mullvad browser [3] to perform these evaluations. The only changes to the browser were adding the multi-search extension, removing the mullvad browser extension, and adding each of the search engines I was interested in testing as search engines in the browser settings. Finally, I created a "New Identity" after every 5 searches to see evaluate how often CAPTCHAs are shown when users have already completed them within the same "session". I also rotated my IP address every 5 searches by using a different random US-based protonvpn server. I considered resetting after every search, or using my browser naturally, but thought this would be a more consistent way to evalutate CAPTCHA rates.
-
-Alongside tracking CAPTCHA hit rates, I also tracked how long it took to pass through the CAPTCHAs, and the search index of the result I was looking for. This index was the first result that contained a satisfactory answer to my query. I also tracked slop count, which was the count of the top 5 results that were AI slop / SEO spam sites, based on my subjective definitions of both. Since I am using Mullvad browser, which comes with uBlock origin out of the box, advertisements were never (obviously and statedly) displayed in search results.
-
-## Limitations
-
-A few limitations are listed below:
-
-- Mullvad browser
- - I could be treated differently by each of the browsers on this basis
-- Multi-search queries
- - It is possible data is being shared between search engines on the backend, resulting in some search engines showing CAPTCHAs based on searches in other search engines that match the current search. This seems slightly unlikely, but Duckduckgo does primarily use Bing on the backend, so it's possible.
-
-## Description of My Search Habits
-
-All searches are tracked in a md file [4???], but at a high-level, I was working, reading, and other general usage things. I don't think it's fair to say these are my normal habits because there was a non-zero amount of friction added by marking down this data as I used the web, but I didn't conciously make any changes to how I search the web.
-
-## Selected Search Engines
-
-- Google
-- Startpage
-- Brave Search
-- noai.duckduckgo.com
-- Bing
-- Ecosia
-- Qwant
-- Mojeek
-- Yahoo
-
-## Unused Search Engines
-
-- Kagi
- - Kagi lacks a privacy respecting payment method. As such, this is a non-starter for me. If there was a browser that accepted crypto, and had accounts similar to Mullvad, I would consider using it.
-- Searxng
- - Searxng is what I like to use, but it does have some drawbacks. Specifically, if you aren't sharing your Searxng instance with other people, the IP address of yours server will get tied to your identity for tracking, reducing many of the privacy benefits associated with using a VPN. Additionally, since my Searxng instance is hosted on Hetzner, it frequently returns no results due to all upstreams replying with CAPTCHAs. While an interesting concept, I find it breaks down in practice.
-- Perplexity
- - I'll write about this.
-
-[1] - https://www.wired.com/story/using-a-vpn-may-subject-you-to-nsa-spying/
-[2] - https://addons.mozilla.org/en-US/firefox/addon/multi-engine-search/
-[3] - https://mullvad.net/en/browser/
-[4] - TODO
diff --git a/posts/wip/the-big-three-privacy-search-engines.md b/posts/wip/the-big-three-privacy-search-engines.md
@@ -0,0 +1,17 @@
+# The Big Three (Privacy) Search Engines
+
+The big three privacy search engines are Brave Search, DuckDuckGo, and Startpage. They are the most popular privacy search engines because they generally return sensible results, have privacy policies that aren't anti-consumer, and make technical decisions to minimize tracking that can be verified by inspecting the requests a browser makes to them.
+
+## Indices
+
+Brave Search uses its own index.
+
+DuckDuckGo primarily uses Bing.
+
+Startpage primarily uses Google.
+
+## Quality Comparison
+
+## IP Rotation and Browser Fingerprint
+
+Every day I switched exit nodes using the ProtonVPN CLI. I also cleared all cached data from LibreWolf at the end of each day.
diff --git a/python/search-engines/query.py b/python/search-engines/query.py
@@ -6,15 +6,9 @@ import time
# list of search engines
class Engine(Enum):
- GOOGLE = 'google'
STARTPAGE = 'startpage'
BRAVE_SEARCH = 'brave_search'
DDG = 'ddg'
- BING = 'bing'
- ECOSIA = 'ecosia'
- QWANT = 'qwant'
- MOJEEK = 'mojeeek'
- YAHOO = 'yahoo'
class Query:
query_message: str
@@ -25,6 +19,7 @@ class Query:
pow_captcha: bool
answer_index: int
slop_sites_top_5: int
+ unrelated_sites: int
def create_query():
@@ -39,19 +34,14 @@ def create_query():
def ensure_csv():
if not os.path.exists('search.csv') or os.path.getsize('search.csv') == 0:
with open('search.csv', 'w', newline='') as f:
- f.write('query,engine,time,captcha_hit,captcha_time,pow_captcha,slop_sites_top_5,answer_index\n')
+ f.write('query,engine,time,captcha_hit,captcha_time,pow_captcha,slop_sites_top_5,answer_index,unrelated_sites\n')
def write_query(query):
f = open('search.csv', 'a')
csvwriter = csv.writer(f)
- csvwriter.writerow([query.query_message, query.engine.value, query.time, query.captcha_hit, query.captcha_time, query.pow_captcha, query.slop_sites_top_5, query.answer_index])
+ csvwriter.writerow([query.query_message, query.engine.value, query.time, query.captcha_hit, query.captcha_time, query.pow_captcha, query.slop_sites_top_5, query.answer_index, query.unrelated_sites])
f.close()
-def get_csv_row_count():
- f = open('search.csv', 'r')
-
- return len(f.readlines()) - 1
-
ensure_csv()
query = create_query()
@@ -64,7 +54,7 @@ for engine in Engine:
if captcha == 'y':
query.captcha_time = float(input("How many seconds did it take to solve (-1 means failed to solve): "))
query.captcha_hit = True
- query.pow_captcha = input("Was the captcha a PoW patcha? (y/n): ") == "y"
+ query.pow_captcha = input("Was the captcha a PoW captcha? (y/n): ") == "y"
else:
query.pow_captcha = False
query.captcha_hit = False
@@ -72,9 +62,7 @@ for engine in Engine:
query.slop_sites_top_5 = int(input("Slop sites / SEO site count in the top 5: "))
query.answer_index = int(input("First index containing the answer (-1 means no answer on first page of results): "))
+ query.unrelated_sites = int(input("Unrelated site count: "))
+ print()
write_query(query)
-
-
- if get_csv_row_count() % 5 == 0:
- print("Rotate IP and clear fingerprint.")
diff --git a/python/search-engines/search.csv b/python/search-engines/search.csv
@@ -0,0 +1,28 @@
+query,engine,time,captcha_hit,captcha_time,pow_captcha,slop_sites_top_5,answer_index,unrelated_sites
+"fim language models ""open weight"" code completion",startpage,1780355824.6081214,False,0,False,0,2
+"fim language models ""open weight"" code completion",brave_search,1780355824.6081214,False,0,False,0,-1
+"fim language models ""open weight"" code completion",ddg,1780355824.6081214,False,0,False,0,-1
+Granite4.0 code completion model,startpage,1780356056.7660637,False,0,False,0,1
+Granite4.0 code completion model,brave_search,1780356056.7660637,False,0,False,0,1
+Granite4.0 code completion model,ddg,1780356056.7660637,False,0,False,0,1
+mellum fim vs qwen2.5-coder-3b code completion model benchmarks,startpage,1780356578.8307145,False,0,False,0,2
+mellum fim vs qwen2.5-coder-3b code completion model benchmarks,brave_search,1780356578.8307145,False,0,False,1,5
+mellum fim vs qwen2.5-coder-3b code completion model benchmarks,ddg,1780356578.8307145,False,0,False,2,-1
+supermaven / cursor models vs mellum,startpage,1780357180.0816693,False,0,False,1,-1
+supermaven / cursor models vs mellum,brave_search,1780357180.0816693,False,0,False,2,-1
+supermaven / cursor models vs mellum,ddg,1780357180.0816693,False,0,False,4,-1
+code completion models economic impact,startpage,1780357606.6343193,False,0,False,1,1
+code completion models economic impact,brave_search,1780357606.6343193,False,0,False,0,-1
+code completion models economic impact,ddg,1780357606.6343193,False,0,False,2,-1
+tagging markdown files,startpage,1780367477.1498914,False,0,False,1,1
+tagging markdown files,brave_search,1780367477.1498914,False,0,False,2,1
+tagging markdown files,ddg,1780367477.1498914,False,0,False,1,3
+mediawiki markdown syntax,startpage,1780367771.9564865,False,0,False,0,1
+mediawiki markdown syntax,brave_search,1780367771.9564865,False,0,False,0,1
+mediawiki markdown syntax,ddg,1780367771.9564865,False,0,False,0,1
+vimwiki syntax,startpage,1780367936.4018636,False,0,False,0,1
+vimwiki syntax,brave_search,1780367936.4018636,False,0,False,0,1
+vimwiki syntax,ddg,1780367936.4018636,False,0,False,0,1
+wiki.vim,startpage,1780368340.5770388,False,0,False,0,1,2
+wiki.vim,brave_search,1780368340.5770388,False,0,False,0,1,1
+wiki.vim,ddg,1780368340.5770388,False,0,False,1,1,4