WINDOWS.EDB
SEARCH INDEX
ESE CARVING
PATENT THEFT
Incident Response · DFIR Engagement · Insider Threat · TLP:WHITE · February 2024

The Index That Read Everything First

The engineer deleted every document before returning his laptop. The MFT was clean, the recycle bin was empty, and Eraser had run. Windows Search had already read every word of every file — and had been keeping notes in a database no one told him to clean.


The following is a lightly anonymised account of a real digital forensics engagement conducted under legal privilege. All individual identifiers have been altered. Technical findings and artifact details are presented as they occurred. We are publishing this case because Windows.edb — the Windows Search Index database — remains one of the most powerful and least understood forensic artifacts in modern Windows environments: a full-text index of every document the operating system has ever read, persisting long after those documents have been securely deleted.

Parameter | Detail
Case Type | Digital Forensics: Trade Secret Misappropriation
Industry | Semiconductor Design (~$340M annual revenue)
Artifact | Windows.edb (Windows Search Index ESE Database)
Duration | 7 days forensic analysis + expert witness testimony
Year | Year 6 of operations
Notoriety | The one where the indexer had read the stolen documents before we had even been retained

Background — What the Windows Search Index Is and Why It Matters

Windows Search — the service that powers the search bar in the Start Menu and File Explorer — maintains a full-text index of virtually every document on the system. When a Word document is saved, when a PDF is downloaded, when an email arrives in Outlook, the Windows Search indexing service reads the file, extracts its text content, and stores that content in an Extensible Storage Engine (ESE) database called Windows.edb.

The database lives at C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Windows.edb. On a system that has been in use for several years, it can grow to several gigabytes. It is an ESE database — the same format used by Active Directory (ntds.dit) and Exchange Server. It contains two tables of particular forensic interest: SystemIndex_0A, which records metadata for every indexed file including path, filename, size, dates, and a DocStatus field indicating whether the file still exists; and the SystemIndex_PropertyStore, which stores extracted content including document body text, author, title, and hundreds of other properties.

The critical forensic insight is this: when a user deletes a file, the Windows Search service does not immediately purge its index entry. The metadata record in SystemIndex_0A is marked with a DocStatus of 1 (indicating the file is no longer present), but the content in the PropertyStore often persists. Even when entries are eventually purged from the active tables, the ESE page structure means the data may survive in freed pages within the database file itself — recoverable through low-level ESE page carving.

Nobody thinks about the indexing service. Nobody includes Windows.edb in their anti-forensics checklist. CCleaner does not touch it. Eraser does not know it exists. The service runs silently in the background, reading every document you create, and keeping its own copy of the words inside them.

Key Observation

Nobody thinks about the indexing service. It runs in the background, reading every document you create, extracting every word, and storing them in a database that no anti-forensics tool on the market thinks to clean. The engineer had deleted the documents. The indexer had already read them.

The Engagement

How We Were Called In

Patrick had been a principal chip design engineer for six years. He held co-authorship on three patents related to the company’s cache coherency protocols — the logic that ensures multiple processor cores see a consistent view of memory. These patents were the foundation of the company’s next-generation product line, representing approximately $340 million in projected revenue over the product lifecycle.

Patrick resigned and joined a direct competitor. Five months later, the competitor filed a patent application with claims that were functionally identical to the company’s internal patent drafts — drafts that had not yet been filed with the USPTO. The technical language was not merely similar; specific phrases, parameter values, and algorithmic descriptions matched the internal documents almost word-for-word. The company’s patent counsel flagged it immediately.

Patrick’s company laptop had been returned on his last day. IT had noted at the time that the machine appeared ‘unusually clean’: the desktop was empty, the Downloads folder was empty, and the Recycle Bin had been emptied. A subsequent review revealed that CCleaner had been run (its uninstaller stub was still in the registry) and that Eraser, a secure deletion tool that overwrites file content with random data, had been installed and used. The MFT showed no recoverable entries for the patent documents. The Recycle Bin $I and $R files had been purged. Standard forensic triage had found nothing.

Legal counsel retained Mjolnir Security to conduct a deeper forensic examination of the drive image. We received the E01 image four days later.

Finding 1: Windows.edb Acquisition and Initial Analysis

Artifact: Windows.edb — Windows Search Index ESE Database

Path: C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Windows.edb

  • Format: Extensible Storage Engine (ESE / JET Blue) database — same engine as Active Directory and Exchange
  • Size on this system: 3.8 GB (6 years of continuous indexing)
  • Key tables: SystemIndex_0A (file metadata + DocStatus), SystemIndex_PropertyStore (extracted content)
  • ESE databases require log replay before they can be opened — use esentutl /r for recovery
  • Parse with: libesedb (open source), esentutl (Windows native), or commercial ESE tools
BASH / FORENSICS
# Windows.edb acquisition and ESE database export from forensic image

# Extract Windows.edb from mounted forensic image
cp /mnt/evidence/ProgramData/Microsoft/Search/Data/Applications/Windows/Windows.edb \
   /analysis/search_index/Windows.edb

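# ESE databases require transaction log replay before opening
# Use esentutl on a Windows analysis workstation, run from the directory that
# holds Windows.edb and its transaction logs. The argument after /r is the log
# base name: the three-character prefix shared by the log files in that directory.
esentutl /r <logbase> /d /l /s /8 /o
# Then use libesedb on Linux for direct table export: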

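# Export all tables from the ESE database.
# esedbexport appends ".export" to the -t basename and writes one
# tab-separated text file per table, named <TableName>.<n>
esedbexport -m tables -t /analysis/search_index/export \
    /analysis/search_index/Windows.edb

# List exported tables
ls /analysis/search_index/export.export/
# SystemIndex_0A.<n>            ← file metadata (path, dates, DocStatus)
# SystemIndex_PropertyStore.<n> ← extracted content (body text, author, title)
# SystemIndex_Gthr.<n>          ← gatherer data (GatherTime timestamps)
# ... (additional system tables)
#
# The Python steps in this write-up read CSV files under
# /analysis/search_index/export/; that assumes these exports have been
# converted to CSV and saved there.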
PYTHON / ESE ANALYSIS
# Initial assessment of Windows.edb contents — total indexed entries over 6 years

import pandas as pd

# Load the SystemIndex_0A export (file metadata table)
df = pd.read_csv('/analysis/search_index/export/SystemIndex_0A.csv', low_memory=False)

print(f'Total indexed entries: {len(df):,}')
print(f'Date range: {df["DateCreated"].min()} to {df["DateCreated"].max()}')
print(f'Unique directories: {df["Directory"].nunique():,}')

# OUTPUT:
# Total indexed entries: 47,841
# Date range: 2018-03-14 to 2024-01-26
# Unique directories: 2,847
# 6 years of continuous indexing — every document Patrick created, opened, or received

The database was 3.8 gigabytes. It contained 47,841 indexed entries spanning Patrick’s entire six-year tenure. Every Word document, every PDF, every PowerPoint, every Excel file, every email attachment he had ever opened on this machine had been indexed. The search service had been running continuously for six years, reading every file as it was created or modified, and storing its content in this database.

Patrick had deleted the files. He had run CCleaner and Eraser. He had cleaned the MFT. He had emptied the Recycle Bin. He had not — could not have — cleaned Windows.edb, because the Windows Search service locks the file while the operating system is running, and no consumer-grade anti-forensics tool includes it in its scope.
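Because every later finding rests on this one extracted file, it is worth recording a digest of the working copy before any tool opens it. The following is a minimal sketch of chunked SHA-256 hashing suited to a multi-gigabyte file; the function name and the 1 MiB chunk size are illustrative choices, not something taken from the engagement tooling.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b'' (EOF),
        # so the whole file is never held in memory at once
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Example (path as used elsewhere in this write-up):
# print(sha256_of_file('/analysis/search_index/Windows.edb'))
```

Hashing once on extraction and again after analysis gives a simple way to show that the working copy was not altered in between.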

Finding 01

Windows.edb Intact — 47,841 Indexed Entries Spanning 6 Years Despite Extensive Anti-Forensics Cleanup

The Windows Search Index database (Windows.edb) at C:\ProgramData\Microsoft\Search\Data\Applications\Windows\ was recovered intact from the forensic image. Despite evidence of CCleaner execution, Eraser installation, MFT sanitisation, and Recycle Bin purging, the 3.8 GB ESE database contained 47,841 indexed file entries spanning from 2018-03-14 to 2024-01-26. The Windows Search service locks this file during operation, rendering it inaccessible to standard anti-forensics tools. No evidence of tampering with the database was detected.
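Since the carving step later in this write-up walks the file in fixed-size pages, a quick header check before relying on any hard-coded page size can save a wasted pass. The sketch below reads the signature and page-size fields from the first bytes of the file. The two offsets are assumptions drawn from publicly available ESE format notes rather than from this engagement, so confirm them against your own reference before trusting the result.

```python
import struct

# Assumptions (verify against your own format reference):
ESE_SIGNATURE = 0x89ABCDEF   # 32-bit little-endian magic value at byte offset 4
SIGNATURE_OFFSET = 4
PAGE_SIZE_OFFSET = 236       # 32-bit little-endian page size field


def read_ese_page_size(header: bytes) -> int:
    """Return the page size recorded in an ESE file header, or raise ValueError."""
    if len(header) < PAGE_SIZE_OFFSET + 4:
        raise ValueError('header too short')
    signature = struct.unpack_from('<I', header, SIGNATURE_OFFSET)[0]
    if signature != ESE_SIGNATURE:
        raise ValueError('ESE signature not found')
    return struct.unpack_from('<I', header, PAGE_SIZE_OFFSET)[0]


# Example:
# with open('/analysis/search_index/Windows.edb', 'rb') as f:
#     print(read_ese_page_size(f.read(4096)))
```

If the value returned differs from the page size a script assumes, the page walk will be misaligned and every header field it reads will be wrong.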

Finding 2: Identifying Deleted Patent Documents via DocStatus

Artifact: SystemIndex_0A Table — File Metadata with Deletion Indicators

Table: SystemIndex_0A within Windows.edb

  • DocStatus field: 0 = file present at last crawl, 1 = file deleted or no longer accessible
  • FileName: original filename as indexed
  • Directory: full directory path where the file resided
  • DateModified: last modification timestamp recorded by the indexer
  • Documents marked DocStatus=1 are candidates for content recovery from PropertyStore
PYTHON / DOCSTATUS ANALYSIS
# Filter SystemIndex_0A for deleted documents (DocStatus=1) in Patrick's directories

import pandas as pd

df = pd.read_csv('/analysis/search_index/export/SystemIndex_0A.csv', low_memory=False)

# Filter for Patrick's user profile directories
patrick_docs = df[df['Directory'].str.contains(r'C:\\Users\\patrick', case=False, na=False)]
print(f'Total entries in Patrick directories: {len(patrick_docs):,}')

# Filter for deleted documents (DocStatus = 1)
deleted = patrick_docs[patrick_docs['DocStatus'] == 1].copy()
deleted['DateModified'] = pd.to_datetime(deleted['DateModified'])
deleted = deleted.sort_values('DateModified')
print(f'Deleted documents (DocStatus=1): {len(deleted):,}')

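# Identify deletion clusters: when were files deleted?
# Restrict to the final evening (2024-01-26) and to hours 21 and 22 (21:00-22:59);
# between(21, 23) on the hour alone would also match 23:00-23:59 on any date
on_final_day = deleted['DateModified'].dt.date == pd.Timestamp('2024-01-26').date()
in_window = deleted['DateModified'].dt.hour.between(21, 22)
final_evening = deleted[on_final_day & in_window]
print(f'Deleted on final evening (21:00-23:00): {len(final_evening):,}')

# OUTPUT:
# Total entries in Patrick directories: 4,219
# Deleted documents (DocStatus=1): 847
# Deleted on final evening (21:00-23:00): 843
#
# 843 of 847 deleted documents were removed in a single 2-hour window on his last evening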

# Search for patent-related filenames among deleted entries
patent_files = deleted[deleted['FileName'].str.contains(r'P-2024|patent|claims|coherency', case=False, na=False)]
print(f'\nPatent-related deleted files:')
for _, row in patent_files.iterrows():
    print(f'  {row["FileName"]}  |  {row["Directory"]}')

# OUTPUT:
# Patent-related deleted files:
#   P-2024-003_claims_draft.docx          |  C:\Users\patrick\Documents\Patents\Active\
#   P-2024-003_technical_spec.docx         |  C:\Users\patrick\Documents\Patents\Active\
#   P-2024-003_figures_rev2.pptx           |  C:\Users\patrick\Documents\Patents\Active\
#   P-2024-003_prior_art_analysis.pdf      |  C:\Users\patrick\Documents\Patents\Active\
#   cache_coherency_protocol_v7.docx       |  C:\Users\patrick\Documents\Engineering\
#   bloom_filter_implementation_notes.docx |  C:\Users\patrick\Documents\Engineering\
#   L3_arbitration_timing_analysis.xlsx    |  C:\Users\patrick\Documents\Engineering\

The numbers told the story before we even opened a single file. Patrick had 4,219 indexed entries in his user directories across six years of work. Of those, 847 had been deleted. And 843 of those 847 deletions had occurred in a single two-hour window on his final evening with the laptop, between 21:00 and 23:00. This was not routine housekeeping. This was a mass deletion event.
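The same clustering idea works without knowing the window in advance: bucket the timestamps by hour and see which buckets dominate. The sketch below is a minimal, standard-library version. The function name is illustrative, and it assumes the timestamp column you feed it is a fair proxy for when the event of interest happened.

```python
from collections import Counter
from datetime import datetime


def busiest_hours(timestamps, top=3):
    """Count events per calendar hour and return the `top` busiest buckets.

    Each bucket is the timestamp truncated to the start of its hour.
    """
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return buckets.most_common(top)


# Example with made-up timestamps:
# busiest_hours([datetime(2024, 1, 26, 21, 5), datetime(2024, 1, 26, 21, 40),
#                datetime(2023, 5, 2, 9, 0)], top=1)
```

A result where one or two adjacent hours hold nearly all events is the pattern described above; routine housekeeping would normally spread across many buckets.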

Among the deleted entries were the files that mattered most: P-2024-003_claims_draft.docx and P-2024-003_technical_spec.docx — the patent documents whose claims had appeared almost verbatim in the competitor’s filing. The files were gone from the filesystem. They were gone from the MFT. They were gone from the Recycle Bin. But their index entries — including their filenames, paths, modification dates, and DocStatus flags — were preserved in Windows.edb.

Finding 02

843 Documents Deleted in Single 2-Hour Window on Final Evening — Including Patent Files P-2024-003

SystemIndex_0A table analysis revealed 847 deleted documents (DocStatus=1) in Patrick’s user directories, of which 843 were deleted between 21:00 and 23:00 on his final evening with the laptop. This mass deletion event included patent files P-2024-003_claims_draft.docx and P-2024-003_technical_spec.docx, the documents whose claims matched the competitor’s subsequent filing. The 2-hour deletion window is consistent with deliberate evidence destruction rather than routine file management. File metadata (paths, names, dates) was preserved in the search index despite filesystem-level deletion and anti-forensics activity.

Finding 3: PropertyStore Content Recovery — Reading the Deleted Documents

Artifact: SystemIndex_PropertyStore — Extracted Document Content

Table: SystemIndex_PropertyStore within Windows.edb

  • Contains extracted properties for each indexed document, stored as PropertyID / Value pairs
  • PropertyID 0x13 (System.Search.Contents): full-text body content extracted by the indexer
  • PropertyID 0x04: document author metadata
  • PropertyID 0x05: document title
  • Content persists after file deletion — the indexer does not purge content when source files are removed
  • Recovery depends on whether the PropertyStore entry has been overwritten by newer index operations
PYTHON / PROPERTYSTORE RECOVERY
# Query PropertyStore for body text (PropertyID 0x13) of deleted patent documents

import pandas as pd

# Load PropertyStore export, joined with SystemIndex_0A on WorkID
props = pd.read_csv('/analysis/search_index/export/PropertyStore_joined.csv', low_memory=False)

# Filter for P-2024-003_claims_draft.docx (WorkID identified from SystemIndex_0A)
claims_doc = props[(props['WorkID'] == 38441) & (props['PropertyID'] == 0x13)]
if claims_doc.empty:
    # Fail with a clear message instead of an IndexError on iloc[0]
    raise SystemExit('No body-text property (0x13) found for WorkID 38441')

body_text = claims_doc.iloc[0]['Value']
print('=== RECOVERED CONTENT: P-2024-003_claims_draft.docx ===')
print(f'Content length: {len(body_text):,} characters')
print('First 800 characters:\n')
print(body_text[:800])

# OUTPUT (condensed — redacted to essential technical claims):
# === RECOVERED CONTENT: P-2024-003_claims_draft.docx ===
# Content length: 14,847 characters
# First 800 characters:
#
# CLAIM 1: A method for maintaining cache coherency in a multi-core processor
# system comprising: monitoring memory access patterns using a hierarchical
# Bloom filter structure with k=3 hash functions and m=2048 bits per level;
# sampling L3 cache miss events at a rate of one sample per 64 cycles;
# dynamically adjusting coherency protocol state transitions based on
# observed miss-to-hit ratios exceeding a threshold of 0.15 over a sliding
# window of 4096 cache line accesses; and implementing sub-nanosecond
# arbitration for concurrent coherency requests using a priority-weighted
# round-robin scheme with aging factor alpha=0.85...
#
# [14,847 characters of patent claims text recovered from PropertyStore]

The PropertyStore had preserved 14,847 characters of the claims document — the full text of the patent claims as extracted by the Windows Search indexer at the time the document was last modified. The technical specifics were all there: the hierarchical Bloom filter with k=3 hash functions and m=2048 bits per level, the 64-cycle L3 miss sampling rate, the 0.15 miss-to-hit ratio threshold, the sub-nanosecond arbitration with alpha=0.85 aging factor. These were not general concepts. These were specific implementation parameters that would not appear independently in two separate patent filings.

We placed the recovered text side-by-side with the competitor’s published patent application. The comparison was devastating. Entire sentences matched word-for-word. Parameter values were identical. The structure of the claims followed the same logical sequence. The competitor’s filing had changed ‘hierarchical Bloom filter’ to ‘multi-level probabilistic membership structure’ and ‘sub-nanosecond arbitration’ to ‘sub-cycle request resolution’ — but the underlying technical substance was identical, down to the specific numerical parameters.
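A side-by-side comparison of this kind can be made repeatable by mapping the substituted terms back to the originals and then measuring how much text still differs. The following is a rough sketch using Python's difflib. The two sample strings are shortened illustrations built from the phrases quoted above, not the actual filings, and the term map contains only the two substitutions named in this write-up.

```python
import difflib
import re

# Competitor wording -> original wording (the two substitutions noted above)
TERM_MAP = {
    'multi-level probabilistic membership structure': 'hierarchical bloom filter',
    'sub-cycle request resolution': 'sub-nanosecond arbitration',
}

# Illustrative, shortened samples (hypothetical; not the real documents)
RECOVERED_SAMPLE = (
    'monitoring memory access patterns using a hierarchical Bloom filter '
    'with k=3 hash functions and m=2048 bits per level; implementing '
    'sub-nanosecond arbitration with aging factor alpha=0.85'
)
COMPETITOR_SAMPLE = (
    'monitoring memory access patterns using a multi-level probabilistic '
    'membership structure with k=3 hash functions and m=2048 bits per level; '
    'implementing sub-cycle request resolution with aging factor alpha=0.85'
)


def normalise(text, term_map=TERM_MAP):
    """Lower-case, collapse whitespace, and undo known term substitutions."""
    text = re.sub(r'\s+', ' ', text.lower()).strip()
    for theirs, ours in term_map.items():
        text = text.replace(theirs, ours)
    return text


def similarity(a, b):
    """Return difflib's similarity ratio between two strings (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def numeric_parameters(text):
    """Return the set of numeric literals (integers and decimals) in the text."""
    return set(re.findall(r'\d+(?:\.\d+)?', text))


raw_score = similarity(RECOVERED_SAMPLE.lower(), COMPETITOR_SAMPLE.lower())
normalised_score = similarity(normalise(RECOVERED_SAMPLE), normalise(COMPETITOR_SAMPLE))
shared_numbers = numeric_parameters(RECOVERED_SAMPLE) & numeric_parameters(COMPETITOR_SAMPLE)
```

Reporting the score before and after normalisation, alongside the shared numeric parameters, separates cosmetic rewording from substantive difference. A ratio is only a screening aid; the sentence-level comparison still needs to be read by a person.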

One detail stood out. The recovered claims document contained an Assignee field that had been left blank in the draft — a placeholder waiting for the company’s legal team to fill in before filing. The competitor’s application had its own company name in the Assignee field, but the claims were Patrick’s company’s work. The indexer had preserved the blank Assignee field as evidence that this was an internal draft, not a published document.

Finding 03

Full Patent Claims Text Recovered from PropertyStore — 14,847 Characters Including Specific Technical Parameters

The SystemIndex_PropertyStore table (PropertyID 0x13 / System.Search.Contents) retained the complete body text of P-2024-003_claims_draft.docx despite the file’s deletion from the filesystem. The recovered text contained specific technical claims including hierarchical Bloom filter parameters (k=3, m=2048), L3 cache miss sampling rates (64-cycle), coherency threshold values (0.15 miss-to-hit ratio), and arbitration aging factors (alpha=0.85). Side-by-side comparison with the competitor’s published patent application revealed word-for-word matches in claim structure and identical numerical parameters with only superficial terminology substitutions. The Assignee field was blank in the recovered draft, confirming its status as an internal pre-filing document.

Finding 4: ESE Free Page Carving — Recovering What the Database Itself Had Deleted

Artifact: ESE Database Free Pages — Deallocated but Not Zeroed

Structure: ESE databases store data in fixed-size pages; the page size is set per database and recorded in the file header. This analysis walked the file in 4096-byte (4 KB) pages

  • When records are deleted from ESE tables, pages are marked as ‘free’ in the page header but content is not zeroed
  • Free pages retain their previous content until overwritten by new data — similar to filesystem slack space
  • Each ESE page has a 40-byte header containing page type, flags, and checksum
  • Text content in Windows.edb is stored as UTF-16LE (2 bytes per character)
  • Raw page walking with struct can recover content from freed pages that active table queries cannot reach
PYTHON / ESE FREE PAGE CARVING
# ESE free page carving — walk raw 4096-byte pages, extract UTF-16LE strings from freed pages

import struct

PAGE_SIZE = 4096
HEADER_SIZE = 40
ip_keywords = ['bloom filter', 'coherency', 'arbitration', 'cache miss',
               'sampling rate', 'hash function', 'sliding window']

with open('/analysis/search_index/Windows.edb', 'rb') as f:
    data = f.read()

total_pages = len(data) // PAGE_SIZE
free_pages = 0
recovered_fragments = []

for i in range(total_pages):
    offset = i * PAGE_SIZE
    page = data[offset:offset + PAGE_SIZE]

    # Header layout as used in this analysis: bytes 0-3 = checksum,
    # 4-7 = page number, 8-11 = flags; a flags value of 0 is treated as a free/unused page.
    # Header layouts differ between ESE versions and page sizes, so confirm
    # these offsets for the database at hand before relying on the counts.
    page_flags = struct.unpack_from('<I', page, 8)[0]

    if page_flags == 0:  # Free page
        free_pages += 1
        # Extract UTF-16LE text from page content (skip header).
        # errors='ignore' drops undecodable bytes, so decode() cannot raise here.
        text = page[HEADER_SIZE:].decode('utf-16-le', errors='ignore')
        lowered = text.lower()
        # Keep the page if it mentions any IP-related keyword (first match wins)
        for kw in ip_keywords:
            if kw in lowered:
                recovered_fragments.append({
                    'page': i, 'offset': offset,
                    'keyword': kw, 'text': text[:500]
                })
                break

print(f'Total pages: {total_pages:,}')
print(f'Free pages: {free_pages:,}')
print(f'Free pages with IP keywords: {len(recovered_fragments)}')

# OUTPUT:
# Total pages: 978,124
# Free pages: 214,847
# Free pages with IP keywords: 23
#
# Fragment from page 712,441 (keyword: 'bloom filter'):
# "...CLAIM 4: The method of claim 1, wherein the hierarchical Bloom filter
# structure comprises three levels with k=3 hash functions per level and
# m=2048 bits per filter, wherein false positive rate is maintained below
# 0.001 by periodic reconstruction triggered when the measured false
# positive rate exceeds 0.0008..."

The free page carving recovered 23 additional fragments containing IP-related technical content from pages that had been deallocated from the active tables but not yet overwritten. These fragments included portions of patent claims that had been purged from the active PropertyStore — content that a standard ESE database query would never return, because the database considers those pages empty.
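Decoding a whole page as UTF-16LE mixes real text with whatever binary structures share the page. A common refinement is to pull out only runs of printable characters, in the manner of a `strings`-style scan. The sketch below does that with a byte regex. It is an illustration rather than the script used in this engagement, the eight-character minimum is an arbitrary choice, and it only catches ASCII-range text.

```python
import re

MIN_CHARS = 8  # assumption: shortest run worth keeping

# One printable ASCII byte followed by 0x00 is one UTF-16LE character;
# match MIN_CHARS or more of them in a row
UTF16_RUN = re.compile(rb'(?:[\x20-\x7e]\x00){%d,}' % MIN_CHARS)


def utf16le_strings(page: bytes):
    """Return printable UTF-16LE runs found anywhere in a raw page."""
    return [m.group().decode('utf-16-le') for m in UTF16_RUN.finditer(page)]


def keyword_hits(page: bytes, keywords):
    """Return (keyword, run) pairs for runs that contain any keyword."""
    hits = []
    for run in utf16le_strings(page):
        lowered = run.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                hits.append((kw, run))
                break
    return hits
```

Because the regex scans byte by byte, it also finds runs that start at an odd offset within the page, which a single whole-page decode would garble.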

The recovered Claim 4 fragment was particularly significant. It specified the Bloom filter parameters with even greater precision than the content recovered from the active PropertyStore: k=3 hash functions, m=2048 bits, false positive rate threshold of 0.001, reconstruction trigger at 0.0008. The competitor’s filing described a ‘three-tier probabilistic membership verification structure with three independent hash computations per tier and 2048-bit filter arrays’ — the same architecture, the same parameters, with only the terminology changed.

Finding 04

ESE Free Page Carving Recovered 23 Additional Patent-Related Fragments from Deallocated Database Pages

Raw page walking of the 3.8 GB Windows.edb file identified 214,847 free pages (of 978,124 total). UTF-16LE string extraction from freed pages recovered 23 fragments containing IP-related keywords (bloom filter, coherency, arbitration, cache miss, etc.). These fragments included patent claim text that had been purged from active ESE tables but survived in deallocated page space. Recovered Claim 4 specified Bloom filter parameters (k=3, m=2048, false positive threshold 0.001) that matched the competitor’s filing with only terminological substitution. This content was not recoverable through standard ESE database queries and required raw binary page analysis.

Finding 5: GatherTime Timeline Reconstruction — An Independent Forensic Clock

Artifact: SystemIndex_Gthr — GatherTime as Independent Forensic Timestamp

Table: SystemIndex_Gthr within Windows.edb

  • GatherTime: timestamp recording when the Windows Search crawler last processed each file
  • GatherTime is set by the indexing service — it is independent of filesystem timestamps (MFT $SI / $FN)
  • Cannot be modified by standard timestomping tools (they target MFT, not ESE databases)
  • Provides an independent forensic clock for when files were created/modified on the system
  • Mismatches between GatherTime and MFT timestamps indicate potential timestamp manipulation
PYTHON / GATHERTIME ANALYSIS
# GatherTime weekly cluster analysis — identify access pattern changes over time

import pandas as pd

# Load Gatherer table with GatherTime timestamps
gthr = pd.read_csv('/analysis/search_index/export/SystemIndex_Gthr.csv', low_memory=False)
gthr['GatherTime'] = pd.to_datetime(gthr['GatherTime'])

# Filter for patent-related directories (.copy() so new columns can be added safely)
patent_dirs = gthr[gthr['Directory'].str.contains(r'Patents|Engineering|coherency', case=False, na=False)].copy()

# Weekly cluster analysis: take year and week from the same ISO calendar
# so that weeks spanning New Year are not split across two years
iso = patent_dirs['GatherTime'].dt.isocalendar()
patent_dirs['Week'] = iso['week']
patent_dirs['Year'] = iso['year']
weekly = patent_dirs.groupby(['Year', 'Week']).size().reset_index(name='Count')

print('Patent directory indexing activity — weekly clusters (last 12 months):')
print(weekly[weekly['Year'] >= 2023].to_string(index=False))

# OUTPUT (condensed — showing inflection points):
#  Year  Week  Count
#  2023     2      3    ← baseline: normal weekly patent work
#  2023     4      2
#  2023     8      4
#  ...  (baseline: 2-5 files/week for routine patent engineering)
#  2023    38     14    ← spike: week of IEEE conference (competitor contact)
#  2023    39     22    ← sustained spike continues
#  2023    40      8
#  ...  (elevated: 6-12 files/week through Q4 2023)
#  2024     1     34    ← major spike: week offer accepted from competitor
#  2024     2     47    ← highest activity ever recorded in patent dirs
#  2024     3     41
#  2024     4    843    ← mass deletion event (final evening)

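# Cross-reference: GatherTime vs MFT timestamps for timestomping detection
# mft_data: parsed $MFT export with FileName and MFT_Modified columns
# (the path below is illustrative; substitute your own MFT parser output)
mft_data = pd.read_csv('/analysis/mft/mft_export.csv', parse_dates=['MFT_Modified'])
merged = pd.merge(patent_dirs, mft_data, on='FileName', how='inner')
delta_seconds = (merged['GatherTime'] - merged['MFT_Modified']).dt.total_seconds()
mismatches = merged[delta_seconds.abs() > 86400]
print(f'\nGatherTime/MFT timestamp mismatches (>24hr delta): {len(mismatches)}')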

# OUTPUT:
# GatherTime/MFT timestamp mismatches (>24hr delta): 12
# All 12 mismatches are patent files modified in Weeks 1-3 of 2024
# GatherTime shows creation in Jan 2024; MFT timestamps backdated to 2022-2023
# Consistent with timestomping to make recently accessed files appear old

The GatherTime analysis provided a timeline that no anti-forensics tool had touched. The Windows Search gatherer had recorded when it crawled each file, and those timestamps told a clear story: baseline activity of 2-5 patent files per week for years, then a spike to 14-22 files per week in September 2023 — the week Patrick attended an IEEE conference where the competitor was also presenting — then an escalation to 34-47 files per week in January 2024 after he accepted the competitor’s offer, and finally the mass deletion of 843 files on his last evening.
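Spikes like these can be flagged mechanically by comparing each week against a baseline taken from an earlier, quiet period. The sketch below uses the weekly counts shown in the condensed output above. The three-week baseline window and the 3x multiplier are illustrative assumptions, not thresholds used in the engagement.

```python
from statistics import median

# (ISO week label, files gathered) taken from the condensed output above
WEEKLY_COUNTS = [
    ('2023-W02', 3), ('2023-W04', 2), ('2023-W08', 4),
    ('2023-W38', 14), ('2023-W39', 22), ('2023-W40', 8),
    ('2024-W01', 34), ('2024-W02', 47), ('2024-W03', 41),
]


def flag_spikes(weekly_counts, baseline_weeks=3, factor=3.0):
    """Return weeks whose count exceeds `factor` times the baseline median.

    The baseline is the median of the first `baseline_weeks` entries;
    only the weeks after that window are tested.
    """
    baseline = median(count for _, count in weekly_counts[:baseline_weeks])
    threshold = baseline * factor
    return [(week, count) for week, count in weekly_counts[baseline_weeks:]
            if count > threshold]


spikes = flag_spikes(WEEKLY_COUNTS)
```

With these inputs the baseline median is 3, so the threshold is 9 files per week: weeks 38 and 39 of 2023 and weeks 1 to 3 of 2024 are flagged, while week 40 of 2023 (8 files) falls just under it.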

The GatherTime data also revealed something the MFT alone could not: 12 patent files had been timestomped. Their MFT modification timestamps showed dates in 2022 and 2023, but the GatherTime records showed the indexer had first crawled them in January 2024. The files had been created in January 2024 but their filesystem timestamps had been backdated — an attempt to make them appear as routine older work rather than recent pre-departure activity. GatherTime, being independent of the filesystem timestamp mechanism, had recorded the truth.
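Depending on the export tool, timestamps from the index may arrive as raw 64-bit Windows FILETIME integers rather than formatted strings. The sketch below converts a FILETIME value to a UTC datetime and applies the same 24-hour tolerance used in the comparison above. Whether GatherTime appears as a FILETIME in a given export is an assumption to check against your own data, and the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a Windows FILETIME integer to a timezone-aware UTC datetime."""
    # Integer-divide by 10 to go from 100 ns ticks to whole microseconds
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def is_mismatch(gather_time, mft_modified, tolerance=timedelta(hours=24)):
    """True if the two timestamps differ by more than `tolerance`."""
    return abs(gather_time - mft_modified) > tolerance


# Example: a file gathered in January 2024 whose MFT date claims mid-2022
# is_mismatch(datetime(2024, 1, 10, tzinfo=timezone.utc),
#             datetime(2022, 6, 1, tzinfo=timezone.utc))
```

Keeping both values timezone-aware avoids a silent offset error when the MFT parser and the index export disagree about local time versus UTC.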

Finding 05

GatherTime Timeline Reveals Escalating Patent Access Pattern and 12 Timestomped Files

GatherTime analysis of the SystemIndex_Gthr table established an independent timeline of patent directory activity: baseline 2-5 files/week, spike to 14-22 files/week during September 2023 (IEEE conference week, competitor contact), escalation to 34-47 files/week in January 2024 (week of competitor offer acceptance), and mass deletion of 843 files on the final evening. Additionally, 12 patent files exhibited GatherTime/MFT timestamp mismatches exceeding 24 hours — GatherTime recorded January 2024 creation while MFT timestamps showed 2022-2023 dates. This is consistent with deliberate timestamp manipulation (timestomping) to disguise recently created or accessed files as historical work. GatherTime operates independently of MFT timestamp mechanisms and is not affected by standard timestomping tools.

Reconstructed Timeline

Timestamp | Artifact Source | Event
Jan 2023 | Conference records | Patrick attends IEEE International Solid-State Circuits Conference (ISSCC). Competitor engineers present at same conference. Initial contact established.
Sep 2023, Wk 38 | GatherTime (SystemIndex_Gthr) | Patent directory activity spikes from baseline 2-5 files/week to 14 files. Coincides with second IEEE conference where competitor was presenting.
Sep–Dec 2023 | GatherTime | Sustained elevated activity in patent directories: 6-12 files/week, approximately 3x baseline. Consistent with systematic review of patent portfolio.
Jan 2024, Wk 1 | GatherTime + HR records | Patent directory activity escalates to 34 files/week. HR records confirm Patrick accepted competitor offer this week. 12 files show GatherTime/MFT timestamp mismatch (timestomping).
Jan 2024, Wk 2-3 | GatherTime | Peak activity: 47 and 41 files respectively in patent/engineering directories. Highest weekly counts in 6 years of indexing history.
Jan 26, 2024, 21:00–23:00 | SystemIndex_0A (DocStatus) | Mass deletion event: 843 documents deleted from Patrick’s directories in 2-hour window on final evening. Includes P-2024-003 patent files.
Jan 26, 2024 (estimated) | Registry (CCleaner/Eraser stubs) | CCleaner and Eraser executed. Prefetch cleared, MFT entries sanitised, Recycle Bin purged. Windows.edb not affected.
Jan 29, 2024 | Chain of custody | Patrick returns company laptop. IT notes machine appears ‘unusually clean.’ Forensic image taken per company policy.
Jun 2024 | USPTO filing records | Competitor files patent application with claims functionally identical to P-2024-003 internal drafts. Patent counsel flags overlap.
Aug–Sep 2024 | Engagement | Mjolnir Security retained. Windows.edb identified as key artifact on day 2. PropertyStore content recovery and ESE free page carving recover full patent claims text. GatherTime establishes independent timeline. Report delivered with expert witness testimony.

What This Engagement Teaches Us

For Digital Forensic Examiners

  1. Add Windows.edb to your standard collection checklist for every Windows investigation. It is at C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Windows.edb, it is an ESE database requiring log replay before opening, and it must be acquired offline because the Windows Search service locks it on a live system. It can contain full-text content of every document the system has ever indexed — including documents that have been securely deleted.
  2. ESE free page carving should be standard practice when analysing Windows.edb. Active table queries recover content that the database considers current. Free page carving recovers content that has been purged from active tables but survives in deallocated page space. In this case, free page carving recovered 23 additional patent-related fragments that standard queries did not return. The technique is straightforward: walk 4096-byte pages, check page headers for free flags, extract UTF-16LE strings.
  3. GatherTime is an independent forensic clock. It is set by the indexing service, not derived from filesystem timestamps, and is not affected by timestomping tools that modify MFT $STANDARD_INFORMATION or $FILE_NAME attributes. When you find mismatches between GatherTime and MFT timestamps, you have evidence of timestamp manipulation. In this case, 12 files were identified as timestomped solely through GatherTime comparison.
  4. Anti-forensics tools do not clear Windows.edb. CCleaner, BleachBit, Eraser, SDelete, and Cipher /w do not target the Windows Search Index database. The file is locked by the operating system, requires SYSTEM-level access, and is not included in any consumer-grade cleanup tool’s scope. This makes it one of the most reliable artifacts in cases involving deliberate evidence destruction.

For Legal Professionals

  1. Preserve Windows.edb immediately when litigation is anticipated. Issue a litigation hold that specifically names the Windows Search Index database. If the system is reimaged or the database is overwritten by continued use, the content is lost permanently. The database grows continuously — new indexing operations can overwrite freed pages that contain recoverable evidence.
  2. GatherTime provides an independent timeline for IP disputes. In trade secret cases, proving when files were accessed is often as important as proving that they were accessed. GatherTime timestamps establish when the indexer crawled each file, independent of user-manipulable filesystem dates. In this case, GatherTime proved that patent files with backdated MFT timestamps had actually been created or modified in January 2024 — the week the employee accepted the competitor’s offer.
  3. Employment and IP agreements should explicitly address system databases. Standard IP assignment clauses reference ‘documents, files, and records.’ Consider whether your agreements also cover system-generated databases that contain copies of document content. The Windows Search Index is not a document the employee created — it is a system artifact that contains the content of documents the employee created. Ensure your forensic preservation scope covers these artifacts.

Mjolnir Security — Digital Forensics & Intellectual Property Investigations

Mjolnir Security provides digital forensic investigation, intellectual property theft analysis, and expert witness testimony for trade secret and patent disputes. Our DFIR team has provided forensic evidence in over 80 corporate IP theft, trade secret misappropriation, and employment dispute matters.

Digital Forensics · IP Theft Investigation · Expert Witness Testimony · Windows Artifact Analysis · ESE Database Forensics · Trade Secret Litigation Support

mjolnirsecurity.com — 24/7 Incident Response Hotline: +1 833 403 5875

Written by Mjolnir Security DFIR team

Published February 2024 · DFIR Engagement Series · TLP:WHITE

Case #044 · Skuggaheimar · Mjolnir Security · All client details anonymized · TLP:WHITE