Security Evaluation of Password Generation, Storage, and Autofill in Browser-Based Password Managers

A comprehensive security analysis of 13 popular password managers, evaluating password generation randomness, storage security, and autofill vulnerabilities.

1. Introduction

Password-based authentication remains the dominant method for web authentication despite its well-documented security challenges. Users face a cognitive burden in creating and remembering strong, unique passwords, leading to password reuse and weak credential creation. Password managers promise to alleviate this burden by generating, storing, and autofilling passwords. However, their security has been questioned in prior research. This paper presents an updated, comprehensive security evaluation of thirteen popular browser-based password managers, five years after significant vulnerabilities were last reported. The study covers the entire password manager lifecycle: generation, storage, and autofill.

2. Methodology & Scope

The evaluation encompassed thirteen password managers, including five browser extensions (e.g., LastPass, 1Password), six built-in browser managers (e.g., Chrome, Firefox), and two desktop clients for comparison. The methodology involved:

  • Generating and analyzing a corpus of 147 million passwords for randomness and strength.
  • Replicating and extending prior evaluations of password storage security.
  • Testing autofill mechanisms for vulnerabilities like clickjacking and XSS.
  • Assessing default security settings and encryption practices.

3. Password Generation Analysis

The study presents the first comprehensive analysis of the password generation algorithms used in password managers.

3.1. Character Distribution & Randomness

Analysis of the 147-million-password corpus revealed several instances of non-random character distributions in generated passwords. Some managers exhibited biases in character selection, deviating from a uniform random distribution. For a truly random generator, the probability of selecting any character from a set of size $N$ should be $P(char) = \frac{1}{N}$. Deviations from this indicate algorithmic flaws.
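
One standard way to test the $P(char) = \frac{1}{N}$ property on a large corpus is a Pearson chi-square goodness-of-fit test over character counts. The Python sketch below is illustrative, not the study's own procedure: it assumes a 94-symbol printable-ASCII alphabet, and the helper names (`chi_square_uniformity`, `approx_p_value`) are hypothetical. To stay within the standard library, it approximates the p-value with the Wilson-Hilferty transformation rather than an exact chi-square distribution.

```python
import math
import string
from collections import Counter

# Assumed alphabet: the 94 printable ASCII symbols (letters, digits, punctuation).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def chi_square_uniformity(passwords, alphabet=ALPHABET):
    """Pearson chi-square statistic of observed character counts against the
    uniform expectation P(char) = 1/N. Characters outside `alphabet` are ignored.
    Returns (statistic, degrees_of_freedom)."""
    counts = Counter(ch for pw in passwords for ch in pw if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no characters from the alphabet in the corpus")
    expected = total / len(alphabet)  # uniform expectation per symbol
    stat = sum((counts[ch] - expected) ** 2 / expected for ch in alphabet)
    return stat, len(alphabet) - 1


def approx_p_value(stat, df):
    """Upper-tail chi-square p-value via the Wilson-Hilferty normal approximation
    (a stdlib-only stand-in for an exact chi-square survival function)."""
    z = ((stat / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return 0.5 * math.erfc(z / math.sqrt(2))
```

A very small p-value on a large corpus means some characters are over- or under-represented. Generators that guarantee at least one character from each class deviate somewhat by design, so the test flags a bias without identifying its cause.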

3.2. Vulnerability to Guessing Attacks

The most critical finding was that a subset of generated passwords was vulnerable to guessing attacks:

  • Online Guessing: Passwords shorter than 10 characters were found to be weak against online, rate-limited attacks.
  • Offline Guessing: Passwords shorter than 18 characters were susceptible to offline cracking attempts following a database breach, where an attacker can make unlimited guesses.

This contradicts the core promise of password managers to create strong passwords.

4. Password Storage Security

While improvements were noted compared to evaluations from five years prior, significant issues persist.

4.1. Encryption & Metadata Handling

Several password managers were found to store metadata in unencrypted form. This includes website URLs, usernames, and timestamps. While the password itself might be encrypted, this metadata provides a valuable map for attackers, revealing a user's online accounts and habits, which can be used for targeted phishing or social engineering attacks.
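
Whether metadata sits in plaintext can be checked directly on a manager's local database. The sketch below assumes the store is an SQLite file and uses simple regular expressions to flag values that look like URLs or email addresses. The function name and the `logins` table in the demo are hypothetical, not any specific manager's schema. A hit means the value is readable without the master key.

```python
import re
import sqlite3

# Heuristic patterns for metadata that should not be readable without the master key.
URL_RE = re.compile(r"https?://[^\s/]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def find_plaintext_metadata(conn: sqlite3.Connection) -> list:
    """Return (table, column, value) for every text value that looks like a URL or email."""
    findings = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier safely
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        columns = [desc[0] for desc in cursor.description]
        for row in cursor:
            for column, value in zip(columns, row):
                # Encrypted fields are typically BLOBs or opaque text and will not match.
                if isinstance(value, str) and (URL_RE.search(value) or EMAIL_RE.search(value)):
                    findings.append((table, column, value))
    return findings


if __name__ == "__main__":
    # Demo on a hypothetical in-memory schema.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE logins (origin_url TEXT, username TEXT, password_blob BLOB)")
    demo.execute("INSERT INTO logins VALUES (?, ?, ?)",
                 ("https://bank.example", "alice@example.com", b"\x8f\x02\x11"))
    for hit in find_plaintext_metadata(demo):
        print(hit)
```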

4.2. Insecure Defaults

Certain managers had insecure default settings, such as enabling autofill on all sites by default or using weaker encryption protocols. This places the security burden on users to discover and change these settings, which most do not do.

5. Autofill Mechanism Vulnerabilities

The autofill feature, designed for convenience, introduces a significant attack surface.

5.1. Clickjacking & UI Redressing

Multiple password managers were vulnerable to clickjacking attacks. An attacker could create a malicious webpage with invisible layers that trick a user into clicking on the password manager's autofill dialog, thereby revealing credentials to the attacker's site instead of the intended legitimate site.

5.2. Cross-Site Scripting (XSS) Risks

Autofill mechanisms that inject credentials into web page forms without rigorous origin checks can be exploited via XSS vulnerabilities on otherwise trusted sites. If a benign site has an XSS flaw, an injected script could trigger the password manager to fill credentials into a hidden form field controlled by the attacker.
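
The difference between a rigorous origin check and a loose one is easy to show. The Python sketch below, whose function names are hypothetical, compares a naive string-prefix match with an exact (scheme, host, port) origin comparison, and adds an illustrative fill policy that also checks the form's submission target and requires a user gesture. It assumes `form_action_url` has already been resolved to an absolute URL. Origin checks alone do not stop a script injected into the legitimate origin; requiring a user gesture prevents such a script from silently triggering a fill, though it cannot protect credentials the user chooses to fill on a compromised page.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) origin of an absolute URL."""
    parts = urlsplit(url)  # lowercases the scheme; .hostname is lowercased too
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def naive_match(saved_url: str, page_url: str) -> bool:
    """Flawed check: any URL that starts with the saved string counts as a match."""
    return page_url.startswith(saved_url)


def strict_match(saved_url: str, page_url: str) -> bool:
    """Exact origin comparison: scheme, host, and port must all be equal."""
    return origin(saved_url) == origin(page_url)


def should_autofill(saved_url: str, page_url: str,
                    form_action_url: str, user_gesture: bool) -> bool:
    """Hypothetical fill policy: the page and the form's target must share the
    saved origin, and the user must have explicitly requested the fill."""
    return (user_gesture
            and strict_match(saved_url, page_url)
            and strict_match(saved_url, form_action_url))
```

With the lookalike domain from the framework below, `naive_match("https://example.com", "https://example.com.evil.net/login")` is true while `strict_match` rejects it.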

6. Results & Comparative Analysis

  • Corpus size: 147M passwords analyzed
  • Managers tested: 13 (browser and desktop)
  • Critical flaw: generated passwords shorter than 18 characters are vulnerable to offline cracking

Key Finding: The landscape has improved since prior studies (e.g., Li et al., 2014; Silver et al., 2014), but fundamental security flaws remain across multiple vendors. No single password manager was flawless across all three evaluated stages (generation, storage, autofill). Built-in browser managers and dedicated extensions both exhibited distinct patterns of vulnerabilities.

7. Recommendations & Future Directions

The paper concludes with actionable recommendations:

  • For Users: Avoid password managers with known generation flaws or insecure autofill defaults. Prefer managers that allow granular control over autofill behavior.
  • For Developers: Implement cryptographically secure random number generators (CSPRNGs) for password generation. Encrypt all metadata. Implement robust origin checks and user consent mechanisms for autofill (e.g., requiring a click on a non-UI-redressable element).
  • For Researchers: Explore the integration of formal methods to verify autofill logic and the application of machine learning to detect anomalous autofill requests indicative of an attack.
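
A generator that follows the developer recommendation can be very small. The sketch below is one possible approach, not any vendor's implementation: it draws each character with Python's `secrets.choice`, which uses the operating system's CSPRNG, and enforces optional character-class requirements by rejection sampling (regenerating the whole password) rather than by patching characters in, so every accepted password remains equally likely. The default length of 20 is an assumption chosen to clear the 18-character offline threshold reported in the study; the function and parameter names are hypothetical.

```python
import secrets
import string

DEFAULT_ALPHABET = string.ascii_letters + string.digits + string.punctuation  # 94 symbols


def generate_password(length: int = 20, alphabet: str = DEFAULT_ALPHABET,
                      must_include: tuple = ()) -> str:
    """Uniformly random password from the OS CSPRNG. `must_include` lists
    character groups that must each appear at least once."""
    symbols = "".join(dict.fromkeys(alphabet))  # drop duplicates so each symbol has P = 1/N
    if length < 1 or not symbols:
        raise ValueError("length must be positive and the alphabet non-empty")
    if len(must_include) > length or any(not set(g) & set(symbols) for g in must_include):
        raise ValueError("character-class requirements cannot be satisfied")
    while True:
        candidate = "".join(secrets.choice(symbols) for _ in range(length))
        # Rejection sampling: discard and redraw rather than patching characters in,
        # so all passwords that meet the requirements stay equally likely.
        if all(any(ch in group for ch in candidate) for group in must_include):
            return candidate
```

With the default 94-symbol alphabet and length 20, an unrestricted password carries $20 \cdot \log_2(94) \approx 131$ bits of entropy.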

8. Original Analysis & Expert Commentary

Core Insight: The Oesch and Ruoti study delivers a sobering reality check: the security tools we trust to consolidate our digital keys rest on alarmingly shaky foundations. Five years after major flaws were exposed, the industry's progress is incremental at best, failing to address systemic issues in all three core pillars: generation, storage, and autofill. This isn't just a bug report; it's an indictment of complacency in a critical security vertical.

Logical Flow: The paper's power lies in its holistic lifecycle approach. It correctly identifies that a chain is only as strong as its weakest link. Finding non-randomness in generation ($P(char) \neq \frac{1}{N}$) fundamentally undermines the entire premise before storage or autofill are even considered. The replication of past storage/autofill tests then shows a pattern: while superficial vulnerabilities may be patched, architectural flaws (like unencrypted metadata or promiscuous autofill) persist. This logical progression from flawed creation to insecure handling to risky deployment paints a complete and damning picture.

Strengths & Flaws: The study's primary strength is its massive, data-driven approach to password generation, a first in the literature. The 147-million-password corpus provides strong statistical evidence of algorithmic weakness, moving beyond theoretical concerns. However, the analysis has a blind spot: it largely treats password managers as isolated clients. The modern reality is cloud sync and mobile apps. As noted in the IEEE Symposium on Security and Privacy proceedings on cloud security models, the threat surface extends to sync protocols, server-side APIs, and mobile OS integration, which this study does not evaluate. Furthermore, while it mentions "insecure defaults," it doesn't quantify how many users actually adopt secure settings, a critical factor in real-world risk, as usability studies from the USENIX SOUPS conference consistently show most users never change defaults.

Actionable Insights: For enterprise security teams, this research mandates a shift from blanket recommendations of "use a password manager" to vendor-specific, configuration-specific guidance. Managers with weak generators must be blacklisted. Procurement checklists must now include verification of CSPRNG use and metadata encryption. For developers, the path forward is clear: adopt a "zero-trust" principle for autofill, requiring explicit, context-aware user consent for every fill action, similar to the permission models advocated by the World Wide Web Consortium (W3C) for powerful web APIs. The future lies not in trying to perfectly secure an overly permissive autofill, but in designing a minimally permissive, user-controlled one. The industry's failure to self-correct over five years suggests regulatory or standards-body intervention (e.g., by NIST or FIDO Alliance) may be necessary to enforce baseline security requirements for these guardians of our digital identity.

9. Technical Details & Experimental Results

Password Generation Analysis: For a password of length $L$ whose characters are drawn independently and uniformly from a character set $C$, the entropy is $H = L \cdot \log_2(|C|)$ bits. The study found instances where the effective entropy was lower because of biased character selection. For example, if a generator is meant to use a 94-character set but some characters appear with probability $p_i \ll \frac{1}{94}$, the per-character entropy falls to $H_{actual} = -\sum_{i=1}^{94} p_i \log_2(p_i) < \log_2(94)$, where $\sum_{i=1}^{94} p_i = 1$, and the entropy of a full password is at most $L \cdot H_{actual}$ bits (with equality when characters are chosen independently).
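
The per-character entropy can be estimated from an observed corpus. The sketch below computes the plug-in Shannon estimate of $H_{actual}$ and, for comparison, the min-entropy $-\log_2(\max_i p_i)$, the more conservative measure against an attacker who always guesses the most likely character first. It pools all positions together, so multiplying by $L$ overestimates the entropy of a whole password when characters are correlated. The function names are hypothetical.

```python
import math
from collections import Counter


def _char_probabilities(passwords):
    """Pool every character of every password and return empirical probabilities p_i.
    Only observed characters are included, so no p_i is zero."""
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty corpus")
    return [n / total for n in counts.values()]


def shannon_entropy_per_char(passwords):
    """Plug-in estimate of H_actual = -sum p_i log2 p_i, in bits per character."""
    return -sum(p * math.log2(p) for p in _char_probabilities(passwords))


def min_entropy_per_char(passwords):
    """Min-entropy -log2(max p_i), in bits per character."""
    return -math.log2(max(_char_probabilities(passwords)))
```

Comparing `shannon_entropy_per_char(corpus)` with `math.log2(94)` gives the per-character shortfall for a generator meant to use 94 symbols.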

Experimental Chart Description: A key chart in the study would plot the cumulative fraction of passwords cracked against the number of guess attempts (log scale) for generated passwords of different lengths (e.g., 8, 12, 16 chars). The curve for sub-10-character passwords would show a steep rise, indicating rapid compromise under online attack simulations (e.g., 1000 guesses). The curve for sub-18-character passwords would show a significant fraction cracked after $10^{10}$ to $10^{12}$ offline guesses, a budget within reach of determined attackers running GPU-accelerated cracking tools such as Hashcat.
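
The guess budgets in this chart description can be compared against the keyspace of an ideal generator. For characters chosen independently and uniformly from $|C|$ symbols, an attacker making $g$ distinct guesses recovers an expected fraction $\min(1, g / |C|^L)$ of passwords, and no guessing order does better. The sketch below (hypothetical helper name) evaluates that expression with exact integer keyspaces; the digits-only rows are a hypothetical configuration included for contrast. It doubles as a sanity check: at $10^{12}$ guesses, a uniform 94-symbol password of 8 or more characters loses well under 0.1% of its keyspace, so a significant cracked fraction at these budgets signals a smaller effective keyspace, such as a restricted character set or the biased selection described above.

```python
def fraction_cracked(length: int, alphabet_size: int, guesses: int) -> float:
    """Expected fraction of uniformly random passwords recovered by `guesses`
    distinct guesses: min(1, g / |C|**L)."""
    keyspace = alphabet_size ** length  # exact integer, no float overflow
    return min(1.0, guesses / keyspace)


if __name__ == "__main__":
    # Budgets from the chart description: 10**3 online, 10**12 offline (upper end).
    for alphabet_size, label in ((94, "94 symbols"), (10, "digits only")):
        for length in (8, 12, 16):
            online = fraction_cracked(length, alphabet_size, 10**3)
            offline = fraction_cracked(length, alphabet_size, 10**12)
            print(f"{label:>11}, L={length:2}: online {online:.1e}, offline {offline:.1e}")
```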

10. Analysis Framework & Case Study

Framework for Evaluating Password Manager Security:

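  1. Generation Integrity: Statistically test output for randomness (e.g., NIST STS, Dieharder tests) and calculate effective entropy. Verify that default lengths clear the study's offline-guessing threshold (at least 18 characters); NIST SP 800-63B (2017) sets only minimums (8 characters for subscriber-chosen secrets), so it is a floor rather than a target for generated passwords.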
  2. Storage Security: Inspect local storage (e.g., browser IndexedDB, SQLite files) and network traffic for encrypted vs. plaintext data. Audit the encryption library and key derivation function (e.g., is it using PBKDF2 with sufficient iterations, or Argon2?).
  3. Autofill Security Posture: Map the autofill trigger mechanism. Test for UI redressing by creating overlapping iframes. Test origin matching logic by deploying sites with similar domain names (e.g., `example.com` vs. `example.com.evil.net`). Check if autofill requires a user gesture on a non-predictable page element.
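
For the key-derivation part of step 2, the sketch below shows how an auditor might re-derive a vault key with PBKDF2 from Python's `hashlib` and check recorded parameters against a policy. The 600,000-iteration floor follows current OWASP guidance for PBKDF2-HMAC-SHA256 and the 16-byte salt minimum follows NIST SP 800-132; both are treated here as tunable assumptions. The algorithm labels passed to `audit_kdf` are hypothetical, and Argon2 parameters (memory and time cost) would need their own thresholds, which this sketch does not check.

```python
import hashlib
import os

# Assumed policy thresholds; tune to your own requirements.
MIN_PBKDF2_SHA256_ITERATIONS = 600_000
MIN_SALT_BYTES = 16


def new_salt() -> bytes:
    """Fresh random salt from the OS CSPRNG."""
    return os.urandom(MIN_SALT_BYTES)


def derive_vault_key(master_password: str, salt: bytes,
                     iterations: int = MIN_PBKDF2_SHA256_ITERATIONS) -> bytes:
    """Derive a 256-bit key from the master password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", master_password.encode("utf-8"),
                               salt, iterations, dklen=32)


def audit_kdf(algorithm: str, iterations: int, salt: bytes) -> list:
    """Return human-readable findings for a vault's recorded KDF parameters."""
    findings = []
    if algorithm == "pbkdf2-sha256":
        if iterations < MIN_PBKDF2_SHA256_ITERATIONS:
            findings.append(f"PBKDF2 iterations {iterations} below {MIN_PBKDF2_SHA256_ITERATIONS}")
    elif algorithm != "argon2id":
        findings.append(f"unrecognized KDF {algorithm!r}")
    # Argon2id memory and time costs are not checked in this sketch.
    if len(salt) < MIN_SALT_BYTES:
        findings.append(f"salt is {len(salt)} bytes, below {MIN_SALT_BYTES}")
    return findings
```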

Case Study - Clickjacking Vulnerability: Consider Manager X, which shows an autofill button over a login form. An attacker builds a malicious page that embeds `bank.com` in an invisible iframe and places a decoy button on top of it, aligned with Manager X's autofill button. The user believes they are clicking the decoy, but the click lands on the autofill button. If Manager X then places the `bank.com` credentials where the attacker's page can read them (for example, a hidden field on the top-level page) instead of confining them to the `bank.com` frame, the attacker's script sends them to the attacker's server. This demonstrates a failure in the manager's click-event binding and origin validation.

11. Future Applications & Research Outlook

The findings open several avenues for future work:

  • Hardware-Backed Generation & Storage: Integration with Trusted Platform Modules (TPMs) or Secure Enclaves (e.g., Apple's Secure Enclave) for generating random seeds and storing encryption keys, moving secrets out of purely software realms.
  • Context-Aware, Risk-Based Autofill: Leveraging machine learning to analyze page context (DOM structure, certificate details, site reputation) to assess autofill risk. A high-risk context could require additional authentication (biometric) or block autofill entirely.
  • Standardized Security APIs: Development of a browser-standardized, permissioned API for password managers (e.g., building on the W3C Credential Management API, `navigator.credentials`) that provides secure, sandboxed access to credentials with clear user consent prompts, reducing the attack surface from arbitrary DOM injection.
  • Post-Quantum Cryptography Preparedness: Research into migrating password manager encryption to algorithms resistant to quantum computer attacks, as the encrypted vault is a long-lived asset highly attractive to harvest-now-decrypt-later adversaries.
  • Decentralized & Self-Custody Models: Exploring the use of decentralized identity protocols (e.g., based on W3C Verifiable Credentials) to reduce reliance on a central vault, distributing risk and giving users greater control.

12. References

  1. Oesch, S., & Ruoti, S. (2020). That Was Then, This Is Now: A Security Evaluation of Password Generation, Storage, and Autofill in Browser-Based Password Managers. USENIX Security Symposium.
  2. Li, Z., He, W., Akhawe, D., & Song, D. (2014). The Emperor's New Password Manager: Security Analysis of Web-based Password Managers. USENIX Security Symposium.
  3. Silver, D., Jana, S., Boneh, D., Chen, E., & Jackson, C. (2014). Password Managers: Attacks and Defenses. USENIX Security Symposium.
  4. National Institute of Standards and Technology (NIST). (2017). Digital Identity Guidelines (SP 800-63B).
  5. Johns, M., & Winter, J. (2007). Protecting the Intranet Against "JavaScript Malware" and Related Attacks. International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA).
  6. Herley, C. (2009). So Long, And No Thanks for the Externalities: The Rational Rejection of Security Advice by Users. Proceedings of the New Security Paradigms Workshop (NSPW).
  7. World Wide Web Consortium (W3C). (2021). Permissions Policy. https://www.w3.org/TR/permissions-policy-1/
  8. FIDO Alliance. (2022). FIDO2: WebAuthn & CTAP. https://fidoalliance.org/fido2/