1. Introduction
Textual passwords remain the dominant authentication mechanism, yet because humans create them they are vulnerable to data-driven attacks. Existing state-of-the-art (SOTA) modeling approaches, including Markov chains, pattern-based models, RNNs, and GANs, are limited in how well they capture the complex structure of passwords, which resembles natural language but differs from it. Inspired by the pre-training and fine-tuning paradigm that transformed Natural Language Processing (NLP), this paper introduces PassTSL (modeling human-created Passwords through Two-Stage Learning). PassTSL uses a transformer-based architecture to first learn general password-creation patterns from a large, diverse dataset (pre-training) and then specializes the model for a specific target context using a smaller, relevant dataset (fine-tuning). The approach aims to close the gap between advanced NLP techniques and the particular challenges of password modeling.
2. Methodology: The PassTSL Framework
The core innovation of PassTSL is its two-stage learning process, which mirrors the strategy behind successful models such as BERT and GPT.
2.1. Pre-training Phase
The model is first trained on a large, general password corpus (e.g., data combined from multiple breaches). The goal is to learn character-level dependencies, common substitution patterns (e.g., 'a' -> '@', 's' -> '$'), and probabilistic structures that recur across different password sets. This stage produces a robust foundation model of human password-creation behavior.
2.2. Fine-tuning Phase
The pre-trained model is then adapted to a specific target password dataset. Its parameters are adjusted using a relatively small sample from the target set. The paper explores a heuristic for choosing fine-tuning data based on the Jensen-Shannon (JS) divergence between the pre-training distribution and the target distribution, with the aim of selecting the most useful samples for adaptation.
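The JS-divergence heuristic can be illustrated with a minimal sketch. It assumes that the distributions being compared are character frequency distributions, which the text above does not specify; the helper names `char_distribution`, `js_divergence`, and `rank_candidates` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def char_distribution(passwords):
    """Return the relative frequency of each character across a password list."""
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    p and q map symbols to probabilities; missing symbols count as 0.
    The result lies in [0, 1]: 0 for identical distributions, 1 for disjoint ones.
    """
    support = set(p) | set(q)
    m = {s: 0.5 * (p.get(s, 0.0) + q.get(s, 0.0)) for s in support}

    def kl(a):
        # KL(a || m); symbols with a(s) == 0 contribute nothing.
        return sum(a[s] * math.log2(a[s] / m[s]) for s in a if a[s] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def rank_candidates(candidate_sets, target_passwords):
    """Sort named candidate sets by JS divergence to the target sample (closest first)."""
    target = char_distribution(target_passwords)
    scored = [(js_divergence(char_distribution(pws), target), name)
              for name, pws in candidate_sets.items()]
    return sorted(scored)
```

Calling `rank_candidates({"digits": [...], "words": [...]}, target_sample)` returns `(divergence, name)` pairs with the closest candidate first. A real selection step might compare richer features (for example n-grams or length distributions) rather than single characters.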
2.3. Model Architecture & Technical Details
PassTSL is built on a transformer decoder architecture and uses self-attention to weigh the importance of different characters in a sequence when predicting the next character. The model treats a password as a sequence of characters (tokens). Training uses a masked language modeling (MLM)-style objective during pre-training, in which the model learns to predict randomly masked characters within a password sequence and so draws on context from both sides.
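Treating a password as a sequence of character tokens can be sketched as follows. The vocabulary (printable ASCII plus four special tokens), the fixed length of 16, and the names `encode` and `decode` are illustrative assumptions; the source does not describe the actual tokenizer.

```python
import string

# Hypothetical vocabulary: special tokens followed by printable ASCII characters.
SPECIAL_TOKENS = ["[PAD]", "[BOS]", "[EOS]", "[MASK]"]
CHARSET = string.ascii_letters + string.digits + string.punctuation
VOCAB = SPECIAL_TOKENS + list(CHARSET)
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def encode(password, max_len=16):
    """Map a password to a fixed-length id list: [BOS] chars [EOS] [PAD]..."""
    if any(ch not in TOKEN_TO_ID for ch in password):
        raise ValueError("password contains a character outside the assumed charset")
    ids = ([TOKEN_TO_ID["[BOS]"]]
           + [TOKEN_TO_ID[ch] for ch in password]
           + [TOKEN_TO_ID["[EOS]"]])
    if len(ids) > max_len:
        raise ValueError("password too long for max_len")
    return ids + [TOKEN_TO_ID["[PAD]"]] * (max_len - len(ids))


def decode(ids):
    """Invert encode(): drop special tokens and join the remaining characters."""
    return "".join(VOCAB[i] for i in ids if VOCAB[i] not in SPECIAL_TOKENS)
```

A character-level vocabulary keeps the model small and lets it represent substitutions such as 'a' -> '@' directly, since each character is its own token.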
3. Experimental Setup & Results
3.1. Datasets and Baselines
Experiments were conducted on six large real-world leaked password datasets. PassTSL was compared against five SOTA password guessing tools, including Markov-based, pattern-based (e.g., PCFG), RNN-based, and GAN-based models.
3.2. Password Guessing Performance
PassTSL outperformed all baselines by a wide margin. The improvement in guessing success rate at the maximum point ranged from 4.11% to 64.69%, demonstrating the effectiveness of the two-stage approach. The results indicate that pre-training on a large general corpus gives a substantial advantage over models trained from scratch on a single target set.
Performance Gain over SOTA
Range: 4.11% - 64.69%
Context: Improvement in password guessing success rate at the maximum point of the evaluation.
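The guessing success rate behind these figures can be computed as in the sketch below. It assumes the metric is the fraction of test passwords matched within a guess budget. The source does not say whether the reported gains are absolute or relative, so `improvement_points` shows the absolute difference in percentage points as one possible reading. Both function names are hypothetical.

```python
def guessing_success_rate(guesses, test_passwords, budget):
    """Fraction of test passwords (with multiplicity) matched by the first `budget` guesses."""
    tried = set(guesses[:budget])
    hits = sum(1 for pw in test_passwords if pw in tried)
    return hits / len(test_passwords)


def improvement_points(new_rate, baseline_rate):
    """Absolute difference between two success rates, in percentage points (assumed reading)."""
    return 100.0 * (new_rate - baseline_rate)
```

Evaluating `guessing_success_rate` for a range of budgets gives the usual guess-number curve, and comparing two models at the same budget gives the gain at that point.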
3.3. Password Strength Meter (PSM) Evaluation
A PSM was implemented on top of PassTSL's probability estimates. It was evaluated against a neural-network-based PSM and the rule-based zxcvbn. The key metric is the trade-off between "safe errors" (underestimating strength) and "unsafe errors" (overestimating strength). At an equal rate of safe errors, the PassTSL-based PSM produced fewer unsafe errors, meaning it identified genuinely weak passwords more accurately.
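The two error types can be made concrete with a short sketch. It assumes that both the meter and a reference (for example, a bucketed guess number) assign each password an integer strength level, with larger values meaning stronger; this representation and the name `error_rates` are illustrative choices, not the paper's evaluation code.

```python
def error_rates(predicted_levels, reference_levels):
    """Return (safe_error_rate, unsafe_error_rate) for paired strength levels.

    Levels are integers where a larger value means a stronger password.
    Safe error: the meter underestimates strength (predicted < reference).
    Unsafe error: the meter overestimates strength (predicted > reference).
    """
    if len(predicted_levels) != len(reference_levels) or not reference_levels:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference_levels)
    safe = sum(p < r for p, r in zip(predicted_levels, reference_levels))
    unsafe = sum(p > r for p, r in zip(predicted_levels, reference_levels))
    return safe / n, unsafe / n
```

Unsafe errors are the costly ones: a weak password labeled strong gives the user false confidence, whereas a safe error only causes inconvenience.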
3.4. Impact of Fine-tuning Data Selection
The study found that even a small amount of targeted fine-tuning data (e.g., 0.1% of the pre-training data volume) can yield an average improvement of more than 3% in guessing performance on the target set. The JS-divergence-based selection heuristic was shown to be effective at picking useful fine-tuning samples.
4. Key Insights & Analysis
Core Insight: The paper's central achievement is recognizing that password creation is a specialized, constrained form of natural language generation. By treating it as such and applying modern NLP tooling, specifically the transformer architecture and the two-stage learning paradigm, the authors achieve a paradigm shift in modeling fidelity. This is more than an incremental improvement; it is a leap that redefines the upper bound of what is possible in probabilistic password cracking.
Logical Flow: The argument is appealingly simple: 1) Passwords share statistical and semantic properties with language. 2) The most successful modern language models use pre-training on large text corpora followed by task-specific fine-tuning. 3) Therefore, applying the same framework to passwords should produce superior models. The experimental results across six diverse datasets confirm this logic unequivocally, showing consistent and often dramatic gains over earlier generative models such as Markov chains and even earlier neural approaches such as RNNs and GANs.
Strengths & Flaws: The main strength is the demonstrated performance, which is robust. Using JS divergence to select fine-tuning samples is a clever, practical heuristic. The analysis has flaws, however. It overlooks the computational and data appetite of transformer models. Pre-training requires a massive password corpus, which raises ethical and practical concerns about data sourcing. In addition, although it beats other models, the paper does not examine why the transformer's attention mechanism suits this task better than, say, an LSTM's gated memory. Is it the ability to capture long-range dependencies, or something else? This "black box" aspect remains.
Actionable Insights: For security practitioners, this research raises the alarm. Defensive password strength meters must move beyond dictionary-and-rule systems (such as zxcvbn) and incorporate deep learning models of this kind to assess risk accurately. For researchers, the path forward is clear: explore more efficient architectures (e.g., distilled models), investigate federated learning so that pre-training does not require centralizing sensitive data, and use these models not only for cracking but also to generate robust password policy recommendations. The era of simple heuristic defenses is over; the arms race is now firmly in the AI domain.
5. Technical Details & Mathematical Formulation
The transformer model in PassTSL uses a stack of $N$ identical layers. Each layer has two sub-layers: a multi-head self-attention mechanism and a fully connected feed-forward network. Residual connections and layer normalization are applied around each sub-layer.
The self-attention function maps a query ($Q$) and a set of key-value pairs ($K$, $V$) to an output. The output is computed as a weighted sum of the values, where the weight assigned to each value is determined by a compatibility function of the query with the corresponding key. For a single attention head: $$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$ where $d_k$ is the dimension of the keys.
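The single-head formula can be checked numerically with a small dependency-free sketch. It uses plain Python lists in place of tensors and omits the causal mask that a decoder adds to hide future positions; the function names are illustrative.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention for one head.

    Q: n x d_k, K: m x d_k, V: m x d_v (lists of lists). Returns n x d_v.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Compatibility of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With a zero query every key scores the same, so the output is simply the mean of the value vectors; a query aligned with one key pulls the output toward that key's value.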
The pre-training objective consists of predicting masked tokens. Given an input password sequence $X = (x_1, x_2, ..., x_T)$, a random subset of tokens is replaced with a special `[MASK]` token. The model is trained to predict the original tokens at these masked positions by maximizing the log-likelihood: $$\mathcal{L}_{PT} = \sum_{i \in M} \log P(x_i | X_{\setminus M})$$ where $M$ is the set of masked positions.
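The masking step and the resulting objective can be sketched as below. The masking rate of 0.15 is an assumed value, and `predict` stands in for a hypothetical model call; neither is taken from the source.

```python
import math
import random


def mask_password(chars, mask_rate=0.15, rng=None):
    """Replace a random subset of positions with "[MASK]".

    Returns (masked_sequence, masked_positions). At least one position is masked.
    mask_rate=0.15 is an assumed value, not taken from the source.
    """
    rng = rng or random.Random()
    k = max(1, round(mask_rate * len(chars)))
    positions = sorted(rng.sample(range(len(chars)), k))
    masked = list(chars)
    for i in positions:
        masked[i] = "[MASK]"
    return masked, positions


def masked_log_likelihood(chars, positions, predict):
    """Sum of log P(x_i | masked input) over the masked positions.

    `predict(masked, i)` is a hypothetical model call that returns a dict
    mapping each candidate character to its probability at position i.
    """
    masked = ["[MASK]" if i in positions else c for i, c in enumerate(chars)]
    return sum(math.log(predict(masked, i)[chars[i]]) for i in positions)
```

Training would maximize `masked_log_likelihood` over many passwords and random masks; a model that guesses uniformly over four characters scores `log(0.25)` per masked position.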
Fine-tuning adjusts the model parameters $\theta$ on the target dataset $D_{ft}$ to minimize the negative log-likelihood of the sequences: $$\mathcal{L}_{FT} = -\sum_{X \in D_{ft}} \log P(X | \theta)$$
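A minimal sketch of this loss follows. It assumes the sequence probability is factored autoregressively, character by character, with an explicit end marker, which is one common choice for decoder-style models and is not confirmed by the source; `next_char_probs` is a hypothetical model call.

```python
import math


def sequence_nll(password, next_char_probs):
    """Negative log-likelihood of one password under an autoregressive model.

    `next_char_probs(prefix)` is a hypothetical model call returning a dict that
    maps each candidate next symbol (including the end marker "<eos>") to its
    probability given the prefix.
    """
    nll = 0.0
    symbols = list(password) + ["<eos>"]
    for t, sym in enumerate(symbols):
        probs = next_char_probs(password[:t])
        nll -= math.log(probs[sym])
    return nll


def fine_tuning_loss(dataset, next_char_probs):
    """Total negative log-likelihood over the fine-tuning set."""
    return sum(sequence_nll(pw, next_char_probs) for pw in dataset)
```

In practice the sum would be computed over mini-batches and minimized with a gradient-based optimizer; the same per-password value also serves as the probability estimate a strength meter can use.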
6. Analysis Framework: A Non-Code Case Study
Scenario: A security team at a large technology company wants to assess how well employee passwords withstand a state-of-the-art attack.
- Data Preparation: The team legally assembles a large general password corpus from multiple public, anonymized breach sources (for pre-training). They also obtain a small sample of their own company's password hashes (for fine-tuning), making sure that no plaintext passwords are exposed to analysts.
- Model Application: They deploy a PassTSL-like framework.
- Step A (Pre-training): Train the base transformer model on the general corpus. The model learns global patterns such as "password123", "qwerty", and common leetspeak substitutions.
- Step B (Fine-tuning): Using the JS-divergence heuristic, select the 0.1% of the pre-training data that is statistically most similar to the company's password sample. Fine-tune the pre-trained model on this selected subset combined with the company sample. This adapts the model to company-specific patterns (e.g., use of internal product names, particular date formats).
- Evaluation: The fine-tuned model generates a guess list. The team compares its crack rate against their existing baseline (e.g., hashcat with a standard rule set). They find that PassTSL cracks 30% more passwords within the first 10^9 guesses, exposing a significant vulnerability that traditional methods missed.
- Action: Based on the model's output, they identify the most frequently guessed patterns, implement a targeted password policy change (e.g., banning passwords that contain the company name), and launch a focused user education campaign.
7. Future Applications & Research Directions
- Proactive Defense & Password Hygiene: PassTSL models can be integrated into real-time password creation interfaces as highly accurate strength meters, preventing users from choosing passwords the model can guess easily. This moves beyond static rules to dynamic, probability-based rejection.
- Adversarial Password Generation: Invert the model to generate passwords that are maximally improbable under the learned distribution, suggesting genuinely strong passwords to users, similar to how generative models such as CycleGAN learn to translate between domains.
- Federated & Privacy-Preserving Learning: Future work must address the data privacy challenge. Techniques such as federated learning, in which the model is trained across decentralized data sources without exchanging raw passwords, or differential privacy during training, are essential for ethical adoption.
- Cross-Modal Password Analysis: Extend the framework to model passwords in relation to other user data (e.g., usernames, security questions) in order to build more complete user profiles for targeted attacks or, conversely, for multi-factor risk assessment.
- Efficiency Improvements: Investigate model distillation, quantization, and more efficient attention mechanisms (e.g., Linformer, Performer) so that these powerful models can be deployed on edge devices or in low-latency web applications.
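The probability-based rejection described in the first item above can be sketched as a simple thresholding step. The thresholds, the use of log10 probabilities, and the function names are illustrative assumptions; in a deployment they would be calibrated against guess numbers from the trained model.

```python
def strength_level(log10_prob, thresholds=(-6.0, -10.0, -14.0)):
    """Bucket a password by model log10-probability: 0 (weakest) to len(thresholds).

    The thresholds are illustrative assumptions; a lower probability under the
    guessing model means a stronger password.
    """
    return sum(log10_prob < t for t in thresholds)


def accept_password(log10_prob, min_level=2):
    """Accept a candidate password only if it reaches the required strength level."""
    return strength_level(log10_prob) >= min_level
```

A creation form would compute the model's log-probability for the candidate password and call `accept_password` before storing it, rejecting candidates that the guessing model rates as likely.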
8. References
- Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems 30 (NIPS 2017).
- Weir, M., et al. (2009). Password Cracking Using Probabilistic Context-Free Grammars. IEEE Symposium on Security and Privacy.
- Melicher, W., et al. (2016). Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks. USENIX Security Symposium.
- Hitaj, B., et al. (2019). PassGAN: A Deep Learning Approach for Password Guessing. Applied Cryptography and Network Security (ACNS).
- Wheeler, D. L. (2016). zxcvbn: Low-Budget Password Strength Estimation. USENIX Security Symposium.
- Devlin, J., et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
- Zhu, J.Y., et al. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. IEEE International Conference on Computer Vision (ICCV). (CycleGAN reference for the generative concept.)
- National Institute of Standards and Technology (NIST). (2017). Digital Identity Guidelines (SP 800-63B). (For authentication context.)